CPSC120A
Fundamentals of Computer Science I

Lab 23

Lists

Use the command line to create a new directory called lab23 in your labs directory. Make sure all of the .py files that you create for this activity are in that directory.

Batting Average

The World Series just ended with Scotty's favorite team losing. Maybe they would have done better if they had written programs to compute player statistics, like batting average. Batting average is the ratio of hits to at bats. To compute a player's batting average, you have to count the player's hits and at bats.

Details

Create a Python function called batting_average(appearances) in a file called batting.py. The function parameter, appearances, is a list of ints that encodes the results of a baseball player's appearances at the plate. Each plate appearance can be interpreted as follows:

4 Home Run
3 Triple
2 Double
1 Single
0 Walk
-1 Out

The function should return the batting average for the plate appearances in the list. The batting average can be computed using the following equations:

at_bats = plate_appearances - walks
hits = singles + doubles + triples + home_runs
batting_average = hits / at_bats

Example

>> print(batting_average([1, 0, 1, -1]))
0.6666666666666666
>> pedroia_at_bats = [-1, 1, -1, -1, 1, -1, -1, 2, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, 0, -1, 0, -1, -1, -1, 2, 1, -1, -1, -1, -1, 1, 0, -1, -1, -1, 1, -1, -1, 1, -1, 1, -1, 0, -1, 1, 1, -1, -1, 2, 0, -1, -1, -1, -1, -1, -1, -1, 1, -1, 2, -1, -1, -1, -1, -1, -1, -1, 1]
>> print(batting_average(pedroia_at_bats))
0.25396825396825395

Create a loop, either a while or a for, that iterates through every int in the appearances list. Create a variable for the number of hits that is initially zero, and increment it for every item in the list that is a hit (a 1, 2, 3, or 4). Also create a variable for the number of walks that is initially zero, and increment it for every item in the list that is a walk (a 0). Compute the number of plate appearances from the length of the list.
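The steps above can be sketched as follows. This is one possible shape for the function, not the only acceptable one. It assumes the list contains at least one appearance that is not a walk; the handout does not say what to return when at_bats is zero, so the sketch does not handle that case.

```python
def batting_average(appearances):
    """Return hits / at_bats for a list of plate-appearance codes."""
    hits = 0
    walks = 0
    for result in appearances:
        if result >= 1:       # 1, 2, 3, or 4 is a hit
            hits += 1
        elif result == 0:     # 0 is a walk
            walks += 1
        # -1 is an out: it counts as an at bat but not as a hit
    at_bats = len(appearances) - walks
    return hits / at_bats


print(batting_average([1, 0, 1, -1]))  # prints 0.6666666666666666
```

In the short example, the list has four plate appearances and one walk, so there are three at bats and two hits.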

Challenge

Sabermetrics is the application of statistics to the management of baseball teams, popularized by the book Moneyball. Sabermetrics tries to improve upon the batting average metric by counting not just a batter's hits but also the batter's contribution to the number of runs scored. Write a function called runs_created(appearances) that computes the runs created stat with the following formulas:

total_bases = (singles) + (2 × doubles) + (3 × triples) + (4 × home_runs)
runs_created = ((hits + walks) × (total_bases)) / plate_appearances

>> ortiz = [-1, 1, 4, -1, 0, 4, 1, -1, 0, 1, 0, 1, 2, 0, 1, 2, 1, -1, 1, 0, 0, 0, -1, 0]
>> print(runs_created(ortiz))
15.04166666666666
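A minimal sketch of the runs created computation, under the same encoding as batting_average. It relies on an observation about the table above: the code for each hit (1 through 4) equals the number of bases earned, so total_bases is the sum of the positive codes. It assumes the list is not empty.

```python
def runs_created(appearances):
    """Return the runs created stat for a list of plate-appearance codes."""
    hits = 0
    walks = 0
    total_bases = 0
    for result in appearances:
        if result >= 1:            # a hit: the code is also the bases earned
            hits += 1
            total_bases += result
        elif result == 0:          # a walk
            walks += 1
    plate_appearances = len(appearances)
    return (hits + walks) * total_bases / plate_appearances
```

For the ortiz list, this counts 11 hits, 8 walks, and 19 total bases over 24 plate appearances, which gives (19 × 19) / 24.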

Birthday Paradox

Suppose someone comes into the class and bets you $100 that at least two people in the class share a birthday. Do you share your birthday with anyone in the class? Do any two people in the class share a birthday? It seems that the odds would be pretty slim. However, if you are putting up $100, you might want to figure out your odds of winning the bet.

Details

In a file called birthdays.py, create a function called shared_probability(group_size). This function takes one positive integer parameter, the number of people in a given group. It should return a floating point value in the range [0, 1]: the estimated probability that at least two people in a group of size group_size share a birthday.

Use a Monte Carlo simulation to estimate the probability. The simulation should repeatedly generate a list of group_size random birthdays. The estimated probability of a shared birthday is the fraction of simulations in which at least two birthdays match.

What is the minimum group size for which the probability exceeds 50%? What is the minimum group size for which a shared birthday is virtually guaranteed (a probability of at least 0.9999)?
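The lab asks for a simulation, but an exact calculation is a handy cross-check for your estimates. The sketch below is not part of the assignment. The function name exact_shared_probability is hypothetical, and the sketch assumes 365 equally likely birthdays. It multiplies together the chance that each new person avoids every birthday already taken, then subtracts that product from 1.

```python
def exact_shared_probability(group_size, days_in_year=365):
    """Exact probability of a shared birthday, assuming uniform birthdays."""
    all_different = 1.0
    for already_taken in range(group_size):
        # The next person must avoid every birthday taken so far.
        free_days = max(days_in_year - already_taken, 0)
        all_different *= free_days / days_in_year
    return 1.0 - all_different
```

If your simulated values drift far from these exact values, try increasing the number of simulations.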

Example

>> print(shared_probability(1))
0.0
>> print(shared_probability(366))
1.0

Create a separate function simulate_birthdays(group_size) that generates a list of group_size random birthdays. Use the in operator to test if any of the birthdays are the same day. The simulate_birthdays function should return True if any days in the list are the same and False otherwise. The shared_probability function should call the simulate_birthdays function many times and count the number of times it returns True and the number of times it returns False. Note that generating a random birthday does not have to be as complicated as generating a random month and a random day. Instead, represent each birthday as a number of days from January 1st. Assume there are always 365 days in a year.
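One way the two functions described above might fit together is sketched here. The constant TRIALS is an assumption, since the handout does not say how many simulations to run. This version also checks each birthday with the in operator as the list is built, which lets it stop as soon as a match appears; building the whole list first and checking afterwards works just as well.

```python
import random

DAYS_IN_YEAR = 365
TRIALS = 10000  # assumed number of simulations; more trials give a steadier estimate


def simulate_birthdays(group_size):
    """Return True if a random group of group_size people shares a birthday."""
    birthdays = []
    for _ in range(group_size):
        day = random.randrange(DAYS_IN_YEAR)  # 0 means January 1st
        if day in birthdays:                  # someone already has this day
            return True
        birthdays.append(day)
    return False


def shared_probability(group_size):
    """Estimate the probability of a shared birthday by repeated simulation."""
    shared = 0
    for _ in range(TRIALS):
        if simulate_birthdays(group_size):
            shared += 1
    return shared / TRIALS
```

Because the result is an estimate, two calls with the same group_size will usually return slightly different values.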

Challenge

The above program assumes that birthdays are uniformly distributed: for a given person, the odds of being born on a particular date are the same for all dates. However, it is probably pretty obvious to you that this is not true in real life. In fact, you are more likely to be born in the months July through October than in any other month of the year.

How could you use the random number generator, which outputs numbers in a uniform distribution, to generate a skewed, non-uniform random distribution? Try to redo your shared_probability function using this new, non-uniform distribution of birthdays. How do the results for the above questions change? Do they get higher or lower? You may assume, for simplicity's sake, that all months have 30 days for this challenge portion.
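One possible approach, sketched below, is to give each month a relative weight, pick a uniform random point along the total weight, and walk through the months until the running total passes that point. The weights in MONTH_WEIGHTS are made up for illustration and are not real birth statistics, and the function name random_skewed_birthday is hypothetical.

```python
import random

# Made-up relative weights, January first; July through October count double.
MONTH_WEIGHTS = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1]
DAYS_PER_MONTH = 30  # simplification allowed for the challenge


def random_skewed_birthday():
    """Return a day number from 0 to 359 drawn from a skewed distribution."""
    # A uniform point in [0, total weight) lands in a heavier month more often.
    target = random.random() * sum(MONTH_WEIGHTS)
    running_total = 0
    for month, weight in enumerate(MONTH_WEIGHTS):
        running_total += weight
        if target < running_total:
            break
    # Within the chosen month, every day is equally likely.
    day_in_month = random.randrange(DAYS_PER_MONTH)
    return month * DAYS_PER_MONTH + day_in_month
```

With these weights, about half of the generated birthdays fall in July through October. Swapping this generator into simulate_birthdays lets you compare the skewed results with the uniform ones.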

Submission

Please show your source code and run your programs for the instructor or lab assistant. Only programs that have perfect style and flawless functionality will be accepted as complete.