CPSC120
Fundamentals of Computer Science

Activity 25

Lists

Batting Average

The book, and later movie, Moneyball described how the use of statistics changed baseball. Using statistics like batting average can help managers make decisions. Batting average is the ratio of hits to at-bats. Computing a player's batting average requires counting the different types of hits a player makes.

Details

Write the Python function batting_average(appearances: [int]) -> float that computes a player's batting average. The parameter appearances is a list of integers that encodes the results of a baseball player's appearances at-bat. Each appearance can be interpreted as follows:

4 Home Run
3 Triple
2 Double
1 Single
0 Walk
-1 Out

The function should return the batting average for the appearances in the list. The batting average can be computed using the following equations:

\( num\_at\_bats = num\_appearances - num\_walks\)

\( num\_hits = num\_singles + num\_doubles + num\_triples + num\_home\_runs\)

\( batting\_average = num\_hits / num\_at\_bats \)
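As a worked example of these equations, consider the list [2, 0, -1, 1, -1]. This list is made up for illustration and is not one of the test cases. It has five appearances: one double, one walk, one single, and two outs.

\( num\_at\_bats = 5 - 1 = 4 \)

\( num\_hits = 1 + 1 + 0 + 0 = 2 \)

\( batting\_average = 2 / 4 = 0.5 \)

The walk is excluded from the at-bats, while the two outs count as at-bats but not as hits.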

Test Cases

import test

def batting_average(appearances: [int]) -> float:
    # Put your code here
    pass  # placeholder so the file runs before the function is written

def main() -> None:
    test.equal(batting_average([1, 1, -1]), 0.6666666666)
    test.equal(batting_average([-1, 1, -1, -1, 1, -1, -1, 2, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, 0, -1, 0, -1, -1, -1, 2, 1, -1, -1, -1, -1, 1, 0, -1, -1, -1, 1, -1, -1, 1, -1, 1, -1, 0, -1, 1, 1, -1, -1, 2, 0, -1, -1, -1, -1, -1, -1, -1, 1, -1, 2, -1, -1, -1, -1, -1, -1, -1, 1]), 0.25396825396825395)
    return None

main()

Hint

  • To compute the batting average, the function needs to count the number of walks, singles, doubles, triples, and home runs. To do this, count the number of 0's, 1's, 2's, 3's, and 4's.

  • Count an integer by iterating over all the list elements and incrementing an accumulator every time the integer is encountered.
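The accumulator pattern described in the hint can be sketched as follows. This is only one possible shape, and the helper name count_value is an assumption made for this example, not something the activity requires.

```python
def count_value(values: [int], target: int) -> int:
    """Return how many elements of values are equal to target."""
    count = 0  # accumulator: starts at zero
    for value in values:
        if value == target:
            count += 1  # one more occurrence of target
    return count

# Illustrative data, not one of the activity's test cases.
sample = [1, 0, -1, 1, 2, -1]
print(count_value(sample, 1))  # 2
print(count_value(sample, 0))  # 1
```

A helper like this could be called once per result code (0 through 4) to get the counts the equations need.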

Challenge

Sabermetrics is the application of statistics to the management of baseball teams, popularized by the book Moneyball. Sabermetrics tries to improve on the batting average metric by measuring not just a batter's hits but also the batter's contribution to the number of runs scored. Write the function runs_created(appearances: [int]) -> float that returns the Sabermetrics runs created statistic for the specified appearances list. The runs created stat can be computed with the following formulas:

\( total\_bases = (num\_singles) + (2 \cdot num\_doubles) + (3 \cdot num\_triples) + (4 \cdot num\_home\_runs)\)

\(runs\_created = ((num\_hits + num\_walks) \cdot (total\_bases)) / num\_appearances \)
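To see the formulas in action, take the made-up list [2, 0, -1, 1, -1] (an illustration only, not a test case). It contains one single, one double, one walk, and two outs, so num_hits is 2, num_walks is 1, and num_appearances is 5.

\( total\_bases = (1) + (2 \cdot 1) + (3 \cdot 0) + (4 \cdot 0) = 3 \)

\( runs\_created = ((2 + 1) \cdot (3)) / 5 = 9 / 5 = 1.8 \)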

Birthday Paradox

Suppose someone comes into our class and bets you $100 that two people in the room share a birthday. The odds of that seem pretty slim. Still, before putting up $100, you might want to figure out your odds of winning.

Details

Write the function shared_probability(group_size: int) -> float that estimates the probability that at least two people in a group share a birthday. The parameter group_size is a positive integer that specifies the number of people in the group. The function should return a floating-point value between 0 and 1, the probability that there is a shared birthday in a group of size group_size. A probability of 0.0 means a 0% chance, and a probability of 1.0 means a 100% chance. The function should not compute the exact probability; instead, it should approximate the probability with repeated simulation.

To simulate a group of people, create a list of group_size integers where each number represents a birthday. There are 365 different birthdays, ignoring leap years, so the list should be filled with random integers between 0 and 364, inclusive. If there are any duplicate numbers in the list, then at least two people share a birthday.
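A single simulated group might look like the sketch below. The group size of 5 and the variable names are assumptions chosen for illustration, and the set-based duplicate check is just one of several ways to detect a repeated value.

```python
import random

group_size = 5  # illustrative size, chosen only for this example

# Build the group: one random birthday (0 through 364) per person.
birthdays = []
for _ in range(group_size):
    birthdays.append(random.randint(0, 364))  # randint includes both endpoints

# A set keeps only distinct values, so it is shorter than the list
# exactly when some birthday appears more than once.
shared = len(set(birthdays)) < len(birthdays)

print(birthdays)
print(shared)
```

The output changes from run to run because the birthdays are random.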

To estimate the probability, run the simulation many times. The more times it runs, the more accurate the estimate will be. The probability is the fraction of simulated groups in which at least two people share a birthday.
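Written as an equation, with num_shared and num_trials as assumed names for the number of simulated groups containing a shared birthday and the total number of simulated groups:

\( estimated\_probability = num\_shared / num\_trials \)

As a rough guide to accuracy (a standard result for an estimated proportion, not something the activity requires you to compute), the typical error of the estimate for a true probability \( p \) is about

\( standard\_error \approx \sqrt{p \cdot (1 - p) / num\_trials} \)

For example, with \( p = 0.5 \) and \( num\_trials = 10000 \), this gives \( \sqrt{0.25 / 10000} = 0.005 \), so estimates from separate runs would usually agree to within about one or two percentage points.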

Once you have written and tested the function, use it to answer the following question:

What is the minimum group size for which the probability exceeds 50%?

Example

import test

def shared_probability(group_size: int) -> float:
    # Put your code here
    pass  # placeholder so the file runs before the function is written

def main() -> None:
    test.equal(shared_probability(1), 0.0)
    # Find the smallest group size with probability over 0.5
    return None

main()

Hint

Functions are your friends. They make a complex function easier to write by breaking it into simpler functions that are each easy to write. Here are some functions that would make writing the shared_probability function easier:

  • create_group(group_size: int) -> [int] - return a new list of group_size random integers in the range [0, 364]

  • are_duplicates(group: [int]) -> bool - return whether any two elements of the list group are equal.

With these two functions -- don't forget to test them -- the shared_probability function should repeatedly create a group, counting the number of times there are duplicate birthdays. The probability is the number of groups with a duplicate divided by the number of groups created.
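The approach described above could be sketched as follows. This is one possible arrangement, not the required solution. The num_trials parameter and its default of 10000 are assumptions added for this example; the activity's signature takes only group_size.

```python
import random

def create_group(group_size: int) -> [int]:
    """Return a new list of group_size random integers in the range [0, 364]."""
    group = []
    for _ in range(group_size):
        group.append(random.randint(0, 364))
    return group

def are_duplicates(group: [int]) -> bool:
    """Return whether any two elements of group are equal."""
    seen = set()  # birthdays encountered so far
    for birthday in group:
        if birthday in seen:
            return True  # stop at the first repeated birthday
        seen.add(birthday)
    return False

def shared_probability(group_size: int, num_trials: int = 10000) -> float:
    """Estimate the chance of a shared birthday by repeated simulation.

    num_trials is an assumed extra parameter, not part of the activity.
    """
    shared_count = 0  # groups that contained a duplicate birthday
    for _ in range(num_trials):
        if are_duplicates(create_group(group_size)):
            shared_count += 1
    return shared_count / num_trials

print(shared_probability(1))   # always 0.0: one person cannot share a birthday
print(shared_probability(10))  # varies slightly from run to run
```

Keeping the two helpers separate makes each one easy to test on its own, for example by calling are_duplicates on small hand-written lists.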

Challenge

The above program assumes that birthdays are uniformly distributed: for a given person, the odds of being born on a particular date are the same for all dates. However, this is not true in real life. In fact, you are more likely to be born in July through October than in any other month of the year.

How could you use the random number generator, which outputs numbers in a uniform distribution, to create a non-uniform random distribution? Redo your shared_probability function using this new, non-uniform distribution of birthdays. How do the results for the question above change? Do the probabilities get higher or lower? For simplicity's sake, you may assume that all months have 30 days for this challenge portion.
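One way to get a non-uniform result from a uniform generator is to pick uniformly from a list in which the more likely outcomes appear more often. The sketch below assumes, purely for illustration, that July through October are each twice as likely as any other month. That 2-to-1 weighting and the names WEIGHTED_MONTHS and random_birthday are made up for this example and are not real birth statistics.

```python
import random

DAYS_PER_MONTH = 30  # simplification allowed by the challenge

# Hypothetical weighting: months are numbered 0 (January) to 11 (December),
# and July-October (6 through 9) are listed twice, so a uniform pick from
# this list lands on each of them twice as often as on any other month.
WEIGHTED_MONTHS = [0, 1, 2, 3, 4, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11]

def random_birthday() -> int:
    """Return a day number in [0, 359] drawn from the non-uniform distribution."""
    month = random.choice(WEIGHTED_MONTHS)       # uniform pick from a weighted list
    day = random.randint(0, DAYS_PER_MONTH - 1)  # uniform day within the month
    return month * DAYS_PER_MONTH + day

print(random_birthday())
```

With 12 months of 30 days there are 360 possible birthdays instead of 365, so a fair comparison would rerun the uniform version with 360 days as well.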