Lecture 28 - Lists


As usual, create two directories for today's class. Create a directory called lecture28 under activities, and a directory called lab28 under labs.


Lists

We have seen a number of data types up to now: int, float, string, and boolean; even turtle is technically a data type. Today, you will learn your first data structure: the list. A list is a mechanism for holding values of other data types. This structure allows us to store an arbitrary number of values, even if we don't know that number in advance.
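As a minimal sketch of what this looks like in Python, the snippet below creates a list, grows it, and walks over its values. The variable names (scores, total) are made up for illustration only.

```python
# A list literal holds several values under one name.
scores = [90, 85, 72]

# Lists can grow while the program runs, so the final size
# does not need to be known in advance.
scores.append(100)

# len() reports how many values the list holds; indexing starts at 0.
print(len(scores))   # 4
print(scores[0])     # 90

# A for loop visits each value in the list, in order.
total = 0
for score in scores:
    total = total + score
print(total)         # 347
```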


Lab Activity 1

Batting Average

The World Series just ended with my favorite team losing. Maybe they would have done better if they had written programs to compute player statistics, like batting average. Batting average is the ratio of hits to at bats. To compute a player's batting average you have to count the different types of hits a player makes.

Create a Python function called batting_average(appearances) in the same file. The function parameter, appearances, is a list of ints that encodes the results of a baseball player's plate appearances. Each value in the list can be interpreted as follows:

Value  Result
  4    Home Run
  3    Triple
  2    Double
  1    Single
  0    Walk
 -1    Out

The function should return the batting average for the plate appearances in the list. The batting average can be computed using the following equations:

at_bats = plate_appearances - walks
hits = singles + doubles + triples + home_runs
batting_average = hits / at_bats

>> print(batting_average([1, 1, -1]))
0.6666666666
>> pedroia_at_bats = [-1, 1, -1, -1, 1, -1, -1, 2, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, 0, -1, 0, -1, -1, -1, 2, 1, -1, -1, -1, -1, 1, 0, -1, -1, -1, 1, -1, -1, 1, -1, 1, -1, 0, -1, 1, 1, -1, -1, 2, 0, -1, -1, -1, -1, -1, -1, -1, 1, -1, 2, -1, -1, -1, -1, -1, -1, -1, 1]
>> print(batting_average(pedroia_at_bats))
0.25396825396825395

The batting_average function would probably be much easier to write if you had some mechanism to count the number of appearances of some value in the list. Towards that end, create a Python function called count(a_list, element) in the same file. The function should return the number of times element occurs in the list a_list. It would be wise to test this function before you try to use it in batting_average.
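
One possible way to put these pieces together is sketched below. It is not the only valid solution. The handout does not say what to return when a player has no at bats, so the 0.0 returned in that case is an assumption of this sketch.

```python
def count(a_list, element):
    """Return the number of times element occurs in a_list."""
    occurrences = 0
    for item in a_list:
        if item == element:
            occurrences = occurrences + 1
    return occurrences


def batting_average(appearances):
    """Return hits / at_bats for a list of plate-appearance codes."""
    singles = count(appearances, 1)
    doubles = count(appearances, 2)
    triples = count(appearances, 3)
    home_runs = count(appearances, 4)
    walks = count(appearances, 0)

    # Walks are plate appearances that do not count as at bats.
    at_bats = len(appearances) - walks
    hits = singles + doubles + triples + home_runs

    # Assumption (not stated in the handout): report 0.0 when there
    # are no at bats, rather than dividing by zero.
    if at_bats == 0:
        return 0.0
    return hits / at_bats


print(count([1, 1, -1], 1))             # 2
print(batting_average([1, 1, -1]))      # 0.6666666666666666
```

Python lists also provide a built-in count method, a_list.count(element), but writing the loop by hand is a good way to practice traversing a list.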
Challenge

Sabermetrics is the application of statistics to the management of baseball teams, popularized by the book Moneyball. Sabermetrics tries to improve upon the batting average metric by not just counting a batter's hits but also incorporating the batter's contribution to the number of runs scored. The runs created stat can be computed with the following formulas:

total_bases = (singles) + (2 × doubles) + (3 × triples) + (4 × home_runs)
runs_created = ((hits + walks) × (total_bases)) / plate_appearances

>> ortiz = [-1, 1, 4, -1, 0, 4, 1, -1, 0, 1, 0, 1, 2, 0, 1, 2, 1, -1, 1, 0, 0, 0, -1, 0]
>> print(runs_created(ortiz))
15.04166666666666
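
The two formulas above translate almost line for line into Python. The sketch below is one possible version; it uses the built-in list method count for brevity (a hand-written count function would work equally well) and assumes the list is not empty.

```python
def runs_created(appearances):
    """Return the runs created stat for a list of plate-appearance codes."""
    singles = appearances.count(1)
    doubles = appearances.count(2)
    triples = appearances.count(3)
    home_runs = appearances.count(4)
    walks = appearances.count(0)

    # Every entry in the list is one plate appearance.
    # Assumption: the list is non-empty, so this is never zero.
    plate_appearances = len(appearances)

    hits = singles + doubles + triples + home_runs
    total_bases = singles + (2 * doubles) + (3 * triples) + (4 * home_runs)
    return ((hits + walks) * total_bases) / plate_appearances


ortiz = [-1, 1, 4, -1, 0, 4, 1, -1, 0, 1, 0, 1,
         2, 0, 1, 2, 1, -1, 1, 0, 0, 0, -1, 0]
print(runs_created(ortiz))   # approximately 15.0417
```

For this list there are 11 hits, 8 walks, 19 total bases, and 24 plate appearances, so the result is (19 × 19) / 24.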

Lab Assignment 28

Suppose someone comes into the class and bets you $100 that at least two people in the class share a birthday. Do you share your birthday with anyone in the class? Is there a shared birthday between any two people in the class? It seems that the odds would be pretty slim. However, if you are putting up $100 on this bet, you might want to figure out your odds of winning.

In a file called birthdays.py, create a function called shared_probability(group_size). This function takes one positive integer parameter, the number of people in a given group. This function should return a floating point value in the range [0, 1], a probability that there is a shared birthday in a group of size group_size.

What is the minimum group size at which the probability exceeds 50%? What is the minimum group size at which a shared birthday is virtually guaranteed (probability 0.9999)?

>> print(shared_probability(1))
0.0
>> print(shared_probability(366))
1.0

The shared_probability function should execute a fixed number of simulations (some constant greater than one) for a given group size. Each simulation consists of generating the requested number of random birthdays and determining whether there is a shared birthday in the group.
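
A rough sketch of this simulation approach follows. The constant NUM_SIMULATIONS, its value, the helper has_shared_birthday, and the choice to ignore leap days (365 possible birthdays) are all assumptions made for illustration; the handout only requires some fixed number of simulations.

```python
import random

NUM_SIMULATIONS = 10000   # assumed value; any constant greater than one works
DAYS_IN_YEAR = 365        # assumption: leap days are ignored


def has_shared_birthday(group_size):
    """Run one simulation; return True if two birthdays match."""
    birthdays = []
    for _ in range(group_size):
        day = random.randint(1, DAYS_IN_YEAR)
        if day in birthdays:
            # Someone generated earlier already has this birthday.
            return True
        birthdays.append(day)
    return False


def shared_probability(group_size):
    """Estimate the probability of a shared birthday by simulation."""
    shared = 0
    for _ in range(NUM_SIMULATIONS):
        if has_shared_birthday(group_size):
            shared = shared + 1
    return shared / NUM_SIMULATIONS


print(shared_probability(1))     # 0.0
print(shared_probability(366))   # 1.0
```

Because the result is an estimate built from random trials, group sizes between these two extremes will give slightly different values each time the program runs. Raising the number of simulations makes the estimate steadier but slower.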
Challenge

The above program assumes that birthdays are uniformly distributed: for a given person, the odds of being born on a particular date are the same for all dates. However, it is probably pretty obvious to you that this is not true in real life. In fact, you are much more likely to be born in the months July through October than in any other month of the year.

How could you use the random number generator, which outputs numbers in a uniform distribution, to generate a skewed, non-uniform random distribution? Try to redo your shared_probability function using this new, non-uniform distribution of birthdays. How do the results for the above questions change? Do they get higher or lower? You may assume, for simplicity's sake, that all months have 30 days for this challenge portion.
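
One possible trick, sketched below, is to build a list in which the more likely months appear more than once and then pick uniformly from that list. The weight of 2 for July through October is a made-up number for illustration, not real birth data, and the names MONTH_POOL, build_month_pool, and random_birthday are hypothetical.

```python
import random

DAYS_PER_MONTH = 30   # simplification allowed by the challenge


def build_month_pool():
    """Return a list of month numbers where likelier months repeat."""
    pool = []
    for month in range(1, 13):
        # Hypothetical weighting: July-October are twice as likely.
        if 7 <= month <= 10:
            copies = 2
        else:
            copies = 1
        for _ in range(copies):
            pool.append(month)
    return pool


MONTH_POOL = build_month_pool()


def random_birthday():
    """Return a day of the year (1-360) drawn from the skewed distribution."""
    # random.choice is uniform over list positions, so a month that
    # occupies two positions is chosen twice as often.
    month = random.choice(MONTH_POOL)
    day = random.randint(1, DAYS_PER_MONTH)
    return (month - 1) * DAYS_PER_MONTH + day


print(MONTH_POOL)
print(random_birthday())
```

A function like this could replace the uniform random draw inside each simulation, leaving the rest of shared_probability unchanged.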


Submission

When you have finished, create a tar file of your lab28 directory. To create a tar file, execute the following commands:

cd ~/cs120/labs
tar czvf lab28.tgz lab28/

To submit your activity, go to cseval.roanoke.edu. You should see an available assignment called Lab Assignment 28. Only one member of your pair should submit the activity. Make sure both partners are listed in the header of your files.

Do not forget to email your partner today's files!


Last modified: Wed Nov 6 10:43:00 EST 2013