Lecture 31 - Dictionaries


As usual, create two directories for today's class. Create a directory called lecture31 under activities, and a directory called lab31 under labs.


Dictionaries

We explored the associativity of dictionaries on Monday. Today, we will explore how we can better iterate over the dictionaries. Then, we will use these new techniques to create a program that potentially solves a complex problem.


Lab Activity 1
Code Lab

It worked decently well on Monday, so here we are again. Once again, try them first on your own, then discuss your answers with your partner


Lab Assignment 31
Histograms

Histograms allow us to visualize the distribution of some data set. For example, We could visualize the distribution of the birth months of children:

Where the purple bars are the expected distributions based on month lengths, and the blue bars are the actual, observed distributions from some data set. In a previous lab, we used this information to change the probability distributions for computing the birthday paradox. Today, We will generate a histogram that could possibly be used to break a substitution cipher.

Details

Create a program in a file called histogram.py. This program will read a block of text (using input), and will display a histogram. You will need to keep track of the characters that you find, store them in a dictionary with their associated frequencies, and then display a bar graph in ther terminal. Your output should be ordered from most commonly occuring (with ties broken arbitrarily).

Example

$ python3 histogram.py
Enter your text to analyze: This is a test!

s: ###
 : ###
i: ##
t: ##
T: #
h: #
a: #
e: #
!: #

Hint

 

Challenge

A substitution cipher is a non-linear cipher that relies on a "look-up" table (the key for the cipher) in order to accomplish the encryption. It avoids the problems of the Caesar cipher, because the mapping from plain-text character to cipher-text character is not a linear function; you have to figure out each character individually in a substitution cipher.

If something is known about the plain-text, we can make an attempt at figuring out the entirety of the mapping. Suppose it is known that the plain-text was written in English. We know the distribution of characters in the English language:

If we take the most commonly occurring character from the cipher-text, chances are it is the encrypted version of 'e'. The second most common character is probably the encrypted version of 't'. The third most common is probably the encrypted version of 'a', and so on.

You are provided with the module ciphers.py, which includes functions encode_substitution and decode_substitution. Create a program in a file called hack_substitution.py. This program should read in a cipher-text string, and attempt to decode the string using the sequence of the most commonly occurring characters. Try your program on this encoded file. You can redirect a file to your program by using the "<" operator at the command line.

  $ python3 hack_substitution.py < encoded.txt

How well does it work? Can you at least read some words? Why does this not work out exactly correctly?


Submission

When you have finished, create a tar file of your lab31 directory. To create a tar file, execute the following commands:

cd ~/cs120/labs
tar czvf lab31.tgz lab31/

To submit your activity, go to cseval.roanoke.edu. You should see an available assignment called Lab Assignment 31. Only one of your pair should submit your activity. Make sure both partners are listed in the header of your files.

Do not forget to email your partner today's files!


Last modified: Thu Nov 7 21:44:28 EST 2013