CPSC 120 B




Lecture
MWF 10:50am - 11:50am
Lab
MWF 12:00pm - 1:00pm

Scotty Smith

Office
Trexler 365-B
Office Hours
Monday / Thursday
3:00pm - 5:00pm
Email
chssmithATroanoke.edu

< Back
Printable Copy

Lecture 31 - Dictionaries


As usual, create two directories for today's class. Create a directory called lecture31 under activities, and a directory called lab31 under labs.


Dictionaries

We explored the associativity of dictionaries on Monday. Today, we will explore how we can better iterate over the dictionaries. Then, we will use these new techniques to create a program that potentially solves a complex problem.


Lab Activity 1
Code Lab

It worked decently well on Monday, so here we are again. Once again, try them first on your own, then discuss your answers with your partner


Lab Assignment 31
Histograms

Histograms allow us to visualize the distribution of some data set. For example, We could visualize the distribution of the birth months of children:

Where the purple bars are the expected distributions based on month lengths, and the blue bars are the actual, observed distributions from some data set. In a previous lab, we used this information to change the probability distributions for computing the birthday paradox. Today, We will generate a histogram that could possibly be used to break a substitution cipher.

Details

Create a program in a file called histogram.py. This program will read a block of text (using input), and will display a histogram. You will need to keep track of the characters that you find, store them in a dictionary with their associated frequencies, and then display a bar graph in ther terminal. Your output should be ordered from most commonly occuring (with ties broken arbitrarily).

Example

$ python3 histogram.py
Enter your text to analyze: This is a test!

s: ###
 : ###
i: ##
t: ##
T: #
h: #
a: #
e: #
!: #

Hint

  • Create a function called compute_frequencies(input_text), which takes a string as a parameter. This function should return a dictionary that maps the characters of input_text to an integer count of their frequencies.

    This function will iterate (loop over the items, in this case the characters) through the input_text. You will need to first check to see if the character is a key in the dictionary. If it is not, add it to the dictionary with a value of 1. Otherwise, increment the value of the key by 1.

  • To order the histogram in decending order, you need to simply find the maximum value in the dictionary. Once you have the maximum value, you need perform a reverse lookup to find a key associated with that value. If only you knew how to perform a reverse lookup …

  • If you remove the key with the maximum value from the histogram, all you have to do is perform the above process again to get the second most commonly occurring character. You can continue this process, until the dictionary is empty.

 

Challenge

A substitution cipher is a non-linear cipher that relies on a "look-up" table (the key for the cipher) in order to accomplish the encryption. It avoids the problems of the Caesar cipher, because the mapping from plain-text character to cipher-text character is not a linear function; you have to figure out each character individually in a substitution cipher.

If something is known about the plain-text, we can make an attempt at figuring out the entirety of the mapping. Suppose it is known that the plain-text was written in English. We know the distribution of characters in the English language:

If we take the most commonly occurring character from the cipher-text, chances are it is the encrypted version of 'e'. The second most common character is probably the encrypted version of 't'. The third most common is probably the encrypted version of 'a', and so on.

You are provided with the module ciphers.py, which includes functions encode_substitution and decode_substitution. Create a program in a file called hack_substitution.py. This program should read in a cipher-text string, and attempt to decode the string using the sequence of the most commonly occurring characters. Try your program on this encoded file. You can redirect a file to your program by using the "<" operator at the command line.

  $ python3 hack_substitution.py < encoded.txt

How well does it work? Can you at least read some words? Why does this not work out exactly correctly?


Submission

When you have finished, create a tar file of your lab31 directory. To create a tar file, execute the following commands:

cd ~/cs120/labs
tar czvf lab31.tgz lab31/

To submit your activity, go to cseval.roanoke.edu. You should see an available assignment called Lab Assignment 31. Only one of your pair should submit your activity. Make sure both partners are listed in the header of your files.

Do not forget to email your partner today's files!


Last modified: Thu Nov 7 21:44:28 EST 2013