As usual, create two directories for today's class. Create a
directory called lecture31
under activities, and
a directory called lab31
under labs.
We explored the associativity of dictionaries on Monday. Today, we will explore how we can better iterate over the dictionaries. Then, we will use these new techniques to create a program that potentially solves a complex problem.
It worked decently well on Monday, so here we are again. Once again, try them first on your own, then discuss your answers with your partner
Histograms allow us to visualize the distribution of some data set. For example, We could visualize the distribution of the birth months of children:
Where the purple bars are the expected distributions based on month lengths, and the blue bars are the actual, observed distributions from some data set. In a previous lab, we used this information to change the probability distributions for computing the birthday paradox. Today, We will generate a histogram that could possibly be used to break a substitution cipher.
Create a program in a file called histogram.py
. This
program will read a block of text (using input
), and will
display a histogram. You will need to keep track of the characters
that you find, store them in a dictionary with their associated
frequencies, and then display a bar graph in ther terminal. Your
output should be ordered from most commonly occuring (with ties broken
arbitrarily).
$ python3 histogram.py Enter your text to analyze: This is a test! s: ### : ### i: ## t: ## T: # h: # a: # e: # !: #
Create a function called
compute_frequencies(input_text)
, which takes a
string as a parameter. This function should return a dictionary
that maps the characters of input_text to an integer count of
their frequencies.
This function will iterate (loop over the items, in this case the characters) through the input_text. You will need to first check to see if the character is a key in the dictionary. If it is not, add it to the dictionary with a value of 1. Otherwise, increment the value of the key by 1.
To order the histogram in decending order, you need to simply find the maximum value in the dictionary. Once you have the maximum value, you need perform a reverse lookup to find a key associated with that value. If only you knew how to perform a reverse lookup …
If you remove the key with the maximum value from the histogram, all you have to do is perform the above process again to get the second most commonly occurring character. You can continue this process, until the dictionary is empty.
A substitution cipher is a non-linear cipher that relies on a "look-up" table (the key for the cipher) in order to accomplish the encryption. It avoids the problems of the Caesar cipher, because the mapping from plain-text character to cipher-text character is not a linear function; you have to figure out each character individually in a substitution cipher.
If something is known about the plain-text, we can make an attempt at figuring out the entirety of the mapping. Suppose it is known that the plain-text was written in English. We know the distribution of characters in the English language:
If we take the most commonly occurring character from the cipher-text, chances are it is the encrypted version of 'e'. The second most common character is probably the encrypted version of 't'. The third most common is probably the encrypted version of 'a', and so on.
You are provided with the module ciphers.py, which
includes functions encode_substitution
and
decode_substitution
. Create a program in a file called
hack_substitution.py
. This program should read in a
cipher-text string, and attempt to decode the string using the
sequence of the most commonly occurring characters. Try your
program on this
encoded file. You can redirect a file to your program by using the
"<" operator at the command line.
$ python3 hack_substitution.py < encoded.txt
How well does it work? Can you at least read some words? Why does this not work out exactly correctly?
When you have finished, create a tar file of your lab31
directory. To create a tar file, execute the following commands:
cd ~/cs120/labs tar czvf lab31.tgz lab31/
To submit your activity, go to cseval.roanoke.edu. You should
see an available assignment called Lab Assignment 31
. Only
one of your pair should submit your activity. Make sure both partners
are listed in the header of your files.
Do not forget to email your partner today's files!