CPSC120A - Lab 26

Build Dictionary

Write the function build_dictionary(keys, values) that creates a dictionary out of a list of keys and a list of values. The parameter keys is a list of immutable objects. The parameter values is a list of values that is the same length as keys. The function should return a new dictionary that contains the specified keys with their respective values. That is, keys[i] is associated with values[i] for all i less than the length of keys.

Test Cases

print(build_dictionary(['a', 'b', 'c'], [1, 2, 3]))
print(build_dictionary([1, 2, 3], ['a', 'b', 'c']))
print("Input:    ['a', 'b', 'c'], [1, 2, 3]")
print("Actual:  ", build_dictionary(['a', 'b', 'c'], [1, 2, 3]))
print("Expected: {'a': 1, 'b': 2, 'c': 3}")
print()
print("Input:    [1, 2, 3], ['a', 'b', 'c']")
print("Actual:  ", build_dictionary([1, 2, 3], ['a', 'b', 'c']))
print("Expected: {1: 'a', 2: 'b', 3: 'c'}")

Dictionary Increment

Write the Python function increment(dictionary, key) that increments a value in a dictionary. The parameter dictionary is a dictionary with string keys and integer values. The parameter key is a string. The function should not return anything. Instead it should modify the dictionary by adding 1 to the value associated with key. If the key does not exist in the dictionary, the function should create an entry for the key with the value of 0 and then increment the value.

Test Cases

dictionary = {'a': 1, 'b': 2, 'c': 3}
print("Input:   ", dictionary)
increment(dictionary, 'a')
print("Actual:  ", dictionary)
print("Expected: {'a': 2, 'b': 2, 'c': 3}")
print()
dictionary = {'a': 1, 'b': 2, 'c': 3}
print("Input:   ", dictionary)
increment(dictionary, 'b')
print("Actual:  ", dictionary)
print("Expected: {'a': 1, 'b': 3, 'c': 3}")

Histogram

Histograms allow us to visualize the distribution of some data set. For example, We could visualize the distribution of the birth months of children:

Figure 1: Distribution of Birthdays by Month, from http://www.panix.com/~murphy/bdaymm.gif

Where the purple bars are the expected distributions based on month lengths, and the blue bars are the actual, observed distributions from some data set. In a previous lab, we had the opportunity to use this information to change the probability distributions for computing the birthday paradox. Today, We will generate a histogram that could be used to break a substitution cipher.

Details

Create a program that prompts the user to enter text and prints a sorted histogram that represents the number of occurrences of the characters in entered text. The program should use a dictionary to represent the histogram, with the keys of the dictionary as characters and the values of the dictionary as integers.

Example

Enter some text: hello
l: ##
h: #
o: #
e: #

Enter some text: This is a test!
s: ###
 : ###
i: ##
t: ##
T: #
h: #
a: #
e: #
!: #

Hint

This activity can be broken into two steps, creating the histogram and printing the histogram.

To create the histogram, use an accumulator, however, instead of a single accumulator there will be multiple accumulators all stored in a dictionary. Begin with an empty dictionary. Use a for loop to get each character in the input text and increment the value of the character's entry in the dictionary. If the character is not already in the dictionary, add it before incrementing it.
To print the histogram in descending order, first find the entry with the largest value in the dictionary. Use the dictionary values method and the built-in max function to find the largest value in the dictionary. And use the * operator to print the appropriate number of # characters. Then find the key associated with the max value and use the del operator to delete the entry from the histogram dictionary. Repeat this process for the next largest value until the dictionary is empty.

Challenge

A substitution cipher is a non-linear cipher that relies on a "look-up" table (the key for the cipher) in order to accomplish the encryption. It avoids the problems of the Caesar cipher, because the mapping from plain-text character to cipher-text character is not a linear function; you have to figure out each character individually in a substitution cipher.

If something is known about the plain-text, we can make an attempt at figuring out the entirety of the mapping. Suppose it is known that the plain-text was written in English. We know the distribution of characters in the English language:

From Wikipedia

If we take the most commonly occurring character from the cipher-text, chances are it is the encrypted version of 'e'. The second most common character is probably the encrypted version of 't'. The third most common is probably the encrypted version of 'a', and so on.

Write a program that uses a histogram of character occurances in some encrypted text to crack the key used to encode the text. Once you have the key, use your substitution cipher program to convert the text back to plain text. Once you have tested it on some simple examples, you can test it on this encrypted text. How well does it work? Can you at least read some words? Why does this not work out exactly?