Post Lab 11: Frequency Analysis

Due Friday Before Class on December 2, 2011

Rumor has it that Boris and Natasha have a new plot for world domination. You have been recruited by the CIA to help scan network traffic for encrypted messages that detail their plans (with a proper warrant, of course). Fortunately for you, Boris and Natasha use a relatively simple encryption method, the substitution cipher, for which you wrote an encryption program in the last lab.

Your job is to write a program that helps reverse the encryption and allows you to uncover the details of the plot. This would be simple if you knew the cipher alphabet, but you do not (Boris and Natasha aren't that dumb). One solution would be to use "brute force": try substituting each of the 26 letters in turn until you find a substitution that works. However, a better way to break the code is to use a technique called frequency analysis. In any cipher where a given letter is always replaced by the same fixed letter, as in the substitution cipher, the frequency of letters is not hidden by the cipher. Hence, the best way to try to break the cipher text is to count the number of instances of each letter in the cipher text, then try substituting the most frequent letter in English for the most frequent letter in the cipher text. If that doesn't work, substitute the next most frequent letter in English for the most frequent letter in the cipher text. Eventually you get a substitution that "breaks" the cipher text.
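
As an illustration only, the counting step could be sketched in Python as follows. The language, the function name count_letters, and the decision to ignore case and skip non-letters are assumptions made for this sketch, not requirements of the lab.

```python
from collections import Counter
import string


def count_letters(text):
    """Return a dict mapping each letter a-z to its number of occurrences.

    Hypothetical helper: upper-case letters are folded to lower-case and
    anything that is not a letter (spaces, punctuation) is ignored.
    """
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # Counter returns 0 for missing keys, so every letter gets an entry.
    return {letter: counts[letter] for letter in string.ascii_lowercase}


if __name__ == "__main__":
    for letter, count in count_letters("pbxh bd! pbxh bd!").items():
        print(f"{letter}: {count}")
```

Because the cipher replaces each letter with one fixed letter, these counts have the same shape as the counts of the original message, only attached to different letters.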

Your program should read a line of text from the command line and print the number of occurrences of every letter of the alphabet in the input text. The program should then print the six most frequently occurring letters in order from most frequent to least frequent. These letters are probably the substitutes for the most frequent letters in English, which in order of descending frequency are e, t, a, o, i, and n. Finally, the program should print the original enciphered text with each of the six most frequent letters replaced by its likely English equivalent. The original enciphered letters should be in lower-case, and the deciphered letters should be in upper-case. In order to do this, the program should:

The program's output should look similar to the following. Note that the deciphered text is incorrect because the input text is too short for its letter frequencies to match those of typical English.

Frequency Analysis
------------------

Enter the enciphered text: pbxh bd! pbxh bd! hbqy zmggdp qy kxsfqsa nxk! bm bxhb vmms vqhhms vr hbm hxexshjgx. 

Letter occurrences:
a: 1
b: 9
c: 0
d: 3
e: 1
f: 1
g: 3
h: 9
i: 0
j: 1
k: 2
l: 0
m: 6
n: 1
o: 0
p: 3
q: 4
r: 1
s: 5
t: 0
u: 0
v: 3
w: 0
x: 8
y: 2
z: 1

Six most frequent letters in descending order: b h x m s q 

Potential deciphered text: pEAT Ed! pEAT Ed! TENy zOggdp Ny kAIfNIa nAk! EO EATE vOOI vNTTOI vr TEO TAeAITjgA.
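
One possible way to produce the last two lines of the sample output is sketched below in Python. The function names most_frequent and partial_decipher are hypothetical, and breaking ties alphabetically is an assumption; it is consistent with the sample, where b and h both occur 9 times and b is listed first, but the lab does not state a tie-breaking rule.

```python
import string

# Most frequent English letters, in descending order (from the lab text).
ENGLISH_TOP_SIX = "etaoin"


def most_frequent(text, n=6):
    """Return the n most frequent letters in text, most frequent first."""
    counts = {letter: 0 for letter in string.ascii_lowercase}
    for ch in text.lower():
        if ch in counts:
            counts[ch] += 1
    # Sort by descending count; ties fall back to alphabetical order
    # (an assumption, see the note above).
    ranked = sorted(counts, key=lambda letter: (-counts[letter], letter))
    return ranked[:n]


def partial_decipher(text, cipher_letters, plain_letters=ENGLISH_TOP_SIX):
    """Replace each letter in cipher_letters with the upper-case guess.

    Letters without a guess stay lower-case; other characters are kept.
    """
    mapping = {c: p.upper() for c, p in zip(cipher_letters, plain_letters)}
    return "".join(mapping.get(ch, ch) for ch in text.lower())


if __name__ == "__main__":
    sample = ("pbxh bd! pbxh bd! hbqy zmggdp qy kxsfqsa nxk! "
              "bm bxhb vmms vqhhms vr hbm hxexshjgx.")
    top_six = most_frequent(sample)
    print("Six most frequent letters in descending order:", " ".join(top_six))
    print("Potential deciphered text:", partial_decipher(sample, top_six))
```

Writing the guesses in upper-case keeps them visually separate from the letters that are still enciphered, which makes it easier to spot a wrong guess and try the next most frequent English letter instead.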

Other requirements: As usual, you must use good programming techniques and follow the course code conventions. Be sure to test your code thoroughly before submitting.

Submission: