CPSC 120 - Assignment #4

CPSC 120 -- Assignment #4: Cryptology
Due Friday, December 10, 2010 by 4 p.m.

Security is a major concern in modern computing systems. Confidential information stored in computer databases and transmitted over computer networks needs to be protected from unauthorized access. The systems themselves must be restricted to authorized users. One of many means of ensuring such security is by encrypting data that is stored and transmitted by computers, and using encrypted passwords to limit access to systems. Encryption is the process of converting data (generally called a message) from its original form, called plaintext, into "scrambled" form, called ciphertext. The process of turning ciphertext back to plaintext is called decryption. A goal in systems security is to find encryption algorithms (called ciphers) that are relatively easy to compute but difficult to "reverse" (to break) without some knowledge of the decryption method (especially the "key" needed to decrypt). Most modern algorithms rely heavily on number theory and use the fact that it is difficult to factor very large numbers (a few hundred digits long) in a reasonable length of time. Early algorithms, however, relied on much simpler ideas. One of the simplest algorithms is called the monoalphabetic substitution cipher. In this cipher each letter in the plaintext is replaced by another letter in the alphabet. Basically there is the regular alphabet (called the plain alphabet) and the corresponding scrambled alphabet (called the cipher alphabet) that defines the substitutions. This type of cipher was used in exercise 3 of Lab 11 and is used in lots of puzzles such as CryptoQuote in the daily newspaper. Unfortunately it is very easy to "break" - that is figure out the plaintext message if you have the ciphertext. The key to breaking it is frequency analysis; that is, figure out the frequency of each letter in the ciphertext and try substituting the most frequent letters in the English language for those. Ever since this was figured out by Arab cryptanalysts (people who figure out how to "break" ciphers) in the ninth century, cryptographers have tried to come up with new ciphers that would defy frequency analysis. One such early method was the homophonic substitution cipher. In this cipher, each letter has possibly many different substitution symbols. The idea is for each letter to have a number of substitutes that is proportional to its frequency in the English language.

In this assignment you will write a program to implement a homophonic substitution cipher. Recall that in lab 11 the "key" to the cipher was an array of characters to represent the cipher alphabet. For the homophonic substitution cipher we need a list of substitution characters for each letter. One way to do this is to represent the key as an array of strings. Each string will consist of the characters that can substitute for the corresponding letter. The following illustrates part of such an array.


      Plain Alphabet    Key Array    Cipher Alphabet (Substitution symbols)
                         ------
                        |      |
           a            | -----|---> "#4Avz"
                        |      |                  
                        |------|
                        |      |
           b            | -----|---> "?"
                        |      |                  
                        |------|
                        |      |
           c            | -----|---> "q7>"
                        |      |                  
                        |------|
   
                           ...

To encipher a message, each letter in the message is replaced with a randomly chosen character in the corresponding string of substitution symbols. For example, using the above example key the plaintext message "cabba" might by enciphered as "7A??#" or as "qv??4". There are several other possibilities.

To decipher a message, each letter in the ciphertext must be replaced with the corresponding plaintext letter. To do this there needs to be a way to find the correct letter. For example, in the illustration above the symbol > in a ciphertext message must be replaced with the letter c. So, there needs to be a way to find the symbol > to know which letter it corresponds to. One way to do this is to search through every string in the key array every time. A slightly more efficient method is to build a "reverse" key as the key is read in. For that we will use the idea of parallel arrays - two arrays with related entries. One array will be an array of the symbols in the cipher alphabet (so an array of characters), the other array will be an array of the corresponding letters. This can be visualized as follows for the key above.


      Cipher Alphabet     Corresponding Plain
         Symbols            Alphabet Letters

          ------                ------
         |  #   |              |  a   |
         |------|              |------|
         |  4   |              |  a   |                  
         |------|              |------|
         |  A   |              |  a   |
         |------|              |------|
         |  v   |              |  a   |                
         |------|              |------|
         |  z   |              |  a   |
         |------|              |------|
         |  ?   |              |  b   |       
         |------|              |------|
         |  q   |              |  c   |                
         |------|              |------|
         |  7   |              |  c   |
         |------|              |------|
         |  >   |              |  c   |       
         |------|              |------|
            ...                  ...

Details

Write a class to represent a homophonic substitution cipher and, to test your class, a write a program that creates a cipher object, reads in the key, then allows the user to encipher and decipher messages (use a menu of options similar to that in the program for testing methods in the CDCollection class from lab 11). The class needs to have the following:

Instance Data:

An array of 26 String objects representing the key (as described above).
An int for the number of symbols in the cipher alphabet.
An array of characters representing the symbols in the cipher alphabet. This array should initially be large enough for 100 symbols but it may need to be increased if the user has a larger cipher alphabet. (This idea was used in the CDCollection class in the textbook. See the increaseSize method on page 343.)
An array of characters representing the plain alphabet letters corresponding to each letter in the cipher alphabet.

Methods:

A constructor that takes no parameters. All it does is instantiate each array.
A method readKey that takes one String parameter representing the name of a file that contains the key. This method will read the key information in from the file and set up the encryption key array and the two arrays for decryption. This method should return nothing. (Details about reading from a file and what this method should do are below.)
A method encipher that takes one String parameter representing the name of a file that contains a plaintext message. The method will encipher the message and return the result as a string. (Details below)
A method decipher that takes one String parameter representing the name of a file that contains a ciphertext message. The method will decipher the message and return the result as a string.
Accessor methods for each instance variable.
A private method increaseSize that will increase the size of two arrays that hold the "reverse" key information (the array of cipher alphabet symbols and the corresponding plain alphabet letters). The method should double the size of each array.
A private method findSymbol that takes a character as a parameter and searches the array of characters in the cipher alphabet for that character. The method should return the index of the character if it is found and -1 if it is not.

Reading from a Text File

Section 4.6 of the textbook (pages 137 - 141) discusses the idea of iterators and reading data from a textfile. Another example is in the file FileExample.java which uses a method in Course.java to read in data. Note that both the main method and a method in a class that reads from a file must throw an IOException. You can try this program out using the files grades1.txt, grades2.txt, and grades3.txt.

The readKey method

The method will read the key information from a file and set up the arrays. The format of the file will be as follows: There will be 26 lines in the file. Each line corresponds to a letter of the alphabet with the first line corresponding to a, the second to b, and so on. The first entry on each line will be an integer representing the number of cipher symbol substitutes for the letter. Then there will be a list of integers representing the Unicode value of each symbol. For example, the file that corresponds to the example key above would be as follows (see the Unicode chart on page 788):

        5  35  52  65  118   122
        1  63
        3  113  55  62
        ...

NOTE: Your documentation for the method must describe the file format!

For each plain alphabet letter, the method must read in the Unicode value of each corresponding symbol and use that to compose the string for the key array. It must also keep track of the number of cipher alphabet symbols and create entries in both the array of cipher symbols and the array of corresponding plain alphabet characters.

The encipher method

The encipher method will read in a message from a file and, for each letter, substitute a random corresponding cipher alphabet letter. Each line from the file should be converted to lowercase after being read in so you only need to worry about lowercase letters. Characters that are not letters (such as blank spaces and punctuation symbols) should be omitted from the ciphertext (just throw them away). That means the ciphertext will have no blank spaces or punctuation symbols. To encipher a letter a random index needs to be generated to choose a random substitution symbol.

Some Test Files

The following are files you can use for testing your program.

The following files contain keys: key1.txt, key2.txt, and key3.txt. The first key contains only symbols from the standard ASCII character set (from the chart in the book) but the other two contain some symbols from the extended character set that may not print correctly on some computers (they should on the lab computers). Even if your computer does not correctly print the characters the encryption and decryption should work correctly.
The following files contain plaintext messages you may use to test the encipher method (using any of the keys): msg1.txt, msg2.txt, and msg3.txt.
The following files contain ciphtertext messages you may use to test the decipher method: secret1.txt, secret2.txt, and secret3.txt (these were enciphered using key1 - you should be able to tell if secret1.txt is deciphered correctly; secret2.txt is an encrypted version of the first paragraph on page 787 of the textbook and secret3.txt is an encrypted version of msg3.txt); secret4.txt, secret5.txt and secret6.txt (these were enciphered using key2); secret7.txt, secret8.txt and secret9.txt (these were enciphered using key3).

Requirements:

You must use meaningful names for variables, constants, and your class. Use Java conventions for case -- constants (declared with final) are all uppercase (with the underscore separating words), variables start with a lowercase letter (but each separate word within the variable starts with a capital), the class name is capitalized. Choosing meaningful names makes your program easier to read and follow.
You must use constant identifiers where appropriate (there are several constants in the program).
Use white space (blank lines and blank spaces) in your program to make it easier to read. Separate the sections of the program with blank lines. Indent and align your code properly (emacs helps you do this - it automatically indents in most cases -- if a line isn't indented correctly go to it and press TAB OR select the whole program using CTRL-x, CTRL-p or your mouse then use ALT-CTRL-\. If TAB doesn't make it indent correctly you probably have a syntax error).
Document your program. At the top you must have documentation that includes the file name, a brief description of what the program does (this MUST include a clear description of the output the program will produce), the date, and your name. Use inline documentation to document each major section of code. However, don't document every line!! The header documentation for the class must have a description of the purpose of the class (what type of objects it represents) or of the program. Have header documentation at the beginning of each method that describes what the method does, what it takes as parameters, and what it returns. This documentation should have borders that make it stand out (so it is easy to see where methods begin). Be sure to include a description of the format of the file when a method reads data from a file.

Warning: A program that does not compile will receive at most 20% on the assignment even if the error is a small one preventing compilation.

Submission

Tar your assignment 4 directory and copy the tar file containing your code to the directory:

   ~ingram/CPSC120/yourname

Academic Integrity Reminder!!! Programming assignments are to be your own work. You may get help on the specifics of the assignment from no one except the instructor. You may not show your program to anyone or look at anyone else's program or share ideas with anyone about how to write the program.