CPSC120 Assignment 8

What Your Email Says About You

A researcher from the Georgia Institute of Technology found that certain phrases in corporate emails are indicative of a hierarchical relationship between the sender and receiver. For example, “thought you would” suggests the email sender is higher up, while “let’s discuss” suggests the receiver is higher up. If you have a free email account, the provider may be using similar techniques to figure out who you are in order to target advertisements.

Details

Write a program that attempts to identify the unique words that individuals are prone to use. The program should prompt the user to enter two different pieces of text, each written by a different individual. For each input text, the program should print the words that do not occur in the other input text at all. Along with each unique word the program should print the number of times the unique word appears in the text. The program should ignore capitalization and punctuation. That is, the strings “spam”, “Spam”, and “spam,” should all be considered the same.
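
One possible starting point for the word handling described above is sketched below. It is only an illustration, not a required design: the function names word_counts and unique_word_counts are hypothetical, and the sketch assumes that words are separated by whitespace and that punctuation only needs to be removed from the beginning and end of each word.

```python
import string
from collections import Counter


def word_counts(text):
    """Count the words in text, ignoring capitalization and punctuation."""
    counts = Counter()
    for token in text.split():
        # Assumption: only leading/trailing ASCII punctuation is removed,
        # so an apostrophe inside a word (as in "let's") is kept.
        word = token.strip(string.punctuation).lower()
        if word:
            counts[word] += 1
    return counts


def unique_word_counts(text, other_text):
    """Return {word: count} for words in text that never occur in other_text."""
    counts = word_counts(text)
    other = word_counts(other_text)
    return {word: n for word, n in counts.items() if word not in other}


if __name__ == "__main__":
    first = input("Enter the first text: ")
    second = input("Enter the second text: ")
    for label, result in (("first", unique_word_counts(first, second)),
                          ("second", unique_word_counts(second, first))):
        print("Unique words in the " + label + " text:")
        for word, n in result.items():
            print(word, n)
```

Note that string.punctuation covers only ASCII punctuation, so curly quotes pasted from a word processor would not be stripped by this sketch.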

Submission: Submit your code as a zip file on the course Inquire site by 5PM on Friday November 9th.

Test Data

The test data for the program should consist of two text files. One text file should contain the input to the program, and the other text file should contain the expected output. The input file should consist of two lines of text and include several unique words, words with different capitalizations, and punctuation. The output file should consist of what the program will print if the two lines in the input file are entered as the two text sequences to the program.

Submission: Submit your test data as a zip file on the course Inquire site by 9AM on Monday November 5th.

Extra

Frequently Unique Words: Write a program that prints the unique words from two different input text sequences in order from most frequent to least frequent.
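
The ordering step could be sketched as follows. This assumes the unique words and their counts are already available as a dictionary; the function name by_frequency is hypothetical, and breaking ties alphabetically is an assumption, since the assignment does not say how equally frequent words should be ordered.

```python
def by_frequency(unique_counts):
    """Return (word, count) pairs ordered from most to least frequent.

    Assumption: words with the same count are listed alphabetically.
    """
    return sorted(unique_counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for word, n in by_frequency({"spam": 2, "ham": 1, "and": 1}):
        print(word, n)
```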

Finding Key Phrases: Write a program that attempts to classify a person as belonging to one of two groups by analyzing some text the person has written. Begin by choosing the two groups your program will attempt to identify. Some examples include professor or student, man or woman, and Shakespeare or Hemingway. Next, find a list of frequently used unique words for each group by redirecting a file containing two text sequences, one exemplifying each group, to your Frequently Unique Words program.

The lists of frequently occurring unique words can be hard-coded into the program. The program should prompt the user to enter some text and print which group the author of the text belongs to. The program should determine the author’s group by finding which list of words has more occurrences in the text.
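
A minimal sketch of the classification step is shown below. The group names and word lists are made-up placeholders, not real results; in practice they would come from running the frequent unique word program on your own example texts. The function name classify is hypothetical, and resolving a tie in favor of the first group listed is an assumption, since the assignment does not specify tie handling.

```python
import string

# Hypothetical hard-coded word lists; replace them with the words
# your own analysis finds for the two groups you chose.
GROUP_WORDS = {
    "professor": {"syllabus", "grading", "lecture"},
    "student": {"homework", "cram", "dorm"},
}


def classify(text, group_words=GROUP_WORDS):
    """Return the group whose word list has the most occurrences in text."""
    words = [token.strip(string.punctuation).lower() for token in text.split()]
    scores = {
        group: sum(1 for word in words if word in vocabulary)
        for group, vocabulary in group_words.items()
    }
    # Assumption: on a tie, the group listed first in group_words wins.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    sample = input("Enter some text: ")
    print("The author is most likely a", classify(sample))
```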