< Back

Lab 32: Spell Checking

As usual, create a directory to hold today's files. All programs that you write today should be stored in this directory.

$ cd ~/cs120/labs
$ mkdir lab32
$ cd lab32


Spell Checking

You are probably as reliant on spell checking as I am. It is a process that has evolved tremendously over the past couple of decades. Real spell checkers us a data structure known as a Tree to make retreiving of mispelled words incredibly efficient. However, if you ignore the efficiency, generating some close suggestions for mispelled words is not terribly complicated

Details

Create a file called spell_check_word.py That will allow a user to type in a word to get spell checked. Your program will read in a dictionary of words that it knows are correctly spelled. If the word the user typed in is in the dictionary file, then output that the word is spelled correctly. If it is not in the dictionary file, output all of the words that have the smallest hamming distance with the word the user typed in.

The hamming distance between the two words are the number of characters that need to be flipped in order to make the words the same. Note that the hamming distance between two words is only defined when the strings are of the same length.

Word 1 Word 1 Hamming Distance
hello hello 0
hello helol 2
hello world 4

Example

$ python3 spell_checker.py 
What do you want to spell check? helol
Suggestions:
helot

$ python3 spell_checker.py 
What do you want to spell check? hello
hello is spelled correctly

Hint

  • Create a function called compute_hamming_distance(word1, word2). This function takes two strings of the same length as a parameter. It should return the hamming distance between the two words, as an integer.
  • Create a function called read_dictionary_file(file_name). The function should take a single parameter, a string which is the name of a text file representing the correctly spelled words. It should return a list of the strings from the file. Make sure you strip the whitespace off of the words from the file, otherwise your length comparisons will be incorrect.
  • You need to iterate over the entirety of the word list to find the list of suggestions. You will keep track of the smallest known hamming distance so far, as well as the list of words that have that hamming distance. If you ever find a word with a smaller hamming distance, clear out the old list and update the current known smallest hamming distance.
  • After you have tested the above functions, write a function that performs all of the steps of the program. It should use the input function to get a word from the user, and it should print out the list words with the minimum hamming distance.