CPSC 220 Fall 2004
Program 3: A Trie Dictionary Implementation
Your assignment is to write and analyze an interactive dictionary
program that reads in and stores a list of words, then
allows the user to look up words until he or she asks to quit. Follow
the guidelines below:
- The dictionary should be implemented using an alphabetic trie as
discussed in class. Your trie should have the following properties:
- Each internal node should be capable of holding an edge for each letter
A..Z (upper case only). However, non-null branches should be created only
when necessary to distinguish between entries. So if the trie contains
entries "anteater" and "another", there should be an edge from A at the root,
from N at the next node, and from T and O at the next node, but then the words
"anteater" and "another" should be entered at leaves at that level.
(You may wish to add one more level for an end-of-word character.)
If the word
"antelope" is then entered, additional edges will be needed to distinguish
it from "anteater."
- Each external node will be labeled with the word that is represented
by the path from the root to that node, so each external node should be
capable of holding a word. However, since external nodes can become
internal nodes during the insertion process, it may be convenient
to define a single
type of trie node that holds both edges and a word. This allows
it to serve as an internal or external node.
- Write a driver that takes a filename and uses your Trie class
to create a dictionary
from the words (alphabetic tokens) in the file. Skip over any tokens
that contain non-alphabetic characters. (Note that such characters may
not appear at the beginning of the word.) After creating the
dictionary, your driver should enter an
interactive loop in which it asks the user for a word and then says "Found" or
"Not Found".
- Run your program on some small files, then try it on /usr/share/dict/linux.words, which is an online dictionary file of about 45,000 words.
Using the Java method
long System.currentTimeMillis(), which returns the current system
time in milliseconds,
find the time required to build the trie for the
first 1000 words,
then for the first 2000 words, then for the first 4000 words,
then for the first
8000 words. In each case also find the time to do a worst-case search (think
about how to produce this time.) Record both the build and search
times and compare the results to the balanced and unbalanced BST times from your last program.
What to turn in
Turn in hardcopy of your Trie class and your driver
along with your recorded times and
discussions of these times (these must be typed). E-mail your programs
and writeups to me
with cpsc220 prog3 in the subject.