CPSC 220 Fall 2002
Program 5: Enhancing your Search Engine

Overview

In this part of the project you will enhance your search engine. You must do Enhancement #1 and at least one of Enhancements #2 and #3; you may do the other as well for extra credit. You may also enhance your engine in other ways for possible extra credit.

Enhancement #1: "Marking" Nodes in the BFS

In Program 4, you essentially used a breadth-first search of the Web to find web pages to scan. However, you were not required to "mark" pages that you had visited, so when a page that had already been processed was dequeued it would be processed again. Clearly, this is unacceptable. Modify your program so that a) it does not enqueue a URL if it has already been processed, and b) when it dequeues a URL, it checks to see if it has already been processed before processing it again. One way to do this is by using a simple (if inefficient) linear search of the vector holding the URLContent objects.

Enhancement #2: A Graphical User Interface

Add a GUI front end to your search engine that has the following components: This is not as hard as it sounds, but it takes some putting together. Getting your GUI to display web pages and to display search result strings is easy if you use the JEditorPane's setPage and setText methods. There is a good tutorial that demonstrates this and other relevant features (scroll bars, etc); start by reading it at
http://www.javasoft.com/docs/books/tutorial/uiswing/components/simpletext.html.
You can get most of what you need just by trying out and studying the example (TextSamplerDemoHelp.java). Don't forget to import javax.swing.*, javax.swing.text.*, java.awt.*, and java.awt.event.*.

Enhancement #3: An Efficient BFS

The linear search described in #1 to determine whether a URL has already been processed is simple but inefficient. Remedy this by using a hash table instead, which (under favorable circumstances) will give constant instead of linear time lookup. You will need to write a HashTable class that uses the bucket hashing technique described in class and in the text to hash Strings (or URLs, if you prefer). The hash function can be simple but should have a reasonable likelihood of avoiding collisions.

Collaboration

Unlike on most programs in this class, you may discuss this project with other members of the class. You may not EVER take code directly from someone else or turn in work that is substantially someone else's as your own, but you may ask each other questions and work together in developing solutions. Each student should turn in his or her own final product.

What to Turn In

Turn in hardcopy of the following: Also tar your directory and e-mail it to bloss@roanoke.edu. The subject should be cpsc220 prog5.