CPSC 220 Fall 2002
Program 5: Enhancing your Search Engine
Overview
In this part of the project you will enhance your search engine. You
must do Enhancement #1 and at least one of Enhancements #2 and #3;
you may do the other
as well for extra credit. You may also enhance your engine in other
ways for possible extra credit.
Enhancement #1: "Marking" Nodes in the BFS
In Program 4, you essentially used a breadth-first search of the Web to
find web pages to scan. However, you were not required to "mark" pages
that you had visited, so when a page that had already been processed was
dequeued it would be processed again. Clearly, this is unacceptable.
Modify your program so that a) it does not enqueue a URL if it has already
been processed, and b) when it dequeues a URL, it checks to see if it
has already been processed before processing it again. One way to
do this is by using a
simple (if inefficient) linear search of the vector holding the
URLContent objects.
Enhancement #2: A Graphical User Interface
Add a GUI front end to your search engine that has the following components:
- An input box where the user types a query.
- A display area where the URLs that are most relevant to the query are
listed.
- Another display area where the web page with the highest
relevance is displayed.
- Another input box where the user can enter a URL (presumably taken
from the list displayed) and the corresponding web page will replace the
one currently in the web page display area.
This is not as hard as it sounds, but it takes some putting together.
Getting your GUI to display web pages and to display search result
strings is easy if you use the JEditorPane's setPage and setText methods.
There is a good tutorial that demonstrates this and other relevant
features (scroll bars, etc); start by reading it at
http://www.javasoft.com/docs/books/tutorial/uiswing/components/simpletext.html.
You can get most of what you need just by trying out and studying the example
(TextSamplerDemoHelp.java).
Don't forget to
import javax.swing.*, javax.swing.text.*, java.awt.*, and java.awt.event.*.
Enhancement #3: An Efficient BFS
The linear search described in #1 to determine whether a URL has already
been processed is simple but inefficient. Remedy this by
using a hash table instead, which (under favorable circumstances) will
give constant instead of linear time lookup. You will need to write
a HashTable class that uses the bucket hashing technique described
in class and in the text to hash Strings (or URLs, if you prefer).
The hash function can be
simple but should have a reasonable likelihood of avoiding collisions.
Collaboration
Unlike on most programs in this class, you may discuss this project with
other members of the class. You may not EVER take code directly from
someone else or turn in work that is substantially someone else's as your
own, but you may ask each other questions and work together in developing
solutions. Each student should turn in his or her own final product.
What to Turn In
Turn in hardcopy of the following:
- Every class that you wrote for the project that you are using in
this final version.
- A list of the classes that you are using from the cs220proj package.
- A list of enhancements that you made to your search engine that you
would like to submit for extra credit.
- A list of known bugs in your search engine. Bugs that I find that
are not on this list will be penalized more heavily than those that are.
Also tar your directory and e-mail it to bloss@roanoke.edu.
The subject should be
cpsc220 prog5.