CPSC 220 Fall 2003
Program 5: Links and a Web Interface
In this part of the project you will enhance your search engine by adding
two features that you have already worked on in lab:
- Have the search engine follow links from an initial web page instead of
giving it a list of URLs to process.
This will draw on the work you did in the NetTest program in the
second lab. Do a breadth-first search from the initial page: search every
page linked to from the initial page before you search any page linked to
from one of those pages, and so on. Since the same page can be linked to
from multiple places, you'll have to be sure you don't visit the same page
twice. You can't mark the pages themselves, so you'll have to keep your own
record of which pages you have already visited. For now, you can use the
Java Hashtable class to store URLs that have been visited and to look up
candidate URLs.
Remember that real web pages are often poorly formed, with missing
quotes, missing end tags, and so on. Be sure that you handle these
situations gracefully, extracting as many links as possible and, above
all, never allowing your program to crash.
- Provide a web interface that allows the user to enter a query, an
initial URL, and the maximum number of pages to search, and
returns a page with the list of hits in order of relevance. Your interface
should be simple but attractive and easy to use. The returned page
should display the three inputs (the query, the initial URL, and the maximum
number of pages) along with the actual number of pages searched and the
results. The results should be displayed in an HTML table, and the entire
page should be nicely formatted.
Feel free to use colors, fonts, and so on to improve the appearance of
your page.
Your web interface should be in the file search.html in your public_html
directory.
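As a rough illustration of the first feature, here is a minimal Java sketch of a breadth-first crawl with a Hashtable of visited URLs and a forgiving link extractor. It is one possible shape, not a required design. Everything named in it is an assumption made for the example: the class CrawlSketch, the methods extractLinks and crawl, the example.test URLs, and the fetcher function, which stands in for whatever page-reading code you wrote in NetTest and is assumed to return null when a page cannot be read. The link pattern accepts quoted, unquoted, and half-quoted href values; it will still miss some malformed links.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrawlSketch {
    // Tolerant href matcher: an optional opening quote, then everything up to the
    // next quote, whitespace, or '>', so a missing closing quote still yields a link.
    private static final Pattern HREF = Pattern.compile(
        "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']?([^\"'\\s>]+)", Pattern.CASE_INSENSITIVE);

    /** Returns absolute link targets found in html, skipping any that cannot be resolved. */
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                String target = URI.create(baseUrl).resolve(m.group(1)).toString();
                int hash = target.indexOf('#');   // drop "#fragment" so one page
                if (hash >= 0) {                  // is not queued under two names
                    target = target.substring(0, hash);
                }
                links.add(target);
            } catch (IllegalArgumentException e) {
                // Malformed URL: ignore this link and keep scanning the page.
            }
        }
        return links;
    }

    /** Breadth-first crawl; returns the URLs of the pages fetched, in visiting order. */
    static List<String> crawl(String startUrl, int maxPages, Function<String, String> fetcher) {
        Hashtable<String, Boolean> visited = new Hashtable<>(); // URLs already queued
        Queue<String> frontier = new ArrayDeque<>();            // FIFO gives breadth-first order
        List<String> searched = new ArrayList<>();
        visited.put(startUrl, Boolean.TRUE);
        frontier.add(startUrl);
        while (!frontier.isEmpty() && searched.size() < maxPages) {
            String url = frontier.remove();
            String html = fetcher.apply(url);   // assumed to return null on failure
            if (html == null) {
                continue;                       // unreadable page: skip it, don't crash
            }
            searched.add(url);
            for (String link : extractLinks(url, html)) {
                if (!visited.containsKey(link)) {
                    visited.put(link, Boolean.TRUE); // mark when queued, not when fetched
                    frontier.add(link);
                }
            }
        }
        return searched;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the web, including deliberately sloppy HTML.
        Map<String, String> pages = new HashMap<>();
        pages.put("http://example.test/a.html", "<a href=\"b.html\">B</a> <A HREF=c.html>C</A>");
        pages.put("http://example.test/b.html", "<a href=\"d.html>D</a> <a href='a.html'>back</a>");
        pages.put("http://example.test/c.html", "<a href=\"d.html\">D</a> <a href=\"gone.html\">?</a>");
        pages.put("http://example.test/d.html", "no links here");
        System.out.println(crawl("http://example.test/a.html", 10, pages::get));
    }
}
```

Marking a URL as visited at the moment it is queued, rather than when it is fetched, keeps the same page from sitting in the queue twice. Passing the fetcher in as a parameter is only a convenience for trying the traversal on canned pages before pointing it at the real web.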
WARNING: Do not wait too long to get started on this! The servlet
engine has still not been thoroughly tested, and I cannot guarantee that it
will be available at all times. As the DMV says: Expect delays!
What to Turn In
Turn in a hardcopy of each of your new classes and of any previous classes
that you modified. You do not need to e-mail your code to me, as I will view
it as a servlet.