CPSC 220 Fall 2003
Program 6: Assorted Improvements
In this final part of the project you will incorporate a number of
improvements into your search engine, including the following:
- Make each item in the returned list of URLs an actual link.
- Count the number of references to each page and report this number along
with the relevance (but continue to rank by relevance).
- Search for links in both HREF and SRC attributes.
- Fix the problem that occurs when a URL with an empty file part is used
as a base URL (that is, make such URLs work). Also treat http://x and
http://x/ as the same URL.
- Provide a button on the results page that allows the user to sort the
current results by relevance or by number of links.
- Avoid re-processing web pages whenever possible as follows:
  - Modify your servlet so that if the user does not provide an initial
    URL, it uses a default URL. (This can be a default string in the URL
    box if you wish.) The processed pages from this URL should be
    retrieved from a file; they should not have to be generated in
    response to the query. Think about how to gracefully and efficiently
    handle requests that use the default URL but specify a higher or
    lower page limit.
  - Provide an "Update Database" button that re-processes and re-stores
    the pages starting from the default URL.
- When returning query results, also provide (on the same page) a text
box for another query. A query submitted from here should use the same
URL and page limit as the original query (not necessarily the default),
and should similarly return a page allowing another query.
- There are many other possible improvements, any of which you can
undertake for extra credit. The amount of additional credit will depend
on the difficulty and impact of the feature. Be sure to clearly document
any such improvements that you undertake.
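
As one way to approach the base-URL item above, the following is a minimal
sketch, not a required design. It assumes absolute http-style URLs and uses
the standard java.net.URI class; the class name UrlNormalizer and its two
methods are hypothetical. It gives an empty path the value "/" so that
http://x and http://x/ compare equal, and it normalizes the base before
resolving a relative link against it. Dropping the #fragment is an extra
assumption made here, on the grounds that a fragment names a position
within the same page.

```java
import java.net.URI;

// Hypothetical helper; the names are illustrative, not part of the assignment.
final class UrlNormalizer {
    private UrlNormalizer() {}

    // Returns a canonical form: "." and ".." segments removed, an empty
    // path replaced by "/", and any #fragment dropped (an assumption).
    static String normalize(String url) {
        URI uri = URI.create(url.trim()).normalize();
        if (uri.getScheme() == null || uri.getRawAuthority() == null) {
            // Relative or non-hierarchical URLs (e.g. mailto:) are rejected.
            throw new IllegalArgumentException("Not an absolute URL: " + url);
        }
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // http://x becomes http://x/
        }
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://")
          .append(uri.getRawAuthority()).append(path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        return sb.toString();
    }

    // Resolves a link found on a page against that page's URL. The base is
    // normalized first so that a base with an empty path still works.
    static String resolve(String baseUrl, String link) {
        URI base = URI.create(normalize(baseUrl));
        return normalize(base.resolve(link.trim()).toString());
    }
}
```

With this sketch, UrlNormalizer.resolve("http://x", "a.html") and
UrlNormalizer.resolve("http://x/", "a.html") both yield http://x/a.html.
Using the normalized string as the key for visited pages is one way to keep
the two forms from being processed twice.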
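
For the HREF and SRC item above, a regular expression is one possible way
to pull attribute values out of a page. The sketch below is illustrative
only: the class name LinkExtractor and its method are hypothetical, and a
regular expression is an approximation that a real HTML parser would
handle more robustly (it does not skip comments or scripts, for example).
It accepts double-quoted, single-quoted, and unquoted values, and it
ignores case in the attribute name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not part of the assignment.
final class LinkExtractor {
    private LinkExtractor() {}

    // Matches href=... or src=... where the value is "double-quoted"
    // (group 1), 'single-quoted' (group 2), or unquoted (group 3).
    private static final Pattern LINK_ATTR = Pattern.compile(
        "\\b(?:href|src)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    // Returns the raw attribute values in document order; resolving them
    // against the page's base URL is left to the caller.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK_ATTR.matcher(html);
        while (m.find()) {
            String value = m.group(1) != null ? m.group(1)
                         : m.group(2) != null ? m.group(2)
                         : m.group(3);
            if (!value.isEmpty()) {
                links.add(value);
            }
        }
        return links;
    }
}
```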
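
For the sort button described above, one option is to keep both numbers
with each result and choose a comparator based on a request parameter.
This is a sketch under assumptions: the Result class, its field names, and
the "links" parameter value are all hypothetical and would need to match
your own classes. Both orderings put the largest value first, and the
input list is copied so that the stored results are left unchanged.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical classes; the names are illustrative, not part of the assignment.
final class ResultSorter {
    private ResultSorter() {}

    static final class Result {
        final String url;
        final double relevance;
        final int referenceCount; // number of pages that link to this URL

        Result(String url, double relevance, int referenceCount) {
            this.url = url;
            this.relevance = relevance;
            this.referenceCount = referenceCount;
        }
    }

    // Highest relevance first.
    static final Comparator<Result> BY_RELEVANCE =
        Comparator.comparingDouble((Result r) -> r.relevance).reversed();

    // Most-referenced page first.
    static final Comparator<Result> BY_LINKS =
        Comparator.comparingInt((Result r) -> r.referenceCount).reversed();

    // sortKey would come from the button's request parameter; anything
    // other than "links" falls back to ranking by relevance.
    static List<Result> sorted(List<Result> results, String sortKey) {
        Comparator<Result> order = "links".equals(sortKey) ? BY_LINKS : BY_RELEVANCE;
        List<Result> copy = new ArrayList<>(results);
        copy.sort(order);
        return copy;
    }
}
```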
What to Turn In
Turn in a hardcopy of any new classes and of any previous classes that
you modified. You do not need to e-mail your code to me, as I will view
it as a servlet.