CPSC 430 Spring 2007
HW 11: Too Many Books!
Background
As a longtime resident of SmallTown, USA, you try to support the community
however you can. So when the library burned to the ground, with no insurance
money
to replace the lost books, you helped town leaders work with publishers
to get digital versions of many of the books so that local residents
could access them online. But the publishers were so cooperative that
you now have more "books" than you can fit in your database, and there is
no money to acquire new storage. Sadly, commonly used software tools
such as gzip are not available in SmallTown.
SmallTown has asked you to write a compression program for the text files
that hold these books. They have no money to pay you directly, but
have offered free municipal parking for all of your employees for one year.
Since this is an employee benefit that usually costs you about what you would
charge for this job, you feel that it's a reasonably good deal
and have agreed to take on the project.
Assignment
Help out the SmallTown library by implementing the
Lempel-Ziv text compression algorithm that we discussed
in class. Your program should take at the command line the name of the
file to be compressed and should write a compressed file
of the same name with a .lz extension.
Of course, the library also needs a decompression program, but you're not
sure you'll have time to write both.
Fortunately, they
found a decompression program
written in Java called LZdecompress that was
donated to the rummage sale last month. It didn't
sell -- it wasn't
much use to anyone without the corresponding compression program,
which had been lost -- so you can have it.
A scrawled note on the cover says that the decompression
program assumes Lempel-Ziv compression and
that each triple is represented with 24 bits,
allocated as follows:
- 13 bits for the pointer
- 4 bits for the length of the matched phrase
- 7 bits for the next character
It takes (at the command line)
the name of the file to be decompressed, verifies
that it has a .lz extension, then writes an uncompressed file of the
same name without the extension.
You figure it's worth a try. If you can't get it to work with your compression
program, you can always write your own.
Follow Up
Recognizing that you really bailed them out,
SmallTown is thinking about trying to get access to utilities such as
gzip. However, they are skeptical of their
quality, and wonder if they're better off hiring you whenever they
need something. Help them make their decision by comparing the performance
of your program to
gzip on cs.
Look at how much compression is achieved and how
long the compression process takes for files of widely varying sizes.
Present and interpret your results.