CPSC 430 Spring 2007
HW 11: Too Many Books!

Background

As a longtime resident of SmallTown, USA, you try to support the community however you can. So when the library burned to the ground, with no insurance money to replace the lost books, you helped town leaders work with publishers to get digital versions of many of the books so that local residents could access them online. But the publishers were so cooperative that you now have more "books" than you can fit in your database, and there is no money to acquire new storage. Sadly, commonly used software tools such as gzip are not available in SmallTown.

SmallTown has asked you to write a compression program for the text files that hold these books. They have no money to pay you directly, but have offered free municipal parking for all of your employees for one year. Since this is an employee benefit that usually costs you about what you would charge for this job, you feel that it's a reasonably good deal and have agreed to take on the project.

Assignment

Help out the SmallTown library by implementing the Lempel-Ziv text compression algorithm that we discussed in class. Your program should take at the command line the name of the file to be compressed and should write a compressed file of the same name with a .lz extension.

Of course, the library also needs a decompression program, but you're not sure you'll have time to write both. Fortunately, they found a decompression program written in Java called LZdecompress that was donated to the rummage sale last month. It didn't sell -- it wasn't much use to anyone without the corresponding compression program, which had been lost -- so you can have it. A scrawled note on the cover says that the decompression program assumes Lempel-Ziv compression and that each triple is represented with 24 bits, allocated as follows:

It takes (at the command line) the name of the file to be decompressed, verifies that it has a .lz extension, then writes an uncompressed file of the same name without the extension.

You figure it's worth a try. If you can't get it to work with your compression program, you can always write your own.

Follow Up

Recognizing that you really bailed them out, SmallTown is thinking about trying to get access to utilities such as gzip. However, they are skeptical of their quality, and wonder if they're better off hiring you whenever they need something. Help them make their decision by comparing the performance of your program to gzip on cs. Look at how much compression is achieved and how long the compression process takes for files of widely varying sizes. Present and interpret your results.