Huffman Coding in ML
Recall that
Huffman coding is a compression technique that uses a variable-length code
constructed from the frequency counts of the objects being represented.
In this program you will use ML to build a Huffman tree for a given
list of (item, frequency) pairs.
Your top-level function huffman : ('a * int) list -> 'a Hufftree
should take a list of (item,frequency) tuples and return a Huffman
tree of items built using those tuples. Some things to think about:
- You will need to define a Hufftree data type to represent a Huffman tree.
A Huffman tree needs to hold an integer at each internal node and
an integer and the character to be encoded at each leaf.
- The strategy in building a Huffman tree is to initially turn each
tuple into a single-node Huffman tree, then combine the two trees with the
lowest values (frequencies) into a new tree. The value stored at the root
of the new tree is the sum of the values of the two trees from which it is
formed. This new tree is then put into list of trees, and again the
two trees with the lowest values are combined, and the result reinserted into
the list. This is repeated until the
list holds a single tree. Note that implementing this is easy if you keep
the list of trees sorted by value and maintain sorted order when
inserting the newly formed trees. You cannot assume that the initial list of tuples is sorted!
- Note that when you reload your datatype definition it becomes a new type,
so functions that were defined for the old type won't work until they
are reloaded as well. This can be confusing until you get used to it.
- Be sure to use good ML style, including pattern matching and
higher-order functions as appropriate.
The ML interpreter does not display the tree nicely, so you can use
my pretty-printer to display your tree after you
build it. This code works with my Hufftree datatype; you may
need to modify it
to work with yours.
Note that your huffman function takes tuples holding any kind
of element, but the pretty-printer assumes that it holds characters.
Keep your huffman function general, but I will limit large test cases
to characters.