$ mkdir ~/cs170/labs/lab35 $ cd ~/cs170/labs/lab35
It's time for your final quiz of the semester! Please lock your computers, as this is a closed computer (as well as closed notes and book) quiz. You may move on to today's lab activities when you finish, but keep in mind that I do have a brief lecture as soon as everyone has finished the quiz.
On Monday we wrote hash tables using the linear probing technique. However, you hopefully noticed that the linear probing has an issue with clustering, which causes in unrelated table cells. Today we will go over two additional probing techniques, with the goal of reducing the clustering.
A common question from Monday was also how to handle running out of space in the hash table. We assumed there would always be a location that we could put the key/value pair in. However, this might not always be the case. We will cover table resizing today as well.
Determining when to resize the hash table is a bit of a tricky one. We could wait until the table is completely full, but this would cause relatively long lookup times for almost everything in the hash table. Instead, what is commonly done is resizing the table once some load factor is exceeded.
This load factor is a constant defined value, which is a percentage of the hash table locations that are filled in. This is typically 75% of the size of the table, but can vary depending on implementation.
Copy your hash_table.py file from lab 34 into your current directory.
$ cd ~/cpsc170/labs/lab35 $ cp ../lab34/hash_table.py .
Add a method called resize(self)
, which resizes the hash
table. By default, we will always double the size of the hash table.
You need to create a new hash table with this new size, and re-insert
all of the key/value pairs into their new locations in the new table.
Modify your insert
method, so that the table is resized
when the load threshold is exceeded. You may need to add other method
or attributes to make determining this easier.
>>> my_hash_table = HashTable(5) >>> my_hash_table.insert("Professor", "Scotty") >>> print(my_hash_table.hash_table) [None, None, None, ('Professor', 'Scotty'), None] >>> my_hash_table.insert("Class", "CPSC 170") >>> print(my_hash_table.hash_table) [None, None, ('Class', 'CPSC 170'), ('Professor', 'Scotty'), None] >>> my_hash_table.insert("Room", "Trexler 363") >>> print(my_hash_table.hash_table) [None, None, ('Class', 'CPSC 170'), ('Professor', 'Scotty'), ('Room', 'Trexler 363')] >>> my_hash_table.insert("Wet", "Ceiling") >>> print(my_hash_table.hash_table) [None, None, ('Class', 'CPSC 170'), ('Professor', 'Scotty'), ('Room', 'Trexler 363'), ('Wet', 'Ceiling'), None, None, None, None] >>> my_hash_table.insert("Slow", "Computers") >>> print(my_hash_table.hash_table) [None, ('Slow', 'Computers'), ('Class', 'CPSC 170'), ('Professor', 'Scotty'), ('Room', 'Trexler 363'), ('Wet', 'Ceiling'), None, None, None, None]
Add an attribute called load to the class. This attribute should store the number of key/value pairs currently held by the table. You can use this value to determine when you need to resize your table.
If you create your new table in the hash_table
attribute, you can simply call insert
for all of
the key value pairs in the table. However, you will need to
keep a local copy of the old hash table, so you can find all of
the original key/value pairs.
As we will see in the next section, having the table size be a prime number can be quite advantageous. Our current resizing technique does not guarantee this. As a matter of fact, it will always generate a composite number. Moving to the next prime is also not necessarily good,
.Rewrite the resize structure, so that it finds the first prime larger than 2 * self.size. Your table should use that as the size.
We have used linear probing to determine where in the hash table a key/value pair should appear. However, you have likely noticed a clustering trend. There tends to be large, contiguous locations of key/value pairs in the hash table. This is a problem, because it will cause future collisions in unrelated locations to the problem keys.
There are two additional probing techniques that typically get used. Quadratic probing defines a quadratic equation as the probing sequence. This reduces "primary clustering," but has an issue with "secondary clustering." These are clusters around the secondary locations of the probes. The better technique is known as Double Hashing. In double hashing, a secondary hash function is used to modify the probe sequence.
You will need to write at least two more methods to your
HashTable
class. The first should be
quadratic_probe(self, a_key, probe_iteration)
.
a_key is a string, while probe_iteration is an
integer. This method should return an integer in the range [0,
self.size). It will compute that value using the following equation:
Where h(key) is your hash function, i is your probe iteration, and c1 and c2 are prime constants. For our purposes, use the values 5 and 7 respectively. This method will tell you where to put the key, based on the ith time you have had to generate a probe key.
Write another method called double_hash(self, a_key,
probe_iteration)
. This method will behave much like quadratic
probe, except it will encode the following equation:
Where h1(key) is your original hash function, and h2(key) is a secondary hash function. The value of this second hash function needs to be relatively prime to the size of the hash table. We can
this, by using the following equation for h2(key):Modify your insertion and lookup methods, so you can easily test each of these methods.
>>> my_hash_table = HashTable(5) >>> my_hash_table.insert("Professor", "Scotty") >>> print(my_hash_table.hash_table) [None, None, None, ('Professor', 'Scotty'), None] >>> my_hash_table.insert("Class", "CPSC 170") >>> print(my_hash_table.hash_table) [None, None, ('Class', 'CPSC 170'), ('Professor', 'Scotty'), None] >>> my_hash_table.insert("Room", "Trexler 363") >>> print(my_hash_table.hash_table) [('Room', 'Trexler 363'), None, ('Class', 'CPSC 170'), ('Professor', 'Scotty'), None] >>> my_hash_table.insert("Wet", "Ceiling") >>> print(my_hash_table.hash_table) [None, None, ('Class', 'CPSC 170'), ('Room', 'Trexler 363'), ('Wet', 'Ceiling'), ('Professor', 'Scotty'), None, None, None, None] >>> my_hash_table.insert("Slow", "Computers") >>> print(my_hash_table.hash_table) [None, ('Slow', 'Computers'), ('Class', 'CPSC 170'), ('Room', 'Trexler 363'), ('Wet', 'Ceiling'), ('Professor', 'Scotty'), None, None, None, None]
>>> my_hash_table = HashTable(5) >>> my_hash_table.insert("Professor", "Scotty") >>> print(my_hash_table.hash_table) [None, None, None, ('Professor', 'Scotty'), None] >>> my_hash_table.insert("Class", "CPSC 170") >>> print(my_hash_table.hash_table) [None, None, ('Class', 'CPSC 170'), ('Professor', 'Scotty'), None] >>> my_hash_table.insert("Room", "Trexler 363") >>> print(my_hash_table.hash_table) [('Room', 'Trexler 363'), None, ('Class', 'CPSC 170'), ('Professor', 'Scotty'), None] >>> my_hash_table.insert("Wet", "Ceiling") >>> print(my_hash_table.hash_table) [('Wet', 'Ceiling'), None, ('Class', 'CPSC 170'), ('Room', 'Trexler 363'), ('Professor', 'Scotty'), None, None, None, None, None] >>> my_hash_table.insert("Slow", "Computers") >>> print(my_hash_table.hash_table) [('Wet', 'Ceiling'), ('Slow', 'Computers'), ('Class', 'CPSC 170'), ('Room', 'Trexler 363'), ('Professor', 'Scotty'), None, None, None, None, None]
Partially as a result of the Chinese
remainder theorem, we don't
have to worry about changing our current return value for
generate_hash_code
. We can leave it exactly as it was.
Performing linear probing was pretty easy. However, if we complicate that method a little bit, it might make the insertion method a bit easier to modify. Imagine instead that the linear probing was it's own method:
def linear_probe(self, a_key, probe_iteration): return (generate_hash_code(a_key) + probe_iteration) % self.sizeThen, instead of performing the probe code in insertion, we simply call
linear_probe
in the probing loop. You
will need to do something similar with quadratic and double hashing.
probe(self, a_key, probe_iteration)
,
which calls one of the other probe methods. Then, all you need
to do is call probe in insert and lookup! This is following a
design pattern known as Adapter, which can really make editing
your code a lot easier in the long run.
This challenge is a bit unrelated to the probings, but can really increase performance of the hash table. One of the ways we can improve lookup speed is by reordering the elements in the hash table based on their "popularity." As lookups are performed on the elements of the hash table, we can move them up the probing chain.
Whenever you perform a lookup that had to probe to find the correct value, move the key/value pair one step up the probing chain by swapping it with the value immediately proceeding it.
When you have finished, create a tar file of your lab35
directory. To create a tar file, execute the following commands:
cd ~/cs170/labs tar czvf lab35.tgz lab35/
To submit your activity, go to cseval.roanoke.edu. You should
see an available assignment called Lab Assignment 35
.
Make sure you include a header listing the authors of the file.