Lecture 33 - Paths


As usual, create a directory to hold today's activities:

$ mkdir ~/cs170/labs/lab33
$ cd ~/cs170/labs/lab33

Linked Graphs and Shortest Paths

This week, we have been representing graphs using adjacency matrices. That is one of three common ways to represent a graph in our programs. Another common format is related to trees: linked structures. Today, you will implement a graph using a linked structure, as opposed to the usual adjacency matrix.

We are also finally going to solve our shortest path problem. On Wednesday, we computed minimum spanning trees. These trees contained every node in the graph, and there was exactly one path between any two arbitrary nodes. The minimum spanning tree was constructed in such a way that the total sum of all of the paths was minimized. However, we are not usually worried about the total sum of all of the paths. We usually are only worried about the length of just one path.

One of the algorithms for computing the shortest path from an arbitrary starting node is known as Dijkstra's algorithm. This algorithm even computes the shortest path to every node in the graph, not just an arbitrary one.


Lab Activity 1
Linked Structures and DFS

An adjacency matrix can very easily represent the structure of a graph. Commonly, that is the most important part of the graph. There are occasions where what each vertex represents is important. It is somewhat difficult to represent data in the node with our adjacency matrix: all of the vertex labels must the integers. The common way around this is to create objects that represent verticies, very similar to how we represented a linked list using nodes.

Creating such a structure is very similar to building the adjacency matrix. Working with them is very different. To get some easier experience working with the linked structures by writing an in-order depth-first traversal of the graph.

Details

Download the linked_graph.py file. This file contains a class called Graph, which reads in the typical test files for our graphs. You should read over the code in this file carefully, to make sure you understand exactly how it is representing the graph internally. Do not move on until you understand where and how the edges are being stored.

Add a method called compute_dfs(self, curr_vertex_label). This method takes a string representing the label of a node to start at. It should return a string representing the depth-first traversal of the specified graph. You may add any additional parameters to this function as you need to accomplish your goal. Make sure you update the test cases as necessary.

Example

There are using the same test case files from Wednesday. You should copy those files into this directory. Also note that your results might vary from mine. Dictionaries are unordered structures, so the order that you traverse edges might be different. Draw out the graphs by hand, and make sure your output matches a DFS of the graph.

$ cp ../lab32/test_files.zip .
$ unzip test_files.zip
>>> tree = Graph("test_files/tree.txt")
>>> print(tree.compute_dfs("0"))
0 1 3 4 2 5 6
>>> print(tree.compute_dfs("4"))
4 1 0 2 5 6 3
>>> weighted = Graph("test_files/weighted.txt")
>>> print(weighted.compute_dfs("0"))
0 1 3 2 4 5
>>> print(weighted.compute_dfs("4"))
4 0 1 3 2 5

Hint

  • When we performed a DFS on a tree, we didn't have to worry about cycles. However, graphs can have cycles. You need to keep track of nodes you have visited some how. The easy way to accomplish this is to pass a list of the visited verticies to your method, and only explore those verticies if they haven't been visited before. This means you need to write a helper method for dfs, which takes the label of the current vertex as well as the list of all visited verticies.

  • The depth-first traversal algorithm is inherently recursive. However, you don't have to write it recursively. You could use a Stack object and a loop to accomplish the same thing. The vertex on the top of the stack is your current node to explore. You pop that vertex off, and add all of its unexplored neighbors to the top of the stack.

 

Challenge

The formal definition of a graph G = (V, E) specifies that a graph is described by the set of verticies and the set of edges. However, our current list structure does not represent the edges in a easily accessible format. This isn't necessarily a problem, but Prim's algorithm would be a little bit more cumbersome using this representation.

Add a method called prim(self, starting_vertex_label) to this new version of the Graph class. It should compute the minimum spanning tree, just as we computed it on Wednesday. Are there any changes you could make to this version of the Graph class, to make it easier to write Prim?


Lab Assignment 33
Dijkstra's Shortest Path Algorithm

The shortest path problem is one that has a myriad of uses. It can be used for route planning, such as determining the shortest route from Roanoke to Richmond. It can also be used to determine the cheapest route, if tolls for roads are used as weights. Finding these shortest paths was a major issue for quite a while in graph theory. A mathematician named Edsger Dijkstra developed a decently efficient mechanism for computing not just the shortest path from a source to a destination, but the shortest path from a source to EVERY destination. I'll describe the algorithm today in class. For now, we will only be worried about the length of the shortest path, not the actual paths.

Details

Add a method to the graph class called dijkstra(self, starting_vertex_label). This method takes a string representing the label of a node to start at. This method will not return anything. Instead, it will modify the dictionary of verticies, so that it will contain the required distances.

There are two key things you need to do in Dijkstra's algorithm. First, you need to add an attribute called distance for all of the verticies in the graph. You can do that IN the dijkstra method, as opposed to the constructor for Verticies. The other is you need some mechanism for specifying infinity. This can be done using the float('inf') function call.

Dijkstra's algorithm starts by initializing the distances to all of the verticies to be infinity, except for the starting vertex. This vertex gets a distance of 0. It also initializes a list of all of the 'visited' verticies as an empty list. The main part of the algorithm will find the unvisited node with the smallest current distance. It will mark that vertex as visited, and then iterate over all of the edges of that vertex. If the distance stored for the current node plus the distance to the destination of the edge is less than the shortest known distance to the destination vertex, you simply update the distance for the destination vertex. You continue this process until all of the verticies are visited.

Example

Again, your order might be different. However, your values should be the same in this case.

>>> def print_distances(vertex_dictionary):
...    for label, vertex in vertex_dictionary.items():
...        print(label, vertex.distance)
...
>>> weighted.dijkstra("0")
>>> print_distances(weighted.verticies)
1 0.2
0 0
3 0.7
2 0.3
5 0.1
4 0.4
>>> weighted.dijkstra("4")
>>> print_distances(weighted.verticies)
1 0.6000000000000001
0 0.4
3 0.7
2 0.7
5 0.5
4 0

Hint

  • Python is what is known as a "Prototype" based object oriented language. This means that you can add attributes to an object at any point of the execution. So, if you have a Vertex v, you can state v.some_attribute = 7 to add an attribute to that object at run time. You will need to do this in the dijkstra method, so that every vertex has a distance attribute only when dijkstra is called.

  • You are going to have to repeatedly find the unvisited vertex with the lowest distance. While you can write this code within the dijkstra method, you should really make this its own method. Write a method called get_unvisited_min(self, visited), which takes your list of visited verticies as a parameter. It will simply iterate over the self.verticies dictionary attribute, and return the label of the unvisited node with the smallest distance.

 

Challenge

The specification above only wants to know the shortest distance. It doesn't specify that you need to return the shortest path. Modify your code so that you can compute the actual shortest path. This sounds complicated, but if you create new attributes for the verticies you should be able to easily reverse engineer the path.

 

Challenge

This specification of Dijkstra's algorithm is O(V2). This is largely due to the fact that we have to find the vertex with the lowest current distance, which is currently done in O(V) time. A better solution would be to use a more complicated data structure to perform this action in O(log V) time.

The better solution uses a priority queue. Whenever you update the distance to a vertex, you add it to the queue with its currently known shortest distance as its priority. Whenever you go to get the unvisited min, you simply dequeue the element from the front of the queue. A vertex might appear multiple times within the queue! Make sure you handle this case gracefully.


Submission

When you have finished, create a tar file of your lab33 directory. To create a tar file, execute the following commands:

cd ~/cs170/labs
tar czvf lab33.tgz lab33/

To submit your activity, go to cseval.roanoke.edu. You should see an available assignment called Lab Assignment 33. Make sure you include a header listing the authors of the file.


In-Class Notes


Last modified: Fri Apr 11 12:50:47 EDT 2014