$ mkdir ~/cs170/labs/lab13 $ cd ~/cs170/labs/lab13
Overall, everyone did quite well on the test. The average was an 86, even throwing out statistical outliers. I'll give you a moment to review your test. We are not going to go over the answers in class. If you have questions, feel free to visit me during my office hours.
We have been using Dictionaries to get around the searching problem up to this point. However, We won't always be able to rely on dictionaries to solve out problems. If we have some objects in a list, how can we determine if an item is in the list? How can we determine where an element is in the list? How quickly can we answer those questions?
The most straight forward method of determining if and where some item is a process known as Linear Search. With linear search, we examine every element in the list starting from the front of the list. If we find the element we are looking for, we return the index. If we look through the entirety of the list and don't find the item, we return some value representing the element was not in the list.
In a file called searching.py, write a function
called linear_search(a_list, a_string)
. This function takes two
parameters: a_list: a list of strings and a_string: a string. This
function should return True if a_string is in the list, and False otherwise.
>>> my_list = ["So", "long", "and", "thanks", "for", "all", "the", "fish"] >>> linear_search(my_list, "thanks") True >>> linear_search(my_list, "Ford") False
You should use a for loop to iterate through elements of the list. You can simply return True when you find the element a_string in my_list.
You need to set up a default case, where you have searched through the entirety of the list and have not found the element in question. In this case, you should return False.
Linear search might take a very long time. However, our computers today are fast enough that even with very large lists we will not notice the difference between this and the other search mechanism discussed today.
Add to your program a function that will create a random list of strings of some specified size. Modify your program so that you use increasingly larger lists. Add a comment in your program which lists the point at which linear search starts taking a noticeable amount of time.
Note that for a fair comparison, you are going to need to always spend the most amount of time searching. What is this worst case going to be, given an arbitrary list?
Binary search is a searching technique that leverages additional knowledge about the contents of the list. Namely, it knows that the items in the list are in some order. If we have items in some order, we can leverage this knowledge of the relationships between elements, in an attempt to speed up our search process.
In a file called searching.py (the same file linear search
is in), create a function
called binary_search(a_list, an_string)
. This function
takes two parameters: a_list, an ordered
list of strings; and a_string, a string. This function should
simply return True if a_string is in a_list, and False otherwise.
The steps for binary search are as follows. First, look at the element at the middle of the list. If that item is the string being searched for, you are done. If it is not, then you figure out which half of the list the a_string should appear. If the parameter a_string is less than the middle element, then a_string is in the left half of the list. Otherwise it is on the right. You then continue this process on the appropriate half of the list, until you find a_string. Of course, you also need to stop if you run out of elements to examine.
Make sure you test your code on all of the boundary conditions! It is easy to get the basic gist of this algorithm coded quite quickly, but you need to be able to handle the edge cases well.
To save yourself some time, you can use the sort
method
of lists, to produce an ordered list of elements.
>>> my_list = ["So", "long", "and", "thanks", "for", "all", "the", "fish"] >>> my_list.sort() #Remember, lists are mutable. This sorts in place. >>> binary_search(my_list, "and") True >>> binary_search(my_list, "Ford") False
You should define two local variables in your
function: start
and end
. These
variables are going to represent the beginning and the end
of the list you are considering at this point.
The
arithmetic mean of start and end is the index of the element
you are examining at this iteration of the loop. You should
store this in a variable called mid
.
If the element at the current mid index is greater than the search string, then assign mid - 1 to end. If the element at the current mid index is less than the search string, then assign mid + 1 to start.
You will need to use a while loop for this, as opposed to a
for loop. You will want to stop your loop
when start
is greater than end
.
If you followed the hint, then you probably stored indicies in variables to determine where in the list you were searching. This is the generally accepted algorithm for binary search, which will work in ALL programming languages.
However, there is usually a more Python-centric version of the algorithms we will be discussing in class. I will always show and discuss the general solution. It will be up to you to figure out the Python version.
You already know of a mechanism that will pull a specific sub-set of elements out of a list. Leverage this technique to simplify your binary search program considerably.
The true power of binary search is not that it is able to determine if an element is in the list, but how quickly it can do so. The number of comparisons necessary to determine whether or not an item is in a list should be less than that of linear search. Modify your functions so they output the number of comparisons needed to find an element.
Try your new functions with lists of various sizes. Can you define the pattern that emerges, as a function of the size of the list? Does the element being searched for affect your results? Can you find instances where linear is faster than binary?
When you have finished, create a tar file of your lab13
directory. To create a tar file, execute the following commands:
cd ~/cs170/labs tar czvf lab13.tgz lab13/
To submit your activity, go to cseval.roanoke.edu. You should
see an available assignment called Lab Assignment 13
.
Make sure you include a header listing the authors of the file.