CPSC170
Fundamentals of Computer Science II

Lab 9: Debugging and Backtracking

Debugging: hand-trace-like debugging output

Hand-tracing is an effective way to debug a program. As programs get large, it is best to debug (hand-trace) the program function by function. Once a particular function, say F, has been thoroughly debugged, one can go on to debug (hand-trace) the function(s) that call F.

In the last lab, you wrote a function to compute the n-th Fibonacci number. Your function probably looked like:

      // PRE: num = n >= 0
      // POST: RV = the n-th Fibonacci number, 
      //              where the 0-th Fibonacci number = 1
      //              and the 1st Fibonacci number = 1
      int fibo (int num) {
        int answer;
        if (num <= 1) {
          // ASSERT: n = 0 or n = 1
          answer = 1;		 
          // ASSERT: answer = 1
        }
        else {
          // ASSERT: n > 1
          answer = fibo(num-1) + fibo(num-2);
          // ASSERT: answer = fibo(n-1) + fibo(n-2)
        }
        // ASSERT: answer = the n-th Fibonacci number.
        return (answer);
      }

A hand-trace of the function call:

      int aNumber = 3;
      int theAnswer = fibo(aNumber);

will tell us that the function is called with the value 3. Thus, the function is recursively called with the value 2, and when that incarnation returns, the function is recursively called with the value 1. The incarnation that was called with the value 2, will first recursively call the function with the value 1, and when that incarnation returns, the function is recursively called with the value 0. And so on. This can be concisely written as a trace of the function calls and their returned values:

      fibo(3)     <- the first call
      calls fibo(2)
      calls fibo(1)
      fibo(1) returns 1
      calls fibo(0)
      fibo(0) returns 1
      fibo(2) returns 2
      calls fibo(1)
      fibo(1) returns 1
      fibo(3) returns 3

One has to carefully look at this trace to see the order in which the recursive functions were called, and then to see which incarnation returned which value. A better way to hand-write this trace using indentation would be:

      fibo(3)     <- the first call
        fibo(2) called
          fibo(1) called
          fibo(1) returns 1
          fibo(0) called
          fibo(0) returns 1
        fibo(2) returns 2
        fibo(1) called
        fibo(1) returns 1
      fibo(3) returns 3

We get a better sense of the order of the function calls and their corresponding return values. Essentially, the indentation (number of spaces) helps up determine which function was called at which level. One could say that the first call to fibo(3) is a level 0 (thus 0 spaces of indentation), the next call to fibo(2) is at level 1 (thus 2*1 = 2 spaces of indentation), etc. Below is the above trace annotated with the levels of the calls:

      fibo(3)                    <- level 0
        fibo(2) called           <- level 1
          fibo(1) called         <- level 2
          fibo(1) returns 1      <- level 2
          fibo(0) called         <- level 2
          fibo(0) returns 1      <- level 2
        fibo(2) returns 2        <- level 1
        fibo(1) called           <- level 1
        fibo(1) returns 1        <- level 1
      fibo(3) returns 3          <- level 0

It is always useful to compare our hand-trace with what the program actually does when it is run. We can use debugging output statements to accomplish this. Consider the following version of the function fibo:

      // PRE: num = n >= 0
      // POST: RV = the n-th Fibonacci number, 
      //              where the 0-th Fibonacci number = 1
      //              and the 1st Fibonacci number = 1
      int fibo (int num) {
        cout << "fibo(" <<  num << ") called" << endl;
        int answer;
        if (num <= 1) {
          // ASSERT: n = 0 or n = 1
          answer = 1;		 
          // ASSERT: answer = 1
        }
        else {
          // ASSERT: n > 1
          int previous1 = fibo(num-1);
          int previous2 = fibo(num-2);
          answer = previous1 + previous2;
          // ASSERT: answer = fibo(n-1) + fibo(n-2)
        }
        // ASSERT: answer = the n-th Fibonacci number.
        cout << "fibo(" << num << ") returns " << answer << endl;
        return (answer);
      }

Try running a program with a suitable function main and the above fibo function so that the program outputs the 3-rd Fibonacci number. What do you see as the trace? Is it helpful?

As seen above, it would have been more useful if the debugging output was indented according to the level of the function calls. One way to accomplish this is to pass the level of a function call as an additional parameter to the function. Then, write a function printSpaces that gets the level number as a parameter and prints the appropriate number of spaces to indent the debugging output for that function call. Modify your program to print the n-th Fibonacci number so that the debugging output looks like the last version of the trace shown above (you do not need to print out the annotations, e.g., <- level 0).

Note that just such debugging output is not sufficient for efficient debugging. There is no substitute for a careful hand-trace before running a program to verify if the program will do what you expect it to do.

Backtracking: a technique for exhaustive search

Consider the following problem:

Write a C++ program that reads from the user a set of (distinct) integers and prints out a subset of these integers so that the sum of the integers in the subset is half the sum of the integers in the entire set. If no such subset is found the program prints out a suitable message. You may assume that the first number the user enters is the number of numbers in the set of integers, and that the number of integers in the user-given set is no more than 20.

We will walk through the process of developing a program, and see how recursion and the technique of backtracking can be used effectively, to solve this problem.

The first step in solving this problem is to understand the problem, and working out some examples will help. These examples can then become our test cases for our program.
Clearly, we can see that if the sum, of the entire set of integers given to us, is odd, then there cannot be a solution to the problem. So, let us first construct such an example. Consider the set of integers: 3 5 8 2 1. The sum of the integers is 19, and thus odd. So, no solution can be found for this instance. We can go ahead and create an input file for this instance. Test1.txt is this input file. We can also start our TestCases.txt file.
Since the user can give us a set of integers, 0 and negative numbers may be a part of the set too. Construct an example that contains only negative numbers such that the sum of the numbers is odd. Create the file Test2.txt, and add to the file TestCases.txt an explanation for this test.
It will be useful to also test the program with a set of integers that contains 0, positive integers and negative integers, and such that the sum of the set of integers is odd. Construct such an example, create the file Test3.txt, and add to the file TestCases.txt an explanation for this test.
Now, we can work out examples for sets of integers where the sum of the integers is even. Is it possible to have a set of positive integers whose sum is even, say S, and such that there is a subset whose sum is S/2? Consider the set: 4 1 2 3. In this case the sum of the integers is 10, and the subsets 4, 1 and 2, 3, each have a sum of 5. Our program needs to output any one of these subsets. Create a test file Test4.txt, and include the explanation in the TestCases.txt file. For example, the explanation could be:
```
	4. Test for a set of positive integers whose sum is even.
           Input set: 4 1 2 3
           Expect that the program prints out "4 1" or "2 3".
           Test file: Test4.txt
      
```
We should also test for a set of positive integers whose sum is even, say S, a subset whose sum is S/2 exists, and the number of integers in the set is odd. Construct such an example, create the file Test5.txt, and include an explanation in the file TestCases.txt.
Similar to the above two examples, we should test for sets that have only negative numbers. Construct such examples, create the files Test6.txt and Test7.txt for these examples, and include the explanations in the file TestCases.txt.
Now construct an example with a set of integers that contains 0, positive integers and negative integers, such that the sum of the set of integers is even, say S, and such that there is a subset whose sum is S/2, create the file Test8.txt, and include an explanation in the file TestCases.txt.

Construct an example with a set of integers that contains 0, positive integers and negative integers, such that the sum of the set of integers is even, say S, such that there is a subset, with at least 3 elements, whose sum is S/2, and such that the elements of the subset are distributed in the original set, i.e., one element at the beginning of the original set, one in the middle, and one at the end. Create the file Test9.txt, and include an explanation in the file TestCases.txt.
Is it possible to have a set of positive integers whose sum is even, say S, and such that there is no subset whose sum is S/2? Consider the set: 4 5 6 1. The sum of these integers is 16, but there is no subset that will add up to 8. Thus there is no solution for this instance. Create the file Test10.txt, and include an explanation in the file TestCases.txt.
Similar to the above, construct one example with only negative numbers, and one example with 0, negative numbers and positive numbers, such that, in each case, the integers add up to an even number, say S, and there is no subset whose sum is S/2. Create the files Test11.txt and Test12.txt, and include the explanations in the file TestCases.txt.
Are there other cases that need to be considered? If so, construct those examples, the corresponding input test files, and include explanations in the file TestCases.txt.

Now that we have an exhaustive collection of test cases, covering all the possible scenarios, we can start working on the solution. We know we have to read in the set of integers, and then find if there is a subset whose sum is half that of the sum of the original set. We need to have one array to store the user-given set, and an integer to store the number of integers in this set. We also need to store the solution subset, if any, and the number of integers in the solution subset. Finally, we need a boolean variable to indicate if a solution subset was found. So, we can write our function main as:

        // PRE: IS contains n, 0 < n < MAXNUMBERS, followed by
        //        a1, a2, ..., an
        // POST: OS contains b1, b2, ..., bk such that
        //        {b1, b2, ..., bk} subset of {a1, a2, ..., an} and
        //        SUM{b1, b2, ..., bk} = SUM{a1, a2, ..., an}/2.	       
        //       If such a subset does not exist, 
        //        OS contains "No solution found."
	int main () {
          int Set[MAXNUMBERS];  // holds the user-given set
          int numInSet;         // holds the number of elements in the
                                // user-given set
	  readSet (Set, numInSet);
          // ASSERT: numInSet = n				       
          //         Set[0] = a1, Set[1] = a2, ..., Set[n-1] = an.
          printSet (Set, numInSet);
          // ASSERT: OS contains Set[0]..Set[numInSet-1]

          int subset[MAXNUMBERS]; // holds the solution subset, if any
          int numInSubset = 0;    // holds the number of elements in the
                                  // solution subset, if any.
                                  // Currently 0 elements.
          bool foundSolution = false; // true iff there is a solution
                                      // for the given instance
	  findSubset (Set, numInSet, subset, numInSubset, foundSolution);
	  // ASSERT: if foundSolution is true then  
          //            numInSubset = k, 0 < k < n and
          //            subset[0]..subset[k-1] is defined and
          //            subset contains elements from Set and
          //            SUM(subset) = SUM(Set)/2

          if (foundSolution) {
            printSet (subset, numInSubset);
            // ASSERT: OS contains subset[0]..subset[numInSubset-1]
          }
          else {
	    cout << "No soulution was found" << endl;
          }
          return (0);
	}

If each of the functions readSet, findSubset, and printSet are written so that the assertions b written after each of these calls is true, then our program above is guaranteed to work correctly.

Write the above function main in a file, say subset.cc. Add all the necessary include pre-processor directives. Write a file subsetHelper.h that contains the declarations (just the headers), along with precise pre and post conditions for each, for each of the three functions readSet, findSubset, and printSet. Start a suitable Makefile that, for now, contains a target for subset.o and an executable called subset.

Write a file subsetHelper.cc and add the definitions for the functions readSet and printSet. Add all the necessary include pre-processor directives in this file. Do you need the constant MAXNUMBERS to be accessible in this file? Do you need it accessible in the file subset.cc? Where would you add the pre-processor directive to define this constant? Make the appropriate additions. Add to the Makefile as necessary.

At this stage, only the function findSubset has not been defined. So, if you comment out the code in the function main from the declaration of subset until the return statement, then you should be able to use the Makefile to create the exectable subset, and run it. The program should simply output the user-given set. Make sure this is the case.

Now we can start working on our algorithm for findSubset. The purpose of findSubset is to compute a suitable subset, from the user given set. (You should already have precise pre/post conditions for this function that clearly state what the function needs to accomplish.) As we noted earlier, if the sum of the elements of the user-given set is an odd number, then there is no solution for the given instance of the problem. So, we could start our function as:
```
    void findSubset(int given[], int numInGiven,
                    int solution[], int & numInSolution,
                    bool & foundSolution) {
      int givenSetSum = Sum(given, numInGiven);
      // ASSERT: givenSetSum = SUM (given)
      if (odd(givenSetSum)) {
        foundSolution = false;
      }
      else {
        // ASSERT: SUM(given) is even.
        //         Thus, a solution may be possible.
      }
    }
  
```
Why is the parameter numInSolution passed by reference? Why is the parameter foundSolution passed by reference?
Now we can consider the process for finding a solution when the sum of the given set is even. Since we have already computed the sum of the given set and stored it in the variable givenSetSum, our task is now to find a subset whose sum is givenSetSum/2, say S - we will call this value the target sum.

If we could somehow consider every possible subset of the user-given set, we could easily check if any of the subsets has the required target sum. A methodical way of constructing each possible subset of the user-given set is to look at two possibilities for each element of the user-given set: (1) include it in the subset, and (2) not include it in the subset. Given a current partial subset, say P, (having included some of the elements at indices less than i, we will include a number, say at index i, in the subset, and then try to include some of the numbers from the rest of the user-given set (i.e., starting at index i+1. If we could not find a solution, then we conclude that any subset that includes elements of P and the number at index i does not give us a required solution. So, we remove the number at index i from our subset and then try to extend P using elements from the user-given set at indices greater than i. This is the essence of the technique of backtracking - we first included the number at index i and then backtracked on our decision.

Suppose the next number to be considered for inclusion in our solution (subset) is p. If this number p is less than or equal to S, then we can add this number to the solution (because there is still hope that there is a solution), and try to find a solution in the rest of the array, but for the target sum S - p. This looks exactly like what we were doing when p was the next number to be considered, and S was the target sum. This suggests using recursion as the programmatic strategy. Moreover, the array size has reduced (since we don't need to consider p), so in the recursion we will be considering a smaller instance; this is a good, desirable thing, because this suggests that the recursion will eventually end with an empty instance.

So, it seems that it will be useful for the function, that calls itself recursively, to have access to the index of the next number in the set (or index of the next number in the array) to consider including in the solution. The first "incarnation" of this recursive function should start by considering the element at index 0 in the array with the target sum of S. Our function findSubset does not have these useful quantities as parameters, so it is useful to call a helper function that will recursively call itself with these additional parameters.

We can now add to our function findSubset thus:
```
    void findSubset(int given[], int numInGiven,
                    int solution[], int & numInSolution,
                    bool & foundSolution) {
      int givenSetSum = Sum(given, numInGiven);
      // ASSERT: givenSetSum = SUM (given)
      if (odd(givenSetSum)) {
        foundSolution = false;
      }
      else {
        // ASSERT: SUM(given) is even.
        //         Thus, a solution may be possible.
        int targetSum = givenSetSum/2; // Half of the sum of original set
        int indexToConsider = 0;       // Index of first numebr to consider
                                       //  to include in the solution
        findSubsetHelper (given, numInGiven,
                          solution, numInSolution,
                          foundSolution,
                          targetSum, indexToConsider);
        // ASSERT: foundSolution is true iff there is a subset, say T, of 
        //          given[indexToConsider] .. given[numInGiven-1]
        //          such that the sum of the subset is targetSum.
        //          If foundSolution is true, then the subset T is included
        //          in the array solution (at the end of the array)
      }
      // ASSERT: foundSolution is true iff there is a solution to the 
      //           given instance, and a solution subset is          
      //           solution[0]..solution[numInSolution-1]
    }
  
```
Consider the user-given data in Test4.txt. If we could look at the calls as they were made when the program is executed for this data, we expect that the call to findSubset would look like
```
      findSubset({4, 1, 2, 3}, 4, {}, 0, false);
    
```
The {} indicates that we sent an empty array as the parameter solution, that numInSolution is 0 and foundSolution is false when the function is called. This function would then compute givenSetSum to be 10, and thus, execute the else part of the conditional. There, targetSum would be computed as 5, and we would make the function call
```
      findSubsetHelper ({4, 1, 2, 3}, 4, {}, 0, false, 5, 0);
    
```
i.e., look for a subset, starting at index 0, whose sum is 5.

Now, since the index is 0, the next number to be considered is 4. So, we include 4 in the solution, and we would like the helper function to be recursively called as:
```
      findSubsetHelper ({4, 1, 2, 3}, 4, {4}, 1, false, 1, 1);
    
```
Note that we included 4 in the solution, updated the target sum, and also updated the index of the next number to be considered.

Using the same logic as above, we would now like the function to be recursively called as:
```
      findSubsetHelper ({4, 1, 2, 3}, 4, {4, 1}, 2, false, 0, 2);
    
```
Verify that this is what we would like to happen.

Note that in this recursive incarnation of the function, we are looking for a subset with a zero sum, i.e., the subset we have found is correct! So, we can set foundSolution to be true to indicate this, and the function is done with its work.

Suppose the user-given input is {4, 5, 6, 1}, verify that the trace of the calls that we would like is:
```
      findSubset({4, 5, 6, 1}, 4, {}, 0, false);
      ...
      findSubsetHelper ({4, 5, 6, 1}, 4, {}, 0, false, 8, 0);
      ...
      findSubsetHelper ({4, 5, 6, 1}, 4, {4}, 1, false, 4, 1);
      ...
      findSubsetHelper ({4, 5, 6, 1}, 4, {4, 5}, 2, false, -1, 2);
      ...
      findSubsetHelper ({4, 5, 6, 1}, 4, {4, 5, 6}, 3, false, -7, 3);
      ...
      findSubsetHelper ({4, 5, 6, 1}, 4, {4, 5, 6, 1}, 4, false, -8, 4);
      
    
```
Note that at this stage, the function is given a target sum that is not zero, and the function is expected to consider the number at index 4. Nonetheless, index 4 is beyond the user-given input, since the user-given set has only 4 elements (indexed 0 through 3). Thus, the function finds that with the current solution set (i.e, {4, 5, 6, 1}), it is not possible to find a solution, and it completes, i.e., returns to the calling incarnation of the function. The calling incarnation had added the 1 to the solution before calling the failed incarnation. So, the calling incarnation removes the 1 from the solution to see if that allows the next incarnation to succeed. Thus, after the last function call in the above sequence will return having failed, the previous incarnation will call another incarnation as:
```
      findSubsetHelper ({4, 5, 6, 1}, 4, {4, 5, 1}, 3, false, -2, 4);
    
```
In other words, each incarnation tries to find a solution in the two cases: (1) by including the number it is to consider, and (2) by not including the number it is to consider. This strategy must succeed since a solution, if it exists, is essentially some numbers in the given set included in the solution and some numbers not included.

In the above case, each incarnation of findSubsetHelper will fail for each of the two possibilities (including or not including the number corresponding to indexToConsider), and thus the function findSubset will complete with a value of false in foundSolution.

Work out a similar "hand-trace" of calls to findSubset and findSubsetHelper for one of your test cases where the sum of the given set is even, and there is a solution, and for one test case where the sum of the given set is odd, and there is no solution.
Write the function findSubsetHelper (header with pre/post in the .h file and the definition, i.e., header and body, in the .cc file). Include cout statements and/or suitable function calls at the beginning of the functions findSubset and findSubsetHelper so that your program prints out the values of all the parameters to the function. This way you can compare your program's output to the "trace" you hand-wrote (in the item above) when you run your program on the same inputs.

Add any additional include pre-processor directives you may need in any of your program files, update the Makefile as necessary, and then make the executable.

Run your executable to ensure that the parameters printed out match your hand-trace (from the item above). If the traces don't match, check your hand-trace from the item above, and then hand-trace your program to understand where, and why, the program deviates from your expected hand-trace.