CPSC170
Fundamentals of Computer Science II

Lab 10

Type casting and working with strings (arrays of characters)


Pre-lab

Casting variables

C++ allows a value of one type to be used as a value of another, compatible type through the mechanism of casting. We will discuss the notion of compatibility of types in more detail later.

A character variable is allocated one byte of memory. The bit patterns to be stored for the various characters are defined by the American Standard Code for Information Interchange (ASCII). Each character is associated with a numeric value between 0 and 127 (both inclusive). For example, the character 'a' is associated with the decimal (numeric) value 97 (see the table on the website referenced above). Thus, the statement

    char ch = 'a';
  

will cause one byte to be allocated for the variable ch, and the bit pattern stored in the byte will be the binary representation for the number 97.

Note that, in C++, single characters are denoted by single quotes, and strings, i.e., sequences of characters, are denoted by double quotes.

C++ allows variables of type char to be cast as int values. As we have discussed before, char variables are stored in one byte, while int variables are stored in four bytes. When a char variable is cast as an int, the contents of the one byte representing the char variable are copied as the contents of the rightmost of the four bytes representing the int variable, and the leftmost three bytes representing the int variable are set to 0. Thus, after casting as an int object, the value of a char variable is simply the integer value of the ASCII code for the character.

Similarly, an int variable can be cast as a char value. In this case, though, since the range of values that an int object can hold is much larger than the range of values that a char variable can hold, there is some loss of information. The contents of the rightmost byte of the word representing the int variable are copied as the contents of the byte representing the char variable, and the char variable's value now is that character whose ASCII value is the integer value of the contents of that one byte.
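The narrowing described above can be sketched as follows. The function name lowByteAsChar is made up for illustration, and the result for values above 127 assumes the usual behaviour of keeping only the rightmost byte, which is what typical compilers do.

```cpp
// Hypothetical helper: cast an int to a char.
// static_cast<char>(n) performs the same conversion as (char) n.
// Only the rightmost (low-order) byte of n survives the cast.
char lowByteAsChar(int n) {
    return static_cast<char>(n);
}
```

For instance, lowByteAsChar(97) gives 'a'. On typical compilers lowByteAsChar(353) also gives 'a', because 353 is 256 + 97 and the 256 part lies outside the rightmost byte.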

An int variable can be cast as a float value. If the int variable has the value x, then the float value after casting is x.0, i.e., the integer value with a value of 0 for the decimal part.

On the other hand, if a float variable is cast as an int value, then the int value after casting is simply the integer part of the value of the float variable.
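A small sketch of both directions, using two helper names (toFloat and truncateToInt) that are invented here for illustration:

```cpp
// Hypothetical helper: int -> float keeps the value and adds a zero decimal part.
float toFloat(int n) {
    return static_cast<float>(n);
}

// Hypothetical helper: float -> int keeps only the integer part.
// The decimal part is dropped, not rounded.
int truncateToInt(float x) {
    return static_cast<int>(x);
}
```

With these, toFloat(3) is 3.0, truncateToInt(3.99f) is 3, and truncateToInt(-3.99f) is -3, since in each case the decimal part is simply discarded.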

The file chars.cc contains a program that reads in characters from standard input, i.e., the terminal, using the cin.get() function. The program reads in one character at a time, and prints out the ASCII value of the character and the character itself. Note that when you run the program and enter input at the terminal, a character will be read in only after you press the "Enter" key after the character. Do you think the ASCII value of the "Enter" key will be printed as well? The program expects to read the end of file character to terminate. The end of file character is the control-d character on the keyboard. One checks for the end of file character using the cin.eof() function. This function returns true if and only if the end of file character has been read. Note that one does not pass the character read as a parameter to the eof() function; the cin object stores a boolean flag that is set to true when the end of file character is read, and the eof() function simply returns the value of this flag.
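The read-until-end-of-file pattern described above might look roughly like the sketch below. This is not the contents of chars.cc; the function name echoCodes and its stream parameters are assumptions made here so that the same loop can be run on cin or on a string.

```cpp
#include <iostream>
#include <sstream>  // only needed if you want to try the function on a string

// Hypothetical sketch: read one character at a time from `in` until the
// end of input, writing each character's integer code and the character
// itself to `out`.
void echoCodes(std::istream& in, std::ostream& out) {
    char ch;
    in.get(ch);                 // try to read the first character
    while (!in.eof()) {         // the flag is set once a read hits end of file
        out << static_cast<int>(ch) << " " << ch << "\n";
        in.get(ch);             // try to read the next character
    }
}
```

Calling echoCodes(std::cin, std::cout) from main would run the loop on the terminal. Note that eof() is checked after each attempted read, never before the first one, because the flag only changes when a read actually runs out of input.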

Working with characters and strings

As noted above, casting a character variable as an int gives us the integer ASCII value for the character. For example,

    char ch = 'a';
    int chAsInt = (int) ch;
    cout << chAsInt << endl;
  

will print 97, since that is the numeric value of the ASCII code for 'a'.

Since characters are stored as their corresponding numeric value according to the ASCII code, C++ allows assignment and comparison of character variables. Thus,

    char ch1 = 'a';
    char ch2 = '7';
    ch1 = ch2;
  

will first cause the variable ch1 to contain the binary representation of 97, then will cause the variable ch2 to contain the binary representation of 55 (since the ASCII value for '7' is 55), and then will assign the value of ch2 to the variable ch1. Thus, ch1 will now contain the binary representation for 55.

Similarly, the condition

    (ch1 < ch2)
    

will return true if and only if the numeric value stored in ch1 (i.e., the ASCII value of the character that was assigned to ch1) is less than the numeric value stored in ch2. The other comparison operators (greater than, less than or equal to, etc.) work similarly.
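As a small illustration, the comparison can be wrapped in a function (the name comesBefore is made up for this sketch):

```cpp
// Hypothetical helper: true exactly when the numeric code stored in ch1
// is smaller than the numeric code stored in ch2.
bool comesBefore(char ch1, char ch2) {
    return ch1 < ch2;
}
```

Since '7' has ASCII value 55 and 'a' has ASCII value 97, comesBefore('7', 'a') is true. The comparison follows the ASCII table rather than the alphabet: 'A' has value 65, so comesBefore('a', 'A') is false.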

Note that the ASCII value for the character '7' is not 7.

As we saw above, we can get the integer ASCII value of a character. Although we may not know the exact ASCII values for characters, it is often sufficient to know that in the table of ASCII values, the characters '0' through '9' have a contiguous range of values, with the value for '0' being the smallest, the characters 'a' through 'z' have a contiguous range of values, with the value for 'a' being the smallest, and the characters 'A' through 'Z' have a contiguous range of values, with the value for 'A' being the smallest.
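One consequence of these contiguous ranges is that subtracting the first character of a range from another character in the same range gives that character's position within the range. A minimal sketch, with a function name (offsetFrom) invented for illustration:

```cpp
// Hypothetical helper: how far ch lies past base in the ASCII table.
// The chars are converted to int before the subtraction.
int offsetFrom(char base, char ch) {
    return static_cast<int>(ch) - static_cast<int>(base);
}
```

For example, offsetFrom('0', '7') is 7 even though the ASCII value of '7' is 55, and offsetFrom('a', 'z') is 25.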

Suppose ch is a char variable. What conditional expression would you use to check if the character stored in ch is a digit? What conditional expression would you use to check if the character stored in ch is a lower case letter?

Suppose ch is a char variable whose value is a lower case letter. What expression would you use so that the value of the expression is the corresponding upper case letter?

We have seen that in C++ strings are arrays of characters with one extra character at the end: the end of string character. The end of string character is denoted by '\0'. The individual elements of the string can be accessed as the individual elements of the array by indexing the elements as usual. What process would you use to find the length, i.e., the number of characters not including the end of string character, of a given string?
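To make the storage layout concrete, the sketch below looks at the string "cat". Both function names are made up for illustration; the code only shows how the characters are stored and indexed, and leaves the question above open.

```cpp
// Hypothetical helper: number of char elements the array for "cat" occupies.
// The array holds 'c', 'a', 't' and the end of string character '\0'.
int storageForCat() {
    const char word[] = "cat";
    return static_cast<int>(sizeof(word));
}

// Hypothetical helper: the element of a string at index i, using ordinary
// array indexing. The caller must keep i within the array.
char charAt(const char text[], int i) {
    return text[i];
}
```

Here storageForCat() is 4, one more than the three visible letters. charAt("cat", 0) is 'c', and charAt("cat", 3) is the end of string character '\0'.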

Suppose you have two arrays of characters, say name1 and name2. How would you check if the two variables contain identical strings?

Assuming that each of the characters in two strings is a lower case letter, what process would you use to check if one string comes before another in dictionary ordering?

Suppose course is a variable that holds a string (i.e., course is an array of char objects), say s1. What process would you use so that at the end of the process you have another variable course1 whose value is also s1, and course and course1 are not aliases for the same object, i.e., making a change to the array course will not cause the contents of course1 to change?

Suppose noun1 and noun2 are two variables that hold strings, say s1 and s2. What process would you use so that at the end of the process the value of noun1 is s2 and the value of noun2 is s1?

Suppose subject, verb and object are three variables that hold strings, say s1, s2 and s3. What process would you use so that at the end of the process you have a new variable that holds the string made up of s1 followed by s2 followed by s3?

Suppose the variable genome holds a string, and the variable protein holds another string. What process would you use to determine if the string represented by protein appears as a contiguous part of the string represented by genome?

Suppose the variable genome holds a string, and the variable protein holds another string. What process would you use to determine if the string represented by protein appears as a (not necessarily contiguous) part of the string represented by genome?

Suppose the variable codeWord holds a string that contains only lower case letters. The string denotes an integer number that is written in base 26, where the letter 'a' stands for the value 0, the letter 'b' stands for the value 1, ..., the letter 'z' stands for the value 25. What process would you use to determine the integer value of the string held in codeWord?