Type casting and working with strings (arrays of characters)
C++ allows a value of one type to be used as a value of another compatible type through the mechanism of casting. We will discuss the notion of compatibility of types in more detail later.
A character variable is allocated one byte of memory. The bit patterns to be stored for the various characters are defined by the American Standard Code for Information Interchange (ASCII). Each character is associated with a numeric value between 0 and 127 (both inclusive). For example, the character 'a' is associated with the decimal (numeric) value 97 (see the table on the website referenced above). Thus, the statement
char ch = 'a';
will cause one byte to be allocated for the variable ch, and the bit pattern stored in the byte will be the binary representation for the number 97.
Note that, in C++, single characters are denoted by single quotes, and strings, i.e., sequences of characters, are denoted by double quotes.
C++ allows variables of type char to be cast as int values. As we have discussed before, char variables are stored in one byte, while int variables are stored in four bytes. When a char variable is cast as an int, the contents of the one byte representing the char variable are copied as the contents of the rightmost of the four bytes representing the int variable, and the leftmost three bytes representing the int variable are set to 0. Thus, after casting as an int object, the value of a char variable is simply the integer value of the ASCII code for the character.
Similarly, an int variable can be cast as a char value. In this case, though, since the range of values that an int object can hold is much larger than the range of values that a char variable can hold, there is some loss of information. The contents of the rightmost byte of the word representing the int variable are copied as the contents of the byte representing the char variable, and the char variable's value is now the character whose ASCII value is the integer value of the contents of that one byte.
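The two casts described above can be sketched as a pair of small helper functions. The names codeOf and lowByteAsChar are made up for this illustration, and the sketch assumes 8-bit bytes and an ASCII character set.

```cpp
// Widening cast: the int receives the character's numeric (ASCII) code.
int codeOf(char ch) {
    return static_cast<int>(ch);   // same effect as the C-style cast (int) ch
}

// Narrowing cast: only the rightmost (lowest-order) byte of n survives.
char lowByteAsChar(int n) {
    // Going through unsigned char makes the "keep only the low byte" step explicit.
    return static_cast<char>(static_cast<unsigned char>(n));
}
```

For instance, lowByteAsChar(353) yields 'a' under these assumptions, because 353 is 256 + 97 and only the low byte, 97, is kept.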
An int variable can be cast as a float value. If the int variable has the value x, then the float value after casting is x.0, i.e., the integer value with a value of 0 for the decimal part.
On the other hand, if a float variable is cast as an int value, then the int value after casting is simply the integer part of the value of the float variable.
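A minimal sketch of the int and float casts just described follows. The helper names toFloat and toInt are hypothetical, chosen only for this example.

```cpp
// int -> float: the value x becomes x.0.
float toFloat(int x) {
    return static_cast<float>(x);
}

// float -> int: the decimal part is dropped, leaving the integer part.
int toInt(float f) {
    return static_cast<int>(f);
}
```

Note that dropping the decimal part is not the same as rounding: toInt(3.9f) gives 3, and toInt(-3.9f) gives -3.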
The file chars.cc contains a program that reads in characters from standard input, i.e., the terminal, using the cin.get() function. The program reads in one character at a time, and prints out the ASCII value of the character and the character itself. Note that when you run the program and enter input at the terminal, a character will be read in only after you press the "Enter" key after the character. Do you think the ASCII value of the "Enter" key will be printed as well? The program expects to read the end of file character to terminate. The end of file character is the control-d character on the keyboard (on Unix-like terminals). One checks for the end of file character using the cin.eof() function. This function returns true if and only if the end of file character has been read. Note that one does not pass the character read as a parameter to the eof() function; the cin object stores a boolean flag that is set to true when the end of file character is read, and the eof() function simply returns the value of this flag.
As noted above, casting a character variable as an int gives us the integer ASCII value for the character. For example,
char ch = 'a'; int chAsInt = (int) ch; cout << chAsInt << endl;
will print 97, since that is the numeric value of the ASCII code for 'a'.
Since characters are stored as their corresponding numeric value according to the ASCII code, C++ allows assignment and comparison of character variables. Thus,
char ch1 = 'a'; char ch2 = '7'; ch1 = ch2;
will first cause the variable ch1 to contain the binary representation of 97, then will cause the variable ch2 to contain the binary representation of 55 (since the ASCII value for '7' is 55), and then will assign the value of ch2 to the variable ch1. Thus, ch1 will now contain the binary representation for 55.
Similarly, the condition
(ch1 < ch2)
will return true if and only if the numeric value stored in ch1 (i.e., the ASCII value of the character that was assigned to ch1) is less than the numeric value stored in ch2. The other comparison operators (greater than, less than or equal to, etc.) work similarly.
Note that the ASCII value for the character '7' is not 7.
As we saw above, we can get the integer ASCII value of a character. Although we may not know the exact ASCII values for characters, it is often sufficient to know that in the table of ASCII values, the characters '0' through '9' have a contiguous range of values, with the value for '0' being the smallest, the characters 'a' through 'z' have a contiguous range of values, with the value for 'a' being the smallest, and the characters 'A' through 'Z' have a contiguous range of values, with the value for 'A' being the smallest.
Suppose ch is a char variable. What conditional expression would you use to check if the character stored in ch is a digit? What conditional expression would you use to check if the character stored in ch is a lower case letter?
Suppose ch is a char variable whose value is a lower case letter. What expression would you use so that the value of the expression is the corresponding upper case letter?
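One possible set of answers to the questions above, relying only on the contiguous ranges noted earlier, is sketched below. The function names are made up for this illustration.

```cpp
// True if ch lies in the contiguous range '0' through '9'.
bool isDigitChar(char ch) {
    return ch >= '0' && ch <= '9';
}

// True if ch lies in the contiguous range 'a' through 'z'.
bool isLowerChar(char ch) {
    return ch >= 'a' && ch <= 'z';
}

// Assumes ch is a lower case letter: its offset from 'a' is the same
// as the offset of the corresponding upper case letter from 'A'.
char toUpperChar(char ch) {
    return static_cast<char>(ch - 'a' + 'A');
}
```

None of these expressions needs the actual ASCII values; they only use the ordering of the characters within each range.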
We have seen that in C++ strings are arrays of characters with one extra character at the end: the end of string character. The end of string character is denoted by '\0'. The individual elements of the string can be accessed as the individual elements of the array by indexing the elements as usual. What process would you use to find the length, i.e., the number of characters not including the end of string character, of a given string?
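One way to approach the length question is to walk the array until the end of string character is found. The sketch below does this; the name stringLength is hypothetical.

```cpp
#include <cstddef>

// Counts the characters that come before the end of string character '\0'.
std::size_t stringLength(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```

An empty string, which holds only '\0', has length 0 under this definition.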
Suppose you have two arrays of characters, say name1 and name2. How would you check if the two variables contain identical strings?
Assuming that each of the characters in two strings is a lower case letter, what process would you use to check if one string comes before another in dictionary ordering?
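Both of the preceding questions can be approached by scanning the two strings in step until they differ or the first one ends. The following is a sketch of that idea; the names sameString and comesBefore are made up here, and comesBefore assumes lower case letters only, as the question does.

```cpp
#include <cstddef>

// True if the two strings hold the same characters in the same order.
bool sameString(const char* a, const char* b) {
    std::size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i]) {
        ++i;
    }
    return a[i] == b[i];   // equal only if both ended at the same position
}

// True if a comes strictly before b in dictionary ordering.
bool comesBefore(const char* a, const char* b) {
    std::size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i]) {
        ++i;
    }
    // At the first difference the smaller character decides; '\0' is smaller
    // than any letter, so a proper prefix comes before the longer string.
    return a[i] < b[i];
}
```

Comparing the array variables themselves with == would not compare the characters they contain, which is why the sketch goes element by element.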
Suppose course is a variable that holds a string (i.e., course is an array of char objects), say s1. What process would you use so that at the end of the process you have another variable course1 whose value is also s1, and course and course1 are not aliases for the same object, i.e., making a change to the array course will not cause the contents of course1 to change?
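A sketch of one such process follows: copy the characters one by one into a separate array. The name copyString is hypothetical, and the sketch assumes the destination array has room for every character plus the end of string character.

```cpp
// Copies the string in `from` into the separate array `to`.
// Assumes `to` is large enough to hold the string and its '\0'.
void copyString(const char* from, char* to) {
    int i = 0;
    while (from[i] != '\0') {
        to[i] = from[i];
        ++i;
    }
    to[i] = '\0';   // the copy needs its own end of string character
}
```

Because the characters live in two different arrays afterwards, changing one array leaves the other untouched.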
Suppose noun1 and noun2 are two variables that hold strings, say s1 and s2. What process would you use so that at the end of the process the value of noun1 is s2 and the value of noun2 is s1?
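One possible process is the usual three-step exchange through a temporary array. The sketch below assumes, purely for illustration, that both arrays can hold either string and that no string is longer than MAX_LEN - 1 characters; MAX_LEN, copyString, and swapStrings are names invented for this example.

```cpp
const int MAX_LEN = 100;   // assumed capacity of every array involved

// Copies the string in `from` into `to`, including the '\0'.
void copyString(const char* from, char* to) {
    int i = 0;
    while (from[i] != '\0') {
        to[i] = from[i];
        ++i;
    }
    to[i] = '\0';
}

// Exchanges the strings held in a and b using a temporary array.
void swapStrings(char a[], char b[]) {
    char tmp[MAX_LEN];
    copyString(a, tmp);   // save s1
    copyString(b, a);     // a now holds s2
    copyString(tmp, b);   // b now holds s1
}
```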
Suppose subject, verb and object are three variables that hold strings, say s1, s2 and s3. What process would you use so that at the end of the process you have a new variable that holds the string made up of s1 followed by s2 followed by s3?
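A sketch of one approach: copy the characters of each string in turn into a new array, keeping a running position, and finish with a single end of string character. The name joinThree is hypothetical, and the output array is assumed to be large enough for all three strings plus the '\0'.

```cpp
// Writes a followed by b followed by c into out.
// Assumes out has room for all the characters plus the final '\0'.
void joinThree(const char* a, const char* b, const char* c, char* out) {
    const char* parts[3] = {a, b, c};
    int k = 0;                                  // next free position in out
    for (int p = 0; p < 3; ++p) {
        for (int i = 0; parts[p][i] != '\0'; ++i) {
            out[k] = parts[p][i];
            ++k;
        }
    }
    out[k] = '\0';   // only one end of string character, at the very end
}
```

Note that the end of string characters of the first two strings are not copied; otherwise the result would appear to stop after s1.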
Suppose the variable genome holds a string, and the variable protein holds another string. What process would you use to determine if the string represented by protein appears as a contiguous part of the string represented by genome?
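A straightforward (not the fastest) process is to try every starting position in genome and compare character by character from there. The sketch below illustrates this; the function name containsContiguous is made up for the example.

```cpp
// True if protein appears as a contiguous run of characters inside genome.
bool containsContiguous(const char* genome, const char* protein) {
    for (int start = 0; ; ++start) {
        int i = 0;
        // Match as many characters as possible beginning at this start position.
        while (protein[i] != '\0' && genome[start + i] == protein[i]) {
            ++i;
        }
        if (protein[i] == '\0') {
            return true;    // every character of protein was matched
        }
        if (genome[start + i] == '\0') {
            return false;   // genome ran out, so no later start can succeed
        }
    }
}
```

With this definition an empty protein string is considered to appear in any genome string.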
Suppose the variable genome holds a string, and the variable protein holds another string. What process would you use to determine if the string represented by protein appears as a (not necessarily contiguous) part of the string represented by genome?
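The sketch below reads "not necessarily contiguous part" as meaning that the characters of protein occur in genome in the same order, possibly with other characters in between. That reading is an assumption, as is the function name containsScattered.

```cpp
// True if the characters of protein occur in genome in the same order,
// with any number of other characters allowed in between.
bool containsScattered(const char* genome, const char* protein) {
    int p = 0;   // index of the next protein character still to be found
    for (int g = 0; genome[g] != '\0' && protein[p] != '\0'; ++g) {
        if (genome[g] == protein[p]) {
            ++p;
        }
    }
    return protein[p] == '\0';   // true if every protein character was found
}
```

A single left-to-right pass is enough here, because matching each protein character at its earliest possible position never rules out a later match.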
Suppose the variable codeWord holds a string that contains only lower case letters. The string denotes an integer number that is written in base 26, where the letter 'a' stands for the value 0, the letter 'b' stands for the value 1, ..., the letter 'z' stands for the value 25. What process would you use to determine the integer value of the string held in codeWord?
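One possible process is sketched below: scan the letters from left to right, multiplying the running total by 26 and adding the value of each letter. The sketch assumes the leftmost letter is the most significant digit and that the result fits in a long long; the function name codeWordValue is hypothetical.

```cpp
// Interprets codeWord as a base 26 number with 'a' = 0, ..., 'z' = 25.
// Assumes only lower case letters and the most significant letter first.
long long codeWordValue(const char* codeWord) {
    long long value = 0;
    for (int i = 0; codeWord[i] != '\0'; ++i) {
        int digit = codeWord[i] - 'a';   // offset within the contiguous range
        value = value * 26 + digit;
    }
    return value;
}
```

For example, "ba" has the value 1 * 26 + 0 = 26, and "bcd" has the value 1 * 676 + 2 * 26 + 3 = 731.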