C++ - Working with characters and text


C++ does not require a particular character code, but on almost all systems characters are stored as their ASCII (American Standard Code for Information Interchange) values, i.e., integers in the range 0 through 127. These values fit in 7 bits. A char variable takes up one byte, which is 8 bits on virtually all modern systems. C++ allows the programmer to work with characters as characters, e.g., 'a', or as integers. Note that the ASCII value of the character '0' is 48, not the value 0. In the ASCII code, the lower case letters are all contiguous, the upper case letters are all contiguous, and the digits (as characters) are all contiguous, although these three sets of characters are not next to each other.
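Because the digit characters are contiguous, the numeric value of a digit character can be computed by subtracting '0' instead of using a magic number such as 48. The sketch below illustrates this. The function names digitValue and letterIndex are hypothetical, chosen for illustration. The C++ standard guarantees that '0' through '9' are contiguous in every character set, but it gives no such guarantee for letters, so letterIndex assumes ASCII.

```cpp
#include <cassert>

// Returns the numeric value of a digit character, e.g. '7' -> 7.
// Works because '0'..'9' have consecutive codes ('0' is 48 in ASCII).
int digitValue(char ch) {
    assert(ch >= '0' && ch <= '9');   // caller must pass a digit character
    return ch - '0';
}

// Returns the position of a lower case letter in the alphabet, 'a' -> 0.
// Assumes ASCII (or another code in which 'a'..'z' are contiguous).
int letterIndex(char ch) {
    assert(ch >= 'a' && ch <= 'z');   // caller must pass a lower case letter
    return ch - 'a';
}
```

For example, digitValue('7') is 7 and letterIndex('c') is 2.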

In C++, a string is usually represented either by the standard library class std::string or, as in these exercises, by an array of characters (a C-style string). If the string has n characters, the array used to store it must have space for at least n+1 characters, because C++ marks the end of such a string with an end of string character. The end of string character is the character with ASCII value 0, also written in programs as '\0'.
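A minimal sketch of how the end of string character is used: the hypothetical function lengthOf counts characters until it reaches '\0'. This is essentially what the standard library function std::strlen (declared in <cstring>) does. The array word shows that a three-character string literal occupies four array elements.

```cpp
#include <cassert>
#include <cstddef>

// Counts the characters that come before the end of string character '\0'.
std::size_t lengthOf(const char s[]) {
    assert(s != nullptr);    // caller must pass an actual array
    std::size_t n = 0;
    while (s[n] != '\0') {   // '\0' marks the end of the string
        ++n;
    }
    return n;
}

// "cat" has 3 characters; the compiler appends '\0', so the array has
// 3 + 1 = 4 elements: 'c', 'a', 't', '\0'.
const char word[] = "cat";
static_assert(sizeof(word) == 4, "3 characters plus the end of string character");
```

So lengthOf(word) is 3, even though word occupies 4 bytes.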

Two member functions of the cin object are typically used to read character input: get and getline. Typically, get is used to read the input one character at a time, and getline is used to read one line of input (terminated by the newline character).
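The following sketch contrasts get and getline with the extraction operator >>, which is the other common way to read a char. To keep it self-contained, it reads from a std::istringstream (declared in <sstream>). A std::istringstream supports the same member functions as cin. In a real program you would read from cin instead. The function names and the 80-character line limit are assumptions for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// get returns every character, including blanks and newlines.
std::string twoCharsWithGet(const std::string& text) {
    std::istringstream in(text);      // stands in for cin
    char first = '\0', second = '\0';
    in.get(first);
    in.get(second);
    assert(in && "text must contain at least two characters");
    return std::string{first, second};
}

// >> skips leading blanks and newlines before each character it reads.
std::string twoCharsWithExtraction(const std::string& text) {
    std::istringstream in(text);
    char first = '\0', second = '\0';
    in >> first >> second;
    assert(in && "text must contain at least two non-blank characters");
    return std::string{first, second};
}

// getline reads up to the newline; the '\n' itself is not stored.
std::string firstLine(const std::string& text) {
    const int MAXCHAR = 81;           // room for 80 characters plus '\0'
    std::istringstream in(text);
    char line[MAXCHAR];
    in.getline(line, MAXCHAR);
    return std::string(line);
}
```

With the input "a b", twoCharsWithGet returns "a " (the blank is kept), while twoCharsWithExtraction returns "ab". This difference matters in Item 8 below, where the blank is the word terminator.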

  1. The function get takes a variable of type char as a reference parameter and stores the next character from the input stream in this variable. Thus, for example,
        char nextCh;
        cin.get(nextCh);
      
    will store the next character from the input stream in nextCh. This character can now be printed to standard output using the cout object. For example, the first three lines of the following code print the character, the integer (ASCII) value of the character in decimal, and the same value in hexadecimal. std::hex is a stream manipulator in the std namespace that directs the cout object to output integers in hexadecimal instead of decimal. It stays in effect for all later integer output until it is reset with std::dec.
      cout << nextCh << endl;
      cout << (int) nextCh << endl;
      cout << "0x" << std::hex << (int) nextCh << endl;
      cout << std::hex << nextCh << endl;
      
    What do you think the last line will print?
  2. Write a complete C++ program that reads in one character from standard input and then prints out the character and its ASCII value in decimal and in hexadecimal. Note that the program fragment above prints the string "0x" before the hexadecimal value to indicate that it is a hexadecimal value; that is how hexadecimal values are conventionally written.
  3. Run your program to find the ASCII values of the characters 'a', 'A', and '0'. Verify that the decimal values are 97, 65 and 48, respectively.
  4. Suppose the variable into which your program above reads the character is nextCh. Consider the following:
        char newChar;
        newChar = (int) nextCh + 1;
        cout << newChar << endl;
      
    What do you think the above program fragment will do if it is added to your program? Verify your answer by including this fragment in your code and running it.
  5. Since a character is represented by its ASCII value, changing a lower case letter to upper case means changing its ASCII value to the appropriate value. So, if the character in a variable is 'a' (ASCII value 97 in decimal), then assigning the integer value 65 (the ASCII value of 'A') to the variable and then printing it should print the character 'A'. Is that the case? Write a program that declares a char variable, assigns the value 65 to it, and prints it to verify.
  6. Now, modify the program you wrote in Item 2 above so that when the user inputs a lower case letter, the program prints out the corresponding upper case letter and its ASCII value as decimal and hexadecimal. Do not use any magic numbers in the program, even as declared constants.
  7. To read a sequence of characters, one can simply loop, reading one character in each iteration (say the i-th) of the loop and storing it as the i-th element of an array of characters. Usually, some terminating character is known, so that one knows when to end the loop. Also, since we are storing the characters in an array, the array must be declared before the loop begins. Thus we need to know the maximum length of the words being read from input, and the array needs one more element than that for the end of string character. Once the terminating character is encountered, instead of storing that character, we store the end of string character ('\0').
  8. Write a complete C++ program that reads in a word from standard input and then prints out the word. Assume that the word is no more than 10 characters. Assume that the terminator for the word is the blank character.
  9. Modify your above program so that after printing out the word, the program also prints whether or not every character in the word is a letter of the alphabet (it could be lower case or upper case).
  10. Further modify the program so that if the word is made up of letters of the alphabet, then the word is printed in all upper case.
  11. Modify your above program to print out whether every character in the word is a digit (i.e., if the word is a number).
  12. Modify your above program so that if the word is a number, then the program prints out the decimal and hexadecimal representations of the number. (For example, if the user inputs 48, then the program should print out 48 and 0x30.) To print the hexadecimal value of the number, you need to convert the string "48" to its integer value and then print it (as done above) in hexadecimal. Once again, do not use any magic numbers (except for the size of the array) in your program.
  13. Write a complete C++ program to print out the number of lines on standard input. Each occurrence of the newline character (produced when you press the Enter key while giving input to your program) counts as one more line. How would you check whether the character entered is a newline character? (To begin with, write your program to simply print out the ASCII value of each character entered by the user. This will tell you the ASCII value of the newline character.)
  14. The function getline takes two parameters: an array of characters, say Line, and the size of the array, say MAXCHAR. It stores the characters of the next line of standard input (delimited by the newline character) in the array Line. Because Line must also hold the end of string character, at most MAXCHAR - 1 characters of the line fit. If the line is longer than that, only its first MAXCHAR - 1 characters are stored in Line, and cin is put into a failed state, so later reads fail until cin.clear() is called. The array Line has an end of string character as the character after all the characters read from standard input. Write a complete C++ program to verify the behaviour of getline. The member function gcount (without any parameters) of the cin object returns the number of characters that were read by getline. For example,
        const int MAXCHAR = 5;
        char Line[MAXCHAR];
        cin.getline(Line, MAXCHAR);
        cout << "*" << Line << "*" << " read "
             << cin.gcount() << " characters." << endl;
      
    will print out the line that was read and the number of characters read. (The output line prints stars around the string so that we can see exactly which string is being output. This matters when debugging, because a string may contain blanks or unprintable characters, such as control characters, that are otherwise hard to spot.) Test it by giving as input a line with fewer than 5 characters, a line with exactly 5 characters, and a line with more than 5 characters. What do you notice about the number of characters read? What is the explanation?
  15. Write a complete C++ program that reads in one line of input (maximum of 80 characters) and prints out the words, and the total number of words in the input line. Words are sequences of characters delimited by the blank character, the comma character and the period character.
  16. The end of input is usually called end of file (EOF). When input comes from a file, end of file is reached after the last character of the file. When input is typed at the terminal, end of file is signalled by entering control-d at the start of a line (on Unix-like systems; on Windows, control-z followed by Enter). End of file is a condition, not a character stored in the input. After an attempt has been made to read input (using cin by itself or with a member function as above), the member function eof (no parameters) of the cin object returns true if that attempt ran into the end of the input. Thus, when reading input until end of file, you read the next input, check whether eof returns true, and if not, process the input.

    Write a complete C++ program that prints out each line that the user enters, and finally the number of lines that the user enters. (The user indicates end of file by entering control-d. In your program you do not check for the control-d character, but call the member function eof.)

  17. Write a C++ program that reads a paragraph (a sequence of lines) and prints out all the words in the paragraph and the number of lines in the paragraph. The word delimiters are the blank character, the comma character and the period character. A word may be split across two lines; in this case the last character on the first line of the split will be a hyphen ('-' character).
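Relating to Item 1 above: std::hex changes the state of the stream, so it applies to every integer printed afterwards until std::dec restores decimal output. The sketch below writes to a std::ostringstream (from <sstream>) instead of cout so that the output can be inspected. The function names are hypothetical. static_cast<int>(ch) is the C++-style spelling of the cast (int) ch used in the exercises.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Formats a character's code as "<decimal> 0x<hexadecimal>".
std::string codes(char ch) {
    std::ostringstream out;              // behaves like cout, but keeps the text
    int value = static_cast<int>(ch);    // same as (int) ch
    assert(value >= 0 && value <= 127);  // this sketch assumes ASCII input
    out << value << " 0x" << std::hex << value;
    return out.str();
}

// std::hex is "sticky": 255 and 16 both print in hexadecimal, and
// std::dec is needed before 16 prints in decimal again.
std::string stickyDemo() {
    std::ostringstream out;
    out << std::hex << 255 << ' ' << 16 << std::dec << ' ' << 16;
    return out.str();
}
```

For example, codes('a') yields "97 0x61", and stickyDemo() yields "ff 10 16".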
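The reading loop described in Item 7 can be sketched as a function that reads from any input stream, so that it works with cin or with a std::istringstream for testing. The function name readWord and the limit MAXWORD = 10 are assumptions for illustration. The loop stops at the terminator, at end of input, or when the array is full, whichever comes first.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, for trying readWord without typing

const int MAXWORD = 10;   // assumed maximum word length

// Reads characters from `in` into `word` until `terminator` is read,
// input ends, or MAXWORD characters have been stored; then adds '\0'.
// `word` must have room for MAXWORD + 1 characters.
void readWord(std::istream& in, char word[], char terminator) {
    assert(word != nullptr);
    int i = 0;
    char ch;
    while (i < MAXWORD && in.get(ch) && ch != terminator) {
        word[i] = ch;      // store the i-th character
        ++i;
    }
    word[i] = '\0';        // store the end of string character instead
}
```

To read from the keyboard, declare char word[MAXWORD + 1] and call readWord(std::cin, word, ' '). The terminator is removed from the input but not stored. If a word is longer than MAXWORD characters, the remaining characters stay in the input for the next read.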
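To try the three cases in Item 14 without retyping input, the getline call can be run on a std::istringstream instead of cin. The hypothetical helper readOneLine below uses MAXCHAR = 5, as in the fragment in Item 14. It reports what was stored in Line, what gcount returned, and whether the stream was still usable afterwards. When a stream has failed, a program that keeps reading from it must call clear() on it first.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

const int MAXCHAR = 5;

// What one call to getline produced.
struct LineResult {
    std::string text;        // contents of Line after the call
    std::streamsize count;   // value returned by gcount()
    bool good;               // true if the stream did not fail
};

// Runs getline(Line, MAXCHAR) on `input`, as if it had been typed at cin.
LineResult readOneLine(const std::string& input) {
    std::istringstream in(input);
    char Line[MAXCHAR];
    in.getline(Line, MAXCHAR);
    return LineResult{std::string(Line), in.gcount(), !in.fail()};
}
```

Calling readOneLine with inputs such as "abc\n", "abcde\n" and "abcdefg\n" reproduces the three cases from Item 14. Compare text, count and good with what you predicted.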
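A sketch of the read-then-check-eof pattern from Item 16, applied to counting lines. It swaps in std::getline with a std::string (from <string>) in place of cin.getline with a char array. The reason is that with a fixed-size array, a very long line would put the stream into a failed state that eof alone never detects, and the loop would never end. The function name countLines is hypothetical. Pass std::cin to use it with terminal input.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, for trying countLines without typing
#include <string>

// Counts the lines in `in`, following the pattern: read, check eof, process.
int countLines(std::istream& in) {
    assert(in && "stream must be readable before the first read");
    std::string line;
    int lines = 0;
    std::getline(in, line);          // read the next line
    while (!in.eof()) {              // stop once a read reached end of file
        ++lines;                     // process the line (here: count it)
        std::getline(in, line);
    }
    if (!line.empty()) {             // a last line without '\n' still counts
        ++lines;
    }
    return lines;
}
```

The final check handles input whose last line is not followed by a newline. In that case the read of that line itself reaches end of file, so the loop exits before counting it.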