Chapter 9
Key Terms
null character, ‘\0’
C-string variables
C-string variables vs. arrays of characters
initializing C-string variables
indexed variables for C-string variables
Do not destroy the ‘\0’
assigning a C-string value
testing C-strings for equality
lexicographic order
input/output with files
reading blanks and ‘\n’
detecting the end of an input line
+ does concatenation
converting C-string constants to the type string

Brief Outline
9.1 An Array Type for Strings
C-string values and C-string variables
Other Functions in <cstring>
C-string input and output
Command Line Arguments
9.2 Character Manipulation
Character I/O
The Member Functions get and put
The putback, peek, and ignore Member Functions
Character-Manipulating Functions
9.3 The Standard class string
Introduction to the Standard class string
I/O with the class string
string processing with class string
Converting Between string objects and C-strings

1. Introduction and Teaching Suggestions

C-strings are the kind of strings C++ gets from its C language heritage. A C-string is an ordinary
array of char with one additional requirement: the end of the string must be marked by a null
character (‘\0’) in the position immediately after the last character used in the array.
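
As a minimal sketch of the termination requirement, the following function counts characters up to the null character, which is essentially what strlen from <cstring> does. The name usedLength is hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper: counts the characters that come before the
// terminating '\0', mimicking the behavior of strlen.
std::size_t usedLength(const char s[])
{
    std::size_t n = 0;
    while (s[n] != '\0')   // the null character marks the end of the string
        n++;
    return n;
}
```

For the declaration char greeting[10] = "Hi"; the array has 10 elements, but usedLength(greeting) is 2 because greeting[2] holds the null character.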

The Standard C++ library provides the class string. This class is defined in the header <string>.
This class has many useful member functions and overloaded operators that make string
processing easier and more intuitive. This class is not properly part of the Standard Template
Library. Nevertheless, the development of the STL influenced the design of the string class as it
appears in the ANSI/ISO C++ Standard and in implementations provided by compiler vendors.

We will call the kind of string that C++ inherits from C a C-string. We will call a C++ Standard
string class object simply a “string”.

2. Key Points
Escape Sequences. The text's discussion of C-string variables points out that the null character '\0'
is the terminator for C-string variables, and that the library functions that process C-strings
(including the iostream functions) use the null character as a sentinel marking the end of the
C-string. The backslash (\) signals that the next character is to be dealt with differently than it
would be without the backslash. The backslash is called the 'escape character' because it escapes
the normal meaning of certain characters, allowing them to have special meaning. (It also removes,
or escapes, the special meaning of characters that have one, such as the backslash itself. To print
a backslash, use two backslashes, \\, in the C-string literal. Here the first backslash escapes the
special meaning of the second backslash, which is inserted in the output stream.)

If we just put a ‘0’ into some position in a C-string, the ASCII encoding for the character 0
(48 decimal, or 30 hexadecimal) is stored at that character position. (See Appendix 3 for the
decimal encoding of the printing ASCII characters.) To cause the null character to be stored, we
use the \ before the 0 to escape the usual meaning of 0. This tells the compiler that we want the
numeric value of the null character to be embedded in the C-string. (The value of the null
character really is zero, not the normal encoding of the character 0.)

C-string values and C-string variables. A C-string is a sequence of characters terminated by the
null character. String literals written in double quotes, "", are C-strings.

Character Manipulation with get and put. The get and put functions allow your program to
read in one character of input or write one character of output at a time. It is interesting to
note that the no-argument form, cin.get(), returns an int. The put member takes a char argument,
so an int passed to it is converted to a char. Casts may be required depending on your use.

The putback, peek, and ignore Member Functions. The putback function returns its argument
character to the input stream by backing up the next-character pointer. The declaration of the
putback istream member function is
istream& putback(char c);
The precondition for this function is that the char argument be the last character extracted. If
the istream state is not good, this function calls setstate(failbit), which may throw the
exception ios_base::failure, and returns. If the exception is thrown and not caught, the program
will just terminate. We won’t discuss exceptions until Chapter 18. The instructor needs to be
aware of this because inevitably some student will run into a situation where her program
terminates inexplicably.

The peek function allows us to examine a not-yet-read char from the input without extracting it.
This istream function has the declaration
int peek();
The function peek returns EOF if at end of file, or if the function ios::good() returns false.
Being able to examine a character prior to reading it is desirable, particularly when we want
robust input that allows for human error at the keyboard.

The ignore function extracts and discards a number of characters equal to the first argument, or
characters up to and including a character equal to the second argument, whichever comes first.
The declaration for this istream member is
istream& ignore(streamsize n = 1, int delim = EOF);
The type streamsize is a signed integral type. Its exact width is implementation-defined.

Character Manipulation Functions. Many functions in <cctype> exist to manipulate
characters, such as converting between upper and lower case. Note that toupper and tolower
return the int representation of the corresponding letter.
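
Because toupper returns an int, a statement such as cout << toupper('a') displays a number (65 on an ASCII system) rather than the letter A. A minimal sketch of the usual remedy, a cast back to char, follows. The name toUpperCopy is hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of s with every letter in upper case.
std::string toUpperCopy(const std::string& s)
{
    std::string result = s;
    for (std::string::size_type i = 0; i < result.size(); i++)
    {
        // toupper takes and returns an int; the result is cast back to char.
        result[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(result[i])));
    }
    return result;
}
```

The inner cast to unsigned char guards against negative char values, which the <cctype> functions are not required to handle.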

The Standard Class string. The string class overloads many operators: + and += for
concatenation, = for assignment, <, <=, >, and >= for string comparison using lexicographic
ordering, and the indexing operator, [i], for access to characters in the string. There is no range
checking for indexing on string objects. If range checking is needed, the string member
function at() is range checked. The class also provides many other useful member functions.
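
The range checking done by at() can be sketched as follows. The helper name charAtOr is hypothetical, and the sketch uses a try-catch block, which the text does not cover until Chapter 18.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the character at index i, or fallback
// if i is not a legal index for s.
char charAtOr(const std::string& s, std::string::size_type i, char fallback)
{
    try
    {
        return s.at(i);   // at() throws std::out_of_range for a bad index
    }
    catch (const std::out_of_range&)
    {
        return fallback;
    }
}
```

By contrast, s[i] with an illegal index is not checked, and the result is unpredictable.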

I/O with the class string. The insertion, <<, and extraction, >>, operators are overloaded for
string objects. By default, extraction to a string object
string s;
cin >> s;
ignores initial whitespace, just as extraction to a C-string does. This operation then extracts
the next sequence of non-whitespace characters, and stops input at the next whitespace character.
There is a version of getline defined specifically for string objects. The syntax is
string s;
getline(cin, s);
This version of getline is necessarily a stand-alone function. The committee of authors of the
string library had two choices. They could have rewritten the iostream library to add a getline
member function, or they could have written a stand-alone getline function. The iostream library
was already written and stable. Just as you have to write a stand-alone overloading of the output
operator for your classes, these authors chose to write a stand-alone overloaded version of getline
for this pair of arguments.

The getline function extracts from the input stream characters up to the delimiter character. The
characters from the start to the delimiter are placed in the string variable, and the termination
character is discarded. The delimiter character defaults to the newline character, ‘\n’. By
supplying a third argument, the default termination character can be changed. It is a bad practice
to mix cin >> x style input and getline.
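
A minimal sketch of the three-argument form of getline follows. The helper name splitOnCommas and the choice of a comma as the delimiter are assumptions made for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text into the fields that lie between commas.
std::vector<std::string> splitOnCommas(const std::string& text)
{
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, ','))   // ',' replaces the default '\n'
        fields.push_back(field);
    return fields;
}
```

Each call extracts the characters up to the next comma and discards the comma itself, so the fields never contain the delimiter.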

Converting Between string objects and C-strings. Several functions require C-string values as
arguments. The c_str() member can be used to retrieve a null terminated C-string r-value from a
string object.
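
As one illustration, the function atoi from <cstdlib> requires a C-string argument, so a string object must be converted with c_str() first. The helper name toInt is hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: converts the digits held in a string object to an
// int by handing atoi the C-string returned by c_str().
int toInt(const std::string& s)
{
    return std::atoi(s.c_str());
}
```

The conversion in the other direction needs no function call: a C-string can be assigned to a string object directly, as in string s = "123";.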

3. Tips
Lack of null termination in a C-String. The following declarations are not equivalent:
char shortStringA[] = {'a', 'b', 'c'};
char shortStringB[] = "abc";
The first is not null-terminated, while the second is. A quick way to convince the student that
these are not equivalent is to put these declarations in a program and send both arrays to the
screen. Output of shortStringB should give "abc". Output of shortStringA typically gives "abc"
followed by several ugly characters from the memory locations following the array, up to the
next ‘\0’ (or the next memory location that is protected, resulting in a segmentation violation).
The behavior is undefined, so the result varies from system to system.
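
The difference can also be shown without running off the end of an array. In the sketch below, hasTerminator is a hypothetical helper that looks for a null character only within the declared size of the array.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if one of the first n elements of
// the array is the null character. It never reads past element n - 1.
bool hasTerminator(const char a[], std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
        if (a[i] == '\0')
            return true;
    return false;
}
```

For the declarations above, sizeof(shortStringA) is 3 and sizeof(shortStringB) is 4. The call hasTerminator(shortStringA, sizeof(shortStringA)) returns false, while the same call for shortStringB returns true.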

C-string input and output. Suppose non-digit input is received by the first loop in the
following program:
#include <iostream>
using namespace std;

int main()
{
    int x;
    cout << "enter some int values: \n";

    // This loop ends at the first input that is not an int.
    while (cin >> x)
        cout << "You pressed " << x << "\n";
    cout << "finished" << endl;

    cout << "enter some more int values: \n";

    // The stream is no longer good, so this loop body never runs.
    while (cin >> x)
        cout << "You pressed " << x << "\n";
    cout << "finished" << endl;

    return 0;
}
The first loop terminates when non-digit input is encountered. The stream's failbit is set, so
the stream state is no longer good, and any later input attempt fails. By convention, all input
functionality is disabled when the stream state is set to other than a good state.
We sometimes need to be able to restore the i/o state to good. To do this, we must reset the i/o
state to goodbit and remove the offending characters. The iostream library has two member
functions to change the stream state explicitly. Here is the prototype of the first one.
void clear(iostate state = goodbit);
This function sets the state of the stream to the argument, where the function has a default
argument of goodbit. Thus if you issue the call
cin.clear();
then the state flags are set to goodbit.
A second function has prototype
void setstate(iostate addstate);
If we call the state of the stream prior to the call oldStreamState, this function sets the state to
oldStreamState | addstate
In this expression, the | is the bitwise OR operator.
For example, you can reset the state of the input stream by:
#include <iostream>
using namespace std;

int main()
{
    int x;
    cout << "enter some int values: \n";

    while (cin >> x)
        cout << "You pressed " << x << "\n";
    cout << "finished" << endl;

    if (cin.bad())
        cout << "unrecoverable error\n";
    else if (cin.fail())
    {
        cout << "cin.fail() returned true.\n";
        char discard;
        cin.clear();    // We must clear the stream flags first.
        cin >> discard; // The offending input is a char. It
                        // remains on input queue and must be
                        // removed and discarded before
                        // continuing with input operations
        cout << "discarded bad char: " << discard << endl;
    }

    cout << "enter some more int values: \n";

    while (cin >> x)
        cout << "You pressed " << x << "\n";
    cout << "finished" << endl;

    return 0;
}

4. Pitfalls
Dangers in Using functions defined in <cstring>. A danger in using functions from <cstring>
lies in passing them arrays of char that lack the null terminator their preconditions require.
Each of the functions strcpy, strcat, and strcmp depends on the terminating null character for
its action. Improper use of these functions can result in crashed programs or, even worse,
security breaches. In addition, the iostream output function expects there to be a terminating
null character. Otherwise, as we note elsewhere in this IRM, the output routine will run on until
it encounters either a null character or protected memory. The strncpy, strncat, and strncmp
functions are slightly safer, as they take a third argument that specifies how many characters to
process.
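
Even strncpy needs care: when the source is too long for the limit, it does not store a null character. A minimal sketch of a bounded copy that always terminates its result follows. The name safeCopy is hypothetical.

```cpp
#include <cstring>

// Hypothetical helper: copies source into target, an array with room
// for capacity chars (capacity must be at least 1). The copy is
// truncated if necessary and is always null-terminated.
void safeCopy(char target[], const char source[], std::size_t capacity)
{
    std::strncpy(target, source, capacity - 1);
    target[capacity - 1] = '\0';   // strncpy does not do this on truncation
}
```

With char buffer[4]; the call safeCopy(buffer, "hello", 4) stores "hel" followed by the null character.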

Not leaving enough space for the null terminator. Please be certain the student is aware of the
necessity for providing space for the null terminator when using getline as well as in other
C-string uses. A typical call might look like this
cin.getline(stringVariable, MAX_CHARACTERS + 1);
The getline member function overwrites the first argument with the characters of the input line,
storing at most MAX_CHARACTERS of them (one fewer than the second argument). A null
character is always appended after the last character stored, so stringVariable must be declared
with at least MAX_CHARACTERS + 1 elements. This caution is expressed in the text’s getline
sidebar. It bears emphasis.

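Using = and == with C-Strings. A C-string variable is an array of char, and in an expression its
name is treated as the address of the first element. Consequently, = cannot be used to assign to a
C-string variable outside its declaration (with char* variables it merely copies the pointer),
and == compares addresses rather than the characters. Use strcpy and strcmp instead, and make
sure adequate space exists in the target before calling strcpy.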
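
A short sketch of the correct equality test follows. The name sameCString is hypothetical.

```cpp
#include <cstring>

// Hypothetical helper: returns true when the two C-strings hold the
// same characters. strcmp compares contents, not addresses.
bool sameCString(const char a[], const char b[])
{
    return std::strcmp(a, b) == 0;
}
```

Two different arrays that both hold "same" are reported as equal by sameCString, even though a test with == on the array names would compare their addresses and yield false.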

Unexpected \n in Input. When reading input you must account for every character, including
the newline character. To clear the input stream of any leftover newlines, use the ignore
member function.

Mixing cin >> variable; and getline. When using cin >> n, the newline is left in the input
stream. This is not a problem if all input is done this way, as the newline will be skipped on the
next input since all whitespace is skipped. However, if getline is used in combination with cin
>> n, then the getline call may read only the leftover newline and return an empty line. Call
cin.ignore(1000, '\n') before the getline to discard the rest of the line, including the newline.

5. Programming Projects Answers

1. Format a Sentence.
While it may be cleaner to use the C++ Standard class string for this problem, I chose to use
only the features of <cstring>. Students need to be able to use both <string> and <cstring>
facilities.

This program is to read in a (single) sentence, defined as a sequence of words terminated by a
period (.). The sentence is to be formatted and written to the output. Formatting consists of
capitalizing the first word of the sentence and compressing all multiple blanks to single blanks.
For this purpose, a line-break is a blank. No other capital letters are to be included in the
output. A sentence is assumed to end with a period, and there are no other periods.

Input: get a line of up to 100 characters. The period is the terminating sentinel.
Process: scan the line, capitalizing the first letter in the string and looking for multiple
blanks, which are eaten except for one. A newline counts as a blank.

Some questions asked during design:

Do I replace new-lines with blanks?
Answer: Defer this question until later. (Answered later: yes)

What do I do with sentences that are longer than my line width on my output device?
Answer: Scan for and replace new-lines with blanks before replacing multiple blanks with one
blank, and allow wraps on output.

Copy the modified string to the output.

This solution uses only the C style string (C-string) features.

The input is taken character by character. If a period (.) is encountered, the loop terminates.

First, any new-line characters are replaced by blanks.
Next, all characters are made lower case.

There is a one-time switch to capitalize the first non-blank.

There is a state machine for compressing multiple blanks to single blanks:

If we see a blank and haven't seen one since the last non-blank, we set a variable, haveBlank.
If we see a blank and haveBlank is already set, we back up the index into the sentence array by
one, so that the next character read overwrites the extra blank. Once we read a non-blank, if
haveBlank is set to true, we turn it off.

These pieces, along with replacing the newlines with blanks, serve to compress each sequence of
blanks and newlines to one blank.

Finally, there is the matter of putting in a null character if the string is shorter than the limit, and
putting in a period and null if the string is too long.

An alternative solution is to use a loop around cin >> word, where word is a C-string
variable. This ignores blanks and newlines. Then process the words much as I have processed the
characters here. Then output the words with a single blank between.
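
The alternative just described might be sketched as follows. This is not the solution chosen for this project, which follows the sketch. The function name formatSentence is hypothetical, the sketch assumes no word is longer than 99 characters, and setw is used to keep the extraction within the array.

```cpp
#include <cctype>
#include <cstring>
#include <iomanip>
#include <iostream>

// Hypothetical helper: reads words up to the one that ends in a period
// and writes them in lower case, separated by single blanks, with the
// first character of the first word capitalized.
void formatSentence(std::istream& in, std::ostream& out)
{
    char word[100];
    bool first = true;
    while (in >> std::setw(100) >> word)   // >> skips blanks and newlines
    {
        for (std::size_t i = 0; word[i] != '\0'; i++)
            word[i] = static_cast<char>(
                std::tolower(static_cast<unsigned char>(word[i])));
        if (first)
            word[0] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(word[0])));
        else
            out << ' ';                     // single blank between words
        first = false;
        out << word;
        if (word[std::strlen(word) - 1] == '.')
            break;                          // the period ends the sentence
    }
    out << '\n';
}
```

Because the extraction operator already skips whitespace, no state machine for blanks is needed in this version.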

#include <iostream>
#include <cstring>
#include <cctype>

void getSentence(char sentence[], int& size);

// Fetches characters for sentence until a period. Newlines are
// replaced by blanks, multiple blanks are compressed to one, the
// first letter is capitalized, and the rest are made lower case.
// The period is put at the end of the characters fetched. size is
// set to the number of characters fetched, including the period.

int main()
{
    using namespace std;
    char sentence[100];
    int size;

    cout << "Enter one sentence. The period is the sentinel to quit."
         << endl;
    getSentence(sentence, size);
    cout << endl;
    cout << sentence << endl;
    return 0;
}

void getSentence(char sentence[], int& size)
{
    using namespace std;
    bool haveBlank = false;
    bool isFirstLetter = true;
    int i;  // declared outside the loop because it is used after it

    // i < 99 is tested first so that sentence[99] stays free for the null.
    for(i = 0; i < 99 && '.' != (sentence[i] = cin.get()); i++)
    {
        if('\n' == sentence[i])
            sentence[i] = ' ';
        sentence[i] = tolower(sentence[i]);
        if(isFirstLetter && isalpha(sentence[i]))
        {
            isFirstLetter = false;
            sentence[i] = toupper(sentence[i]);
        }

        if(' ' == sentence[i] && !haveBlank)
            haveBlank = true;
        else if(' ' == sentence[i] && haveBlank)
            i--;  // back up so the next character overwrites this blank
        else if(' ' != sentence[i] && haveBlank)
            haveBlank = false;
    }
    if(i < 99)  // the period was found and is at index i
    {
        sentence[i + 1] = '\0';
        size = i + 1;
    }
    else        // too long: force a period and a null at the end
    {
        sentence[98] = '.';
        sentence[99] = '\0';
        size = 99;
    }
}

A typical run follows:

Enter one sentence. The period is the sentinel to quit.
noW iS thE TiMe fOr aLl


ThE CounTRY.

Now is the time for all good men to come to the aid of the country.

2. Count words and letters

This program reads in a line of text, counts and outputs the number of words in the line and the
number of occurrences of each letter.

Define a word to be a string of letters delimited by white space (blank, newline, tab), a comma,
or a period. Assume that the input consists only of letters and these delimiters.

For purposes of counting letters, case is immaterial.

Output letters in alphabetic order and only output those letters that occur.

Algorithm development:
The word count is carried out with a state machine.

We enter with our state variable, inWord, set to false, and our word count set to 0.

while(input of a character is successful)
    if we have encountered a blank, newline, tab, comma, or period,
        set inWord to false
    else if inWord is false,
        set inWord to true
        increment word count
    if the character is a letter,
        lowCase = tolower(inChar);
        charCount[int(lowCase) - int('a')]++;

cout << wordCount << " words" << endl;

for(i = 0; i < 26; i++)
    if(charCount[i] != 0)
        cout << charCount[i] << " " << char(i + 'a') << endl;

Comments on the letter count code:

We run tolower() on all letters entered. The data structure for the letter count is a
26-element int array with indices in the range 0-25, calculated by

index = int(character) - int('a')

As each letter is read in, we increment the appropriate array element.

Output of the letter count is a loop running from 0-25, with an if statement that allows output
if the array entry isn't zero.

The word counting results of this agree with the Unix wc utility, and agree with hand counted
simple files.

#include <iostream>
#include <cctype>

// character and word count.

int main()
{
    using namespace std;
    bool inWord = false;
    int wordCount = 0;
    char ch;
    char lowCase;
    int charCount[26];
    int i; //i is declared outside for to avoid warnings about
           //differences between new and old binding rules for the
           //for loop variable.

    for(i = 0; i < 26; i++)
        charCount[i] = 0;

    // Read one line; also stop if the input ends before a newline.
    while(cin.get(ch) && '\n' != ch)
    {
        if(' ' == ch || '\t' == ch || ',' == ch || '.' == ch)
            inWord = false;
        else if(inWord == false)
        {
            inWord = true;
            wordCount++;
        }
        if(isalpha(ch))  // count letters only, so the index stays in 0-25
        {
            lowCase = tolower(ch);
            charCount[int(lowCase) - int('a')]++;
        }
    }

    cout << wordCount << " words" << endl;

    for(i = 0; i < 26; i++)
        if(charCount[i] != 0)
            cout << charCount[i] << " " << char(i + 'a')
                 << endl;
    return 0;
}

3. First, Middle and Last Names

Only notes are provided for this problem.

Problem Statement:
This program is supposed to accept a first name, a middle name or initial, and then a last name.
Even if the middle name is only an initial, or an initial followed by a comma, the output should be
lastName, firstName, middleInitial. (with a period after the middle initial)

If the middle name is missing altogether the output should be

lastName, firstName

This is easy if there is at least a middle initial. Three variables,
char firstName[20], middleName[20], lastName[20];
work well for most names. The input is:
cin >> firstName >> middleName >> lastName;
and for output:
cout << lastName << ", " << firstName << ", "
     << middleName[0] << '.' << endl;
If the middle name is missing, the program must detect this (strlen), copy the middle name
to the last name, and set the middle name to an empty string (or just arrange not to print the
middle name).

Or you can use <string> functions.
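
A minimal sketch of the <string> approach follows. The name formatName is hypothetical, the output format is the one given in the problem statement above, and the sketch assumes the line holds either two or three names.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: takes a line holding "first middle last" or
// "first last" and returns the name in the output format.
std::string formatName(const std::string& line)
{
    std::istringstream in(line);
    std::string first, second, third;
    in >> first >> second;
    if (in >> third)   // three names: second is the middle name or initial
        return third + ", " + first + ", " + second[0] + ".";
    return second + ", " + first;   // two names: second is the last name
}
```

Reading the whole line with getline(cin, line) and then extracting from a string stream lets the program see how many names were entered, which a chain of three cin >> extractions cannot do.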

4. Text Replacement
This program accepts a line of text. It replaces all four-letter words in the string with "love". If
the four-letter word is capitalized, the word love should be also. Only capitalize the first letter of
the word.

If the length is 4, call outputLove(char firstLetter), which outputs Love or love depending
on whether firstLetter is uppercase or lowercase.

A Simple Minded “Almost Solution”:

In developing this, we first present a very simple minded solution that almost meets the
requirements of the problem.

Caveat: This solution collapses all blanks to one blank, and requires the user to enter the
end-of-file character to terminate the input, rather than "accepting a line of text."
But it is a simple solution:
#include <iostream>
#include <cctype>
#include <cstring>

void outputLove( char firstLetter );

// word has 4 letters, if firstLetter is uppercase, output Love
// else output love

int main()
{
    using namespace std;
    char word[81];
    while ( cin >> word )
    {
        if ( strlen(word) == 4 )
            outputLove( word[0] );
        else
            cout << word;
        cout << " ";
    }
    cout << endl;
    return 0;
}

void outputLove( char firstLetter )
{
    using namespace std;
    if ( isupper( firstLetter ) )
        cout << "Love";
    else
        cout << "love";
}

A “Better” Solution
(Is it better? - It is certainly a more complicated solution. It does meet the requirements of the
problem exactly.)

The following solution accepts a line of input. It uses the cin member getline to fetch a line of
input. A function, breakUp, separates the non-alphabetic and alphabetic substrings of the input
line into an array of strings. The main function then examines the strings one at a time and, if
a string is alphabetic and of length 4, prints 'love' or 'Love' depending on the case of the
string being replaced. The main part prints the string intact if its length is not equal to 4 or
it is non-alphabetic.

Note that there is a stub included. It was well worth the trouble to create it to properly debug the
main part of the program first. This allows confident debugging of the breakUp function.

// File:
// Text Replacement

#include <iostream>
#include <cctype>
#include <cstring>

void breakUp ( char line[], char list[][80], int & n );

// accepts line of up to 79 characters
// returns in list the succession of null terminated words
// and non-alpha character strings between them null terminated.
// returns number of words in list in parameter n

int main()
{
    using namespace std;

    char line[80];
    char list[80][80]; // ample space for 80 words of max length 79
    int n;
    cin.getline(line, 80);
    breakUp( line, list, n);
    int i = 0;
    while ( i < n )
    {
        if ( 4 == strlen( list[i] ) && isalpha(list[i][0] ))
        {
            if ( isupper(list[i][0]) )
                cout << "Love";
            else if ( islower( list[i][0] ) )
                cout << "love";
        }
        else
            cout << list[i];
        i++;
    }
    cout << endl;
    return 0;
}

void breakUp ( char line[], char list[][80], int & number )
{
    // separator is anything not alpha
    int i = 0, k = 0, wordNumber = 0;

    while( line[i] != '\0' )
    {
        k = 0;
        // after the start of the line, line[i] is non-alpha
        // extract the next non-alphabetic substring
        // (the test for '\0' keeps the loop from running off the end)
        while ( line[i] != '\0' && !isalpha(line[i]) )
        {
            list[wordNumber][k] = line[i];
            i++; k++;
        }
        list[wordNumber][k] = '\0';
        wordNumber++;
        k = 0;
        // line[i] is an alphabetic character or the end of the line.
        // extract the next alphabetic substring
        while ( isalpha (line[i]) )
        {
            list[wordNumber][k] = line[i];
            i++; k++;
        }
        list[wordNumber][k] = '\0';
        wordNumber++;
    }
    number = wordNumber;
}


// Stub for breakUp.
// It is WELL WORTH the trouble to create this to get the rest of the
// program running correctly.
// The strcpy calls below (from <cstring>) replace a long, drawn-out
// collection of character-by-character assignments.
void breakUp ( char line[], char list[][80], int & n)
{
    using namespace std;
    strcpy(list[0], "    ");
    strcpy(list[1], "now");
    strcpy(list[2], " ");
    strcpy(list[3], "is");
    strcpy(list[4], " ");
    strcpy(list[5], "a");
    strcpy(list[6], " ");
    strcpy(list[7], "Good");
    strcpy(list[8], " ");
    strcpy(list[9], "time");
    strcpy(list[10], "    ");

    n = 11;
}

