Professional Documents
Culture Documents
Brief Outline
9.1 An Array Type for Strings
C-string values and C-string variables
Other Functions in <cstring>
C-string input and output
Command Line Arguments
9.2 Character Manipulation
Character I/O
The Member Functions get and put
The putback, peek, and ignore Member Functions
Character-Manipulating Functions
9.3 The Standard class string
Introduction to the Standard class string
I/O with the class string
string processing with class string
Converting Between string objects and C-strings
The Standard C++ library provides the class string. This class is defined in the header <string>.
This class has many useful member functions and overloaded operators that make string
processing easier and more intuitive. This class is not properly part of the Standard Template
Library. Nevertheless, the development of the STL influenced the design of the string class as it
appears in the ANSI/ISO C++ Standard and in implementations provided by compiler
developers.
We will call the type of string C++ inherits from C, C-strings. We will call a C++ Standard string
class object simply a “string”.
2. Key Points
Escape Sequences. The text's discussion of cstring variables points out that the null character '\0'
is the terminator for cstring variables. The text points out that the library functions that process
cstrings (including the iostream functions) use the null character as a sentinel indicating the last
character of the cstring. The backslash (\) signals that the next character to be dealt with in a way
different than it would be handled without the backslash. The backslash is called the 'escape
character' because it escapes the normal meaning of certain characters, allowing them to have
special meaning. (And it removes or escapes special meaning of characters that have special
meaning, such as the backslash itself. To print a backslash, use two backslashes \\ in the cstring
literal. Here the first backslash escapes the special meaning of the second backslash, which is
inserted in the output stream.)
If we just put a ‘0’ into some position in a cstring, the encoding for 0, 48 decimal, or 30
hexadecimal, is stored at that character position. (See Appendix 3 for the decimal encoding of
the printing ASCII characters.) To cause the null character to be stored, we use the \ before the 0
to escape the usual meaning of 0. This tells the compiler that we want the numeric value of the
null character to be embedded in the cstring. (The value of the null character really is zero, not
the normal encoding of the character 0.)
Character Manipulation with get and put. The get and put functions allow your program to
read in one character of input or store one character at a time. It is interesting to note that
cin.get() returns an int. The put member takes a char argument which causes the int to be
converted to a char. Casts may be required depending on your use.
The putback, peek, and ignore Member Functions. The putback function returns the
argument character to the input stream then decrements the next-character pointer. The
declaration of the putback member istream function is
istream& putback(char c);
The precondition for this function is that the char argument be the last character extracted, or if
the istream state is not good, this function calls setstate(failbit) which may throw the exception,
ios_base::failure, and return. If we ignore exceptions, the program will just terminate. We won’t
discuss exceptions until Chapter 18. The instructor needs to be aware of this because inevitably
some student will run into a situation where her program terminates inexplicably.
The peek function allows us to examine a not-yet-read char from the input.
This istream function has declaration
int peek();
The function peek returns an EOF if at end of file, or if the function ios::good() returns false.
Being able to examine a character prior to reading it is desirable, particularly when we want
robust input that allows for human error at the keyboard.
The ignore function extracts and discards a number of characters equal to the first argument or
characters up to a character equal to the second argument. The declaration for this istream
member is
istream& ignore(streamsize n = 1, int delim = EOF);
The type streamsize is an unsigned integral type. In most implementations, this is unsigned int.
The Standard Class string. The string class overloads many operators: + and += for
concatenation, = for assignment, <, <=, >, and >=, for string comparison use lexicographic
ordering, and the indexing operation, [i], for access to characters in the string. There is no range
checking for indexing on string objects. If range checking is needed, there is a string member
function called at() that is range checked. It provides many member functions that provide useful
functionality.
I/O with the class string. The insertion, <<,and extraction, >>, operators are overloaded for
string objects. By default, extraction to a string object
string s;
cin >> s;
ignores initial whitespace, just at extraction to a C-string. This operation then extracts the next
sequence of nonwhite characters, and stops input at the next white space.
There is a version of getline defined specifically for string objects. The syntax is
string s;
getline(cin, s);
This version of getline is necessarily a stand-alone function. The committee of authors of the
string library had two choices. They could have rewritten the iostream library to add a getline
member function, or they could have written a stand-alone getline function. The iostream library
was already written and stable. Just as you have to write a stand-alone overloading of the output
operator for your classes, these authors chose to write a stand-alone overloaded version of getline
for this pair of arguments.
The getline function extracts from the input stream characters up to the delimiter character. The
characters from the start to the delimiter are placed in the string variable, and the termination
character is discarded. The delimiter character defaults to the newline character, ‘\n’. By
supplying a third argument, the default termination character can be changed. It is a bad practice
to mix cin >> x style input and getline.
Converting Between string objects and C-strings. Several functions require C-string values as
arguments. The c_str() member can be used to retrieve a null terminated C-string r-value from a
string object.
3. Tips
Lack of null termination in a C-String. The following declarations are not equivalent:
char shortStringA[] = {'a', 'b', 'c'};
and
char shortStringB[] = "abc";
The first is not null-terminated, while the second is. A quick way to convince the student that
these are not equivalent is to put these declarations in a program and send them to the screen.
You should get: "abc" from the first output statement, then "abc" followed by several ugly
characters from memory locations following the array, up to the next ‘\0’ (or the next memory
location that is protected, resulting in a segmentations violation).
C-string input and output. Suppose non-digit input is received by the first loop in the
following:
#include <iostream>
using namespace std;
int main()
{
int x;
cout << "enter some int values: \n";
return 0;
}
The first loop terminates when non-digit input is encountered. The input stream is set to stream
state of bad, and any later input attempt fails. By convention, all input functionality is disabled
with the stream state is set to other than a good state.
We sometimes need to be able to restore the i/o state to good. To do this, we must reset the i/o
state to goodbit and remove the offending characters. The iostream library has two member
functions to change the stream state explicitly. Here the prototype of the first one.
void clear(iostate state = goodbit);
This function sets the state of the stream to the argument, where the function has a default
argument of goodbit. Thus if you issue the call
istreamObject.clear();
then the state flags are set to goodbit.
A second function has prototype
void setstate(iostate addstate);
This function sets the state to whatever the state of the stream was prior to the call, (let’s call this
oldStreamState) to oldStreamState | addstate
In this expression, the | is the bitwise OR operator.
For example, you can reset the state of the input stream by:
#include <iostream>
using namespace std;
int main()
{
int x;
cout << "enter some int values: \n";
if(cin.bad())
{
cout << "unrecoverable error\n";
exit(1);
}
else if(cin.fail())
{
cout << "cin.fail() returned true.\n";
char discard;
cin.clear(); // We must clear the stream flags first.
cin >> discard; // The offending input is a char. It
// remains on input queue and must be
// removed and discarded before
// continuing with input operations
cout << "discarded bad char: " << discard << endl;
}
return 0;
}
4. Pitfalls
Dangers in Using functions defined in <cstring>. A danger in using functions from <cstring>
lies in using arrays of char without a null terminator as the preconditions prescribe. Each of the
functions strcpy, strcat and strcmp depend on the terminating null character for their action.
Improper use of these functions can result in crashed programs or even worse, security breaches.
In addition, the iostream output function expects there to be a terminating null character.
Otherwise, as we note elsewhere in this IRM, the output routine will run on until either we
encounter a null character or protected memory. The strncpy, strncat, and strncmp functions are
slightly safer as they take a third argument that specifies how many characters to process.
Not leaving enough space for the null terminator. Please be certain the student is aware of the
necessity for providing space for the null terminator when using getline as well as in other C++
string uses. A typical call might look like this
cin.getline(stringVariable, MAX_CHARACTERS + 1);
The getline member function overwrites the first argument (let's call it arg1) with the smaller of
strlen(arg1) characters (including the null terminator) and MAX_CHARACTERS. A null
character is always appended after the MAX_CHARACTERS number of characters. This
caution is expressed in the text’s getline sidebar. It bears emphasis.
Using = and == with C-Strings. Since c-strings are pointers to arrays of chars, = simply copies
the pointer and == compares pointer addresses. Instead the strcpy or strcmp routines should be
used and adequate space must exist in the target prior to use of strcpy.
Unexpected \n in Input. When reading input you must account for every character, including
the newline character. To clear the input stream of any leftover newlines, use the ignore
function.
Mixing cin >> variable; and getline. When using cin >> n, the newline is left in the input
stream. This is not a problem if all input is done this way, as the newline will be skipped on the
next input since all whitespace is skipped. However, if getline is used in combination with cin
>> n, then the getline call may read the newline and exit. Use the
cin.ignore(1000,’\n’) function to ignore any remaining newlines.
Input:
Get a 100 character line. The period is the terminating sentinel.
Process: scan the line, capitalizing the first letter in the string, looking for multiple blanks which
it eats, except for one, and newline == blank.
What do I do with sentences that are longer than my line width on my output device?
Answer: Scan for and replace new-lines with blanks before replacing multiple blanks with one
blank, and allow wraps on output.
Output:
The input is taken character by character. If a period (.) is encountered, the loop terminates.
Process:
First, any new-line characters are replaced by blanks.
Next, all characters are made lower case.
If we have a blank and haven't seen one since the last non-blank, set a variable, haveBlank.
Next if we see a blank and haveBlank is already set, cause overwriting the current blank with
the new one by backing up the index to the sentence array by one. Once we read a non-blank, if
the haveBlank note is set to true, turn it off.
These pieces, along with replacing the newlines with blanks, serve to compress each sequence of
blanks and newlines to one blank.
Finally, there is the matter of putting in a null character if the string is shorter than the limit, and
putting in a period and null if the string is too long.
Note:
An alternative solution is to use a loop around cin >> word, where word is a C-string
variable. This ignores blanks and newlines. Then process the words much as I have processed the
characters here. Then output the words with a single blank between.
#include <iostream>
#include <cstring>
#include <cctype>
int main()
{
using namespace std;
char sentence[100];
int size;
cout << "Enter one sentence. The period is the sentinel to quit."
<< endl;
getSentence(sentence, size);
cout << endl;
cout << sentence << endl;
}
ThE CounTRY.
Output:
Enter one sentence. The period is the sentinel to quit.
Now is the time for all good men to come to the aid of the country.
Define a word to be a string of letters delimited by white space (blank, newline, tab), a comma,
or a period. Assume that the input consists only of characters and these delimiters.
Output letters in alphabetic order and only output those letters that occur.
Algorithm development:
The word count is carried out with a state machine.
We enter with our state variable, inWord set to false, and our word count set to 0.
Output of the letter count is a loop running from 0-25, with an if statement that allows output
if the array entry isn't zero.
The word counting results of this agree with the Unix wc utility, and agree with hand counted
simple files.
#include <iostream>
#include <cctype>
int main()
{
using namespace std;
int inWord = false;
int wordCount = 0;
char ch;
char lowCase;
int charCount[26];
int i; //i is declared outside for to avoid warnings about
//differences between new and old binding rules for the
//for loop variable.
Problem Statement:
This program is supposed to accept first name, middle name or initial, then last name. Even if
the middle name is only an initial, an initial followed by a comma, the output should be
lastName, firstName, middleInitial. (period after middle
initial)
Notes:
This is easy if there is at least a middle initial. Using three variables,
char firstName[20], middleName[20], lastName[20];
This works well for most names. The input is:
cin >> firstName >> middleName >> lastName;
and for output:
cout << lastName << ", " << firstName
<< middleName[0] << '.' << endl;
If the middle name is missing, the program must detect this (strlen), copy the middle name
to the last name, and set the middle name to an empty string (or just arrange not to print the
middle name).
4 Text replacement
This program accepts a line of text. It replaces all four-letter words in the string with "love". If
the four-letter word is capitalized, the word love should be also. Only capitalize the first letter of
the word.
If the length is 4, call output(char firstLetter) that outputs Love or love depending
on whether the firstLetter is uppercase or lower case.
Caveat: This solution collapses all blanks to one blank, and requires that one must enter the end-
of-file character to terminate the input, rather than "accepting a line of text."
But it is a simple solution:
#include <iostream>
#include <cctype>
int main()
{
using namespace std;
char word[81];
while ( cin >> word )
{
if ( strlen(word) == 4 )
outputLove( word[0] );
else
cout << word;
cout << " ";
}
cout << endl;
}
A “Better” Solution
(Is it better? - It is certainly a more complicated solution. It does meet the requirements of the
problem exactly.)
The following solution accepts a line of input. It uses cin member getline to fetch a line of
input. A function, breakUp, separates the non-alphabetic and alphabetic substrings of the input
line into an array of strings. The main function then examines the strings one at a time and if
alphabetic and of length 4, prints 'love' or 'Love' depending on case of the string being
replaced. The main part prints the string intact if it is of length not equal to 4 or it is non-
alphabetic.
Note that there is a stub included. It was well worth the trouble to create it to properly debug the
main part of the program first. This allows confident debugging of the breakUp function.
// File: ch10prg5.cc
// Text Replacement
#include <iostream>
#include <cctype>
int main()
{
list[1][0] = 'n';
list[1][1] = 'o';
list[1][2] = 'w';
list[1][3] = '\0';
list[3][0] = 'i';
list[3][1] = 's';
list[3][2] = '\0';
list[5][0] = 'a';
list[5][1] = '\0';
list[7][0] = 'G';
list[7][1] = 'o';
list[7][2] = 'o';
list[7][3] = 'd';
list[7][4] = '\0';
list[9][0] = 't';
list[9][1] = 'i';
list[9][2] = 'm';
list[9][3] = 'e';
list[9][4] = '\0';
n = 11;
}
*/