
Cosc 2150

String parsing in C++ with regular expressions.

String parsing
One of the main tasks a program may need to do is take a string and parse it to determine the next step in the program.
Command line applications
Search applications (like Bing and Google)
Most network applications, which send and receive data as strings
Too many more to even begin to name.

How to parse
As with all things in C/C++, you can do it any number of ways.
Develop your own functions and algorithms to parse a string.
Use the member functions of the string class for string parsing.
Use the sscanf function
  More like regular expressions.
Use the regex library
  Which is regular expressions.
  Requires Visual Studio 2010 or later. With gcc the TR1 header appears from 4.3.0, but a working <regex> implementation only arrived in gcc 4.9.
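As a point of comparison for the string class approach, here is a minimal sketch that splits a command line at its first space using only find and substr. The function name splitCommand and the "command, then arguments" layout are assumptions made for illustration, not something from the slides.

```cpp
#include <string>
#include <utility>

// Split a line at the first space into (command, rest).
// If there is no space, the whole line is the command and rest is empty.
std::pair<std::string, std::string> splitCommand(const std::string& line) {
    std::string::size_type pos = line.find(' ');
    if (pos == std::string::npos) {
        return std::make_pair(line, std::string());
    }
    return std::make_pair(line.substr(0, pos), line.substr(pos + 1));
}
```

Calling splitCommand("load 42") yields the pair ("load", "42"). This works for simple formats, but every new rule (optional whitespace, digits only, and so on) needs more hand-written code, which is where regular expressions start to pay off.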

Reading a line of input.


cin >> into a string reads up to the next whitespace and then stops.
This is not always the functionality we want.

getline function: (2 methods)

cin.getline(c_str, 256)
Reads to the end-of-line mark or the given number of characters (less one for the terminating null), whichever comes first.
Requires a C-string (char array) instead of a string.
Example:
char stuff[256];
cin.getline(stuff, 256);

But this is still not the method we want, since it requires C-strings.

Reading a line of input (2)


The second getline, which is the one we want to use, since it fills in a string.
A free function declared in <string>
getline(cin, string)
Example:
string stuff;
getline(cin, stuff);
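The difference between the two ways of reading can be sketched with a pair of small helpers. Both take any std::istream, so a std::istringstream can stand in for cin when trying them out; the helper names firstWord and firstLine are made up for this example.

```cpp
#include <sstream>
#include <string>

// operator>> skips leading whitespace, then stops at the next whitespace.
std::string firstWord(std::istream& in) {
    std::string word;
    in >> word;
    return word;
}

// std::getline reads up to the newline and discards the newline itself.
std::string firstLine(std::istream& in) {
    std::string line;
    std::getline(in, line);
    return line;
}
```

Given the input "hello world" followed by a newline, firstWord returns "hello" while firstLine returns "hello world". Passing cin instead of a string stream behaves the same way on keyboard input.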

Regular Expressions
Regex for short.
Likely the most powerful way to do any string processing.

Use:
Create a pattern that you want to match against
Run the match
If it returns true, then the string matched the pattern
You can also get all the sub-matches into an array-like object to use.

Problem:
Regex patterns can be very complex, and we don't have the time (about 6 lectures) to cover the entire regex set. This will only cover the very basics.

Code for regex


Include the regex header, which is part of TR1 (in C++11 and later the same names live directly in namespace std)
#include <regex>
Define the pattern
Note pattern is a variable!
std::tr1::regex pattern(pattern_string);

Object that will contain the sequence of sub-matches (optional)
std::tr1::match_results<std::string::const_iterator> result;

code for regex


regex_match to match the full string
if (std::tr1::regex_match(string, result, pattern))
If true, there was a match
If capturing matches, result should hold them: check result.size() > 0 or !result.empty()

regex_search to match any part of a string
if (std::tr1::regex_search(string, result, pattern))
Same as regex_match, including result.
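To make the difference between the two calls concrete, here is a small sketch. It assumes a C++11 compiler, where the library lives in namespace std rather than std::tr1; the wrapper names matchesWhole and containsMatch are hypothetical.

```cpp
#include <regex>
#include <string>

// True only when the pattern accounts for the entire string.
bool matchesWhole(const std::string& text, const std::string& pat) {
    std::regex pattern(pat);
    return std::regex_match(text, pattern);
}

// True when the pattern matches anywhere inside the string.
bool containsMatch(const std::string& text, const std::string& pat) {
    std::regex pattern(pat);
    return std::regex_search(text, pattern);
}
```

With the pattern "ello", containsMatch("hello world", "ello") is true, but matchesWhole("hello world", "ello") is false because the h and " world" are left over. On a TR1-only compiler the same calls should work with std::tr1:: in place of std::.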

pattern
Assume we are using regex_search unless otherwise noted.

Matching text
regex ex1("ello"); //matches anything containing ello
//such as hello world

alternation
regex ex2("Fred|Wilma|Pebbles");
//true if string contains Fred, Wilma, or Pebbles

alternation and grouping
regex ex3("(p|g|m|s|b)et");
//true if string contains pet, get, met, set, or bet
//note () are also used to capture the match
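The alternation and grouping patterns above can be tried out with a sketch like the following, again assuming C++11's std::regex. The function names are invented for the example; the patterns are the ones from the slide.

```cpp
#include <regex>
#include <string>

// Alternation: true if the text contains Fred, Wilma, or Pebbles.
bool mentionsFlintstone(const std::string& text) {
    static const std::regex ex2("Fred|Wilma|Pebbles");
    return std::regex_search(text, ex2);
}

// Alternation plus grouping: return the first of pet, get, met, set, bet
// found in the text, or an empty string if none is present.
std::string findEtWord(const std::string& text) {
    static const std::regex ex3("(p|g|m|s|b)et");
    std::smatch result;
    if (std::regex_search(text, result, ex3)) {
        return result[0].str();  // result[0] is the whole match
    }
    return std::string();
}
```

For the input "please get the ball", findEtWord returns "get". Without the parentheses, "p|g|m|s|bet" would mean "p, or g, or m, or s, or bet", which is why the grouping matters.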

pattern (2)
single character matching, using []
regex ex4("[0-9]"); //match a single digit
note the dash is a range operator, i.e. 0 to 9

regex ex5("[a-zA-Z0-9.]");
match any one character a through z (either case), 0 through 9, or the period

match quantifiers
+ 1 or more times
? zero or 1 time
* zero or more times
regex ex6("[0-9]+"); //find 1 or more digits
regex ex7("[a-z]*"); //find zero or more lowercase letters
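A short sketch of character classes with the + and * quantifiers, assuming C++11's std::regex. The helper names are hypothetical. The second helper illustrates a common surprise: because * allows zero repetitions, a pattern like [a-z]* can succeed with an empty match.

```cpp
#include <regex>
#include <string>

// Return the first run of one or more digits, or an empty string.
std::string firstNumber(const std::string& text) {
    static const std::regex ex6("[0-9]+");
    std::smatch result;
    if (std::regex_search(text, result, ex6)) {
        return result[0].str();
    }
    return std::string();
}

// [a-z]* accepts zero letters, so a search succeeds even with no letters.
bool zeroOrMoreLetters(const std::string& text) {
    static const std::regex ex7("[a-z]*");
    return std::regex_search(text, ex7);
}
```

firstNumber("room 42, floor 7") returns "42", the leftmost run of digits, and zeroOrMoreLetters("123") is still true.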

pattern (3)
matching quantifiers {}
{min number, max number}
regex ex8("[0-9]{1,3}");
find 1 to 3 digits

regex ex9("fo*ba?r{1,2}");
matches f, 0 or more o's, b, 0 or 1 a, then 1 or 2 r's
match: fobar, fbr, fbrr, fooobr, fooobarr, etc.
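The ex9 pattern can be checked against the listed strings with a sketch like this one. It assumes C++11's std::regex and uses regex_match, so the whole string has to fit the pattern; the function name isFooBar is made up.

```cpp
#include <regex>
#include <string>

// True when the entire string is: f, any number of o's, b,
// an optional a, then one or two r's.
bool isFooBar(const std::string& text) {
    static const std::regex ex9("fo*ba?r{1,2}");
    return std::regex_match(text, ex9);
}
```

All five strings from the slide pass, while "fbrrr" fails because three r's exceed the {1,2} limit. With regex_search instead, "fbrrr" would succeed, since the search is free to match just the leading "fbrr".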

pattern (4)
metasymbols
Match anything using the period
regex ex10(".+"); //find 1 or more characters (the period matches any character except a newline)
i.e. 123, atr, \t all match
unless the string is empty, this will match.

\d match a digit
[0-9]
regex ex11("\\d+"); //match 1 or more digits

\D match a non-digit
[^0-9]
\s match whitespace
[ \t\n\r\f]
\S match a non-whitespace
[^ \t\n\r\f]
\w match a word character
[a-zA-Z0-9_]
regex ex12("\\w+"); //match 1 or more word characters

\W match a non-word character
[^a-zA-Z0-9_]
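The metasymbols combine naturally, as in the following sketch, which assumes C++11's std::regex. The "name = number" format and the function name looksLikeAssignment are invented for illustration. Note that each backslash is doubled because the pattern sits inside a C++ string literal.

```cpp
#include <regex>
#include <string>

// True for text of the form: word characters, optional whitespace,
// an equals sign, optional whitespace, then digits. For example "count = 42".
bool looksLikeAssignment(const std::string& text) {
    static const std::regex pattern("\\w+\\s*=\\s*\\d+");
    return std::regex_match(text, pattern);
}
```

Both "count = 42" and "x=7" are accepted, while "x = y" is rejected because y is not a digit.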

pattern (5)
capturing the matches
use () around the part you want to capture
regex ex13("(\\w+)");
find 1 or more word characters and capture the resulting match

regex ex14("(\\w+)\\s+(\\w+)");
find 1 or more word characters, then whitespace, then 1 or more word characters. Capture the word character matches
example: hi there
result[1]=hi, result[2]=there

regex ex15("(\\d+) (.*)");
What does this capture? How might this be useful with regex_match?
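Here is a sketch of reading the captures from ex14, assuming C++11's std::regex (std::smatch is the standard typedef for match_results over std::string::const_iterator). The function name firstTwoWords is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>

// Capture the first two whitespace-separated words in the text.
// Returns a pair of empty strings when the pattern does not match.
std::pair<std::string, std::string> firstTwoWords(const std::string& text) {
    static const std::regex ex14("(\\w+)\\s+(\\w+)");
    std::smatch result;
    if (std::regex_search(text, result, ex14)) {
        // result[0] is the whole match; captures start at index 1.
        return std::make_pair(result[1].str(), result[2].str());
    }
    return std::make_pair(std::string(), std::string());
}
```

For "hi there" this returns ("hi", "there"), matching the result[1] and result[2] values given above.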

Examples
tr1::regex pattern1("(\\d+) (.*)");
tr1::regex pattern2("load M\\((\\d+)\\)");

tr1::regex_match(str, result, pattern1);
result[1] =
result[2] =

tr1::regex_match(str, result, pattern2);
result[1] =
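One way to experiment with these two patterns is the sketch below. It assumes C++11's std::regex, and the sample inputs mentioned afterwards ("10 print hello" and "load M(25)") are made up, since the slides do not say what str holds. The function names are also hypothetical.

```cpp
#include <regex>
#include <string>

// pattern1: leading digits, one space, then everything else.
// Fills number and rest and returns true only if the whole string fits.
bool splitNumberedLine(const std::string& str,
                       std::string& number, std::string& rest) {
    static const std::regex pattern1("(\\d+) (.*)");
    std::smatch result;
    if (!std::regex_match(str, result, pattern1)) {
        return false;
    }
    number = result[1].str();
    rest = result[2].str();
    return true;
}

// pattern2: the escaped \( and \) match literal parentheses,
// while the inner ( ) capture the digits between them.
std::string loadOperand(const std::string& str) {
    static const std::regex pattern2("load M\\((\\d+)\\)");
    std::smatch result;
    if (std::regex_match(str, result, pattern2)) {
        return result[1].str();
    }
    return std::string();
}
```

With the assumed input "10 print hello", splitNumberedLine gives number "10" and rest "print hello". With "load M(25)", loadOperand returns "25".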

Regex reference
http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339
http://www.codeproject.com/KB/string/TR1Regex.aspx
Patterns: http://msdn.microsoft.com/en-us/library/bb982727.aspx

Converting strings to integers (1)


Can use the sscanf function:
#include <cstdio>
#include <string>
using namespace std;

int GetIntVal2(string strConvert) {
    int intReturn = 0;
    //if sscanf fails because there are no leading digits, intReturn is still zero.
    sscanf(strConvert.c_str(), "%d", &intReturn);
    return intReturn;
}
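Returning zero on failure makes a failed conversion indistinguishable from the input "0". A possible variation, sketched here under the hypothetical name TryGetInt, uses the return value of sscanf, which reports how many items were successfully converted.

```cpp
#include <cstdio>
#include <string>

// Returns true and stores the number in value when sscanf converts
// one integer; returns false and leaves value untouched otherwise.
bool TryGetInt(const std::string& strConvert, int& value) {
    int parsed = 0;
    if (std::sscanf(strConvert.c_str(), "%d", &parsed) != 1) {
        return false;
    }
    value = parsed;
    return true;
}
```

Note that "1d2" still succeeds with the value 1, because %d stops at the first non-digit character rather than rejecting the string.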

Converting strings to integers (2)


Use the atoi function
#include <cstdlib>
#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

int GetIntVal(string strConvert) {
    int intReturn;
    // NOTE: You should probably do some checks to ensure that
    // this string contains only digits. If the string does not
    // start with a valid integer, zero will be returned.
    intReturn = atoi(strConvert.c_str());
    return intReturn;
}
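The check suggested in the NOTE can be done with a regular expression before calling atoi. The following is a minimal sketch assuming C++11's std::regex; the names isInteger and GetIntValChecked, the optional leading sign, and the fallback parameter are all assumptions added for illustration.

```cpp
#include <cstdlib>
#include <regex>
#include <string>

// True when the whole string is an optional sign followed by digits.
bool isInteger(const std::string& s) {
    static const std::regex pattern("[+-]?\\d+");
    return std::regex_match(s, pattern);
}

// Convert only when the string passes the check; otherwise
// return the caller-supplied fallback value.
int GetIntValChecked(const std::string& s, int fallback) {
    return isInteger(s) ? std::atoi(s.c_str()) : fallback;
}
```

Here GetIntValChecked("1d2", -1) returns -1 instead of silently returning 1. The check does not guard against numbers too large for an int.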

Converting integers to strings


Uses an ostringstream (in <sstream>)
Put the integer into the stream, then get it back out as a string.
#include <sstream>
#include <iostream>
#include <string>
using namespace std;

string GetStrVal(int intConvert) {
    ostringstream cstr; //create the stream
    cstr << intConvert; //put integer into the stream
    return cstr.str(); //get the contents out as a string
}

Converting string to integer example:


int main() {
    string str, str2;
    str = "12";
    str2 = "1d2";
    cout << "atoi method str: " << GetIntVal(str) << endl;
    // prints out 12
    cout << "atoi method str2: " << GetIntVal(str2) << endl;
    // prints out 1
    cout << "sscanf method str: " << GetIntVal2(str) << endl;
    // prints out 12
    cout << "sscanf method str2: " << GetIntVal2(str2) << endl;
    // prints out 1
    return 0;
}

Q&A
