String parsing
One of the main tasks that a program may
need to do is take a string and parse it to
determine the next step in the program.
Command line applications
Search applications (like Bing and Google).
Most network applications send and receive
data as strings.
Too many more to even begin to name.
How to parse
As with all things in C/C++, you can do it
any number of ways.
Develop your own functions and algorithms to parse a
string.
Use the member functions of the string class
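As a rough sketch of the string class approach, the helper below splits a line into words using only find_first_not_of, find_first_of and substr. The name split_words and the choice of space and tab as delimiters are assumptions made for illustration, not something from the lecture.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a line into tokens separated by spaces or tabs,
// using only std::string member functions.
std::vector<std::string> split_words(const std::string& line) {
    const std::string delims = " \t";
    std::vector<std::string> tokens;
    // Skip leading delimiters to find the start of the first token.
    std::string::size_type start = line.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter (or at the end of the string).
        std::string::size_type end = line.find_first_of(delims, start);
        tokens.push_back(line.substr(start, end - start));
        // Move to the start of the next token, if there is one.
        start = line.find_first_not_of(delims, end);
    }
    return tokens;
}
```

A command line program could call split_words on the line the user typed and then branch on the first token.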
String parsing.
Regular Expressions
Regex for short.
Likely the most powerful way to do any string
processing.
Use:
Create a pattern that you want to match against
Run the match
If it returns true, then the string matched the
pattern
You can also get all the matches into an array to use.
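The steps above can be sketched roughly as follows. This assumes the C++11 <regex> header (std::regex) rather than the older tr1 namespace; the function names contains_number and all_numbers are made up for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Steps 1-3: create a pattern, run the match, check the boolean result.
bool contains_number(const std::string& text) {
    static const std::regex digits("[0-9]+");
    return std::regex_search(text, digits);
}

// Collect every match into a vector (the "array" of matches).
std::vector<std::string> all_numbers(const std::string& text) {
    static const std::regex digits("[0-9]+");
    std::vector<std::string> found;
    // A default-constructed sregex_iterator marks the end of the sequence.
    for (std::sregex_iterator it(text.begin(), text.end(), digits), end;
         it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```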
Problem:
Regex patterns can be very complex and we don't
have the time (about 6 lectures) to cover the entire
regex set. This will only cover the very basics.
pattern
Assume we are using regex_search unless otherwise
noted.
Matching text
regex ex1("ello"); //matches anything containing ello
//such as hello world
alternation
regex ex2("Fred|Wilma|Pebbles");
//true if string contains Fred, Wilma, or Pebbles
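A small sketch of these two patterns in use, assuming std::regex from the C++11 <regex> header; the wrapper names has_ello and names_flintstone are invented for the example.

```cpp
#include <regex>
#include <string>

// Literal text: true if "ello" appears anywhere in the string.
bool has_ello(const std::string& s) {
    return std::regex_search(s, std::regex("ello"));
}

// Alternation: true if any one of the three names appears in the string.
bool names_flintstone(const std::string& s) {
    return std::regex_search(s, std::regex("Fred|Wilma|Pebbles"));
}
```

Matching is case sensitive by default, so "fred" would not count as a match for the second pattern.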
pattern (2)
single character matching, using []
regex ex4("[0-9]"); //match a single digit
note: the dash is a range operator, i.e. 0 to 9
regex ex5("[a-zA-Z0-9.]");
match any one character a through z, A through Z, 0 to 9, or
the period
match quantifiers
+ 1 or more times
? zero or 1 time
* zero or more times
regex ex6("[0-9]+"); //find 1 or more digits
regex ex7("[a-z]*"); //find zero or more
lowercase letters
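To see what the character classes and quantifiers actually pick up, a helper like the one below can return the text of the first match. It is only a sketch: first_match and the "<none>" marker are made up here, and std::regex from C++11 is assumed.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return the text of the first match found by
// regex_search, or "<none>" when the pattern does not match at all.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        return m.str();  // m.str() is the whole matched text
    }
    return "<none>";
}
```

One thing worth noticing: with * or ?, zero occurrences still count as a match, so first_match("123abc", "[a-z]*") succeeds with an empty string at the very start instead of skipping ahead to "abc".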
pattern (3)
matching quantifiers {}
{min number, max number}
regex ex8("[0-9]{1,3}");
find 1 to 3 digits
regex ex9("fo*ba?r{1,2}");
matches f, 0 or more o's, b, 0 or 1 a, then 1 or
2 r's
match: fobar, fbr, fbrr, fooobr, fooobarr, etc
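The {min,max} examples can be checked with a short sketch like this one. It uses regex_match instead of regex_search so that the whole string has to fit the pattern; whole_match is a made-up name and std::regex from C++11 is assumed.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only when the ENTIRE string fits the pattern.
bool whole_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// For comparison: true when the pattern is found anywhere in the string.
bool found_anywhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}
```

The difference matters for upper limits: "fbrrr" has three r's, so whole_match rejects it, but found_anywhere still succeeds because the first four characters, "fbrr", fit the pattern.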
pattern (4)
metasymbols
Match anything using the period
regex ex10(".+"); //find 1 or more characters
i.e. 123, atr, \t: they all match
Unless the string is empty (or holds only line breaks, which the period does not match), this will match.
\d match a Digit
[0-9]
\D match a Non-digit
[^0-9]
\s match whitespace
[ \t\n\r\f]
\S match a Non-whitespace
[^ \t\n\r\f]
\w match a Word character
[a-zA-Z0-9_]
regex ex12("\\w+"); //match 1 or more word characters
\W match a Non-word character
[^a-zA-Z0-9_]
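The metasymbols can be combined, as in the sketch below, which accepts lines shaped like "name = 42". The function is_assignment and the line format are invented for illustration. A C++11 raw string literal R"(...)" is used so each backslash is written once; in an ordinary string literal the same pattern needs doubled backslashes, as in "\\w+".

```cpp
#include <regex>
#include <string>

// Hypothetical example: does the whole line look like  name = 42 ?
// \s* optional whitespace, \w+ a name, = a literal equals sign, \d+ digits.
bool is_assignment(const std::string& line) {
    static const std::regex re(R"(\s*\w+\s*=\s*\d+\s*)");
    return std::regex_match(line, re);
}

// The same word pattern written both ways; the two strings are identical.
const std::string word_escaped = "\\w+";
const std::string word_raw = R"(\w+)";
```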
pattern (5)
capturing the matches
use the () around the part you want to capture
regex ex13("(\\w+)");
find 1 or more word characters and capture the
resulting match
regex ex14("(\\w+)\\s+(\\w+)");
find 1 or more word characters, then white space,
then 1 or more word characters. Capture the word
character matches
example: hi there
result[1]=hi, result[2]=there
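The "hi there" example can be sketched like this, assuming std::regex and std::smatch from the C++11 <regex> header. The function name two_words and its output parameters are made up for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: find two words separated by whitespace and hand back
// the two captured words. Returns false when the pattern is not found.
bool two_words(const std::string& text, std::string& first, std::string& second) {
    static const std::regex re("(\\w+)\\s+(\\w+)");
    std::smatch result;
    if (!std::regex_search(text, result, re)) {
        return false;
    }
    // result[0] is the whole match; captures are numbered from 1,
    // in the order of their opening parentheses.
    first = result[1].str();
    second = result[2].str();
    return true;
}
```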
Examples
tr1::regex pattern1("(\\d+) (.*)");
tr1::regex_match(str,result,pattern1);
result[1] =
result[2] =
tr1::regex_match(str,result,pattern2);
result[1]=
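A runnable variation on the pattern1 example is sketched below. It swaps in std::regex and std::smatch from C++11 for the tr1 versions, and the function name parse_count_and_rest is an assumption. Note that regex_match requires the whole string to fit the pattern, unlike regex_search.

```cpp
#include <regex>
#include <string>

// Hypothetical helper built around the pattern "(\\d+) (.*)":
// a number, one space, then everything else on the line.
bool parse_count_and_rest(const std::string& str, int& count, std::string& rest) {
    static const std::regex pattern1("(\\d+) (.*)");
    std::smatch result;
    if (!std::regex_match(str, result, pattern1)) {
        return false;  // the string as a whole did not fit the pattern
    }
    count = std::stoi(result[1].str());  // first capture: the digits
    rest = result[2].str();              // second capture: the remainder
    return true;
}
```

With a made-up input such as "42 apples and pears", result[1] holds "42" and result[2] holds "apples and pears". An input such as "order 42 apples" fails, because regex_match will not skip over the leading word.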
Regex reference
http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339
http://www.codeproject.com/KB/string/TR1Regex.aspx
Patterns: http://msdn.microsoft.com/enus/library/bb982727.aspx
<cstdlib>
<cstdio>
<iostream>
<string>
Q&A