You are on page 1of 30

CSC 416

Lecture 2 Regular expression R l i and Regular language

Regular Expressions
A RE is a language for describing simple languages g g g p g g and patterns. Algorithm for using RE 1. Define a pattern: S1StudentnameS2 2. Loop p Find next pattern Store StudentName into DB or encrypt StudentName yp Until no match REs are used in applications that involve file p pp parsing g and text matching. Many Implementations have been made for RE.
Dr.Hussien Sharaf 2

RE by C++ B b Boost lib t library


//Pattern matching in a String
// Created by Flavio Castelli <flavio.castelli@gmail.com> // distrubuted under GPL v2 license

#include <boost/regex.hpp> Please check #include <string> g [http://flavio.castelli.name/regexp-with-boost] int main() { boost::regex pattern ( bg|olug boost::regex constants::icase| ("bg|olug",boost::regex_constants::icase| boost::regex_constants::perl); std::string stringa ("Searching for BsLug"); if (boost::regex_search (stringa, pattern, boost::regex_constants::format_perl)) printf ("found\n"); else printf("not found\n"); return 0; }
Dr.Hussien Sharaf 3

//Substitution

RE by C++ B b Boost lib t library

// Created by Flavio Castelli <flavio.castelli@gmail.com> // distrubuted under GPL v2 license

#include <boost/regex.hpp> #include <string> int main() Please check [http://flavio.castelli.name/regexp-with-boost] {boost::regex pattern ("bok",boost::regex_constants::icase| boost::regex_constants::perl); b t t t l) std::string stringa ("Searching for bok"); std::string replace ( book ); (book"); std::string newString; newString = boost::regex_replace (stringa, pattern, replace); printf("The new string is: |%s|\n newString c str()); |%s|\n",newString.c_str()); return 0; }
Dr.Hussien Sharaf 4

RE b C++ STL by

needs extra feature pack

#include <regex> g #include <iostream> #include <string> string bool is_email_valid(const std::string& email) {
// define a regular expression

const std::tr1::regex pattern("(\\w+)(\\.|_)?(\\w*)@(\\w+)(\\.(\\w+))+");

// try to match the string with the regular expression h h i ih h l i return std::tr1::regex_match(email, pattern); }
Please check http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339 htt // d / / / f / tl/ ti l h / 15339
Dr.Hussien Sharaf 5

RE b C++ STL by
#include <regex> g #include <iostream> #include <string> string bool is_email_valid(const std::string& email) {
// define a regular expression

const std::tr1::regex pattern("(\\w+)(\\.|_)?(\\w*)@(\\w+)(\\.(\\w+))+");

// try to match the string with the regular expression h h i ih h l i return std::tr1::regex_match(email, pattern); }
Please check http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339 htt // d / / / f / tl/ ti l h / 15339
Dr.Hussien Sharaf 6

RE b C++ STL by
std::string str("abc + (inside brackets)dfsd"); g ( ( ) ); std::smatch m; std::regex_search(str, std::regex search(str, m, std::regex(b")); std::regex( b )); if (m[0].matched) std::cout << "Found " << m.str(0) << '\n'; Found \n ; else std::cout << "Not found\n; Not found\n ;

Please check http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339 htt // d / / / f / tl/ ti l h / 15339


Dr.Hussien Sharaf 7

Other code samples


Other samples can be check at:
1. 2. 3.

1. 2. 3. 3 4.

Boost Library http://stackoverflow.com/questions/5804453/c-regularexpressions-with-boost-regex expressions with boost regex http://www.codeproject.com/KB/string/regex__.aspx http://www.boost.org/doc/libs/1_43_0/more/getting_started/w indows.html#build-from-the-visual-studio-ide STL Library http://www.codeproject.com/KB/recipes/rexsearch.aspx http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c1 5339 [recommended] http://www.microsoft.com/download/en/confirmation.aspx?id= http://www microsoft com/download/en/confirmation aspx?id= 6922 [feature pack VS2008 Pack] http://channel9.msdn.com/Shows/Going+Deep/C9-LecturesStephan-T-Lavavej-Standard-Template-Library-STL-8-of-n Stephan T Lavavej Standard Template Library STL 8 of n [Video]
8

Dr.Hussien Sharaf

RE Rules -1
1. 1 Let ={x}, Then L = * ={x}

In RE, we write x* 2. Let 2 L t ={x, y}, Then L = * { } Th In RE, we write (x+y)* 3. Kleenes star * means any combination of letters of length zero or more.

Dr.Hussien Sharaf

RE Rules -2
Given an alphabet the set of regular expressions is defined , by the following rules. 1. For every letter in , the letter written in bold is a regular expression. is a regular expression. 2. If r1 and r2 are regular expressions, then so are:
(r ( 1) 2. r1 r2 3. r1+r2 4. r1* NOTE: r1+ is not a RE
1.

3.

Nothing else is a regular expression.

Dr.Hussien Sharaf

10

RE-1
Example 1 ={a, b} - F Formally describe all words with a followed b any ll d ib ll d ith f ll d by number of bs L = a b* = ab* b - Give examples for words in L {a ab abb abbb ..}

Dr.Hussien Sharaf

11

RE-2
Example 2 ={a, b} - F Formally d ll describe th l ib the language th t contains that t i nothing and contains words where any a must be followed by one or more bs bs L = (abb*)* - Gi examples f words in L Give l for d i { ab abb abababb ..}

Dr.Hussien Sharaf

12

RE-3
Example ={a, b} - F Formally describe all words with a followed b one ll d ib ll d ith f ll d by or more bs L = a b+ = abb* bb - Give examples for words in L {ab abb abbb ..}

Dr.Hussien Sharaf

13

RE-4
Example ={a, b, c} - F Formally describe all words th t start with an a ll d ib ll d that t t ith followed by any number of bs and then end with c. L = a b*c - Give examples for words in L {ac abc abbc abbbc ..}

Dr.Hussien Sharaf

14

RE-5
Example ={a, b} - F Formally describe all words where as if any come ll d ib ll d h before bs if any. L = a* b* - Give examples for words in L { a b aa ab bb aaa abb abbb bbb..} NOTE: a* b* (ab)*
because first language does not contain abab but second language has. Once single b is detected then no as can be added as
Dr.Hussien Sharaf 15

RE-6
Example ={a} - F Formally describe all words where count of a i odd. ll d ib ll d h t f is dd L = a(aa)* OR (aa)*a - Give examples for words in L {a aaa aaaaa ..}

Dr.Hussien Sharaf

16

RE-7.1
Example ={a, b, c} - F Formally describe all words where single a or c ll d ib ll d h i l comes in the start then odd number of bs. L = (a+c)b(bb)* ( + )b(bb) - Give examples for words in L {ab cb abbb cbbb ..}

Dr.Hussien Sharaf

17

RE-7.2
Example ={a, b, c} - F Formally describe all words where single a or c ll d ib ll d h i l comes in the start then odd number of bs in case of a and zero or even number of bs in case of c . bs L = ab(bb)* +c(bb)* - Gi examples f words in L Give l for d i {ab c abbb cbb abbbbb ..}

Dr.Hussien Sharaf

18

RE-8
Example ={a, b, c} - F Formally describe all words where one or more a or ll d ib ll d h one or more c comes in the start then one or more b s. bs L = (a+c) + b+= (aa*+cc*) bb* - Gi examples f words in L Give l for d i {ab cb aabb cbbb ..}

Dr.Hussien Sharaf

19

RE-9
Example ={a, b} - F Formally d ll describe all words with length three. ib ll d ith l th th L = (a+b) 3 =(a+b) (a+b) (a+b) - List all words in L {aaa aab aba baa abb bab bba bbb} - What is the count of words of length 4? 16 = 24 - What is the count of words of length 44? 244
Dr.Hussien Sharaf 20

Dr.Hussien Sharaf

21

RE-10.1
Example ={a, b}, What does L describe? - L = (a+b)* a(a+b) * ( +b) ( +b)
Any An string of a's and b's Single a Any string of a's and b's

- Give examples for words in L {a ab aab bab abb ..}

Dr.Hussien Sharaf

22

RE-10.2
abbaab can parsed in 3 ways

a bbaab
a bbaab

abb a ab
abb a ab

abba a b
abba a b

Dr.Hussien Sharaf

23

RE-11 RE 11
Example ={a, b} - F Formally describe all words with at l t t ll d ib ll d ith t least two a's. ' 1) L = b*ab*a(a + b)* Start with a jungle of b's (or no b's) until we find the first a, then more b's (or no b's), then the ( ) second a, then we finish up with anything. - Give examples for words in L {abbbabb aaaaa bbbabbbbabab..}
Dr.Hussien Sharaf 24

RE-12 RE 12
Example ={a, b} - F Formally d ll describe all words with exactly two a's. ib ll d ith tl t ' 1) L = b*ab*ab* - Give examples for words in L {aab, baba, and bbbabbbab ..} To make the word aab, we let the first and second b* become and the last becomes b

Dr.Hussien Sharaf

25

RE-13.1
Example ={a, b}
- F Formally describe all words with l ll d ib ll d i h least one a and least dl

one b. 1) L = (a + b)*a(a + b)* b(a + b)* ( y g) ( y g) ( y g) = (anything) a(anything) b(anything) But (a+b)*a(a+b)*b(a+b)* expresses all words except words of the form some bs (at least one) followed by bs some as (at least one). bb*aa*
Dr.Hussien Sharaf 26

RE-13.2
2) L =(a+b)*a(a+b)*b(a+b)* + bb aa (a+b) a(a+b) b(a+b) bb*aa* Thus: (a+b)*a(a+b)*b(a+b)* + (a+b)*b(a+b)*a(a+b)* = (a+b)*a(a+b)*b(a+b)* + bb*aa*

Notice that it is necessary to write bb*aa* because ba b*a* will admit words we do not want such as aaa want, aaa. Does this imply that D thi i l th t

(a+b)*b(a+b)*a(a+b)*= bb*aa*??
False Left side includes the word aba, which the expression on the right side does not. p g
Dr.Hussien Sharaf 27

Regular L R l Languages 1
A language that can be defined by a RE is called a regular language. RE can not express some languages such as a t l h
1 2 1 2 1

bn .

1. If L1 and L2 are regular languages then L + L , L L and L * are also regular languages. 2. If L is a regular language then L (L complement) g g g ( p )

is also a regular language.

L L : is the language that is not accepted by r where r is the RE that accepts language L. Note: (L) =L and (r) =r (L ) L (r ) r
Dr.Hussien Sharaf 28

Regular Languages 2
3. 3 If L1 and L2 are regular languages then
L1 L2 is also a regular language.

L1 =words starting with a d h


r1 =a(a+b)* ( )

L2 =words ending with a


r2 = (a+b)*a

L3 =L1 L2 = a(a+b)*a
L3 = words that start and end with a.
Dr.Hussien Sharaf 29

Dr.Hussien Sharaf

30