
Regular Expressions

• Introduction

• Special Symbols and Characters

• REs and Python


Introduction
• Regular expressions (REs) provide a foundation for advanced text
pattern matching, extraction, and search-and-replace.

• REs are simply strings that use special symbols and characters to
indicate pattern repetition or to represent multiple characters so that
they can "match" a set of strings with similar characteristics described
by the pattern.

• Python supports REs through the standard library re module.
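The three uses named above (matching, extraction, search-and-replace) can be sketched with the re module roughly as follows. The sample text and the pattern are illustrative assumptions, not taken from the slides.

```python
import re

# Hypothetical sample text and pattern, chosen only for illustration.
text = 'order 66 shipped, order 7 pending'
pattern = r'order [0-9]+'   # the word 'order', a space, then one or more digits

# Pattern matching: does the text start with the pattern?
starts_with_order = re.match(pattern, text) is not None

# Extraction: collect every substring that fits the pattern.
orders = re.findall(pattern, text)

# Search-and-replace: substitute each match with a fixed string.
masked = re.sub(pattern, 'order N', text)

print(starts_with_order)
print(orders)
print(masked)
```

Each of these functions takes the pattern first and the string to examine second; match(), search(), and findall() are covered in more detail in the REs and Python section.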


Special Symbols and Characters
Matching More Than One RE Pattern with
Alternation ( | )
• Examples:
Matching Any Single Character ( . )

• Examples
Matching from the Beginning or End of
Strings or Word Boundaries ( ^ / $ / \b / \B )
• Examples
Creating Character Classes ( [ ] )
Denoting Ranges ( - ) and Negation ( ^ )
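A minimal sketch of ranges and negation inside a character class, using Python's re module (introduced in the REs and Python section). The test strings are assumptions chosen for illustration.

```python
import re

# A hyphen inside brackets denotes a range: [a-z] is any one lowercase
# ASCII letter, [0-9] is any one digit.
m1 = re.match('[a-z][0-9]', 'r2d2')

# A caret as the FIRST character inside brackets negates the class:
# [^aeiou] is any one character that is not a lowercase vowel.
m2 = re.match('[^aeiou]', 'bat')
m3 = re.match('[^aeiou]', 'end')

for m in (m1, m2, m3):
    # Print the matched text, or None when the pattern did not match.
    print(m.group() if m is not None else None)
```

Here m1 matches 'r2', m2 matches 'b', and m3 is None because 'end' begins with a vowel.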
Multiple Occurrence/Repetition Using Closure Operators ( *, +, ?, { } )

• Examples
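The closure operators can be illustrated with a short sketch. The helper name matched() and the sample strings are hypothetical, introduced here only to keep the output readable.

```python
import re

def matched(pattern, string):
    """Return the text matched at the start of string, or None."""
    m = re.match(pattern, string)
    return m.group() if m is not None else None

print(matched('ab*', 'a'))             # * : zero or more b's, so 'a' alone matches
print(matched('ab*', 'abbb'))          # * : also matches several b's
print(matched('ab+', 'a'))             # + : needs at least one b, so no match
print(matched('colou?r', 'color'))     # ? : the 'u' is optional
print(matched('colou?r', 'colour'))
print(matched('[0-9]{3}', '12345'))    # {3} : exactly three digits
print(matched('[0-9]{2,4}', '12345'))  # {2,4} : two to four digits, as many as possible
```

By default these operators are greedy: they consume as many repetitions as the rest of the pattern allows, which is why the last call returns four digits rather than two.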
Special Characters Representing Character Sets
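A brief sketch of the most common special characters: \d (a digit), \w (a word character), \s (whitespace), and the uppercase form \D (a non-digit). The phone number and email address are made-up sample data.

```python
import re

# \d stands for any decimal digit.
m1 = re.match(r'\d\d\d-\d\d\d\d', '555-1212')

# \w stands for a letter, digit, or underscore; \. is a literal dot.
m2 = re.match(r'\w+@\w+\.com', 'nobody@example.com')

# \s stands for any whitespace character.
m3 = re.search(r'\s', 'of the')

# Uppercase forms negate the set: \D is any non-digit.
m4 = re.match(r'\D+', 'abc123')

for m in (m1, m2, m3, m4):
    print(repr(m.group()))
```

The patterns are written as raw strings (r'...') so that Python passes the backslashes through to the re module unchanged.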
Designating Groups with Parentheses ( ( ) )
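A small sketch of grouping with parentheses, assuming a made-up identifier format of three word characters, a hyphen, and three digits.

```python
import re

# Each pair of parentheses saves the text it matched as a numbered group.
m = re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123')

if m is not None:
    print(m.group())    # the entire match
    print(m.group(1))   # first group: the three word characters
    print(m.group(2))   # second group: the three digits
    print(m.groups())   # all groups together, as a tuple
```

group() with no argument returns the whole match, group(n) returns the n-th parenthesized subgroup, and groups() returns all subgroups at once.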
REs and Python
re Module: Core Functions and Methods
Matching Strings with match()
• Ex 1:
• import re
• m = re.match('foo', 'foo') # pattern matches string
• if m is not None: # show match if successful
• print(m.group())
• print(m)
• Ex 2:
• import re
• m = re.match('foo', 'food on the table') # match succeeds
• print(m.group())
Looking for a Pattern within a String with
search() (Searching versus Matching)
• Ex 1:
• import re
• m = re.match('foo', 'seafood') # no match
• if m is not None: print(m.group())
• Ex 2:
• m1 = re.search('foo', 'seafood') # use search() instead
• if m1 is not None: print(m1.group())
Matching More than One String ( | )
• import re
• bt = 'bat|bet|bit' # RE pattern: bat, bet, bit
• m = re.match(bt, 'bat') # 'bat' is a match
• if m is not None: print(m.group())
• m1 = re.match(bt, 'blt') # no match for 'blt'
• if m1 is not None: print(m1.group())
• m2 = re.match(bt, 'He bit me!') # does not match string
• if m2 is not None: print(m2.group())
• m4 = re.search(bt, 'He bit me!') # found 'bit' via search
• if m4 is not None: print(m4.group())
Matching Any Single Character ( . )
• import re
• anyend = '.end'
• m1 = re.match(anyend, 'bend') # dot matches 'b'
• if m1 is not None: print(m1.group())
• m2 = re.match(anyend, 'end') # no char to match
• if m2 is not None: print(m2.group())
• m3 = re.match(anyend, '\nend') # any char except \n
• if m3 is not None: print(m3.group())
• m4 = re.search('.end', 'The end.') # matches ' ' in search
• if m4 is not None: print(m4.group())
• import re
• patt314 = '3.14' # RE dot
• pi_patt = r'3\.14' # literal dot (dec. point); raw string keeps the backslash
• m1 = re.match(pi_patt, '3.14') # exact match
• if m1 is not None: print(m1.group())
• m2 = re.match(patt314, '3014') # dot matches '0'
• if m2 is not None: print(m2.group())
• m3 = re.match(patt314, '3.14') # dot matches '.'
• if m3 is not None: print(m3.group())
Creating Character Classes ( [ ] )
• import re
• m1 = re.match('[cr][23][dp][o2]', 'c3po') # matches 'c3po'
• if m1 is not None: print(m1.group())
• m2 = re.match('[cr][23][dp][o2]', 'c2do') # matches 'c2do'
• if m2 is not None: print(m2.group())
• m3 = re.match('r2d2|c3po', 'c2do') # does not match 'c2do'
• if m3 is not None: print(m3.group())
• m4 = re.match('r2d2|c3po', 'r2d2') # matches 'r2d2'
• if m4 is not None: print(m4.group())
Finding Every Occurrence with findall()
• import re
• l1 = re.findall('car', 'car') # one occurrence
• print(l1) # ['car']
• l2 = re.findall('car', 'scary') # found inside a longer word
• print(l2) # ['car']
• l3 = re.findall('car', 'carry the barcardi to the car') # three occurrences
• print(l3) # ['car', 'car', 'car']
Matching from the Beginning and End of
Strings and on Word Boundaries
• import re
• m1 = re.search('^The', 'The end.') # match
• if m1 is not None: print(m1.group())
• m2 = re.search('^The', 'end. The') # not at beginning
• if m2 is not None: print(m2.group())
• m3 = re.search(r'\bthe', 'bite the dog') # at a boundary
• if m3 is not None: print(m3.group())
• m4 = re.search(r'\bthe', 'bitethe dog') # no boundary
• if m4 is not None: print(m4.group())
• m5 = re.search(r'\Bthe', 'bitethe dog') # \B matches where there is no boundary
• if m5 is not None: print(m5.group())
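The examples above cover ^, \b, and \B. A comparable sketch for the end-of-string anchor $, and for \b used on both sides of a word, might look like this; the sample strings are assumptions chosen for illustration.

```python
import re

# $ anchors the pattern to the end of the string; \. is a literal period.
m1 = re.search(r'end\.$', 'The end.')          # 'end.' is at the very end
m2 = re.search(r'end\.$', 'The end. Really')   # 'end.' is not at the end

# \b on both sides matches 'the' only as a whole word.
m3 = re.search(r'\bthe\b', 'bite the dog')     # standalone word
m4 = re.search(r'\bthe\b', 'bathe the dog')    # skips 'bathe', finds the word

for m in (m1, m2, m3, m4):
    print(m.group() if m is not None else None)
```

In the last case the match starts at index 6, the standalone word, because the 'the' inside 'bathe' has no word boundary in front of it.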
Splitting (on Delimiting Pattern) with
split()
• import re
• print(re.split(':', 'str1:str2:str3'))

• string = "Hello, there!Welcome to educative"
• pattern = r"[, !]" # Split on comma, space, or exclamation mark
• result = re.split(pattern, string)
• print(result) # the ', ' pair yields an empty string between 'Hello' and 'there'
