Python Regular Expressions

A regular expression is a special sequence of characters that helps you match or find
patterns in a given string.

Regular Expression Patterns

Pattern      Description
^            Matches the beginning of a line
$            Matches the end of the line
.            Matches any character (except a newline, by default)
\d           Matches any digit
\D           Matches any non-digit
\s           Matches any whitespace character
\S           Matches any non-whitespace character
\w           Matches any alphanumeric character or underscore
\W           Matches any character that is not alphanumeric or underscore
*            Repeats the preceding item zero or more times (greedy)
*?           Repeats the preceding item zero or more times (non-greedy)
+            Repeats the preceding item one or more times (greedy)
+?           Repeats the preceding item one or more times (non-greedy)
[aeiou]      Matches a single character in the listed set
[^XYZ]       Matches a single character not in the listed set
[a-z0-9]     The set of characters can include a range
(            Indicates where string extraction is to start
)            Indicates where string extraction is to end
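
As a minimal sketch of how a few of these patterns behave in Python (the sample line and variable names are illustrative assumptions, not part of the handout):

```python
import re

# Illustrative input; any string would do.
line = "From: alice_01 sent 3 files at 09:45"

digits = re.findall(r"\d+", line)               # runs of one or more digits
first_word = re.search(r"^\w+", line).group()   # word characters at the start of the line
last_token = re.search(r"\S+$", line).group()   # non-whitespace run at the end of the line
vowels = re.findall(r"[aeiou]", "regex")        # single characters taken from a set
sender = re.search(r"From: (\w+)", line).group(1)  # parentheses mark the part to extract

print(digits)      # ['01', '3', '09', '45']
print(first_word)  # From
print(last_token)  # 09:45
print(vowels)      # ['e', 'e']
print(sender)      # alice_01
```

Note that \w also accepts the underscore and digits, which is why the whole of alice_01 is captured.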

Regular-expression Examples

Regular Expression     Set of matched patterns
R1: python             {python}
R2: [pP]ython          {python, Python}
R3: [aeiou]            {a, e, i, o, u}
R4: [^aeiou]           U - {a, e, i, o, u}, where U = universal character set
R5: (a|b)*             {'', a, b, aa, ab, ba, bb, aaa, aab, …}
R6: 0001*              {000, 0001, 00011, 000111, …}
R7: 0001+              {0001, 00011, 000111, 0001111, …}
R8: 0001?              {000, 0001}
R9: (a|b)(ba)$         {aba, bba}
R10: ^(a)(a|b)*        {a, aa, ab, aaa, aab, aba, abb, …}
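
One way to check whether a string belongs to the set described by a pattern is re.fullmatch, which succeeds only when the entire string matches. The following is a sketch; the helper name in_language is a hypothetical one chosen for illustration:

```python
import re

def in_language(pattern, candidate):
    """Return True if the whole candidate string matches the pattern."""
    return re.fullmatch(pattern, candidate) is not None

print(in_language(r"[pP]ython", "Python"))   # True
print(in_language(r"(a|b)*", ""))            # True  (the empty string is in R5)
print(in_language(r"0001*", "000"))          # True  (zero trailing 1s)
print(in_language(r"0001+", "000"))          # False (at least one trailing 1 is required)
print(in_language(r"0001?", "00011"))        # False (at most one trailing 1)
print(in_language(r"(a|b)(ba)$", "bba"))     # True
print(in_language(r"^(a)(a|b)*", "aba"))     # True
print(in_language(r"^(a)(a|b)*", "ba"))      # False (must start with a)
```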

Note: If you want a special regular expression character to be treated as a literal
character, prefix it with a backslash ('\').
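
A short sketch of the effect of the backslash, using illustrative strings. re.escape is a standard-library function that adds the backslashes for you:

```python
import re

# Unescaped, "." matches any character; escaped, it matches only a literal dot.
any_char = re.findall(r"3.5", "3.5 3x5")
literal_dot = re.findall(r"3\.5", "3.5 3x5")

# Unescaped, "+" means "one or more of the preceding item".
repeat_plus = re.findall(r"1+1", "1+1=2, 11")
literal_plus = re.findall(r"1\+1", "1+1=2, 11")

print(any_char)      # ['3.5', '3x5']
print(literal_dot)   # ['3.5']
print(repeat_plus)   # ['11']
print(literal_plus)  # ['1+1']

# re.escape builds the escaped form of an arbitrary literal string.
print(re.escape("1+1"))  # 1\+1
```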
Examples:
1. To match a dollar amount:
   Pattern= \$[0-9.]+    # $ (dollar character) preceded by a \ (backslash)
2. To match simple email addresses:
   Pattern= \w+@(\w+\.)+(com|org|net|edu)    # . (dot character) preceded by a \ (backslash)

Python built-in module: re
Before you can use regular expressions in your program, you must import the module using "import re".

   match = re.search(pattern, string)   ## If the search is successful, search() returns a match object; otherwise it returns None. Note: match is just a variable name.
   if match:
       print(match.group())             ## print the matched string

Greedy vs. Non-Greedy
+ and * are said to be "greedy" because the search first finds the leftmost match for the pattern and then tries to use up as much of the string as possible. If you add a ? after the quantifier, as in .*? or .+?, the search becomes "non-greedy": it stops as soon as the pattern is satisfied, matching as few characters as possible.

Sample Program:
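
A minimal sketch of a program along these lines, assuming Python 3. The helper name show and the test strings are illustrative assumptions rather than the handout's own program:

```python
import re

def show(pattern, string):
    """Return the first match of pattern in string, or None if there is none."""
    match = re.search(pattern, string)
    return match.group() if match else None

# The two example patterns: a dollar amount and a simple email address.
print(show(r"\$[0-9.]+", "The ticket costs $12.50 today"))
# $12.50
print(show(r"\w+@(\w+\.)+(com|org|net|edu)", "write to jane@mail.example.org soon"))
# jane@mail.example.org

# Greedy vs. non-greedy on the same input.
html = "<b>bold</b> and <i>italic</i>"
print(show(r"<.+>", html))    # <b>bold</b> and <i>italic</i>   (greedy: as much as possible)
print(show(r"<.+?>", html))   # <b>                             (non-greedy: as little as possible)

# search() returns None when nothing matches, so check before calling group().
print(show(r"\d+", "no digits here"))   # None
```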

Sample output:
1. string= 'ACAAGATGCCATTGTCCCCCGGCCTCCTGC'
   pattern= GATGC
   found GATGC
2. string= 'ACAAGATGCCATTGTCCCCCGGCCTCCTGC'
   pattern= G.*
   found GATGCCATTGTCCCCCGGCCTCCTGC
3. string= 'ACAAGATGCCATTGTCCCCCGGCCTCCTGC'
   pattern= TC+
   found TCCCCC
4. string= 'ACAAGATGCCATTGTCCCCCGGCCTCCTGC'
   pattern= ^A[\w\W]*
   found ACAAGATGCCATTGTCCCCCGGCCTCCTGC
5. string = 'ABC\tDEF'
   pattern= [^A-Z]
   found (\t (tab) is matched)
6. string = 'ABC\\tDEF'
   pattern= [^A-Z]+
   found \t
7. string = 'ABCabcd'
   pattern= [^A-Z]
   found a
8. string = 'ABC'
   pattern= [^A-Z]+?
   match not found
9. string = 'ABC'
   pattern= ([^A-Z]+)?
   found (an empty match: + is greedy, but the trailing ? makes the whole group optional, so the pattern also matches zero characters)
10. string = 'aabababaaa'
   pattern= (ba)+    ## or it can be written as (ba)(ba)*
   found bababa
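
The cases listed in the sample output can be reproduced with a short script such as the following sketch. The helper name report and the layout of the cases list are assumptions made for illustration:

```python
import re

dna = "ACAAGATGCCATTGTCCCCCGGCCTCCTGC"

# (pattern, string) pairs taken from the sample output, in order.
cases = [
    (r"GATGC", dna),
    (r"G.*", dna),
    (r"TC+", dna),
    (r"^A[\w\W]*", dna),
    (r"[^A-Z]", "ABC\tDEF"),     # the string contains a real tab character
    (r"[^A-Z]+", "ABC\\tDEF"),   # the string contains a backslash followed by t
    (r"[^A-Z]", "ABCabcd"),
    (r"[^A-Z]+?", "ABC"),
    (r"([^A-Z]+)?", "ABC"),      # the optional group matches the empty string
    (r"(ba)+", "aabababaaa"),
]

def report(pattern, string):
    """Return the matched text, or None when the search fails."""
    match = re.search(pattern, string)
    return None if match is None else match.group()

results = [report(pattern, string) for pattern, string in cases]

for number, result in enumerate(results, start=1):
    if result is None:
        print(number, "match not found")
    else:
        print(number, "found", repr(result))
```

Printing with repr makes the tab in case 5 and the empty match in case 9 visible.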