
REGEX

Basic Import and compile

import re
pattern = re.compile("hello")
# with flags [ here I is ignore case ]
pattern = re.compile("hello", flags=re.I)
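As a quick illustration of what the flag changes, here is a minimal sketch; the names `strict` and `relaxed` and the sample string are made up for the example.

```python
import re

strict = re.compile("hello")                 # case-sensitive by default
relaxed = re.compile("hello", flags=re.I)    # re.I is short for re.IGNORECASE

strict_hit = strict.search("Say HELLO")      # None: the case differs
relaxed_hit = relaxed.search("Say HELLO")    # match object for "HELLO"
```

A compiled pattern can be reused for many calls, which is the main reason to call re.compile up front.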

Python functions
match
Determine if the RE matches at the beginning of the string
search
Scan through a string, looking for the first location where this RE matches
findall
Find all non-overlapping substrings where the RE matches and return them as a list
finditer
Find all non-overlapping substrings where the RE matches and return them as an iterator of match objects
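The difference between the four functions is easiest to see side by side. The following is only a sketch; the sample string and the variable names are assumptions chosen for illustration.

```python
import re

pattern = re.compile(r"\d+")
text = "abc 12 def 345"

# match only succeeds if the pattern matches at position 0
at_start = pattern.match(text)        # None, because text starts with "abc"

# search returns the first match anywhere in the string
first = pattern.search(text)          # match object for "12"

# findall returns every non-overlapping match as a list of strings
all_numbers = pattern.findall(text)   # ['12', '345']

# finditer yields match objects, which also carry positions
spans = [m.span() for m in pattern.finditer(text)]   # [(4, 6), (11, 14)]
```

match and search return None when nothing matches, so their result should be checked before calling .group() on it.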

RE metacharacters
There are twelve metacharacters that should be escaped if they are to be used with their literal meaning:

Backslash \ (Escape)
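Caret ^ (Negates the set when it is the first character inside a character class. Matches the start of the string, or of each line in multiline mode, when used outside it)
Dollar sign $ (Matches the end of the string, or of each line in multiline mode)
Dot . (Any character except newline)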
Pipe symbol | (Or)
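Question mark ? (1: Zero or one 2: Lazy, when placed after another quantifier 3: Lookaround and other group extensions, when placed after an opening parenthesis)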
Asterisk * (Zero or more)
Plus sign + (One or more)
Opening parenthesis ( (Group Start)
Closing parenthesis ) (Group End)
Opening square bracket [ (Start of character class)
Opening curly brace { (Start of quantifier range)
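A short sketch of escaping in practice: a metacharacter can be escaped by hand with a backslash, or a whole literal string can be passed through re.escape. The sample text and variable names are invented for the example.

```python
import re

price = "cost: $5.00 (approx.)"

# Escape by hand: \$ and \. match a literal dollar sign and a literal dot
manual = re.search(r"\$5\.00", price)

# re.escape adds the backslashes for every metacharacter in a literal string
literal = "(approx.)"
escaped = re.escape(literal)          # \(approx\.\)
auto = re.search(escaped, price)
```

Writing patterns as raw strings (r"...") keeps Python from interpreting the backslashes before the regex engine sees them.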

Character Classes shorthands

Element Description

. This element matches any character except newline

\d This matches any decimal digit; this is equivalent to the class [0-9]

\D This matches any non-digit character; this is equivalent to the class [^0-9]

\s This matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v]

\S This matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v]

\w This matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_]

\W This matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_]
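The shorthands above can be tried out on a small string. This is a minimal sketch with a made-up sample; the variable names are illustrative only.

```python
import re

sample = "Room 42, floor_3\tEast"

digits = re.findall(r"\d", sample)       # ['4', '2', '3']
words = re.findall(r"\w+", sample)       # ['Room', '42', 'floor_3', 'East']
spaces = re.findall(r"\s", sample)       # [' ', ' ', '\t']
non_word = re.findall(r"\W", sample)     # [' ', ',', ' ', '\t']
```

Note that for str patterns in Python 3, \d, \w and \s also match non-ASCII digits, letters and whitespace; the ASCII-only classes listed in the table apply when the re.ASCII flag is set.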

Alternation
| The pipe symbol works as "or": the pattern matches if any one of the alternative REs it separates matches
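A brief sketch of the pipe in use, with sample strings invented for the example. It also shows that the pipe splits the whole pattern unless it is wrapped in a group.

```python
import re

# Each side of the pipe is a complete alternative
pets = re.findall(r"cat|dog", "cat, dog, bird")        # ['cat', 'dog']

# Parentheses limit the alternation to part of the pattern
pattern = re.compile(r"gr(a|e)y")
spellings = [m.group() for m in pattern.finditer("gray grey griy")]
# ['gray', 'grey']
```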

Quantifiers
Symbol Name Quantification of previous character

? Question Mark Optional (0 or 1 repetitions)

* Asterisk Zero or more times

+ Plus Sign One or more times

{n,m} Curly Braces Between n and m times

Syntax Description

{n} The previous character is repeated exactly n times.

{n,} The previous character is repeated at least n times.

{,n} The previous character is repeated at most n times.

{n,m} The previous character is repeated between n and m times (both inclusive).
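The quantifiers in the two tables can be checked with a few one-liners. This is only a sketch; the sample strings and variable names are assumptions for illustration.

```python
import re

numbers = "7 42 123 4567"

optional = re.findall(r"colou?r", "color colour")   # ['color', 'colour']
star = re.fullmatch(r"ab*", "a")                    # matches: zero b's is fine
plus = re.fullmatch(r"ab+", "a")                    # None: needs at least one b

exact = re.findall(r"\b\d{3}\b", numbers)           # ['123']
at_least = re.findall(r"\b\d{2,}\b", numbers)       # ['42', '123', '4567']
ranged = re.findall(r"\b\d{2,3}\b", numbers)        # ['42', '123']
at_most = re.fullmatch(r"\d{,2}", "123")            # None: three digits is too many
```

A quantifier applies to the single character, character class, or group directly before it, so (ab)+ repeats "ab" while ab+ repeats only "b".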

Boundary Matchers

Matcher Description

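^ Matches at the beginning of the string (and at the beginning of each line when re.MULTILINE is set)

$ Matches at the end of the string or just before a trailing newline (and at the end of each line when re.MULTILINE is set)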

\b Matches a word boundary

\B Matches the opposite of \b: any position that is not a word boundary

\A Matches the beginning of the input

\Z Matches the end of the input
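The boundary matchers consume no characters; they only assert a position. A minimal sketch follows, with sample strings chosen for the example, showing how ^ depends on the re.MULTILINE flag while \A does not.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ only matches at the very start of the string
default_starts = re.findall(r"^\w+", text)               # ['first']

# With re.MULTILINE (re.M), ^ also matches right after each newline
line_starts = re.findall(r"^\w+", text, flags=re.M)      # ['first', 'second']

# \A always means the start of the input, whatever flags are set
input_start = re.findall(r"\A\w+", text, flags=re.M)     # ['first']

words = "line inline lines"

# \b on both sides matches "line" only as a whole word
whole = re.findall(r"\bline\b", words)                   # ['line']

# \B requires that "line" does NOT start at a word boundary
inner = [m.start() for m in re.finditer(r"\Bline", words)]   # [7]
```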

Greedy and non-greedy


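Quantifiers are greedy by default: they match as much text as possible. The non-greedy (or reluctant) behaviour, which matches as little as possible, can be requested by adding an extra question mark to the quantifier, for example ??, *? or +?.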
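The classic way to see the difference is matching tags in a short piece of markup. This is a sketch with an invented sample string, not a recommendation to parse HTML with regular expressions.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the string
greedy = re.findall(r"<.*>", html)    # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first ">" it can
lazy = re.findall(r"<.*?>", html)     # ['<b>', '</b>', '<i>', '</i>']
```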
