
BIGDATA

REGULAR EXPRESSION

Regular Expression or RegEx or RegExp

 A regular expression is a special text string that describes a search pattern.


 Regular expressions are used in search engines and in the search-and-replace dialogs of
word processors and text editors. Many programming languages provide regex capabilities,
either built in or via libraries.

1 abc… Letters

2 123… Digits

3 \d Any Digit

4 \D Any Non-digit character

5 . Any Character

6 \. Period

7 [abc] Only a, b, or c

8 [^abc] Not a, b, nor c

9 [a-z] Characters a to z

10 [0-9] Numbers 0 to 9

11 \w Any Alphanumeric character or underscore

12 \W Any character that is not alphanumeric or underscore

13 {m} m Repetitions

14 {m,n} m to n Repetitions

15 * Zero or more repetitions

16 + One or more repetitions

17 ? Optional character

18 \s Any Whitespace

1 sairavi.bigdata@gmail.com
99520 29030
19 \S Any Non-whitespace character

20 ^…$ Starts and ends

21 (…) Capture Group

22 (a(bc)) Capture Sub-group

23 (.*) Capture all

24 (abc|def) Matches abc or def
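
A few rows of the table can be tried out directly. The sketch below assumes Python and its built-in re module (the handout does not name a language); the sample strings are made up for illustration.

```python
import re

text = "Order 66 shipped"  # hypothetical sample text

# \d matches one digit at a time
digits = re.findall(r"\d", text)          # ['6', '6']

# \S+ matches runs of non-whitespace characters
words = re.findall(r"\S+", text)          # ['Order', '66', 'shipped']

# (a(bc)) has an outer capture group and a nested sub-group
m = re.search(r"(a(bc))", "xxabcxx")
outer, inner = m.group(1), m.group(2)     # 'abc', 'bc'
```

Groups are numbered by the position of their opening parenthesis, so the outer group is 1 and the nested one is 2.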

 The Dot ( . )

The dot ( . ) matches any single character (in most engines, any character except a line break).

 The ' .* ' matches zero or more of any character.

 The ' . ' matches exactly one character.

 The ' .. ' matches exactly two characters.
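
The dot patterns can be compared side by side. This is a minimal sketch assuming Python's re module; the sample words are hypothetical.

```python
import re

sample = "cat cot ct coat"  # hypothetical sample text

# 'c.t' needs exactly one character between c and t
one = re.findall(r"c.t", sample)      # ['cat', 'cot']

# 'c..t' needs exactly two characters between c and t
two = re.findall(r"c..t", sample)     # ['coat']

# '.*' takes zero or more characters, as many as it can
rest = re.search(r"a.*", "cat cot").group()   # 'at cot'

# zero characters is also acceptable for '.*'
empty_ok = re.fullmatch(r"ab.*", "ab") is not None   # True
```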

 A literal pattern such as 'th' matches the character 't' followed directly by 'h'.
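
A literal search for 'th' might look like the following in Python (an assumed language; the sentence is a made-up sample). It lists where each match starts.

```python
import re

sentence = "the other path"  # hypothetical sample text

# finditer yields one match object per occurrence of the literal 'th'
positions = [m.start() for m in re.finditer(r"th", sentence)]   # [0, 5, 12]
```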

 Range of digits: [0-9]

 Range of lowercase letters: [a-z]


 Range of uppercase letters: [A-Z]

 Range of digits, lowercase letters, and uppercase letters: [0-9a-zA-Z]
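
The four kinds of range can be sketched as follows, assuming Python's re module and a hypothetical sample string. Each bracket expression matches one character from the listed ranges.

```python
import re

sample = "Room 42B, floor 7a"  # hypothetical sample text

digits = re.findall(r"[0-9]", sample)        # ['4', '2', '7']
lower = re.findall(r"[a-z]", sample)         # every lowercase letter, one per match
upper = re.findall(r"[A-Z]", sample)         # ['R', 'B']

# several ranges can share one pair of brackets; '+' joins adjacent hits into runs
alnum = re.findall(r"[0-9a-zA-Z]+", sample)  # ['Room', '42B', 'floor', '7a']
```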

 Range with repetitions {m}: the range must match exactly m times in a row.
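
As a small illustration of a range followed by {m}, assuming Python's re module and made-up input:

```python
import re

# [0-9]{3} needs exactly three consecutive digits for each match
triples = re.findall(r"[0-9]{3}", "12 345 6789")   # ['345', '678']
```

"12" is too short to match, and in "6789" only the first three digits form a match; the leftover "9" does not.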

 Range with zero or more Repetitions ' * '
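
A range followed by ' * ' may match nothing at all. A minimal sketch, assuming Python's re module and hypothetical input:

```python
import re

# 'ab' followed by zero or more digits
matches = re.findall(r"ab[0-9]*", "ab ab1 ab123 a")   # ['ab', 'ab1', 'ab123']
```

The plain "ab" still matches because zero digits are allowed; the lone "a" does not, since the literal "ab" is required.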

Negated set [^ ]

 Matches any character that is not in the set.
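
For example, using the [^abc] set from the table (Python's re module and the input string are assumptions):

```python
import re

# every character except a, b, or c
others = re.findall(r"[^abc]", "aXbYcZ")   # ['X', 'Y', 'Z']
```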

Plus ( + )

 Matches 1 or more of the preceding token.
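
A short sketch of ' + ', assuming Python's re module and made-up input. Unlike ' * ', at least one occurrence is required.

```python
import re

# one or more digits, grouped into runs
runs = re.findall(r"[0-9]+", "a1b22c333")   # ['1', '22', '333']

# 'ab+' needs at least one 'b', so a lone 'a' does not match
lone_a = re.search(r"ab+", "a")             # None
```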


 Alternation ( | ) and Optional ( ? )

Alternation ( | ) acts like a boolean OR: it matches the expression before or after the |.
Optional ( ? ) matches 0 or 1 of the preceding token, effectively making it optional.
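
Alternation can be tried with the (abc|def) pattern from the table. The sketch assumes Python's re module; the input string is hypothetical.

```python
import re

# either alternative is accepted at each position
hits = [m.group(0) for m in re.finditer(r"(abc|def)", "abc xyz def")]   # ['abc', 'def']
```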

 Optional ( ? )
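
A common illustration of ' ? ' is a word with two spellings. This sketch assumes Python's re module and a made-up input.

```python
import re

# the 'u' may appear once or not at all
spellings = re.findall(r"colou?r", "color colour colouur")   # ['color', 'colour']
```

"colouur" is not matched because ' ? ' allows at most one 'u'.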

 Quantifier {m,n}

Matches the specified quantity of the previous token. For example, {1,2} matches 1 to 2 repetitions.
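
The {1,2} case can be sketched like this, assuming Python's re module and hypothetical input:

```python
import re

# one or two consecutive 'a' characters per match, preferring two
runs = re.findall(r"a{1,2}", "a aa aaa")   # ['a', 'aa', 'aa', 'a']
```

The run "aaa" is split into "aa" and "a" because a single match cannot exceed the upper bound of 2.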

 Anchors ( ^ $ )

' ^ ' matches the beginning of the string, or the beginning of each line when multiline mode is enabled.
' $ ' matches the end of the string, or the end of each line when multiline mode is enabled.
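
The effect of the anchors, with and without multiline mode, might be checked as follows. Python's re module and the sample strings are assumptions; other engines switch on multiline mode differently.

```python
import re

at_start = re.search(r"^cat", "cat food") is not None    # True
not_at_start = re.search(r"^cat", "a cat") is None        # True: 'cat' is not at the start
at_end = re.search(r"food$", "cat food") is not None      # True

two_lines = "cat\ncow"
# default mode: '^' only matches at the start of the whole string
string_start = re.findall(r"^c\w+", two_lines)                     # ['cat']
# multiline mode: '^' also matches at the start of each line
line_starts = re.findall(r"^c\w+", two_lines, flags=re.MULTILINE)  # ['cat', 'cow']
```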

 Regular expression for an email ID
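
The handout's own email pattern is not reproduced in the text, so the following is only a simplified sketch under stated assumptions: Python's re module, a hypothetical helper named is_email, and a pattern that accepts a local part, an '@', and a domain with at least one dot. Real-world address rules are considerably broader.

```python
import re

# Simplified pattern (an assumption, not the handout's):
# local part, '@', then a domain containing at least one dot.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")

def is_email(candidate: str) -> bool:
    """Return True when the whole string fits the simplified pattern."""
    return EMAIL.fullmatch(candidate) is not None
```

With this sketch, is_email("user.name@example.com") is True, while is_email("user@example") and is_email("not an email") are False.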

 To detect the string 200 6248:

[0-9] -> any digit in the range 0 to 9
{3} -> exactly 3 repetitions of the preceding digit range
\s -> whitespace character (here, the space between the two groups)
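
Putting these pieces together, one plausible pattern is [0-9]{3}\s[0-9]{4}. The handout lists only the components, so the trailing [0-9]{4} and the use of Python's re module are assumptions.

```python
import re

# three digits, one whitespace character, four digits (assembled pattern is an assumption)
PHONE = r"[0-9]{3}\s[0-9]{4}"

found = re.search(PHONE, "call 200 6248 now").group()   # '200 6248'
without_space = re.search(PHONE, "2006248")              # None: \s needs a whitespace character
```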

 To detect a string made of 3-digit groups separated by dots:

[0-9] -> any digit in the range 0 to 9
{3} -> exactly 3 repetitions of the preceding digit range
\. -> escaped dot, matching a literal ' . ' used as the separator
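
The target string for this example is not preserved in the text, so the sketch below uses a hypothetical one, 123.456.789, and an assumed pattern built from the listed components (Python's re module is also an assumption). It also shows why the dot has to be escaped.

```python
import re

# escaped dots only accept a literal '.' between the digit groups
DOTTED = r"[0-9]{3}\.[0-9]{3}\.[0-9]{3}"

dotted_ok = re.fullmatch(DOTTED, "123.456.789") is not None     # True
dashes_rejected = re.fullmatch(DOTTED, "123-456-789") is None   # True

# an unescaped dot accepts any separator character
loose = re.fullmatch(r"[0-9]{3}.[0-9]{3}", "123-456") is not None   # True
```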

 To detect a string that combines 4-digit groups with the characters ' [ ', ' ] ' and ' : ':

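[0-9] -> any digit in the range 0 to 9
{4} -> exactly 4 repetitions of the preceding digit range
\[ -> escaped bracket, matching a literal ' [ '
\] -> escaped bracket, matching a literal ' ] '
\: -> literal ' : ' (the backslash is optional, since a colon is not a special character)
\W -> any non-alphanumeric character (a character class, not an escape)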
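
The target string for this example is not preserved either. As a sketch only, the components can be combined to match a hypothetical string such as [1024]: 2048; both the sample and the assembled pattern are assumptions, written for Python's re module.

```python
import re

# '[', four digits, ']', ':', one non-alphanumeric character, four digits
BRACKETED = r"\[[0-9]{4}\]\:\W[0-9]{4}"

with_space = re.fullmatch(BRACKETED, "[1024]: 2048") is not None   # True: \W matches the space
with_hash = re.fullmatch(BRACKETED, "[1024]:#2048") is not None    # True: \W matches '#'
with_letter = re.fullmatch(BRACKETED, "[1024]:a2048")              # None: 'a' is alphanumeric
```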
