Unit - 4 Regex

The document provides an overview of regular expressions (regex) in Python, detailing their definition, applications, and how to create them using the re module. It covers various functions such as re.match(), re.search(), re.findall(), and re.finditer(), explaining their purposes and differences. Additionally, it discusses character classes, predefined character classes, and regex metacharacters, along with practical examples of usage.

Uploaded by

SHADOW GAMING

REGULAR EXPRESSIONS AND OOP CONCEPTS

UNIT – 4 | I SEM | MCA |2024-26 BATCH | RIT

REGEX IN PYTHON

• What is a regular expression?
• Applications of regular expressions
• How to create regular expressions in Python
REGULAR EXPRESSION

• A regular expression (regex) is a sequence of characters that defines a search pattern.

• It is commonly used for string matching, searching, and replacing text.

• In other words, it is a compact notation for describing the kind of text to look for inside a larger body of text.

• Python provides the re module to work with regular expressions.

APPLICATIONS OF REGULAR EXPRESSIONS

• Data validation
  • Example: mobile number validation, email validation, etc.
• Data extraction
  • Specific information can be extracted from data
• Data cleaning and web scraping
• Text search features
  • Ctrl+F find and replace, grep commands (UNIX), and pattern matching similar to the LIKE operator in SQL
• Building translators: compilers, interpreters, assemblers
  • For syntax analysis and lexical analysis
• Password policies
• NLP, to identify specific patterns in data
BASIC SEARCH FUNCTIONS

• search()
• match()
• finditer()
• findall()
re.match()

• Purpose: checks for a pattern at the beginning of a string.

• Syntax: re.match(pattern, string, flags=0)
  • pattern: the regular expression pattern to search for
  • string: the input string in which to search for the pattern
  • flags: optional modifiers such as re.IGNORECASE (default 0)
• Returns: a match object if the pattern matches at the beginning of the string; otherwise None.
Using the re Module in Python

Python’s re module provides powerful tools for regex operations.

re.match() – Matches the Beginning of a String. It only checks the start of the string.

import re

pattern = r"Hello"
text = "Hello, world!"

match = re.match(pattern, text)

if match:
    print("Match found!")
else:
    print("No match")
# Output: Match found!

.group() → Returns the actual match.
.span() → Returns the start and end positions of the match.

if match:
    print("Matched text:", match.group())             # Returns matched text ("Hello")
    print("Start and End positions:", match.span())   # Returns (0, 5)
What is a Raw String (r"")?

• In Python, the r prefix before a string literal (as in r"^\d$") makes it a raw string literal.

• In a normal string, a backslash (\) starts an escape sequence (e.g., "\n" for a newline, "\t" for a tab).

• A raw string (r"...") tells Python not to interpret backslashes as escape sequences.

• Regex syntax uses \d, \s, \b, etc., where the backslash has a special meaning to the regex engine. Writing the pattern as r"..." passes the backslash through to the regex engine unchanged.

• Prefer r"..." for regex patterns to avoid surprises, such as "\b" being read as a backspace character instead of a word boundary.
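The difference between a normal string and a raw string can be seen in a minimal sketch. The sample text and variable names here are illustrative assumptions, not part of the original slides.

```python
import re

# "\b" in a normal string is the backspace character (one character),
# while r"\b" keeps the backslash so the regex engine sees a word boundary.
normal = "\bcat\b"
raw = r"\bcat\b"

normal_len = len(normal)   # 5: backspace, c, a, t, backspace
raw_len = len(raw)         # 7: backslash, b, c, a, t, backslash, b

text = "the cat sat"       # sample text (assumed)
raw_match = re.search(raw, text)         # finds the word "cat"
normal_match = re.search(normal, text)   # None: the text has no backspace characters

print(normal_len, raw_len)
print(raw_match.group() if raw_match else None)
print(normal_match)
```

The two patterns look identical in source code but reach the regex engine as different strings, which is why the raw form is the safer habit.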

re.search()

• Purpose: the search() function in the re module scans a string for the first occurrence of a pattern.
• Syntax: re.search(pattern, string, flags=0)
  • pattern: the regular expression pattern to search for
  • string: the input string in which to search for the pattern
• Returns: a match object if a match is found, or None if there is none.
re.search() – Finds the First Match Anywhere

Unlike match(), search() checks the entire string.

import re

pattern = r"world"
text = "Hello, world!"

match = re.search(pattern, text)

if match:
    print("Match found!")

# Output: Match found!
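To make the contrast with match() concrete, here is a small sketch that runs both functions on the same text. The variable names are chosen for illustration only.

```python
import re

text = "Hello, world!"

# match() only tries the pattern at index 0, so "world" is not found.
at_start = re.match(r"world", text)

# search() scans forward and finds "world" later in the string.
anywhere = re.search(r"world", text)

print(at_start)          # None
print(anywhere.span())   # (7, 12)
```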

In Python, you can use regular expressions in two ways:
1. Directly as a string pattern

You pass a raw string directly to functions like re.search(), re.match(), etc.

2. Using Regular Expression Objects

You first compile the pattern using re.compile(), creating a reusable regex
object. This is useful for repeated searches.
Using Regular Expression Objects

import re

pattern = re.compile(r"World") # Compile the regex pattern

text = "Hello, World!"

match = pattern.search(text) # Using the compiled object

if match:
    print("Found:", match.group())

# Output: Found: World
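A compiled object pays off when the same pattern is applied to many strings. The following is a sketch under that assumption; the list of lines and the use of re.IGNORECASE are illustrative choices, not taken from the slides.

```python
import re

# Compile once; re.IGNORECASE lets the pattern match "World", "world", "WORLD", ...
word = re.compile(r"world", re.IGNORECASE)

lines = ["Hello, World!", "goodbye world", "nothing here"]   # sample data (assumed)

# Reuse the same compiled object for every line.
hits = [line for line in lines if word.search(line)]

print(hits)   # ['Hello, World!', 'goodbye world']
```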
re.finditer()

• Purpose: re.finditer() returns an iterator yielding match objects for all non-overlapping occurrences of a pattern in a string.
• Syntax: re.finditer(pattern, string, flags=0)
  • pattern: the regular expression pattern to search for
  • string: the input string in which to search for the pattern
• Returns: an iterator of match objects, one per match.
re.finditer() – Returns Matches as an Iterator

import re

pattern = r"Hello"
text = "Hello, world!"

matches = re.finditer(pattern, text)
for match in matches:
    print(match.group())

# Output: Hello

import re

pattern = re.compile('ab', re.IGNORECASE)
data = 'abaababa'

match_iter = re.finditer(pattern, data)
count = 0
for match in match_iter:
    count += 1
    print(f"start:{match.start()}, end:{match.end()}, element:{match.group()}")
print("total:", count)

# Output:
# start:0, end:2, element:ab
# start:3, end:5, element:ab
# start:5, end:7, element:ab
# total: 3

Useful when handling large data, as it yields results lazily.

re.findall()

• Purpose: re.findall() returns a list of all non-overlapping matches of a pattern in a string.
• Syntax: re.findall(pattern, string, flags=0)
  • pattern: the regular expression pattern to search for
  • string: the input string in which to search for the pattern
• Returns: a list containing all matching substrings. If the pattern contains capture groups, the list holds the group contents instead (tuples when there is more than one group).
re.findall() – Returns All Matches in a List

import re

pattern = r"[0-9]"  # Find every individual digit

text = "My number is 123 and my friend's is 456"

matches = re.findall(pattern, text)

print(matches) # Output: ['1', '2', '3', '4', '5', '6']

import re

pattern = re.compile('ab', re.IGNORECASE)

data = 'abaababa'

match_list = re.findall(pattern, data)

print(match_list) # Output: ['ab', 'ab', 'ab']
DIFFERENCE BETWEEN findall() AND finditer()

Both re.findall() and re.finditer() are used to search for all occurrences of a
pattern in a string, but they differ in how they return results.

| Feature | re.findall() | re.finditer() |
|---|---|---|
| Return type | Returns a list of matching substrings. | Returns an iterator yielding match objects. |
| Memory usage | Stores all matches in a list (higher memory usage for large data). | Uses an iterator (more memory-efficient). |
| Accessing match info | Only returns matched substrings, no details like position. | Provides full match details (start, end, groups). |
| Use case | When only matched strings are needed. | When additional match details (index, groups) are needed. |
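The comparison can be illustrated by running both functions on the same input. This is a minimal sketch; the second sample string and the key=value pattern are assumptions added to show how findall() behaves with capture groups.

```python
import re

data = "abaababa"

# findall(): just the matched substrings.
substrings = re.findall(r"ab", data)

# finditer(): match objects, so positions are available as well.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r"ab", data)]

# With two capture groups, findall() returns tuples of the group contents.
pairs = re.findall(r"(\w+)=(\d+)", "a=1 b=22")   # sample input (assumed)

print(substrings)   # ['ab', 'ab', 'ab']
print(spans)        # [(0, 2, 'ab'), (3, 5, 'ab'), (5, 7, 'ab')]
print(pairs)        # [('a', '1'), ('b', '22')]
```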
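Understanding Non-Overlapping Matches in re.findall() and re.finditer()

In re.findall() and re.finditer(), matches are non-overlapping: once a match is found, the search resumes after the end of that match rather than inside it. In "ababab", a second "aba" starts at index 2, but it overlaps the first match (indices 0 to 2), so it is not reported.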

import re

data = "ababab"
matches = re.findall(r"aba", data)
print(matches)

#Output: ['aba']
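If overlapping occurrences are actually wanted, one common workaround is a lookahead assertion, which matches without consuming characters. This technique is not covered in the slides; the sketch below is offered as an aside.

```python
import re

data = "ababab"

# Normal search: the second "aba" (starting at index 2) is skipped.
non_overlapping = re.findall(r"aba", data)

# (?=...) is a zero-width lookahead, so the scan advances one character
# at a time and the capture group reports each occurrence.
overlapping = re.findall(r"(?=(aba))", data)
starts = [m.start() for m in re.finditer(r"(?=aba)", data)]

print(non_overlapping)   # ['aba']
print(overlapping)       # ['aba', 'aba']
print(starts)            # [0, 2]
```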
CHARACTER CLASS IN PYTHON REGEX

• A character class is a set of characters that you define within a regular expression.
• Character classes are used to specify a range or group of characters to search for in data.
• These classes help in defining flexible patterns for text searching and validation.
Character Classes in Python Regex

Square Brackets [ ]
• Used to define a set of characters.
• Example: [abc] matches 'a', 'b', or 'c'.
Range of Characters
• [a-z] → Matches any lowercase letter (a to z).
• [A-Z] → Matches any uppercase letter (A to Z).
• [0-9] → Matches any digit (0 to 9).
Negation [^ ] (Caret Inside Brackets)
• Matches anything except the characters inside the brackets.
• Example: [^0-9] matches anything except digits.
Predefined Character Classes
• \d → Matches any digit. Equivalent to [0-9] for ASCII text; on Python 3 str patterns it also matches other Unicode decimal digits unless re.ASCII is set.
• \D → Matches any non-digit character (opposite of \d).
• \w → Matches any word character (letters, digits, underscore). Equivalent to [a-zA-Z0-9_] for ASCII text; on str patterns it also matches Unicode letters and digits unless re.ASCII is set.
• \W → Matches any non-word character (opposite of \w).
• \s → Matches any whitespace character (space, tab, newline, and similar).
• \S → Matches any non-whitespace character.
Custom Character Classes (Examples)
• [aeiou] → Matches any lowercase vowel.
• [13579] → Matches any odd digit.
• [02468] → Matches any even digit.
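The classes listed above can be tried on a single sample string. This is a brief sketch; the sample text is an assumption chosen so that each class matches something.

```python
import re

text = "Unit 4: regex_in Python 3"   # sample text (assumed)

digits = re.findall(r"\d", text)          # individual digits
words = re.findall(r"\w+", text)          # runs of word characters
vowels = re.findall(r"[aeiou]", text)     # lowercase vowels only
odd_digits = re.findall(r"[13579]", text)
chunks = re.findall(r"\S+", text)         # runs of non-whitespace

print(digits)       # ['4', '3']
print(words)        # ['Unit', '4', 'regex_in', 'Python', '3']
print(vowels)       # ['i', 'e', 'e', 'i', 'o']
print(odd_digits)   # ['3']
print(chunks)       # ['Unit', '4:', 'regex_in', 'Python', '3']
```

Note that [aeiou] is case-sensitive, so the capital "U" in "Unit" is not matched.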
| Regex pattern | Meaning | Example | Matches | Does not match |
|---|---|---|---|---|
| \b | Word boundary (start or end of a word) | \bcat\b | "cat" in "The cat is here" | "caterpillar", "wildcat" |
| \A | Matches only at the start of a string | \AHello | "Hello world" | "world Hello" |
| \Z | Matches only at the end of a string | tutorial\Z | "This is a tutorial" | "tutorial on regex" |
| . | Matches any single character except a newline (unless re.DOTALL is set) | | | |
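The rows of the table above can be checked directly. The sketch below reuses the table's own example strings; the helper name found() and the "a\nb" input for the dot are assumptions for illustration.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern occurs anywhere in text."""
    return re.search(pattern, text) is not None

boundary_hit = found(r"\bcat\b", "The cat is here")    # True
boundary_miss1 = found(r"\bcat\b", "caterpillar")      # False
boundary_miss2 = found(r"\bcat\b", "wildcat")          # False

start_hit = found(r"\AHello", "Hello world")           # True
start_miss = found(r"\AHello", "world Hello")          # False

end_hit = found(r"tutorial\Z", "This is a tutorial")   # True
end_miss = found(r"tutorial\Z", "tutorial on regex")   # False

# The dot skips the newline by default and includes it with re.DOTALL.
dot_default = re.findall(r".", "a\nb")             # ['a', 'b']
dot_all = re.findall(r".", "a\nb", re.DOTALL)      # ['a', '\n', 'b']

print(boundary_hit, start_hit, end_hit)
print(dot_default, dot_all)
```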

Find digits in given data

import re

pattern = r'[0-9]'
data = "The price is $."

match_list = re.findall(pattern, data)

if match_list:
    print("digits present")
else:
    print("not present")

# Output: not present

import re

pattern = r'[0-9]'
data = "The price is $100."

match_iter = re.finditer(pattern, data)

for match in match_iter:
    print(match)   # prints one match object per digit: '1', '0', '0'
Table 1: Basic Regex Metacharacters

| Symbol | Description |
|---|---|
| . | Matches any character except a newline |
| ^ | Matches the start of a string |
| $ | Matches the end of a string |
| * | Matches 0 or more occurrences of the preceding character or group |
| + | Matches 1 or more occurrences of the preceding character or group |
| ? | Matches 0 or 1 occurrence of the preceding character or group |
| {n} | Matches exactly n occurrences |
| {n,} | Matches n or more occurrences |
| {n,m} | Matches between n and m occurrences |
| \ | Escape character (e.g., \. matches a literal dot .) |
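The quantifiers in Table 1 can be compared on one sample string, followed by a small validation example of the kind mentioned under applications. The validation rule used here (exactly 10 digits, the first being 6 to 9) is an assumed format for illustration, not a rule stated in the slides, and is_valid_mobile is a hypothetical helper name.

```python
import re

sample = "a ab abb abbb"   # sample text (assumed)

star = re.findall(r"ab*", sample)              # 0 or more b
plus = re.findall(r"ab+", sample)              # 1 or more b
optional = re.findall(r"ab?", sample)          # 0 or 1 b
exactly_two = re.findall(r"ab{2}", sample)     # exactly 2 b
two_or_three = re.findall(r"ab{2,3}", sample)  # between 2 and 3 b

print(star)          # ['a', 'ab', 'abb', 'abbb']
print(plus)          # ['ab', 'abb', 'abbb']
print(optional)      # ['a', 'ab', 'ab', 'ab']
print(exactly_two)   # ['abb', 'abb']
print(two_or_three)  # ['abb', 'abbb']

# Assumed rule: 10 digits in total, first digit between 6 and 9.
MOBILE = re.compile(r"[6-9][0-9]{9}")

def is_valid_mobile(number: str) -> bool:
    # fullmatch() requires the whole string to match, not just a prefix.
    return MOBILE.fullmatch(number) is not None

print(is_valid_mobile("9876543210"))    # True
print(is_valid_mobile("1234567890"))    # False
```

Using fullmatch() rather than match() avoids accepting strings that merely start with a valid number.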
Table 2: Character Classes and Groups

| Pattern | Description |
|---|---|
| \d | Matches any digit (0-9) |
| \D | Matches any non-digit character |
| \w | Matches any word character (a-z, A-Z, 0-9, _) |
| \W | Matches any non-word character |
| \s | Matches any whitespace (space, tab, newline) |
| \S | Matches any non-whitespace character |
| [abc] | Matches any one of a, b, or c |
| [^abc] | Matches anything except a, b, or c |
| \b | Matches word boundaries (e.g., \bword\b matches the word "word" exactly) |
re.sub() – Replaces Text in a String

import re

text = "Python is fun!"

new_text = re.sub(r"Python", "Java", text)

print(new_text)

# Output: Java is fun!
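Beyond literal replacement, re.sub() also accepts backreferences to capture groups in the replacement and a count argument that limits the number of substitutions. A short sketch follows; the date strings and the whitespace example are assumptions for illustration.

```python
import re

text = "2024-01-15 and 2025-12-31"   # sample dates (assumed)

# \1, \2, \3 in the replacement refer to the three capture groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# count=1 replaces only the first occurrence.
first_only = re.sub(r"\d{4}", "YYYY", text, count=1)

# Collapse any run of whitespace into a single space (data cleaning).
cleaned = re.sub(r"\s+", " ", "too    many \t spaces")

print(swapped)      # 15/01/2024 and 31/12/2025
print(first_only)   # YYYY-01-15 and 2025-12-31
print(cleaned)      # too many spaces
```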

https://regexr.com/

https://www.kaggle.com/code/albeffe/regex-exercises-solutions/notebook
Character classes
  .            any character except newline
  \w \d \s     word, digit, whitespace
  \W \D \S     not word, digit, whitespace
  [abc]        any of a, b, or c
  [^abc]       not a, b, or c
  [a-g]        character between a & g
Anchors
  ^abc$        start / end of the string
  \b           word boundary
Escaped characters
  \. \* \\     escaped special characters
  \t \n \r     tab, linefeed, carriage return
Groups
  (abc)        capture group
Quantifiers & Alternation
  a* a+ a?     0 or more, 1 or more, 0 or 1
  a{5} a{2,}   exactly five, two or more
  a{1,3}       between one & three
  a+? a{2,}?   match as few as possible
  ab|cd        match ab or cd
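The last three groups of the cheat sheet (capture groups, lazy quantifiers, alternation) are easier to follow with a concrete run. The sketch below uses made-up sample strings; the email-like pattern is deliberately simplified and is not a full email validator.

```python
import re

html = "<b>bold</b> and <i>italic</i>"   # sample text (assumed)

# Greedy: .+ grabs as much as possible, from the first "<" to the last ">".
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can, giving one tag per match.
lazy = re.findall(r"<.+?>", html)

# Alternation: match either "ab" or "cd".
either = re.findall(r"ab|cd", "abcdab")

# Capture groups: group(0) is the whole match, group(1) and group(2) the parts.
m = re.search(r"(\w+)@(\w+)\.com", "mail user@example.com today")

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(either)   # ['ab', 'cd', 'ab']
print(m.group(0), m.group(1), m.group(2))   # user@example.com user example
```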
