Unit 2

PYTHON ADVANCED
PROGRAMMING
Basic Concepts
Chitra G M and P Rama Devi

Department of Computer Science Engineering
re—Regular Expressions
Regular expressions are text-matching patterns described with a formal syntax. The
patterns are interpreted as a set of instructions, which are then executed with a string
as input to produce a matching subset or a modified version of the original.
Expressions can include literal text matching, repetition, pattern composition,
branching, and other sophisticated rules.
Regular expressions are typically used in applications that involve a lot of text
processing. For example, they are commonly used as search patterns in text-editing
programs used by developers, including vi, emacs, and modern IDEs.
Finding Patterns in Text
The most common use for re is to search for patterns in text. The search() function
takes the pattern and text to scan, and returns a Match object when the pattern is found.
If the pattern is not found, search() returns None .
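import re
pattern = 'this'
text = 'Does this text match the pattern?'
match = re.search(pattern, text)
if match:  # search() returns None when nothing matches
    s = match.start()
    e = match.end()
    print('Found "%s" at %d:%d' % (text[s:e], s, e))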
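Because search() returns None when the pattern is absent, calling start() or end() directly on the result can fail. The following is a minimal sketch of one way to guard against that; the helper name find_span is hypothetical and not part of the re module.

```python
import re

def find_span(pattern, text):
    """Return (start, end) of the first match, or None when nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.start(), match.end()

text = 'Does this text match the pattern?'
found = find_span('this', text)        # the pattern occurs in the text
missing = find_span('absent', text)    # the pattern does not occur
print(found, missing)
```

Here found is the pair (5, 9), so text[5:9] is 'this', while missing is None.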
Compiling Expressions
re includes module-level functions for working with regular expressions as text strings,
but it is more efficient to compile the expressions a program uses frequently. The
compile() function converts an expression string into a compiled pattern object (re.Pattern in Python 3).
import re
sentence = 'start with a beautiful smile end.'
pattern = re.compile(r'.')  # matches any single character except a newline
matches = pattern.finditer(sentence)
for match in matches:
    print(match)
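As a small illustration of why compiling helps, the sketch below compiles one pattern and reuses it on several strings. The sample lines and the name word_pattern are made up for this example.

```python
import re

# Compile once, then reuse the same pattern object for every line.
word_pattern = re.compile(r'\w+')

lines = ['start with a smile', 'end with a smile']
word_counts = [len(word_pattern.findall(line)) for line in lines]
print(word_counts)
```

Each line contains four words, so word_counts is [4, 4].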
Special Characters
1 . - Any Character Except New Line

2 \d - Digit (0-9)
3 \D - Not a Digit (0-9)
4 \w - Word Character (a-z, A-Z, 0-9, _)
5 \W - Not a Word Character
6 \s - Whitespace (space, tab, newline)
7 \S - Not Whitespace (space, tab, newline)
Special Characters
1 \b - Word Boundary
2 \B - Not a Word Boundary
3 ^ - Beginning of a String
4 $ - End of a String
Special Characters
1 [] - Matches Characters in brackets
2 [^ ] - Matches Characters NOT in brackets
3 | - Either Or
4 ( ) - Group
Special Characters
Quantifiers:
1 * - 0 or More
2 + - 1 or More
3 ? - 0 or One
4 {3} - Exact Number
5 {3,4} - Range of Numbers (Minimum, Maximum)
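The character classes and quantifiers listed above can be tried out on a short string. This is only a sketch; the sample text is invented for the illustration.

```python
import re

sample = 'Room 42\tis open_now!'

digits = re.findall(r'\d', sample)        # individual digits
words = re.findall(r'\w+', sample)        # runs of word characters
spaces = re.findall(r'\s', sample)        # whitespace characters
two_digits = re.findall(r'\d{2}', sample) # exactly two digits in a row

print(digits, words, spaces, two_digits)
```

Note that the underscore counts as a word character, so open_now is returned as one word.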
THANK YOU

Department of CSE
pramadevi@pes.edu
chitragm@pes.edu
Multiple Matches
search() is used to look for single instances of literal text strings. The
findall() function returns all substrings of the input that
match the pattern without overlapping.
import re
text = 'abbaaabbbbaaaaa'
pattern = 'ab'
for match in re.findall(pattern, text):
    print('Found', match)
Output:
Found ab
Found ab
finditer() returns an iterator that produces Match instances instead of the
strings returned by findall().
import re
text = 'abbaaabbbbaaaaa'
pattern = 'ab'
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print('Found %r at %d:%d' % (text[s:e], s, e))
Output:
Found 'ab' at 0:2
Found 'ab' at 5:7
Raw string: a string literal prefixed with r. The prefix tells Python not to handle
backslashes in any special way.
Sample example:
print('\ttab')
Output: a tab character followed by tab
print(r'\ttab')
Output: \ttab
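The difference is easy to see by comparing the lengths of the two strings. A minimal sketch:

```python
plain = '\ttab'    # \t is interpreted as one tab character
raw = r'\ttab'     # backslash and t are kept as two separate characters

print(len(plain), len(raw))
print(raw[0] == '\\')
```

The plain string has 4 characters and the raw string has 5, which is why regular expression patterns such as \d or \b are usually written as raw strings.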
Regarding '.':
pattern = re.compile(r'.')
Matches every character except the newline character
To match only a literal '.', escape it:
pattern = re.compile(r'\.')
This matches only the '.' character
Example:
pattern = re.compile(r'rama\.com')
This matches the text rama.com

Let our text be Ha HaHa
Example: match Ha at the start of a word
pattern = re.compile(r'\bHa')
\b - word boundary
This matches the first two occurrences of Ha but not the last one, since the last Ha
does not start at a word boundary. A word boundary is the position between a word character
and a non-word character (such as a space), or the start or end of the string.
So if we want only the last Ha, which does not start at a word boundary, we use \B
Ex: pattern = re.compile(r'\BHa')
Only the last Ha is matched
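The positions matched by \b and \B in this text can be checked directly. A minimal sketch using the same text:

```python
import re

text = 'Ha HaHa'

# Start positions of Ha that begin at a word boundary
at_boundary = [m.start() for m in re.finditer(r'\bHa', text)]
# Start positions of Ha that do NOT begin at a word boundary
not_at_boundary = [m.start() for m in re.finditer(r'\BHa', text)]

print(at_boundary, not_at_boundary)
```

The first list is [0, 3] and the second is [5], the Ha in the middle of HaHa.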

Let sentence = 'start of a sentence and end'
^ - Beginning of string
$ - End of string
pattern = re.compile(r'^start')
This matches, since start is at the beginning of the string
pattern = re.compile(r'^a')
No match, since a is not at the beginning
pattern = re.compile(r'end$')
This matches, since end is at the end of the string
pattern = re.compile(r'a$')
No match, since a is not at the end
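The four anchor patterns above can be checked in one go. A short sketch, where the dictionary name results is chosen only for this illustration:

```python
import re

sentence = 'start of a sentence and end'
patterns = [r'^start', r'^a', r'end$', r'a$']

# Map each pattern to True if it matches somewhere in the sentence
results = {p: bool(re.search(p, sentence)) for p in patterns}
for p, matched in results.items():
    print(p, matched)
```

Only ^start and end$ match; ^a and a$ do not.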

Metacharacters can be used for practical cases such as matching phone numbers,
e.g. 080-12312
To match the above we can use
pattern = re.compile(r'\d\d\d.\d\d\d\d\d')
Character Set:
Uses [ ] around the characters we want to match
Example: re.compile(r'\d\d\d[-.]\d\d\d[-.]\d\d\d\d')
The character set [-.] matches a single character that is either - or .
If the input is 321--431-1234
it will not be matched, since [-.] matches exactly one - or . and not the second - given in
the input.
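To see that a character set matches exactly one character, the pattern can be run against a few inputs. This is a sketch; the sample numbers other than 321--431-1234 are invented.

```python
import re

pattern = re.compile(r'\d\d\d[-.]\d\d\d[-.]\d\d\d\d')
numbers = ['321-431-1234', '321.431.1234', '321--431-1234', '321*431*1234']

# Keep only the inputs in which the pattern is found
matched = [n for n in numbers if pattern.search(n)]
print(matched)
```

Only the first two inputs are matched: the third has two separators in a row, and the fourth uses a separator that is not in the set.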
To match phone numbers starting with 800 or 900
Example: 800-123-6721
900-213-4567
pattern = re.compile(r'[89]00[-.]\d\d\d[-.]\d\d\d\d')
Output:
800-123-6721
900-213-4567
Example: match digits from 1 to 5
pattern = re.compile(r'[1-5]')
In the above, 1-5 defines a range
re.compile(r'[a-z]')
re.compile(r'[a-zA-Z]')
re.compile(r'[^a-zA-Z]')
A caret (^) as the first character inside the brackets negates the set, so the last
pattern matches any character that is not a lowercase or uppercase letter
Example:
Cat
Mat
Pat
Bat
Match all the words that end with at, except Bat
So we can use re.compile(r'[^B]at')
Output:
Cat
Mat
Pat
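The negated set can be verified on the same four words. A minimal sketch:

```python
import re

words = 'Cat\nMat\nPat\nBat'

# [^B] matches any single character except an uppercase B
matches = re.findall(r'[^B]at', words)
print(matches)
```

The result is ['Cat', 'Mat', 'Pat']; Bat is skipped because its first character is excluded by the set.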
Quantifiers:
In the previous phone number examples we repeated \d several times; instead we
can use quantifiers
* - 0 or more
+ - 1 or more
? - 0 or one
{3} - exact number
{3,4} - range of numbers (minimum, maximum)
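As a sketch of how quantifiers shorten the earlier phone number pattern, the repeated \d can be written with {3} and {4}. The sample numbers in the list are partly invented for this illustration.

```python
import re

verbose = re.compile(r'\d\d\d[-.]\d\d\d[-.]\d\d\d\d')
compact = re.compile(r'\d{3}[-.]\d{3}[-.]\d{4}')   # same pattern with quantifiers

samples = ['800-123-6721', '900.213.4567', '80-123-6721', '321--431-1234']
verbose_hits = [s for s in samples if verbose.fullmatch(s)]
compact_hits = [s for s in samples if compact.fullmatch(s)]
print(verbose_hits == compact_hits, compact_hits)
```

Both patterns accept the same two numbers, which suggests the compact form is a drop-in replacement for the repeated one.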
Example: to match names
text = '''Mr. Ram
Mr Kevin
Ms Kavitha
Mrs. Devi
Mr. T'''
pattern = re.compile(r'Mr\.')
matches = pattern.finditer(text)
for match in matches:
    print(match)
The output contains the 2 occurrences of 'Mr.' but not 'Mr'

So we should say that '.' is optional
Therefore re.compile(r'Mr\.?')
matches every 'Mr', with or without the period
re.compile(r'Mr\.?\s[A-Z]\w*')
Output:
Mr. Ram
Mr Kevin
Mr. T
This did not match Ms or Mrs. In the next class we will see how the
groups concept can be used to match all the names
Groups
Searching for pattern matches is the basis of the powerful capabilities provided
by regular expressions. Adding groups to a pattern isolates parts of the
matching text, expanding those capabilities to create a parser. Groups are
defined by enclosing patterns in parentheses, ( and ).
For the names example we were looking at:
If the pattern is written as re.compile(r'M(r|s|rs)\.?\s[A-Z]\w*')
it matches all the names we were discussing. An equivalent pattern is
re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
Another Example:
import re
text='''
Mr. Sachin
Mr Virat
Mrs Karanam
Ms Sindhu
Mr. R
'''
pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
matches = pattern.finditer(text)
for match in matches:
    print(match)
emails='''
ramadevi@gmail.com
rama.devi@pes.edu
p-rama-devi@my-work.net
'''
pattern = re.compile(r'[a-zA-Z0-9.-]+@[a-zA-Z-]+\.(com|edu|net)')
matches = pattern.finditer(emails)
for match in matches:
    print(match.group(0))
Output:
ramadevi@gmail.com
rama.devi@pes.edu
p-rama-devi@my-work.net
urls = '''
https://www.google.com
http://ramadevi.com
https://pep.com
https://www.isro.gov
'''
Question: we want to capture the domain name and the top-level domain
Example: google.com
If we use re.compile(r'https?://(www\.)?\w+\.\w+')
all 4 URLs are matched. Let's see how to capture the parts asked for in the question
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
matches = pattern.finditer(urls)
for match in matches:
    print(match.group(2))
This prints the domain name of each URL
match.group(3) gives the top-level domain (including the leading dot)
match.group(0) gives the entire matched text

We can use back references with the previous pattern to perform
a substitution
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
subbed_urls = pattern.sub(r'\2\3', urls)
print(subbed_urls)
Here \2 and \3 are back references to groups 2 and 3,
i.e. each URL is replaced by its domain name and top-level domain
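Back references in a replacement string can also reorder the captured groups. The sketch below swaps a "Last, First" name into "First Last"; the sample names are hypothetical data chosen only for this illustration.

```python
import re

names = 'Tendulkar, Sachin\nKohli, Virat'

# Group 1 captures the last name, group 2 the first name
pattern = re.compile(r'(\w+), (\w+)')
swapped = pattern.sub(r'\2 \1', names)   # write group 2 first, then group 1
print(swapped)
```

This prints Sachin Tendulkar and Virat Kohli on separate lines.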
Search Options
The way the matching engine processes an expression can be changed using
option flags. The flags can be combined using a bitwise OR operation, then
passed to compile() , search() , match() , and other functions that accept a
pattern for searching.
Case-Insensitive Matching
IGNORECASE causes literal characters and character ranges in the pattern to match both
uppercase and lowercase characters.
import re
text = 'This is some text -- with punctuation.'
pattern = r'\bT\w+'
with_case = re.compile(pattern)
without_case = re.compile(pattern, re.IGNORECASE)
print(text)
print(pattern)
print('Case-sensitive:')
for match in with_case.findall(text):
    print(match)
print('Case-insensitive:')
for match in without_case.findall(text):
    print(match)
Since the pattern includes the literal T , without setting IGNORECASE , the
only match is the word This . When case is ignored, text also matches.
Input with Multiple Lines
Two flags affect how searching in multiline input works: MULTILINE and DOTALL .
The MULTILINE flag controls how the pattern-matching code processes anchoring
instructions for text containing newline characters. When multiline mode is turned on,
the anchor rules for ^ and $ apply at the beginning and end of each line, in addition to
the entire string.
import re
text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'(^\w+)|(\w+\S*$)'
single_line = re.compile(pattern)
multiline = re.compile(pattern, re.MULTILINE)
print(text)
print(pattern)
print('Single Line:')
for match in single_line.findall(text):
    print('%r' % (match,))
print('Multiline:')
for match in multiline.findall(text):
    print('%r' % (match,))
The pattern in the example matches the first or last word of the input. It
matches line. at the end of the string, even though there is no newline.
The match results printed by the previous code for single-line and multiline modes are:
Single Line:
('This', '')
('', 'line.')
Multiline:
('This', '')
('', 'punctuation.')
('A', '')
('', 'line.')
DOTALL is the other flag related to multiline text. Normally, the dot character ( . )
matches everything in the input text except a newline character. The flag allows
dot to match newlines as well.
import re
text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'.+'
no_newlines = re.compile(pattern)
dotall = re.compile(pattern, re.DOTALL)
print(text)
print('No newlines:')
for match in no_newlines.findall(text):
    print('%r' % match)
print('Dotall:')
for match in dotall.findall(text):
    print('%r' % match)
Without the flag, each line of the input text matches the pattern separately. Adding
the flag causes the entire string to be consumed.
The match results printed by the previous program are:
No newlines:
'This is some text -- with punctuation.'
'A second line.'
Dotall:
'This is some text -- with punctuation.\nA second line.'
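The flags described in this section can be combined with the bitwise OR operator. The following is a minimal sketch that combines IGNORECASE and MULTILINE; the two-line sample text is invented for the illustration.

```python
import re

text = 'This is some text.\nthe second line.'

# ^ anchors at every line start (MULTILINE) and t also matches T (IGNORECASE)
both = re.compile(r'^t\w+', re.IGNORECASE | re.MULTILINE)
only_case = re.compile(r'^t\w+', re.IGNORECASE)

both_matches = both.findall(text)
case_matches = only_case.findall(text)
print(both_matches, case_matches)
```

With both flags the pattern finds This and the; with IGNORECASE alone it finds only This, because ^ then anchors only at the start of the whole string.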

Threads
A process is an active program, i.e. a program that is under execution. It is more than the
program code, as it also includes the program counter, process stack, registers and so on.
By comparison, the program code is only the text section.
A thread is a lightweight unit of execution within a process that can be managed independently
by a scheduler. It can improve application performance by letting tasks run concurrently. A
thread shares information such as the data segment, code segment and files with its peer
threads, while it has its own registers, stack and program counter.
For simplicity, you can think of a thread as a subset of a process.
import threading
def print_cube(num):
    print("cube of a number", num * num * num)
def print_square(num):
    print("square of a number", num * num)
if __name__ == "__main__":
    # creating threads
    t1 = threading.Thread(target=print_square, args=(10,))
    t2 = threading.Thread(target=print_cube, args=(10,))
    # starting thread 1
    t1.start()
    # starting thread 2
    t2.start()
    # wait until thread 1 is completely executed
    t1.join()
    # wait until thread 2 is completely executed
    t2.join()
    # both threads completely executed
    print("Done!")
To create a new thread, we create an object of the Thread class. It takes the following
arguments:
• target: the function to be executed by the thread
• args: the arguments to be passed to the target function
To start a thread, we use the start method of the Thread class.
Once the threads start, the current program (you can think of it as the main thread) also
keeps on executing. To pause the current program until a thread is
complete, we use the join method.
As a result, the current program first waits for t1 to complete and then for t2.
Once they are finished, the remaining statements of the current program are executed.
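The effect of join can be sketched with a few threads that each store a result. The names worker and results are hypothetical; the point is that the results are complete once every join call has returned.

```python
import threading

results = []

def worker(n):
    # Each thread appends the square of its argument
    results.append(n * n)

threads = [threading.Thread(target=worker, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()      # the main thread waits here until t has finished

print(sorted(results))
```

The order in which the threads append is not guaranteed, so the list is sorted before printing.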
import threading
import os
def task1():
    print(threading.current_thread().name)
def task2():
    print(threading.current_thread().name)
if __name__ == "__main__":
    # print ID of current process
    print("ID of process running main program:", os.getpid())
    # print name of main thread
    print("Main thread name:", threading.current_thread().name)
    # creating threads
    t1 = threading.Thread(target=task1, name='t1')
    t2 = threading.Thread(target=task2, name='t2')
    # starting threads
    t1.start()
    t2.start()
    # wait until all threads finish
    t1.join()
    t2.join()
------------------------------------------------------------------------------------
Concept of global
1)
x = 20  # global
def f1():
    print(x)
f1()  # 20
2)
x = 20  # global
def f1():
    x = x + 1  # UnboundLocalError: x is treated as local because it is assigned here
    print(x)
f1()
3)
x = 20  # global
def f1():
    global x
    x = x + 1
    print(x)
f1()  # 21
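Cases 2 and 3 can be put side by side in one runnable sketch. The function names broken and fixed are chosen only for this illustration.

```python
x = 20  # global

def broken():
    # Assigning to x makes it local to this function, so reading it
    # on the right-hand side fails before any value has been bound.
    x = x + 1
    return x

def fixed():
    global x      # refer to the module-level x instead
    x = x + 1
    return x

try:
    broken()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

result = fixed()
print(error_name, result)
```

The call to broken raises UnboundLocalError, while fixed updates the global and returns 21.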
Example 3:
import threading
# global variable x
x = 0
def increment_global():
    global x
    x += 1
def taskofThread():
    for _ in range(1000000):
        increment_global()
def main():
    global x
    x = 0
    t1 = threading.Thread(target=taskofThread)
    t2 = threading.Thread(target=taskofThread)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
if __name__ == "__main__":
    for i in range(10):
        main()
        print(i, x)
import time
import threading
from threading import Thread
def i_am_thread():
    time.sleep(10)
    print("this is thread")
th = Thread(target=i_am_thread, name='fast')
th.start()
print("Current Thread count:", threading.active_count())
print("state:", th.is_alive())
th.join()
print("state:", th.is_alive())
print("get name:", th.name)  # th.getName() is deprecated since Python 3.10
print("Main thread name:", threading.main_thread().name)
What is a race condition?
A race condition occurs when two or more threads try to access a shared variable
simultaneously, leading to unpredictable outcomes.
In this scenario, the first thread reads the value from the shared variable. At the same time,
the second thread also reads the value from the same shared variable.
Then both threads attempt to change the value of the shared variable. Since the updates
occur simultaneously, there is a race to determine which thread's modification is
preserved.
The final value of the shared variable depends on which thread completes its update last.
Whichever thread changes the value last wins the race.
Race condition example

The following example illustrates a race condition:
from threading import Thread
from time import sleep
counter = 0
def increase(by):
    global counter
    local_counter = counter
    local_counter += by
    sleep(0.1)
    counter = local_counter
    print(f'counter={counter}')
# create threads
t1 = Thread(target=increase, args=(10,))
t2 = Thread(target=increase, args=(20,))
# start the threads
t1.start()
t2.start()
# wait for the threads to complete
t1.join()
t2.join()
print(f'The final counter is {counter}')
In this program, both threads try to modify the value of the counter variable at the same
time. The value of the counter variable depends on which thread completes last.
If the thread t1 completes before the thread t2, you’ll see the following output:
counter=10
counter=20
The final counter is 20
Otherwise, you’ll see the following output:
counter=20
counter=10
The final counter is 10
Using a threading lock to prevent the race condition
To prevent race conditions, you can use a threading lock.

A threading lock is a synchronization primitive that provides exclusive access to a shared
resource in a multithreaded application. A thread lock is also known as a mutex which is
short for mutual exclusion.
Typically, a threading lock has two states: locked and unlocked. When a thread acquires a
lock, the lock enters the locked state. The thread can have exclusive access to the shared
resource.
Other threads that attempt to acquire the lock while it is locked will be blocked and wait
until the lock is released.
In Python, you can use the Lock class from the threading module to create a lock object:
First, create an instance of the Lock class:
lock = Lock()
By default, the lock is unlocked until a thread acquires it.
Second, acquire a lock by calling the acquire() method:
lock.acquire()
Third, release the lock once the thread completes changing the shared variable:
lock.release()
The following example shows how to use the Lock object to prevent the race condition in
the previous program:
from threading import Thread, Lock
from time import sleep
counter = 0
def increase(by, lock):
    global counter
    lock.acquire()
    local_counter = counter
    local_counter += by
    sleep(0.1)
    counter = local_counter
    print(f'counter={counter}')
    lock.release()
lock = Lock()
# create threads
t1 = Thread(target=increase, args=(10, lock))
t2 = Thread(target=increase, args=(20, lock))
# start the threads
t1.start()
t2.start()
# wait for the threads to complete
t1.join()
t2.join()
print(f'The final counter is {counter}')
Output:
counter=10
counter=30
The final counter is 30
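A Lock object can also be used in a with statement, which acquires the lock on entry and releases it on exit even if an exception is raised inside the block. The sketch below is a variation on the example above, not the original program; the thread count and the number of repetitions are arbitrary choices for this illustration.

```python
from threading import Thread, Lock

counter = 0
lock = Lock()

def increase(by, times):
    global counter
    for _ in range(times):
        with lock:            # acquire() on entry, release() on exit
            counter += by

threads = [Thread(target=increase, args=(1, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f'The final counter is {counter}')
```

Because every update happens while the lock is held, the final counter is 40000 regardless of how the threads are scheduled.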
Acknowledgement:
https://www.pythontutorial.net/python-concurrency/python-threading-lock/