
PYTHON ADVANCED

PROGRAMMING
Basic Concepts

Chitra G M and P Rama Devi


Department of Computer Science Engineering
PYTHON ADVANCED PROGRAMMING
re—Regular Expressions

Regular expressions are text-matching patterns described with a formal syntax. The patterns are interpreted as a set of instructions, which are then executed with a string as input to produce a matching subset or a modified version of the original.

Expressions can include literal text matching, repetition, pattern composition, branching, and other sophisticated rules.

Regular expressions are typically used in applications that involve a lot of text processing. For example, they are commonly used as search patterns in text-editing programs used by developers, including vi, emacs, and modern IDEs.
Finding Patterns in Text

The most common use for re is to search for patterns in text. The search() function takes the pattern and the text to scan, and returns a Match object when the pattern is found. If the pattern is not found, search() returns None.

import re
pattern = 'this'
text = 'Does this text match the pattern?'
match = re.search(pattern, text)
if match:
    s = match.start()   # index where the match begins
    e = match.end()     # index just after the match ends
    print('Found', repr(pattern), 'at', s, 'to', e)
else:
    print('No match')

Output:
Found 'this' at 5 to 9
Compiling Expressions

re includes module-level functions for working with regular expressions as text strings, but it is more efficient to compile the expressions a program uses frequently. The compile() function converts an expression string into a compiled pattern object (re.Pattern; called RegexObject in older Python versions).

import re
sentence = 'start with a beautiful smile end.'

pattern = re.compile(r'.')   # . matches any character except a newline
matches = pattern.finditer(sentence)
for match in matches:
    print(match)

This prints one Match object for every character of the sentence, since . matches each of them.
Special Characters

1 . - Any Character Except New Line
2 \d - Digit (0-9)
3 \D - Not a Digit (0-9)
4 \w - Word Character (a-z, A-Z, 0-9, _)
5 \W - Not a Word Character
6 \s - Whitespace (space, tab, newline)
7 \S - Not Whitespace (space, tab, newline)
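A minimal sketch of the character classes above, run with re.findall() on a made-up sample string (the string and variable names are illustrative only). Note that the underscore counts as a word character, so Order_42 is a single \w+ run.

```python
import re

# Illustrative sample string: letters, digits, an underscore, spaces, a tab
sample = "Order_42 shipped\tin 3 days!"

digits = re.findall(r"\d", sample)     # each single digit
words = re.findall(r"\w+", sample)     # runs of word characters
spaces = re.findall(r"\s", sample)     # whitespace characters
non_word = re.findall(r"\W", sample)   # everything that is not a word character

print(digits)    # ['4', '2', '3']
print(words)     # ['Order_42', 'shipped', 'in', '3', 'days']
print(spaces)    # [' ', '\t', ' ', ' ']
print(non_word)  # [' ', '\t', ' ', ' ', '!']
```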
Special Characters

1 \b - Word Boundary
2 \B - Not a Word Boundary
3 ^ - Beginning of a String
4 $ - End of a String
Special Characters

1 [] - Matches Characters in brackets

2 [^ ] - Matches Characters NOT in brackets

3 | - Either Or

4 ( ) - Group
Special Characters

Quantifiers:
1 * - 0 or More
2 + - 1 or More
3 ? - 0 or One
4 {3} - Exact Number
5 {3,4} - Range of Numbers (Minimum, Maximum)
THANK YOU

Chitra G M and P Rama Devi


Department of CSE
pramadevi@pes.edu
chitragm@pes.edu
Multiple Matches

search() finds only the first match of a pattern. The findall() function returns all substrings of the input that match the pattern, without overlapping.

import re
text = 'abbaaabbbbaaaaa'
pattern = 'ab'
for match in re.findall(pattern, text):
    print('Found', match)

Output:
Found ab
Found ab
finditer() returns an iterator that produces Match instances instead of the strings returned by findall().

import re
text = 'abbaaabbbbaaaaa'
pattern = 'ab'
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print(f'Found {text[s:e]!r} at {s}:{e}')

Output:
Found 'ab' at 0:2
Found 'ab' at 5:7
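As a small side-by-side sketch on the same text, the three search functions differ in what they return: search() gives only the first Match, findall() gives strings, and finditer() gives Match objects whose span() holds the (start, end) positions.

```python
import re

text = 'abbaaabbbbaaaaa'
pattern = 'ab'

first = re.search(pattern, text)    # only the first match
found = re.findall(pattern, text)   # all matches, as strings
# all matches, as Match objects that also carry their positions
spans = [m.span() for m in re.finditer(pattern, text)]

print(first.span())  # (0, 2)
print(found)         # ['ab', 'ab']
print(spans)         # [(0, 2), (5, 7)]
```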
Raw string: a string literal prefixed with r. It tells Python not to treat backslashes as escape characters, which is why regular expressions are usually written as raw strings.

Sample example:

print('\ttab')

Output: a tab character followed by tab

print(r'\ttab')

Output: \ttab
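A short sketch of why raw strings matter for patterns (the sample text 'cat concat' is made up): in a raw string the backslash reaches the re module, so \b means word boundary, while in a normal string '\b' is turned into a backspace character before re ever sees it.

```python
import re

plain = '\ttab'   # \t is turned into one tab character
raw = r'\ttab'    # backslash and t are kept as two characters

print(len(plain))  # 4
print(len(raw))    # 5

# In a raw string the backslash reaches the re module, so \b is a word boundary
with_raw = re.findall(r'\bcat', 'cat concat')
# Without r, '\b' is a backspace character, which does not occur in the text
without_raw = re.findall('\bcat', 'cat concat')

print(with_raw)     # ['cat']
print(without_raw)  # []
```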
Regarding '.':

pattern=re.compile(r'.')

This matches any character except the newline character.

To match only a literal '.', escape it with a backslash:

pattern=re.compile(r'\.')

This matches only the '.' character.

Example:

pattern=re.compile(r'rama\.com')

This matches the text rama.com, but not, for example, ramaxcom, which r'rama.com' would also match.

Let the text be 'Ha HaHa'.

Example: match every Ha that starts a word

pattern=re.compile(r'\bHa')

\b - word boundary

This matches the first two Ha's but not the last one. A word boundary is a position between a word character and a non-word character (or the start or end of the string); the last Ha is directly preceded by the a of the previous Ha, so there is no boundary before it.

To match only the last Ha, which is not at a word boundary, use \B:

Ex: pattern=re.compile(r'\BHa')

Only the last Ha is matched.
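The \b and \B example can be checked with a minimal sketch that records the start index of each match in 'Ha HaHa'.

```python
import re

text = 'Ha HaHa'

# Start index of each Ha that begins a word
at_boundary = [m.start() for m in re.finditer(r'\bHa', text)]
# Start index of each Ha that does not begin a word
not_at_boundary = [m.start() for m in re.finditer(r'\BHa', text)]

print(at_boundary)      # [0, 3]
print(not_at_boundary)  # [5]
```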

Let sentence='start of a sentence and end'

^ - Beginning of string
$ - End of string

pattern=re.compile(r'^start')

This matches, since start is at the beginning.

pattern=re.compile(r'^a')

There is no match, since a is not at the beginning.

pattern=re.compile(r'end$')

This matches, since the text end is at the end of the string.

pattern=re.compile(r'a$')

There is no match, since the string does not end with a.
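All four anchor examples can be tried at once in a small sketch; it relies on search() returning None when there is no match.

```python
import re

sentence = 'start of a sentence and end'

# search() returns a Match object or None, so "is not None" means "matched"
results = {p: re.search(p, sentence) is not None
           for p in (r'^start', r'^a', r'end$', r'a$')}

for p, matched in results.items():
    print(p, matched)
# ^start True
# ^a False
# end$ True
# a$ False
```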

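Metacharacters can be combined for practical tasks such as matching phone numbers, e.g. 080-12312.

To match the above we can use

pattern=re.compile(r'\d\d\d.\d\d\d\d\d')

Here . matches any single character, so this pattern also accepts 080.12312 or 080 12312. Use a literal - or a character set to restrict the separator.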

Character Set:

A character set uses [ ] around the characters we want to match; the set matches exactly one of them.

Example: re.compile(r'\d\d\d[-.]\d\d\d[-.]\d\d\d\d')

Each [-.] matches a single character that is either - or . (inside a set, . is literal, and a leading - is literal too).

If the input is 321--431-1234, it will not be matched, because [-.] matches only one - and the input has two in a row.
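A minimal sketch checking the character-set phone pattern against a few made-up inputs with Pattern.fullmatch(), which succeeds only if the whole string fits. Because the two [-.] sets are independent, mixed separators such as 321-431.1234 are also accepted.

```python
import re

phone = re.compile(r'\d\d\d[-.]\d\d\d[-.]\d\d\d\d')

candidates = ['321-431-1234', '321.431.1234', '321-431.1234', '321--431-1234']
# fullmatch() succeeds only if the whole string fits the pattern
accepted = [c for c in candidates if phone.fullmatch(c)]

print(accepted)  # ['321-431-1234', '321.431.1234', '321-431.1234']
```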

To match phone numbers starting with 800 or 900:

Example: 800-123-6721
900-213-4567

pattern=re.compile(r'[89]00[-.]\d\d\d[-.]\d\d\d\d')

Output:

800-123-6721
900-213-4567

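Example: match any digit from 1 to 5

pattern=re.compile(r'[1-5]')

In the above, 1-5 defines a range.

re.compile(r'[a-z]')       # any lowercase letter

re.compile(r'[a-zA-Z]')    # any lowercase or uppercase letter

re.compile(r'[^a-zA-Z]')   # any character that is NOT a letter

A ^ placed first inside the brackets negates the set, so the last pattern matches everything not in the character set, i.e. anything that is not a lowercase or uppercase letter. Outside brackets, ^ means beginning of string instead.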
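The ranges and the negated set above can be compared on one made-up string in a short sketch; the last line contrasts ^ outside brackets, where it is the start-of-string anchor.

```python
import re

text = 'Room 42b, Floor 3!'

in_range = re.findall(r'[1-5]', text)         # digits 1 to 5
lower_runs = re.findall(r'[a-z]+', text)      # runs of lowercase letters
letter_runs = re.findall(r'[a-zA-Z]+', text)  # runs of any letters
not_letters = re.findall(r'[^a-zA-Z]', text)  # single non-letter characters
anchored = re.findall(r'^[A-Z][a-z]', text)   # ^ outside brackets: start of string

print(in_range)     # ['4', '2', '3']
print(lower_runs)   # ['oom', 'b', 'loor']
print(letter_runs)  # ['Room', 'b', 'Floor']
print(not_letters)  # [' ', '4', '2', ',', ' ', ' ', '3', '!']
print(anchored)     # ['Ro']
```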

Example:
Cat
Mat
Pat
Bat

Match every word that ends with at, except Bat.

We can use re.compile(r'[^B]at')

Output:

Cat
Mat
Pat

Quantifiers:

In the previous phone number examples we repeated \d several times; quantifiers express the repetition instead:

* - 0 or more
+ - 1 or more
? - 0 or one
{3} - exact number
{3,4} - range of numbers (minimum, maximum)

For example, \d{3} is equivalent to \d\d\d.
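A minimal sketch of the quantifiers, first rewriting the earlier phone pattern with {n} and then trying ?, *, + and {3,4} on tiny made-up strings. Note that {3,4} is greedy, so in 12345 it takes four digits and leaves the 5.

```python
import re

# The same phone pattern as before, written with {n} quantifiers
phone = re.compile(r'\d{3}[-.]\d{3}[-.]\d{4}')
phones = phone.findall('Call 800-123-6721 or 900.213.4567, not 12-34-5678.')

optional = re.findall(r'colou?r', 'color colour')   # ? : 0 or 1 u
star = re.findall(r'ab*', 'a ab abbb')              # * : 0 or more b
plus = re.findall(r'ab+', 'a ab abbb')              # + : 1 or more b
ranged = re.findall(r'\d{3,4}', '12 123 12345')     # 3 to 4 digits

print(phones)    # ['800-123-6721', '900.213.4567']
print(optional)  # ['color', 'colour']
print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(ranged)    # ['123', '1234']
```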

Example: to match names

text='''Mr. Ram
Mr Kevin
Ms Kavitha
Mrs. Devi
Mr. T'''

pattern=re.compile(r'Mr\.')
matches = pattern.finditer(text)
for match in matches:
    print(match)

The output contains the two 'Mr.' matches (from Mr. Ram and Mr. T) but not 'Mr' without a period.

So we should make the '.' optional using ?:

re.compile(r'Mr\.?')

This matches every Mr, with or without the period (including the Mr at the start of Mrs.).

re.compile(r'Mr\.?\s[A-Z]\w*')

Output:

Mr. Ram
Mr Kevin
Mr. T

This does not match Ms or Mrs. In the next class we will see how groups can be used to match all of the names.
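The two name patterns above can be compared in a short sketch that collects match.group() from each; it shows that Mr\.? also picks up the Mr at the start of Mrs., while the full name pattern does not.

```python
import re

text = '''Mr. Ram
Mr Kevin
Ms Kavitha
Mrs. Devi
Mr. T'''

optional_dot = [m.group() for m in re.finditer(r'Mr\.?', text)]
names = [m.group() for m in re.finditer(r'Mr\.?\s[A-Z]\w*', text)]

print(optional_dot)  # ['Mr.', 'Mr', 'Mr', 'Mr.']  (third one is the start of Mrs.)
print(names)         # ['Mr. Ram', 'Mr Kevin', 'Mr. T']
```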
Groups

Searching for pattern matches is the basis of the powerful capabilities provided
by regular expressions. Adding groups to a pattern isolates parts of the
matching text, expanding those capabilities to create a parser. Groups are
defined by enclosing patterns in parentheses ( ( and ) ).

For the names example we were seeing, if the pattern is written as

re.compile(r'M(r|s|rs)\.?\s[A-Z]\w*')

it matches all the names we were discussing (Mr, Ms and Mrs). An equivalent, more readable form is:

re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')

Another Example:

import re
text='''
Mr. Sachin
Mr Virat
Mrs Karanam
Ms Sindhu
Mr. R

'''

pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
matches=pattern.finditer(text)
for match in matches:
    print(match)

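import re
emails='''
ramadevi@gmail.com
rama.devi@pes.edu
p-rama-devi@my-work.net
'''

pattern = re.compile(r'[a-zA-Z0-9.-]+@[a-zA-Z0-9-]+\.(com|edu|net)')
matches=pattern.finditer(emails)
for match in matches:
    print(match.group(0))

Output:

ramadevi@gmail.com
rama.devi@pes.edu
p-rama-devi@my-work.net

The user and domain parts allow digits and hyphens, and the group (com|edu|net) lists the accepted top-level domains. A narrower pattern such as [a-zA-Z.]+@[a-zA-Z]+\.(com|edu) matches only the first two addresses, because it allows neither hyphens nor .net.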

urls='''
https://www.google.com
http://ramadevi.com
https://pep.com
https://www.isro.gov
'''

Question: we want to capture the domain name and the top-level domain.

Example: google.com

If we use re.compile(r'https?://(www\.)?\w+\.\w+')

all 4 URLs are matched, but the domain and the top-level domain are not captured separately. Groups give us the answer:

pattern=re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
matches=pattern.finditer(urls)
for match in matches:
    print(match.group(2))

This prints the domain names (google, ramadevi, pep, isro).

match.group(3) gives the top-level domain (.com or .gov).

match.group(0) gives the entire match, e.g. https://www.google.com.
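To see every group at once, a short sketch can call match.groups(), which returns all captured groups as a tuple. The optional (www\.)? group comes back as None for URLs without www.

```python
import re

urls = '''
https://www.google.com
http://ramadevi.com
https://pep.com
https://www.isro.gov
'''

pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

# groups() returns every captured group as a tuple;
# the optional (www\.)? group is None when it did not take part in the match
parts = [match.groups() for match in pattern.finditer(urls)]
for p in parts:
    print(p)
# ('www.', 'google', '.com')
# (None, 'ramadevi', '.com')
# (None, 'pep', '.com')
# ('www.', 'isro', '.gov')
```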

Groups can also be used in substitutions, through back references:

pattern=re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
subbed_urls=pattern.sub(r'\2\3',urls)
print(subbed_urls)

Here \2 and \3 are back references to the second and third groups, so each URL is replaced by its domain and top-level domain (google.com, ramadevi.com, pep.com, isro.gov).
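A minimal sketch of two related ways to write the replacement: \g<2>\g<3> is another spelling of \2\3 (useful when a digit follows the reference), and sub() also accepts a function that receives each Match and returns the replacement text. The upper-casing function is only an illustration.

```python
import re

urls = '''
https://www.google.com
http://ramadevi.com
https://pep.com
https://www.isro.gov
'''

pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

subbed_urls = pattern.sub(r'\2\3', urls)
same = pattern.sub(r'\g<2>\g<3>', urls)   # same back references, \g<n> syntax
# A function replacement: called once per match, returns the new text
shouted = pattern.sub(lambda m: m.group(2).upper() + m.group(3), urls)

print(subbed_urls.split())  # ['google.com', 'ramadevi.com', 'pep.com', 'isro.gov']
print(shouted.split())      # ['GOOGLE.com', 'RAMADEVI.com', 'PEP.com', 'ISRO.gov']
```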
Search Options

The way the matching engine processes an expression can be changed using option flags. The flags can be combined using a bitwise OR operation, then passed to compile(), search(), match(), and other functions that accept a pattern for searching.

Case-Insensitive Matching

IGNORECASE causes literal characters and character ranges in the pattern to match both uppercase and lowercase characters.

import re
text = 'This is some text -- with punctuation.'
pattern = r'\bT\w+'
with_case = re.compile(pattern)
without_case = re.compile(pattern, re.IGNORECASE)
print(text)
print(pattern)
print('Case-sensitive:')
for match in with_case.findall(text):
    print(match)
print('Case-insensitive:')
for match in without_case.findall(text):
    print(match)

Since the pattern includes the literal T, without setting IGNORECASE the only match is the word This. When case is ignored, text also matches.
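As a sketch of combining flags with the bitwise OR operator, here IGNORECASE and MULTILINE are passed together as the flags argument of re.findall(); the two-line text is made up so that the second line starts with a lowercase t.

```python
import re

text = 'This is some text -- with punctuation.\nthe second line.'

both = re.findall(r'^t\w+', text, re.IGNORECASE | re.MULTILINE)
ignorecase_only = re.findall(r'^t\w+', text, re.IGNORECASE)
multiline_only = re.findall(r'^t\w+', text, re.MULTILINE)

print(both)             # ['This', 'the']
print(ignorecase_only)  # ['This']
print(multiline_only)   # ['the']
```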

Input with Multiple Lines

Two flags affect how searching in multiline input works: MULTILINE and DOTALL.

The MULTILINE flag controls how the pattern-matching code processes anchoring instructions for text containing newline characters. When multiline mode is turned on, the anchor rules for ^ and $ apply at the beginning and end of each line, in addition to the entire string.

import re
text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'(^\w+)|(\w+\S*$)'
single_line = re.compile(pattern)
multiline = re.compile(pattern, re.MULTILINE)
print(text)
print(pattern)
print('Single Line')
for match in single_line.findall(text):
    print(match)
print('Multiline')
for match in multiline.findall(text):
    print(match)

The pattern in the example matches the first or last word of the input. It matches line. at the end of the string, even though there is no newline. With MULTILINE, it matches the first and last word of every line.

The previous code can print output for single line and multiline
something like this

Single Line :
(’This’, ’’)
(’’, ’line.’)

Multiline:

(’This’, ’’)
(’’, ’punctuation.’)
(’A’, ’’)
(’’, ’line.’)
PYTHON ADVANCED PROGRAMMING
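A smaller sketch of the same idea on a made-up log text: with MULTILINE, ^ matches at the start of every line, so ^\w+ collects the first word of each line; without the flag it matches only at the very start of the string.

```python
import re

log = 'ERROR disk full\nINFO retrying\nERROR disk still full'

# With MULTILINE, ^ matches at the start of every line
first_words = re.findall(r'^\w+', log, re.MULTILINE)
# Without the flag, ^ only matches at the very start of the string
only_first = re.findall(r'^\w+', log)

print(first_words)  # ['ERROR', 'INFO', 'ERROR']
print(only_first)   # ['ERROR']
```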

DOTALL is the other flag related to multiline text. Normally, the dot character (.) matches everything in the input text except a newline character. The flag allows dot to match newlines as well.

import re
text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'.+'
no_newlines = re.compile(pattern)
dotall = re.compile(pattern, re.DOTALL)
print(text)
print(pattern)
print('No newlines:')
for match in no_newlines.findall(text):
    print(repr(match))
print('Dotall:')
for match in dotall.findall(text):
    print(repr(match))

Without the flag, each line of the input text matches the pattern separately. Adding
the flag causes the entire string to be consumed.

The loops in the previous program print:

No newlines:
'This is some text -- with punctuation.'
'A second line.'

Dotall:
'This is some text -- with punctuation.\nA second line.'
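A typical use of DOTALL, sketched on a made-up tagged string whose content spans two lines: without the flag, .* stops at the newline and the search fails; with it, the group captures both lines.

```python
import re

# A small tagged string whose content spans two lines (made-up example)
snippet = '<p>first line\nsecond line</p>'

without_flag = re.search(r'<p>(.*)</p>', snippet)
with_flag = re.search(r'<p>(.*)</p>', snippet, re.DOTALL)

print(without_flag)              # None: . cannot cross the newline
print(repr(with_flag.group(1)))  # 'first line\nsecond line'
```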


Threads
A process is an active program i.e. a program that is under execution. It is more than the
program code as it includes the program counter, process stack, registers, program code
etc. Compared to this, the program code is only the text section.
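A thread is a lightweight process that can be managed independently by a scheduler. Threads let a program do several things concurrently, which can improve performance; in the standard CPython interpreter, however, the global interpreter lock lets only one thread execute Python bytecode at a time, so threads help mostly with I/O-bound work. A thread shares information like the data segment, code segment, open files etc. with its peer threads, while it has its own registers, stack, program counter etc.
For simplicity, you can assume that a thread is simply a subset of a process!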

import threading

def print_cube(num):
    print("cube of a number", num * num * num)

def print_square(num):
    print("square of a number", num * num)

if __name__ == "__main__":
    # creating threads
    t1 = threading.Thread(target=print_square, args=(10,))
    t2 = threading.Thread(target=print_cube, args=(10,))

    # starting thread 1
    t1.start()
    # starting thread 2
    t2.start()

    # wait until thread 1 is completely executed
    t1.join()
    # wait until thread 2 is completely executed
    t2.join()

    # both threads completely executed
    print("Done!")

To create a new thread, we create an object of the Thread class. It takes the following arguments:
- target: the function to be executed by the thread
- args: the arguments to be passed to the target function

To start a thread, we use the start() method of the Thread class.

Once the threads start, the current program (you can think of it as the main thread) also keeps on executing. To stop execution of the current program until a thread is complete, we use the join() method.

As a result, the current program first waits for the completion of t1 and then t2. Once they are finished, the remaining statements of the current program are executed.
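The start/join pattern scales to any number of threads. The sketch below is illustrative: the square function and the results dictionary are hypothetical helpers, not part of the threading module. Each thread writes under its own key, and the main thread reads the results only after joining every thread.

```python
import threading

def square(num, results):
    # Each thread stores its answer under its own key
    results[num] = num * num

results = {}  # hypothetical shared container for the answers
threads = [threading.Thread(target=square, args=(n, results)) for n in range(1, 6)]

for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread before reading results

print(sorted(results.items()))  # [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
```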

import threading
import os

def task1():
    print(threading.current_thread().name)

def task2():
    print(threading.current_thread().name)

if __name__ == "__main__":
    # print ID of current process
    print("ID of process running main program:", os.getpid())

    # print name of main thread (current_thread() returns the current thread object)
    print("Main thread name:", threading.current_thread().name)

    # creating threads
    t1 = threading.Thread(target=task1, name='t1')
    t2 = threading.Thread(target=task2, name='t2')

    # starting threads
    t1.start()
    t2.start()

    # wait until all threads finish
    t1.join()
    t2.join()

------------------------------------------------------------------------------------
Concept of global

1)
x=20 # global
def f1():
    print(x)

f1() # 20

2)
x=20 # global
def f1():
    x=x+1 # error: UnboundLocalError, because assigning to x makes x local to f1
    print(x)

f1()

3)
x=20 # global
def f1():
    global x
    x=x+1
    print(x)

f1() # 21

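Example 3:
import threading

# global variable x
x = 0

def increment_global():
    global x
    x += 1   # read, add and write back: not a single atomic step

def taskofThread():
    for _ in range(1000000):
        increment_global()

def main():
    global x
    x = 0
    t1 = threading.Thread(target=taskofThread)
    t2 = threading.Thread(target=taskofThread)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main()
        print(i, x)

Each run should print 2000000, but because x += 1 takes several steps, the two threads can interleave and lose updates, so some runs may print a smaller value (how often this happens depends on the Python version).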

import time
import threading
from threading import Thread

def i_am_thread():
    time.sleep(10)
    print("this is thread")

th = Thread(target=i_am_thread, name='fast')
th.start()
print("Current Thread count:", threading.active_count())
print("state:", th.is_alive())
th.join()

print("state:", th.is_alive())
print("get name:", th.name)   # the name attribute replaces the deprecated getName()
print("Main thread name:", threading.main_thread().name)

What is a race condition?
A race condition occurs when two or more threads access a shared variable at the same time and the result depends on the order in which they run, leading to unpredictable outcomes.
In this scenario, the first thread reads the value from the shared variable. At the same time, the second thread also reads the value from the same shared variable.
Then both threads attempt to change the value of the shared variable. Since the updates occur simultaneously, it creates a race to determine which thread's modification is preserved.
The final value of the shared variable depends on which thread completes its update last: whichever thread changes the value last wins the race.

Race condition example

The following example illustrates a race condition:
from threading import Thread
from time import sleep
counter = 0
def increase(by):
    global counter
    local_counter = counter
    local_counter += by
    sleep(0.1)
    counter = local_counter
    print(f'counter={counter}')
# create threads
t1 = Thread(target=increase, args=(10,))
t2 = Thread(target=increase, args=(20,))
# start the threads
t1.start()
t2.start()
# wait for the threads to complete
t1.join()
t2.join()
print(f'The final counter is {counter}')

In this program, both threads try to modify the value of the counter variable at the same time. Because each thread sleeps between reading counter and writing it back, both threads read the initial value 0, so one update is lost. The value of the counter variable depends on which thread completes last.
If the thread t1 completes before the thread t2, you'll see the following output:
counter=10
counter=20
The final counter is 20
Otherwise, you'll see the following output:
counter=20
counter=10
The final counter is 10
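The sleep() call only makes the bad interleaving likely. A minimal sketch can force it every time with threading.Barrier: both threads must reach barrier.wait() after reading counter before either may write, so one update is always lost. The barrier is a teaching device added here, not part of the original example.

```python
from threading import Thread, Barrier

counter = 0
# Both threads wait here after reading, which forces the unlucky interleaving
barrier = Barrier(2)

def increase(by):
    global counter
    local_counter = counter        # read the shared value
    barrier.wait()                 # wait until the other thread has also read it
    counter = local_counter + by   # write back: one update overwrites the other

t1 = Thread(target=increase, args=(10,))
t2 = Thread(target=increase, args=(20,))
t1.start()
t2.start()
t1.join()
t2.join()

print(counter)  # 10 or 20, never 30
```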
Using a threading lock to prevent the race condition

To prevent race conditions, you can use a threading lock.

A threading lock is a synchronization primitive that provides exclusive access to a shared resource in a multithreaded application. A thread lock is also known as a mutex, which is short for mutual exclusion.
Typically, a threading lock has two states: locked and unlocked. When a thread acquires a lock, the lock enters the locked state and the thread has exclusive access to the shared resource.
Other threads that attempt to acquire the lock while it is locked will block and wait until the lock is released.
In Python, you can use the Lock class from the threading module to create a lock object:
First, create an instance of the Lock class (after from threading import Lock):
lock = Lock()
By default, the lock is unlocked until a thread acquires it.
Second, acquire the lock by calling the acquire() method:
lock.acquire()
Third, release the lock once the thread completes changing the shared variable:
lock.release()
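Lock objects also work as context managers: a with lock: block acquires the lock on entry and always releases it on exit, even if the body raises an exception. A minimal sketch of the counter program written that way:

```python
from threading import Thread, Lock
from time import sleep

counter = 0
lock = Lock()

def increase(by):
    global counter
    # "with lock:" acquires the lock and always releases it on exit,
    # even if the body raises an exception
    with lock:
        local_counter = counter
        local_counter += by
        sleep(0.1)
        counter = local_counter

threads = [Thread(target=increase, args=(by,)) for by in (10, 20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f'The final counter is {counter}')  # The final counter is 30
```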

The following example shows how to use the Lock object to prevent the race condition in the previous program:
from threading import Thread, Lock
from time import sleep
counter = 0
def increase(by, lock):
    global counter
    lock.acquire()
    local_counter = counter
    local_counter += by
    sleep(0.1)
    counter = local_counter
    print(f'counter={counter}')
    lock.release()
lock = Lock()
# create threads
t1 = Thread(target=increase, args=(10, lock))
t2 = Thread(target=increase, args=(20, lock))
# start the threads
t1.start()
t2.start()
# wait for the threads to complete
t1.join()
t2.join()
print(f'The final counter is {counter}')
Output (if t2 acquires the lock first, the first line is counter=20 instead):
counter=10
counter=30
The final counter is 30
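The same fix applies to the global-counter program from Example 3. The sketch below guards x += 1 with a lock so the result is always exact; the lock name x_lock and the smaller default iteration count are choices made here to keep the run short.

```python
import threading

x = 0
x_lock = threading.Lock()

def increment_global():
    global x
    with x_lock:  # only one thread at a time performs the read-modify-write
        x += 1

def task_of_thread(n):
    for _ in range(n):
        increment_global()

def main(n=100_000):
    global x
    x = 0
    threads = [threading.Thread(target=task_of_thread, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

if __name__ == "__main__":
    for i in range(3):
        print(i, main())  # always 200000 with the lock
```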

Acknowledgement:
https://www.pythontutorial.net/python-concurrency/python-threading-lock/