PROGRAMMING
Basic Concepts
P Rama Devi
Department of Computer Science and Engineering
PYTHON ADVANCED PROGRAMMING
re—Regular Expressions
Regular expressions are text-matching patterns described with a formal syntax. The
patterns are interpreted as a set of instructions, which are then executed with a string
as input to produce a matching subset or modified version of the original.
Regular expressions are typically used in applications that involve a lot of text
processing. For example, they are commonly used as search patterns in text-editing
programs used by developers, including vi, emacs, and modern IDEs.
Finding Patterns in Text
The most common use for re is to search for patterns in text. The search() function
takes the pattern and text to scan, and returns a Match object when the pattern is found.
If the pattern is not found, search() returns None.
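import re

pattern = 'this'
text = 'Does this text match the pattern?'

match = re.search(pattern, text)  # Match object, or None if not found
s = match.start()  # index where the match begins
e = match.end()    # index just past the end of the match
print('Found "{}" at {}:{}'.format(text[s:e], s, e))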
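The text says search() returns None when there is no match, but the code above only covers the matching case. A small sketch of checking the result first (the second pattern, 'that', is an illustrative choice and not from the slide):

```python
import re

text = 'Does this text match the pattern?'

found = re.search('this', text)    # a Match object: 'this' occurs in text
missing = re.search('that', text)  # None: 'that' does not occur in text

for pattern, result in [('this', found), ('that', missing)]:
    # Calling start()/end() on None would raise AttributeError, so test first
    if result is None:
        print('No match for {!r}'.format(pattern))
    else:
        print('Found {!r} at {}:{}'.format(pattern, result.start(), result.end()))
```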
Compiling Expressions
re includes module-level functions for working with regular expressions as text strings,
but it is more efficient to compile the expressions a program uses frequently. The com-
pile() function converts an expression string into a RegexObject .
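A minimal sketch of compiling patterns once and reusing them; the list of patterns and the loop are illustrative, not taken from the slides. Each compiled object keeps its source string in the pattern attribute and offers search(), match(), findall() and related methods, just like the module-level functions.

```python
import re

# Compile each expression once, then reuse the compiled objects
regexes = [re.compile(p) for p in ['this', 'that']]
text = 'Does this text match the pattern?'

print('Text: {!r}'.format(text))
for regex in regexes:
    # regex.pattern holds the original expression string
    print('Seeking {!r} ->'.format(regex.pattern), end=' ')
    if regex.search(text):
        print('match!')
    else:
        print('no match')
```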
1 \b - Word Boundary
2 \B - Not a Word Boundary
3 ^ - Beginning of a String
4 $ - End of a String
Special Characters
3 | - Either Or
4 ( ) - Group
Quantifiers:
1 * - 0 or More
2 + - 1 or More
3 ? - 0 or One
4 {3} - Exact Number
5 {3,4} - Range of Numbers (Minimum, Maximum)
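The quantifiers can be compared side by side with findall(). The sample text below is made up for illustration. Each quantifier applies to the single character just before it (here b), and matching is greedy, so the longest allowed run of b is taken.

```python
import re

text = 'a ab abb abbb abbbb'  # hypothetical sample text

# Each quantifier applies to the character immediately before it ('b')
patterns = {
    'ab*': 'a followed by zero or more b',
    'ab+': 'a followed by one or more b',
    'ab?': 'a followed by zero or one b',
    'ab{3}': 'a followed by exactly three b',
    'ab{2,3}': 'a followed by two to three b',
}

results = {}
for pattern, meaning in patterns.items():
    results[pattern] = re.findall(pattern, text)
    print('{:8} ({}): {}'.format(pattern, meaning, results[pattern]))
```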
THANK YOU
Multiple Matches
Search() is used to look for single instances of literal text strings. The
findall() function returns all substrings of the input that
match the pattern without overlapping.
import re

text = 'abbaaabbbbaaaaa'
pattern = 'ab'

# findall() returns the matching substrings themselves, as a list of strings
for match in re.findall(pattern, text):
    print('Found', repr(match))
Output:
Found 'ab'
Found 'ab'
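finditer() returns an iterator that produces Match objects instead of strings, so the
position of each match is also available.
import re

text = 'abbaaabbbbaaaaa'
pattern = 'ab'

for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print('Found {!r} at {:d}:{:d}'.format(text[s:e], s, e))
Output:
Found 'ab' at 0:2
Found 'ab' at 5:7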
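Raw string: a string literal prefixed with r. It tells Python not to handle backslashes in
any special way, so regex sequences such as \b and \d reach the re module unchanged.
Sample example:
print('\ttab')
Output: a tab character followed by tab
print(r'\ttab')
Output: \ttab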
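Raw strings matter for regular expressions because many regex tokens start with a backslash. This sketch (the sample text 'Ha HaHa' is a hypothetical input) shows that in a normal string '\b' becomes a backspace character, so the pattern no longer means "word boundary":

```python
import re

plain = '\bHa'   # Python converts \b into a backspace character (\x08)
raw = r'\bHa'    # the backslash is kept, so re sees the word-boundary token \b

print(len(plain), len(raw))  # prints: 3 4

text = 'Ha HaHa'  # hypothetical sample text
raw_matches = re.findall(raw, text)      # word-boundary pattern
plain_matches = re.findall(plain, text)  # looks for a backspace before Ha
print(raw_matches)
print(plain_matches)
```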
Regarding '.':
pattern=re.compile(r'.')     # . matches any character except a newline
pattern=re.compile(r'\.')    # \. matches a literal dot
Example:
pattern=re.compile(r'rama\.com')   # matches the text rama.com
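pattern=re.compile(r'\bHa')
\b - word boundary
This matches every Ha that begins a word. In a run such as HaHa, the last Ha is not matched,
because it is preceded by a letter and so does not sit on a word boundary. A word boundary
is a position between a word character and a non-word character (or the start or end of the
string), for example the start of a line or the position right after a space.
Ex: pattern=re.compile(r'\BHa')    # \B is the opposite: Ha that is not at a word boundary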
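^ - beginning of string
$ - end of string
pattern=re.compile(r'^start')   # start, only at the beginning of the text
pattern=re.compile(r'^a')       # a, only at the beginning of the text
pattern=re.compile(r'end$')     # end, only at the end of the text
A match is found only when the text finishes with the word end.
pattern=re.compile(r'a$')       # a, only at the end of the text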
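A short sketch of the four anchors on made-up inputs (both sample strings are hypothetical). finditer() is used so the start position of each match is visible. None of these anchors consume characters; they only restrict where a match may occur.

```python
import re

text = 'Ha HaHa'  # hypothetical sample text

# \b: Ha at the start of a word; \B: Ha inside a word
starts_b = [m.start() for m in re.finditer(r'\bHa', text)]
starts_nb = [m.start() for m in re.finditer(r'\BHa', text)]
print(starts_b, starts_nb)

sentence = 'start here and stop at the end'  # hypothetical sample text
at_start = re.findall(r'^start', sentence)  # start appears at the beginning
at_end = re.findall(r'end$', sentence)      # end appears at the end
not_start = re.findall(r'^end', sentence)   # end is not at the beginning
print(at_start, at_end, not_start)
```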
Metacharacters can be combined for various use cases, such as matching phone numbers,
e.g. 080-12312:
pattern=re.compile(r'\d\d\d.\d\d\d\d\d')   # \d matches a digit, . matches any separator
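Because '.' matches any character, the phone-number pattern above also accepts separators other than '-'. This sketch uses hypothetical numbers, one per line, to compare '.' with the escaped '\.':

```python
import re

# Hypothetical numbers, one per line, with different separators
numbers = '080-12312\n080.12312\n080x12312'

# '.' is a metacharacter: any single character except a newline
any_char = re.compile(r'\d\d\d.\d\d\d\d\d')
# '\.' matches only a literal dot
literal_dot = re.compile(r'\d\d\d\.\d\d\d\d\d')

with_any = any_char.findall(numbers)
with_dot = literal_dot.findall(numbers)
print(with_any)   # all three numbers
print(with_dot)   # only the one that uses a dot
```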
Character Set:
Example: re.compile(r'\d\d\d[-.]\d\d\d[-.]\d\d\d\d')
If the input is 321--431-1234, it is not matched: each [-.] matches exactly one character,
either - or . (a dot is literal inside a character set), so after the first - the pattern
expects a digit, not a second -.
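A sketch that tests the character-set pattern against a few hypothetical inputs. fullmatch() is used so that only inputs matching from the first to the last character count:

```python
import re

pattern = re.compile(r'\d\d\d[-.]\d\d\d[-.]\d\d\d\d')

# Hypothetical inputs; fullmatch() succeeds only if the whole string fits the pattern
candidates = ['321-431-1234', '321.431.1234', '321--431-1234', '321*431*1234']
results = {c: bool(pattern.fullmatch(c)) for c in candidates}

for candidate, ok in results.items():
    print(candidate, '->', 'match' if ok else 'no match')
```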
Example: 800-123-6721
900-213-4567
pattern=re.compile(r'[89]00[-.]\d\d\d[-.]\d\d\d\d')   # [89] matches one 8 or one 9
Output:
800-123-6721
900-213-4567
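pattern=re.compile(r'[1-5]')   # one digit from 1 to 5
re.compile(r'[a-z]')           # one lowercase letter
re.compile(r'[a-zA-Z]')        # one lowercase or uppercase letter
re.compile(r'[^a-zA-Z]')
A ^ at the start of a character set negates it, so the set above matches every character
that is not a lowercase or uppercase letter.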
Example:
Cat
Mat
Pat
Bat
Output:
Cat
Mat
Pat
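The slide shows the input and output of this example but not the pattern. One pattern that produces exactly this output is r'[^B]at' (an assumption): a negated character set that accepts any single character except B before "at". The sketch also applies [^a-zA-Z] to a made-up string.

```python
import re

# Input from the slide, one word per line
words = 'Cat\nMat\nPat\nBat'

# Assumed pattern: [^B] matches any single character except B
found = re.findall(r'[^B]at', words)
print(found)

# [^a-zA-Z] matches any character that is not a letter (hypothetical input)
non_letters = re.findall(r'[^a-zA-Z]', 'Cat-1 Mat')
print(non_letters)
```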
Quantifiers:
* - 0 or more
+ - 1 or more
? - 0 or one
{3} - exact number
{3,4} - range of numbers (minimum, maximum)
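pattern=re.compile(r'Mr\.')
matches = pattern.finditer(text)
for match in matches:
    print(match)
This requires the dot, so a name written as Mr Kevin is missed. Making the dot optional
with ? fixes this:
re.compile(r'Mr\.?')
Adding \s (one whitespace character), [A-Z] (an uppercase letter) and \w* (the rest of
the name) matches the whole name:
re.compile(r'Mr\.?\s[A-Z]\w*')
Output:
Mr. Ram
Mr Kevin
Mr. T
This does not match Ms or Mrs. In the next class, we will see how the groups concept can be
used to print all the names.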
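A runnable version of the example above. The slide does not show the input text, so the text below is hypothetical: Ram, Kevin and T come from the output shown, while the Ms and Mrs lines are made up to show what the pattern skips.

```python
import re

# Hypothetical input: the Ms/Mrs lines are invented for illustration
text = '''
Mr. Ram
Mr Kevin
Ms Davis
Mrs. Robinson
Mr. T
'''

pattern = re.compile(r'Mr\.?\s[A-Z]\w*')

# group(0) is the full text of each match
names = [m.group(0) for m in pattern.finditer(text)]
print(names)
```

Mrs. Robinson is skipped because after Mr the pattern expects an optional dot and then whitespace, but finds the letter s.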
Groups
Searching for pattern matches is the basis of the powerful capabilities provided
by regular expressions. Adding groups to a pattern isolates parts of the
matching text, expanding those capabilities to create a parser. Groups are
defined by enclosing patterns in parentheses ( ( and ) ).
re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
Another Example:
import re
text='''
Mr. Sachin
Mr Virat
Mrs Karanam
Ms Sindhu
Mr. R
'''
pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
matches=pattern.finditer(text)
for match in matches:
    print(match)   # prints each Match object, including its span
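The loop above prints Match objects. This sketch extracts the text instead: group(0) is the whole name and group(1) is whatever the alternation (Mr|Ms|Mrs) captured. It also shows that when a pattern has exactly one group, findall() returns only that group, not the whole match.

```python
import re

text = '''
Mr. Sachin
Mr Virat
Mrs Karanam
Ms Sindhu
Mr. R
'''

pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')

for match in pattern.finditer(text):
    # group(0): entire match, group(1): the title captured by (Mr|Ms|Mrs)
    print(match.group(0), '| title:', match.group(1))

full_names = [m.group(0) for m in pattern.finditer(text)]
titles = pattern.findall(text)  # with one group, findall() returns that group only
print(titles)
```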
emails='''
ramadevi@gmail.com
rama.devi@pes.edu
p-rama-devi@my-work.net
'''
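# digits and - are allowed in the names, and .net is accepted as a top-level domain
pattern = re.compile(r'[a-zA-Z0-9.-]+@[a-zA-Z-]+\.(com|edu|net)')
matches=pattern.finditer(emails)
for match in matches:
    print(match.group(0))   # the full address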
Output:
ramadevi@gmail.com
rama.devi@pes.edu
p-rama-devi@my-work.net
urls='''
https://www.google.com
http://ramadevi.com
https://pep.com
https://www.isro.gov
'''
Question: we want to capture the domain name and the top-level domain name.
Example: google.com
If we use re.compile(r'https?://(www\.)?\w+\.\w+'), all 4 URLs are matched, but the domain
is not captured separately. Let's see how to get the answer to the question.
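pattern=re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
matches=pattern.finditer(urls)
for match in matches:
    print(match.group(2))   # group 2: the domain name, e.g. google
match.group(0) is the entire match (the whole URL); group(1), group(2) and group(3) return
the text captured by the first, second and third pair of parentheses.
pattern=re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
subbed_urls=pattern.sub(r'\2\3',urls)   # replace each URL with group 2 followed by group 3
print(subbed_urls)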
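A sketch that prints every group of each URL with match.groups(). It makes visible that the optional group (www\.)? is None for URLs without www., and that the backreferences \2\3 in sub() keep only the domain and the top-level domain.

```python
import re

urls = '''
https://www.google.com
http://ramadevi.com
https://pep.com
https://www.isro.gov
'''

pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

all_groups = []
for match in pattern.finditer(urls):
    # groups() returns groups 1..3; (www\.)? is None when the URL has no www.
    all_groups.append(match.groups())
    print(match.group(0), '->', match.groups())

subbed_urls = pattern.sub(r'\2\3', urls)
print(subbed_urls.split())
```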
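import threading

def print_cube(num):
    # print the cube of the given number
    print("Cube: {}".format(num * num * num))

def print_square(num):
    # print the square of the given number
    print("Square: {}".format(num * num))

if __name__ == "__main__":
    # creating thread
    t1 = threading.Thread(target=print_square, args=(10,))
    t2 = threading.Thread(target=print_cube, args=(10,))
    # starting thread 1
    t1.start()
    # starting thread 2
    t2.start()
    # wait until thread 1 is completely executed
    t1.join()
    # wait until thread 2 is completely executed
    t2.join()
    # both threads completely executed
    print("Done!")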
Once the threads start, the current program (you can think of it as the main thread) also
keeps on executing. To stop the current program from continuing until a thread is
complete, we use the join() method.
As a result, the current program first waits for the completion of t1 and then t2.
Once they are finished, the remaining statements of the current program are executed.
import threading

def task1():
    # print the name of the thread running this function
    print(threading.current_thread().name)

def task2():
    print(threading.current_thread().name)

if __name__ == "__main__":
    # creating threads with explicit names
    t1 = threading.Thread(target=task1, name='t1')
    t2 = threading.Thread(target=task2, name='t2')
    # starting threads
    t1.start()
    t2.start()
    # waiting for both threads to finish
    t1.join()
    t2.join()
------------------------------------------------------------------------------------
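Concept of global
1)
x=20 # global
def f1():
    print(x)   # reading a global variable needs no declaration
f1() # 20
2)
x=20 # global
def f1():
    x=x+1 # error: UnboundLocalError, the assignment makes x local to f1
    print(x)
f1()
3)
x=20 # global
def f1():
    global x   # x now refers to the global variable
    x=x+1
    print(x)
f1() # 21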
Example 3:
import threading

# global variable x
x=0

def increment_global():
    global x
    x += 1   # read, add and write back: not atomic across threads

def taskofThread():
    for _ in range(1000000):
        increment_global()

def main():
    global x
    x=0
    t1 = threading.Thread(target=taskofThread)
    t2 = threading.Thread(target=taskofThread)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main()
        print(i,x)   # may be less than 2000000 when updates are lost
import time
import threading
from threading import Thread

def i_am_thread():
    time.sleep(10)
    print("this is thread")

th = Thread(target=i_am_thread, name='fast')
th.start()
print("Current Thread count:", threading.active_count())  # alive threads, main thread included
print("state:", th.is_alive())      # True: th is still sleeping
th.join()                           # wait for th to finish
print("state:", th.is_alive())      # False: th has finished
print("get name:", th.name)         # th.getName() also works but is deprecated since Python 3.10
print("Main thread name:", threading.main_thread().name)
References:
www.python.org
www.geeksforgeeks.com
www.tutorialspoint.com
What is a race condition?
A race condition occurs when two or more threads try to access a shared variable
simultaneously, leading to unpredictable outcomes.
In this scenario, the first thread reads the value from the shared variable. At the same time,
the second thread also reads the value from the same shared variable.
Then both threads attempt to change the value of the shared variable. Since the updates
occur simultaneously, there is a race to determine which thread's modification is
preserved.
The final value of the shared variable depends on which thread completes its update last:
whichever thread changes the value last wins the race.
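The program that the following paragraphs describe is not shown. Judging from the lock example further on, it is most likely the same counter program without the lock; the sketch below is written under that assumption. The sleep() between reading and writing the shared counter gives the other thread time to read the old value, so one of the two updates is lost.

```python
from threading import Thread
from time import sleep

counter = 0

def increase(by):
    global counter
    local_counter = counter   # read the shared value
    local_counter += by       # modify a local copy
    sleep(0.1)                # simulate work; the other thread reads the old value meanwhile
    counter = local_counter   # write back, overwriting the other thread's update
    print(f'counter={counter}')

# create threads (assumed: same amounts as in the lock example)
t1 = Thread(target=increase, args=(10,))
t2 = Thread(target=increase, args=(20,))

t1.start()
t2.start()

t1.join()
t2.join()

print(f'The final counter is {counter}')
```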
In this program, both threads try to modify the value of the counter variable at the same
time. The value of the counter variable depends on which thread completes last.
If the thread t1 completes before the thread t2, you’ll see the following output:
counter=10
counter=20
The final counter is 20
Otherwise, you’ll see the following output:
counter=20
counter=10
The final counter is 10
Using a threading lock to prevent the race condition
The following example shows how to use the Lock object to prevent the race condition in
the previous program:
from threading import Thread, Lock
from time import sleep

counter = 0

def increase(by, lock):
    global counter
    lock.acquire()           # only one thread at a time gets past this line
    local_counter = counter
    local_counter += by
    sleep(0.1)
    counter = local_counter
    print(f'counter={counter}')
    lock.release()           # let the waiting thread proceed

lock = Lock()
# create threads
t1 = Thread(target=increase, args=(10, lock))
t2 = Thread(target=increase, args=(20, lock))
# start the threads
t1.start()
t2.start()
# wait for the threads to complete
t1.join()
t2.join()
print(f'The final counter is {counter}')
Output:
counter=10
counter=30
The final counter is 30
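Calling acquire() and release() by hand leaves the lock held if the code in between raises an exception. A Lock also works as a context manager, so an equivalent sketch can use a with statement, which releases the lock automatically. Creating the threads from a list is a stylistic choice, not from the original.

```python
from threading import Thread, Lock
from time import sleep

counter = 0
lock = Lock()

def increase(by):
    global counter
    # 'with lock:' acquires the lock and always releases it,
    # even if an exception is raised inside the block
    with lock:
        local_counter = counter
        local_counter += by
        sleep(0.1)
        counter = local_counter
        print(f'counter={counter}')

threads = [Thread(target=increase, args=(by,)) for by in (10, 20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f'The final counter is {counter}')
```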
Acknowledgement:
https://www.pythontutorial.net/python-concurrency/python-threading-lock/
Search Options
The way the matching engine processes an expression can be changed using
option flags. The flags can be combined using a bitwise OR operation, then
passed to compile(), search(), match(), and other functions that accept a
pattern for searching.
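A sketch of combining two flags with the bitwise OR operator |, reusing the two-line text from the examples that follow. The pattern ^a\w* (a word starting with a at the beginning of a line) is an illustrative choice: it only matches when both IGNORECASE and MULTILINE are set.

```python
import re

text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'^a\w*'  # a word starting with a, at the beginning of a line

plain = re.findall(pattern, text)
multi = re.findall(pattern, text, re.MULTILINE)
# Flags are combined with the bitwise OR operator |
both = re.findall(pattern, text, re.IGNORECASE | re.MULTILINE)

print(plain)  # no flags: ^ matches only at the start of the text, which is 'T'
print(multi)  # MULTILINE alone: line 2 starts with 'A', but case still matters
print(both)   # both flags: 'A' at the start of line 2 is found
```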
Case-Insensitive Matching
IGNORECASE causes literal characters and character ranges in the pattern to match both
uppercase and lowercase characters.
import re

text = 'This is some text -- with punctuation.'
pattern = r'\bT\w+'

with_case = re.compile(pattern)
without_case = re.compile(pattern, re.IGNORECASE)

print(text)
print(pattern)
print('Case-sensitive:')
for match in with_case.findall(text):
    print(match)
print('Case-insensitive:')
for match in without_case.findall(text):
    print(match)
Since the pattern includes the literal T, without setting IGNORECASE the
only match is the word This. When case is ignored, text also matches.
Two flags affect how searching in multiline input works: MULTILINE and DOTALL.
The MULTILINE flag controls how the pattern-matching code processes anchoring
instructions for text containing newline characters. When multiline mode is turned on,
the anchor rules for ^ and $ apply at the beginning and end of each line, in addition to
the entire string.
import re

text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'(^\w+)|(\w+\S*$)'

single_line = re.compile(pattern)
multiline = re.compile(pattern, re.MULTILINE)

print(text)
print(pattern)
print('Single Line :')
for match in single_line.findall(text):
    print(repr(match))   # each match is a (group 1, group 2) tuple
print('Multiline:')
for match in multiline.findall(text):
    print(repr(match))
The pattern in the example matches the first or last word of the input. It
matches line. at the end of the string, even though there is no newline.
The previous code prints the following output for single-line and multiline mode:
Single Line :
('This', '')
('', 'line.')
Multiline:
('This', '')
('', 'punctuation.')
('A', '')
('', 'line.')
DOTALL is the other flag related to multiline text. Normally, the dot character ( . )
matches everything in the input text except a newline character. The flag allows
dot to match newlines as well.
import re

text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'.+'

no_newlines = re.compile(pattern)
dotall = re.compile(pattern, re.DOTALL)

print(text)
print(pattern)
print('No newlines :')
for match in no_newlines.findall(text):
    print(repr(match))
print('Dotall:')
for match in dotall.findall(text):
    print(repr(match))
Without the flag, each line of the input text matches the pattern separately. Adding
the flag causes the entire string to be consumed.
The previous program prints the following output:
No newlines :
'This is some text -- with punctuation.'
'A second line.'
Dotall:
'This is some text -- with punctuation.\nA second line.'