
Python Programming Language

- Useful URLs
o https://docs.python.org/3/
o Tutorial: https://docs.python.org/3.10/tutorial/index.html
o https://www.w3schools.com/python/
o Course materials (including source code for the example programs):
https://www.py4e.com/materials
- None value
o maxCount = None
o if maxCount is None : …
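A minimal sketch of the None pattern above: None marks "nothing seen yet" while looking for the largest value. The function name largest and the sample numbers are made up for illustration.

```python
def largest(values):
    """Return the largest item, or None when the list is empty."""
    max_count = None                      # nothing seen yet
    for value in values:
        # "is None" tests identity, the usual way to compare against None
        if max_count is None or value > max_count:
            max_count = value
    return max_count


print(largest([3, 41, 12, 9]))
print(largest([]))
```

Using None rather than 0 or -1 as the starting value keeps the loop correct even when every number is negative.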
- To see available methods: --> dir(obj)
- Strings
o 'n' in 'banana' returns True (in is a logical operator); so does 'nan' in 'banana'
o Comparisons
▪ if word == "banana":
o "Please call".startswith("Plea") == True
▪ But "Please call".startswith("p") == False (the comparison is case-sensitive)
o lstrip, rstrip, strip: remove whitespace (including \n, \r, \t) from the left, the right, or both ends
o "Hello Bob".replace("Bob", "Jane")
o lower, upper, capitalize
o str.startswith("From:")
o find: looks for text and returns the position where it starts (-1 if not found)
▪ "Hello".find("o") == 4
▪ you can ask to start from a specific position:
str.find("whatever", startPos)
o multi-line strings: triple quotes, sometimes used for comments as well: ''' dssdsd
dsdsds'''
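The string methods above can be combined as in the following sketch. The sample line and the address someone@example.org are invented for the example.

```python
line = "  From: someone@example.org \n"

clean = line.strip()                  # drop leading/trailing whitespace, including \n
is_from = clean.startswith("From:")   # case-sensitive prefix test
at_pos = clean.find("@")              # index of the first "@", or -1 if absent
domain = clean[at_pos + 1:]           # slice from just after the "@" to the end

has_nan = "nan" in "banana"           # substring test with the in operator
replaced = "Hello Bob".replace("Bob", "Jane")
second_a = "banana".find("a", 2)      # start searching at position 2

print(is_from, domain, has_nan, replaced, second_a)
```

Strings are immutable, so strip() and replace() return new strings and leave the original untouched.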
- Files
o Reading the whole file into a single string:
▪ str = fHandle.read()
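A small, self-contained sketch of reading a file, either whole or line by line. The file name sample.txt and its contents are assumptions; the file is created in a temporary folder only so the example can run anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")   # hypothetical file name

    # Create the file so there is something to read.
    with open(path, "w") as out:
        out.write("first line\nsecond line\n")

    # Whole file as one string (newlines included).
    with open(path) as fhandle:
        text = fhandle.read()

    # A file handle is also a sequence of lines.
    with open(path) as fhandle:
        lines = [line.rstrip() for line in fhandle]

print(len(text))
print(lines)
```

The with statement closes the handle automatically, which is why no explicit close() appears.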
- Lists
o List constants: [1, 2, 4], or ['hi', 'how', 'are', 'you']
o Can hold any Python object, so:
▪ Can be multi-type: [ 1, 'how', 2.93 ]
▪ Can have a list within a list [ 1, [ 3, 5], 2 ]
o It can be empty
o They match well with definite loops (for XXX in list: …)
o You access members with [], friends = [ 2, 4, 10]
▪ So friends [2] == 10
o Lists are mutable (strings are not)
o Lists have length
▪ list = [0,1,2,3]
▪ len (list) == 4
o range(N) function
▪ returns the sequence of integers from 0 to N-1 (a lazy range object in Python 3; list(range(N)) gives the actual list)
▪ useful to loop through the indexes of a list
▪ for i in range( len(list) ) :
• …
▪ You could also iterate through the list directly
• for l in list :
o …
o Manipulating lists
▪ Concatenation: via + operator
▪ Slicing via : operator --> [:3] means from the beginning up to, but not including, index 3
▪ Pre-built methods:
• https://docs.python.org/3/tutorial/datastructures.html#more-on-lists
• append(x) --> add x to the end
• len(lst) --> how many items (a built-in function, not a method)
• count(x) --> how many items are equal to x
• pop([pos]) --> remove and return the item at pos (or the last one if no param given)
• index(x[, start[, end]]) --> return 0-based index of the first x in the list (with
optional range filter)
• insert(index, what) --> insert(0, x) inserts at the front!
• reverse
• sort
• Constructor:
o lista = list()
o lista.append("uno")
• Check if something is in a list
o some = [ 1, 2, 3]
o 4 in some --> False
• sum, max, min, len work on lists
▪ str.split() returns a list of the words within a string
o 'Hello how are you'.split() --> ['Hello', 'how', 'are', 'you']
o str.split([delimiter]) By default splits on whitespace
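The list operations above, run in sequence as a quick sketch. The values are arbitrary and the comments show the list after each step.

```python
friends = [2, 4, 10]

friends.append(7)                 # [2, 4, 10, 7]
friends.insert(0, 99)             # [99, 2, 4, 10, 7]
last = friends.pop()              # removes and returns 7 -> [99, 2, 4, 10]
friends.sort()                    # sorts in place -> [2, 4, 10, 99]

first_three = friends[:3]         # slice: indexes 0, 1, 2
combined = first_three + [1, 1]   # concatenation builds a new list
ones = combined.count(1)          # how many items equal 1
where = combined.index(10)        # position of the first 10

words = "Hello how are you".split()   # split on whitespace
fields = "a;b;c".split(";")           # split on an explicit delimiter
indexes = list(range(len(friends)))   # range is lazy; list() materializes it

print(friends, last, first_three, combined, ones, where)
print(words, fields, indexes, sum(friends), max(friends), min(friends))
```

Note that sort(), append() and insert() change the list in place and return None, while slicing and + produce new lists.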
- Dictionaries
o purse = dict() # Constructor
o purse = { 'paco': 25, 'pepe': 10, 'joan': 5 }
o Implemented as a hash table. Since Python 3.7 dictionaries are guaranteed to keep insertion order (older versions made no such promise)
o Mutable
o Can be empty: purse = {}
o Check if key is present:
▪ if 'pura' in purse: # yields False
o get() method: returns the value if the key is found, otherwise a default value
▪ purse.get('pura', 0) # returns 0 if not found (no traceback), the value if found
o You can iterate through the dictionary's keys
▪ for key in purse:
• print ( purse[key] )
o You can get the keys, the values or both from a dictionary
▪ list (purse) --> a list of the keys --> [ 'paco', 'pepe', 'joan' ]
▪ k = purse.keys() --> a view of the same keys (more explicit); wrap it in list() if you need indexing
▪ v = purse.values() --> a view of the values --> 25, 10, 5
▪ # keys() and values() are coherent with each other: list(k)[0] and list(v)[0] belong to the
same entry (the first one inserted, from Python 3.7 on)
▪ purse.items() --> view of (key, value) tuples: ('paco', 25), ('pepe', 10), ('joan', 5)
▪ You can iterate through those tuples:
• for k,v in purse.items() :
o print (k,v) # insertion order from Python 3.7 on; not guaranteed in older versions
• You get
o paco 25
o pepe 10
o joan 5
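A typical use of get() is counting, sketched below with made-up names. The ordering shown in the comments assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
counts = dict()
for name in ["paco", "pepe", "paco", "joan", "paco"]:
    # get() returns 0 the first time a name is seen, so no KeyError is raised
    counts[name] = counts.get(name, 0) + 1

keys = list(counts)              # keys in insertion order: paco, pepe, joan
values = list(counts.values())   # values in the matching order
pairs = list(counts.items())     # (key, value) tuples

for key, value in counts.items():
    print(key, value)
```

keys(), values() and items() return views rather than lists, which is why they are wrapped in list() here before being stored.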
- Tuples
o Look like lists but can't be modified (immutable)
o Therefore you can't call tup.sort(), tup.append() or tup.reverse()
o tup = ( 'paco', 'pepe', 'joan' )
o tup[2] # == 'joan'
o You can iterate in for loops
▪ for nom in tup : …
o max(tup)
o You can use a tuple on the left-hand side of an assignment
▪ (x,y) = (9,23)
▪ print(y) # == 23
▪ # the same applies in for (k,v) in dict.items()…
o Tuples are comparable (element by element, left to right)
o This means you can get a sorted version of a dictionary by using the sorted() function on its
items() method
▪ lst = sorted (dict.items())
▪ for k,v in lst :
• print ( k,v)
o Sorting a dict by values: create a temporary list of (v,k) tuples (not k,v), then use sorted
on it
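The two sorting recipes above, written out as a sketch. The dictionary contents are sample data and tmp is just a throwaway name for the temporary list.

```python
purse = {"paco": 25, "pepe": 10, "joan": 5}

# Sorted by key: tuples compare on their first element first.
by_key = sorted(purse.items())

# Sorted by value: build (value, key) tuples so the value is compared first.
tmp = []
for k, v in purse.items():
    tmp.append((v, k))
by_value = sorted(tmp, reverse=True)   # largest value first

# Tuple assignment unpacks both sides position by position.
(x, y) = (9, 23)

for v, k in by_value:
    print(k, v)
```

Comparison stops at the first differing element, so (0, 1, 2) < (0, 3, 1) is True without ever looking at the third position.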

- Comprehensions
o Creates a new list (or dict, or set) on the fly from an existing iterable
▪ Example for list:
• sorted ( [ (v,k) for k,v in dict.items() ] )
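A few more comprehension shapes, as a sketch with sample data: a plain transformation, a filter, and the dictionary variant.

```python
purse = {"paco": 25, "pepe": 10, "joan": 5}

squares = [n * n for n in range(5)]                 # transform every item
big = [k for k, v in purse.items() if v >= 10]      # keep only some items
doubled = {k: v * 2 for k, v in purse.items()}      # dict comprehension

print(squares)
print(big)
print(doubled)
```

Each comprehension is shorthand for a for loop that appends to (or fills) a new container; the original purse is not modified.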
- Regular expressions
o You need to import a module --> import re
o https://docs.python.org/3/howto/regex.html

^ Matches the beginning of a line
$ Matches the end of the line
. Matches any character
\s Matches whitespace
\S Matches any non-whitespace character
* Repeats a character zero or more times
*? Repeats a character zero or more times (non-greedy)
+ Repeats a character one or more times
+? Repeats a character one or more times (non-greedy)
[aeiou] Matches a single character in the listed set
[^XYZ] Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
( Indicates where string extraction is to start
) Indicates where string extraction is to end

o \$ a dollar sign
o [0-9] a digit
o [0-9]+ 1 or more digits
o [aeiou] a vowel
o Ex: ^X-\S+:
o re.search("^From: ", line) --> checks whether a string matches a re
▪ returns a match object (truthy) if it matches, None otherwise, so it works directly in an if
o WARNING: matching is greedy by default: it takes the largest possible match
▪ ^F.+:
▪ Applied to "From: dsdsds: " it does not stop at From:, it goes on to the last colon
because that is also a match and it's larger
o ? --> not greedy
▪ ^F.+?: Now it stops at "From:"
o re.findall() --> extract portions of a string that match a re
▪ returns list of matching strings:
▪ x = "My 2 favorite numbers are 3 and 9"
▪ y = re.findall('[0-9]+', x)
▪ print (y) --> ['2', '3', '9']
o To get email addresses: -> \S+@\S+
o () tell findall where the extraction starts and ends, which can be after the point where the match starts
▪ The extracted text is part of the match; we only want a part of what we match
▪ To extract email addresses but only in lines which start with "From "
• "^From (\S+@\S+)"
▪ To get the domain name in an address:
• re.findall("@([^ ]*)", lin) -> find @ then extract all the non-blank
characters that follow
• even more precise, check for lines that start with "From "
o re.findall("^From .*@([^ ]*)", lin)
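The patterns above can be tried out as follows. This is a sketch; the mail line and the address someone@example.org are invented sample data.

```python
import re

line = "From someone@example.org Sat Jan  5 09:14:16 2008"

# search() returns a match object or None, so it can be used directly in an if.
starts_with_from = re.search(r"^From ", line) is not None
no_colon_form = re.search(r"^From: ", line)          # None: this line has no colon

addresses = re.findall(r"\S+@\S+", line)             # whole matches
domains = re.findall(r"^From .*@([^ ]*)", line)      # only the part in ()

# Greedy versus non-greedy on the example from the notes.
text = "From: dsdsds: "
greedy = re.findall(r"^F.+:", text)       # runs on to the last colon
non_greedy = re.findall(r"^F.+?:", text)  # stops at the first colon

numbers = re.findall(r"[0-9]+", "My 2 favorite numbers are 3 and 9")

print(starts_with_from, no_colon_form, addresses, domains)
print(greedy, non_greedy, numbers)
```

Raw strings (the r prefix) keep backslashes such as \S from being interpreted by Python before the re module sees them.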
- Networking
o Textbook and resources: https://www.net-intro.com/
o Transport layer provides end-to-end pipe:
▪ sockets (the pipe) + ports (for specific applications)
o HTTP uses port 80 and HTTPS port 443 --> HTTP is the dominant application layer protocol on the
Internet
▪ Invented for the web, to retrieve HTML, images, documents, etc
▪ Then extended to retrieve data, RSS, web services,
▪ http://www.pepe.org/page.html
▪ protocol host document
▪ When you click on a web page, there is href = value with info to switch to a new page,
the browser makes a connection to the web server and issues a GET request, to get
the content of the page at the specified URL --> the server returns the HTML
document, and the browser formats and displays it
o Sockets in Python:
▪ import socket
▪ mysocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
▪ mysocket.connect( ('data.py4e.org', 80) ) # takes a (host, port) tuple
▪ Main methods for sockets
• socket (constructor)
• connect (establish connection)
• send
• recv
• close
▪ When sending and receiving you need to encode() and decode(): Python strings are
Unicode, while sockets carry bytes (UTF-8 by default for the server dialogue)
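To try send, recv, encode and decode without touching the network, the sketch below uses socket.socketpair(), which returns two already-connected sockets on the local machine. One end plays the browser and the other the server; a real client would instead create a socket and connect() to a (host, port) tuple as in the notes. The request path and the reply text are made up.

```python
import socket

# Two connected local sockets stand in for the client and the web server.
client, server = socket.socketpair()

request = "GET /romeo.txt HTTP/1.0\r\n\r\n"
client.send(request.encode())            # str -> bytes (UTF-8 by default)

received = server.recv(512).decode()     # bytes -> str on the other end
server.send("HTTP/1.0 200 OK\r\n\r\nBut soft what light".encode())

reply = client.recv(512).decode()

client.close()
server.close()

print(received.strip())
print(reply)
```

The blank line (\r\n\r\n) ends the HTTP request headers; in the reply, it separates the headers from the document body.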
o Developer console in a browser
▪ Refresh page and check Network section
▪ There, if you click on the document under Name, you can see Headers, preview and
Response

o You can send parameters in the URL with ?
▪ https://www.wa4e.com/code/arrays/guess.php?guess=12
▪ You can send many: https://www.wa4e.com/code/arrays/guess.php?guess=12&abc=2
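The pieces of such a URL can be pulled apart with urllib.parse, as in this sketch. Nothing is sent over the network; the URL is only parsed as text.

```python
from urllib.parse import urlparse, parse_qs

url = "https://www.wa4e.com/code/arrays/guess.php?guess=12&abc=2"

parts = urlparse(url)            # splits protocol, host, document, query
params = parse_qs(parts.query)   # each parameter maps to a list of values

print(parts.scheme, parts.netloc, parts.path)
print(params)
```

Values come back as lists of strings because the same parameter name may appear more than once in a query string.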
o Character Codes
▪ ord('H') gives the Unicode code point for a character
▪ UTF = Unicode Transformation Format
▪ UTF-16: 2 or 4 bytes per character (variable length)
▪ UTF-32: fixed length, 4 bytes
▪ UTF-8
• 1-4 bytes, compatible with ASCII
• recommended way for encoding data between systems
▪ In Python 3, all strings are Unicode-based
▪ If you want bytes you have to ask for them explicitly
• x = b'abc'
▪ When talking to a network socket we need to send bytes, so we have to encode() and
decode() Python strings, which are in Unicode
▪ str.encode() from str to bytes # UTF-8 encoding by default
▪ bytes.decode() from bytes to str # UTF-8 encoding by default
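A short sketch of the str/bytes round trip described above. The word "señal" is just a sample containing one non-ASCII character.

```python
code = ord("H")              # Unicode code point of "H"
letter = chr(72)             # and back again

text = "señal"               # 5 characters
data = text.encode()         # UTF-8 bytes: "ñ" takes 2 bytes, the rest 1 each
back = data.decode()         # bytes -> str

raw = b"abc"                 # a bytes literal, not a str

# Sizes of the same ASCII text in the three encodings
# (the -le variants avoid the byte-order mark).
utf8_len = len("abc".encode("utf-8"))
utf16_len = len("abc".encode("utf-16-le"))
utf32_len = len("abc".encode("utf-32-le"))

print(code, letter, len(text), len(data), data, back)
print(type(raw).__name__, utf8_len, utf16_len, utf32_len)
```

len() counts characters for a str and bytes for a bytes object, which is why the two lengths differ for the same text.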
o urllib in Python simplifies HTTP (it encapsulates sockets for us) and lets you handle web pages
as a "file" (with a handle)
▪ import urllib.request, urllib.parse, urllib.error
▪ fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
▪ for line in fhand:
• print (line.decode().strip())
▪ The headers are not returned with the body, but you can get them with the method
fhand.getheaders()
• headers = dict ( fhand.getheaders() ) # getheaders() returns a list of (name, value)
tuples, so dict() turns it into a dictionary
• Or get specific content from the headers:
o fhand.getheader('Content-Type') → "text/html; charset=utf-8"
▪ You can also read the entire document with text = fhand.read().decode() (read() returns bytes)
o Parsing the web
▪ Useful library that parses HTML pages:
• Beautiful Soup from www.crummy.com
o Webservices and XML
▪ “Serialize” data to be sent and then De-serialize, using a document with data format
(wire format) such as XML or JSON
▪ XML
• eXtensible Markup Language, used to share structured data; like HTML, it derives from SGML
• Some similarities with HTML (there are tags marked by <XXX> SSSS </XXX>)
• Simple elements or nodes → <name> Chuck </name>
• Complex elements, which have multiple simple elements
• Start tag <person>
• End tag </person>
• Text content between tags
• Attribute within the start tag, value surrounded by "" <phone type="intl">
• Self-closing tag <email hide="yes" />, no content
• White space does not matter in general (except in the middle of content/text
areas), but we indent to make it readable
• You can picture an XML file as a tree, elements are branches, then at the
bottom you hang text (one per element) and attributes (0 or many)
• Because it’s a tree, you can have paths (route to each element):
o /a/b X
o /a/c/d Y
o /a/c/e Z
• XML Schema
o A contract which defines the legal format of an XML document so that
applications know how to send / receive info (tree structure, names
of elements, data types (string, integer, date, etc.))
o If a particular XML doc meets the specification of the contract
(schema), it is said to validate
o Many schema languages have been defined: DTD (Document Type
Definition), SGML (ISO 8879:1986, Standard Generalized Markup
Language)
o The most common is XSD (XML Schema, from the W3C, the World Wide Web Consortium);
files end with the .xsd extension
▪ Data types: string, decimal (float), integer, date (YYYY-MM-DD),
dateTime (ISO 8601 format, YYYY-MM-DDTHH:MM:SSZ; the trailing Z
is the time zone, typically Z = UTC/GMT)
• Parsing XML
o import xml.etree.ElementTree as ET (ET = alias for the whole library
name)

import xml.etree.ElementTree as ET

data = '''
<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>'''

tree = ET.fromstring(data)
print('Name:', tree.find('name').text) # gets the text of the tag
print('Attr:', tree.find('email').get('hide')) # gets the attribute hide of the tag

o Now getting multiple elements within a complex element

import xml.etree.ElementTree as ET

input = '''
<stuff>
  <users>
    <user x="2">
      <id>001</id>
      <name>Chuck</name>
    </user>
    <user x="7">
      <id>009</id>
      <name>Brent</name>
    </user>
  </users>
</stuff>'''

stuff = ET.fromstring(input)
lst = stuff.findall('users/user') # returns a list of tags
print('User count:', len(lst))

for item in lst:
    print('Name', item.find('name').text)
    print('Id', item.find('id').text)
    print('Attribute', item.get('x')) # get attribute
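The tree paths mentioned earlier (/a/b, /a/c/d, /a/c/e) can be followed with ElementTree as in this sketch. The tiny document is built to match those paths. Because fromstring() returns the root element <a> itself, the paths passed to find() are relative to it.

```python
import xml.etree.ElementTree as ET

doc = "<a><b>X</b><c><d>Y</d><e>Z</e></c></a>"
root = ET.fromstring(doc)          # root is the <a> element

b_text = root.find("b").text       # path /a/b
d_text = root.find("c/d").text     # path /a/c/d
e_text = root.findtext("c/e")      # findtext() returns the text directly
missing = root.find("c/zzz")       # no such element -> None

print(root.tag, b_text, d_text, e_text, missing)
```

Since find() returns None for a missing element, it is safer to check the result before reading .text from it.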

o JSON
▪ import json
▪ See example json1.py
▪ It comes from JavaScript data types
▪ JSON represents data as nested
• dictionaries { … } (objects in JavaScript)
• lists [ … ] (arrays in JavaScript)
▪ info = json.loads (str) # parses the JSON text contained in a str
▪ Returns a Python dict (or a list, when the top-level JSON value is an array)
▪ print ( json.dumps ( info, indent=4 ) ) # the reverse: produces a formatted string from
the Python object
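A minimal round trip with the json module. The records are sample data, written as a JSON array so that loads() returns a list of dictionaries.

```python
import json

data = '''
[
  { "id": "001", "x": "2", "name": "Chuck" },
  { "id": "009", "x": "7", "name": "Brent" }
]'''

info = json.loads(data)                    # JSON array -> Python list of dicts
names = [item["name"] for item in info]

text = json.dumps(info[0], indent=4)       # Python object -> formatted JSON text
again = json.loads(text)                   # and back

print("User count:", len(info))
print(names)
print(text)
```

After loads() the data consists of ordinary lists and dictionaries, so everything from the Lists and Dictionaries sections applies to it.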
o APIs
▪ Producer / Consumer
▪ We typically are the consumers
▪ Example we'll use: Geocoding from Google (you send an approximate address and it returns
details)
• There are limits on the number of requests per day on the free version (2500)
• The response headers tell you that; check x-rate-limit-remaining or something similar
• X-rate-limit-reset tells you when you'll get new requests available
• Same for Twitter or other public APIs, to limit abuse or commercial use (you
have to pay for that)
▪ When using APIs, you call them with a URL; to pass parameters you
place them in the URL (after ?), properly encoded (URL encoding, urllib.parse does
that)
• Blank = + (or %20)
• , = %2C
▪ When secured you need to use tokens, keys, etc. to SIGN the URL (so the producers
know it's you, i.e. to authenticate); see the Twitter example
• OAuth
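URL encoding with urllib.parse, as a sketch. The service address https://example.com/geocode/json? and the key value are placeholders, not a real API; only the encoding step is shown and no request is made.

```python
from urllib.parse import urlencode, quote

service = "https://example.com/geocode/json?"    # hypothetical endpoint
params = {"address": "Ann Arbor, MI", "key": "SECRET"}

# urlencode() joins the pairs with & and escapes unsafe characters:
# blanks become + and the comma becomes %2C.
url = service + urlencode(params)

# quote() is the variant for path segments: blanks become %20 instead of +.
path_style = quote("Ann Arbor, MI")

print(url)
print(path_style)
```

Building the query string with urlencode() instead of by hand avoids broken URLs when a value contains &, =, blanks or accented characters.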
- Databases
o Classes in Python

class partyAnimal :
    x = 0

    def party (self) : # self is not a reserved word; you could use a different name, but it's common practice
        self.x = self.x + 1
        print ("so far", self.x)

an = partyAnimal()
an.party()

o Constructors and destructors (both methods are optional)
▪ x = "Hello" # construct a str
▪ x = 123 # the str is no longer referenced and can be destroyed; an int is constructed
o Constructor: def __init__ (self): …
o Destructor: def __del__ (self): …
o Inheritance: class Dog (partyAnimal) : …
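A sketch combining a constructor with inheritance. The class names follow the notes (capitalized here, as is customary); the name attribute and the bark() method are additions made up for the example.

```python
class PartyAnimal:
    def __init__(self, name):
        # The constructor runs once, when the object is created.
        self.name = name
        self.x = 0

    def party(self):
        self.x = self.x + 1
        return self.x


class Dog(PartyAnimal):
    # Dog inherits __init__ and party() and adds one method of its own.
    def bark(self):
        return self.name + " says woof"


rex = Dog("Rex")
rex.party()
rex.party()

print(rex.name, rex.x, rex.bark())
print(isinstance(rex, PartyAnimal))
```

Setting x inside __init__ gives every object its own counter from the start, instead of sharing a class-level default.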
o Databases: SQLite
▪ SQLite Browser (there is a Chrome extension also) https://sqlitebrowser.org
▪ Database: contains many tables
▪ Table (relation) – contains many tuples and attributes
▪ Tuple (row) – a set of fields
▪ Attribute (column, or field)
▪ Schema: meta data (data types, restrictions)
▪ SQL Structured Query Language, abstraction to work with the DB (Create, Retrieve,
Insert, Update, Delete data)
o Database Model or Database Schema: structure or format of a database
o Main DB Management Systems:
▪ Oracle
▪ SQL Server
▪ Open source
• MySQL: simple, fast (purchased by Oracle, still open source; MariaDB is a
fork maintained by the original MySQL team)
• PostgreSQL: similar to Oracle, powerful
• SQLite: embedded DB, very simple, fast and small, as opposed to Oracle or
MySQL, which are separate pieces of software. It's the one we're going to use; it's
included in Python
o SQLite
▪ CREATE TABLE Users (
name VARCHAR (128),
email VARCHAR (128)
)
▪ INSERT INTO Users (name,email) VALUES ( 'Montse', 'montse@test.com' )
▪ DELETE FROM Users WHERE email='xxx@test.com'
▪ UPDATE Users SET name='Montse' WHERE email='montse@test.com'
• SET column=new value WHERE condition (without the WHERE it would change all rows)
▪ SELECT * FROM Users
▪ SELECT * FROM Users WHERE email='montse@test.com'
▪ SELECT * FROM Users ORDER BY name
o SQLite in Python
▪ import sqlite3
▪ conn = sqlite3.connect('emaildb.sqlite') #open DB
▪ cur = conn.cursor() #get cursor to work with it
▪ To update a value in a row, better to use an ATOMIC instruction, useful when many
apps are interacting with the DB at the same time:
cur.execute('UPDATE Counts SET count = count + 1 WHERE email = ?', (email,))
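The connect / cursor / execute steps can be put together as below. This sketch uses an in-memory database (":memory:") instead of a file so it leaves nothing on disk, and the addresses are sample data. The Counts table layout is an assumption inferred from the UPDATE statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")     # a file name would create or open a DB file
cur = conn.cursor()

cur.execute("CREATE TABLE Counts (email TEXT, count INTEGER)")

for email in ["a@test.com", "b@test.com", "a@test.com"]:
    # The ? placeholder lets sqlite3 quote the value safely.
    cur.execute("SELECT count FROM Counts WHERE email = ?", (email,))
    row = cur.fetchone()
    if row is None:
        cur.execute("INSERT INTO Counts (email, count) VALUES (?, 1)", (email,))
    else:
        # Atomic increment: the database adds 1 itself, no read-modify-write in Python.
        cur.execute("UPDATE Counts SET count = count + 1 WHERE email = ?", (email,))

conn.commit()                          # make the changes permanent

rows = cur.execute(
    "SELECT email, count FROM Counts ORDER BY count DESC"
).fetchall()
conn.close()

print(rows)
```

The parameters are passed as a tuple, hence the trailing comma in (email,).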
o Data modelling
▪ Primary key : track ID
▪ Logical key : Track description - useful to search for a particular item
(it will be used in a WHERE clause or ORDER clause)
▪ Foreign key : Album ID
▪ Primary keys are typically INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT
UNIQUE
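A sketch of the three kinds of key in a two-table model. The table and column names (Album, Track, album_id) and the titles are assumptions for illustration. The primary keys use the column definition given above, the title is the logical key used in WHERE / ORDER BY, and album_id is the foreign key pointing at Album.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE Album (
    id    INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    title TEXT
);
CREATE TABLE Track (
    id       INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    title    TEXT,
    album_id INTEGER
);
""")

# Insert the album first; lastrowid gives the primary key it was assigned.
cur.execute("INSERT INTO Album (title) VALUES (?)", ("Album A",))
album_id = cur.lastrowid

for title in ["Track one", "Track two"]:
    cur.execute(
        "INSERT INTO Track (title, album_id) VALUES (?, ?)", (title, album_id)
    )
conn.commit()

# JOIN follows the foreign key to bring the album title next to each track.
rows = cur.execute("""
    SELECT Track.title, Album.title
    FROM Track JOIN Album ON Track.album_id = Album.id
    ORDER BY Track.id
""").fetchall()
conn.close()

print(album_id, rows)
```

Storing the album title once and referring to it by number avoids repeating the same string in every track row.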
