Progres2021 4
What is a Web Service?
[Diagram: the Client sends a Request to the HTTP Server, which returns a Response]
Request (XML): <id>5</id>
Response (XML):
<note id="5">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget the dinner</body>
</note>
What is a Web Service?
Request (SOAP):
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Header>
  </soap:Header>
  <soap:Body>
    <m:GetStockPrice xmlns:m="http://www.example.org/stock/Surya">
      <m:StockName>IBM</m:StockName>
    </m:GetStockPrice>
  </soap:Body>
</soap:Envelope>
Response (SOAP, truncated on the slide):
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <ResponseHeader xmlns="https://www.google.com/apis/ads/publisher/v201508">
      <requestId>xxxxxxxxxxxxxxxxxxxx</requestId>
      <responseTime>1063</responseTime>
    </ResponseHeader>
  </soap:Header>
  <soap:Body>
    <getAdUnitsByStatementResponse xmlns="https://www.google.com/apis/ads/publisher/v201508">
      <rval>
        <totalResultSetSize>1</totalResultSetSize>
        <startIndex>0</startIndex>
        <results>
          <id>2372</id>
          <name>RootAdUnit</name>
          <description></description>
          <targetWindow>TOP</targetWindow>
          <status>ACTIVE</status>
          <adUnitCode>1002372</adUnitCode>
          <inheritedAdSenseSettings>
            <value>
              <adSenseEnabled>true</adSenseEnabled>
              …
What is a Web Service?
Request (URL encoded): order=date&limit=2
Response (JSON):
{
  "data": [{
    "id": 1001,
    "name": "Jim"
  },
  {
    "id": 1002,
    "name": "Matt"
  }]
}
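The URL-encoded request and JSON response above can be reproduced offline with the standard library. This is only a sketch: no server is contacted, and the JSON body is copied from the slide.

```python
import json
from urllib.parse import urlencode, parse_qs

# Request side: encode the parameters as a URL-encoded query string.
params = {"order": "date", "limit": 2}
query = urlencode(params)
print(query)  # order=date&limit=2

# A server would decode the string back into a mapping of lists of strings.
print(parse_qs(query))  # {'order': ['date'], 'limit': ['2']}

# Response side: parse the JSON body shown on the slide.
body = '{"data": [{"id": 1001, "name": "Jim"}, {"id": 1002, "name": "Matt"}]}'
names = [item["name"] for item in json.loads(body)["data"]]
print(names)  # ['Jim', 'Matt']
```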
API Business Model
REST Web API
A web service that communicates using the simpler
REpresentational State Transfer (REST) style.
Example:
GET : http://pokeapi.co/api/v1/pokemon/25
• Statelessness
• Cacheability
• Layered system
• Uniform interface
Resources
POST:
https://api.flickr.com/services/rest/?method=flickr.galleries.addPhoto&gallery_id=XX
Resources
URI/Resource based:
• e.g. Facebook Graph API:
GET: /{photo-id} to retrieve the info of a photo
GET: /{photo-id}/likes to retrieve the people who like it
POST: /{photo-id} to update the photo
DELETE : /{photo-id} to delete the photo
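A small sketch of the resource-based style: the URI names the resource and the HTTP method names the action. The host name and photo id below are hypothetical placeholders, not the real Graph API endpoint, and the requests are built but never sent.

```python
from urllib.request import Request

BASE = "https://graph.example.com"  # hypothetical host, for illustration only

def photo_request(method, photo_id, suffix=""):
    """Build (but do not send) a request for a photo resource."""
    return Request(f"{BASE}/{photo_id}{suffix}", method=method)

requests_to_make = [
    photo_request("GET", "12345"),            # retrieve the photo info
    photo_request("GET", "12345", "/likes"),  # retrieve the people who like it
    photo_request("POST", "12345"),           # update the photo
    photo_request("DELETE", "12345"),         # delete the photo
]
for req in requests_to_make:
    print(req.get_method(), req.full_url)
```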
API Response:
• Flickr:
{ "stat": "fail", "code": 1, "message": "User not found" }
{ "galleries": { ... }, "stat": "ok" }
• Google Calendar:
{ "error": {"code": 403, "message": "User Rate Limit Exceeded" } }
{ "kind": "calendar#events","summary": ..., "description": ...
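Because each API wraps errors differently, client code usually needs a small adapter. The helper below is a hypothetical sketch that recognizes only the two envelope shapes shown above; real APIs may also signal errors through the HTTP status code.

```python
import json

def api_error(payload):
    """Return the error message if the decoded payload signals a failure, else None."""
    if payload.get("stat") == "fail":   # Flickr-style envelope
        return payload.get("message")
    if "error" in payload:              # Google-Calendar-style envelope
        return payload["error"].get("message")
    return None

flickr_fail = json.loads('{ "stat": "fail", "code": 1, "message": "User not found" }')
google_fail = json.loads('{ "error": {"code": 403, "message": "User Rate Limit Exceeded" } }')
flickr_ok = json.loads('{ "galleries": {}, "stat": "ok" }')

print(api_error(flickr_fail))  # User not found
print(api_error(google_fail))  # User Rate Limit Exceeded
print(api_error(flickr_ok))    # None
```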
Response
Content-Type:
• text/plain
• text/html
• text/xml or application/xml
• application/json
• image/png
• ...
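One way a client might use the Content-Type header is to pick a decoder for the body. The function below is an illustrative sketch (the name decode_body is made up); it covers only the types listed above and falls back to raw bytes otherwise.

```python
import json
import xml.etree.ElementTree as ET
from email.message import Message

def decode_body(content_type, body):
    """Decode a response body (bytes) according to its Content-Type header."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()             # e.g. "application/json"
    charset = msg.get_content_charset() or "utf-8"  # charset parameter, if present
    if media_type == "application/json":
        return json.loads(body.decode(charset))
    if media_type in ("text/xml", "application/xml"):
        return ET.fromstring(body)
    if media_type.startswith("text/"):
        return body.decode(charset)
    return body  # e.g. image/png: keep the raw bytes

print(decode_body("application/json; charset=UTF-8", b'{"id": 5}'))  # {'id': 5}
print(decode_body("text/xml", b"<id>5</id>").text)                   # 5
print(decode_body("text/plain", b"hello"))                           # hello
```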
Client-side HTTP
HTTP Requests
from requests import get

manga = "http://lelscano.com"
r = get(manga)
print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")
print(f"Response headers: {r.headers}")
HTTP Requests
Output:
Request status is 200,
Content length is 53111 bytes,
Request encoding is UTF-8,
Text size is 53105 chars.
Response headers: {'Date': 'Wed, 04 Nov 2020 14:40:27 GMT', 'Content-Type': 'text/html; charset=UTF-8',
'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive',
'Set-Cookie': '__cfduid=da1986d3c036d3d4b0dfdbf3f16812e5f1604500827; expires=Fri, 04-Dec-20 14:40:27 GMT; path=/; domain=.lelscan.net; HttpOnly; SameSite=Lax, mobile_lelscan=0; expires=Thu, 05-Nov-2020 14:40:27 GMT; Max-Age=86400; path=lelscan.net',
'Vary': 'Accept-Encoding', 'CF-Cache-Status': 'DYNAMIC', 'cf-request-id': '06354cc73b000032c20c30b000000001',
'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"',
'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report?s=KFPpQxY2A5IilAqwG6j1BXgoJEskCp%2BkW7uCp0z63eYihMbvUnyfBx7abOP6nhy%2B5H1KHR51De457l7y84Ois4b3gD5D1Fi15RrJmklRlavxKwGsFBw3fA%3D%3D"}],"group":"cf-nel","max_age":604800}',
'NEL': '{"report_to":"cf-nel","max_age":604800}', 'Server': 'cloudflare', 'CF-RAY': '5ecf171ec88e32c2-CDG', 'Content-Encoding': 'gzip'}
HTTP Requests
print(f"{r.text}")
Output (beginning of the page):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>One Piece lecture en ligne scan</title>
<meta name="description" content="One Piece Lecture en ligne, tous les scan One Piece." />
<meta name="lelscan" content="One Piece" />
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta http-equiv="Content-Language" content="fr" />
<meta name="keywords" content="One Piece lecture en ligne, lecture en ligne One Piece, scan One Piece, One Piece scan, One Piece lel, lecture en ligne One Piece, Lecture, lecture, scan, chapitre, chapitre One Piece, lecture One Piece, lecture Chapitre One Piece, mangas, manga, One Piece, One Piece fr, One Piece france, scans, image One Piece " />
<meta name="subject" content="One Piece lecture en ligne scan" />
<meta name="identifier-url" content="https://lelscan.net" />
<meta property="og:image" content="/mangas/one-piece/thumb_cover.jpg" />
<meta property="og:title" content="Lecture en ligne One Piece scan" />
<meta property="og:url" content="/lecture-ligne-one-piece.php" />
<meta property="og:description" content="One Piece lecture en ligne - lelscan" />
<link rel="alternate" type="application/rss+xml" title="flux rss" href="/rss/rss.xml" />
<link rel="icon" type="image" href="/images/icones/favicon.ico" />
<style type="text/css" media="screen">
…
Stream Downloading
img = "http://ftp.crifo.org/debian-cd/current/amd64/iso-dvd/debian-10.6.0-amd64-DVD-1.iso"
stream_download(source_url=img, dest_file="debian1.iso")
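The slides call stream_download without showing its definition. The sketch below is one possible version, written with the standard library as an assumption; with the requests package the same idea is usually expressed with get(url, stream=True) and r.iter_content(chunk_size=...). The point is that the body is copied chunk by chunk, so a multi-gigabyte ISO never has to fit in memory.

```python
import io
import urllib.request

def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy a binary stream to another in fixed-size chunks; return the bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total

def stream_download(source_url, dest_file, chunk_size=64 * 1024):
    """Hypothetical stand-in for the slide's helper: download a URL straight to a file."""
    with urllib.request.urlopen(source_url) as response, open(dest_file, "wb") as out:
        return copy_in_chunks(response, out, chunk_size)

# Offline check of the chunked copy, using in-memory streams instead of the network.
src = io.BytesIO(b"x" * 200_000)
dst = io.BytesIO()
copied = copy_in_chunks(src, dst)
print(copied)  # 200000
```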
Elementary
String Parsing
Split
s = "Python is a great language\n but Erlang is pretty cool too"
l = s.split()
print(l)
l2 = s.split('a')
print(l2)
l3 = s.split('\n')
print(l3)
l4 = s.split('an')
print(l4)
Split
s = "Python is a great language\n but Erlang is pretty cool too"
l = s.split()
print(l)
# ['Python', 'is', 'a', 'great', 'language', 'but', 'Erlang', 'is', 'pretty', 'cool', 'too']
l2 = s.split('a')
print(l2)
# ['Python is ', ' gre', 't l', 'ngu', 'ge\n but Erl', 'ng is pretty cool too']
l3 = s.split('\n')
print(l3)
# ['Python is a great language', ' but Erlang is pretty cool too']
l4 = s.split('an')
print(l4)
# ['Python is a great l', 'guage\n but Erl', 'g is pretty cool too']
Join
s4 = 'an'.join(l4)
print(s4)
s3 = '\n'.join(l3)
print(s3)
s2 = 'a'.join(l2)
print(s2)
s1 = ' '.join(l)
print(s1)
Join
s4 = 'an'.join(l4)
print(s4)
# Python is a great language
#  but Erlang is pretty cool too
s3 = '\n'.join(l3)
print(s3)
# Python is a great language
#  but Erlang is pretty cool too
s2 = 'a'.join(l2)
print(s2)
# Python is a great language
#  but Erlang is pretty cool too
s1 = ' '.join(l)
print(s1)
# Python is a great language but Erlang is pretty cool too
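The join outputs illustrate a detail worth checking explicitly: joining with the same explicit separator undoes split, whereas split() with no argument collapses whitespace and cannot be undone. A short sketch:

```python
s = "Python is a great language\n but Erlang is pretty cool too"

# With an explicit separator, join restores the original string exactly.
for sep in ("a", "an", "\n"):
    assert sep.join(s.split(sep)) == s

# split() with no argument drops every run of whitespace, so the newline is lost.
rebuilt = " ".join(s.split())
print(rebuilt == s)  # False
print(rebuilt)       # Python is a great language but Erlang is pretty cool too
```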
Regular Expressions
a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters that have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)
. (a period) -- matches any single character except newline '\n'
\w -- (lowercase w) matches a "word" character: a letter or digit or underscore [a-zA-Z0-9_]. \W (uppercase W) matches any non-word character.
\b -- boundary between word and non-word
\s -- (lowercase s) matches a single whitespace character: space, newline, return, tab, form feed [ \n\r\t\f]. \S (uppercase S) matches any non-whitespace character.
\t, \n, \r -- tab, newline, return
\d -- decimal digit [0-9]
^ = start, $ = end -- match the start or the end of the string
\ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure whether a character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.
Regular Expressions
[] -- set of possible characters
| -- or
{n} -- exactly n occurrences
() -- create a group
+ -- at least one occurrence
* -- zero or more occurrences
? -- zero or one occurrence
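The operators listed above can be tried on a short string. The sample text below is made up for illustration; raw strings (r"...") keep Python from interpreting the backslashes before the regex engine sees them.

```python
import re

text = "cat bat rat 42 cats"

print(re.findall(r"\b[cb]at\b", text))  # ['cat', 'bat']        set + word boundaries
print(re.findall(r"cats?", text))       # ['cat', 'cats']       ? = zero or one 's'
print(re.findall(r"cat|rat", text))     # ['cat', 'rat', 'cat'] | = or
print(re.findall(r"\d{2}", text))       # ['42']                {n} = exactly n digits
print(re.findall(r"\w+", text))         # ['cat', 'bat', 'rat', '42', 'cats']

# An escaped period matches only a literal '.', an unescaped one matches any character.
print(re.findall(r"a\.b", "a.b axb"))   # ['a.b']
print(re.findall(r"a.b", "a.b axb"))    # ['a.b', 'axb']
```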
Regular Expressions
Extract Email Information:
sebastien.tixeuil@lip6.fr
([^@]+)@([^@]+)
[ ] a character
^ that is not
@ the at symbol
+ at least one of this character
import re
m = re.match('([^@]+)@([^@]+)', 'sebastien.tixeuil@lip6.fr')
print(m.group(1))
print(m.group(2))
Regular Expressions
Output:
sebastien.tixeuil
lip6.fr
Extracting Information with
Regular Expressions
Extracting Information with
Regular Expressions
from requests import get
from re import findall
r = get('https://www.lip6.fr/recherche/team_membres.php?acronyme=NPA')
print(findall('(26-00/([0-9]{3}))', r.text))
Extracting Information with
Regular Expressions
Output:
[('26-00/103', '103'), ('26-00/112', '112'), ('26-00/122', '122'), ('26-00/109', '109'), ('26-00/111', '111'),
('26-00/108', '108'), ('26-00/103', '103'), ('26-00/107', '107'), ('26-00/126', '126'), ('26-00/105', '105'),
('26-00/105', '105'), ('26-00/115', '115'), ('26-00/128', '128'), ('26-00/114', '114'), ('26-00/113', '113'),
('26-00/224', '224'), ('26-00/410', '410'), ('26-00/412', '412'), ('26-00/230', '230'), ('26-00/216', '216'),
('26-00/119', '119'), ('26-00/119', '119'), ('26-00/116', '116'), ('26-00/132', '132'), ('26-00/102', '102'),
('26-00/120', '120'), ('26-00/116', '116'), ('26-00/120', '120'), ('26-00/132', '132'), ('26-00/104', '104'),
('26-00/102', '102'), ('26-00/102', '102'), ('26-00/132', '132'), ('26-00/102', '102'), ('26-00/104', '104'),
('26-00/420', '420'), ('26-00/120', '120'), ('26-00/120', '120'), ('26-00/132', '132'), ('26-00/119', '119'),
('26-00/119', '119')]
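The list of tuples in the output comes from the two nested groups: when a pattern has several groups, findall returns one tuple per match. The sketch below shows this offline on a made-up fragment (the real page layout is not shown in the slides), and how a single group plus a set yields the distinct room numbers.

```python
import re

# Made-up sample text; the slide fetches a live web page instead.
html = "<td>26-00/103</td><td>26-00/112</td><td>26-00/103</td>"

# Two groups: each match becomes a (whole office, room number) tuple.
pairs = re.findall(r"(26-00/([0-9]{3}))", html)
print(pairs)  # [('26-00/103', '103'), ('26-00/112', '112'), ('26-00/103', '103')]

# One group: findall returns plain strings; a set removes the repeats.
rooms = sorted(set(re.findall(r"26-00/([0-9]{3})", html)))
print(rooms)  # ['103', '112']
```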
JSON Parsing
JSON
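from json import dumps
from socket import socket, AF_INET, SOCK_STREAM

print(dumps(['aéçèà', 1234, [2, 3, 4, 5, 6]]))
s = socket(AF_INET, SOCK_STREAM)
try:
    print(dumps(s))
except TypeError:
    # The handler body is missing from the slide; printing a message is an assumption.
    print("a socket is not JSON serializable")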
data = {}
data['people'] = []
data['people'].append({
    'name': 'Mark',
    'website': 'facebook.com',
})
data['people'].append({
    'name': 'Larry',
    'website': 'google.com',
})
data['people'].append({
    'name': 'Tim',
    'website': 'apple.com',
})
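A few dumps keyword arguments are worth knowing when serializing values like the ones above. The sketch below is illustrative: by default non-ASCII characters are escaped, and objects that json does not know raise TypeError unless a default function converts them.

```python
import json

values = ['aéçèà', 1234, [2, 3, 4, 5, 6]]
print(json.dumps(values))                      # non-ASCII characters escaped as \uXXXX
print(json.dumps(values, ensure_ascii=False))  # ["aéçèà", 1234, [2, 3, 4, 5, 6]]

# A set is not JSON serializable; `default` is called to convert it (here, to a sorted list).
encoded = json.dumps({"tags": {"b", "a"}}, default=sorted)
print(encoded)  # {"tags": ["a", "b"]}

# indent produces a human-readable layout.
print(json.dumps({"people": [{"name": "Mark", "website": "facebook.com"}]}, indent=2))
```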
JSON Files
with open('data.txt', 'w') as outfile:
    dump(data, outfile)
data.txt
{"people": [{"name": "Mark", "website":
"facebook.com"}, {"name": "Larry", "website":
"google.com"}, {"name": "Tim", "website":
"apple.com"}]}
JSON Files
with open('data.txt') as infile:
    data = load(infile)
for p in data['people']:
    print('Name: ' + p['name'])
    print('Website: ' + p['website'])
    print('')
Output:
Name: Mark
Website: facebook.com

Name: Larry
Website: google.com

Name: Tim
Website: apple.com
XML Parsing
XML Example
With xml.etree.ElementTree
xml.etree.ElementTree loads the whole file;
you can then navigate the tree structure.
import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')
XML Parsing
import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')
root = tree.getroot()
print(root.tag)
print(root.attrib)
print(root[0][1].text)
for n in root.iter('neighbor'):
    print(n.attrib)
XML Parsing
import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')
root = tree.getroot()
print(root.tag)
# data
print(root.attrib)
# {}
print(root[0][1].text)
# 2008
for child in root:
    print(child.tag, child.attrib)
# country {'name': 'Liechtenstein'}
# country {'name': 'Singapore'}
# country {'name': 'Panama'}
for n in root.iter('neighbor'):
    print(n.attrib)
# {'direction': 'E', 'name': 'Austria'}
# {'direction': 'W', 'name': 'Switzerland'}
# {'direction': 'N', 'name': 'Malaysia'}
# {'direction': 'W', 'name': 'Costa Rica'}
# {'direction': 'E', 'name': 'Colombia'}
XML Parsing
import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')
root = tree.getroot()
# Or, from a string: root = ET.fromstring(country_data_as_string)
print("---------------country")
for child in root:
    print(child.tag, child.attrib)
print("---------------Rank:")
for rank in root.iter('rank'):
    print(rank.text)
print("---------------neighbors")
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
print("---------------neighbors name")
for neighbor in root.iter('neighbor'):
    print(neighbor.get('name'))
print("---------------country and neighbors")
for child in root:
    print("the neighbors of", child.get('name'), ":")
    # Iterate over this country's neighbors only, not over every neighbor in the file.
    for neighbor in child.iter('neighbor'):
        print(neighbor.get('name'))
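The contents of countryXML.xml are not reproduced in the slides. The sketch below uses a shortened stand-in that is an assumption, chosen to be consistent with the outputs shown earlier, and parses it from a string so the example runs without any file.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for countryXML.xml, shortened to two countries.
country_data_as_string = """
<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <year>2008</year>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank>
    <year>2011</year>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>
"""

root = ET.fromstring(country_data_as_string)
print(root.tag, root[0][1].text)  # data 2008

# Map each country to its own neighbors by iterating from the country element.
neighbors = {
    country.get("name"): [n.get("name") for n in country.iter("neighbor")]
    for country in root
}
print(neighbors)
```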
CSV Parsing
CSV File
name,iso_a3,currency_code,local_price,dollar_ex,GDP_dollar,date
Argentina,ARG,ARS,2.5,1,,2000-04-01
Australia,AUS,AUD,2.59,1.68,,2000-04-01
Brazil,BRA,BRL,2.95,1.79,,2000-04-01
Britain,GBR,GBP,1.9,0.632911392,,2000-04-01
Canada,CAN,CAD,2.85,1.47,,2000-04-01
Chile,CHL,CLP,1260,514,,2000-04-01
China,CHN,CNY,9.9,8.28,,2000-04-01
Czech Republic,CZE,CZK,54.37,39.1,,2000-04-01
Denmark,DNK,DKK,24.75,8.04,,2000-04-01
Euro area,EUZ,EUR,2.56,1.075268817,,2000-04-01
Hong Kong,HKG,HKD,10.2,7.79,,2000-04-01
Hungary,HUN,HUF,339,279,,2000-04-01
Indonesia,IDN,IDR,14500,7945,,2000-04-01
Israel,ISR,ILS,14.5,4.05,,2000-04-01
Japan,JPN,JPY,294,106,,2000-04-01
Malaysia,MYS,MYR,4.52,3.8,,2000-04-01
Mexico,MEX,MXN,20.9,9.41,,2000-04-01
New Zealand,NZL,NZD,3.4,2.01,,2000-04-01
Poland,POL,PLN,5.5,4.3,,2000-04-01
CSV Parsing
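from csv import reader

# The slide omits how r is created; the file name below is an assumption.
with open('big-mac.csv', newline='') as f:
    r = reader(f)
    for row in r:
        if row[0] == "France":
            # The loop body is missing from the slide; printing the row is an assumption.
            print(row)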
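A self-contained sketch of the same idea, using a few rows copied from the CSV file shown above and csv.DictReader so that columns can be addressed by header name. Treating dollar_ex as local currency units per US dollar is an assumption about the data set.

```python
import csv
import io

# A few rows copied from the slide; a real program would open the CSV file instead.
sample = """name,iso_a3,currency_code,local_price,dollar_ex,GDP_dollar,date
Argentina,ARG,ARS,2.5,1,,2000-04-01
Britain,GBR,GBP,1.9,0.632911392,,2000-04-01
Japan,JPN,JPY,294,106,,2000-04-01
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    if row["name"] == "Britain":
        # Convert the local price with the listed exchange rate (assumed meaning).
        price_usd = float(row["local_price"]) / float(row["dollar_ex"])
        print(row["iso_a3"], round(price_usd, 2))

# Empty cells come back as empty strings, not None.
print(repr(rows[0]["GDP_dollar"]))  # ''
```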
news = "https://www.lip6.fr/production/publications-type.php?id=-1&annee=2020&type_pub=ART"
r = get(news)
# The lines that parse r.text and define p are missing from the slide.
print(p.find('a')['href'])
Output:
https://hal.archives-ouvertes.fr/hal-02569404
https://hal.archives-ouvertes.fr/hal-02945354
https://hal.archives-ouvertes.fr/hal-02986029
https://hal.archives-ouvertes.fr/hal-02980298
https://hal.archives-ouvertes.fr/hal-02985997
https://hal.archives-ouvertes.fr/hal-02443135
https://hal.archives-ouvertes.fr/hal-02911665
https://hal.archives-ouvertes.fr/hal-02931632
https://hal.archives-ouvertes.fr/hal-02527916
https://hal.archives-ouvertes.fr/hal-02955863
https://hal.archives-ouvertes.fr/hal-02984494
https://hal.archives-ouvertes.fr/hal-02945921
https://hal.archives-ouvertes.fr/hal-02906806
https://hal.archives-ouvertes.fr/hal-02985461
https://hal.archives-ouvertes.fr/hal-02400963
https://hal.archives-ouvertes.fr/hal-02929626
https://hal.archives-ouvertes.fr/hal-01805478
https://hal.archives-ouvertes.fr/hal-02682005
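The parsing step that produced these links is not shown in the slides (the p.find('a')['href'] call suggests an HTML parsing library). As a rough stand-in, the sketch below collects href attributes with the standard library's html.parser; the LinkCollector class and the HTML fragment are made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Made-up fragment; the real page layout is not shown in the slides.
page = ('<p><a href="https://hal.archives-ouvertes.fr/hal-02569404">paper 1</a></p>'
        '<p><a href="https://hal.archives-ouvertes.fr/hal-02945354">paper 2</a></p>')

collector = LinkCollector()
collector.feed(page)
for link in collector.links:
    print(link)
```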
Some Websites Have
a Python Library!
Wikipedia
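Wikipedia
from wikipedia import page
# The line defining r is missing from the slide; the page title below is an assumption.
r = page("Python (programming language)")
print(r.summary)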
Wikipedia
Output:
Python is an interpreted, high-level and general-purpose programming language. Created by Guido van Rossum and
first released in 1991, Python's design philosophy emphasizes code readability with its notable use of
significant whitespace. Its language constructs and object-oriented approach aim to help programmers write
clear, logical code for small and large-scale projects.Python is dynamically typed and garbage-collected. It
supports multiple programming paradigms, including structured (particularly, procedural), object-oriented, and
functional programming. Python is often described as a "batteries included" language due to its comprehensive
standard library.Python was created in the late 1980s as a successor to the ABC language. Python 2.0, released
in 2000, introduced features like list comprehensions and a garbage collection system with reference counting.
Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible,
and much Python 2 code does not run unmodified on Python 3.
The Python 2 language was officially discontinued in 2020 (first planned for 2015), and "Python 2.7.18 is the
last Python 2.7 release and therefore the last Python 2 release." No more security patches or other
improvements will be released for it. With Python 2's end-of-life, only Python 3.6.x and later are supported.
Python interpreters are available for many operating systems. A global community of programmers develops and
maintains CPython, a free and open-source reference implementation. A non-profit organization, the Python
Software Foundation, manages and directs resources for Python and CPython development.
Google Scholar
from scholarly import scholarly
s = next(scholarly.search_author("Sebastien Tixeuil"))
print(s.interests)