
Web Services

and Open Data


Sébastien Tixeuil
sebastien.Tixeuil@lip6.fr

Thanks to Lélia Blin, Quentin Bramas, Fabien Mathieu


Web Services
What is a Web Service?
A Web Service is a method of communication
between two programs over the Web.

HTTP is the typical protocol used to communicate via Web Services.
What is a Web Service?

Request
Client HTTP Server

Response
What is a Web Service?
XML: <id>5</id>

Request
Client HTTP Server

Response
XML:
<note id="5">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget the dinner</body>
</note>
What is a Web Service?
SOAP:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Header>
  </soap:Header>
  <soap:Body>
    <m:GetStockPrice xmlns:m="http://www.example.org/stock/Surya">
      <m:StockName>IBM</m:StockName>
    </m:GetStockPrice>
  </soap:Body>
</soap:Envelope>

Request
Client HTTP Server

Response
SOAP:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <ResponseHeader xmlns="https://www.google.com/apis/ads/publisher/v201508">
      <requestId>xxxxxxxxxxxxxxxxxxxx</requestId>
      <responseTime>1063</responseTime>
    </ResponseHeader>
  </soap:Header>
  <soap:Body>
    <getAdUnitsByStatementResponse xmlns="https://www.google.com/apis/ads/publisher/v201508">
      <rval>
        <totalResultSetSize>1</totalResultSetSize>
        <startIndex>0</startIndex>
        <results>
          <id>2372</id>
          <name>RootAdUnit</name>
          <description></description>
          <targetWindow>TOP</targetWindow>
          <status>ACTIVE</status>
          <adUnitCode>1002372</adUnitCode>
          <inheritedAdSenseSettings>
            <value>
              <adSenseEnabled>true</adSenseEnabled>
What is a Web Service?
Url Encoded: order=date&limit=2

Request
Client HTTP Server

Response
JSON:
{
  "data": [{
    "id": 1001,
    "name": "Jim"
  },
  {
    "id": 1002,
    "name": "Matt"
  }]
}
API Business Model
REST Web API
A REST web API is a web service that uses the simpler
REpresentational State Transfer (REST) style of communication.

A request is just an HTTP method applied to a URI.

The response is typically JSON or XML.

Example:

GET : http://pokeapi.co/api/v1/pokemon/25

HTTP method, then a URI that represents a resource
(base URL of the API, API version, resource path)
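Taking the anatomy above literally, the request URI can be assembled from its parts. A minimal sketch (the helper name is ours, not part of any API):

```python
# Hypothetical helper: glue together the parts labelled above --
# base URL, API version, resource name, resource id.
def build_resource_uri(base_url, version, resource, resource_id):
    return f"{base_url}/api/{version}/{resource}/{resource_id}"

print(build_resource_uri("http://pokeapi.co", "v1", "pokemon", 25))
# http://pokeapi.co/api/v1/pokemon/25
```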


REST Web API Call Example
HTTP Request Headers

GET /api/v1/pokemon/25/ HTTP/1.1
Host: pokeapi.co
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch

HTTP Response Headers

HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Fri, 08 Jan 2016 13:10:08 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept
X-Frame-Options: SAMEORIGIN
Cache-Control: s-maxage=360, max-age=360

HTTP Response Body

{
  "name": "Pikachu",
  "attack": 55,
  "abilities": [
    {
      "name": "static",
      "resource_uri": "/api/v1/ability/9/"
    },
    {
      "name": "lightningrod",
      "resource_uri": "/api/v1/ability/31/"
    }
  ]
}
REST: Architectural Properties
• Simplicity of a uniform interface
• Modifiability of components to meet changing needs (even while the application is running)
• Visibility of communication between components by service agents
• Portability of components by moving program code with the data
• Reliability in the resistance to failure at the system level in the presence of failures within components, connectors, or data
REST: Architectural Constraints
• Client-server architecture

• Statelessness

• Cacheability

• Layered system

• Code on demand (optional)

• Uniform interface
Resources

Command based (ex: Flickr API):


GET:
https://api.flickr.com/services/rest/?method=flickr.galleries.getList&user_id=XX

POST:
https://api.flickr.com/services/rest/?method=flickr.galleries.addPhoto&gallery_id=XX
Resources
URI/Resource based:
• ex: Facebook Graph API:
GET: /{photo-id} to retrieve the info of a photo
GET: /{photo-id}/likes to retrieve the people who like it
POST: /{photo-id} to update the photo
DELETE : /{photo-id} to delete the photo

• ex: Google Calendar API:


GET: /calendars/{calendarId} to retrieve the info of a calendar
PUT: /calendars/{calendarId} to update a calendar
DELETE : /calendars/{calendarId} to delete a calendar
POST: /calendars to create a calendar
GET: /calendars/{calendarId}/events/{eventId}
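The calendar verbs above reduce to (HTTP method, URI) pairs. A sketch with a placeholder base URL (no request is sent here; `requests.request(method, url, json=body)` would perform the actual call):

```python
BASE = "https://example.com"  # placeholder, not the real Google Calendar endpoint

def crud(operation, calendar_id=None):
    # POST targets the collection; GET/PUT/DELETE target one resource.
    ops = {
        "create": ("POST",   f"{BASE}/calendars"),
        "read":   ("GET",    f"{BASE}/calendars/{calendar_id}"),
        "update": ("PUT",    f"{BASE}/calendars/{calendar_id}"),
        "delete": ("DELETE", f"{BASE}/calendars/{calendar_id}"),
    }
    return ops[operation]

print(crud("update", "my-cal"))  # ('PUT', 'https://example.com/calendars/my-cal')
```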
Response
HTTP Response:
• 200: OK
• 3xx: Redirection
• 404: Not Found (4xx: something went wrong with what you tried to access)
• 5xx: Server Error

API Response:
• Flickr:
{ "stat": "fail", "code": 1, "message": "User not found" }
{ "galleries": { ... }, "stat": "ok" }

• Google Calendar:
{ "error": {"code": 403, "message": "User Rate Limit Exceeded" } }
{ "kind": "calendar#events","summary": ..., "description": ...
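Note that even with HTTP status 200, the API can still report failure in the body, as Flickr's "stat" field does above. A minimal sketch of checking the API-level status before trusting the payload (the helper name is ours):

```python
from json import loads

def flickr_payload(raw):
    # Flickr-style responses carry their own status in "stat";
    # raise if the API reports failure even though HTTP said 200 OK.
    resp = loads(raw)
    if resp.get("stat") != "ok":
        raise RuntimeError(f'{resp.get("code")}: {resp.get("message")}')
    return resp

print(flickr_payload('{"galleries": {}, "stat": "ok"}')["stat"])  # ok
```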
Response

Content-Type:
• text/plain

• text/html

• text/xml or application/xml

• application/json

• image/png

• ...
Client-side HTTP
HTTP Requests
from requests import *

manga = "http://lelscano.com"

r = get(manga)

print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")

print(f"Response headers: {r.headers}")


HTTP Requests
from requests import *

manga = "http://lelscano.com"

r = get(manga)

print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")

print(f"Response headers: {r.headers}")

Request status is 200,
Content length is 53111 bytes,
Request encoding is UTF-8,
Text size is 53105 chars.
HTTP Requests
from requests import *

manga = "http://lelscano.com"

r = get(manga)

print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")

print(f"Response headers: {r.headers}")

Request status is 200,
Content length is 53111 bytes,
Request encoding is UTF-8,
Text size is 53105 chars.
Response headers: {'Date': 'Wed, 04 Nov 2020 14:40:27 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=da1986d3c036d3d4b0dfdbf3f16812e5f1604500827; expires=Fri, 04-Dec-20 14:40:27 GMT; path=/; domain=.lelscan.net; HttpOnly; SameSite=Lax, mobile_lelscan=0; expires=Thu, 05-Nov-2020 14:40:27 GMT; Max-Age=86400; path=lelscan.net', 'Vary': 'Accept-Encoding', 'CF-Cache-Status': 'DYNAMIC', 'cf-request-id': '06354cc73b000032c20c30b000000001', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report?s=KFPpQxY2A5IilAqwG6j1BXgoJEskCp%2BkW7uCp0z63eYihMbvUnyfBx7abOP6nhy%2B5H1KHR51De457l7y84Ois4b3gD5D1Fi15RrJmklRlavxKwGsFBw3fA%3D%3D"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"report_to":"cf-nel","max_age":604800}', 'Server': 'cloudflare', 'CF-RAY': '5ecf171ec88e32c2-CDG', 'Content-Encoding': 'gzip'}
HTTP Requests
from requests import *

manga = "http://lelscano.com"

r = get(manga)

print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")

print(f"Response headers: {r.headers}")

print(f"{r.text}")

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>One Piece lecture en ligne scan</title>
<meta name="description" content="One Piece Lecture en ligne, tous les scan One Piece." />
<meta name="lelscan" content="One Piece" />
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta http-equiv="Content-Language" content="fr" />
<meta name="keywords" content="One Piece lecture en ligne, lecture en ligne One Piece, scan One Piece, One Piece scan, One Piece lel, lecture en ligne One Piece, Lecture, lecture, scan, chapitre, chapitre One Piece, lecture One Piece, lecture Chapitre One Piece, mangas, manga, One Piece, One Piece fr, One Piece france, scans, image One Piece " />
<meta name="subject" content="One Piece lecture en ligne scan" />
<meta name="identifier-url" content="https://lelscan.net" />
<meta property="og:image" content="/mangas/one-piece/thumb_cover.jpg" />
<meta property="og:title" content="Lecture en ligne One Piece scan" />
<meta property="og:url" content="/lecture-ligne-one-piece.php" />
<meta property="og:description" content="One Piece lecture en ligne - lelscan" />
<link rel="alternate" type="application/rss+xml" title="flux rss" href="/rss/rss.xml" />
<link rel="icon" type="image" href="/images/icones/favicon.ico" />
<style type="text/css" media="screen">

Stream Downloading

from pathlib import *


from requests import *

def stream_download(source_url, dest_file):
    r = get(source_url, stream=True)
    dest_file = Path(dest_file)
    with open(dest_file, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)
Stream Downloading
from pathlib import *
from requests import *

def stream_download(source_url, dest_file):
    r = get(source_url, stream=True)
    dest_file = Path(dest_file)
    with open(dest_file, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)

img = "http://ftp.crifo.org/debian-cd/current/amd64/iso-dvd/debian-10.6.0-amd64-DVD-1.iso"

stream_download(source_url=img, dest_file="debian1.iso")
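A possible hardening of `stream_download` (the timeout, `raise_for_status()` and the injectable `getter` are our additions, not part of the slide): fail early on HTTP errors instead of saving an error page, and bound the wait on a stalled connection.

```python
from pathlib import Path

def stream_download_checked(source_url, dest_file, getter=None):
    if getter is None:  # default to requests.get; injectable for offline testing
        from requests import get as getter
    with getter(source_url, stream=True, timeout=30) as r:
        r.raise_for_status()  # 4xx/5xx raises instead of silently writing the body
        with open(Path(dest_file), "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                if chunk:  # skip keep-alive chunks
                    f.write(chunk)
```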
Elementary String Parsing
Split
s = "Python is a great language\n but Erlang is pretty cool too"

l = s.split()

print(l)

l2 = s.split('a')

print(l2)

l3 = s.split('\n')

print(l3)

l4 = s.split('an')

print(l4)
Split
s = "Python is a great language\n but Erlang is pretty cool too"

l = s.split()
print(l)
['Python', 'is', 'a', 'great', 'language', 'but', 'Erlang', 'is', 'pretty', 'cool', 'too']

l2 = s.split('a')
print(l2)
['Python is ', ' gre', 't l', 'ngu', 'ge\n but Erl', 'ng is pretty cool too']

l3 = s.split('\n')
print(l3)
['Python is a great language', ' but Erlang is pretty cool too']

l4 = s.split('an')
print(l4)
['Python is a great l', 'guage\n but Erl', 'g is pretty cool too']
Join
s4 = 'an'.join(l4)

print(s4)

s3 = '\n'.join(l3)

print(s3)

s2 = 'a'.join(l2)

print(s2)

s1 = ' '.join(l)

print(s1)
Join
s4 = 'an'.join(l4)
print(s4)
Python is a great language
 but Erlang is pretty cool too

s3 = '\n'.join(l3)
print(s3)
Python is a great language
 but Erlang is pretty cool too

s2 = 'a'.join(l2)
print(s2)
Python is a great language
 but Erlang is pretty cool too

s1 = ' '.join(l)
print(s1)
Python is a great language but Erlang is pretty cool too
Regular Expressions
Regular Expressions
a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters
that have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)
. (a period) -- matches any single character except newline '\n'
\w -- (lowercase w) matches a "word" character: a letter or digit or underscore [a-
zA-Z0-9_]. \W matches any non-word character.
\b -- boundary between word and non-word
\s -- (lowercase s) matches a single whitespace character -- space, newline, return,
tab, form [ \n\r\t\f]. \S (upper case S) matches any non-whitespace
character.
\t, \n, \r -- tab, newline, return
\d -- decimal digit [0-9]
^ = start, $ = end -- match the start or the end of the string
\ -- inhibit the "specialness" of a character. So, for example, use \. to match a
period or \\ to match a backslash. If you are unsure whether a character has special
meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is
treated just as a character.
Regular Expressions
[] — set of possible characters
| — or
{n} — exactly n occurrences
() — create a group
+ — at least one occurrence
* — zero or more occurrences
? — zero or one occurrence
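A few of these meta-characters in action, on an invented string:

```python
import re

s = "Order #042 shipped to bob_smith on 2020-11-04"

print(re.search(r"\d{4}-\d{2}-\d{2}", s).group())  # 2020-11-04  ({n}: exact counts)
print(re.findall(r"\b\w+_\w+\b", s))               # ['bob_smith']  (\w and \b)
print(re.match(r"^Order", s) is not None)          # True  (^ anchors at the start)
print(re.split(r"\s+", "a  b\tc"))                 # ['a', 'b', 'c']  (\s class, +)
```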
Regular Expressions
Extract Email Information:
sebastien.tixeuil@lip6.fr
([^@]+)@([^@]+)
[ ] a character
^ that is not
@ the at symbol
+ at least one of this character

import re
m = re.match('([^@]+)@([^@]+)','sebastien.tixeuil@lip6.fr')
print(m.group(1))
print(m.group(2))
Regular Expressions
Extract Email Information:
sebastien.tixeuil@lip6.fr
([^@]+)@([^@]+)
[ ] a character
^ that is not
@ the at symbol
+ at least one of this character

import re
m = re.match('([^@]+)@([^@]+)','sebastien.tixeuil@lip6.fr')
print(m.group(1))
print(m.group(2))
sebastien.tixeuil

lip6.fr
Extracting Information with Regular Expressions
from requests import *

from re import *

r = get('https://www.lip6.fr/recherche/team_membres.php?acronyme=NPA')

print(findall('(26-00/([0-9]{3}))', r.text))
Extracting Information with Regular Expressions
from requests import *

from re import *

r = get('https://www.lip6.fr/recherche/team_membres.php?acronyme=NPA')

print(findall('(26-00/([0-9]{3}))', r.text))
[('26-00/103', '103'), ('26-00/112', '112'), ('26-00/122', '122'), ('26-00/109', '109'), ('26-00/111', '111'),
('26-00/108', '108'), ('26-00/103', '103'), ('26-00/107', '107'), ('26-00/126', '126'), ('26-00/105', '105'),
('26-00/105', '105'), ('26-00/115', '115'), ('26-00/128', '128'), ('26-00/114', '114'), ('26-00/113', '113'),
('26-00/224', '224'), ('26-00/410', '410'), ('26-00/412', '412'), ('26-00/230', '230'), ('26-00/216', '216'),
('26-00/119', '119'), ('26-00/119', '119'), ('26-00/116', '116'), ('26-00/132', '132'), ('26-00/102', '102'),
('26-00/120', '120'), ('26-00/116', '116'), ('26-00/120', '120'), ('26-00/132', '132'), ('26-00/104', '104'),
('26-00/102', '102'), ('26-00/102', '102'), ('26-00/132', '132'), ('26-00/102', '102'), ('26-00/104', '104'),
('26-00/420', '420'), ('26-00/120', '120'), ('26-00/120', '120'), ('26-00/132', '132'), ('26-00/119', '119'),
('26-00/119', '119')]
JSON Parsing
JSON
from json import *

from socket import *

print(dumps(['aéçèà',1234,[2,3,4,5,6]]))

print(loads('["a\u00e9\u00e7\u00e8\u00e0", 1234, [2, 3, 4, 5, 6]]'))

s = socket(AF_INET,SOCK_STREAM)

try:
    print(dumps(s))
except TypeError:
    print("this data does not seem serializable with JSON")
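When `dumps` meets an object it cannot serialize, the `default=` hook can supply a fallback representation instead of raising `TypeError`. A sketch (the `Point` class is invented for illustration):

```python
from json import dumps

class Point:  # an ordinary object: not JSON-serializable by itself
    def __init__(self, x, y):
        self.x, self.y = x, y

# default= is called for every object dumps cannot handle natively;
# here we fall back to the object's attribute dictionary.
print(dumps({"origin": Point(0, 0)}, default=lambda o: o.__dict__))
# {"origin": {"x": 0, "y": 0}}
```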


JSON
from json import *
from socket import *

print(dumps(['aéçèà',1234,[2,3,4,5,6]]))
["a\u00e9\u00e7\u00e8\u00e0", 1234, [2, 3, 4, 5, 6]]

print(loads('["a\u00e9\u00e7\u00e8\u00e0", 1234, [2, 3, 4, 5, 6]]'))
['aéçèà', 1234, [2, 3, 4, 5, 6]]

s = socket(AF_INET,SOCK_STREAM)

try:
    print(dumps(s))
except TypeError:
    print("this data does not seem serializable with JSON")

JSON Files
from json import *

data = {}
data['people'] = []

data['people'].append({
    'name': 'Mark',
    'website': 'facebook.com',
})
data['people'].append({
    'name': 'Larry',
    'website': 'google.com',
})
data['people'].append({
    'name': 'Tim',
    'website': 'apple.com',
})
JSON Files
with open('data.txt', 'w') as outfile:
    dump(data, outfile)

data.txt
{"people": [{"name": "Mark", "website":
"facebook.com"}, {"name": "Larry", "website":
"google.com"}, {"name": "Tim", "website":
"apple.com"}]}
JSON Files
with open('data.txt') as infile:
    data = load(infile)

for p in data['people']:
    print('Name: ' + p['name'])
    print('Website: ' + p['website'])
    print('')

data.txt
{"people": [{"name": "Mark", "website":
"facebook.com"}, {"name": "Larry", "website":
"google.com"}, {"name": "Tim", "website":
"apple.com"}]}
JSON Files
with open('data.txt') as infile:
    data = load(infile)

for p in data['people']:
    print('Name: ' + p['name'])
    print('Website: ' + p['website'])
    print('')

Name: Mark
Website: facebook.com

Name: Larry
Website: google.com

Name: Tim
Website: apple.com

data.txt
{"people": [{"name": "Mark", "website": "facebook.com"}, {"name": "Larry", "website": "google.com"}, {"name": "Tim", "website": "apple.com"}]}
XML Parsing
XML Example

<?xml version="1.0" encoding="UTF-8"?>


<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
XML Example 2
<?xml version="1.0"?>
<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>

countryXML.xml
XML Parsing

With xml.etree.ElementTree
xml.etree.ElementTree loads the whole file;
you can then navigate the tree structure.

import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')
XML Parsing
import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')
root = tree.getroot()
print(root.tag)
print(root.attrib)
print(root[0][1].text)

for child in root:
    print(child.tag, child.attrib)

for n in root.iter('neighbor'):
    print(n.attrib)
XML Parsing
import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')
root = tree.getroot()
print(root.tag)
print(root.attrib)
print(root[0][1].text)

for child in root:
    print(child.tag, child.attrib)

for n in root.iter('neighbor'):
    print(n.attrib)

data
{}
2008
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}
XML Parsing
import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')
root = tree.getroot()
# Or short: root = ET.fromstring(country_data_as_string)
print("---------------country")
for child in root:
    print(child.tag, child.attrib)
print("---------------Rank:")
for rank in root.iter('rank'):
    print(rank.text)
print("---------------neighbors")
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
print("---------------neighbors name")
for neighbor in root.iter('neighbor'):
    print(neighbor.get('name'))
print("---------------country and neighbors")
for child in root:
    print("the neighbors of", child.get('name'), ":")
    for neighbor in child.iter('neighbor'):
        print(neighbor.get('name'))
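Beyond `iter()`, ElementTree's `find()` and `findall()` accept a limited XPath syntax, including attribute predicates. A small sketch on an inline copy of part of the country data, so it runs without the file:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank><year>2008</year></country>"
    "<country name='Singapore'><rank>4</rank><year>2011</year></country>"
    "</data>")

# find() with an XPath predicate: select a country by attribute value
print(root.find("country[@name='Singapore']/year").text)  # 2011

# findall() returns every element matching the path
print([r.text for r in root.findall("country/rank")])  # ['1', '4']
```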
CSV Parsing
CSV File
name,iso_a3,currency_code,local_price,dollar_ex,GDP_dollar,date
Argentina,ARG,ARS,2.5,1,,2000-04-01
Australia,AUS,AUD,2.59,1.68,,2000-04-01
Brazil,BRA,BRL,2.95,1.79,,2000-04-01
Britain,GBR,GBP,1.9,0.632911392,,2000-04-01
Canada,CAN,CAD,2.85,1.47,,2000-04-01
Chile,CHL,CLP,1260,514,,2000-04-01
China,CHN,CNY,9.9,8.28,,2000-04-01
Czech Republic,CZE,CZK,54.37,39.1,,2000-04-01
Denmark,DNK,DKK,24.75,8.04,,2000-04-01
Euro area,EUZ,EUR,2.56,1.075268817,,2000-04-01
Hong Kong,HKG,HKD,10.2,7.79,,2000-04-01
Hungary,HUN,HUF,339,279,,2000-04-01
Indonesia,IDN,IDR,14500,7945,,2000-04-01
Israel,ISR,ILS,14.5,4.05,,2000-04-01
Japan,JPN,JPY,294,106,,2000-04-01
Malaysia,MYS,MYR,4.52,3.8,,2000-04-01
Mexico,MEX,MXN,20.9,9.41,,2000-04-01
New Zealand,NZL,NZD,3.4,2.01,,2000-04-01
Poland,POL,PLN,5.5,4.3,,2000-04-01
CSV Parsing
from csv import *

with open('big-mac-source-data.csv', newline='') as csvfile:
    r = reader(csvfile, delimiter=',', quotechar='|')
    for row in r:
        if row[0] == "France":
            print(str(row[0]) + ',' + str(row[3]) + ',' + str(row[6]))


CSV Parsing
from csv import *

with open('big-mac-source-data.csv', newline='') as csvfile:
    r = reader(csvfile, delimiter=',', quotechar='|')
    for row in r:
        if row[0] == "France":
            print(str(row[0]) + ',' + str(row[3]) + ',' + str(row[6]))

France,3.5,2011-07-01
France,3.6,2012-01-01
France,3.6,2012-07-01
France,3.6,2013-01-01
France,3.9,2013-07-01
France,3.8,2014-01-01
France,3.9,2014-07-01
France,3.9,2015-01-01
France,4.1,2015-07-01
France,4.1,2016-01-01
France,4.1,2016-07-01
France,4.1,2017-01-01
France,4.1,2017-07-01
France,4.2,2018-01-01
France,4.2,2018-07-01
France,4.2,2019-01-01
France,4.2,2019-07-09
France,4.2,2020-01-14
France,4.2,2020-07-01
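As an alternative to indexing rows by position, `csv.DictReader` uses the header row so columns can be addressed by name. A sketch on a two-row excerpt of the data (`io.StringIO` stands in for the open file):

```python
from csv import DictReader
from io import StringIO

sample = """name,iso_a3,currency_code,local_price,dollar_ex,GDP_dollar,date
Argentina,ARG,ARS,2.5,1,,2000-04-01
Australia,AUS,AUD,2.59,1.68,,2000-04-01
"""

# StringIO(sample) stands in for open('big-mac-source-data.csv', newline='')
for row in DictReader(StringIO(sample)):
    if row["name"] == "Australia":
        print(row["name"], row["local_price"], row["date"])
# Australia 2.59 2000-04-01
```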
HTML Parsing
Beautiful Soup
Make a soup (a navigable version of a string)
Browse a soup
soup.find("tag") / soup.tag (returns soup)
soup.find_all("tag") / soup("tag") (returns list)
soup.find("tag", {'attr_name': 'attr_value'})
soup.contents (list of children)
Extract text
soup.decode_contents(): returns soup as string
soup.encode_contents(): returns soup as bytes
soup.text: returns soup as tagless string
soup['attr_name']: returns attribute value
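The calls above, exercised on a small inline document (the markup is invented for illustration; "html.parser" avoids the lxml dependency):

```python
from bs4 import BeautifulSoup

html = """<ul>
  <li class="D700"><a href="https://example.org/paper-1">Paper one</a></li>
  <li class="D700"><a href="https://example.org/paper-2">Paper two</a></li>
  <li class="other"><a href="https://example.org/skip">Skip me</a></li>
</ul>"""

soup = BeautifulSoup(html, features="html.parser")
print(soup.find("li", {"class": "D700"}).text)  # Paper one
print([li.find("a")["href"] for li in soup.find_all("li", {"class": "D700"})])
# ['https://example.org/paper-1', 'https://example.org/paper-2']
```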
Make a Soup
from requests import *

from bs4 import BeautifulSoup as bs

news = "https://www.lip6.fr/production/publications-type.php?id=-1&annee=2020&type_pub=ART"

r = get(news)

soup = bs(r.text, features="lxml")


Browse Soup and Extract Text
print(soup.find('li', {'class': 'D700'}))
Browse Soup and Extract Text
print(soup.find('li', {'class': 'D700'}))

<li class="D700"><strong>L. Amorim Reis, A. Murillo Piedrahita, S. Rueda Rodríguez, N. Castro Fernandes, D. Scherly Varela de Medeiros, M. Dias De Amorim, D. Ferrazani Mattos</strong> : “<a href="https://hal.archives-ouvertes.fr/hal-02569404">Unsupervised and Incremental Learning Orchestration for Cyber-Physical Security</a>”, Transactions on emerging telecommunications technologies, (Wiley-Blackwell) [Amorim Reis 2020]</li>
Browse Soup and Extract Text
print(soup.find('li', {'class': 'D700'}))

for p in soup.find_all('li', {'class': 'D700'}):
    print(p.find('a')['href'])
https://hal.archives-ouvertes.fr/hal-02569404
https://hal.archives-ouvertes.fr/hal-02945354
https://hal.archives-ouvertes.fr/hal-02986029
https://hal.archives-ouvertes.fr/hal-02980298
https://hal.archives-ouvertes.fr/hal-02985997
https://hal.archives-ouvertes.fr/hal-02443135
https://hal.archives-ouvertes.fr/hal-02911665
https://hal.archives-ouvertes.fr/hal-02931632
https://hal.archives-ouvertes.fr/hal-02527916
https://hal.archives-ouvertes.fr/hal-02955863
https://hal.archives-ouvertes.fr/hal-02984494
https://hal.archives-ouvertes.fr/hal-02945921
https://hal.archives-ouvertes.fr/hal-02906806
https://hal.archives-ouvertes.fr/hal-02985461
https://hal.archives-ouvertes.fr/hal-02400963
https://hal.archives-ouvertes.fr/hal-02929626
https://hal.archives-ouvertes.fr/hal-01805478
https://hal.archives-ouvertes.fr/hal-02682005
Some Websites have a Python Library!
Wikipedia
Wikipedia
from wikipedia import *

r = page("Python (programming language)")

print(r.summary)
Wikipedia
from wikipedia import *

r = page("Python (programming language)")

print(r.summary)

Python is an interpreted, high-level and general-purpose programming language. Created by Guido van Rossum and
first released in 1991, Python's design philosophy emphasizes code readability with its notable use of
significant whitespace. Its language constructs and object-oriented approach aim to help programmers write
clear, logical code for small and large-scale projects.Python is dynamically typed and garbage-collected. It
supports multiple programming paradigms, including structured (particularly, procedural), object-oriented, and
functional programming. Python is often described as a "batteries included" language due to its comprehensive
standard library.Python was created in the late 1980s as a successor to the ABC language. Python 2.0, released
in 2000, introduced features like list comprehensions and a garbage collection system with reference counting.
Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible,
and much Python 2 code does not run unmodified on Python 3.
The Python 2 language was officially discontinued in 2020 (first planned for 2015), and "Python 2.7.18 is the
last Python 2.7 release and therefore the last Python 2 release." No more security patches or other
improvements will be released for it. With Python 2's end-of-life, only Python 3.6.x and later are supported.
Python interpreters are available for many operating systems. A global community of programmers develops and
maintains CPython, a free and open-source reference implementation. A non-profit organization, the Python
Software Foundation, manages and directs resources for Python and CPython development.
Google Scholar
Google Scholar
from scholarly import scholarly

s = next(scholarly.search_author("Sebastien Tixeuil"))

print(s.interests)
Google Scholar
from scholarly import scholarly

s = next(scholarly.search_author("Sebastien Tixeuil"))

print(s.interests)

['Algorithms & Theory', 'Computer Networks', 'Distributed Computing']
