Python Requests
last modified July 6, 2020
In this tutorial, we show how to work with the Python Requests module. We grab data, post data,
stream data, and connect to secure web pages. In the examples, we use an online service, an Nginx
server, a Python HTTP server, and a Flask application.
The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative,
hypermedia information systems. HTTP is the foundation of data communication for the World
Wide Web.
Python requests
Requests is a simple and elegant Python HTTP library. It provides methods for accessing Web
resources via HTTP.
version.py
#!/usr/bin/env python3
import requests
print(requests.__version__)
print(requests.__copyright__)
$ ./version.py
2.21.0
Copyright 2018 Kenneth Reitz
read_webpage.py
#!/usr/bin/env python3
import requests as req
resp = req.get("http://www.webcode.me")
print(resp.text)
The script grabs the content of the www.webcode.me web page.
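resp = req.get("http://www.webcode.me")
print(resp.text)
The get() method returns a response object. Its text attribute contains the body of the response decoded to a string.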
$ ./read_webpage.py
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>My html page</title>
</head>
<body>
<p>
Today is a beautiful day. We go swimming and fishing.
</p>
<p>
Hello there. How are you?
</p>
</body>
</html>
The following program gets a small web page and strips its HTML tags.
strip_tags.py
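#!/usr/bin/env python3
import requests as req
import re
resp = req.get("http://www.webcode.me")
content = resp.text
# Assumed completion: remove everything that looks like an HTML tag.
stripped = re.sub('<[^<]+?>', '', content)
print(stripped)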
The script strips the HTML tags of the www.webcode.me web page.
HTTP Request
An HTTP request is a message sent from the client to the server to retrieve some information or
to perform some action.
The request() function of Requests creates and sends a new request. Note that the requests module
also has higher-level functions, such as get(), post(), or put(), which save us some typing.
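As a rough illustration of how request() relates to the shortcut functions, the sketch below sends the same GET request both ways. To stay self-contained it talks to a throwaway local server instead of a public site; the EchoHandler class and the loopback address are assumptions of this sketch, not part of the tutorial's setup.

```python
#!/usr/bin/env python3
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests as req


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: replies with the HTTP method it received."""

    def do_GET(self):
        body = self.command.encode("utf8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# The generic form names the HTTP method explicitly ...
generic = req.request("GET", url)
# ... while get() is a shortcut for the same kind of request.
shortcut = req.get(url)

print(generic.status_code, generic.text)
print(shortcut.status_code, shortcut.text)

server.shutdown()
server.server_close()
```

Both calls return the same kind of response object, so the shortcut is usually the more readable choice when the method is known in advance.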
create_request.py
#!/usr/bin/env python3
get_status.py
#!/usr/bin/env python3
import requests as req
resp = req.get("http://www.webcode.me")
print(resp.status_code)
resp = req.get("http://www.webcode.me/news")
print(resp.status_code)
We perform two HTTP requests with the get() method and check for the returned status.
$ ./get_status.py
200
404
200 is the standard response for successful HTTP requests, and 404 indicates that the requested
resource could not be found.
head_request.py
#!/usr/bin/env python3
The example prints the server, last modification time, and content type of the www.webcode.me
web page.
$ ./head_request.py
Server: nginx/1.6.2
Last modified: Sat, 20 Jul 2019 11:49:25 GMT
Content type: text/html
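A HEAD request of this kind might look like the sketch below. It runs against a small local server rather than www.webcode.me so that it is self-contained; the HeadHandler class and the header values it sends are assumptions made for this illustration.

```python
#!/usr/bin/env python3
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests as req


class HeadHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers HEAD requests with headers only."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Last-Modified", "Sat, 20 Jul 2019 11:49:25 GMT")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), HeadHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

resp = req.head(f"http://127.0.0.1:{server.server_port}/")

# A HEAD response carries headers but no body.
print("Server:", resp.headers["Server"])
print("Last modified:", resp.headers["Last-Modified"])
print("Content type:", resp.headers["Content-Type"])
print("Body length:", len(resp.content))

server.shutdown()
server.server_close()
```

The headers attribute behaves like a case-insensitive dictionary, so resp.headers["content-type"] would return the same value.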
mget.py
#!/usr/bin/env python3
import requests as req
resp = req.get("https://httpbin.org/get?name=Peter")
print(resp.text)
The script sends a variable with a value to the httpbin.org server. The variable is specified
directly in the URL.
$ ./mget.py
{
"args": {
"name": "Peter"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.21.0"
},
...
}
mget2.py
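#!/usr/bin/env python3
import requests as req
payload = {'name': 'Peter', 'age': 23}
resp = req.get("http://httpbin.org/get", params=payload)
print(resp.url)
print(resp.text)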
The get() method takes a params parameter where we can specify the query parameters.
We send a GET request to the httpbin.org site and pass the data, which is specified in the params
parameter.
print(resp.url)
print(resp.text)
We print the URL and the response content to the console.
$ ./mget2.py
http://httpbin.org/get?name=Peter&age=23
{
"args": {
"age": "23",
"name": "Peter"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.21.0"
},
...
}
redirect.py
#!/usr/bin/env python3
import requests as req
resp = req.get("https://httpbin.org/redirect-to?url=/")
print(resp.status_code)
print(resp.history)
print(resp.url)
redirect2.py
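#!/usr/bin/env python3
import requests as req
resp = req.get("https://httpbin.org/redirect-to?url=/", allow_redirects=False)
print(resp.status_code)
print(resp.url)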
The allow_redirects parameter specifies whether the redirect is followed; the redirects are
followed by default.
$ ./redirect2.py
302
https://httpbin.org/redirect-to?url=/
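The difference between the two behaviours can be sketched offline with a local server that issues a redirect. The RedirectHandler class and the /old and /new paths are hypothetical names chosen for this example.

```python
#!/usr/bin/env python3
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests as req


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /old redirects to /new, everything else returns 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"new page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# Default behaviour: the redirect is followed and recorded in history.
followed = req.get(base + "/old")
print(followed.status_code, followed.url, followed.history)

# With allow_redirects=False we get the redirect response itself.
not_followed = req.get(base + "/old", allow_redirects=False)
print(not_followed.status_code, not_followed.headers["Location"])

server.shutdown()
server.server_close()
```

When the redirect is not followed, the target is still available in the Location header, so the caller can decide what to do with it.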
location = /oldpage.html {
Add these lines to the nginx configuration file, which is located at
/etc/nginx/sites-available/default on Debian.
After the file has been edited, we must restart nginx to apply the changes.
oldpage.html
<!DOCTYPE html>
<html>
<head>
<title>Old page</title>
</head>
<body>
<p>
This is old page
</p>
</body>
</html>
newpage.html
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
<p>
This is a new page
</p>
</body>
</html>
redirect3.py
#!/usr/bin/env python3
import requests as req
resp = req.get("http://localhost/oldpage.html")
print(resp.status_code)
print(resp.history)
print(resp.url)
print(resp.text)
This script accesses the old page and follows the redirect. As we already mentioned, Requests
follows redirects by default.
$ ./redirect3.py
200
(<Response [301]>,)
http://localhost/files/newpage.html
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
<p>
This is a new page
</p>
</body>
</html>
This is the output of the example.
As we can see from the access.log file, the request was redirected to a new file name. The
communication consisted of two GET requests.
User agent
In this section, we specify the name of the user agent. We create our own Python HTTP server.
http_server.py
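#!/usr/bin/env python3
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        message = "Hello there"  # assumed default reply for other paths
        self.send_response(200)
        if self.path == '/agent':
            message = self.headers['user-agent']
        self.send_header('Content-type', 'text/html')
        self.end_headers()
        self.wfile.write(bytes(message, "utf8"))
        return

def main():
    print('starting server on port 8081...')
    server_address = ('127.0.0.1', 8081)
    httpd = HTTPServer(server_address, MyHandler)
    httpd.serve_forever()

main()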
if self.path == '/agent':
    message = self.headers['user-agent']
If the path is /agent, the server returns the value of the User-Agent request header.
user_agent.py
#!/usr/bin/env python3
This script creates a simple GET request to our Python HTTP server. To add HTTP headers to a
request, we pass in a dictionary to the headers parameter.
$ ./http_server.py
starting server on port 8081...
$ ./user_agent.py
Python script
Then we run the script. The server responds with the name of the agent that we sent with the
request.
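The exchange described above can be sketched end to end in a single file. Here the server runs in a background thread on a port chosen by the operating system; the AgentHandler class is a hypothetical stand-in for the tutorial's server, and the agent name Python script is taken from the output shown above.

```python
#!/usr/bin/env python3
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests as req


class AgentHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that echoes the User-Agent header on /agent."""

    def do_GET(self):
        message = "Hello there"
        if self.path == "/agent":
            message = self.headers["user-agent"]
        body = message.encode("utf8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), AgentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Extra HTTP headers are passed as a dictionary in the headers parameter.
headers = {"User-Agent": "Python script"}
resp = req.get(f"http://127.0.0.1:{server.server_port}/agent", headers=headers)
print(resp.text)

server.shutdown()
server.server_close()
```

Without the headers argument, Requests would send its own default agent string of the form python-requests/ followed by the library version.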
post_value.py
#!/usr/bin/env python3
The script sends a request with a name key whose value is Peter. The POST request is issued with the
post() method.
$ ./post_value.py
{
"args": {},
"data": "",
"files": {},
"form": {
"name": "Peter"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "10",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.21.0"
},
"json": null,
...
}
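To see how a dictionary passed as data ends up in the request, the sketch below builds the POST request without sending it and inspects the result. Using Request and prepare() for this is a choice made for this illustration so that no network access is needed; a normal script would simply call post().

```python
#!/usr/bin/env python3
import requests as req

data = {"name": "Peter"}

# Build the request but do not send it, so the encoded form can be inspected.
prepared = req.Request("POST", "https://httpbin.org/post", data=data).prepare()

print(prepared.method)
print(prepared.headers["Content-Type"])
print(prepared.headers["Content-Length"])
print(prepared.body)
```

The dictionary is form-encoded into the request body, which is why httpbin.org reports the value under form rather than under args.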
import os
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def home():
    return 'This is home page'

@app.route("/upload", methods=['POST'])
def handleFileUpload():
    msg = 'failed to upload image'  # assumed default, so msg is always defined
    if 'image' in request.files:
        photo = request.files['image']
        if photo.filename != '':
            photo.save(os.path.join('.', photo.filename))
            msg = 'image uploaded successfully'
    return msg

if __name__ == '__main__':
    app.run()
This is a simple application with two endpoints. The /upload endpoint checks if there is some
image and saves it to the current directory.
upload_file.py
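#!/usr/bin/env python3
import requests as req
url = 'http://localhost:5000/upload'  # assumed: Flask's default development port
with open('image.jpg', 'rb') as f:  # placeholder file name
    files = {'image': f}
    r = req.post(url, files=files)
    print(r.text)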
We send the image to the Flask application. The file is specified in the files parameter of the
post() method.
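What the files parameter produces can be checked without a running Flask application by preparing the request and looking at it. In this sketch the file name photo.jpg, the in-memory bytes, and the URL (Flask's default development port) are all assumptions made for illustration.

```python
#!/usr/bin/env python3
import requests as req

# Hypothetical in-memory stand-in for an image file: (file name, file content).
files = {"image": ("photo.jpg", b"\xff\xd8\xff fake image bytes")}

# Build the request but do not send it.
prepared = req.Request("POST", "http://localhost:5000/upload", files=files).prepare()

# The body is multipart/form-data; the boundary is generated by the library.
print(prepared.headers["Content-Type"])
print(len(prepared.body), "bytes in the encoded body")
```

The field name image is the key the Flask endpoint looks up in request.files, so the two sides have to agree on it.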
JSON
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans
to read and write and for machines to parse and generate.
Read JSON
send_json.php
<?php
$data = ['name' => 'Jane', 'age' => 17];
header('Content-Type: application/json');
echo json_encode($data);
The PHP script sends JSON data. It uses the json_encode() function to do the job.
read_json.py
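#!/usr/bin/env python3
import requests as req
resp = req.get("http://localhost/send_json.php")
print(resp.json())
print(resp.json())
The json() method parses the JSON-encoded body of the response and returns it as a Python object.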
$ ./read_json.py
{'age': 17, 'name': 'Jane'}
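The same round trip can be tried without PHP. The sketch below serves the JSON document from a small Python server running in a background thread; the JsonHandler class is a hypothetical replacement for send_json.php.

```python
#!/usr/bin/env python3
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests as req


class JsonHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that always answers with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"name": "Jane", "age": 17}).encode("utf8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), JsonHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

resp = req.get(f"http://127.0.0.1:{server.server_port}/")

# json() decodes the body into ordinary Python objects.
data = resp.json()
print(data["name"], data["age"])

server.shutdown()
server.server_close()
```

If the body is not valid JSON, json() raises an exception, so scripts that talk to unreliable services may want to guard the call.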
Send JSON
parse_json.php
<?php
$data = file_get_contents("php://input");
$json = json_decode($data, true);
foreach ($json as $key => $value) {
    if (!is_array($value)) {
        echo "The $key is $value\n";
    } else {
        foreach ($value as $key => $val) {
            echo "The $key is $val\n";
        }
    }
}
This PHP script reads JSON data and sends back a message with the parsed values.
send_json.py
#!/usr/bin/env python3
This script sends JSON data to the PHP application and reads its response.
$ ./send_json.py
The name is Jane
The age is 17
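On the Python side, JSON data is typically sent with the json parameter. The sketch below prepares such a request without sending it, so that the encoded body and the content type can be inspected; the URL is an assumption based on the PHP file name above.

```python
#!/usr/bin/env python3
import json

import requests as req

data = {"name": "Jane", "age": 17}

# Build the request but do not send it; the URL is assumed for illustration.
prepared = req.Request("POST", "http://localhost/parse_json.php", json=data).prepare()

# The json parameter serializes the dictionary and sets the content type.
print(prepared.headers["Content-Type"])
print(json.loads(prepared.body))
```

Compared with passing data, the json parameter sends the dictionary as a JSON document in the request body instead of form-encoding it.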
get_term.py
#!/usr/bin/env python3
term = "dog"
if sel.text:
    s = sel.text.strip()
    print(textwrap.fill(s, width=50))
In this script, we find the definitions of the term dog on www.dictionary.com. The lxml module is
used to parse the HTML code.
Note: The tags that contain the definitions may change at any time. In that case, we would need
to adapt the script.
from lxml import html
import textwrap
root = html.fromstring(resp.content)
if sel.text:
    s = sel.text.strip()
    print(textwrap.fill(s, width=50))
We parse the content. The main definitions are located inside the span tag, which has the
one-click-content attribute. We improve the formatting by removing excessive white space and stray
characters. The text is wrapped to a maximum width of 50 characters. Note that such parsing is
subject to change.
$ ./get_term.py
a domesticated canid,
any carnivore of the dog family Canidae, having
prominent canine teeth and, in the wild state, a
long and slender muzzle, a deep-chested muscular
body, a bushy tail, and large, erect ears.
...
streaming.py
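#!/usr/bin/env python3
import requests as req
url = "https://docs.oracle.com/javase/specs/jls/se8/jls8.pdf"
local_filename = url.split('/')[-1]
r = req.get(url, stream=True)
with open(local_filename, 'wb') as f:
    # The 1024-byte chunk size is an assumed value.
    for chunk in r.iter_content(chunk_size=1024):
        f.write(chunk)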
When stream is set to True, Requests cannot release the connection back to the pool unless we
consume all the data or call Response.close().
f.write(chunk)
The response is read in chunks, and each chunk is written to the local file.
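One way to make sure the connection is released is to use the response as a context manager. The sketch below streams a payload from a local test server into a temporary file; the FileHandler class, the payload size, the file name download.bin, and the 1024-byte chunk size are all assumptions made for this example.

```python
#!/usr/bin/env python3
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests as req

PAYLOAD = b"x" * 10_000  # hypothetical file content


class FileHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves the payload as a binary download."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/file.bin"

local_filename = os.path.join(tempfile.mkdtemp(), "download.bin")

# Leaving the with-block closes the response, even if we stop reading early.
with req.get(url, stream=True) as r:
    with open(local_filename, "wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            f.write(chunk)

print(os.path.getsize(local_filename), "bytes written")

server.shutdown()
server.server_close()
```

Because the body is read piece by piece, the whole file never has to fit in memory at once, which is the main reason to stream large downloads.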
We use the htpasswd tool to create a user name and a password for basic HTTP authentication.
location /secure {
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>
<body>
<p>
This is a secure page.
</p>
</body>
</html>
credentials.py
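#!/usr/bin/env python3
import requests as req
user = 'user7'
passwd = '7user'
resp = req.get("http://localhost/secure/", auth=(user, passwd))  # URL assumed from the /secure location
print(resp.text)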
The script connects to the secure webpage; it provides the user name and the password necessary
to access the page.
$ ./credentials.py
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>
<body>
<p>
This is a secure page.
</p>
</body>
</html>
With the right credentials, the credentials.py script returns the secured page.
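What basic authentication adds to the request can be seen by preparing it without sending it. In this sketch the URL is an assumption based on the /secure location; the user name and password are the ones from credentials.py.

```python
#!/usr/bin/env python3
import base64

import requests as req

user = "user7"
passwd = "7user"

# A (user, password) tuple is shorthand for HTTP basic authentication.
prepared = req.Request("GET", "http://localhost/secure/", auth=(user, passwd)).prepare()

# The credentials travel in the Authorization header, Base64-encoded.
header = prepared.headers["Authorization"]
scheme, token = header.split(" ", 1)
print(scheme)
print(base64.b64decode(token).decode("utf8"))
```

Base64 is an encoding, not encryption, so basic authentication should only be used over HTTPS outside of local experiments.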