
The HTTP protocol

Jesús Arias Fisteus

Web Applications (2022/23)

Web Applications (2022/23) The HTTP protocol 1


The HTTP protocol

HTTP (Hypertext Transfer Protocol) is an application-level
stateless protocol for distributed and collaborative hypertext
information systems.



The HTTP protocol

HTTP is based on the transmission of messages on top of the TCP
transport protocol:
I Clients send a request message to a server, requesting it to
perform an action on a specific resource (often, just getting
the resource itself).
I Servers send back a response message, which usually
includes the contents of the requested resource.



Protocol versions

The most widely used versions nowadays are:
I HTTP/1.1: the most frequently used version since the late
1990s.
I HTTP/2: newer version deployed over the last few years. It
improves performance through binary encoding of messages,
header compression, request/response multiplexing over a
single TCP connection, server push, etc.

(Both protocol versions are compatible in terms of message
semantics and structure; the changes mainly affect message
encoding and transport on top of TCP connections.)



Resource identifiers

Resources are identified in HTTP through uniform resource
identifiers (URIs).

A URI is a compact character sequence that identifies a
physical or abstract resource. URIs are used across multiple
protocols and applications.

A URI that also provides the information needed to locate and
access the resource is called a uniform resource locator
(URL).



Structure of a URI: example



Structure of a URI
I Scheme: refers to the name of a scheme, which defines how
identifiers are assigned within its scope. The schemes used for
HTTP are http and https.
I Authority: element from a hierarchical naming authority,
typically based on a DNS domain name or a network address
(IPv4 or IPv6) and, optionally, a port number.
I Path: element that identifies a resource within the scope of
the provided scheme and authority, typically organized
hierarchically in segments separated by slashes (“/”).
I Query: non-hierarchical data that, combined with the path,
helps to identify the resource. It’s usually presented as one or
more name/value pairs.
I Fragment identifier: identifies a secondary resource in the
context of the primary resource such as, for example, a
specific fragment of a Web page.
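
The components listed above can be inspected with Python's standard library. This is a minimal sketch: the URI below is an illustrative one (the explicit port 8080 is an assumption added so that every component is present).

```python
from urllib.parse import urlsplit

# Illustrative URI; the port number is an assumption for the example.
uri = "http://example.com:8080/manual?lang=es#cap3"
parts = urlsplit(uri)

print(parts.scheme)    # http
print(parts.netloc)    # example.com:8080  (the authority)
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /manual
print(parts.query)     # lang=es
print(parts.fragment)  # cap3
```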
Examples of URIs with query

https://aulaglobal.uc3m.es/course/view.php?id=91019

https://www.google.com/search?q=madrid&tbm=isch
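
As a small sketch of how such a query is read as name/value pairs, the second URI above can be decomposed with Python's standard library (each name maps to a list, since a name may be repeated in a query):

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.google.com/search?q=madrid&tbm=isch"
query = urlsplit(url).query   # "q=madrid&tbm=isch"
params = parse_qs(query)      # name -> list of values

print(params)  # {'q': ['madrid'], 'tbm': ['isch']}
```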



Examples of URIs with fragment identifiers

http://example.com/manual#cap3

http://example.com/manual?lang=es#cap3



Reserved characters

I URIs can contain just US-ASCII letters, digits and a few
graphical symbols (“-”, “.”, “_” and “~”), as well as reserved
characters used, among other things, to delimit their
components (“:”, “/”, “?”, “#”, “[”, “]”, “@”, etc.).
I Other characters, as well as any reserved character when used
to represent normal data instead of a delimiter, must be
encoded with URL encoding.



URL encoding

I Each character not allowed in a URI is encoded as an octet
sequence. Each octet is represented by the symbol “%”
followed by the value of the octet written as two hexadecimal
digits.
I For example:
I “path=docs/index.html” is encoded as
“path=docs%2Findex.html” (the “/” character is encoded
with the 2F octet in ASCII, UTF-8 and most character
encoding schemes).
I “q=evaluación” is encoded as “q=evaluaci%C3%B3n” (the
“ó” character is encoded in UTF-8 with the octet sequence
C3 B3).
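
The two examples above can be reproduced with Python's standard library. This is only a sketch: `quote()` uses UTF-8 by default, and `safe=""` is passed so that “/” is encoded as data rather than kept as a delimiter.

```python
from urllib.parse import quote, unquote

# safe="" makes quote() encode "/" too (by default it is left untouched).
print(quote("docs/index.html", safe=""))  # docs%2Findex.html
print(quote("evaluación", safe=""))       # evaluaci%C3%B3n

# Decoding reverses the process.
print(unquote("evaluaci%C3%B3n"))         # evaluación
```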



HTTP methods

Request messages specify a method, which defines the action
to perform on the resource.



HTTP methods

The main methods used by Web applications are:
I GET: get the resource.
I POST: do some processing on the resource (the actual kind
of processing is resource-dependent) based on the data
included in the request message.

(Other methods defined by the HTTP standard are: HEAD, PUT,
DELETE, CONNECT, OPTIONS and TRACE.)



Requests with GET method

GET requests:
I Are used to get the contents of resources (HTML pages,
images, etc.).
I Are generated by Web browsers when, among other situations,
users type a URL in the address bar or click on a hyperlink,
additional resources linked from a just-received Web page are
needed, or some forms are submitted.
I Are subject to caching to optimize resource utilization.
I Are supposed to be safe, that is, they should not have side
effects on the server, application state, etc.



Requests with POST method

POST requests:
I Are used to perform actions (authenticate a user, add a
product to the shopping cart, confirm an order in an online
shop, upload a message to a social network, etc.).
I Are generated by Web browsers when some forms are submitted.
I Are not normally cached.
I Can be unsafe. Among other potential problems, repeating
the request could have side effects (e.g. confirming the same
order twice).



Structure of a request message

An HTTP request includes:
I Resource URL (without scheme and authority).
I Method.
I Request headers: additional data about how the request
needs to be processed.
I Request body (only for some methods): data to be
processed by the server.



Request body

A request body:
I Does not normally appear in GET requests.
I Includes, in POST requests, the data needed by the server to
process them.
I Is usually combined with the Content-Type and
Content-Length request headers.
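
As a rough sketch of how the method, the target, the headers and the body fit together in an HTTP/1.1 message, the following code serializes a POST request by hand. The helper `build_request`, the path `/login`, the host and the form fields are all hypothetical; real applications would normally rely on an HTTP client library instead.

```python
from urllib.parse import urlencode

def build_request(method, target, host, body=b"", content_type=None):
    """Serialize an HTTP/1.1 request as bytes (illustrative only)."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    if body:
        # The body is described by Content-Type and Content-Length.
        lines.append(f"Content-Type: {content_type}")
        lines.append(f"Content-Length: {len(body)}")
    # An empty line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

# Hypothetical form data, encoded the way HTML forms encode it by default.
form = urlencode({"user": "alice", "lang": "es"}).encode("ascii")
request = build_request("POST", "/login", "example.com", form,
                        "application/x-www-form-urlencoded")
print(request.decode("ascii"))
```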



Example of an HTTP/1.1 request

GET /Inicio HTTP/1.1
Host: www.uc3m.es
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Chrome/62.0.3202.89
Upgrade-Insecure-Requests: 1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: es-ES,es;q=0.9,en;q=0.8,en-US;q=0.7



Structure of a response message

An HTTP response includes:
I A status: numeric code that indicates the result of the
processing of the request.
I Response headers: additional data about how the response
needs to be processed.
I Response body: representation of the response to the
request, typically as an HTML page, an image, etc.



Example of an HTTP/1.1 response

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=E26E8...; Domain=www.uc3m.es; HttpOnly
Cache-Control: no-store
Last-Modified: Fri, 10 Nov 2017 11:44:28 CET
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Date: Fri, 10 Nov 2017 10:44:28 GMT

<!DOCTYPE html>
<html lang="es" class="no-js">
<head>
<title>Inicio | UC3M</title>
(...)
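
A response like the one above can be split into its three parts (status, headers and body) with a few lines of code. This is a simplified sketch: the helper `parse_response` is hypothetical, it keeps only one value per header name (so repeated headers such as several Set-Cookie lines would be lost) and it does not decode chunked bodies.

```python
def parse_response(raw):
    """Split a raw HTTP/1.1 response into status, headers and body."""
    # An empty line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html;charset=UTF-8\r\n"
       b"Cache-Control: no-store\r\n"
       b"\r\n"
       b"<!DOCTYPE html>")
code, reason, headers, body = parse_response(raw)
print(code, reason)             # 200 OK
print(headers["content-type"])  # text/html;charset=UTF-8
print(body)                     # b'<!DOCTYPE html>'
```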



Main headers used in requests and responses

I Connection: information on whether the TCP connection
should be closed after completing the response to the request.
I Content-Encoding: data encoding (compression, typically)
that has been applied to the request or response body.
I Content-Length: length in bytes of the request or response
body.
I Content-Type: MIME type of the request or response body
(e.g., text/html).



Main request-specific headers

I Accept: client preferences regarding content types to receive.
I Accept-Encoding: client preferences regarding the encoding
to apply to the response body (compression, typically).
I Cookie: sending cookies back to the server.
I Host: requested authority.
I If-Modified-Since: timestamp of the latest version of the
resource in the client’s cache.
I If-None-Match: ETag value received with the latest version of
the resource in the client’s cache.
I Referer: URL from which the current request originates.
I User-Agent: information (name, version, etc.) about the
client’s software.



Main response headers

I Cache-Control: instructions about how the resource can be
stored in cache.
I ETag: code that identifies the current contents of the
resource.
I Expires: date and time until which the cached copy of the
resource can be used.
I Vary: list of request headers whose values can change the
contents of the response for a resource.
I Location: in redirection responses, new URL to be requested
by the client.
I Server: information (name, version, etc.) about the server’s
software.
I Set-Cookie: cookies that the client should send back in
future requests.
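
The interplay between the ETag response header and the If-None-Match request header can be sketched as follows. The functions `make_etag` and `respond` are hypothetical, and deriving the ETag from a hash of the body is just one possible choice (servers are free to use any scheme). The sketch also ignores that If-None-Match may carry several values.

```python
import hashlib

def make_etag(content):
    # Assumption: the ETag is a quoted, truncated hash of the body.
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'

def respond(request_headers, content):
    """Return (status, headers, body) for a GET on a resource."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is still current: no body is sent.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Cache-Control": "max-age=60"}, content

page = b"<html>hello</html>"
status, headers, body = respond({}, page)  # first request
status2, _, body2 = respond({"If-None-Match": headers["ETag"]}, page)
print(status, status2)  # 200 304
```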



Status codes of HTTP responses

Five types of status codes:
I 1XX: informational.
I 2XX: resource successfully processed.
I 3XX: redirection to another resource.
I 4XX: error in the request.
I 5XX: error in the server.



Status codes of HTTP responses
Code  Reason                 Meaning
200   OK                     Request successfully processed.
301   Moved Permanently      Resource moved to another URL (Location
                             header), which the client should always
                             use from now on.
302   Found                  Resource temporarily moved to another
                             location (Location header).
303   See Other              The other resource (confirmation page,
                             progress, etc.) has to be requested with
                             the GET method.
304   Not Modified           The client can use the version of the
                             resource it currently has in cache.
400   Bad Request            The client sent an invalid HTTP request
                             (syntax, etc.).
403   Forbidden              The client is not allowed to access this
                             resource.
404   Not Found              There is no resource with such path.
405   Method Not Allowed     The resource does not allow such method.
500   Internal Server Error  Server-side error while processing the
                             request.
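
The first digit of a status code gives its type, and Python's standard library already knows the reason phrases. A small sketch (the `CLASSES` table and the `describe` helper are hypothetical names chosen for this example):

```python
from http import HTTPStatus

# Type of status code, indexed by its first digit.
CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}

def describe(code):
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"

for code in (200, 301, 304, 404, 500):
    print(describe(code))
```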



Cookies

HTTP is a stateless protocol, i.e., each request is independent
of other requests.

Cookies allow the server to keep state: they are small pieces of
data associated with a name that the server creates and sends
to the client in its response messages, so that the client
sends them back with its subsequent requests.



Structure of cookies

A cookie is represented as a short string that contains:
I A name: a server can set several cookies with different names.
I A value: the actual data of the cookie.
I Attributes:
I Domain and Path: they define in which requests, according to
their authority and path, the client will send the cookie to the
server.
I Expires and Max-Age: they define when the client must stop
using the cookie. If neither is specified, the cookie is removed
when the browser is closed.
I Secure: the cookie can only be sent through secure channels
(HTTPS, typically).
I HttpOnly: the cookie can only be sent or accessed through
HTTP or HTTPS. For example, accessing it from JavaScript
code is forbidden (for security reasons).



Examples of the use of cookies

Setting cookies (in HTTP responses):

Set-Cookie: sid=4RT67aY...;
Expires=Thu, 13 Feb 2020 21:47:38 GMT;
Path=/; Domain=.example.com; Secure; HttpOnly

Sending cookies back (in HTTP requests):

Cookie: sid=4RT67aY...
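
Both headers can be produced and parsed with the `http.cookies` module of Python's standard library. This is only a sketch: the cookie values (`abc123`, `theme=dark`) and the domain are made up for the example.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header with some attributes.
jar = SimpleCookie()
jar["sid"] = "abc123"              # made-up session identifier
jar["sid"]["path"] = "/"
jar["sid"]["domain"] = "example.com"
jar["sid"]["secure"] = True
jar["sid"]["httponly"] = True
print(jar.output())                # a "Set-Cookie: sid=abc123; ..." line

# Server side, later request: parse the Cookie header sent by the client.
received = SimpleCookie()
received.load("sid=abc123; theme=dark")
print(received["sid"].value)       # abc123
print(received["theme"].value)     # dark
```

Note that the client sends back only names and values, never the attributes.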



Applications of cookies

I Some typical uses of cookies include:
I Session tracking: the user signs in to create a session (the
server sets a cookie that includes a session token). The server
identifies future requests as part of the same session because
they include the same session token.
I Storing user preferences at the client side: user preferences
for a Web site can be stored in cookies inside the user’s Web
browser.
I User tracking: Web sites can use cookies to track the user’s
behavior (when third parties do that, e.g. with commercial
intentions, their use can be considered abusive).
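
Session tracking, the first use above, can be sketched in a few lines. Everything here is a simplifying assumption: sessions live in an in-memory dictionary, the cookie is called `sid`, and the helpers `sign_in` and `current_user` are hypothetical. Password checking, expiration and sign-out are left out.

```python
import secrets

sessions = {}  # session token -> user name (in-memory, illustrative)

def sign_in(user):
    """Create a session and return the value for a Set-Cookie header."""
    token = secrets.token_urlsafe(32)  # unguessable random token
    sessions[token] = user
    return f"sid={token}; Path=/; Secure; HttpOnly"

def current_user(cookie_header):
    """Find the user of a request from the value of its Cookie header."""
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "sid":
            return sessions.get(value)
    return None

set_cookie = sign_in("alice")
cookie = set_cookie.split(";")[0]  # the client returns only name=value
print(current_user(cookie))        # alice
print(current_user("sid=forged"))  # None
```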



HTTP over TLS (HTTPS)

HTTP over TLS, also known as HTTPS (Hypertext Transfer
Protocol Secure), defines how HTTP is transported over a
secure TLS (Transport Layer Security) channel.



Security properties of HTTPS

The use of HTTP over TLS provides the following security
properties:
I Authentication: the server is always authenticated and,
optionally, the client.
I Confidentiality: data sent through the secure channel once it
has been established can only be seen by the two end-points
of the channel.
I Integrity: any modification to the data sent through the
secure channel once it has been established will be detected.
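
On the client side, server authentication means checking the server's certificate and that it matches the requested host name. As a small sketch, Python's default TLS context enables both checks; the host name below is only a placeholder, and no connection is actually opened by this code.

```python
import http.client
import ssl

# Default client-side settings: certificate and host name verification.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True

# The context would then be used by an HTTPS client. Creating the
# object does not connect; calling conn.request(...) would.
conn = http.client.HTTPSConnection("example.com", context=context)
print(conn.port)                                 # 443
```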



HTTP/2

HTTP/2 was designed to fix some issues in HTTP/1.1 that
impact the performance of current Web applications. More
specifically, it optimizes how HTTP messages are transported
through the underlying connection.

HTTP/2 keeps HTTP/1.1 message semantics (structure,
headers, etc.).



Main changes in HTTP/2
I The frame is the basic unit of the protocol. It is encoded in
binary format in order to speed up its processing. Each
HTTP message is encoded as one or more frames.
I Multiple requests and their responses are multiplexed as
independent flows within the same connection, without
delays in the processing of some requests affecting the rest of
concurrent requests.
I The protocol integrates flow control and flow prioritization
mechanisms.
I A server can send responses to the client without the client
having sent the corresponding request (server push), thus
anticipating future client requests.
I A header compression mechanism (HPACK) is applied to
reduce the size of message headers, given the high redundancy
they contain.
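
To give an idea of what binary framing looks like, the following sketch encodes and decodes the 9-octet frame header that RFC 9113 defines (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit and a 31-bit stream identifier). The helper names are hypothetical, and the sketch covers only the frame header, not HPACK, flow control or the rest of the protocol.

```python
import struct

DATA, HEADERS = 0x0, 0x1  # two of the frame types in RFC 9113
END_STREAM = 0x1          # flag: last frame of the stream

def encode_frame(frame_type, flags, stream_id, payload):
    """Prefix a payload with the 9-octet HTTP/2 frame header."""
    header = struct.pack(">I", len(payload))[1:]  # 24-bit length
    header += struct.pack(">BBI", frame_type, flags,
                          stream_id & 0x7FFFFFFF)  # reserved bit is 0
    return header + payload

def decode_header(data):
    """Return (length, type, flags, stream_id) from a frame's header."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", data[3:9])
    return length, frame_type, flags, stream_id & 0x7FFFFFFF

# Frames of different streams can be interleaved in one connection.
frame = encode_frame(DATA, END_STREAM, 1, b"hello")
print(len(frame))            # 14 (9 header octets + 5 payload octets)
print(decode_header(frame))  # (5, 0, 1, 1)
```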
HTTP/3

HTTP/3 provides the same HTTP semantics as HTTP/1.1 and
HTTP/2, but on top of the QUIC transport protocol instead of
TCP:
I QUIC provides applications with flow-controlled streams
(ordered sequences of bytes), low latency connection
establishment and network path migration.
I QUIC works on top of the UDP protocol.
I QUIC integrates with TLS for security.



References

I HTTP Semantics. IETF RFC 9110. June 2022.
I HTTP/1.1. IETF RFC 9112. June 2022.
I HTTP/2. IETF RFC 9113. June 2022.
I HTTP/3. IETF RFC 9114. June 2022.
I Uniform Resource Identifier (URI): Generic Syntax. IETF RFC
3986. January 2005.
I HTTP State Management Mechanism. IETF RFC 6265. April
2011.



Other resources

I MDN Web Docs, “Web Technology for Developers: HTTP”
I Andrew S. Tanenbaum, David J. Wetherall, Computer
Networks, 5th ed., Prentice Hall (2010):
I Chapter 7.3 (The World Wide Web).
I Online access at O’Reilly through UC3M Library
