Article in IEEE Internet Computing · September 2012
DOI: 10.1109/MIC.2012.115

Standards
Editor: Barry Leiba • barryleiba@computer.org

Real-Time Communications in the Web
Issues, Achievements, and Ongoing Standardization Efforts

Salvatore Loreto • Ericsson Research
Simon Pietro Romano • University of Napoli Federico II

Web Real-Time Communication (WebRTC) is an upcoming standard that aims to enable real-time communication among Web browsers in a peer-to-peer fashion. The IETF RTCWeb and W3C WebRTC working groups are jointly defining both the APIs and the underlying communication protocols for setting up and managing a reliable communication channel between any pair of next-generation Web browsers.

Support for enabling real-time communication (RTC) in the Web is currently gaining momentum with the two main Internet standardization bodies, the IETF and W3C. Standardization activities in this area aim to define a W3C API that enables a Web application running on any device, through secure access to input peripherals (such as webcams and microphones), to send and receive real-time media and data in a peer-to-peer (P2P) fashion between browsers. The API’s design must allow Web developers to implement functionality for finding and connecting participants in a communication session. The W3C API will rely on existing protocols the IETF community has identified as the most appropriate for addressing network-related aspects (control protocols, connection establishment and management, connectionless transport, selection of the most suitable encoders and decoders, and so on). However, no clean separation exists between the two standardization activities, which clearly intersect at the interface between the application-level responsibilities residing in a single node and the intercommunication activities among remote nodes.

This migration toward browser-enabled RTC represents a major breakthrough and has motivated many industry and academic researchers’ recent work. Here, we discuss the growing interest in integrating interactive multimedia features into Web applications.

How Did We Get Here?
A few documents have tried to foster browser-enabled RTC in place of traditional communication services. In one notable example, researchers compared the complexity of traditional telecommunications systems with that of Web architectures [1], concluding that the former should move toward the latter to make this kind of communications available to as many users as possible. This paper also paved the way for a (now expired) Internet draft [2] describing a RESTful interface to the Session Initiation Protocol (SIP). Researchers and implementors have devoted many other efforts to a similar approach, but often in a nonstandard way (for example, by using proprietary plug-ins such as Adobe Flash or Microsoft ActiveX) and without documentation. A notable exception is recent work from Ericsson Labs in which researchers have tried to add native support for the Real-Time Transport Protocol (RTP) and media devices within browsers themselves by exploiting an ad hoc JavaScript API and a properly modified version


of WebKit that uses gstreamer (see https://labs.ericsson.com/developer-community/blog/beyond-html5-peerpeer-conversational-video).

That said, this migration has led the main standardization bodies to devote fresh energy to this problem, eventually leading to two different yet interrelated working groups: RTCWeb (http://tools.ietf.org/wg/rtcweb/charters) within the IETF and WebRTC (www.w3.org/2011/04/webrtc-charter.html) within the W3C. RTCWeb has focused on the protocols and interactions that the IETF must address, including interoperability with legacy systems (such as existing telecommunications systems). WebRTC is working to define an API allowing browsers and scripting languages to interact with media devices (microphones, webcams, and speakers), processing devices (encoders/decoders), and transmission functions. Such efforts will likely expand and enhance the HTML5 specification, which already provides a standard way to stream multimedia content from servers to browsers.

Both working groups will have to consider any security issues that arise from the features that will be addressed. We expect (and hope to see) several prototype implementations during these working groups’ lifetime.

Open Web Platform
HTML5 is generally used as an umbrella term for the advances taking place on the so-called Open Web Platform, although HTML is itself just one part of the various features used for developing Web applications and commonly referred to as part of that platform. The complete set of features also includes Cascading Style Sheets (CSS), the Document Object Model (DOM) convention, JavaScript, and several scripting APIs.

HTML represents an application and its data in a structured way, and lets developers style the application with CSS and control it via JavaScript. All these technologies are delivered over the Web infrastructure (via browsers, proxies, and Web servers) using either HTTP or WebSocket (http://tools.ietf.org/html/rfc6455).

Scripting APIs let programmers properly control and exploit browser capabilities through JavaScript. Indeed, as soon as new functions are added to a browser, the W3C also devises novel APIs to expose those functions to developers, letting the browser’s capabilities grow closer to those of native application environments.

Architecture
RTC’s architectural model is the browser RTC trapezoid (see Figure 1), which lets the media path flow directly between browsers without any intervening servers. The signaling path crosses servers that can modify, translate, or manage signals as needed.

Figure 1. The RTCWeb architecture. This typical voice-over-IP communication trapezoid has a server-mediated signaling path and a direct (browser-to-browser) media path. (Diagram labels: application provider; proprietary signaling over HTTP/WebSocket to each JS/HTML application; JS API between each application and its browser; media and media path signaling (ICE/STUN/TURN) between the two browsers.)

The idea is to have client-side Web applications (typically written in a mix of HTML and JavaScript) interact with Web browsers through the WebRTC API, letting them properly exploit and control browser functions and interact with browsers themselves in both a proactive (for instance, to query browser capabilities) and reactive (to receive browser-generated notifications) way. The aforementioned application-browser API must thus provide a wide set of functions, such as connection management (in a P2P fashion), negotiation, selection, and control of encoding/decoding capabilities, media control, and firewall and network address translation (NAT) traversal. From a technical perspective, the API is being implemented in JavaScript, which has demonstrated its effectiveness as a powerful scripting language on the client side of a Web application.

The application-browser API’s design represents a challenging issue, but doesn’t solve the overall problem at hand. The complete picture, in fact, envisages a continuous, real-time flow of data streams across the network to put into direct communication


two (or even more) browsers at a time, with no further intermediaries along the path. We’re thus talking about browser-to-browser communication, which is a revolutionary approach to Web-based communication because it allows P2P (in which each peer is a browser) data communication to enter the Web application arena for the first time.

Imagine a real-time audio and video call between two browsers. Communication in such a scenario might involve direct media streams between the two browsers, with the media path negotiated and instantiated through a complex sequence of interactions involving the following entities:

• the caller browser and the caller JavaScript application (for example, through the JavaScript API);
• the caller JavaScript application and the application provider (typically, a Web server);
• the application provider and the callee JavaScript application; and
• the callee JavaScript application and the callee browser (again through the application-browser JavaScript API).

Given this overall picture, we next dig into the details of the most relevant features of RTCWeb.

Signaling
Since its inception, the general idea behind WebRTC’s design has been to fully specify how to control the media plane while leaving the signaling plane to the application layer as much as possible. The rationale is that different applications might prefer to use different standardized signaling protocols (such as SIP or XMPP) or even something custom. In this approach, the important information that browsers must exchange is the multimedia session description, which specifies the transport (and Interactive Connectivity Establishment [ICE]) information as well as the media type, format, and all associated media configuration parameters necessary to establish the media path.

The original idea to exchange session description information in the form of Session Description Protocol (SDP) “blobs” had several shortcomings, some of which would be difficult to address. Thus, the IETF is standardizing the JavaScript Session Establishment Protocol (JSEP) [3]. JSEP provides the interface an application needs to deal with the negotiated local and remote session descriptions (with the negotiation carried out via any desired signaling mechanism), along with a standardized way for the application to interact with the ICE state machine. The JSEP approach leaves the responsibility for driving the signaling state machine entirely to the application. Rather than simply forwarding the messages the browser emits to the remote side, the application must call the right APIs at the right times and convert the session descriptions and related ICE information into the defined messages of its chosen signaling protocol.

API
The W3C WebRTC 1.0 API [4] lets a JavaScript application exploit the browser’s novel real-time capabilities. It requires the browser core to provide the functionality needed to establish the necessary audio, video, and data channels. However, ongoing standardization work hasn’t yet resulted in a decision regarding which audio (G.711, opus, vorbis, and so on) and video (H.264, VP8, and so on) codecs the browser core will use. The current assumption is that all media and data streams will always be encrypted.

The API is being designed around three main concepts: PeerConnection, MediaStreams, and DataChannel.

PeerConnection
A PeerConnection lets two users communicate directly, browser-to-browser. It then represents an association with a remote peer, which is usually another instance of the same JavaScript application running at the remote end. Communications are coordinated via a signaling channel provided by scripting code in the page via the Web server (for instance, using XMLHttpRequest or WebSocket). Once the calling browser establishes a peer connection, it can send MediaStream objects directly to the remote browser.

As Figure 2 illustrates, the peer-connection mechanism uses the ICE protocol along with the Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) servers to let UDP-based media streams traverse NAT boxes and firewalls. ICE lets browsers discover enough information about the topology of the network on which they’re deployed to find the best exploitable communication path. Using ICE also provides a security measure because it prevents untrusted webpages and applications from sending data to hosts that aren’t expecting to receive it.

Figure 2. The RTCWeb protocol stack. RTCWeb allows for secure transmission of multiplexed real-time flows over the Internet. (Diagram labels: IP, UDP, STUN, SRTP, DTLS, SCTP, SRTP keying, Data, SSRC1 … SSRCN.)

The remote host feeds each signaling message into the receiving PeerConnection on arrival. The APIs send signaling messages that most


applications will treat as opaque blobs, but which the Web application must transfer securely and efficiently to the other peer via the Web server.

MediaStream
A MediaStream is an abstract representation of an actual data stream of audio or video. It serves as a handle for managing actions on the media stream, such as displaying the stream’s content, recording it, or sending it to a remote peer. A MediaStream can be extended to represent a stream that either comes from (remote stream) or is sent to (local stream) a remote node. A LocalMediaStream represents a media stream from a local media-capture device (such as a webcam or microphone). To create and use a LocalMediaStream, the Web application must request access from the user via the getUserMedia() function. The application specifies the type of media (audio or video) to which it requires access. The devices selector in the browser interface grants or denies access. Once the application is done, it can revoke its own access by calling the stop() function on the LocalMediaStream.

Media-plane signaling is carried out-of-band between peers. Figure 2 shows the protocol stack needed for media transmission. Secure Real-time Transport Protocol (SRTP) carries the media data and the RTP Control Protocol (RTCP) information used to monitor transmission statistics associated with data streams. Datagram Transport Layer Security (DTLS) is used for SRTP key and association management. The IETF is still discussing the option of using SDP Security Descriptions for Media Streams (SDES) as an alternative for key and association management.

In a multimedia session, each medium is typically carried in a separate RTP session with its own RTCP packets. However, to overcome the issue of opening a new “hole” for each stream used, the IETF is working on possibly reducing the number of transport layer ports that RTP-based real-time applications consume by combining (that is, multiplexing) multimedia traffic in a single RTP session [5].

DataChannel
The DataChannel provides a generic transport service that lets Web browsers exchange generic data in a bidirectional, P2P fashion. Within the IETF, standardization work has reached a general consensus on using the Stream Control Transmission Protocol (SCTP) encapsulated in DTLS to handle nonmedia data types [6]. Encapsulating “SCTP over DTLS over ICE over UDP” provides a NAT traversal solution together with confidentiality, source authentication, and integrity-protected transfers. Moreover, this solution lets the data transport interwork smoothly with parallel media transports, and both can share a single transport-layer port number. The IETF chose SCTP because it natively supports multiple streams with reliable, unreliable, and partially reliable delivery modes. It allows applications to open several independent streams (up to 65,536 in each direction) within an SCTP association toward a peering SCTP endpoint. Each stream actually represents a unidirectional logical channel, providing in-sequence delivery.

An application can send a message sequence either ordered or unordered. The message delivery order is preserved only for all ordered messages sent on the same stream. However, the DataChannel API is bidirectional, which means that each DataChannel bundles an incoming and an outgoing SCTP stream.

An application sets up a data channel (that is, creates the SCTP association) when the CreateDataChannel() function is called for the first time on an instantiated PeerConnection object. Each subsequent call to the CreateDataChannel() function just creates a new DataChannel within the existing SCTP association.

A Simple Example
Alice and Bob use a common calling service. To communicate, they must be simultaneously connected to the Web server that implements that service. Indeed, when Bob (or Alice) points his browser to the calling service webpage, he will download both the HTML page and JavaScript code that keeps the browser connected via a secure HTTP or WebSocket connection.

Figure 3 illustrates a typical call flow associated with setting up a real-time, browser-enabled communication channel. When Alice clicks on the webpage button to start a call with Bob, the JavaScript instantiates a PeerConnection object. Once the PeerConnection is created, the JavaScript code on the calling service side must set up media, which it does via the MediaStream function. Alice must also grant permission to let the calling service access both her camera and her microphone.

In the current W3C API, once an application has added some streams, Alice’s browser, enriched with JavaScript code, generates a signaling message. The IETF hasn’t yet defined the exact format of this message. It must contain media channel information and ICE candidates, as well as a fingerprint attribute binding the communication to Alice’s public key. Alice’s browser then sends this message to the signaling server (for example, via XMLHttpRequest or WebSocket). The signaling server processes the message from Alice’s browser, determines that this is a call to Bob, and sends a signaling message to Bob’s browser. The JavaScript on Bob’s browser processes the incoming message and alerts Bob. Should Bob decide to answer the


call, the JavaScript running in his browser would then instantiate a PeerConnection related to the message coming from Alice’s side. Then, a process similar to the one on Alice’s browser would occur. Bob’s browser verifies that the calling service is approved and the media streams are created; afterward, it creates a signaling message containing media information and ICE candidates, and returns a fingerprint to Alice via the signaling service.

Figure 3. Call setup from Alice’s perspective. We can use the WebRTC API to create a direct media connection between Alice’s and Bob’s browsers. (Participants: remote peer (Bob), Alice’s application (PeerConnectionObserver), PeerConnectionFactory, and PeerConnection. Sequence: CreatePeerConnectionFactory; CreatePeerConnection; CreateLocalMediaStream; CreateLocalVideoTrack; CreateLocalAudioTrack (add tracks to the stream); AddStream; CommitStreamChanges; OnSignalingMessage (offer); send offer to the remote peer; get answer from the remote peer; ProcessSignalingMessage (answer); media; OnAddStream.)

Congestion Control
The discussion of a congestion control mechanism specifically conceived for interactive media and data transmissions is at an early stage; the IETF hasn’t even decided whether to begin working on it. The idea is to tightly couple the congestion control of streams, ideally using only a single congestion control instance for all the WebRTC transfers (that is, audio, video, or data). At the same time, prioritizing different parts of the WebRTC transport bundle should be possible.

Security Considerations
As we described, the RTCWeb approach and related architecture definitely represent a new challenge in the world of telecommunications because they allow Web browsers to directly communicate with little or no intervention from the server side. This is extremely challenging because it requires that we consider several issues, among which trust and security play a fundamental role. With respect to this last point, a new set of potential security threats comes to the fore when we allow direct browser-to-browser communication. The basic Web security policy currently in place is based on the principle of isolation (also known as sandboxing), which lets users protect their computers from malicious scripts and cross-site content references. With this model, security policies mainly relate to JavaScript code running in the browser, which actually acts as a trusted computing base for the visited sites. Widening the scope to browser-to-browser communications unveils new facets of the general security issue.

First and foremost, we must consider communications security in much the same way as we do with other network protocols (such as SIP) that allow for direct communication between any two endpoints, unless we envisage the presence of relays acting as transparent intermediaries. Second, if we let browsers directly communicate with each other, mechanisms must exist that let users verify consent before the actual data exchange phase starts. Consent verification shouldn’t rely on the presence of a trusted server and should hence be directly enforced by the browser aiming at initiating communication with a potential peer. Last but not least, RTC scenarios through the Web clearly call for the browser to interact on a deep level with the node on which it resides, for example, to access local audio and video devices before making a multimedia call with a target peer. Access policies must be defined that potentially involve some form of user consent, thus opening up several new challenges.

The RTCWeb vision is to standardize an open framework enabling browser-to-browser multimedia applications along the most straightforward path, with no need to install plug-ins. Two major browser vendors have already made available an early implementation of the framework in their developers’ release. However, neither is fully compliant with the standard as yet, although both are expected to become so soon.

In the near future, standardization activities will focus on congestion control, audio and video codec


selection, telepresence, and enhanced usage of data channels.

References
1. H. Sinnreich and W. Wimmreuter, “Communications on the Web,” Elektrotechnik und Informationstechnik, vol. 127, 2010, pp. 187–194; http://dx.doi.org/10.1007/s00502-010-0742-l.
2. H. Sinnreich and A. Johnston, “Sip APIs for Communications on the Web,” IETF Internet draft, June 2010.
3. J. Uberti and C. Jennings, “Javascript Session Establishment Protocol,” IETF Internet draft, June 2012.
4. A. Bergkvist et al., “WebRTC 1.0: Real-Time Communication between Browsers,” W3C working draft 09, Feb. 2012; www.w3.org/TR/webrtc/.
5. C. Holmberg and H. Alvestrand, “Multiplexing Negotiation Using Session Description Protocol (SDP) Port Numbers,” IETF Internet draft, Feb. 2012.
6. R. Jesup, S. Loreto, and M. Tuexen, “RTCWeb Datagram Connection,” IETF Internet draft, Mar. 2012.

Salvatore Loreto is a senior researcher in the MultiMedia Technology branch of the NomadicLab at Ericsson Research Finland. He’s made contributions in Internet transport protocols, signal protocols, VoIP, IP-telephony convergence, conferencing over IP, 3GPP IP Multimedia Subsystem, HTTP, and Web technologies. Loreto has a PhD in computer networking from Napoli University. He serves within the IETF as cochair of the SIP Overload Control (SOC), BiDirectional or Server-Initiated HTTP (HyBi), and Application Area (Appsawg) working groups. Contact him at salvatore.loreto@ieee.org.

Simon Pietro Romano is an assistant professor in the Computer Engineering Department at the University of Napoli, and cofounder of Meetecho, a startup (and spin-off of the university) focusing on Internet-based conferencing and collaboration. His research interests primarily fall in the field of networking, with special regard to real-time multimedia applications, network security, and autonomic network management. Romano has a PhD in computer networks from the University of Napoli. He actively participates in IETF standardization activities in the Real-time Applications and Infrastructure area.
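As a rough illustration of the call setup described under “A Simple Example”, the following Python sketch models only the signaling leg: a calling service relays an offer from Alice to Bob and an answer back. It is a toy simulation, not the W3C API. The names SignalingMessage, SignalingServer, Peer, and set_up_call, the message fields, and the placeholder fingerprints and addresses are all assumptions made for this sketch, since the article notes that the exact message format had not been defined.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SignalingMessage:
    """Illustrative signaling blob; the field names are assumptions."""
    kind: str                  # "offer" or "answer"
    sender: str
    recipient: str
    media: List[str]           # media kinds, e.g. ["audio", "video"]
    ice_candidates: List[str]  # transport addresses gathered via ICE
    fingerprint: str           # binds the session to the sender's public key


class SignalingServer:
    """Hypothetical calling service: it only routes messages between users."""

    def __init__(self) -> None:
        self.inboxes: Dict[str, List[SignalingMessage]] = {}

    def connect(self, user: str) -> None:
        self.inboxes.setdefault(user, [])

    def send(self, message: SignalingMessage) -> None:
        if message.recipient not in self.inboxes:
            raise KeyError(f"{message.recipient} is not connected")
        self.inboxes[message.recipient].append(message)

    def receive(self, user: str) -> SignalingMessage:
        return self.inboxes[user].pop(0)


@dataclass
class Peer:
    """Toy stand-in for a browser plus its JavaScript application."""
    name: str
    fingerprint: str
    ice_candidates: List[str]
    remote_sessions: Dict[str, SignalingMessage] = field(default_factory=dict)

    def describe(self, kind: str, recipient: str,
                 media: List[str]) -> SignalingMessage:
        return SignalingMessage(kind, self.name, recipient, list(media),
                                list(self.ice_candidates), self.fingerprint)

    def process(self, message: SignalingMessage) -> None:
        # Store what the remote side announced, keyed by who sent it.
        self.remote_sessions[message.sender] = message


def set_up_call(server: SignalingServer, caller: Peer, callee: Peer,
                media: List[str]) -> None:
    """Relay an offer and then an answer through the server."""
    server.send(caller.describe("offer", callee.name, media))
    offer = server.receive(callee.name)
    callee.process(offer)
    server.send(callee.describe("answer", caller.name, offer.media))
    caller.process(server.receive(caller.name))


server = SignalingServer()
alice = Peer("alice", "fingerprint-of-alice", ["192.0.2.10:50000"])
bob = Peer("bob", "fingerprint-of-bob", ["198.51.100.20:50002"])
server.connect("alice")
server.connect("bob")
set_up_call(server, alice, bob, ["audio", "video"])
print(bob.remote_sessions["alice"].kind, alice.remote_sessions["bob"].kind)
```

In this sketch the server never inspects the media description, which mirrors the article’s point that most applications treat signaling messages as opaque blobs; the media itself would then flow directly between the two peers.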

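The DataChannel behavior described in the article (an SCTP association created on the first CreateDataChannel() call, each channel bundling an incoming and an outgoing stream, and delivery order preserved only for ordered messages on the same stream) can be modeled with a small Python sketch. This is a toy model and not an SCTP implementation: the classes StreamReceiver, DataChannel, and Association are hypothetical, the sequence numbers are a simplification, and using the same stream identifier in both directions is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_STREAMS = 65536  # per direction, the figure quoted in the article


@dataclass
class StreamReceiver:
    """Receiving end of one unidirectional stream (toy model)."""
    next_seq: int = 0
    pending: Dict[int, str] = field(default_factory=dict)
    delivered: List[str] = field(default_factory=list)

    def on_arrival(self, payload: str, seq: Optional[int] = None) -> None:
        if seq is None:
            # Unordered message: hand it to the application right away.
            self.delivered.append(payload)
            return
        # Ordered message: hold it until every earlier one was delivered.
        self.pending[seq] = payload
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


@dataclass
class DataChannel:
    label: str
    outgoing_stream: int
    incoming_stream: int


class Association:
    """Toy SCTP association, set up by the first data channel request."""

    def __init__(self) -> None:
        self.established = False
        self.channels: List[DataChannel] = []

    def create_data_channel(self, label: str) -> DataChannel:
        if len(self.channels) >= MAX_STREAMS:
            raise RuntimeError("no free stream left in this association")
        self.established = True         # the first call sets up the association
        stream_id = len(self.channels)  # assumption: same id in both directions
        channel = DataChannel(label, stream_id, stream_id)
        self.channels.append(channel)
        return channel


association = Association()
chat = association.create_data_channel("chat")
files = association.create_data_channel("files")

receiver = StreamReceiver()
receiver.on_arrival("second", seq=1)  # arrives early, is held back
receiver.on_arrival("urgent")         # unordered, delivered at once
receiver.on_arrival("first", seq=0)   # releases "first" and then "second"
print(receiver.delivered)
```

The receiver shows why an unordered message can overtake ordered ones on the same stream, while the association object shows that later channels reuse the existing association instead of creating a new one.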