All content following this page was uploaded by Simon Pietro Romano on 31 August 2015.
Real-Time Communications in the Web
Issues, Achievements, and Ongoing Standardization Efforts
Salvatore Loreto • Ericsson Research
Support for enabling real-time communication (RTC) in the Web is currently gaining momentum with the two main Internet standardization bodies, the IETF and W3C. Standardization activities in this area aim to define a W3C API that enables a Web application running on any device, through secure access to input peripherals (such as webcams and microphones), to send and receive real-time media and data in a peer-to-peer (P2P) fashion between browsers. The API’s design must allow Web developers to implement functionality for finding and connecting participants in a communication session. The W3C API will rely on existing protocols the IETF community has identified as the most appropriate for addressing network-related aspects (control protocols, connection establishment and management, connectionless transport, selection of the most suitable encoders and decoders, and so on). However, no clean separation exists between the two standardization activities, which clearly intersect at the interface between the application-level responsibilities residing in a single node and the intercommunication activities among remote nodes.

This migration toward browser-enabled RTC represents a major breakthrough and has motivated many industry and academic researchers’ recent work. Here, we discuss the growing interest in integrating interactive multimedia features into Web applications.

How Did We Get Here?

A few documents have tried to foster browser-enabled RTC in place of traditional communication services. In one notable example, researchers compared the complexity of traditional telecommunications systems with that of Web architectures [1], concluding that the former should move toward the latter to make this kind of communications available to as many users as possible. This paper also paved the way for a (now expired) Internet draft [2] describing a RESTful interface to the Session Initiation Protocol (SIP). Researchers and implementors have devoted many other efforts to a similar approach, but often in a nonstandard way (for example, by using proprietary plug-ins such as Adobe Flash or Microsoft ActiveX) and without documentation. A notable exception is recent work from Ericsson Labs in which researchers have tried to add native support for the Real-Time Transport Protocol (RTP) and media devices within browsers themselves by exploiting an ad hoc JavaScript API and a properly modified version
68 Published by the IEEE Computer Society 1089-7801/12/$31.00 © 2012 IEEE IEEE INTERNET COMPUTING
SEPTEMBER/OCTOBER 2012
[Figure 2. The RTCWeb protocol stack. RTCWeb allows for secure transmission of multiplexed real-time flows over the Internet. Layers shown: keying, SRTP, SCTP, STUN, DTLS, UDP, IP.]

two (or even more) browsers at a time, with no further intermediaries along the path. We’re thus talking about browser-to-browser communication, which is a revolutionary approach to Web-based communication because it allows P2P (in which each peer is a browser) data communication to enter the Web application arena for the first time.

Imagine a real-time audio and video call between two browsers. Communication in such a scenario might involve direct media streams between the two browsers, with the media path negotiated and instantiated through a complex sequence of interactions involving the following entities:

• the caller browser and the caller JavaScript application (for example, through the JavaScript API);
• the caller JavaScript application and the application provider (typically, a Web server);
• the application provider and the callee JavaScript application; and
• the callee JavaScript application and the callee browser (again through the application-browser JavaScript API).

Given this overall picture, we next dig into the details of the most relevant features of RTCWeb.

Since its inception, the general idea behind WebRTC’s design has been to fully specify how to control the media plane while leaving the signaling plane to the application layer as much as possible. The rationale is that different applications might prefer to use different standardized signaling protocols (such as SIP or XMPP) or even something custom. In this approach, the important information that browsers must exchange is the multimedia session description, which specifies the transport (and Interactive Connectivity Establishment [ICE]) information as well as the media type, format, and all associated media configuration parameters necessary to establish the media path.

The original idea to exchange session description information in the form of Session Description Protocol (SDP) “blobs” had several shortcomings, some of which would be difficult to address. Thus, the IETF is standardizing the JavaScript Session Establishment Protocol (JSEP) [3]. JSEP provides the interface an application needs to deal with the negotiated local and remote session descriptions (with the negotiation carried out via any desired signaling mechanism), along with a standardized way for the application to interact with the ICE state machine. The JSEP approach leaves the responsibility for driving the signaling state machine entirely to the application. Rather than simply forwarding the messages the browser emits to the remote side, the application must call the right APIs at the right times and convert the session descriptions and related ICE information into the defined messages of its chosen signaling protocol.

API

The W3C WebRTC 1.0 API [4] lets a JavaScript application exploit the novel browser’s real-time capabilities. [...] vide the functionality needed to establish the necessary audio, video, and data channels. However, ongoing standardization work hasn’t yet resulted in a decision regarding which audio (G.711, opus, vorbis, and so on) and video (H.264, VP8, and so on) codecs the browser core will use. The current assumption is that all media and data streams will always be encrypted.

The API is being designed around three main concepts: PeerConnection, MediaStreams, and DataChannel.

PeerConnection

A PeerConnection lets two users communicate directly, browser-to-browser. It then represents an association with a remote peer, which is usually another instance of the same JavaScript application running at the remote end. Communications are coordinated via a signaling channel provided by scripting code in the page via the Web server, for instance, using XMLHttpRequest or WebSocket. Once the calling browser establishes a peer connection, it can send MediaStream objects directly to the remote browser.

As Figure 2 illustrates, the peer-connection mechanism uses the ICE protocol along with the Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) servers to let UDP-based media streams traverse NAT boxes and firewalls. ICE lets browsers discover enough information about the topology of the network on which they’re deployed to find the best exploitable communication path. Using ICE also provides a security measure because it prevents untrusted webpages and applications from sending data to hosts that aren’t expecting to receive it.

The remote host feeds each signaling message into the receiving PeerConnection on arrival. The APIs send signaling messages that most
applications will treat as opaque blobs, but which the Web application must transfer securely and efficiently to the other peer via the Web server.

MediaStream

A MediaStream is an abstract representation of an actual data stream of audio or video. It serves as a handle for managing actions on the media stream, such as displaying the stream’s content, recording it, or sending it to a remote peer. A MediaStream can be extended to represent a stream that either comes from (remote stream) or is sent to (local stream) a remote node. A LocalMediaStream represents a media stream from a local media-capture device (such as a webcam or microphone). To create and use a LocalMediaStream, the Web application must request access from the user via the getUserMedia() function. The application specifies the type of media (audio or video) to which it requires access. The devices selector in the browser interface grants or denies access. Once the application is done, it can revoke its own access by calling the stop() function on the LocalMediaStream.

Media-plane signaling is carried out-of-band between peers. Figure 2 shows the protocol stack needed for media transmission. Secure Real-time Transport Protocol (SRTP) carries the media data and the RTP Control Protocol (RTCP) information used to monitor transmission statistics associated with data streams. Datagram Transport Layer Security (DTLS) is used for SRTP key and association management. The IETF is still discussing the option of using SDP Security Descriptions for Media Streams (SDES) as an alternative for key and association management.

In a multimedia session, each medium is typically carried in a separate RTP session with its own RTCP packets. However, to overcome the issue of opening a new “hole” for each stream used, the IETF is working on possibly reducing the number of transport-layer ports that RTP-based real-time applications consume by combining (that is, multiplexing) multimedia traffic in a single RTP session [5].

DataChannel

The DataChannel provides a generic transport service that lets Web browsers exchange generic data in a bidirectional, P2P fashion. Within the IETF, standardization work has reached a general consensus on using the Stream Control Transmission Protocol (SCTP) encapsulated in DTLS to handle nonmedia data types [6]. Encapsulating “SCTP over DTLS over ICE over UDP” provides a NAT traversal solution together with confidentiality, source authentication, and integrity-protected transfers. Moreover, this solution lets the data transport interwork smoothly with parallel media transports, and both can share a single transport-layer port number. The IETF chose SCTP because it natively supports multiple streams with reliable, unreliable, and partially reliable delivery modes. It allows applications to open several independent streams (up to 65,536 in each direction) within an SCTP association toward a peering SCTP endpoint. Each stream actually represents a unidirectional logical channel, providing in-sequence delivery.

An application can send a message sequence either ordered or unordered. The message delivery order is preserved only for ordered messages sent on the same stream. However, the DataChannel API is bidirectional, which means that each DataChannel bundles an incoming and an outgoing SCTP stream.

An application sets up a data channel (that is, creates the SCTP association) when the CreateDataChannel() function is called for the first time on an instantiated PeerConnection object. Each subsequent call to the CreateDataChannel() function just creates a new DataChannel within the existing SCTP association.

A Simple Example

Alice and Bob use a common calling service. To communicate, they must be simultaneously connected to the Web server that implements that service. Indeed, when Bob (or Alice) points his browser to the calling service webpage, he will download both the HTML page and JavaScript code that keeps the browser connected via a secure HTTP or WebSocket connection.

Figure 3 illustrates a typical call flow associated with setting up a real-time, browser-enabled communication channel. When Alice clicks on the webpage button to start a call with Bob, the JavaScript instantiates a PeerConnection object. Once the PeerConnection is created, the JavaScript code on the calling service side must set up media, which it does via the MediaStream function. Alice must also grant permission to let the calling service access both her camera and her microphone.

In the current W3C API, once an application has added some streams, Alice’s browser, enriched with JavaScript code, generates a signaling message. The IETF hasn’t yet defined the exact format of this message. It must contain media channel information and ICE candidates, as well as a fingerprint attribute binding the communication to Alice’s public key. Alice’s browser then sends this message to the signaling server (for example, via XMLHttpRequest or WebSocket). The signaling server processes the message from Alice’s browser, determines that this is a call to Bob, and sends a signaling message to Bob’s browser. The JavaScript on Bob’s browser processes the incoming message and alerts Bob. Should Bob decide to answer the
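
The first half of the call flow in this example (Alice’s browser producing a signaling message and the signaling server routing it to Bob) can be modeled with a small Python simulation. This is only an illustrative sketch, not browser code and not the W3C API: the names SignalingMessage, SignalingServer, and make_offer are hypothetical, the message layout is an assumption (the article notes that the exact format hasn’t been defined), a SHA-256 hex digest stands in for the fingerprint attribute, and the candidate address is a documentation placeholder.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class SignalingMessage:
    """Hypothetical signaling message; the real format is not defined here."""
    sender: str
    recipient: str
    media: list           # media channel information, e.g. ["audio", "video"]
    ice_candidates: list  # candidate transport addresses gathered via ICE
    fingerprint: str      # binds the communication to the sender's public key


@dataclass
class SignalingServer:
    """Stand-in for the Web server that relays messages between browsers."""
    inboxes: dict = field(default_factory=dict)

    def connect(self, user: str) -> None:
        # Both parties must be connected to the calling service at the same time.
        self.inboxes[user] = []

    def relay(self, message: SignalingMessage) -> bool:
        # The server only inspects the recipient; the rest is treated as opaque.
        if message.recipient not in self.inboxes:
            return False  # callee is not connected, so the call cannot be set up
        self.inboxes[message.recipient].append(message)
        return True


def make_offer(sender: str, recipient: str, public_key: bytes) -> SignalingMessage:
    # Assumption: a SHA-256 digest of the key plays the role of the fingerprint.
    fingerprint = hashlib.sha256(public_key).hexdigest()
    return SignalingMessage(
        sender=sender,
        recipient=recipient,
        media=["audio", "video"],
        ice_candidates=["192.0.2.10:50000"],  # placeholder documentation address
        fingerprint=fingerprint,
    )


server = SignalingServer()
server.connect("alice")
server.connect("bob")

offer = make_offer("alice", "bob", b"alice-public-key-placeholder")
delivered = server.relay(offer)      # server determines this is a call to Bob
incoming = server.inboxes["bob"][0]  # Bob's page would now alert him
```

In this model the server never interprets the media or ICE fields, which mirrors the idea that the application treats the signaling payload as an opaque blob and only needs to know where to deliver it.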
selection, telepresence, and enhanced usage of data channels.

References

1. H. Sinnreich and W. Wimmreuter, “Communications on the Web,” Elektrotechnik und Informationstechnik, vol. 127, 2010, pp. 187–194; http://dx.doi.org/10.1007/s00502-010-0742-l.
2. H. Sinnreich and A. Johnston, “SIP APIs for Communications on the Web,” IETF Internet draft, June 2010.
3. J. Uberti and C. Jennings, “Javascript Session Establishment Protocol,” IETF Internet draft, June 2012.
4. A. Bergkvist et al., “WebRTC 1.0: Real-Time Communication between Browsers,” W3C working draft 09, Feb. 2012; www.w3.org/TR/webrtc/.
5. C. Holmberg and H. Alvestrand, “Multiplexing Negotiation Using Session Description Protocol (SDP) Port Numbers,” IETF Internet draft, Feb. 2012.
6. R. Jesup, S. Loreto, and M. Tuexen, “RTCWeb Datagram Connection,” IETF Internet draft, Mar. 2012.

Salvatore Loreto is a senior researcher in the MultiMedia Technology branch of the NomadicLab at Ericsson Research Finland. He’s made contributions in Internet transport protocols, signal protocols, VoIP, IP-telephony convergence, conferencing over IP, 3GPP IP Multimedia Subsystem, HTTP, and Web technologies. Loreto has a PhD in computer networking from Napoli University. He serves within the IETF as cochair of the SIP Overload Control (SOC), BiDirectional or Server-Initiated HTTP (HyBi), and Application Area (Appsawg) working groups. Contact him at salvatore.loreto@ieee.org.

Simon Pietro Romano is an assistant professor in the Computer Engineering Department at the University of Napoli, and cofounder of Meetecho, a startup (and spin-off of the university) focusing on Internet-based conferencing and collaboration. His research interests primarily fall in the field of networking, with special regard to real-time multimedia applications, network security, and autonomic network management. Romano has a PhD in computer networks from the University of Napoli. He actively participates in IETF standardization activities in the Real-time Applications and Infrastructure area.

Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.