Essentials of Real-Time Networking: How Real-Time Disrupts the Best-Effort Paradigm


Copyright © 2004 Nortel Networks, All Rights Reserved
Printed in the United States of America

NORTEL, NORTEL NETWORKS, NORTEL NETWORKS LOGO, the GLOBEMARK, BAYSTACK, CALLPILOT, CONTIVITY,
DMS, MERIDIAN, MERIDIAN 1, NORSTAR, OPTERA, OPTIVITY, PASSPORT, SUCCESSION and SYMPOSIUM are trademarks
of Nortel Networks.
ALTEON is a trademark of Alteon WebSystems, Inc.
ARIN is a trademark of American Registry of Internet Numbers, Ltd.
APACHE is a trademark of Apache Micro Peripherals, Inc.
APPLE, APPLETALK, MAC OS and QUICKTIME, are trademarks of Apple Computer Inc.
CAPEX is a trademark of Solyman Ashrafi
CABLELABS, DOCSIS and PACKETCABLE are trademarks of Cable Television Laboratories, Inc.
C7 and CALIX are trademarks of Calix Networks Inc.
CANAL + is a trademark of Canal + Corporation
KEYMILE is a trademark of Datentechnik Aktiengesellschaft
MPEGABLE is a trademark of Dicas Digital Image Coding GmbH
CINEPAK is a trademark of Digital Origin, Inc.
ECI is a trademark of ECI Telecom Limited
DIGICIPHER and GENERAL INSTRUMENT are trademarks of General Instrument Corporation
OPEX is a trademark of Gensym Corporation
INFOTECH is a trademark of Infotech, Inc.
ESCON and LOTUS NOTES are trademarks of International Business Machines Corporation (dba IBM Corporation).
IANA and ICANN are trademarks of Internet Corporation of Assigned Names and Numbers.
NAGRA and NAGRAVISION are trademarks of Kudelski A.B.
INDEO is a trademark of Ligos Corporation
ENHYDRA is a trademark of Lutris Technologies, Inc.
SIP is a trademark of Merrimac Industries, Inc.
FORE SYSTEMS is a trademark of Marconi Communications, Inc.
ACTIVEX, NETMEETING, MICROSOFT WINDOWS, OUTLOOK, WINDOWS, and WINDOWS MEDIA are trademarks of Microsoft
Corporation
NETIQ is a trademark of NetIQ Corporation
TIMBUKTU is a trademark of Netopia, Inc.
OPNET is a trademark of OPNET Technologies, Inc.
ECAD is a trademark of Pentek, Inc.
REALAUDIO, REALNETWORKS, REALPLAYER, REALPROXY, and REALVIDEO are trademarks of RealNetworks, Inc.
PESQ is a trademark of Psytechnics Limited
POWERTV and SCIENTIFIC ATLANTA are trademarks of Scientific-Atlanta, Inc.
SILKROAD is a trademark of SilkRoad Technology, Inc.
SPRINT is a trademark of Sprint Communications Company L.P.
CDMA2000 is a trademark of Telecommunications Industry Association
NETBSD is a trademark of The NetBSD Foundation
THE YANKEE GROUP is a trademark of The Yankee Group
VERIZON is a trademark of Verizon Trademark Services LLC
BSD is a trademark of Wind River Systems, Inc.
ZENITH is a trademark of Zenith Electronics Corporation
Trademarks are acknowledged with an asterisk (*) at their first appearance in the document.

Contents
Author Biographies .................................................................................. ix
Acknowledgments ................................................................................................ xiv

Chapter 1. Introduction .............................................................................1


Section I: Real Time Applications and Services ..................................................... 3
Section II: Legacy Networks ................................................................................... 5
Section III: Protocols for Real-time Applications .....................................................5
Section IV: Packet Network Technologies ...............................................................6
Section V: Network Design and Implementation .....................................................7
Section VI: Examples ..............................................................................................8
Let’s Get Started .....................................................................................................9
Reading the Transport Path Diagrams ..................................................................10
Conclusion ............................................................................................................11

Section I: Real-Time Applications and Services ...........13

Chapter 2. The Real-Time Paradigm Shift ......................................15


Concepts Covered ................................................................................................15
Introduction ...........................................................................................................15
What is convergence? ...........................................................................................16
What do we mean by real time? ............................................................................19
Service quality and performance requirements ....................................................25
Conclusion ............................................................................................................31
What You Should Have Learned ........................................................................... 33
References ............................................................................................................34

Chapter 3. Voice Quality ...................................................................35


Concepts covered .................................................................................................35
Introduction ...........................................................................................................36
Voice calls through an IP network .........................................................................36
Factors affecting VoIP conversation quality ...........................................................38
Quality metrics for voice ........................................................................................53
What you should have learned ..............................................................................63
References ............................................................................................................64

Chapter 4. Video Quality ..................................................................67


Concepts covered .................................................................................................67
Video .....................................................................................................................67
Video Impairments ................................................................................................68
Digital video impairments ......................................................................................68
Causes of video signal impairments .....................................................................69
Digital video ..........................................................................................................70
Sequences of frames ............................................................................................74
What you should have learned ..............................................................................79

Chapter 5. Codecs for Voice and Other Real-Time Applications .81


Concepts Covered ................................................................................................81
Introduction ...........................................................................................................82
Basic Characteristics of Codecs ........................................................................... 85
Coding impairments ..............................................................................................88
Speech Codecs for Voice Services ....................................................................... 98
Audio Codecs ......................................................................................................105
Video Codecs ......................................................................................................107
What you should have learned ............................................................................116
References ..........................................................................................................117

Section II: Legacy Networks .........................................121


Chapter 6. TDM Circuit-Switched Networking ..............................123
Concepts covered ...............................................................................................124
TDM principles ....................................................................................................124
The importance of clock rate and synchronization in TDM ................................. 127
Principles of digital switching, voice switches .....................................................128
What you should have learned ............................................................................138

Chapter 7. SONET/SDH ..................................................................139


Concepts covered ...............................................................................................139
Introduction .........................................................................................................140
Overview .............................................................................................................140
SONET a practical introduction ...........................................................................141
The copper DS0-DS1-DS3 (traditional services) ................................................142
ATM and other traffic services .............................................................................142
Optical Ethernet applications ..............................................................................142
SONET terminology ............................................................................................143
The network element ..........................................................................................147
Network configurations .......................................................................................148
Synchronization ..................................................................................................153
What you should have learned ............................................................................158

Section III: Protocols for Real-Time Applications .......159


Chapter 8. Real-Time Protocols: RTP, RTCP, RTSP .....................161
Concepts covered ...............................................................................................161
Introduction .........................................................................................................162
Real-Time Transport Protocol (RTP) ...................................................................162
Real-Time Control Protocol .................................................................................167
RTP and TCP ......................................................................................................169
Real-Time Streaming Protocol (RTSP) ...............................................................170
RTSP and HTTP .................................................................................................175
What you should have learned ............................................................................177
References ..........................................................................................................178

Chapter 9. Call Setup Protocols: SIP, H.323, H.248 .....................179


Concepts covered ...............................................................................................179
Introduction .........................................................................................................179
H.323 .................................................................................................................. 182
SIP ......................................................................................................................198
Comparison of SIP and H.323 ............................................................................216
Gateway Control protocols .................................................................................. 219
What you should have learned............................................................................ 225
References.......................................................................................................... 227

Chapter 10. QoS Mechanisms ....................................................... 229


Concepts Covered .............................................................................................. 229
Introduction ......................................................................................................... 230
QoS and Network Convergence .........................................................................231
Overview of QoS Mechanisms............................................................................ 232
DiffServ QoS Architecture ...................................................................................237
DSCP Configuration Considerations ...................................................................239
Ethernet IEEE 802.1Q ........................................................................................240
Host DSCP or 802.1p Marking............................................................................ 241
Packet Fragmentation and Interleaving ...............................................................241
Other methods to achieve QoS........................................................................... 243
What you should have learned............................................................................ 246
References ..........................................................................................................247

Section IV: Packet Network Technologies ...................249


Chapter 11. ATM and Frame Relay ................................................251
Concepts Covered ..............................................................................................251
Introduction .........................................................................................................252
Layered protocol ................................................................................................. 253
ATM interfaces .................................................................................................... 255
ATM architecture .................................................................................................256
AAL (ATM adaptation layer) ................................................................................261
QoS and services in ATM networks .................................................................... 264
Voice and telephony over ATM ............................................................................269
Frame relay and FRF.11/12 .................................................................................271
Seven engineering best practices .......................................................................275
What you should have learned ............................................................................280
References .........................................................................................................282

Chapter 12. MPLS Networks ..........................................................285


Concepts covered ...............................................................................................285
Introduction .........................................................................................................286
Traffic trunks and flows ........................................................................................287
Motivations to move to MPLS .............................................................................287
The label ............................................................................................................287
Protocol components of MPLS ...........................................................................288
How to build the LSR’s MPLS forwarding table ...................................................290
Label switched paths setup..................................................................................290
LSP setup using explicit routes ...........................................................................291
LSP setup, example using RSVP-TE signaling................................................... 291
Integration MPLS and DiffServ ...........................................................................292
Label merging .....................................................................................................293
Label stacking .....................................................................................................293
What you should have learned ............................................................................295
References ..........................................................................................................296

Chapter 13. Optical Ethernet .........................................................297


Concepts covered ............................................................................................... 297
Introduction .........................................................................................................297
What is optical Ethernet? ....................................................................................298
How does an optical Ethernet network operate? ................................................299
How fast is optical Ethernet?............................................................................... 299
Ethernet over fiber ..............................................................................................299
Resilient packet ring.............................................................................................300
Ethernet over DWDM ..........................................................................................303
Optical Ethernet services ....................................................................................304
Internet access services .....................................................................................305
LAN extension .....................................................................................................305
What you should have learned.............................................................................306

Chapter 14. Network Access: Wireless, DSL, Cable ...................307


Concepts covered ...............................................................................................307
Introduction .........................................................................................................308
Physical challenges to bandwidth and distance ..................................................309
Wireless systems for broadband delivery ...........................................................310
Networking solutions supporting nomadic and mobile users ..............................317
xDSL technology ................................................................................................. 319
Cable access technology ....................................................................................327
What you should have learned ............................................................................340
References ..........................................................................................................340

Chapter 15. The Future Internet Protocol: IPv6 ...........................341


Concepts covered ...............................................................................................341
Renewing the Internet .........................................................................................342
Key IPv6 Items affecting Real-Time Networking ................................................. 343
Basics of the IPv6 network layer .........................................................................344
Layers above the Network Layer .........................................................................356
Control, Operations and Management ................................................................ 356
IPv6 Transition Strategies ...................................................................................360
What you should have learned ...........................................................................366
References.......................................................................................................... 367

Section V: Network Design and Implementation ........371


Chapter 16. Network Address Translation ....................................373
Concepts covered ...............................................................................................373
Introduction ......................................................................................................... 374
The swings and the roundabouts ........................................................................375
Why do we need NATs? ......................................................................................377
NATs as a middlebox .......................................................................................... 378
NAT terminology ..................................................................................................378
The basics of Network Address Translation ........................................................379
Network Address Translator taxonomy ............................................................... 381
Interactions with applications ..............................................................................385
Packet translations needed to support NAT ........................................................385
Using the varieties of NAT ...................................................................................387
Issues with NAT ..................................................................................................389
What you should have learned ............................................................................392
References ..........................................................................................................393

Chapter 17. Network Reconvergence ...........................................395


Concepts covered ...............................................................................................395
Introduction .........................................................................................................396
Achieving resiliency .............................................................................................396
Redundancy to provide reliability ........................................................................396
Path redundancy and recovery ...........................................................................397
Protection schemes ............................................................................................ 398
Protocols for the network edge ...........................................................................399
Protocols for the core .......................................................................................... 407
What you should have learned ............................................................................417

Chapter 18. MPLS Recovery Mechanisms ....................................419


Concepts covered ...............................................................................................419
Introduction ......................................................................................................... 419
MPLS protection schemes ..................................................................................420
Components of an MPLS recovery solution ........................................................422
Monitoring, detection and notification mechanisms ............................................ 424
MPLS scope of recovery — global and local ......................................................428
MPLS recovery versus IP (IGP) recovery ...........................................................430
What you should have learned ............................................................................432
References ..........................................................................................................433
Informative references ........................................................................................433

Chapter 19. Implementing QoS: Achieving Consistent Application Performance ....................................................................................435
Concepts covered ...............................................................................................436
Introduction .........................................................................................................436
Mapping DiffServ to Link Layer (Layer 2) QoS ....................................................436
Application Performance Requirements ..............................................................446
Categorizing Applications ...................................................................................447
Making QoS Simple via Networks Service Classes ............................................450
Additional QoS Implementation Considerations .................................................458
What you should have learned ............................................................................463
References ..........................................................................................................464

Chapter 20. Achieving QoE: Engineering Network Performance ....................................................................................465
Concepts covered ...............................................................................................465
Introduction .........................................................................................................466
QoE Engineering Methodology ...........................................................................466
HRXs and QoS Mechanism Requirements......................................................... 472
Traffic Engineering & Resources Allocation ........................................................479
Summary–Network Engineering Guidelines ....................................................... 498
What you should have learned ............................................................................500

Section VI: Examples ....................................................501


Chapter 21. VoIP and QoS in a Global Enterprise .......................503
Voice over IP: Raising the need for Quality of Service ........................................503
The Quality of Service (QoS) design ..................................................................509
Lessons learned ..................................................................................................516

Chapter 22. Real-Time Carrier Examples .....................................517


Centrex IP ........................................................................................................... 517
Local ...................................................................................................................521
Long distance ......................................................................................................524
Multimedia........................................................................................................... 528
Cable ...................................................................................................................531
Broadband ..........................................................................................................536
Conclusion ..........................................................................................................540
References ..........................................................................................................542

Chapter 23. Private Network Examples ........................................543


The data solution ................................................................................................543
Introduction .........................................................................................................543
Getting Started ....................................................................................................543
Designing the Real-time Networking Solution Infrastructure ..............................545
Content Delivery Networking .............................................................................. 558
Solutions Management ....................................................................................... 560
Closets and Aggregation Points ..........................................................................562
The voice solution ...............................................................................................563
Voice and Multimedia Communication Services ................................................. 564
VoIP architecture and requirements ....................................................................565
Call Server .......................................................................................................... 568
Gateways ............................................................................................................579
Clients .................................................................................................................581
Applications .........................................................................................................583
Conclusion .......................................................................................................... 590

Chapter 24. IP Television Example ................................................593


Introduction ......................................................................................................... 594
The IP television system .....................................................................................597
Core switch/router feature enhancements for IP multicast ..................................599
Video on demand service extensions ................................................................. 604
Summary .............................................................................................................614

Appendix A. Additional Details about TDM Networking ....................615


SONET/SDH hierarchy .......................................................................................615
Stratum level clocks ............................................................................................ 616

Appendix B. RTP Protocol Structure ................................................... 619

Appendix C. Additional Information on Voice Performance Engineering ............................................................................................ 623
Network jitter ....................................................................................................... 624
Access/source jitter .............................................................................................625
Jitter buffer dimensioning ....................................................................................626

Appendix D. Additional Information about IPv6 ..................................637


Concepts covered ...............................................................................................637
Introduction .........................................................................................................637
Directing Packets in an IPv6 Network ................................................................. 637
Control, Operations and Management ................................................................ 647
Application Programming Interface ..................................................................... 657
IPv6 Transition Mechanisms ............................................................................... 660
References ..........................................................................................................672

Appendix E. Virtual Private Networks: Extending the Corporate


Network ...................................................................................................675
Introduction .........................................................................................................675
The development of VPNs ..................................................................................676
Catering for the ‘Road Warrior’ ...........................................................................678
More Flexible VPNs ............................................................................................680
Layer 2 VPNs ......................................................................................................682
Layer 3 VPNs ...................................................................................................... 684
VPN Scaling, Security and Performance ............................................................690
Summary .............................................................................................................693
References ..........................................................................................................694

Appendix F. IP Multicast ........................................................................695


IP television head-end system ............................................................................695
Core feature enhancements for IP multicast .......................................................698
Bandwidth factors at the network edge ...............................................................701
Conditional access & content protection methods ..............................................703
Web streaming methods and practices ...............................................................707

Appendix G. QoE Engineering ..............................................................711


Nodal QoS Mechanisms .....................................................................................714

Appendix H. PPP Header Overview

Glossary

Author Biographies
Dave Anderson is a Senior Manager of Nortel Networks Wireless
Engineering and is responsible for the engineering aspects of Nortel’s
responses to global Wireless proposals and network designs. Dave has
been with Nortel since earning his B.S.E.E. in 1986 and has held a number
of Engineering positions within the company, including customer
support engineering roles for DMS switching, and more recently,
Wireless Network Engineering including aspects of Radio as well as core
network. He is familiar with the evolving global standards for wireless
systems including CDMA, EV-DO, GSM, UMTS and Wireless LAN.

Cedric Aoun works as a Senior Network Architect in Nortel Networks
Carrier VoIP business. For the past four years, he has been working on
corporate strategy for solving NAT and Firewall Traversal application
issues, as well as the introduction of IPv6 in VoIP networks. He holds a
B.S. in Engineering and an M.S. in Mobile Communications. Cedric is a
regular attendee and contributor to the IETF in the areas of NAT and
Firewall Traversal solutions, with primary focus on the NSIS Working
Group.

François Audet is an IP Telephony Subject Matter Expert at Nortel, where
he is a System Architect in the Enterprise Business Networks division.
François has expertise in telephony protocols, Voice over ATM, and
Voice over IP. He is an active participant in many Standards Forums, such
as ITU-T (for the H.323 suite of Recommendations), and IETF (for SIP
and SIP-related standards).

François Blouin is an Engineer and Subject Matter Expert in modeling
and simulation with the Solutions Ready Group in Nortel’s Ottawa Lab.
François leads a team developing network models for the prediction of
performance for real-time services running over packet networks, using
tools based on the ITU-T E-Model and OPNET. He consults with Nortel
account teams to specify target performance for customer networks. In
2001, François was presented with a gold pride award for his work on
voice bearer channel design guidelines for Succession Networks.

Sandra Brown is the Manager of the Technology and New Product
Introduction team in Nortel Networks Information Services, responsible
for the evaluation and trialing of new products and technologies on the
company's global network. Sandra has a B.S. and an M.B.A. and has
extensive experience in managing corporate information services.

Chris Busch is a Senior Network Architect for Nortel. He has more than
eight years of network systems experience within the Telecommunications and
Healthcare industries. While at Nortel, he has held leadership roles in
Network Design, Product Integration, and Systems Engineering. Mr.
Busch is recognized for his end-to-end solutions approach to Wide Area, Core,
Enterprise and Access Networks.

Peter Chapman has held positions as Development Manager, Product
Line Manager and Business Unit Manager over more than thirty years in
the telecommunications, aerospace, television technology and
semiconductor industries. He currently holds a position in the Chief
Technology Officer's organization of Nortel, with responsibility for end-to-
end performance of networks, applications and services. He is a Chartered
Electrical Engineer and holds a B.S.E.E. from Imperial College, London.

Hung-Ming Fred Chen is a Network Performance Consultant with Nortel.
At Nortel, he conducts analytical modeling and simulations for various
network architectures and products, including wireless and wireline
networks. His work mainly investigates QoS for triple-play service. He
completed his B.S., M.S., and Ph.D. at National Sun Yat-Sen University,
National Taiwan University, and University of Durham, UK, respectively.

Robert L. Cirillo, Jr. is a Network Architect in the Wireline Global
Operations division of Nortel. A native of Boston, he has been involved
with Enterprise Networking and the Telecommunications Industry for
fifteen years. Since joining Nortel in 1997, he has had extensive
involvement with the evolution of circuit-switched voice systems into high
performance, “Next Generation” packet-based networks.

Rob Dalgleish is responsible for 3G Access Strategy with Nortel Networks
Wireless Network Engineering based in Richardson, USA. Rob has been
with Nortel Wireless since 1995 in Engineering management and advisory
roles, working with GSM and CDMA customers and Nortel core R&D and
project teams in Asia and North America. Robert holds a B.S. in
Engineering with Honors from the University of Tasmania and is a
Chartered Professional Engineer.

Elwyn Davies is currently leading the CTO Office team setting the
strategy for the introduction of IPv6 into Nortel products. His background
includes an M.A. in Mathematics and research into aspects of Noise
Reduction in electronic and other systems. He is a regular attendee and
contributor to the IETF in a number of areas, including network layer
signaling and to the IRTF in routing research.

Stephen Dudley has 23 years of experience as an engineer in the
telecommunications business, working with fiber optic and digital
switching systems. He holds a B.S.E.E. and an M.S.E.E. with a certificate
in Computer Integrated Manufacturing. He is currently a Network
Engineer for converged voice and data networks.

Stéphane Duval is a Product Line Manager for OME Data. He has twelve
years of experience with data infrastructure solutions design for private and
public sector organizations and extensive customer interaction, which
helped him develop his skills to deliver reliable and secure data
infrastructures.

Gwyneth Edwards manages the Information Services (IS) product
engagement process, through which the IS team provides Nortel product
feedback to the business units. Gwyneth has twelve years of IT experience
at Nortel, specifically within the areas of strategy and communication.
She has a B.S.M.E. and an M.B.A.

Shane Fernandes is a Network Engineer in the Information Services
global WAN engineering group at Nortel, where he is responsible for all
aspects of the network including the global corporate backbone and
Internet. He has architected networks at various companies for the last
fourteen years. Shane has a B.S. from McMaster University.

Shardul Joshi is a Network Engineer at Nortel with seven years of
experience in the Telecommunications Industry. His primary areas of
expertise are Data Communications and Voice over IP, for which he has
been designated a Subject Matter Expert (SME). Shardul has passed several
expert level certifications in the Data Communications areas and is highly
regarded among the Network Engineering community and external
customer base. His primary duties involve pre-sales consulting and
external customer trials to ensure proper cost-effective solutions are
created. In addition to those activities, he works with new product teams to
evaluate and ensure that these products provide a positive customer
experience. Shardul graduated from Angelo State University with a B.S. in
Biology and a minor in Chemistry in 1996.

Sinchai Kamolphiwong received a Ph.D. degree from the University of
NSW, Australia, where his thesis concerned flow control in ATM
networks. He is now an Associate Professor in the Department of
Computer Engineering, Prince of Songkla University, Thailand. He is a
director of the Centre for Network Research (CNR). He has published fifty
technical papers. His main research interests are ATM networks,
IPv6, VoIP, and performance evaluation. He is a committee member of
ComSoc (Thailand section) and a member of the IEEE Computer Society.

Peter Kealy has over twenty years of experience in the telecommunications
industry. He is an RF and Telecom graduate from the Dublin Institute of
Technology, Dublin, Ireland. Peter spent seven years working in London,
UK, with BT working on varied telephony-based systems as a Field
Engineer. Peter joined Nortel in 1996 as a Technical Support Engineer for
Optical products and for the past four years has been an Optical Network
Engineer specializing in customized engineering procedures including
RPR & Optical Ethernet networks.

Joseph King is Director of Network Engineering for North America.
Joseph leads the network engineering teams responsible for planning,
designing and engineering next generation VoIP networks. He has over 22
years of experience in the telecommunications field, the last 17½ with
Nortel, where he has held various roles in the operations and engineering
organizations, supporting both Enterprise and Carrier customers.

Ed Koehler is a Solutions Architect for Nortel Portfolio Engineering
Group. He provides advanced consultation and design support for multicast
and multimedia-based solutions to the field engineering staff as well as
product development direction to the various product groups within Nortel.
Ed began his career at Eastman Kodak and was involved in one of the first
pilot projects for what was to become the IEEE 802.3 10BaseT
specification during the 1986-88 timeframe.

Ali Labed is a Subject Matter Expert in Performance Modeling and Traffic
Engineering. He currently works in the Solutions Ready Group in Nortel’s
Ottawa Lab. Ali consults with product line management teams on traffic
engineering, MPLS, and LSP resiliency strategies. Ali holds a B.S.C.S., an
M.S. in applied mathematics, and a Ph.D. in Computer Science.

Anthony Lugo is an Optical Network Engineer and a Subject Matter
Expert in the optical arena. He currently develops and implements complex
network reconfigurations for Nortel customers. Anthony served in the
United States Navy Submarine Division and received an A.A.S. in
Electrical Engineering Technology and a B.S. in Telecommunication
Engineering.

Timothy A. Mendonca is Team Leader of Succession Network Design
and has thirty years of experience in data, voice, multimedia and security
infrastructure solutions design based on disparate technologies for private
and public sector organizations. Extensive customer interaction has given
him a clear view and vision of the convergence issues. He is currently
working on his dissertation for a Ph.D. in Information Systems. His
dissertation topic is “Knowledge Acquisition Using Multiple Domain
Experts in the Design and Development of an Expert System for
Implementing VoIP and Multimedia in a Secure Environment.”

Robert D. Miller has 23 years of experience in designing and engineering
networks at Nortel. Rob was the Lead Architect of both the internal voice
ESN network and the private ATM network, and has the distinction of
making the first voice call over a private ATM network. More recently, he
was the Team Leader for the internal VoIP and QoS architectures.

Ralph Santitoro, Director of Network Architecture at Nortel, provides
strategic direction and best practice design guidelines for multiservice
network convergence. He defined the Network Service Classes (NSCs) and
DiffServ mapping discussed in this book, which are being standardized in
the IETF. Ralph also founded and chairs Nortel's QoS Core Team which
defines the QoS technology requirements and strategy for Nortel’s
Enterprise and Carrier product portfolios.

Leigh Thorpe is a Senior Advisor with the Solutions Ready Group in
Nortel under the CTO. She is responsible for evaluating QoE and
developing specifications based on QoE results. Her background includes a
B.S. in Physics and a Ph.D. in Experimental Psychology (Perceptual
Development). Leigh has directed many selection and characterization
tests for codec standards with ITU-T, TIA, and other standards groups. In
1997, she was awarded the Nortel Networks Wireless President’s Award for
Quality.

Acknowledgments
The writing, editing, and assembly of a large textbook is a formidable task.
We are fortunate that the corporate culture at Nortel encourages
collaboration and teamwork. Many people contributed to the successful
completion of this project, whether by direct contributions to the text, by
supporting the steering committee or individual authors, by removing
obstacles, or by championing this project to the senior executives. Thank
you to everyone who helped us move forward.

We would like to acknowledge some specific contributions. David
Downing, Leader of Wireline Deployment (Americas) for Nortel Networks
Global Operations group, was the main sponsor of this work. Account
Architect John Gibson was instrumental in initiating the project. Lorelea
Moore coordinated the Steering Committee. Michelle Bigham prepared the
cover graphics and assisted with internal communications. Ann Marie
Bishop tracked actions, published minutes, and tried to keep us on schedule
(sorry, Ann Marie!). Mark Bernstein provided valuable input on how
effectively we conveyed the main messages and on the integration of ideas
across chapters. Rod Wallace, Director under the CTO, and Bill
McFarlane, Director, Nortel Networks Global Certification Program,
promoted the initiative to Senior Executives across the company.

A big thank you to our many reviewers. Whether they read one chapter or
many, whether they focused on technical accuracy or clarity and
readability, their feedback and suggestions have greatly improved the
quality of the published version.

Finally, we are extremely grateful to the authors, many of whom worked
considerable overtime to complete their chapters. Thank you all for your
perseverance through the successive drafts, revisions, and detail checking.

Shardul Joshi, Leigh Thorpe, Steve Dudley, and Tim Mendonca, Editors

Steering Committee:
Lorelea Moore – Certification
Leigh Thorpe – Editor, CTO's Office
Shardul Joshi – Editor, Wireline Engineering
Stephen Dudley – Editor, Wireline Engineering
Tim Mendonca – Editor, Enterprise Engineering
Carelyn Monroe – Wireline Engineering
Joe King – Wireline Engineering
Michelle Bigham – Marketing communications
Ann-Marie Bishop – Project Manager

Contributors:
The following made significant contributions to the contents of this book:

Mark Armstrong
Benedict Bauer
Roger Britt
Peter Bui
James Chanco
Paul Coverdale
Steve Elliott
Matt Michels
Mustapha Moussa
Tom Taylor
Andrew Timms

Chapter 1
Introduction
Joseph King

You have an emergency. You know you can dial three digits from any
phone, anywhere, any time and within milliseconds you have help. Most
people don't understand how it happens, and frankly they don't care. They
just count on it to work. The network design engineer both knows and cares
how it works. Would you have the same confidence dialing that number if the
call were being routed across an IP core today? Not if that IP network is the
Internet or one of the roughly 85% of IP networks in service today.
Consumers and engineers alike have heard the technology hype:
convergence, VoIP, triple play, interactive applications. Voice, data, and
video networks are finally becoming one. Why? Because consumers
demand it, want it and need it. It is becoming a way of life. Technology is a
driving force for the way people communicate. The paradigm has shifted.
Consumers are driving this change in technology to support the way they
want to work and live.
Convergence is occurring between real-time and non–real-time data
networks. Voice over IP is being deployed on networks that were originally
designed with a router architecture and best-effort delivery philosophy.
Many of these networks are not capable of meeting the performance quality
requirements of real-time services such as voice. Voice services are critical.
As convergence proceeds and networks begin to carry voice and other real-
time services, these networks must adapt to the mission critical nature of
those services. The people who design and operate these networks must
meet a new set of constraints. Best-effort cannot guarantee the performance
of mission critical real-time applications. Throwing more bandwidth at the
problem is not sufficient. What is needed is proper network planning and
design, which in turn requires a thorough understanding of the operation
and constraints of real-time networking and how that interacts with the
operation and constraints of IP networking.
Because convergence of real-time applications with data networking is new
ground for so many people, a group of Nortel subject matter experts has
created a real-time networking manual to serve as a shared foundation for
engineers and other professionals from various areas of the industry. As part
of this effort, Nortel has also developed a certification that is focused
purely on real-time networking: Nortel Certified Technology Specialist
(NCTS) – Real-Time Networks. It is a baseline certification in real-time
networking, intended to be as applicable to the managers of engineers as it
is to the engineers themselves. It provides a foundation for future
certifications that will cover specific topics in more detail.
This book covers the basics of real-time networking. It addresses the
technology, performance requirements, and basic best practices for
engineering and implementation. It is not intended to provide detailed
treatment of all possible solutions, but rather to bring the reader to a level
of competence where he or she will be able to approach specific solutions,
to understand the technical documentation, to ask questions, and to
understand and use the answers. As stated, it does not touch all possible
aspects. For example, security is essential in a world of converged
networks. Security is touched on in parts of the book, but not in great detail
as this book is focused on the basics of real-time networking. The topic of
security in converged networks can easily fill a volume of its own.
This book is technology- and standards-focused rather than
product-focused. Given that, nowhere in this book do we recommend any single
architecture. Each chapter was written by one or more Subject Matter
Experts, and contains the fundamentals you need to sit for the associated
certification exam. More importantly, it will supply you with the necessary
background and guidelines for understanding the networks of tomorrow
and making informed decisions about network components, network setup,
and network operations.
The NCTS – Real-Time Networks certification encompasses real-time
traffic engineering issues such as guaranteed Service Quality, high
availability networking, and the fundamental science of voice transmission.
Who better than Nortel to tell you how to deliver high quality, high
availability services? Nortel is the world leader in voice, with over 100
years of experience in telecommunications. Nortel’s optical and multiservice
switching carry telecom services on an international scale where reliability
must meet standards of 99.999%. But this book was not written to discuss
Nortel. Instead, we've chosen to focus on standards-based, open
networking. At last, a certification that can be applied to any vendor that
supports open standards! Convergence is not only about running real-time
and non–real-time services on a single network, but also about assembling
many different vendors' products into a single, cohesive standards-based
network.
Networking is often taught as separate threads. For instance, the OSI model
separates the network into layers representing different functions and
protocols. Breaking the material down this way helps structure learning.
Rather than expose the student to the full complexity of the topic, it is more
efficient to expose them to the individual parts. What often happens,
however, is that the parts are never reintegrated. Students become designers
and engineers, keep the narrow focus, and become experts in individual
protocols, boxes, or topic areas. Relatively few expand their focus to see
the network as one seamless system.

In converged networks, technologies and protocols all interact and should
not be treated in isolation as separate threads. To create a single converged
network, the threads must be brought together into a single fabric. While
the understanding of the individual threads is important, it is the knowledge
of how to weave the threads together to create the seamless fabric of
converged communications that is key. This book and the accompanying
certification focus on the fabric rather than on the individual threads. To
this end, the chapter authors are all networking experts with many years of
real world experience designing, analyzing, and integrating multiservice
real-time networks. The book focuses sharply on real world issues,
emphasizing the knowledge, strategies and techniques that are needed to
design, deploy, and operate real-time networks in today's world and how to
migrate from your existing system to the network of the future.

Section I: Real Time Applications and Services


The most critical slice of the convergence pie is “real time.” All networks
can pass best-effort data but not all networks can support real-time services
such as voice. Convergence demands real-time capability, but first real time
needs to be defined.
For our purposes, real time depends on more than just throughput rate. Nor
is real time simply another way of saying time-sensitive. For example,
broadcast or streaming media are not considered real time even where the
signal is “live.” Viewers often watch a “live” event several seconds or even
minutes after it occurs, without any major impact on their experience.
On the other hand, when streaming media are combined with an interactive
control, such as a channel changer, they exhibit the responsiveness
requirements characteristic of real time. When the user enters a command
or sends interactive data, the adaptation of the user's perception and
thought processes to the real world puts strict constraints on the network
response times. If the response time exceeds those constraints, the user
experience is degraded.
The important factor, then, is not whether streaming media has delay, but
rather how delay affects the user experience.
Real-time networking is bidirectional and interactive: the user experience
depends on a response from another user, or from a device at the other end
of the network. It is the ability to engage in several tasks or
applications at once and control the quality of experience across the entire
network. Real-time networking is so rapid it creates the illusion that there
is no network in between, a kind of “virtual reality.”
Once delay grows large enough to break that illusion, it becomes noticeable
and unacceptable. Achieving real-time transmission speeds is quite a
challenge when you consider that light traveling through fiber takes roughly
an eighth of a second to travel halfway around the world. On top of that
propagation delay, sampling, compressing, framing, and synchronizing are
all time consuming. The early chapters address these issues.
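As a rough check on the eighth-of-a-second figure, the short Python sketch below estimates one-way propagation delay through fiber. The Earth circumference, the refractive index of 1.47, and the route-stretch factor are illustrative assumptions, not values taken from this book.

```python
# Rough one-way propagation delay through optical fiber.
# All constants below are illustrative assumptions.
C_VACUUM_KM_S = 299_792.458      # speed of light in vacuum
FIBER_INDEX = 1.47               # assumed refractive index of the fiber core
EARTH_CIRCUMFERENCE_KM = 40_075  # approximate equatorial circumference


def fiber_delay_s(distance_km, route_stretch=1.0):
    """Seconds for light to cover distance_km of fiber.

    route_stretch is a hypothetical factor (>= 1) for cable routes that
    are longer than the direct great-circle distance.
    """
    speed_km_s = C_VACUUM_KM_S / FIBER_INDEX
    return distance_km * route_stretch / speed_km_s


half_way_km = EARTH_CIRCUMFERENCE_KM / 2
ideal = fiber_delay_s(half_way_km)
stretched = fiber_delay_s(half_way_km, route_stretch=1.25)
print(f"great-circle path: {ideal * 1000:.0f} ms")
print(f"25% longer route:  {stretched * 1000:.0f} ms")
```

Under these assumptions the direct path costs about a tenth of a second, and a route only a quarter longer comes close to an eighth of a second, before any time is spent on sampling, compression, or framing.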

The first step will be to introduce the concepts of convergence and real
time. Network, service, and application convergence are discussed.
Examples of real-time services are presented, and the constraints around
operating real-time services in a packet environment are discussed. The
concept of Quality of Experience is introduced as the fundamental
performance requirement for all services and applications.
To design a real-time network and assure your customers excellent Quality
of Experience, you need to understand the concepts of real-time
applications. As discussed earlier, real-time traffic challenges network
performance. What are the performance requirements for the major real-
time applications and services, and what are the protocols and mechanisms
we use to control network behavior?
Most people will agree that if the video freezes for a few seconds while you
are watching the news but the audio continues, the interruption is a mere
annoyance.
However, if the reverse occurs, and the audio is lost for seconds while the
video continues, your comprehension of the news will be severely
degraded. For other content, such as sporting events, loss of audio may be
tolerable, while video loss is not. That said, interactive voice is one of the
most demanding communications services. Consequently, a significant
portion of Section I is focused on the quality of voice services and voice
codecs.
For conversational voice services on a converged network, there are many
contributing factors to the final quality. You, as a convergence engineer,
need to understand the contributions of various parameters such as delay,
packet loss, and echo. Network planning for voice is essential, and tools
like the E-Model and its associated quality metric R are invaluable in
designing and provisioning a network. Other metrics such as MOS are also
used to quantify voice performance.
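To show how the two metrics relate, the sketch below converts an E-Model R value into an estimated MOS. The mapping is the one published in ITU-T Recommendation G.107, not a formula given in this chapter, and the function name is our own, so treat it as an illustration only.

```python
def r_to_mos(r):
    """Estimated conversational MOS for an E-Model rating R.

    Uses the R-to-MOS mapping from ITU-T G.107: MOS is clamped to 1.0
    for R <= 0 and to 4.5 for R >= 100.
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Print the estimate for a few planning values of R.
for r in (50, 70, 80, 90):
    print(f"R = {r:3d}  ->  MOS ~ {r_to_mos(r):.2f}")
```

The curve is not linear: at the upper end of the scale, a ten-point gain in R buys less MOS than the same gain does in the middle of the range.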
As with voice, there are aspects of video signals that need to be understood.
Understanding the concepts of video is critical to a convergence engineer.
Impairments such as noise (luminance and chrominance), loss of
synchronization signals, co-channel interference, and RF interference are
all critical factors for video.
Real-time applications are often concerned with the transmission of signals
originating in analog mode. Sound signals, because of their wave nature,
necessarily begin as analog signals. An NTSC (ordinary TV) video signal
captures the information needed to reconstruct the visual display as an
analog stream. To be transported across a digital network, analog signals
must be converted to digital information by means of a codec. We discuss
the basic characteristics of codecs used for telephony (speech), those for
general audio signals, and codecs used for video signals. Also covered are
the parameters that underlie the performance of the codec from both a
human and technical perspective, common coding standards defined for
each of these areas, and the boundary conditions for the effective use of
compression codecs in real-time communications systems.

Section II: Legacy Networks


Before jumping into packet transmission, protocols, and the technologies
involved, let's take a look at Time Division Multiplexing (TDM).
Understanding TDM is very helpful to appreciating real time. How does
TDM technology support real-time networking? What makes it so reliable?
TDM is a declining technology, but much has been accomplished with it
and learned from it. Most importantly, TDM will be the benchmark against
which users compare packet voice services.
Fiber optics was the first example of convergence. Today, almost all
networks, applications and protocols converge on a fiber optic cable
running SONET. Working with the real-time converged network of
tomorrow will require an understanding of SONET and the advancements
that have taken place in optical networking.

Section III: Protocols for Real-time Applications


Many real-time applications are associated with a stream of data such as
voice or video. For real-time applications, there is no time to resend lost or
corrupted data. The Real-time Transport Protocol (RTP) was created to control
the data flow for those applications. RTP does not guarantee that the packets are
delivered in a timely manner, but it provides information to the application
regarding whether all packets have been delivered and whether packets are
in sequence so the application can make the correct decisions about playing
out the content of the packets.
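The kind of check a receiving application can make with RTP sequence information can be sketched as follows. This is a minimal illustration, not a full RTP parser: it looks only at sequence numbers, assumes the 16-bit wrapping sequence field defined in the RTP specification (RFC 3550), and uses function names invented for this example.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap around


def seq_delta(seq, expected):
    """Signed distance from expected to seq, allowing for wraparound."""
    d = (seq - expected) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def check_stream(seqs):
    """Return (missing, out_of_order) for received sequence numbers.

    missing      -- sorted sequence numbers skipped and never seen later
    out_of_order -- count of packets that arrived late (or were duplicates)
    """
    missing = set()
    out_of_order = 0
    expected = None
    for seq in seqs:
        if expected is None:            # first packet sets the baseline
            expected = (seq + 1) % SEQ_MOD
            continue
        d = seq_delta(seq, expected)
        if d >= 0:                      # in order, possibly after a gap
            for k in range(d):
                missing.add((expected + k) % SEQ_MOD)
            expected = (seq + 1) % SEQ_MOD
        else:                           # behind the newest packet seen
            out_of_order += 1
            missing.discard(seq)        # a late packet fills its gap
    return sorted(missing), out_of_order


print(check_stream([10, 11, 13, 12, 15]))
```

In the sample stream, packet 12 arrives late and packet 14 never arrives. A playout routine could use that information to reorder the late packet if it is still in time, and to conceal the missing one rather than wait for a retransmission.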
IP allows packets to make their way from one side of the global network to
the other. IP has true universal global access and can reach nearly any
business, home, or hotel on the planet. This global reach is a key factor in
IP becoming the new focal point of communications convergence. IP can
bring about end-to-end connections to more places than any other packet
technology.
Now that you have an IP infrastructure, you want to have some real-time
applications to run on it. To make this happen, you're going to need some
control mechanisms for locating other endpoints, call setup, capability
negotiation, and so forth. This is where the call control protocols (SIP* and
H.323) and the gateway control protocols (H.248/Megaco) come in. These
protocols take care of all the housekeeping tasks needed for setting up
communication paths optimized for real time.
SIP, H.323 and H.248 are critical to a successful user experience. They
determine how a session is established between users. These protocols
automate the provisioning of communications parameters, from mapping the
telephone number to the destination IP address, to authenticating the
caller as a real network subscriber, to choosing the codec, to determining
the IP addresses and ports for running the media, to tracking usage for
every call.
As well as session control, you're going to need more control over the
network traffic behavior. Most IP networks treat all traffic the same and are
referred to as “best-effort” networks. Best effort means that the network
delivers the traffic on a first-come, first-served basis, without regard for the
urgency of the content. The speed and completeness of delivery depend on
the amount of traffic sharing the nodes and links. Best-effort networks are
engineered for connectivity, not for performance: all sessions are admitted,
without regard for the effect on overall packet movement through the
network. Best-effort makes no bandwidth or performance assurances, so
packets may be discarded or delayed under network congestion. Because of
this, traffic may experience different amounts of packet delay, loss, or jitter
at any given time.
As Service Providers and Enterprises alike consider converging network
operations to leverage their existing infrastructures, real-time services will
join non–real-time applications riding on IP networks. For real-time
services and applications, performance is paramount.
The shift from connectivity to performance puts different demands on the
network. Can your network handle this? Well, if it has some way of
policing the packet flows, it might. This type of policing is typically done
with Quality of Service (QoS) mechanisms. QoS mechanisms and
protocols are essential to converting the connectivity-based network to the
performance-based network. QoS provides ways to streamline packet
movement, prevent congestion, and prioritize performance needs for
different types of traffic. Spending some time discussing these mechanisms
and how they work with various technologies is high on our list of
priorities.
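To make the contrast with best effort concrete, here is a toy Python sketch that compares first-come, first-served service with strict-priority service on a single link. The link model (every packet takes one transmission slot, all packets already queued), the packet labels, and the function names `send_order` and `voice_wait_times` are assumptions for illustration only; real QoS mechanisms are considerably more elaborate.

```python
def send_order(queue, prioritize):
    """Return the order in which queued packets leave the link.

    queue: list of (kind, name) tuples in arrival order; kind is "voice" or "data"
    prioritize: False = first-come, first-served (best effort)
                True  = strict priority, voice ahead of data
    """
    if not prioritize:
        return list(queue)
    # sorted() is stable, so arrival order is preserved within each class
    return sorted(queue, key=lambda pkt: 0 if pkt[0] == "voice" else 1)


def voice_wait_times(queue, prioritize):
    """Number of transmission slots each voice packet waits before being sent."""
    order = send_order(queue, prioritize)
    return [slot for slot, pkt in enumerate(order) if pkt[0] == "voice"]


# A burst of eight data packets arrives just ahead of two voice packets
burst = [("data", f"d{i}") for i in range(8)] + [("voice", "v0"), ("voice", "v1")]

best_effort_waits = voice_wait_times(burst, prioritize=False)
priority_waits = voice_wait_times(burst, prioritize=True)
```

Under best effort the two voice packets wait 8 and 9 slots behind the data burst; with strict priority they wait 0 and 1 slots. The data packets are delayed correspondingly, which is why prioritization has to be engineered rather than simply switched on.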

Section IV: Packet Network Technologies


As an IP certified engineer, your area of expertise may not include ATM
and SONET. You design your LAN, are handed a cable from your Service
Provider, and off you go. But what is happening in the carrier network? Is it
important how the Service Provider is treating your packets? Your packet
goes from your office in New York and arrives in London after a bit of
delay, but it makes it. Well, it turns out that a lot of things can happen in the
network core that can be detrimental to your real-time communications.
What you don't know can bite you. As an Enterprise LAN engineer, do you
know what questions you should ask your Service Provider? To do more
than best effort, you'll want to know the questions and the answers. To get
the most out of talking to Service Providers, you'll need some basic
understanding of the technologies they use.

From the other side of the street, Service Providers need to understand
what their Enterprise customers are working with if they are going to serve
them well. Frame relay continues to be used extensively in Enterprise
networks. Packets crossing the Enterprise boundary encounter NAT. How
are these things going to affect your SLA and the final user quality? Both
Service Provider and Enterprise networks today are quite complex.
Designing a real-time network to work across the combined domain is
doubly complex.
IP was designed to be a simple protocol. A few entries in a routing table,
connect your cables, and you're up and running. It's a great concept, but the
complexities of convergence will not allow us to maintain that simplicity.
In Section IV, access, WAN, and core technologies are discussed. What are
the drivers to move to an MPLS network? What QoS mechanisms are
available in ATM and how are they invoked? What are the important things
to know about ATM, frame relay, MPLS, SONET, and Optical Ethernet
with respect to real-time networking? A convergence engineer needs to
understand these network technologies, to be able to comprehend the
concepts, and to understand their influence on real-time operation.
Convergence is happening in many places. It's already happened at Layer 1.
We are now seeing convergence at Layers 2 and 3, and VoIP is just one of
the driving factors. The characteristics of the LAN, the WAN, and of
course, the access network will come into play in the determination of the
final network performance.

Section V: Network Design and Implementation


You know it's going to happen. It's Friday afternoon, a three-day weekend
is coming up, and you want to leave early. But then the call comes in, “THE
NETWORK IS DOWN”. Why does it always happen on Friday?
Convergence will make these calls even more heart-stopping. It's not just
about the data network anymore. It's about a real-time network carrying
voice calls and billing applications. As important as your data may be,
these applications are mission-critical. Your CEO/CIO or your customer
will be calling to find out when these applications are going to be back up.
The network of yesterday was about reliability. Convergence has ratcheted
that up a notch. Now it's called survivability. Survivability means your
network can continue to provide service in the face of network faults. Fast
recovery from failures and strategies for rerouting traffic around a problem
are key aspects to survivability. Survivability techniques can be applied at
different levels: at the node and link levels, as well as at the network level.
They address software as well as hardware failures. In the past, reliability
was thought of in terms of hardware. It was provided by redundant
processors and dual power supplies. Most vendors offer redundancy at the
hardware level, but how can you carry this through to the logical layer?
You need to know, or suffer the consequences.
Chapters 17 and 18 cover survivability at the network level. Together, let's
explore the concepts of Network Reconvergence and MPLS Recovery. In
other words, how can you build in survivability at the logical layer? This is
a key factor to successful network convergence.
The previous sections have discussed the relationship of applications,
protocols, and technologies to real-time networking. Once you get to this
point, you will have been introduced to some basic real-time services, how
packet transport affects them, and some techniques for controlling and
enhancing the performance of the packet network to meet the demands of
these real-time services. You will understand the concepts of core network
technologies and how they need to work on your existing LAN.
In Chapters 19 and 20, the concepts are all brought together. Now is the
time for the Converged Network Engineer to shine. Managing the
complexity of the converged network takes planning. This section of the
book helps consolidate the concepts you've learned, and shows how
network planning can polish real time over IP to a brilliant shine. These
chapters consider network planning for real-time voice and data, and how
to translate Quality of Service settings from one network technology to
another. In these chapters, potential issues related to real-time networking
will be described along with mitigations and best-practice engineering
guidelines.

Section VI: Examples


Up to this point, the authors have maintained a vendor-agnostic perspective.
The concepts discussed are applicable to any vendor's products. This is
consistent with Nortel’s commitment to open standards. In the next few
chapters, however, we want to offer a few examples of what Nortel
solutions look like in both the Service Provider and Enterprise
environments.
The first example illustrates a global-scale Enterprise network, and one that
we are particularly proud of. Nortel runs one of the largest real-time
Enterprise networks in the world, equivalent in breadth and scope to a Tier
2 Service Provider. In a typical month, more than 1,500 terabytes of routed
traffic runs across the network. We will explore the business drivers of
deploying QoS in our own network, and how it was implemented.
The next two examples highlight Nortel’s carrier-grade portfolio by
providing examples of real-time network solutions. Each example includes
descriptions of business and technical challenges faced by Enterprises and
Service Providers alike, and it presents the Nortel solution including
specific product and service descriptions and network architectures. Note
that these are not specific customer networks, but are examples of how a
Nortel solution can help both Enterprise and Carrier customers move to the
converged network of tomorrow.

Let’s Get Started


When we first got the idea for this book, we wanted to address a specific
issue. Voice and data services are converging onto a single network.
Unfortunately, the histories of the two domains, the networks themselves,
and the knowledge required for working in these environments are quite
different. We wanted to do something to bridge the gap and create a way
for the integrated knowledge to be disseminated cost effectively across
both the Enterprise and Carrier spaces.
While we hope for a broad readership, it was necessary to make some
assumptions about the background of the nominal reader. Familiarity with
basic IP data networking, the OSI reference model, and the corresponding
terminology will be needed to follow much of the explanation, discussion,
and examples.
One of the goals of the book was to present the material in enough depth to
provide a useful ongoing reference but still compact enough that the
breadth of issues that confront implementing a converged network could be
addressed. There is a lot of material in this book; more material, in fact,
than any one user of the book might need. Depending on your objectives in
reading the book, the following suggestions may help you navigate the
material.
As mentioned earlier, this book is being produced in conjunction with the
NCTS – Real-Time Networks certification, but there is more depth to the
material in the chapters than is needed to prepare for the certification.
Those who read this book as preparation for the certification exam should
be aware that the exam will address definitions of terminology concerning
Real-Time networking, and the associated technologies and concepts. It
will also assess your understanding of the issues that Real-Time
applications face in IP networks. No exam questions are drawn specifically
from Section VI (Examples) or the Appendices, although reading those
sections will help you consolidate what you have learned from the earlier
chapters.
For the reader looking to pass the certification, we advise focusing on how
the pieces fit together into the larger picture, rather than memorizing details
of individual technical solutions. The exam questions will concentrate
more on knowledge of issues and terminology than on solutions to those
issues. Aim for a general understanding of the technologies, paying special
attention to issues that are highlighted in the text. Identify and digest the
key terminology and acronyms. Depending on your professional
background, you may find that some terms are used differently than you
may be used to.

Many readers will come to this book with strong expertise in one or more
of the areas covered, but may have little or no familiarity with other areas.
While we assume that readers will have basic knowledge of data
networking and TCP/IP, we cover a range of topics associated with
convergence and real time. The reader can pick and choose sections and
chapters, and does not necessarily need to read the various parts in order.
If you're not planning to take the certification, we hope that you can
use this book as a guide as you embark on the journey to convergence. We also
hope that everyone will be able to get something out of this book,
regardless of their background or the environment they work in today.

Reading the Transport Path Diagrams


This book employs a set of diagrams to provide a frame of reference for the
material in the individual chapters. A diagram will be presented at the
beginning of each chapter indicating the network components and
functions addressed in that chapter. These transport path diagrams are
intended to help the reader appreciate the relationships within the transport
path for different network implementations. Because these diagrams are
intended to give a high-level overview rather than a comprehensive
summary, they are simplified in some ways for ease of presentation and
comprehension. There is no 1:1 mapping between protocol functions and
the seven-layer OSI model even though we do mention some of the same
elements. This diagram is not intended to convey OSI stack information.
We hope that these diagrams will help the reader locate the threads of the
individual technologies within the fabric of the overall network.

Figure 1-1: Transport path diagram

The arrows used on the diagram are not meant to indicate that a process is
strictly one-way, but to illustrate the perspective of application-level data or
decision-making looking into the network. In general, the arrows point
down to indicate that the real-time application looks towards the Wide Area
Network through these layers. In many cases, there are different ways that
a real-time application could reach the Wide Area Network. Some
technologies, such as Cable and xDSL, have special Layer 2 relationships
between the Local Area Network and the Wide Area Network. Not shown
on the diagram are the encapsulation mechanisms used by these types of
applications to bridge Local Area Network level traffic through Cable or
xDSL transport and back into the core IP/ATM networks.
The diagram highlights some of the different aspects of real-time issues
that are addressed in the book, including transport protocols, session
control protocols, Quality of Service (QoS) protocols, and reliability-
related protocols. The latter have been included because real-time
applications can significantly increase the requirements for network
reliability. The braces on the right side illustrate where in the transport path
the QoS and reliability features are applied.

Conclusion
It is no longer enough to be solely a data or voice engineer. The networks
of today carry essential applications. These applications are not just data
anymore. It is about a real-time interactive world. To support convergence,
the underlying network must support real-time applications and service
with delays less than 250 ms. Convergence demands “One Network” that
brings together all the threads into one fabric. It is no longer about pieces of
knowledge; convergence is all about how to weave all the pieces together. It
is all about building your engineering toolkit. The NCTS – Real-Time
Networks certification is part of that kit.
The Nortel Certified Technology Specialist (NCTS) – Real-Time Networks
is just the first step of becoming a Convergence Engineer. This certification
was created to not only assist you, the IP certified engineer, but also to help
us at Nortel. Convergence is part of our culture and our everyday life.
Nortel has built real-time capability into our converged networks. There are
advantages of the converged and real-time world, and you, as a certified
engineer, will be ready to embrace them.

I know you will find value in this study guide as you get ready for your
certification. The subject matter experts who created this book hope you
enjoy reading this guide as much as we enjoyed creating it.
Thank you and best of luck in building your tool kit.

Section I:
Real-Time Applications and Services
Let's begin by looking at the applications that run on real-time networks.
Section I examines the characteristics of real-time applications and the
issues that arise when we run them over IP. This section looks at the
applications as the user sees them, and the implications of IP transport
performance on the quality the user experiences. These chapters will give
you a detailed view of how network design and implementation decisions
can affect the user's experience.
Chapter 2, The Real-Time Paradigm Shift, defines what we mean by
Real-Time networking. Real time is defined, along with convergence in
general and some specific types of convergence. Familiar applications are
sorted into real-time and non–real-time categories, and some potentially
distinguishing features of real-time packet traffic are described. Quality of
Experience (QoE) is introduced, and its relationship to network and
application performance is discussed in detail. The chapter concludes with
a discussion of the difference between QoE and QoS (Quality of Service).
Chapters 3 and 4 take a look at issues around quality in two popular
applications, voice and video. Voice Telephony continues to be the “Killer
Application” of telecommunications. Examining voice impairments
provides a good reference point for us to understand both the implications
of IP network behavior on real-time applications, and how to interface IP
networks with the existing TDM network. Chapter 4 looks at video and the
impairments to the image that can result from IP transport.
Chapter 5, Codecs for Voice and Other Applications, introduces
digitization and encoding, which make it possible to put analog signals
over digital networks. The discussion here provides some background on
(1) voice and video analog signals and how characteristics of those signals
translate into digital mode, (2) codecs that are used to remove redundancy
to reduce the amount of data needed to carry the signal information, and (3)
how various errors and disturbances of the compressed digital signal affect
the reconstituted analog output. Common coding standards for telephony,
audio streaming, and video streaming and conferencing are summarized,
and guidelines for selecting a codec for VoIP are provided.

Chapter 2
The Real-Time Paradigm Shift
Concepts Covered
Telecommunications convergence
Types of convergence: network convergence, service convergence,
and application convergence.
Convergence removes constraints for users but adds constraints
for network operators
Real-time telecommunications
How to separate real-time and non–real-time applications and
services
Service quality and performance
Quality of Experience (QoE)
Measuring QoE
Quality of Service (QoS)

Introduction
For more than a decade, telecom scholars have been publicizing the
advantages of convergence to multiservice networks; anticipated benefits
range from cost savings from operating a single network infrastructure to
productivity gains and/or new revenue from advanced services. Depending
on who you talk to, next generation networks are expected to reduce capital
expenditures, reduce operating expenses, increase revenues, decrease user
cost of telecom services, improve quality, reduce quality, increase the
reliability and survivability of the network, increase competition among
carriers, and reduce churn in the customer base. Only time will tell which
of these predictions are correct, but one thing is certain: meeting user
requirements and expectations on converged packet networks is an
enormous challenge. The crucial component of this challenge is making
real-time services operate over a packet infrastructure.
This chapter introduces the concepts of convergence, real-time operation,
and Quality of Experience (QoE). No matter what network they run on,
real-time services like voice telephony (hereafter referred to variously as
interactive voice or simply voice) and video conferencing require
careful engineering to deliver acceptable performance. Convergence of
applications and services means that real-time and non–real-time functions
must share a common network environment and/or run side-by-side within
the same application. Mixed traffic types from services and applications
with differing requirements can end up bumping heads as the traffic moves
across the network.
Successful deployment of converged networks depends on careful planning
and engineering. The building blocks of this success include an
appreciation of the characteristics and constraints of the services and
applications the network will support, as well as a solid understanding of
the network environment. A solid understanding includes knowledge of the
transport technologies, the protocols, the available choices for
implementation, and issues around interconnection with other networks.
The details of network implementation will affect network performance
and resiliency. The balance of this book reviews these building blocks, the
choices and options available around network architecture, deployment of
services on IP infrastructure, and design guidelines for achieving the
performance and reliability users and network operators need from real-
time services.
The criteria for successful performance are based on user QoE. It doesn’t
matter how smoothly packets move through your network, if the users find
that services and applications don’t meet expectations. Planning must
address the factors that underlie QoE for each service that runs on the
network, as well as any interactions or inconsistencies between them.
This book introduces real-time networking and many of the real-time
concepts.

What is convergence?
Narrowly defined, the term convergence refers to the merging of traffic
from two or more separate networks onto a single network. At present, we
are witnessing the convergence of traditional voice traffic (consisting
mostly of standard voice telephony) and LAN-based data traffic (consisting
mostly of computer communications such as e-mail and file transfer) onto a
common packet-based infrastructure. More broadly, convergence is used to
describe the fusion of function across all aspects of communications. Three
main kinds of convergence have been defined:
Network convergence–Combining network traffic from different
services (for example, voice, video, data) on one infrastructure
Service convergence–Combining previously distinct services (for
example, wireline and wireless voice; wireless voice and short
message service) into a single service
Application convergence–Merging of previously distinct
applications into a coordinated suite (for example, multimedia,
collaboration, and the integrated desktop)
Although convergence has recently gained prominence and notoriety, it is
not a recent phenomenon. The telecommunications industry recognized
over thirty years ago that the hierarchical network architecture of the voice
network could not continue to grow indefinitely. The introduction of digital
networking in the 1970s made it possible for the network to carry non-
voice data along with digitized voice. This trend continued in the 1980s
with frame relay and ATM. On the data side, private and then public
multiprotocol best-effort networks followed, and forerunners to the IP
protocols emerged. On the voice side, FAX and voice-band data services
ran on a common infrastructure with traditional voice. More recently,
convergence continues with the introduction of Storage Area Networking
(SAN) and IP telephony; these bring with them the more stringent
requirements of real-time operation and business-critical reliability.

Convergence increases options


For the user, convergence can increase the number of things one can do and
the ways they can be done. Convergence brings down the boundaries
between data and voice, between wired and wireless, between public and
private, between the central site and the remote site, and between the
location of the supporting processor and the location of the user.
Convergence allows services to more easily combine different media and
different types of information, and to facilitate contingencies between
events across service types. For example, convergence in messaging means
that the user monitors only one mailbox, rather than separately monitoring
e-mail, work voice mail, cellular voice mail, home answering machine, and
a pager. Convergence in the application user interface makes a new
application instantly familiar, in that the controls are the same as or
similar to those of applications that the user already knows how to use.

Network convergence
Network convergence brings all types of traffic (voice, audio, video, and
data; bearer and signaling) onto a single network infrastructure.
Such convergence may occur at the level of the transfer protocol (for
example, IP), the data link protocol (for example, Ethernet), and/or the
physical medium (for example, optical fiber).
The overwhelming presence of the Internet has led to the choice of IP as
the converged transmission environment for both Service Providers and
Enterprises. The advantages of this choice include the economics of a
single network platform as well as seamless connectivity with existing
Internet infrastructure. The IP protocol suite now includes higher level
protocols for all forms of data applications, for audio and video streaming,
and for real-time applications such as telephony and conferencing.

Service convergence
Convergence enriches telecommunications by bringing together familiar
features and services that traditionally operate on different systems. For
example, a user’s wired phone and cell phone can share a single number.
Paging, voice messaging, instant messaging, e-mail, and FAX can be
managed by a single agent device. User mobility can be enhanced by
wireless LAN and follow-me services; various media (voice, pager, PDA,
e-mail) converge onto a single user device. At the same time, the total cost
of ownership of the supporting network may be reduced.
Service convergence brings with it new client devices, communications
servers, and media gateways. It may be realized in a fully distributed
system, running on top of an IP network (service provider, large
Enterprise), or through an integrated office-in-a-box (small Enterprise). It
may be realized as an evolution from an existing installed base, as a stand-
alone system, or as a managed or hosted solution. Users want service
convergence without compromising their familiar telephone operation.
They want the same features/functionality, voice quality, security, and
reliability they are getting from their current individual services. Services
converged on a carefully engineered packet infrastructure can deliver this,
and more.
Service convergence enables a highly mobile and distributed work force.
You can use any IP desktop phone: register, and your desktop is wherever you
are. You can work at home or in a wireless LAN hot spot while you run a
Session Initiation Protocol (SIP) client on your laptop, have your phone
number and telephony features with you, and make secure calls over the
Internet. Or you can have system-wide roaming for your IP wireless
telephone or telephony-enabled PDA. Service convergence will ultimately
allow voice and data roaming across the WAN, bringing down the
boundaries between Enterprise wireless LAN systems and public wireless
services.

Application convergence
The full potential of IP multimedia networking will bring significant
changes in how people communicate and collaborate. Application
convergence can do for person-to-person communications what browsers,
HTML, and Domain Name System (DNS) have done for information
access and transaction services. It will put the end-user back in control of
the communications space, enhance how users collaborate with colleagues,
and enrich how Enterprises communicate with their customers.
Application convergence is realized through the development of
anticipatory, media-adaptive, and time-sensitive applications. Employee-
facing converged applications allow Enterprises to create distributed teams
to address business opportunities and challenges more effectively and
dynamically. Customer-facing converged applications serve to strengthen
Enterprise/customer relationships and leverage investments in contact
centers and self-service applications with integrated databases and
back-office systems. Converged applications will form one facet of the new
revenue-generating services that Service Providers are anticipating.

Convergence adds constraints


While convergence offers greater freedom to the user, it puts increased
demands on designers and network operators. Network convergence means
that the network must meet the combined requirements for all the services
it will carry. Each performance parameter must meet the strictest
requirement among all those defined for individual services. Converged
services require end devices that take into account all the services that the
user will access, which can include a variety of perceptual modes (sound,
graphics, text), different types of input (hard and soft keys, alphanumerics,
stylus, speech recognition), portability to support mobility, and so on.
Application convergence requires that designers build flexible, responsive
user interfaces, and that they identify and develop equivalent operational
features between applications to maximize capability and efficiency for the
user. We have seen that computing platforms and applications that
promised powerful features and functionality often delivered an
encounter with complexity and annoyance, as the application tried to guess
what the user wanted to do, or failed altogether when the operating system
and the application weren't compatible.
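The rule that each performance parameter must meet the strictest requirement among the services carried can be sketched as a small calculation. The service names and numeric targets below are placeholder values chosen only for illustration; they are not figures from this book or from any standard, and the function name `converged_targets` is likewise hypothetical.

```python
# Hypothetical per-service targets (illustrative values only).
# For all three parameters, a smaller number is a stricter requirement.
SERVICE_TARGETS = {
    "voice":         {"delay_ms": 150,  "jitter_ms": 30,  "loss_pct": 1.0},
    "video_conf":    {"delay_ms": 200,  "jitter_ms": 50,  "loss_pct": 0.5},
    "file_transfer": {"delay_ms": 2000, "jitter_ms": 500, "loss_pct": 2.0},
}


def converged_targets(services):
    """Combine per-service targets by keeping the strictest (lowest) value of each parameter."""
    combined = {}
    for targets in services.values():
        for param, value in targets.items():
            combined[param] = min(value, combined.get(param, value))
    return combined


network_targets = converged_targets(SERVICE_TARGETS)
```

With these placeholder inputs, the converged network inherits its delay and jitter targets from voice but its loss target from video conferencing. No single service necessarily sets every limit, which is part of what makes the combined requirement harder to meet than any individual one.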

What do we mean by real time?


The inherent value of real-time applications and services, combined with
the incompatibility of real time and best-effort operation, led us to develop
an engineering strategy specifically aimed at real-time networking. This
section defines real time as it relates to telecommunications networks,
offers some examples of real-time services and applications that run on
those networks, and discusses the characteristics of real-time traffic. These
characteristics distinguish real-time traffic from non–real-time traffic, and
make it particularly challenging to deliver high-performance, high-
availability, high-reliability real-time networks.

The fallacy of network as conduit


Initially, it was assumed that packet networks would operate simply as a
pipeline for data, regardless of the origin or application that generated the
data. The network would be an unintelligent pipe, with all the intelligence
in the endpoints. IP was designed to treat all packets equally, and all
applications were expected to peacefully coexist. This gave maximum
flexibility to applications developers, and it remains a suitable arrangement
for non–real-time applications.
Since then, the inherent difference between voice and other applications
has asserted itself. This led to vigorous debate about the virtues of thick
clients (smart end devices connected to a dumb pipe) versus thin clients



20 Section I: Real-Time Applications and Services

(relatively unintelligent end devices connected by an intelligent network).


It is now universally recognized that a best-effort, multiservice IP
environment will not support voice services with the performance and
reliability that users have come to expect. Interactive voice requires
differentiated service classes or other special mechanisms to expedite the
voice packets, and careful management of the network to constrain jitter
and bursts of packet loss. Thus, a packet network would require different
tuning to optimize voice performance than to optimize data performance.
At the same time, it is not just voice that requires special care; all real-time
applications pose similar difficulties. What is it about real-time services
that imposes these constraints? More specifically, what are the differences
between real-time and non–real-time applications and services, and what
do we need to do to meet user expectations for real-time applications
without disturbing the performance of non–real-time applications?

Real-time processes
You may be familiar with the term real time with respect to computing
operations. There, real time is used to describe processes that take more or
less continuous input and run fast enough to keep up with the rate at which
new input arrives. If the process does not run fast enough, the input backs
up. The execution time of the process is not critical, but the rate at which
new input arrives determines the minimum throughput rate. Taking an
example of digitization and compression of an analog video signal, the
codec must be able to accept and process video frames at the given rate of
the analog input. Variability in input rate can be buffered out, but the
process needs to keep abreast of the average rate to perform as a real-time
computing process.
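The idea that input backs up when a process cannot keep pace with the average arrival rate can be sketched as a simple queue simulation. The arrival pattern and service rates below are made-up illustrative numbers, and the function name is hypothetical.

```python
def backlog_over_time(arrivals, service_per_tick):
    """Return the number of items waiting after each tick.

    `arrivals` lists how many items (for example, video frames) arrive in
    each tick; the process handles at most `service_per_tick` items per tick.
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        backlog = max(0, backlog + arriving - service_per_tick)
        history.append(backlog)
    return history


arrivals = [3, 1, 2, 3, 1, 2]  # variable input that averages 2 items per tick

print(backlog_over_time(arrivals, 2))  # [1, 0, 0, 1, 0, 0]
print(backlog_over_time(arrivals, 1))  # [2, 2, 3, 5, 5, 6]
```

When the service rate matches the average arrival rate, a small buffer absorbs the variability. When it is lower, the backlog grows without bound, which is the failure mode the text describes.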

Real-time networking
As a real-time process, real-time networking requires a minimum
throughput rate defined by the operation of the application. In networking,
we refer to the throughput capacity of a transport path as the bandwidth². In
contrast to real-time computing processes, however, the execution time is
also a key factor. The “execution time” of a real-time networking service or
application is the end-to-end (one-way) delay. This delay is made up of
both processing (for example, time needed to parse input, execution time
for computations, and other operations), and transport delays (mostly
queuing, buffering, and propagation time).
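The decomposition of one-way delay into processing and transport components can be written as a small delay budget. This is a minimal sketch: the class name, field names, and the millisecond values are assumptions for illustration, not measurements from the text.

```python
from dataclasses import dataclass


@dataclass
class DelayBudget:
    """One-way delay components, all in milliseconds (hypothetical values)."""
    processing_ms: float   # parsing input, computation, other operations
    queuing_ms: float      # waiting in network queues
    buffering_ms: float    # buffers in end devices or network elements
    propagation_ms: float  # time for the signal to travel the path

    def transport_ms(self):
        return self.queuing_ms + self.buffering_ms + self.propagation_ms

    def one_way_ms(self):
        return self.processing_ms + self.transport_ms()


budget = DelayBudget(processing_ms=25, queuing_ms=10,
                     buffering_ms=40, propagation_ms=30)
print(budget.transport_ms())  # 80
print(budget.one_way_ms())    # 105
```

Breaking the total down this way makes it easier to see which component dominates before deciding where engineering effort should go.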
Among other functions, networking processes mediate information transfer
between the endpoints. As well as bandwidth sufficient to keep up with the
input (talker speech, video signals, text entry), real-time networking
services and applications need a short round-trip delay; that is, the time
needed for a signal to be sent to the other end and a response to be sent
back.

2. The term bandwidth is derived from the relationship between the frequency bandwidth of an analog
carrier and the maximum rate at which the carrier can be modulated to signal one bit of information. The
broader the bandwidth, the faster the maximum modulation rate, and so the more bits can be sent per
unit time.

Non–real-time network applications, such as file transfer, can trade off
bandwidth against delay. A higher bandwidth channel simply carries the
data more quickly. Real-time applications do not have this direct
bandwidth-for-delay trade-off. For example, an ordinary voice
telephone call uses 64 kb/s in each direction. All things being equal, using
faster links will not allow the data to be sent out at a faster rate since the
input rate remains constant. Adding bandwidth will not necessarily
improve the performance of real-time applications³. Instead, application
design, network topology, provisioning choices, and traffic management
are key factors in controlling delay for real-time networking services.
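The contrast between the two kinds of application can be made concrete with a little arithmetic. The sketch below assumes an idealized link with no protocol overhead, congestion, or propagation delay; the function names and the 10 MB file size are hypothetical, while the 64 kb/s voice rate is the figure quoted in the text.

```python
def file_transfer_seconds(size_bytes, link_bps):
    """Idealized transfer time for a file: shrinks as bandwidth grows."""
    return size_bytes * 8 / link_bps


def live_stream_seconds(duration_s, source_bps, link_bps):
    """Time to deliver a live stream: set by the source, not by the link."""
    if link_bps < source_bps:
        raise ValueError("link cannot sustain the stream")
    return duration_s


# A 10 MB file benefits directly from a faster link.
print(file_transfer_seconds(10_000_000, 1_000_000))   # 80.0
print(file_transfer_seconds(10_000_000, 10_000_000))  # 8.0

# A 60-second voice call at 64 kb/s takes 60 seconds on either link.
print(live_stream_seconds(60, 64_000, 1_000_000))     # 60
print(live_stream_seconds(60, 64_000, 10_000_000))    # 60
```

Once the link can carry the stream at all, extra capacity does not shorten the call; it only helps indirectly, for example by reducing queuing.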
How short does the delay need to be? The response time required by a
specific application depends on what activity is involved (talking, text
entry, button press) and on what response time would be expected in
similar circumstances without the intervening network (speech reply,
channel change). The round-trip delay must be short enough to avoid
disrupting the user’s thoughts, behavior, and/or interaction with another
user at the far end. Excessive baseline delay or variable delay across the
connection can force users to spend concentration adapting to or coping
with the communications medium and distract them from their goals. For
example, a call with long delay can exaggerate the pause that occurs
between one speaker’s turn and another. A user may interpret the pause as a
hesitation on the part of the other person, or possibly may wonder whether
the pause is a hesitation or delay in the speech path. In the first case, the
pause is erroneously interpreted simply as part of the conversation (with
implications for how the two parties will assess each other), while in the
second, the user is distracted from the conversation by his uncertainty
about what the delay means. Machines are more tolerant of delay; however,
there is usually a user somewhere in the chain. A machine may be prepared
to wait indefinitely for, say, password screening. A user whose session
setup time depends on screening time will not.
Research done at Nortel has shown that for voice, end-to-end (one-way)
delay of up to 250 ms does not measurably impair a conversation. Browser
page loading should complete within four seconds of the request.
Acknowledgement of remote application control commands should take less
than a hundred milliseconds. Delay is an important impairment for IP
networks, and will be discussed in several chapters later in the book,
especially around voice, video, and network planning.

3. As usual, in reality, things are more complex. Increasing bandwidth in the network does have some effect
on the total delay. First, there may be alternatives available at higher bandwidth (such as higher rate
codecs that can reduce processing time). Second, where congestion is occurring, increasing the total
bandwidth available can reduce queuing in the network, which in turn decreases the delay.

Real-time delay constraints mean that there may not be time to resend lost
or errored data. Some applications are relatively tolerant of lossy channels,
while others are completely intolerant of missing data. Figure 2-1 shows
some common applications within a delay-by-loss space. The applications
above the line are tolerant to limited amounts of missing data, as indicated
by the height of the box. Those falling below the 0% line will fail if any
data is corrupt or missing. This chart tells us that the most challenging
applications to run over converged networks are command-and-control
applications, since these require both very fast response (low delay) and
exact data.

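[Figure 2-1: chart with Delay on the horizontal axis (100 ms, 1 sec, 10 sec, 100 sec) and Packet Loss on the vertical axis (0%, 5%, 10%).
Delay regions, left to right: Interactive, Responsive, Timely, Non-critical.
Above the 0% line (tolerant of some loss): Conversational voice and video; Voice/video messaging; Streaming audio/video; Fax.
Below the 0% line (intolerant of loss): Command/control (e.g., Telnet, interactive games); Transactions (e.g., E-commerce, web-browsing, E-mail access); Paging, Downloads, E-mail delivery; Background (e.g., Usenet).]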
Figure 2-1: Sensitivity of applications to delay and loss of data (from Rec.
G.1010, End User Multimedia QoS; figure reproduced with the kind
permission of ITU)
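The delay-by-loss classification in Figure 2-1 can be captured as a small lookup table. This is only a sketch: the assignment of each application to a delay region is read from the figure layout, and the dictionary and function names are hypothetical.

```python
# application -> (delay region, tolerates limited data loss)
APPLICATIONS = {
    "Conversational voice and video":     ("Interactive",  True),
    "Voice/video messaging":              ("Responsive",   True),
    "Streaming audio/video":              ("Timely",       True),
    "Fax":                                ("Non-critical", True),
    "Command/control":                    ("Interactive",  False),
    "Transactions":                       ("Responsive",   False),
    "Paging, downloads, e-mail delivery": ("Timely",       False),
    "Background":                         ("Non-critical", False),
}


def most_demanding(applications):
    """Applications that need both the lowest delay and exact data."""
    return sorted(
        name
        for name, (delay_region, loss_tolerant) in applications.items()
        if delay_region == "Interactive" and not loss_tolerant
    )


print(most_demanding(APPLICATIONS))  # ['Command/control']
```

The filter reproduces the observation in the text that command-and-control applications are the hardest case, since they combine the tightest delay region with no tolerance for missing data.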
Aside from delay and bandwidth requirements, there are differences
between real-time and non–real-time flows. Real-time applications often
use small, more regularly generated packets, and for many, the flows may
last minutes or even hours. Simultaneous two-way traffic is common. Voice
traffic, for example, consists of small packets carrying the speech signal
that are generated on a regular schedule. Voice packet generation is
predictable either deterministically (where both speech and silence are
sent), or statistically, according to the distribution of conversational
utterances (where silence suppression is used). Video conferencing
services will have larger packets, but these are still generated regularly,
compared to the short, burst traffic associated with data communications
such as file transfer or e-mail delivery. The characteristics of interactive
command-and-control games show burst traffic of small packets containing
the joystick movements or mouse clicks associated with rapid-fire play.
The packet generation statistics depend on the characteristics of the
particular game.
In contrast, most non–real-time computing data traffic is highly bursty and
consists of large packets. Flows are generally shorter-lived and have a
back-and-forth nature (one direction, then the other) and the amount of data
transferred may be highly asymmetric, where the return traffic consists
mostly of TCP acknowledgements, user commands, and so on. Streaming
flows consist of regularly generated packets, more like telephony flows
than like typical data flows. However, streaming flows are unidirectional
and use larger packets than are usually found in interactive applications.
Network signaling traffic is different again. Signaling traffic is usually
time-sensitive, and is often associated with session setup or another system
function. Similar to other real-time traffic, signaling consists of small
packets. However, flows are short, and the pattern is back-and-forth, rather
than simultaneous traffic on the two paths.
Table 2-1 provides a point-by-point comparison of the characteristics of
real-time and non–real-time traffic through a packet network.

Real-Time                              Non–Real-Time
-------------------------------------  -----------------------------------------
input arrives at a consistent rate     input is highly bursty
requires minimum throughput rate       throughput rate can vary
smaller packets (higher overhead)      large packets (smaller overhead)
flows tend to be long                  flows tend to be short
usually bidirectional and symmetric    usually highly asymmetric
performance is delay-sensitive         performance is not overly delay-sensitive

Table 2-1: Real-Time and Non–Real-Time Traffic Comparison
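The "smaller packets (higher overhead)" row can be quantified with a short calculation. The sketch assumes a 64 kb/s voice stream (the rate quoted earlier), a 20 ms packetization interval, and 40 bytes of IPv4, UDP, and RTP headers per voice packet; the interval and header stack are assumptions, not values given in the text. The data packet assumes a 1460-byte payload with 40 bytes of IPv4 and TCP headers.

```python
def overhead_fraction(payload_bytes, header_bytes):
    """Share of each packet that is header rather than payload."""
    return header_bytes / (payload_bytes + header_bytes)


def voice_stream(codec_bps=64_000, interval_ms=20, header_bytes=40):
    """Return (payload bytes per packet, packets per second, IP-layer bit rate)."""
    payload_bytes = codec_bps * interval_ms // 8000
    packets_per_s = 1000 / interval_ms
    ip_bps = (payload_bytes + header_bytes) * 8 * packets_per_s
    return payload_bytes, packets_per_s, ip_bps


payload, pps, ip_bps = voice_stream()
print(payload, pps, ip_bps)                      # 160 50.0 80000.0
print(overhead_fraction(payload, 40))            # 0.2
print(round(overhead_fraction(1460, 40), 4))     # 0.0267
```

Under these assumptions a fifth of every voice packet is header, compared with under 3% for a full-size data packet, and the 64 kb/s stream occupies 80 kb/s at the IP layer before any link-layer framing.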

Real-time services
We can use the preceding definitions to categorize common services and
applications. Figure 2-2 classifies many applications as real-time (right,
shaded background) versus non–real-time (left, white background). In
addition, the diagram also differentiates between applications where a
human is involved at one or both ends of the interaction (lower) versus
where the interaction is strictly machine-to-machine (upper).

[Figure 2-2: quadrant chart. Horizontal axis: Non-Real Time (left) to Real Time (right). Vertical axis: Human (lower) to Machine (upper).
Application labels shown: Backup, Storage, Medical Monitoring, eMail (delivery), Password Screening, Process Monitoring, File Transfer, Network Computing, Security Monitoring, Remote App Control, Command & Control Games, Move/Response Games (Chess, Trivia), eCommerce, SMS, Browsing, Audio Streaming, Video Streaming, Voice Mail, PTT, Text Chat, Voice, Audio and Video Conferencing.]


Figure 2-2: Common network applications categorized as non–real-time
(left) versus real-time (right) as well as those mediating a human interaction
(lower) versus those between machines (upper).
Examples of real-time services and applications include ordinary voice
telephony, audio or video conferencing, interactive command-and-control
(such as remote operation using Telnet or Timbuktu*, and shoot-em-up
games), and shared multimedia sessions (such as Netmeeting*).
There is no clean break between real-time and non–real-time applications.
Applications shown deep within their category in Figure 2-2 are easily
distinguished. Those falling near the boundary between non–real-time and
real-time have less stringent delay/responsiveness requirements, or may
show some variation in the requirements. This group includes audio and
video streaming, Push-to-Talk (PTT) (which is a kind of chat application
using voice), browsing and e-Commerce, banking and other interactive
data services, security screening functions, and storage area networking. In
part, the delay requirements for these “near–real-time” or “quasi–real-
time” applications may depend on how they are being used, whether the
application offers particular features, or even what information is being
carried.
A new paradigm, known as collaborative multimedia communications,
combines individual applications into coordinated operation (see
application convergence above). Such collaborative environments can
result in simultaneous operation of real-time and non–real-time elements.
Once a session is established between two or more parties, various modes
of communication can be invoked as required over the course of the
session. These modes can be voice, video, text chat, file transfer, file
sharing, web page push, and so on. They may be invoked sequentially or
simultaneously. Any component application of a collaborative session must
perform at least as well as it would perform individually on a purpose-built
network for users to find it acceptable.

VoIP vs. IP Telephony


Voice over IP (VoIP) is usually considered the IP parallel of TDM
telephony. VoIP is an application or a service where full duplex voice
channels are set up over an IP network. It is almost always assumed that
these channels will be used for speech. Features ordinarily associated with
TDM telephony may be absent. Consequently, VoIP may not deliver the
same breadth of service and features. As well, 911 and voice-band data
services (like FAX and modem) may not be supported. Users who expect
telephones to behave like telephones, and who may not even be aware that
the infrastructure has changed, may be surprised and disappointed by the
apparent shortcomings. IP Telephony extends VoIP to replicate the
standard TDM telephony service on an IP infrastructure. This requires the
network to support the public dialing plan, to offer backward compatibility
for voice-band data and other non-voice services, and to incorporate
telephony features like voice mail, call transfer, call forward, and 911.
Many aspects of IP Telephony operation are real time. IP Telephony
performance requirements are generally stricter than those for VoIP.
Our traditional real-time networks, that is to say, voice networks, have been
engineered for performance. In contrast, data networks are generally
engineered for connectivity. These two goals impose different engineering
priorities and strategies, and lead to very different network characteristics.

Service quality and performance requirements


Generally, end users of network services don’t care how service quality is
achieved. What matters to them is how easily they can complete their tasks
and achieve their goals. These factors contribute significantly to the Quality
of Experience (QoE) users have with a service. Carriers and transport
providers, on the other hand, are concerned with defining which low-level
network technologies and Quality of Service (QoS) mechanisms to use,
and how to implement, optimize, and configure them to deliver adequate
service quality while minimizing operating cost and maximizing link
utilization.

What is QoE?
Quality of Experience (QoE) is the user’s perception of the performance of
a device, a service, or an application. User perception of quality is a
fundamental determinant of acceptability and performance for any service
platform. The design and engineering of telecommunications networks
must address the perceptual, physical, and cognitive abilities of the humans
that will use them; otherwise, the performance of any service or application
that runs on the network is likely to be unacceptable⁴. Successful design
requires a thorough understanding of the needs and constraints of the
eventual users of the system. QoE is best understood on the system level,
since system characteristics and usage factors may interact, and this may be
missed in subsystem-level analysis. For telecommunications networks, this
means understanding the end-to-end performance.

4. Without proper understanding of user requirements, there is a risk of both under-engineering, where the
network fails to meet the needs of the users, and over-engineering, where the specifications go
beyond the user’s needs, needlessly driving up the cost to provide the device or service.

Sidebar: Quality of Experience

QoE refers to the quality of a device, system, or service from the user’s
point of view. Other terms for similar and related concepts include user
performance, human factors, user engineering, user interface design,
Human-Computer Interface (HCI), and Man-Machine Interface (MMI).
QoE is associated with all technology used by humans to reduce work,
solve problems, or reach goals. Voice telephony is a good example to
explore QoE, since QoE of telephones has been well-studied and used to
guide network and equipment design for decades. So, where does QoE
show up in telephony?
Efficiency: modern telecom services make it fast and easy to talk to
someone.
Ease of Use: the telephone dialpad is a simple user interface: a number
sequence is pressed to set up the call, call progress tones tell the caller
what is happening as call setup completes, the phone rings, the called
party picks up the handset and talks.
Transparency: How well does a telephone call approximate a face-to-
face conversation? The voice should have a good listening level without
distortion or noise. Delay should be short enough, and there should be
no echo or other annoying artifacts. Any impairments will annoy the
user or will require that the user adapt to them. The better the “virtual
reality” of the phone channel, the more the user can forget or ignore that
the conversation is taking place on the phone.
The effectiveness of a device or system in addressing the user’s needs
and constraints determines its QoE.

QoE directly affects the bottom line. If service QoE is poor, the service
provider may lose revenue or customers. When a conversation is impaired
by excessive packet loss or delay, when an application is slow, or when an
e-mail arrives late because the network was congested, communication
effectiveness goes down. This affects the user’s efficiency, and may push
his costs up.
In telecommunications usage, the older term Quality of Service (QoS) has
shifted in meaning and is now used to refer to the mechanisms intended
to improve or streamline the movement of packets through the network (as
in “Is QoS enabled on that network?”). In the past, the same term referred
both to the intention (enabling mechanisms used to help ensure good
service quality) and the outcome (the user’s perception of the service
quality). We now use the term QoE for the user’s perception of quality to
eliminate any confusion.
The sidebar gives more details on QoE.
Examples of user tasks or goals in the telecommunications realm include
making an appointment (voice call), finding out when a movie is playing
(Internet browsing), or obtaining an item from an online retailer
(e-commerce). When a user needs to spend attention and effort to manage
the medium (accommodate complex setup, unstable session, signal
distortion or artifacts, delay or other impairment), the task becomes more
difficult to complete, and QoE is reduced. Each application will have its
own combination of parameters to determine the QoE. Parameter values
leading to acceptable or optimal performance may also be specific to the
application.
Engineering for QoE is most effective when it is undertaken at the
beginning of the design process. Overall requirements are determined from
user needs for the target applications. Other factors such as the total
number of users to be supported and the different applications that will run
on the network are also taken into account. Requirements for individual
network components are derived from the overall requirements. In some
cases, it will be necessary to trade off between factors; for example, the use
of encryption may improve the user’s feeling of security and privacy but
can also increase delay and, therefore, reduce responsiveness. Guidelines
for deployment options address the QoE implications of various choices.
The user interface associated with the network management system and the
effectiveness of quality monitoring features will also be improved by
attention to QoE factors.

[Figure 2-3: diagram with QoE at the center, surrounded by contributing factors: Easy to Use, Reliability, Availability, Progress indicators, Dexterity, Security, Clear picture & sound, Efficient to Use, Responsive.]
Figure 2-3: Some of the factors influencing the QoE of a service, application,
or device
Efforts are more successful where QoE is an integral part of the design
process. Retrofitting to improve low QoE is likely to be difficult,
expensive, or inadequate. For example, external echo cancellers are more
expensive than integrated echo control. Tweaking the network to reduce
delay may achieve some minor improvement, but many sources of delay
will be hard-coded and therefore inaccessible to tuning. What does this
mean for buyers of real-time converged networks? Vendors whose
performance targets are derived from a comprehensive set of QoE
parameters, and whose design intent begins with these targets, are likely to
achieve better overall QoE. Vendor selection criteria should include the
vendor’s attention to QoE, as well as system reliability and cost.

Measuring QoE
Aside from the obvious grossly malfunctioning cases and user complaints,
how can we determine the level of QoE our network or service provides?
Quality of Experience is a subjective quantity and can be measured directly
using behavioral science techniques. QoE can be measured in a laboratory
setting or in the field, through user ratings, surveys, or observation of user
behavior. Specific techniques include user quality ratings of an application
or service; performance measurement, such as the time taken to complete a
task; or tabulation of service-related information, such as subscriber
complaint rates or frequency of abandoned calls. A familiar QoE metric is
the subjective Mean Opinion Score (MOS).

Sidebar: Quantifying QoE parameters

As noted in the main text, we need to relate subjectively measured QoE
to a set of objective parameters, and determine the target for each
parameter.
The particular values of QoE parameters determine or influence (1) the
user's rating of service quality OR (2) his/her performance on some
relevant aspect of the service. Subjective evaluation is done to quantify
the relationship between the overall QoE and the objective parameters
we believe determine the QoE. We vary the physical parameter (for
example, the resolution of a video image) and examine how the user's
quality rating changes.
The figure shows a hypothetical relationship of a generic parameter to
some QoE measure. As our hypothetical parameter increases (x-axis),
the subjective rating also increases (y-axis). The shape shown is
common for QoE parameters, where the user rating bottoms out at the
low end and tops out at the high end (so-called “floor” and “ceiling”
effects). Other shapes are possible.
The positioning of the unacceptable, acceptable, and premium quality
areas depends on another subjective measure, acceptability. Depending
on human perceptual factors, user expectation, etc., the boundaries
between the colored regions can shift.

[Sidebar figure: hypothetical curve that flattens at both ends. Horizontal axis: Some Physical Variable (e.g., SNR, delay, resolution). Vertical axis: Some Rating. Colored regions, left to right: Unacceptable, Acceptable, Excellent.]

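The floor-and-ceiling relationship described in the sidebar on quantifying QoE parameters can be modeled with any S-shaped function. The sketch below uses a logistic curve written with `math.tanh`; the choice of a logistic shape, the 1-to-5 rating scale, and the parameter names are all assumptions for illustration, since the text says only that the rating bottoms out at the low end and tops out at the high end.

```python
import math


def rating(x, floor=1.0, ceiling=5.0, midpoint=0.0, steepness=1.0):
    """Hypothetical subjective rating as a function of a physical parameter x.

    Uses the identity logistic(z) = 0.5 * (1 + tanh(z / 2)), which avoids
    overflow for extreme values of x.
    """
    z = steepness * (x - midpoint)
    return floor + (ceiling - floor) * 0.5 * (1.0 + math.tanh(z / 2.0))


for x in (-10, -2, 0, 2, 10):
    print(x, round(rating(x), 2))
```

Near the midpoint a small change in the parameter moves the rating noticeably, while far from it the rating barely changes. That is why improving a parameter beyond the ceiling region adds cost without improving QoE.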
In the previous section, we emphasized that the outcome is best where
design and development proceed using performance targets based on QoE.
The performance targets, however, should not be expressed in terms of
QoE metrics. This may seem counter-intuitive, given the previous
discussion about QoE. Not only are subjective metrics more time-
consuming and expensive to measure, they cannot always be translated into
engineering characteristics. In concrete terms, if the specification was
given as MOS, and verification testing showed that the performance was
below target, how would we know what to fix?
Instead, we need to identify objectively measurable correlates of QoE, and
determine the target for each. This approach facilitates design engineering,
verification, and troubleshooting in the field, as well as providing
customers with measurable performance targets the vendor will stand
behind.
Objective parameters that contribute to QoE include:
• physical properties of the end device (such as size, weight, fit, button
  placement)
• timing and logic of system operation (such as feedback on progress of
  a hidden operation, how long the user must wait before going on to
  the next step, number of steps needed to complete a task)
• network characteristics (availability, call setup time, data loss [bit
  errors or missing packets], end-to-end delay/response time)
• network/account administration (availability of user support, billing
  accuracy).
There are a few cases where two or more parameters interact, making it
difficult to assess the QoE impact of one parameter individually. In most
cases, the parameters can be separated into sensible domains. This allows
the network characteristics to be considered separately from the physical
properties of the end device.
Service pricing is not a component of QoE. A service that performs poorly
remains poor even when it is free. Nevertheless, pricing remains a factor in
a customer’s decision whether to tolerate poor QoE or to complain about a
problem.
The QoE results determine the range of allowable variation in each
parameter that matches the perceptual and cognitive abilities of the user.
The relationship between the range of variation and the acceptability of the
performance allows us to define targets and tolerances for each parameter.
When all parameters and their targets are properly identified, and a device
is properly engineered to meet them, the resulting device will have high
QoE.

What is QoS?
Quality of Service (QoS) refers to a set of technologies (protocols and
other mechanisms) that enable the network administrator to manage
network behavior by avoiding or relieving congestion, expediting
time-sensitive packets, limiting access to congested links, and so on.
The aim of QoS mechanisms is to ensure efficient use of network
resources. The alternative is overprovisioning capacity, which may not
solve the problem of contention for specific resources, and, as we have
discussed earlier, may not improve the performance of real-time
applications. Quality of Service mechanisms do not create bandwidth but
instead manage available bandwidth more efficiently, especially during
peak congestion periods. Congestion occurs when a node or a link reaches
its maximum capacity, that is, when the sum of ingress traffic at a given
node exceeds the egress port capacity. QoS mechanisms may not be
sufficient or effective in a network that is continually congested; to address
this, redimensioning may be necessary.
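The congestion condition stated above, that the sum of ingress traffic at a node exceeds the egress port capacity, reduces to a one-line check. The function names and the traffic figures in this sketch are hypothetical.

```python
def offered_load(ingress_rates_bps, egress_capacity_bps):
    """Ratio of total ingress traffic to egress capacity."""
    return sum(ingress_rates_bps) / egress_capacity_bps


def is_congested(ingress_rates_bps, egress_capacity_bps):
    """True when total ingress traffic exceeds what the egress port can carry."""
    return sum(ingress_rates_bps) > egress_capacity_bps


EGRESS_BPS = 100_000_000  # hypothetical 100 Mb/s egress port

normal = [40_000_000, 30_000_000, 20_000_000]
peak = normal + [20_000_000]

print(is_congested(normal, EGRESS_BPS), offered_load(normal, EGRESS_BPS))
print(is_congested(peak, EGRESS_BPS), offered_load(peak, EGRESS_BPS))
```

An offered load that exceeds 1.0 only at peak periods is the situation QoS mechanisms are meant to manage. A load that stays above 1.0 is the continually congested case, where the text notes that redimensioning may be necessary.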
An important aspect of QoS is assigning packet priorities corresponding to
specific service classes (for example, with specific payload types) or within
specific flows (for example, User X vs. User Y). QoS mechanisms can raise the priority
of a given class of packets or a given flow or limit the priority of competing
flows. Providing differentiated services requires first determining the
desired services and user performance requirements, and second defining
and evaluating the appropriate QoS mechanisms required to balance the
resulting traffic. QoS mechanisms allow us to manage network
performance (for example, bandwidth, delay, jitter, loss rate, and response
time) to maintain stable, predictable network behavior.
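One simple way to expedite a class of packets is strict priority scheduling, sketched below with Python's standard `heapq` module. The text does not prescribe a particular scheduler, so this is only one possible mechanism; the class name, the packet labels, and the convention that a lower number means higher priority are assumptions for illustration.

```python
import heapq
import itertools


class StrictPriorityQueue:
    """Always serves the highest-priority packet waiting (lowest number first)."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # keeps FIFO order within a class

    def enqueue(self, packet, priority):
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def dequeue(self):
        _priority, _arrival, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self):
        return len(self._heap)


VOICE, DATA = 0, 2  # hypothetical service classes

queue = StrictPriorityQueue()
queue.enqueue("data-1", DATA)
queue.enqueue("voice-1", VOICE)
queue.enqueue("data-2", DATA)
queue.enqueue("voice-2", VOICE)

print([queue.dequeue() for _ in range(len(queue))])
# ['voice-1', 'voice-2', 'data-1', 'data-2']
```

Strict priority keeps delay low for the favored class but can starve lower classes under sustained load, which is one reason mechanisms that limit the priority of competing flows also matter.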

Conclusion
The main challenge of converged networks is to create a network
environment that allows all the services and applications it carries to
perform well, regardless of whether they are real-time or non–real-time.
The combination of real-time applications with traditional non–real-time
computing data applications on a single network can result in a widely
variable packet traffic characteristic. The network must be able to
comfortably carry many different types of traffic without degrading any of
the applications riding on it. Combined with the delay requirements for
individual real-time applications, the design challenge is formidable
indeed. Table 2-2 summarizes the demands of converging real-time and
non–real-time services and applications onto a common packet
infrastructure.

Mixed traffic types with various (sometimes conflicting) requirements

Traffic highly likely to be mission-critical: Resiliency is essential.

Goal is performance not connectivity

High user expectations for existing Real-Time services

Service and application convergence marry real-time and non–real-time

End-to-end delay requirements are strict for some applications

Some real-time applications are intolerant to data loss

Table 2-2: Challenges of converged Real-Time Packet Networking

What You Should Have Learned


Telecommunications convergence holds enormous potential. At the same
time, it is a significant challenge to ensure that all applications and
services, both real-time and non–real-time, running over a single network
infrastructure perform well. Real-time services have different requirements
than non–real-time services, and these will sometimes conflict.
Convergence is defined generally as the fusion of function within various
domains. Three important domains are the network, the service, and the
application. Network convergence combines traffic from previously
separate networks. Service convergence combines previously distinct
services into a single service. Application convergence combines separate
applications into a coordinated suite. Convergence promises greater
flexibility and freedom to users, but designers face tighter requirements and
additional constraints, and must understand network issues end-to-end,
rather than by function or location.
Real-time services are distinguished by having both a lower limit on
throughput and an upper limit on delay. The specific limits differ for
different services and applications. These requirements make it difficult to
deliver high quality real-time services over a best-effort network.
Flows associated with real-time services have different characteristics than
those for non–real-time. Real-time flows are made up of small, regularly
generated packets, and the flows tend to last a long time. Non–real-time
flows tend to contain large packets, are bursty in nature, and the flows tend
to be shorter. If more bandwidth is available, a non–real-time application
can take advantage of it to run faster.
Real-Time services have traditionally been engineered for user
performance. The end users of a service do not care how the service quality
is achieved, but only that they can do what they need to do without
difficulty. How well the service performs for the users determines their
Quality of Experience. For voice at least, QoE affects the bottom line.
Each service or application has its own set of QoE parameters and
associated requirements. These are determined through subjective studies
evaluating the contribution of individual variables. Understanding the
contributing factors allows us to determine a set of performance
requirements for each service to help ensure a given level of QoE in the
final network. The engineering performance requirements are expressed in
terms of objectively measurable parameters that have been shown to
correlate strongly with QoE. We can expect the best QoE in the final
product where QoE has been an integral part of the design process from its
inception. Pricing is never a factor for QoE.
For the present discussion, Quality of Service (QoS) refers to a set of
technologies (protocols and other techniques) that enable the efficient use
of network resources.


References
ITU-T Recommendation G.1010, End-user Multimedia QoS Categories,
Geneva: International Telecommunication Union Telecommunication
Standardization Sector (ITU-T), 2001.
ITU-T Recommendation G.114, One-way transmission time, Geneva: ITU-T,
2003.
ITU-T Recommendation G.107, The E-Model, a computational model for
use in transmission planning, Geneva: ITU-T, 1998.
ITU-T Recommendation P.800, Methods for subjective determination of
transmission quality, Geneva: ITU-T, 1996.


Chapter 3
Voice Quality
Leigh Thorpe

[Figure 3-1 diagram: protocol stack viewed from the real-time application
perspective. Voice, audio, and video codecs sit over RTP and RTCP, alongside
session control (SIP, H.323, RTSP) and gateway control (H.248 / MGCP / NCS).
These run over UDP / TCP / SCTP and IP, with QoS and resiliency functions
spanning MPLS, ATM (AAL1/2, AAL5), FR, Ethernet, and DOCSIS, and access and
transport over xDSL, cellular, fiber, copper, WLAN, HFC, and SONET / TDM.]

Figure 3-1: Transport path diagram

Concepts covered
Characteristics of conversational voice services (“voice”)
Steps involved in transporting voice over an IP network
The main factors affecting VoIP conversation quality
Effects of delay and jitter
Effects of packet loss, and its mitigation using packet loss
concealment
Effects of echo, and its mitigation through control of signal level and
echo cancellation
Quality metrics including MOS, PESQ, and E-Model R


Introduction
Everyone uses the telephone every day and has well established
expectations about how it should work. What users may not realize is that
the standard voice call consists of two narrow-band (300-3400 Hz) sound
channels, one in each direction, and that these operate independently. This
means that even if the two users talk simultaneously, each will be heard at
the other end. This operating mode is called full-duplex. The situation is
more complex for advanced features like handsfree (speakerphone) or
conferencing, but for now we will limit our discussion to the simple desk-
to-desk handset call.
Traditional voice networks have provided very high quality voice, setting a
high benchmark for IP voice services. In this chapter, we will discuss the
main factors that contribute to voice quality on converged networks. Voice
is a demanding real-time application. How will typical IP network behavior
affect Quality of Experience (QoE) for conversational voice?1 What are
the challenges we face in making voice meet user expectations, and what
can be done to ensure we meet them?

Voice calls through an IP network


Speech traveling through an IP network undergoes a number of
transformations on its way from the talker's mouth to the listener's ear.
Figure 3-2 shows the steps involved in transmitting a simple call from an
IP terminal through a gateway to an ordinary phone at the other end. The
upper row of boxes in the diagram shows the path taking speech from left
to right, and the lower row shows the path from right to left. On the upper
path, the user's voice is picked up on the microphone (Point A) and carried
to an analog-to-digital converter (A/D, shown at B), which is part of the IP
telephone electronics. The A/D encodes the voice into a synchronous
digital bit stream (eight-bit G.711, 64 kb/s). The G.711 bit stream passes
through an Echo Canceller (ECAN)2 and perhaps through the encoder of a
low bit-rate codec (this is optional). The output of the encoder (or the
G.711 bits, if no compression codec is used) at C is chunked into packets,
and headers are added. The packets are sent across the network by edge and

1. The subjective quality of a voice call is based on many parameters, some of which involve how the
output sounds (for example, level, distortion) and some of which involve the conversation dynamics
(for example, echo, round-trip delay). For the purposes of the present discussion, these are combined
under the designation voice quality.
2. An echo canceller is used in this example, but it is not the only option. Other methods of echo control
may be used to control local echo in the phone. This is discussed in “Echo control for VoIP”.


network routers. The signal may pass through several routing nodes before
reaching the packet receive side.
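
The packetization step can be made concrete with a little arithmetic. The
sketch below is illustrative only. It assumes G.711 sampled at 8,000
samples per second with eight bits per sample (which gives the 64 kb/s
quoted above) and uncompressed IPv4, UDP, and RTP headers of 20, 8, and 12
bytes. The function names and the choice of RTP over UDP over IPv4 are
assumptions made for the example, and link-layer overhead is ignored.

```python
# Illustrative G.711 packetization arithmetic (header sizes are assumptions).
SAMPLE_RATE_HZ = 8000        # G.711 sampling rate
BITS_PER_SAMPLE = 8          # eight-bit G.711
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP, no header compression


def payload_bytes(packet_ms):
    """Bytes of G.711 speech carried in one packet of packet_ms milliseconds."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * packet_ms // (8 * 1000)


def ip_rate_kbps(packet_ms):
    """IP-layer bit rate of one voice direction, headers included."""
    packets_per_second = 1000 / packet_ms
    bits_per_packet = (payload_bytes(packet_ms) + HEADER_BYTES) * 8
    return bits_per_packet * packets_per_second / 1000


if __name__ == "__main__":
    for ms in (10, 20, 30):
        print(f"{ms} ms packets: {payload_bytes(ms)} payload bytes, "
              f"{ip_rate_kbps(ms):.1f} kb/s at the IP layer")
```

Under these assumptions, shorter packets cost more bandwidth because the
same 40 header bytes are spread over less speech, while longer packets
save bandwidth at the price of more speech accumulated per packet.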

[Figure 3-2 block diagram: reference points A through G run from the
terminal, across the packet transport and core network, through the
gateway, to the TDM network. Upper path (left to right): A/D (G.711), ECAN,
encoder with SAD, packetization, routers, jitter buffer, decoder with PLC,
ECAN, loss pad, TDM. Lower path (right to left): encoder with SAD,
packetization, jitter buffer, decoder with PLC, D/A (G.711). The terminal
and the gateway each have a synchronous (non-packet) side and a packet
side.]

Figure 3-2: Block diagram of the processes making up the VoIP voice path

In this example, the packet receive side is a media gateway. There, the
packets are delivered and stored in a buffer called the jitter buffer. The jitter
buffer applies a short delay to the data to ensure that a steady stream of data
can be sent to the decoder. Packets are unbundled (D), and the data
reassembled into a synchronous stream. If compression was used, the
signal is decoded. The output of this process is a G.711 bit stream (E),
which is handed over to the TDM network. The echo canceller shown is a
network canceller, and it is essential at the interface between an IP network
and a TDM network where analog access lines may be in use. A loss pad
(F) at the output may be needed to match the loss plan of the packet
network to that of the TDM network. When the G.711 stream reaches the
end of the TDM network (detail not shown), it is converted back to analog,
and the analog signal travels over the local line to the telephone at the far
end (G).
Certain special features are included in the diagram. Speech Activity
Detection (SAD) indicates a silence suppression function, which may be
used to determine whether the data contains speech, which is sent across
the packet network, or only silence, which is not sent. PLC refers to Packet
Loss Concealment. PLC is a process by which the output G.711 bit stream
is repaired to smooth over the gaps left by any missing data.
Note that all Digital Signal Processing (DSP) components (codecs, echo
cancellers, silence suppression, and packet loss concealment) are situated


in the synchronous (non-packet) portion of the path. The speech data are
not read or modified in the packet portion of the network.
Although Figure 3-2 shows a particular connection from an IP phone to a
conventional phone, other connections (for example, IP-to-IP, wireless-to-
wireless over IP) are similar. Only minor changes to the diagram would be
needed to describe most alternate VoIP scenarios.

Factors affecting VoIP conversation quality


Packet networks introduce potential degradation to the voice channel. This
section describes how these impairments affect voice quality. In subsequent
chapters, principally Chapter 20, we will discuss how to control or avoid
these impairments to achieve high-quality voice performance.
Sources of impairment can be classified into two distinct groups: (1)
intrinsic or noncontrollable and (2) controllable. Figure 3-3 indicates where
many of these impairments are found in a converged IP network.
Impairments are termed controllable if the network architect, the box
designer, or the network operator can make choices that increase or
decrease the impairment. On the other hand, noncontrollable impairments
are those where the design of the equipment or network has no, or very
limited, influence. Propagation delay due to distance (governed by physics)
is an example of a noncontrollable impairment. Delay through legacy
equipment such as TDM end offices is also beyond the control of the IP
network engineers.
Parameters identified as controllable in Figure 3-3 are not always
controllable by the network operator. In many cases, the choices have been
made by the equipment vendor. Processor speed and capacity, use of
buffering, and many other design factors are chosen by the vendor. The
vendor's design will also determine what setup options are available, such
as the codecs available (for example, G.711, G.726, G.729), packetization
options (for example, 10, 20, or 30 ms), silence suppression, built-in echo
cancellation, and so on. Vendors who give a high priority to end-user


Quality of Experience (see Chapter 2) will offer choices in their setup
options that allow the optimization of voice quality.

[Figure 3-3 diagram: an IP phone on the enterprise access network connects
through an L2 switch, edge and core routers, and a media gateway (MG) to
the PSTN and a POTS phone. Impairments labeled along the path include
codec, packet size, voice/data access link speed, source jitter, queue
size, voice/data loading, network jitter, transmission delay,
processing/switching delay, propagation delay, packet loss, jitter buffer,
transcoding, and TDM handoff. A legend distinguishes controllable from
noncontrollable parameters.]

Figure 3-3: Controllable and noncontrollable impairments introduced by
packet networks

There are four main performance factors, or impairments, that are
important for VoIP voice quality. These are the speech codec used, delay,
packet loss, and echo. A fifth aspect, signal level, is not affected by IP
transport, but it is important to establish proper settings at the point where
an IP network connects to another type of network. None of these
impairments are specific to VoIP; they all exist in some form in traditional
networks, both wireline and wireless. In traditional wireline networks, they
are under careful engineering control. Traditional wireless is less strictly
controlled, mainly because of delay and data loss on the RF channel, but
users are willing to trade off some voice quality for the convenience of
mobility. The impairments are introduced below, and are considered in
more detail in the following sections.

Speech codec
The speech codec chosen will have a strong influence on the final obtained
quality, both because of the baseline quality of the codec (that is, the
quality of the codec without other impairments) as well as the response of
the codec to other factors, such as presence of background noise, packet
loss, and transcoding with itself or another codec. The choice of codec is an
important determinant of the overall performance of VoIP. See Chapter 5
for additional discussion of the contribution of various telephony speech
codecs to VoIP service quality.

End-to-end delay
The end-to-end delay of a voice signal is the time taken for the sound to
enter the transmitter at one end of the call, be encoded into a digital signal,
travel through the network, and be regenerated by the receiver at the other


end. Delay is sometimes called latency. When delay is too long, it may
cause disruptions in conversation dynamics. As well, increasing delay
makes echo more noticeable.
Variation in delay, caused by differences in the time taken for packets to
cross the network, is called jitter. Jitter is a concern because the decoding
of the digital signal is a synchronous process and must proceed at the same
constant pace that was used during encoding. The data must be fed to the
decoder at a constant rate. Variation in packet arrival times is smoothed out
by the jitter buffer, which adds to the end-to-end (mouth-to-ear) delay.
Jitter is not considered a separate impairment because the effects of jitter in
the packet network are realized in the output either as delay or as distortion
from packet loss.

Packet loss
In VoIP, packets sometimes get lost. Packets may be dropped during their
journey across the network, or more commonly, they are late in arriving at
the destination and miss their turn to be played out. The missing
information degrades the voice quality, and a Packet Loss Concealment
(PLC) algorithm may be needed to smooth over the gaps in the signal.

Echo control
Because of the longer delay introduced by VoIP, echo control is a major
concern. A given level of echo sounds much worse when the delay is
longer. Echo control at the appropriate places in the connection will protect
the users at both ends. Echo control relies on the correct signal levels (see
Signal Level, below) as well as on echo cancellers and other techniques
that prevent or remove echo from the connection.

Signal level
The level or amplitude of the transmitted speech signal is determined by
amplitude gains and loss across the network. There are a number of
contributors to the final signal level, and most are defined in the loss plan
(sometimes called the loss/level plan) of the network. The loss plan for
TDM ensures that the output speech is heard at the proper level and
contributes to the control of echo. The loss plan for VoIP is reasonably
simple; the sensitivities of the sending device (say, an IP phone) and the
receiving device (say, a media gateway) are defined by standards, and there
is no gain or loss in the packet portion of the network.


Things are more complicated when a packet network is connected to
another network with a different loss plan. When the other network is a
traditional network with analog access, it may be necessary to adjust the
level of each signal path (the signal sent to the other network and the signal
coming from the other network) to account for the loss plan of that
network. The required loss for each path must be determined and set
accordingly. Errors in the loss settings can cause incorrect speech level or
audible echo at one or both ends of a connection.

Pulling it all together


Making choices for all these characteristics is called transmission planning.
Voice transmission planning is essential to supporting a usable
conversation connection, and is especially important in the migration from
TDM to converged networks. Most VoIP transmission planning issues
should have been addressed in the design of the network architecture and
equipment. There are a number of standards that define limits on these
factors, and conformance to these standards is an important selection
criterion for VoIP network equipment. However, a solid understanding of
the impairments and their control will protect against poor setup choices,
slips, or neglect that can defeat the transmission plan and result in
unacceptable voice quality.

Delay & jitter

Impairment from delay


Delay in the voice path destroys simultaneity at the two ends of the
conversation, disrupting turn-taking, and introducing subtle changes to
interpretation of meaning. With longer delays, simultaneous starts and
awkward silences occur. It becomes difficult to interrupt gracefully, and
attempts to do so may appear especially impolite because of the difference
in perceived timing at the two ends of the call. Delay can even affect one
party's perception of the attentiveness, honesty, or intelligence of the other,
without either party being aware that there is an objective cause. This can
cause significant difficulties in the business environment, where sensitive
discussion and negotiation may be involved, and when callers may not be
familiar with one another and must rely on their immediate impressions.
ITU Rec G.114 provides guidance on the range of delay for acceptable
service quality. G.114 suggests that delay be kept below about 150 ms to
avoid noticeable impairment; this will provide essentially transparent
interactivity for voice and multimedia applications where conversational
voice is a component. Delay may range up to 400 ms, but degradation may
be apparent. Figure 3-4 shows the impairment associated with increasing
one-way delay in terms of a metric called Transmission Rating (R).


[Figure 3-4 plot: Transmission Rating R on the vertical axis (50 to 100)
against one-way, mouth-to-ear delay in ms on the horizontal axis (0 to 500),
with annotated regions of increasing impairment from delay.]

Figure 3-4: Degradation from delay, shown as decrease in R with increase in
one-way, end-to-end delay. Degradation shows up as impairment to
conversation quality. The relation shown is that used by the ITU E-Model
(G.107). R shown has all other factors at ideal values. (After G.114.)

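
The shape of the delay curve can be approximated in a few lines. The sketch
below is a hedged illustration, not a substitute for the Recommendation: it
uses the pure-delay impairment term of the E-Model (commonly written Idd)
as it is usually quoted from G.107, and it assumes a baseline rating of
93.2 when every other factor is at its default value. Both the constant
and the function names are assumptions for this example and should be
checked against the edition of G.107 in use. Echo-related delay terms are
assumed to be zero.

```python
import math

# Assumed baseline R with all other factors at default values (check G.107).
R_BASELINE = 93.2


def delay_impairment(one_way_ms):
    """Pure-delay impairment (Idd) for a one-way delay in milliseconds."""
    if one_way_ms <= 100:
        return 0.0
    x = math.log2(one_way_ms / 100)
    return 25 * ((1 + x ** 6) ** (1 / 6)
                 - 3 * (1 + (x / 3) ** 6) ** (1 / 6)
                 + 2)


def rating(one_way_ms):
    """Approximate R when delay is the only impairment present."""
    return R_BASELINE - delay_impairment(one_way_ms)


if __name__ == "__main__":
    for delay in (0, 100, 200, 300, 400, 500):
        print(f"{delay:3d} ms one-way delay: R is about {rating(delay):.1f}")
```

The impairment is negligible for short delays and grows steadily once
delay passes a few hundred milliseconds, which is the behavior the figure
illustrates.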
While individual contributions to delay (ten milliseconds here, five
milliseconds there) sound small, it is important to remember that once
delay is added in, it can not be removed. When making choices that involve
trade-offs, including delay, be sure that you are getting value for any delay
that is added by network setup choices. Good quality VoIP equipment will
minimize delay associated with the equipment design and will allow you to
select setup options that minimize additional delay. Mobile/cellular already
has substantial delay, so VoIP networks with cellular access require a
carefully engineered delay budget and close adherence to other engineering
guidelines.

Jitter and the jitter buffer


In IP networks, jitter is the variation in the time-of-arrival of consecutive
packets. Jitter results from a momentary condition where more packets are
vying to get on a particular link than the link can carry away.
Jitter must be removed from VoIP data so the packet payload can be
converted to a synchronous stream. A buffer, called the jitter buffer, is used
to hold the packets so that a constant rate bit stream can be output. Two
values3 are needed to describe the jitter buffer: first, the amount of data it
can hold and second, the waiting time imposed before the data is sent to the


decoder. The wait time is an important network tuning variable, since it
determines the arrival “deadline” for packets if their data is to be played out
in the bit stream. Longer waiting time will result in fewer late packets.
However, the wait time adds to the end-to-end delay. For best results, the
wait time should be only as long as it needs to be, and no longer.
The capacity of the jitter buffer depends on the amount of memory
allocated; this is set by the equipment vendor. The waiting time (jitter
buffer delay) may be fixed, provisionable, or adaptive. If the jitter buffer
waiting time was fixed by the equipment design, voice quality will be sub-
optimal unless the jitter on the network just happens to match the jitter
buffer setting (not very likely!). A provisionable jitter buffer allows the
selection of a setting that matches the amount of jitter generally
experienced on the network. If jitter occasionally exceeds the wait time,
packet loss will result. An adaptive buffer is best, since the adaptation will
keep delay low during periods of low jitter, but will increase the waiting
time when increased jitter is experienced.
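
One way to picture an adaptive jitter buffer is as a running jitter
estimate that drives the wait time. The sketch below is a minimal
illustration under stated assumptions, not a description of any vendor's
implementation. The smoothing step follows the interarrival jitter
estimator defined for RTP in RFC 3550; the class name, the multiplier of
three, and the 10 ms and 120 ms limits are hypothetical values chosen for
the example.

```python
class AdaptiveWait:
    """Hypothetical adaptive wait-time estimator for a jitter buffer."""

    def __init__(self, min_ms=10.0, max_ms=120.0, multiplier=3.0):
        self.min_ms = min_ms          # floor on the wait time (assumed)
        self.max_ms = max_ms          # ceiling on the wait time (assumed)
        self.multiplier = multiplier  # safety margin over the estimate
        self.jitter_ms = 0.0
        self._last_transit = None

    def update(self, sent_ms, received_ms):
        """Fold one packet into the jitter estimate; return the new wait."""
        transit = received_ms - sent_ms
        if self._last_transit is not None:
            change = abs(transit - self._last_transit)
            # RFC 3550 style smoothing: move 1/16 of the way to the new value.
            self.jitter_ms += (change - self.jitter_ms) / 16.0
        self._last_transit = transit
        return self.wait_ms()

    def wait_ms(self):
        """Wait time: a multiple of the estimate, clamped to the limits."""
        target = self.multiplier * self.jitter_ms
        return min(self.max_ms, max(self.min_ms, target))


def run(estimator, transits_ms, packet_ms=20):
    """Feed packets sent every packet_ms with the given transit times."""
    wait = estimator.wait_ms()
    for i, transit in enumerate(transits_ms):
        sent = i * packet_ms
        wait = estimator.update(sent, sent + transit)
    return wait


if __name__ == "__main__":
    print("quiet network wait (ms):", run(AdaptiveWait(), [50] * 200))
    print("jittery network wait (ms):", run(AdaptiveWait(), [50, 90] * 100))
```

With constant transit time the wait stays at its floor, keeping delay low;
when transit times swing between 50 and 90 ms the estimate rises and the
wait time grows toward its ceiling.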
The amount of jitter in the network is governed by:
Network loading, that is, the volume of packets being handled by
the network nodes (switches, routers)
The scheduling strategy and priorities of classes of service
The link speed
Voice and data packet sizes
Maximum queue size
Use of silence suppression and/or down-speeding
Contention for link bandwidth by multiple voice calls sharing the same
priority causes some packets to sit in the queue until the processor can get
to them. Voice packets may also have to wait if a data packet has already
started transmission, even though they have a higher priority. Timing drift
can also add to jitter.
In networks carrying voice and data, jitter can be significantly reduced by
strict priority scheduling and proper load balancing. This will prevent
individual nodes from being over-subscribed. This is discussed further in
Appendix C. Controlling jitter means that jitter buffer wait time can be
lower, reducing end-to-end delay, and that the network will experience
fewer (or no) congestion events that may result in jitter buffer underflow
(speech gaps) or overflow (packet loss).

3. Jitter buffer for VoIP was described with a single value (delay in ms), and the recommended size
was twice the packet size. This formulation does not adequately specify either the buffer capacity
(which may need to be higher than two packets to prevent packet loss through buffer overflow
following a congestion event) or the wait time (which should be much lower where jitter behavior
allows).


Sources of delay
The end-to-end delay is the total of all delays incurred in the voice path.
The principal sources of delay are summarized in Table 3-1. The four main
categories of delay are shown in the left-hand column.
Processing delay is an inevitable part of VoIP. Voice packet payloads
contain speech associated with a chunk of time, and the system must wait
for that speech to accumulate before it can be put in a packet. The packet
can not be loaded and sent until all the speech for that chunk is collected.
Where speech compression is used, the time needed for coding is added as
well. The speed of any processors (DSP, CPU) involved also contributes to
the final delay.
Serialization delay (the time needed to push a packet onto the wire) is a
small but predictable contribution. It is determined by the channel speed
(bits/sec) and the number of bits in the packet. On high speed links (> T1)
serialization delay becomes negligible compared to other sources of delay.
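
Serialization delay is simply the number of bits in the packet divided by
the link speed. The sketch below works a few cases. The packet sizes are
assumptions for illustration: 200 bytes corresponds to 20 ms of G.711 plus
40 bytes of IP, UDP, and RTP headers, and 1500 bytes stands in for a large
data packet. The T1 rate is taken as 1.544 Mb/s.

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time in milliseconds to clock one packet onto a link."""
    return packet_bytes * 8 / link_bps * 1000


LINKS_BPS = {
    "128 kb/s access": 128_000,
    "T1 (1.544 Mb/s)": 1_544_000,
    "100 Mb/s Ethernet": 100_000_000,
}

if __name__ == "__main__":
    for name, bps in LINKS_BPS.items():
        voice = serialization_delay_ms(200, bps)   # assumed voice packet
        data = serialization_delay_ms(1500, bps)   # assumed data packet
        print(f"{name}: voice {voice:.3f} ms, data {data:.3f} ms")
```

The data-packet figure also bounds how long a voice packet may wait behind
a data packet that has already started transmission, which is why slow
access links deserve particular attention.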
Queuing delay accumulates at network nodes (routers and switches)
across the network. Congestion can increase packet waiting times in
buffers. Variation in queuing and buffering delays in the network account
for most of the variation in packet transport times (that is, jitter). The jitter
buffer wait time is another instance of queuing delay.
Propagation delay is the time taken for the signal to travel through a cable
or fiber. In the conventional public network, propagation delay is the
largest contributor to end-to-end delay. For international calls, propagation
delay through terrestrial circuits can exceed 100 ms, so it remains an
important contributor to VoIP delay.
Propagation delay across a fixed distance is not a controllable parameter,
since it is determined by the speed of the signal through the transmission
medium (usually light through a fiber). However, it is possible to ensure
that packets take the most direct route through the network to minimize
queuing and propagation delay. Note that where the shortest route is
congested, queuing delays on that route may exceed the additional time
needed to take an alternate route.
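
The four categories of delay can be added up into a simple budget. The
sketch below is a rough planning aid under stated assumptions: light is
taken to travel through fiber at about 200 km per millisecond (roughly
two-thirds of its speed in a vacuum), and the component values in the
example are hypothetical numbers chosen only to show the arithmetic.

```python
FIBER_KM_PER_MS = 200.0  # approximate signal speed in fiber (assumption)


def propagation_delay_ms(route_km):
    """One-way propagation delay over a fiber route of route_km kilometers."""
    return route_km / FIBER_KM_PER_MS


def end_to_end_delay_ms(packetization_ms, processing_ms, serialization_ms,
                        queuing_ms, jitter_wait_ms, route_km):
    """Sum of the delay contributions along the voice path."""
    return (packetization_ms + processing_ms + serialization_ms
            + queuing_ms + jitter_wait_ms + propagation_delay_ms(route_km))


if __name__ == "__main__":
    total = end_to_end_delay_ms(
        packetization_ms=20,   # speech accumulated per packet
        processing_ms=5,       # codec and DSP/CPU time (hypothetical)
        serialization_ms=1,    # all links combined (hypothetical)
        queuing_ms=5,          # router queues (hypothetical)
        jitter_wait_ms=40,     # jitter buffer wait time (hypothetical)
        route_km=6000,         # fiber route length (hypothetical)
    )
    print(f"one-way delay is about {total:.0f} ms")
```

Laying the budget out this way makes it easy to see which contributions
are fixed by distance and which can be reduced by setup choices.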


Delay Type      Delay Sources                     Mitigation techniques

Processing      Packetization/depacketization     Select smaller packet size
                DSP/CPU processing                Select faster DSP/CPU
                Speech compression (if used)      Select waveform codec or codec
                                                  with smaller frame
Serialization   Time to push a packet into the    Increase link speed or reduce
                wire/fiber                        packet size
Queuing         Queuing delay in routers          Load balancing, QoS mechanisms
                                                  and traffic management,
                                                  optimized buffer size
                Interleaving, scheduling/polling  System architecture and traffic
                                                  loading
Propagation     Distance                          Noncontrollable

Table 3-1: Sources of delay in IP networks

Distortion
The remaining VoIP impairments to the conversation quality are different
types of distortion. These are summarized in Table 3-2. Codecs are
included in the table, but the details are discussed in
Chapter 5. Signal level is included with echo, since signal level through the
network plays an important part in the control of audible echo.


Distortion Type  Distortion Sources               Mitigation techniques

Codec            Coding distortion                Appropriate codec selection
                 Transcoding/tandeming            Packet handoff between networks;
                                                  transcoder-free operation
Packet loss      Missing data from late/lost      Load balancing and traffic
                 packets                          management
                 Packet Loss Concealment (PLC)
                 artifacts
                 Lost data from statistical       Tuning of silence suppression
                 oversubscription with silence    parameters appropriate for number
                 suppression                      of channels & link speed
Sound level      Inappropriate loss settings or   Delay & loss planning
                 terminal loudness ratings
Echo level       Insufficient echo control;       Careful planning for echo control
                 artifacts from echo control
                 methods (suppressor, canceller)

Table 3-2: Sources of distortion in IP networks

Packet loss
Packet loss can be a significant source of distortion to VoIP. Lost packets
create gaps in the voice data, which can result in clicks, muting, and
artifacts associated with attempts at smoothing and repair. Non–real-time
data transmission is robust to packet loss because packets can be resent.
Delay-sensitive applications such as interactive voice can not wait for the
time it takes to resend.
Generally, there are two ways that packets can be “lost.” The first way is
that some packets never make it to the destination. They may be lost at
network nodes either through a buffer overflow at a congested network
node (insufficient memory to store packets waiting for forwarding), or
because a congested router deliberately discarded them to reduce packet
load. These packets are truly gone, and will never arrive at the destination.
Disabled devices or fiber cuts can also result in lost packets, until the
network responds by establishing an alternate route. Packets lost in these
ways will be spread across all the flows being handled at the time, so losses
on individual channels are likely to be small.


The second type of loss is that packets arrive too late. Queuing and other
network delays can cause variability in packet arrival time at the receiving
end. The jitter buffer smooths out the variability by holding packets for a
fixed wait time relative to the expected arrival time before they are sent to
the decoder. The jitter buffer waiting time determines the longest time that
a packet can take to arrive. Packets delayed longer than this lose their turn
in line, and are as good as lost since the voice playout can not wait for the
late data to show up. The total packet loss will be a combination of losses
from these two sources.
When significant congestion occurs at a transmission node, packets may be
held up long enough that the decoder uses up all the data waiting in the
jitter buffer, resulting in underflow. When the congestion clears, the several
packets that have been backed up are forwarded quickly, one after another,
with the possibility that there may be more packets arriving at the jitter
buffer than the memory can hold. When this happens, packets are lost
because there is no room to store them (jitter buffer overflow).

Concealing missing data


Packet loss creates gaps in the speech data, and these cause clicks or other
artifacts in the output speech. Packet Loss Concealment (PLC) consists of
processing to remove clicks and fill in the gaps. Techniques range from
very simple ones requiring little processing power to complex methods that
can restore the quality to almost the equivalent of the original speech. Even
the best techniques can not repair speech gaps of more than about 60–80
ms. Where a burst of loss exceeds this range, the PLC will mute the output.
This can result in temporal clipping and missing words.
A very basic PLC strategy is simply to smooth the edges of the gaps to
eliminate audible clicks. More sophisticated techniques create synthesized
replacement speech that preserves the spectral characteristics of the talker's
voice, and maintain a smooth transition between the estimated signal and
the surrounding original. The most advanced techniques begin with a fast-
response adaptation of the jitter buffer wait time, which helps reduce the
number of packets that are lost. This keeps late packets to a minimum,
reducing the number of gaps to be smoothed. Additional discussion of PLC
techniques can be found in Chapter 5.
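
A very simple concealment scheme can be sketched as follows. This is an
assumption-laden illustration rather than any standardized PLC algorithm:
it repeats the last good frame with a fading gain, which is one common
low-complexity approach, and mutes the output once the gap exceeds a limit
(60 ms here, the lower end of the range mentioned above). The function
name, the fade factor, and the frame sizes are hypothetical.

```python
def conceal(frames, frame_ms=10, samples_per_frame=80,
            max_conceal_ms=60, fade=0.5):
    """Fill lost frames (None) by repeating the last good frame with fading.

    frames is a list of sample lists; None marks a frame lost in transit.
    """
    output = []
    last_good = None
    gap_ms = 0
    for frame in frames:
        if frame is not None:
            last_good = frame
            gap_ms = 0
            output.append(list(frame))
            continue
        gap_ms += frame_ms
        if last_good is None or gap_ms > max_conceal_ms:
            # Nothing to repeat, or the gap is too long to repair: mute.
            output.append([0] * samples_per_frame)
        else:
            # Each consecutive lost frame is played at a lower gain.
            gain = fade ** (gap_ms // frame_ms)
            output.append([int(sample * gain) for sample in last_good])
    return output


if __name__ == "__main__":
    stream = [[100, -100], None, None, [40, 40]]
    print(conceal(stream, samples_per_frame=2))
```

Real concealment algorithms go well beyond this, for example by
synthesizing replacement speech from the pitch and spectrum of the
surrounding signal.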
IP Telephony requires backward compatibility regarding voice band data
(analog FAX and modem) calls that are routinely carried on TDM. These
services are particularly susceptible to packet loss, and PLC techniques are
ineffective at repairing voice band data signals. Packet loss rates between
10⁻⁶ and 10⁻³ show increasing impairment to voice band data performance.


FAX is generally affected during the FAX handshake, where a lost packet
may result in a failed call attempt. Modem calls can have similar setup
difficulties, and may be subject to data rate downspeeding or call drop if
packet loss is encountered during data transfer.

Controlling echo

Sources of echo and measuring echo impairment


The full-duplex characteristic of voice telephone connections leads to echo.
Because both channels are open at the same time, a signal sent over the call
in one direction can leak back onto the return path. (Only analog signals
produce echo; if digital signals happen to leak between send and receive,
they will add noise to the bit stream but are not heard as echo.) Because the
connection is full-duplex, echo is always present. When properly
controlled, however, it is below the user's threshold of perception.
Echo impairs a telephone conversation when it becomes loud enough to
irritate or distract the talker. Because echo is highly correlated with the
talker's speech, it is difficult to ignore. Longer delay echo (especially
louder echo) may interfere with one's ability to speak, because of disrupted
timing of auditory feedback to speech areas of the brain. When users at
both ends of the connection are talking at once, echo can mix with nonecho
speech and reduce intelligibility. Consequently, it is essential to control
echo to achieve good Quality of Experience on voice calls.
Figure 3-5 provides a conceptual illustration of two types of echo heard on
voice calls. The first type is called talker echo (Figure 3-5, upper panel),
because in this case the talker hears his own voice returning. The second
type is called listener echo. This occurs when talker echo is reflected again
onto the sending path, and the listener hears a second, delayed instance of
the talker's speech (Figure 3-5, lower panel). Controlling talker echo
usually controls listener echo also, so our discussion will focus on talker
echo.
Specific sources of echo are electrical reflections in analog access
equipment and acoustic pick-up (coupling) between the receiver and the
transmitter. (See Sidebar for details on echo in analog access circuits.) As
noted in the sidebar, analog access lines, which remain the most common
access type in traditional telephone networks, are the most frequent source
of echo. Currently, almost all residential telephone lines and many business
lines use an analog loop to connect to the telephone company equipment.


[Figure 3-5, upper panel (Talker Echo): talker A's voice travels to listener
B and returns to A, delayed (delay > 5 ms). Lower panel (Listener Echo): the
talker echo is reflected again toward B, who hears A's voice followed by a
delayed copy.]
Figure 3-5: The type of echo is named after who hears the echo.
Because echo cannot be generated in the digital portion of the path, the
only sources of echo in all-digital networks such as ISDN, cellular, and
packet networks (for example, IP, ATM, frame relay) are audio and
acoustic coupling in the end device. These echoes are best controlled in the
end device itself, and TIA-810-A gives requirements for the maximum
allowable coupling in an end device (TCLw), which applies to any
handsets, headsets, and speakerphones used on wireline digital networks,
including IP.
The degree to which echo impairs a conversation depends on two main
factors: the level (loudness) of the echo and the time it takes the echo to
come back (delay). Other sound (such as the talker's own voice, the far
user's voice, room noise, circuit noise) may mask the echo and change the
threshold of audibility. Figure 3-6 shows the quantitative relationship
between the level of the echo (measured in dB TELR) and delay (expressed
as the mean one-way delay of the echo path), for a talker in a quiet location.
The echo delay refers to the delay between the talker and the reflection
point. Since the reflection point is usually a hybrid in the access circuit at
the other end of the call, the echo path delay is typically the same as the
end-to-end delay.
Talker Echo Loudness Rating (TELR) is a measure of how much
attenuation is applied to an echo along the echo path, weighted for the
perceptibility of the frequency components of the echo. TELR accounts for
all gains and losses in the echo path (including those supplied by an echo
canceller or echo suppressor). The computation of TELR also takes into
account the sensitivity of human hearing to the sound frequencies making
up the echo. TELR measures the loss (attenuation) of an echo rather than
its absolute level, and is thus independent of the level of the talker's voice.
This means that a single TELR requirement applies to all talker levels.
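Because TELR is a loss rather than a level, it can be sketched as a simple sum in dB. The example below assumes the decomposition commonly used in ITU-T planning practice, in which TELR is the send loudness rating plus the receive loudness rating plus the net loss around the echo path; the function names and the numeric values are illustrative assumptions, and the frequency weighting mentioned above is taken to be already included in the inputs.

```python
from typing import Sequence


def telr_db(slr_db: float, rlr_db: float, echo_path_db: Sequence[float]) -> float:
    """Sum the loudness ratings and every loss (+) or gain (-) on the echo path."""
    return slr_db + rlr_db + sum(echo_path_db)


def echo_level_db(talker_level_db: float, telr: float) -> float:
    """Echo level returned to the talker: the talker level reduced by TELR."""
    return talker_level_db - telr


# Illustrative values only: SLR 8 dB, RLR 2 dB, 20 dB of hybrid loss,
# and 35 dB of additional loss from an echo canceller.
example_telr = telr_db(8.0, 2.0, [20.0, 35.0])  # 65.0 dB
```

A talker who speaks 10 dB louder hears an echo that is also 10 dB louder, but the TELR is unchanged, which is why one requirement can cover all talker levels.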


The color code in Figure 3-6 reflects the audibility and annoyance of echo.
There is no audible echo for TELR/delay combinations falling in the green
region. The contour between green and yellow shows the average threshold
of echo audibility, where echo is “just noticeable.” A given level of echo is
more easily detected at longer delays. Therefore, the “just noticeable” echo
is progressively quieter with increasing delay, that is, the TELR gets higher.
The yellow region represents combinations of TELR/delay for which the
echo is noticeable but is not loud enough to be annoying. TELR/delay
combinations where echo is loud enough to be irritating or annoying fall in
the red region. Limits on TELR are defined in terms of subjective
acceptability. Maintaining adequate TELR for the maximum expected
delay will ensure acceptable echo performance.

Figure 3-6: This graph shows the limit of audibility (green/yellow contour)
and the limit of acceptability (yellow/red contour) of echo as a function of
delay. The x-axis gives the one-way delay on the echo path, while the y-axis
gives the level of echo measured in dB TELR. (Higher TELR denotes quieter
echo.) Also shown are the positions of common types of telephone
connections, with an indication of the improvement associated with adding
echo cancellation to the call. These contours are taken from ITU Rec. G.131,
and are based on subjective ratings of echo in telephone conversations.


Sidebar: What causes echo in telephone connections?


Most telephone sets in the existing networks are analog. Analog
telephones are connected to the telephone company's local switch through
a two-wire line that terminates at a hybrid. The hybrid is a dual
transformer in a bridge arrangement that splits the send and receive
signal paths so that they can ride on separate channels in the network
core. The separated signals are digitized and multiplexed in the core.
The figure below shows the circuit between the telephone and the hybrid
at the switch. The analog access line is said to be “two-wire,” and the
send and receive signals share the same pair of wires. The network core,
on the other hand, is “four-wire,” which means that the send and receive
signals travel through separate channels. A telephone handset is also a
four-wire device, and there is a similar hybrid circuit in the telephone
itself that breaks out the circuit into a path from the mouthpiece and a
path to the earpiece. Two-wire access lines were originally used because
of cost considerations.
When the two-wire signal reaches the hybrid, the signal is not
completely transferred from one side to the other. Some energy gets
reflected. The signal reflected on the four-wire side of the hybrid goes
back to the user at the other end of the connection and is heard as echo.
Of course, only a small amount of the total energy is reflected. This
means that the echo level is much lower than the original signal level.
Measurements of the echo level are made in terms of the attenuation or
loss relative to the level of the original speech signal in dB.
Hybrid echo is the main source of echo in the traditional telephone
network. Other common sources are pickup of received signal by the
transmitter (acoustic echo) and inductive coupling in the handset/headset
cord. The figure indicates where all these sources of echo occur in a
typical analog telephone circuit.

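[Figure: typical analog telephone circuit, showing the receiver and
transmitter (four-wire analog), the hybrid, and the A/D send path and D/A
receive path (four-wire digital), with echo sources marked at the hybrid and
at inductive coupling in the handset cord.]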


Echo control for VoIP


Given that most echo is generated in the traditional network, why is echo
such a big concern for VoIP? There are two reasons. First, IP networks have
long delay compared with traditional networks. As discussed above, the
longer the delay, the more audible a given level of echo will be. For calls
within the IP network, even low levels of acoustic coupling in the end
device can generate annoying levels of echo. Second, the delays in the IP
network are not compatible with the operational assumptions of the
traditional network. In traditional networks, echo control is applied to
reduce any echo below the threshold of audibility, which is determined from
the expected delays (see Figure 3-3). For calls made between an IP phone
and a traditional phone, the echo control applied in the traditional network
may not be sufficient. Additional attenuation is needed to compensate, so
that acceptable quality can be assured.
Echo control should be in place at the following points in an IP network:
• In all IP end devices, such as IP telephones or access gateways
  (such as a DSLAM), to remove echo from acoustic coupling and
  electronic cross talk. This echo control may be any of, or any
  combination of, the following:
    - Separation assured in the design of the device/component
    - Tuning of gains/losses in the voice path (loss plan, see section
      above on signal levels)
    - Echo suppression (voice switch, such as is often used in
      speakerphones)
    - Echo cancellation
• At the interface with any traditional network using analog access
  lines, to remove echo coming back from that network (mostly
  hybrid echo), using:
    - Echo cancellation, combined with the proper loss plan. If the
      loss plans of the two networks are not properly matched, it may
      affect the performance of the echo canceller.
Echo cancellation is not needed at the interface between an IP network and
another all-digital network (that is, where analog access is not in use), such
as digital cellular (for example, GSM, CDMA) or ISDN, provided the end
devices in both networks meet the echo control requirements of their
respective guiding standards. Of course, echo control will be needed if the
connection spans the adjacent digital network and reaches into a third
network where analog access is used, and it will typically be provided at
the interface to that third network.
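The placement rule for network echo cancellers can be expressed as a small check over the networks a call traverses. This is a minimal sketch under the assumptions stated in the text: every end device meets its own echo control requirements, and a canceller is placed at the interface to any network that uses analog access. The tuple layout and function name are hypothetical.

```python
from typing import List, Tuple

# (network name, True if the network uses analog access lines)
Network = Tuple[str, bool]


def ecan_interfaces(path: List[Network]) -> List[Tuple[str, str]]:
    """List the interfaces on a call path that need a network echo canceller.

    The path starts at the IP network. An interface needs a canceller when
    the network on its far side uses analog access.
    """
    needed = []
    for (near, _), (far, far_is_analog) in zip(path, path[1:]):
        if far_is_analog:
            needed.append((near, far))
    return needed
```

For a call from an IP network through a digital cellular network into an analog-access PSTN, the sketch returns only the cellular-to-PSTN interface, matching the case described above in which echo control is provided at the interface to the third network.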


Quality metrics for voice


Each of the characteristics discussed above (and other, conventional voice
impairments, such as noise and harmonic distortion) can be measured
individually. However, it is useful to have an overall indicator of voice
quality. Various metrics have been devised to quantify the overall perceived
voice quality of a component or a system. Three common metrics are
discussed below: the subjective measure called Mean Opinion Score
(MOS), an objective MOS estimator called PESQ (Perceptual Evaluation
of Speech Quality, pronounced “pesk”), and a computed metric called
Transmission Rating (R), which is calculated from objective measurements
of fifteen contributing parameters using an ITU standard tool called the E-
Model. Since most quality metrics are based on MOS in some way
(sometimes in name only), we'll start by looking at types of MOS before
looking at the details of the individual metrics.
Quality metrics are evaluated for individual calls. There is no single value
to describe a network: networks carry many calls, both simple and
complex, and the quality is determined by the access types, the transport
technology, the number of nodes the call passes through, the distance,
packet transport link speeds, and many other factors that differ from one
connection to another. To compare networks, specific connections
(reference connections) representing equivalent calling conditions are
defined that can be measured and compared.

Types of MOS
Mean Opinion Score began life as a subjective measure. Currently, it is
more often used to refer to one or another objective approximation of
subjective MOS. Although all “MOS” metrics are intended to quantify
QoE performance and they all look very similar (values between one and
five with one or two decimal places), the various metrics are not directly
comparable to one another. This can result in a fair amount of confusion,
since the particular metric used is almost never reported when “MOS”
values are cited. Appendix C provides more details on the different types
of MOS and how to tell them apart. The differences between the metrics are
fundamental: two numerical values are not comparable merely because both
are called MOS.

Subjective MOS
Subjective MOS is a direct measure of user perception of voice quality (or
some other quality of interest), and is thus a direct measure of QoE.
Subjective MOS is the mean (average) of ratings assigned by subjects to a
specific test case using methods described in ITU-T P.800 and P.830.
Subjective MOS can be obtained from listening tests (where people rate the
quality of recorded samples) or conversation tests (where people rate the
quality of experimental connections). Quality ratings are judged against a
five-point scale: Excellent (five), Good (four), Fair (three), Poor (two), and
Bad (one). MOS is computed by averaging all the ratings given to each test
case, and it falls somewhere between one and five.⁴ Higher MOS reflects
better perceived quality.
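The computation itself is a plain average, as the following sketch shows. The rounding rule follows the guidance in footnote 4 (one decimal place, or two when more than about fifty ratings are averaged); the function and constant names are assumptions made for the example.

```python
from statistics import mean
from typing import Sequence

# Five-point scale from the text
SCALE = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}


def subjective_mos(ratings: Sequence[str]) -> float:
    """Average the category ratings given to one test case."""
    scores = [SCALE[rating] for rating in ratings]
    # One decimal place is usually appropriate; allow two only when
    # more than about fifty ratings contribute to the mean.
    places = 2 if len(scores) > 50 else 1
    return round(mean(scores), places)
```

For example, two ratings of Good and one of Fair average to 3.7, while forty ratings of Good and twenty of Fair can reasonably be reported as 3.67.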
Mean Opinion Scores (MOS) are not a measure of acceptability. While
perceived quality contributes to acceptability, so do many other factors
such as cost and availability of alternative service.
Subjective MOS is strongly affected by the context of the experiment.⁵
There is no “correct” subjective MOS for any test case, process, or
connection. This is extremely inconvenient, since it means that it is not
possible to specify performance or verify conformance to design
specifications based on subjective MOS, but it is very important to any
analysis based on subjective MOS evaluation.

PESQ (P.862)
Subjective studies take significant time and effort to carry out. MOS
estimators such as PESQ⁶ (Perceptual Evaluation of Speech Quality) can
provide a quick, repeatable estimate of distortion in the signal. However,
the score does not reflect the conversational voice quality, since listening
level, delay, and echo are excluded from the computation. Separate
measures of these characteristics must be considered along with a PESQ
score to appreciate the overall performance of a channel.
PESQ is an intrusive test, which means that the tester must commandeer a
channel and put a test signal through it. To perform a test, one or more
speech samples are put through a device or channel, and the output (test
signal) is compared to the input (reference signal). The more similar the
two waveforms, the less distortion there is, and the better the assigned
score. The algorithm does some preprocessing to equalize the levels,
time-align the signals, and remove any time slips (where some time has been
inserted or deleted). PESQ then applies perceptual and cognitive models
that represent an average listener's auditory and judgment processes. A
diagram of the process is shown in Figure 3-7.
The raw PESQ score is usually converted to a MOS estimate using one of
several available conversion rules, for example, PESQ-LQ.⁷

4. MOS are sometimes quoted to many decimal places. The appropriate number of decimal places depends on the reliability, which in turn is determined by the number of independent ratings that contribute to the mean. Usually, one decimal place is appropriate. Two places may be justified if a large number of ratings are averaged (more than about fifty).
5. “Context” refers to things like the order in which the test cases are presented in the experiment, the range of quality between the worst and best test cases used in the experiment, and whether the subjects are asked to do a task before making a rating. If an experiment is repeated exactly (with different subjects), similar scores will be obtained within a known margin of error. This is not the case from one experiment to another. Consistency from test to test is found in the pattern of scores, not in the absolute value of the scores. For example, the MOS-LQS for G.711 may be 4.1 in one study, 3.9 in another, and 4.3 in a third, but whatever the value obtained, we expect to obtain a higher score for G.711 than G.729, and approximately equal scores for G.729 and G.726 (32 kb/s).
6. Many objective quality algorithms have been defined. Aside from PESQ, the best-known are PSQM (Perceptual Speech Quality Measure), standardized as P.861, and PAMS (Perceptual Analysis Measurement System), a proprietary method developed by BT. As the current standard, PESQ is preferred to the older measures.

[Figure 3-7: block diagram with the labels Original, System under test,
Output, Perceptual Model, Difference, Cognitive Model, and Quality Estimate.
The system under test may be a codec, a network element, or a network.]
Figure 3-7: Block diagram showing the operation of PESQ and similar
objective speech quality algorithms. A known signal is input to the system
under test, and the algorithm analyzes the difference between them by
applying a model of human auditory perception followed by a model of
human judgment of preference to arrive at a quality estimate.

Transmission rating (R)


Transmission Rating or R is an objective metric indicating the overall
quality of narrow-band conversational voice. R is the main output variable
of the ITU E-Model (Rec. G.107).⁸ Fifteen parameters are used to
compute R, some of which are listening level, noise, distortion, the codecs
used, packet loss, delay, and echo. Because R accounts for all the factors
that contribute to the conversational voice quality, it is the only value
needed to completely describe the quality. R can be determined for voice
calls on any technology platform or combination of platforms (analog,
digital, TDM, ATM, or IP) and with any type of access (analog loop, digital
loop, wireless, or 802.11).

7. A conversion defined by Psytechnics, a company holding intellectual property rights for PESQ. There is now an ITU-T standard conversion defined in P.862.1.
8. The E-Model can also be used to compute a “MOS” estimate. Note that MOS computed with the E-Model is not comparable to MOS computed with PESQ.
The E-Model computes R for individual connections. By defining many
connections of interest (called hypothetical reference connections or
HRX), we can determine R for many paths through the network.
Generating R for a well-chosen set of HRX gives a good indication of the
performance of a network. By comparing similar call scenarios for
different networks, we can get a picture of where a network does well and
where it may disappoint users.
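The top-level structure of the E-Model is additive, which makes it easy to sketch. In G.107, a basic signal-to-noise term (Ro) is reduced by simultaneous impairments (Is), delay-related impairments including echo (Id), and effective equipment impairments such as codec distortion and packet loss (Ie-eff), and is increased by an advantage factor (A). The sketch below shows only that final combination and the G.107 mapping from R to an estimated MOS; computing the individual terms from the underlying parameters is the bulk of the model and is not attempted here. The function names and the base rating of 93.2 are assumptions for illustration; the Ie values are those shown in Figure 3-9.

```python
def transmission_rating(ro: float, i_s: float, i_d: float,
                        ie_eff: float, a: float = 0.0) -> float:
    """Combine the E-Model terms: R = Ro - Is - Id - Ie_eff + A."""
    return ro - i_s - i_d - ie_eff + a


def r_to_mos(r: float) -> float:
    """Map R to an estimated MOS using the G.107 conversion formula."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


# Assumed base rating (Ro - Is) with all other factors at favorable values.
BASE_R = 93.2
CODEC_IE = {"G.711": 0, "G.726": 7, "G.729": 10}  # Ie values from Figure 3-9

# R per codec when no delay impairment is present (Id = 0).
ratings = {codec: transmission_rating(BASE_R, 0.0, 0.0, ie)
           for codec, ie in CODEC_IE.items()}
```

Because the terms are simply subtracted, a codec with a larger Ie shifts the whole R-versus-delay curve downward without changing its shape. As noted in footnote 8, a MOS obtained this way is not comparable to a MOS computed with PESQ.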
The input values used to compute R can be measured values or expected
values. This means that the E-Model can be used to predict the quality of
equipment and networks that are still in the planning stages. It is also
helpful to compute R for a benchmark network, particularly where the
benchmark provides a “known user experience.” The PSTN (Public
Switched Telephone Network) is a common choice for a benchmark. Other
sensible choices might be (1) an existing network that is being replaced by
the new network, or (2) an unrelated network that delivers a known user
experience that is chosen to serve as a quality target, for instance, ordinary
wireless cellular performance.
R can be tabulated to facilitate comparisons. Often, however, we find it
useful to make comparisons graphically. E-Model output can be used to
generate a graph showing how R changes as delay increases. The R-versus-delay
relation not only indicates how a particular scenario will respond to
increasing distance between the endpoints, but can also suggest the benefit
associated with changes to the end-to-end delay. The range for delay is
0–500 ms, which goes slightly beyond the limits suggested in G.114.
Detailed discussions of the charts are given in the figure captions for Figure
3-8, Figure 3-9, Figure 3-10, and Figure 3-11. The accompanying sidebar
shows how the quality of the PSTN TDM network may be represented
using this type of graph.
Nortel engineers have incorporated the E-Model into a powerful predictive
tool that allows call-by-call comparisons of hundreds of hypothetical
reference connections, allowing the evaluation and comparison of entire
networks. The results of studies of network performance have generated
network design guidelines for voice telephony services and real-time data
applications. Additional details about Nortel’s modeling process are
provided in Chapter 20 and Appendix C.


[Figure 3-8 chart: "Typical Delay Impairment: International Call
(IP-to-DCME-to-IP)", plotting R (50–100) against one-way delay (0–500 ms),
annotated with a distance of 14057 km. Marked points: PSTN reference,
R = 78.7 at 125 ms; G.711 > DCME > G.711 with 20-ms packets, R = 75.8 at
191 ms; G.711 > DCME > G.711 with 40-ms packets, R = 71.0 at 231 ms. Bars
under the curve break each delay into propagation delay plus
switching/equipment delay, DCME delay, gateway processing delay, and
gateway packetization delay.]

Figure 3-8: R vs. delay for a particular class of terrestrial international calls.
G.711 is used in the national links with a Digital Circuit Multiplexing
Equipment (DCME), which generally uses G.726 speech coding at 32 kb/s in
the undersea cable. Specific points on the curve show R for the benchmark
(PSTN reference, TDM end-to-end), and for each of two calls using IP in the
national portions of the call (20-ms and 40-ms packets, respectively). Bars
under the curve indicate the sources for the cumulative delay associated
with each call. Since only one coding scenario is considered (G.711 > G.726
> G.711), the model generates only one contour. The model assumes best
practices for any factors not specified.


[Figure 3-9 chart: "Speech Coding Distortion", plotting R (50–100) against
one-way delay (0–500 ms) with one curve per codec: G.711 (Ie = 0), G.726
(Ie = 7), and G.729 (Ie = 10). Markers show "Listening @ 0 ms" and "Minimum
delay for 20-ms payload".]

Figure 3-9: R vs. delay for G.711, G.726 (32 kb/s), and G.729 (8 kb/s). The
model assumes best practices for any factors not specified. Note that
although R is plotted for all delays, there will be a non-zero minimum delay
(yellow points) for interactive calls. For these points, propagation delay is
zero. This is the lowest delay for the modeled call scenario (the minimum
delay will depend on the codec as well as the packetization selected). In this
chart, we have assumed similar equipment delays beyond those associated
with the codec; however, in actual network situations, these can change as
well. The blue points represent the quality differences heard when listening
to recorded speech samples.


[Figure 3-10 chart: "Echo Distortion, Digital Set (with Ideal Loss Plan)",
plotting R (50–100) against one-way delay (0–500 ms) with one curve each for
no audible echo and for TELR = 65, 60, 55, 50, and 45 dB.]

Figure 3-10: R vs. delay for various levels of echo. Note how R drops off
more quickly with smaller values of TELR. The increasing rate of
degradation for louder echo reflects the interaction of delay and echo
discussed above. The model assumes best practices for any factors not
specified.


[Figure 3-11 chart: "Combining Factors: Loss Plan, Speech Compression, &
Packet Loss", plotting R (50–100) against one-way delay (0–500 ms) with one
curve per scenario:
  ISDN (all-digital): digital loss plan
  POTS > POTS: analog loss plan
  POTS > G.726 > POTS: analog loss plan + waveform compression
  POTS > G.729 > POTS: analog loss plan + speech compression
  POTS > G.729 > POTS (3% PL): analog loss plan + speech compression
    + 3% packet loss]

Figure 3-11: R vs. delay for multiple distortion factors, showing the effect of
successive addition of non-ideal factors: loss plan, compression coding,
and packet loss. Since delay does not exacerbate any of these factors, each
contour has the same relative shape as the one above. The model assumes
best practices for any factors not specified.


Sidebar: PSTN quality


Just for fun, let's use the E-Model to see how the TDM PSTN stacks up
against an “ideal” network.
The black curve shows how R changes for an ideal all-digital wireline
connection using G.711, with perfect echo control. The green curve
shows R for the PSTN, with wireline analog access and initially using
only the fixed loss plan (that is, no echo cancellers) described in
T1.508-2003 for control of echo on short-delay connections (up to
approximately 20 ms). This modeling was done using values taken from
standards targets. Actual connections will vary somewhat on all the
contributing parameters, with the result that many real calls through the
PSTN will be similar to the results shown here, while some will be better
and others will be worse.
In the PSTN, the largest contributor to the end-to-end delay is the
propagation delay. Equipment delay through a single digital central
office or PBX is 1–2 ms. As it turns out, almost all the differences
between the ideal network and the PSTN are due to echo control
measures.
For short delays (local calls and what we call “short-haul” toll), the
PSTN uses the fixed loss plan to keep echo below the threshold of
impairment. The fixed loss plan adds (amplitude) loss in each direction
to attenuate echo. As delay increases from 0 to about 22–25 ms, echo
impairment increases until it is no longer tolerable. At this point, the
network shifts to echo cancellation to control echo, and R increases
again to the high 80s and degrades gracefully with a slope slightly
shallower than the “ideal” ISDN curve.

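[Figure: R vs. one-way delay for the ideal network and the PSTN.
Annotations: "No ECAN used here; TELR = 31 dB" on the short-delay portion of
the PSTN curve; "ECAN for these calls; TELR = 69 dB" on the remainder;
"Listening level not ideal: trade-off ideal listening level for echo
control".]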



These results highlight some details about the PSTN performance as
well as about the use of R as an indicator of quality. First, we can see
that R for the PSTN ranges between 88 and 70, for calls with typical
delays. Delays are low for local calls (2–3 ms), and may increase to
about 150–160 ms for the longest directly routed terrestrial (nonsatellite)
international calls. We know from experience that any calls falling in
this range will be completely acceptable to the vast majority of
telephone users. Calls over satellite links have additional delay from the
propagation of the signal over the distance to communications satellites
and back to earth, as well as propagation to and from earth stations at
each end, and some processing delays. For geostationary satellites, this
additional delay is approximately 300 ms.
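The propagation part of this figure can be checked with simple arithmetic. The sketch below assumes a geostationary altitude of about 35,786 km and free-space propagation at the speed of light; the constant and function names are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # free-space propagation speed
GEO_ALTITUDE_KM = 35_786           # approximate geostationary altitude


def geo_hop_delay_ms(slant_range_km: float = GEO_ALTITUDE_KM) -> float:
    """One-way delay for the up-link plus down-link of a single satellite hop."""
    return 2 * slant_range_km / SPEED_OF_LIGHT_KM_S * 1000
```

With the satellite directly overhead, the hop alone takes roughly 239 ms. Longer slant ranges to earth stations away from the sub-satellite point, the terrestrial links to and from the earth stations, and processing delays account for the remainder of the approximately 300 ms quoted above.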
Second, PSTN quality cannot be characterized by any single value of R
(or any other scalar metric). It is not sufficient to use an absolute
performance cutoff for R such as 75. For many local calls, the PSTN is
significantly better than 75. On the other hand, many common calls such
as the short-haul toll calls with more than 10 ms of delay as well as
geostationary satellite calls would fail a criterion of 75.
Instead, if we want to compare the quality of an IP network to the quality
of the TDM PSTN, we need to look at multiple calls, and compare
similar calls made over the PSTN and over the IP network.


What you should have learned


The reader should now be familiar with the potential impairments
associated with voice services in general and VoIP in particular.
Conversational voice services (“voice”) are based on a pair of channels,
one in each direction, operating in full-duplex mode. Each channel carries
the digital equivalent of a narrow-band analog signal (300–3400 Hz). Good
conversational quality of a voice call depends on the listening level, low
distortion, low channel delay, and freedom from echo.
The path taken by the voice signal through the IP network involves
digitization of the analog voice signal, conversion to a synchronous (non-
packet) digital signal, packetization and routing, and finally ending up in a
jitter buffer, where the signal is unpacked and sent to a decoder, which
restores the synchronous signal to be played out to a local telephone
receiver or sent over a TDM channel. Echo cancellers and other DSP
functions sit on the synchronous side of the boundary. The VoIP payload is
not read or modified in the packet portion of the network.
The four main factors affecting VoIP conversation quality are the speech
codec used, the end-to-end delay, the rate of packet loss and application of
PLC, and how well echo is controlled. Jitter (variation in packet time-of-
arrival) contributes to delay and/or packet loss impairments, depending on
how it is handled, but does not directly affect the voice performance. The
listening level of the received signal is also important but not generally
affected by the packet transport.
End-to-end delay can affect the conversation dynamics (turn-taking and
interruptibility) and introduce subtle changes in the listener's interpretation
of the talker's meaning. Jitter turns into either lost packets (when the jitter
buffer is too short) or longer delay (when the jitter buffer is long enough to
prevent packet loss). An adaptive jitter buffer can ensure that the jitter
buffer wait time is always long enough to prevent most losses, but never so
long that unnecessary delay is added.
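One way to picture the adaptation is a running jitter estimate that drives the wait time. The sketch below uses the smoothed interarrival jitter estimator defined for RTP in RFC 3550 and then, as an assumption made purely for illustration, sets the wait time to a fixed multiple of that estimate. Real jitter buffers use a variety of adaptation rules; the function name and the multiplier are hypothetical.

```python
from typing import Iterable, List, Optional


def adaptive_wait_times(transit_ms: Iterable[float],
                        safety: float = 4.0) -> List[float]:
    """Return a suggested jitter buffer wait time (ms) after each packet.

    transit_ms holds the relative transit time of each packet, that is,
    arrival time minus sending timestamp, in milliseconds.
    """
    jitter = 0.0
    previous: Optional[float] = None
    waits: List[float] = []
    for transit in transit_ms:
        if previous is not None:
            deviation = abs(transit - previous)
            # RFC 3550 estimator: move 1/16 of the way toward the new deviation
            jitter += (deviation - jitter) / 16.0
        previous = transit
        waits.append(safety * jitter)  # assumed rule: wait = safety x jitter
    return waits
```

When transit times are steady, the estimate stays at zero and no extra delay is added; when they begin to vary, the wait time grows in proportion, trading a little delay for fewer late packets.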
Packet loss introduces distortion. Short losses (< 40–60 ms) can be repaired
by Packet Loss Concealment (PLC), but longer losses result in missing
speech.
Echo is always present in full-duplex calls, and must be controlled by good
design in the network and the end devices, proper loss plan, and
appropriate use of echo control devices. The annoyance associated with
echo depends on both the level and the delay of the echo. A network echo
canceller is required at any interface between an IP network and the
wireline PSTN. Echo control is also needed in IP telephones and PC
clients, where coupling between the receiver and the transmitter may occur.


Some commonly used quality metrics were discussed: MOS, PESQ, and
the E-Model R. Subjective MOS is a direct measure of user perception of
quality, and may be carried out in lab or field studies. PESQ is a method of
estimating subjective MOS with an objective algorithm, and has been
standardized as P.862. The E-Model is a standard network planning tool
(ITU G.107) that generates another overall quality metric, R. R combines
fifteen objective measures, including the listening level, the delay, the
encoding distortion. While both PESQ and R can be translated to a MOS
value, such MOS should be considered for indication only. No comparison
of MOS derived from these different sources.
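As an illustration of how R is translated to a MOS value, the sketch below applies the conversion formula given in ITU-T G.107. The function name is hypothetical, and the constants should be checked against the current text of the Recommendation before being relied on.

```python
def r_to_mos(r):
    """Estimate MOS from an E-Model rating R using the G.107 conversion."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


if __name__ == "__main__":
    for r in (50, 70, 80, 93.2):
        print(r, round(r_to_mos(r), 2))
```

The result is an estimate for planning purposes, which is why it should not be mixed with subjective MOS or with MOS values derived from PESQ.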

References
ITU-T Recommendation G.131, Talker Echo and its control, Geneva:
International Telecommunication Union Telecommunication
Standardization Sector (ITU-T), 1996.
ITU-T Recommendation G.711, Pulse code modulation (PCM) of voice
frequencies, Geneva: ITU-T, 1988.
ITU-T Recommendation G.726, 40, 32, 24, 16 kbit/s Adaptive Differential
Pulse Code Modulation (ADPCM), (includes Annex A: Extensions of
Recommendation G.726 for Use with Uniform-Quantized Input and
Output-General Aspects of Digital Transmission Systems), Geneva: ITU-T,
1990.
ITU-T Recommendation G.729, Coding of Speech at 8 kbit/s Using
Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-
ACELP), Rec. G.729, (includes Annex A, Reduced Complexity 8 kbit/s
CS-ACELP Speech Codec, and Annex B, Silence Compression Scheme for
G.729 Optimized for Terminals Conforming to Recommendation V.70),
Geneva: ITU-T, 1996.
ITU-T Recommendation P.800, Methods for subjective determination of
transmission quality, Geneva: ITU-T, 1996.
ITU-T Recommendation P.800.1, Mean Opinion Score (MOS) terminology,
Geneva: ITU-T, 2003.
ITU-T Recommendation P.830, Subjective performance assessment of
telephone-band and wideband digital codecs, Geneva: ITU-T, 1996.
ITU-T Recommendation P.861, Objective quality measurement of
telephone-band (300-3400 Hz) speech codecs, Geneva: ITU-T, 1998
(withdrawn).
ITU-T Recommendation P.862, Perceptual evaluation of speech quality
(PESQ): An objective method for end-to-end speech quality assessment of
narrowband telephone networks and speech codecs, Geneva: ITU-T, 2001.



Chapter 3 Voice Quality 65

ITU-T Recommendation P.862.1, Mapping function for transforming P.862
raw result scores to MOS-LQO, Geneva: ITU-T, 2003.
ITU-T Recommendation PESQ-LQ, A.W. Rix, Comparison between
subjective listening quality and P.862 PESQ score, Psytechnics, September
2003.
T1.508-2003, Loss plan for digital networks, Washington, DC: American
National Standards Institute, Committee T1, 2003.
TIA-810-A (see note 9), Transmission requirements for narrowband Voice over
IP and Voice over PCM digital wireline telephones.
Further information on VoIP service quality and performance
requirements:
G.113, Transmission Impairments due to Speech Processing, (includes
Appendix I, Provisional planning values for the equipment impairment
factor Ie and packet-loss robustness factor Bpl.) Geneva: ITU-T, 2001.
TIA TSB-116.1, Voice Quality Recommendations for IP Telephony,
Telecommunications Industry Association, 2001.
Y.1541, Network performance objectives for IP-based services, Geneva:
ITU-T, 2002.

9. TIA-810-A, TSB-116, and other VoIP standards are available from TIA for free at the following site:
http://www.tiaonline.org/standards/sfg/committee.cfm?comm=tr%2D41&name=User%20Premises%20Telecommunications%20Requirements
Click on the first link (TR-41 VoIP Standards) and answer the questions. This takes you to the page where you can download these standards for free.


Chapter 4
Video Quality
Peter Chapman

Figure 4-1: Transport path diagram

Concepts covered
An overview of analog video systems
Impairments to analog systems
MPEG coding principles
A brief description of each of the common effects of coding
principles

Video
Video is the means of transmitting images by sending the instantaneous
brightness, color hue and color intensity of a picture element. The picture
elements are sent sequentially in rapid succession so that the image is
perceived as being instantaneously updated in real time.
The rate at which the complete image is refreshed is known as the frame
rate, and it is usually either about 30 fps (frames per second) for North
American television systems or 25 fps for systems used in most other parts
of the world. Motion pictures (movies) are usually created at 24 fps.
Analog video systems were designed for the technology of the day,
assuming the receiver or display device could not store the image. They
depend on the persistence in illumination on the screen (lag) to sustain the
image between updates. For this reason, a technique known as “interlace”
was developed that rendered the sequential writing of the image on the
screen imperceptible to the human eye.

Video Impairments
Video impairments come from many sources, such as capturing, digitizing,
compressing, and distributing videos. Nortel is concerned primarily with
problems resulting from distribution or transmission. The analog standards
are designed to ensure that a picture is created even when there is
corruption of a broadcast signal. Because the brain and eye correlate
information from line to line (spatial redundancy), a picture can be
perceived even under extreme signal degradation, provided the structure of
the picture is maintained.

Digital video impairments


Digital video systems produce very different impairments than analog
video systems. In digital systems, the frame format is generated at the
receiving end; therefore, loss of frame synchronization does not manifest
itself as frame roll as it does in analog systems. However, the digitizing
process introduces other artifacts into the received video image.
Compression is a technique used to trade off quality and error tolerance
against bit rate. Compression invariably decreases picture quality, but if
done well, many effects and artifacts that it introduces are not noticeable.
Current compression, which takes advantage of spatial redundancy in an
image, can reduce data rates by a factor of fifty or more. The current
standard in widespread use for TV and DVD is known as MPEG-2,
indicating the version of the standard created by the Moving Picture Experts
Group (MPEG), a working group of ISO/IEC. MPEG intentionally does not
define the compression technology, in order to allow innovation.
MPEG defines only the format of the bit stream. However, MPEG suggests
methods for decoding the bit stream.
Disadvantages that result from the use of compression are as follows:
increased sensitivity to transmission errors
delay in transmission
artifacts introduced into the image
loss of some information

Causes of video signal impairments


The causes of video signal impairments to analog signals are as follows:
luminance noise
chrominance noise
loss of synchronization signal
co-channel interference
electrical RF interference
color burst phase and frequency errors

Luminance noise
Random noise manifests itself as random speckles on the picture. To
minimize the effect of noise, some versions of broadcast TV use negative
picture modulation, which means that the peak output of the transmitted RF
signal corresponds to a black signal and minimum amplitude corresponds
to a white signal. Negative picture modulation is effective against
interfering noise, such as that generated by motor vehicle ignition systems,
which it was designed to counter. With such modulation, interfering noise
generates random black spots in the picture, which are far less intrusive
than white spots. (Modern vehicle electrical systems have largely
eliminated the source of this problem.)

Chrominance noise
The effect of noise on the chrominance channel is to change the hue of the
chrominance signal with no effect on the luminance. Due to the
uncorrelated nature of noise, this effect reduces the saturation and intensity
of the reproduced color, producing a washed-out effect. High-frequency
chrominance noise manifests itself as dots or specks of varying color, but
these are not well defined, due to the lower chrominance bandwidth. They
do not have well defined edges because there is no change in luminance at
the edges.

Loss of synchronization signals


Most broadcast TV receivers can tolerate occasional losses or corruption of
the synchronization pulses. The receiver generates lines at a nominal rate,
equal to that needed to reproduce the picture. Synchronization pulses adjust
this rate to equal that of the received signal. If a synchronization pulse is
not received at the expected time, the receiver generates it at the current
line period after the previous synchronization pulse. Receivers adjust the
rate at which they create lines to be exactly the same as the received line
rate. They base this line rate on a composite rate from a number of received
lines so that a single corrupted or missing synchronization pulse does not
cause loss of synchronization. Sustained loss of synchronization pulses
causes the receiver to revert to its default line generation rate, which is the
nominal rate specified in the standards.

Co-channel interference
Co-channel interference is possibly the most annoying of all the broadcast
video impairments. It manifests itself in one of two ways. If the source of
the interfering signal is the same as the primary signal, but the propagation
path is different, it appears as a second signal but slightly delayed in time.
On the screen, it appears as a second signal displaced horizontally from the
first. The effect of the delayed field/frame synchronization pulse appears as
a darker column on the left side of the screen where the blacker-than-black
synchronization pulse is being added to the primary signal. If the source of
the reflected signal is moving, as with a reflection from an aircraft,
then the position of this second image changes on the screen due to the
varying path length. If the co-channel interference is from a separate
transmitter, it appears as a second different image on the screen. Because
the two sources are not perfectly synchronized, the two images move
spatially in relation to each other.

Electrical RF interference
RF interference that emanates from a non-video source manifests
itself as diagonal lines or patterns superimposed on the picture. This type of
interference can be very annoying.

Color burst phase and frequency errors


Small errors in color phase or frequency cause significant errors in hue,
particularly in the NTSC system. The PAL system was developed to
overcome this problem by reversing the subcarrier phase on alternate lines.
This is less of a problem in cable distribution systems than on broadcast
systems. Phase errors caused by signal reflections do not occur in cable
systems.

Digital video

Compression
Compression invariably uses one of the MPEG standards. MPEG does not
define the compression codec; instead, it defines the format of the
compressed information and suggests how it is reconverted to video. This
method allows for the development of compression tools and techniques
based on experience. There is significant development in this area. Early
MPEG encoders required much operator interaction, primarily to choose
the type of encoding tool based upon the program material. For example, a
slow moving scene requires good background detail and definition, but it
needs infrequent updating. A fast moving scene can sacrifice background
definition because the eye and brain are unable to follow, but it needs
frequent updating. Similarly, certain sporting events require a very small
part of the scene, say a ball, to be given high priority in being tracked
accurately. At the same time, the background information can update more
slowly.
Video information is sent as discrete pictures, and pictures are clearly
delineated as such. Frame and field accurate timing is regenerated by the
device decoding the frames. Strict field timing information is not sent as
part of the signal stream. Instead, information relating to the sequence of
frames or fields is sent with information that allows the frames to be
reassembled in sequence and matched to the strictly timed audio signal.

Encoding issues
MPEG provides a bit stream definition. To create this bit stream, encoders
use a toolbox of different techniques, which are proprietary and dependent
on the encoder used. Details of these techniques are beyond the scope of
this document. However, the video quality is dependent on the encoding
scheme and the tools used. There are wide variations in the perceived
quality of encoded video.

Decoding issues

Visual complexity, spatial and temporal


The MPEG compression system has difficulty dealing with certain types of
rapid motion information. Sports events provide a particular challenge
because they contain a large expanse of spectators, whose image is rapidly
moving across the screen due to panning. In early implementations, some
algorithms disregarded small rapidly moving objects if they occupied only
a small part of a relatively static picture. This proved to be a problem for
sports events such as tennis, baseball, and cricket where it was necessary to
follow a ball.
MPEG sends video information by coding information from pictures and
sending these in sequence. MPEG defines a picture as either a frame or a
field depending on whether it is interlaced or not. Interlaced pictures
consist of two interlaced fields. Interlacing is a technique whereby alternate
lines are sent in sequential half frames, called fields. The first of these will
consist of the odd numbered lines followed by a field of even numbered
lines. Interlacing was introduced in the early days of analog television to
reduce visible flicker for a given frame rate. It is unnecessary in modern
digital technology, and introduces some undesirable artifacts, but is
maintained for compatibility with legacy standards and equipment.
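The relationship between a frame and its two fields can be sketched in a few lines of Python. This is an illustration under simple assumptions: a frame is represented as a list of scan lines, and the function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two interlaced fields."""
    first = frame[0::2]    # lines 1, 3, 5, ... (odd numbered, counting from 1)
    second = frame[1::2]   # lines 2, 4, 6, ... (even numbered)
    return first, second


def weave(first, second):
    """Rebuild a frame by interleaving the lines of the two fields."""
    frame = []
    for odd_line, even_line in zip(first, second):
        frame.extend([odd_line, even_line])
    if len(first) > len(second):   # a frame with an odd number of lines
        frame.append(first[-1])
    return frame


if __name__ == "__main__":
    frame = ["line 1", "line 2", "line 3", "line 4", "line 5", "line 6"]
    first, second = split_fields(frame)
    print(first)
    print(second)
    print(weave(first, second) == frame)
```

Weaving restores the frame exactly only when both fields were sampled at the same instant. In a real interlaced source the second field is captured later than the first, so moving edges no longer line up when the fields are combined.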
MPEG defines three different types of frame:
I frame. This frame uses intracoding with no motion
compensation.
P frame. This frame uses forward prediction. The previous frame is
used to predict the current frame.
B frame. This frame uses the previous and next frames to
determine the current frame.

I (Intra frame coding) frame


An individual I frame enables the complete static scene to be reproduced
with reasonable fidelity. It is compressed using various techniques,
primarily the Discrete Cosine Transform (DCT), to encode the
information. To perform the transform, the image is divided into blocks of
8 x 8 pixels. Each matrix of 64 pixels is scanned and a DCT performed on it.
Loss of any of these elements prevents the block from being updated;
therefore, if the picture is rapidly changing, the block from the previous
frame remains. This result affects more than this frame because the I frame
is used as a basis for creating the next sequence of frames until the next I
frame is sent.
A DCT is used for most picture content, because many of the coefficients
are reduced to zero or near zero and can be run length encoded. This
significantly reduces the amount of data that needs to be sent. The
coefficient reduction to near zero introduces a video artifact known as
shimmer or “Gibbs effect” that appears as movement in plain areas close to
edges. It is worth noting that the conversion to DCT is a completely
lossless, hence reversible, transformation. But once the near zero
coefficients have been reduced to zero, the reverse transformation can no
longer recover the original block exactly. The compression has become “lossy.”
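The difference between the reversible transform and the lossy step can be seen in a small Python sketch. It uses a textbook orthonormal 8 x 8 DCT and a plain threshold in place of a real encoder's quantization tables; the function names and the threshold value are assumptions made for illustration.

```python
import math

N = 8  # block size used by MPEG: 8 x 8 pixels


def _scale(k):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(n, k):
    """Cosine basis value for sample position n and frequency index k."""
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT of an N x N block of pixel values."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coef):
    """Inverse 2-D DCT: rebuilds the pixel block from its coefficients."""
    return [[sum(_scale(u) * _scale(v) * coef[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def drop_small(coef, threshold):
    """Set coefficients below the threshold to zero (the lossy step)."""
    return [[0.0 if abs(c) < threshold else c for c in row] for row in coef]


def max_error(a, b):
    return max(abs(a[x][y] - b[x][y]) for x in range(N) for y in range(N))


if __name__ == "__main__":
    block = [[100 + 8 * x + 3 * y for y in range(N)] for x in range(N)]  # smooth gradient
    coef = dct2(block)
    print("round trip error:", max_error(block, idct2(coef)))
    lossy = drop_small(coef, 2.0)
    print("zero coefficients:", sum(c == 0.0 for row in lossy for c in row))
    print("error after dropping small coefficients:", max_error(block, idct2(lossy)))
```

For a smooth block most coefficients are already close to zero, so zeroing them costs little accuracy while producing long runs of zeros that are well suited to run length encoding.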

B and P frame
Rather than send complete information for every frame, the MPEG
standard provides for sending two other types of frames, containing only
difference information from the I frames. Such frames, known as B and P
frames, send considerably less information than I frames do. P frames are
“predictive” frames that carry only information relating to the difference
between the last frame and the current frame. B frames are “bidirectional”
and carry difference information based on the immediate past and future
frames. The MPEG decoder combines this information with the most recent
I frame, or with a frame reconstituted from an I frame and the previous P
or B frames, to predict the next frame.


Packet loss in an I frame, and the resulting error in that frame, is likely
to propagate until the next I frame.
Typically, the error is propagated for ten or twelve frames, though this is
entirely at the discretion of the coding system. At about 30 fps, these ten
or twelve frames last about one third of a second, which is noticeable.
Missing packets in received video streams manifest themselves as
displaced static blocks. A block of 8 x 8 pixels appears at the wrong
location in the picture frame and is static. This distortion is corrected when
the next complete sequence is received, normally an I frame.
These P and B frames use motion compensation to further reduce the
information being sent.
Motion detection works on a macro block, which is a matrix of four blocks
in a 2 x 2 matrix (16 x 16 pixels). The encoder determines the motion by
looking for a match within adjacent blocks on the subsequent frame. It then
sends a vector indicating direction and position of the matching macro
block. The first vector (left upper-most element) is sent as a complete
vector and this vector is followed by subsequent horizontal elements being
sent as differences from this first vector. This vector and group of
differences is known as a slice. Over what range the match attempt is made
is determined by the encoder. If the range is a wide area, it causes
significant delay in encoding, and inevitably there is a trade off between
encoding delay and compression efficiency. For non–real-time coding, for
example, when preparing a movie for DVD, the delay is not such a
problem, but for live events it is.
Because of the resolution of the vectors and possible nonlinear motion of
the object, the resultant image is not necessarily accurate. Therefore, the
encoder calculates the new image from the calculated vector, and it then
compares this calculated image with the actual image. The encoder sends
as part of the P frame, the difference information, as well as the vector.
Sending an approximate vector with the difference information is more
efficient than sending an accurate vector.
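The search for a motion vector and the difference information sent with it can be sketched as an exhaustive block match. The Python below is a simplified illustration, not the method of any particular encoder: it measures the match with a sum of absolute differences, searches a small assumed range, and points the vector from the current macro block into a reference frame. All function names and parameters are hypothetical.

```python
import random


def sad(ref, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a shifted reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def best_vector(ref, cur, bx, by, size=16, search=4):
    """Return (cost, dx, dy) of the best match within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width and
                      0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue  # candidate block would fall outside the frame
            cost = sad(ref, cur, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def residual(ref, cur, bx, by, dx, dy, size=16):
    """Difference information sent along with the vector."""
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
             for i in range(size)] for j in range(size)]


def shifted(frame, sx, sy, fill=0):
    """Test helper: move the picture content right by sx and down by sy pixels."""
    height, width = len(frame), len(frame[0])
    return [[frame[y - sy][x - sx] if 0 <= y - sy < height and 0 <= x - sx < width else fill
             for x in range(width)] for y in range(height)]


if __name__ == "__main__":
    rng = random.Random(1)
    ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
    cur = shifted(ref, 2, 1)                 # simulate a small camera pan
    print(best_vector(ref, cur, 8, 8))
```

Widening the search range raises the chance of finding a good match but multiplies the number of comparisons, which is the trade off between encoding delay and compression efficiency noted above.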

P frames and error propagation


To use P frames, knowledge of the current image is needed. However, if the
current image is created from a previous P frame, an error in a P frame
propagates until the next I frame is received.

B frames, bidirectional frames


B frames offer slightly more protection. They use both the previous image
and the following image to determine the current image. Because
information revealed by moving objects is not available from past images,
but is available from future images, it can be used to reproduce the
background exposed as a result of moving objects. B frames are not coded
from each other so they do not propagate errors. Because B frames, when
coded from future frames, need those frames ahead of the B frame itself,
the sequence of frames, usually referred to as a Group of Pictures (GOP) is
sent in a sequence other than purely chronological so that both before and
after images are available to the decoder before the B frame is calculated.
This calculation requires a buffer of several frames duration to be present in
the decoder. B frames can be calculated from the previous frame, from the
next frame, or by an interpolation from both before and after. When the
calculation is made using both before and after frames, MPEG supports
simple linear interpolation from two frames, before and after, but does not
support weighted interpolation to handle multiple B frames interpolating
between P and I frames.

Sequences of frames
I frames are much more critical than P frames or B frames; therefore, loss
of an I frame is much more serious than loss of a P or B frame.
A typical MPEG sequence, in display order, is as follows:
IBBPBBPBBPBBI
The same sequence reordered for transmission, so that each B frame follows
both of the frames it is predicted from, is as follows:
IPBBPBBPBBIBB
Due to different error performance among DVD, broadcast, Digital Video
Broadcast (DVB), and Internet streaming, different sequences are suitable
for different applications. Material needs to be encoded for the medium for
which it is designed. This is a new area and requires experience at finding
the best method.
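The reordering rule can be expressed as a short Python sketch. It assumes only that a B frame cannot be decoded until the I or P frame that follows it on screen has arrived, so each I or P frame is moved ahead of the B frames that precede it. The function name is hypothetical.

```python
def transmission_order(display_sequence):
    """Reorder a display-order string of frame types (I, P, B) for transmission.

    Returns a list of (display_index, frame_type) tuples.
    """
    out = []
    waiting_b = []                       # B frames waiting for their next anchor
    for index, frame_type in enumerate(display_sequence):
        if frame_type == "B":
            waiting_b.append((index, frame_type))
        else:                            # I or P: an anchor frame
            out.append((index, frame_type))
            out.extend(waiting_b)        # B frames can follow once the anchor is sent
            waiting_b = []
    out.extend(waiting_b)                # trailing B frames with no later anchor
    return out


if __name__ == "__main__":
    order = transmission_order("IBBPBBPBBPBBI")
    print("".join(frame_type for _, frame_type in order))   # IPBBPBBPBBIBB
    print([index for index, _ in order])
```

The index list shows why the decoder needs a buffer of several frames: frames arrive out of display order and must be held until their turn on screen.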

Compression issues
The distortions resulting from compression are as follows:
blocking
blurring
shimmer
smearing
edge distortion
jerkiness
luminance noise
chrominance noise
contouring


Blocking
Blocking (Figure 4-2) is the effect of blocks of pixels appearing in the
wrong location on the image. Sometimes, blocks appear in the correct
position but in the wrong color because corruption to the color information
has occurred.

Figure 4-2: Blocking

Blurring (low resolution)


Blurring (Figure 4-3) is caused by insufficient resolution of information
into pixels. If information changes rapidly and the pixel update rate is not
sufficient to keep track of it, blurring occurs. Credits on movies typically
suffer from blurring because the rate at which they are scrolling is
incompatible with the MPEG data refresh rate.

Figure 4-3: Blurring (low resolution)


Shimmer, the Gibbs effect


Shimmer (Figure 4-4) is a phenomenon whereby edges of objects appear to
shimmer. It is caused by the DCT representation not exactly representing
the spatial domain information. A contributor to this effect is compression
applied to the DCT. High-frequency components with low coefficients are
reduced to zero because the loss of high frequencies is less noticeable, and
they sometimes represent noise. Therefore, reducing these coefficients to
zero significantly reduces the amount of data with little noticeable effect
other than this artifact.

Figure 4-4: Gibbs effect

Smearing
Smearing typically occurs if the luminance does not significantly change at
an edge, but the color does. Because the chrominance resolution is less than
the luminance resolution, edges tend to blur. This effect is often seen on
colored titles superimposed on a colored background.

Edge distortion
Edge distortion (Figure 4-5) is caused by a number of effects. A common
cause is the application of compression to an interlaced image. This edge
distortion, or comb effect, arises because the two fields, each sampled at
a different time, are presented as a single
frame. Horizontally moving objects display in a different horizontal
position on the two fields. Another cause of edge distortion is problems
with the MPEG motion vectors.


Figure 4-5: Edge distortion

Jerkiness
Jerkiness, a lack of smooth motion, is caused by a
number of effects. A change of frame rate from 24 or 25 to 30 fps, or vice
versa, causes nonsmooth motion, noticeable on the background during horizontal
panning. Other causes are problems with motion estimating vectors,
particularly for accelerating objects.
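A simple Python sketch shows why a change of frame rate produces uneven motion. It assumes the crudest possible conversion, in which each output frame shows the most recent source frame; practical converters use more elaborate schemes, so this is an illustration only and the function name is hypothetical.

```python
def source_frame_for(output_index, src_fps=24, dst_fps=30):
    """Index of the source frame shown at a given output frame (nearest earlier frame)."""
    return (output_index * src_fps) // dst_fps


if __name__ == "__main__":
    # 24 fps material shown at 30 fps: every fifth output frame repeats a source frame.
    print([source_frame_for(i) for i in range(10)])          # [0, 0, 1, 2, 3, 4, 4, 5, 6, 7]
    # 30 fps material shown at 24 fps: some source frames are skipped.
    print([source_frame_for(i, 30, 24) for i in range(9)])   # [0, 1, 2, 3, 5, 6, 7, 8, 10]
```

The repeated or skipped frames occur at regular intervals, which is why the effect is most visible as a periodic stutter during a steady pan.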

Luminance noise
Luminance noise appears as specks on the screen. It is caused by a noisy
input source. If possible, remove noise using analog techniques prior to
digitizing.

Chrominance noise
Chrominance noise appears as color spots with no obvious brightness
change. If possible, remove the source of the noise.

Contouring
Contouring (Figure 4-6) is an effect appearing as lines indicating
quantization level changes. It is seen on areas where there is a smooth
intensity gradient. It is a known problem in digitizing video and is normally
dealt with by adding a random signal equal to one half quantization level,
known as dither. Dither moves the quantization level slightly, resulting in
randomizing the level changes so that they do not appear as a line and
become invisible. In addition, MPEG allows for finer quantization levels on
a selective basis to reduce this problem.
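The effect of dither on a smooth gradient can be demonstrated with a short Python sketch. It assumes uniform random dither with a peak amplitude of one half quantization level, added before quantization; the function names, the ramp length, and the step size are illustrative choices.

```python
import random


def quantize(value, step):
    """Round a value to the nearest multiple of the quantization step."""
    return round(value / step) * step


def quantize_ramp(n, step, dither=False, seed=0):
    """Quantize a smooth ramp from 0 to 1, optionally adding dither first."""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    out = []
    for i in range(n):
        value = i / (n - 1)
        noise = rng.uniform(-step / 2, step / 2) if dither else 0.0
        out.append(quantize(value + noise, step))
    return out


def level_changes(samples):
    """Count positions where neighboring samples fall on different levels."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


if __name__ == "__main__":
    plain = quantize_ramp(1000, 0.25)
    dithered = quantize_ramp(1000, 0.25, dither=True)
    print("level changes without dither:", level_changes(plain))
    print("level changes with dither:", level_changes(dithered))
```

Without dither the ramp breaks into a few wide bands with sharp boundaries, which the eye sees as contour lines. With dither the transitions are scattered along the ramp, and the local average of the quantized values follows the original gradient.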

Figure 4-6: Contouring

Audio/video synchronization
Audio/video synchronization needs to be within ±50 ms or better. Much
compressed broadcast material does not achieve this synchronization, and the
lack of lip synchronization is noticeable. The most common cause is poor
attention to the delay introduced in the compression process for the audio
and video channels.

Effects of packet loss


With the introduction of packet switching techniques, particularly the use
of Internet Protocol, packet loss is a concern. With high data rates,
relatively low loss rates can affect video quality. Video packet losses
become perceptible at a loss rate of 10^-4 (0.01%).

Figure 4-7: Effect of packet loss
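A rough calculation shows why such a small loss rate matters at video data rates. The Python sketch below assumes a 4 Mbit/s stream carried in packets with a 1316-byte payload and losses that are independent of one another; these figures are illustrative assumptions, not values from this chapter.

```python
def mean_seconds_between_losses(bit_rate_bps, payload_bytes, loss_rate):
    """Average time between lost packets for a constant bit rate stream."""
    packets_per_second = bit_rate_bps / (payload_bytes * 8)
    return 1.0 / (packets_per_second * loss_rate)


if __name__ == "__main__":
    # Assumed example: 4 Mbit/s video, 1316-byte payloads, loss rate of 10^-4.
    interval = mean_seconds_between_losses(4_000_000, 1316, 1e-4)
    print(round(interval, 2), "seconds between lost packets on average")
```

Under these assumptions a packet is lost roughly every 26 seconds, and because an error can persist until the next I frame, each loss may remain visible for a noticeable fraction of a second.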


What you should have learned


The reader should have gained an overview of analog video and the
problems that the original analog system was designed to overcome. A
description of frame and field was provided, as well as the need for and the
effects of interlaced scanning.
The reader should understand the need for and methods of video
compression.
The chapter also presented a description of the causes of video
impairments and the manner in which they manifest themselves. The
reader should be able to identify the most common video impairments and
determine which are the results of imperfections in the original source
material, which are analog impairments, and which are the result of loss of
information due to corruption of the packet stream.
The reader should understand the limitations of both analog and digital
video systems.
The chapter contains a description of the components of MPEG video
compression and some effects of video compression. The reader should
have learned the three types of video frames used in MPEG systems (I, B
and P frames) and the significance of corruption of the data stream
containing those frames. Also discussed in the chapter were macro blocks
and their applicability to motion detection and prediction.


Chapter 5
Codecs for Voice and Other Real-Time
Applications
Leigh Thorpe

Peter Chapman

Figure 5-1: Transport path diagram

Concepts Covered
Digitization of analog signals
General characteristics of codecs such as sampling rate, bit rate,
and compression ratio
Coding impairments such as baseline quality, encoding delay,
performance with missing packets, and transcoding
Speech codecs for telephony including G.711, G.726, G.729/G.729A,
G.723.1, GSM-EFR, and GSM-AMR
Selecting a codec for your VoIP network
Overview of audio codecs for streaming and other applications
Overview of video coding and codecs used for streaming video
(video-on-demand) and other video services.

Introduction
Much of the content of human communications consists of analog signals.
To ride across digital networks, analog signals must be converted into
digital form. Codecs play a key role in formatting, compressing, and
translating digital data from analog signals. The characteristics of the codec
contribute significantly to the efficiency and the final quality of the
transmitted signal. In this chapter, we discuss codecs used in real-time
telecommunications applications and services, primarily speech codecs,
and more briefly, audio and video codecs. The discussion will introduce the
parameters that underlie the performance of a codec and will explore
effective use of compression codecs in real-time communications systems.
The field of digitization, encoding, and compression is a broad one, and
this chapter addresses only a very small portion of that field. We will not
cover the mechanics of compression or encoding for the purposes of
encryption. Readers unfamiliar with the digitization process should begin
with the following Sidebar, which describes basic analog-to-digital
conversion.

Sidebar: The coding process–PCM and compression


Real-time applications often require the transport of analog signals.
Sound waves are analog signals, as are ordinary video signals. An
analog signal is digitized using an A-to-D converter (analog-to-digital or
A/D). The digital signal may be variously processed, stored, or
transported, but is eventually reconstituted to analog form for a listener
or viewer by means of a D-to-A converter (digital-to-analog, or D/A).
An A/D converter outputs a simple digital bit stream. This basic digital
format is called Pulse Code Modulation (PCM), and this linear form is
called linear PCM. Restoring a signal to analog form is sometimes called
reconstruction.



A codec (combining the terms coder and decoder) is a device that
processes the simple bit stream from an A/D converter according to how
it will be used; processing may change the format of the data, reduce the
amount of data, and so on. A codec has two parts: an encoder and a
decoder. An encoder creates a specific signal that is recognized by its
matching decoder, which returns the bit stream to its preprocessed form.
For stored media such as MP3 music files, digitization, encoding and
decoding functions usually occur at different times, and there are many
decoders for each encoder. In real-time telecommunications, encoders
and decoders commonly come in pairs, and operate on a real-time signal
such as a live talker, with the encoder at the input end of a digital
transport path, and the decoder at the output end. In telecom, the A/D
and D/A functions are grouped with the encoder and decoder,
respectively, and are sometimes considered to be part of the codec.
PCM determines the amplitude of the signal at regular intervals (set by the
sampling rate), and assigns a value to that amplitude. The sampling rate
is important because it limits the highest frequency that can be
represented in the digitized code. The highest frequency that can be
adequately tracked is half the sampling rate.
Figure 5-2 shows how an arbitrary analog signal is digitized into a linear
PCM format. Because the process is based on discrete amplitude values,
it is sometimes referred to as quantization. Samples are taken at regular
intervals, indicated by ticks on the x-axis. The quantization steps (the
amplitude values available) are shown as light gray lines. The interval
between ticks is determined by the sampling rate. At each tick-mark, the
process determines a quantized value corresponding to the amplitude of
the signal. There are no in-between values: if the signal amplitude is
between two digital steps, it is assigned to the closer one. The
determination of the amplitude value is instantaneous, and does not take
into account whether the amplitude is going up or down.
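
The sampling and quantization steps described above can be sketched in a few lines of Python. This is only an illustration, not a production A/D converter: the function names, the 0.2 step size and the clamp to five steps either side of zero (eleven levels in all) are assumptions chosen for the example.

```python
import math

def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency that can be adequately tracked: half the sampling rate."""
    return sample_rate_hz / 2

def quantize(amplitude, step=0.2, max_step=5):
    """Return the number of the quantization step closest to `amplitude`.

    There are no in-between values; amplitudes beyond the top or bottom
    step are clamped to it. Step size and range are illustrative assumptions.
    """
    nearest = math.floor(amplitude / step + 0.5)
    return max(-max_step, min(max_step, nearest))

def digitize(signal, sample_rate_hz, n_samples):
    """Take `n_samples` instantaneous readings of `signal` (a function of
    time in seconds) at regular intervals and quantize each one."""
    return [quantize(signal(k / sample_rate_hz)) for k in range(n_samples)]

# Hypothetical input: a 1 kHz tone sampled at the telephone rate of 8 kHz.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
codes = digitize(tone, 8000, 8)
print(nyquist_limit_hz(8000), codes)
```

Each value is read independently of its neighbors, which mirrors the point that the determination is instantaneous and ignores whether the amplitude is rising or falling.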


[Figure 5-2 (image not reproduced). Plot of signal amplitude, in quantization steps from 5 above to 5 below zero, against time, showing the original signal and the digitized signal. Annotations: "Note deviation between actual and quantized values" and "Sampling rate too low to respond to high frequency content".]

Figure 5-2: Digitization of an analog signal. Note the
deviation between the actual signal amplitude and the
quantized values, as well as the loss of information where
the frequency content (up and down swing) is too high to
be captured given the sampling rate.

There are eleven quantization steps in Figure 5-2, and the step spacing is
linear, meaning that the amplitude represented by Step Five is five times
the amplitude represented by Step One. The dynamic range of sound
amplitude is very broad, and a large number of linear steps are needed
for high quality reproduction. Sixteen-bit linear PCM (used on CD and
other high quality audio) has 2^16 (65,536) quantization steps covering 96
dB of dynamic range. The dynamic range of a coding system refers to
the difference between highest (loudest) and lowest (quietest) signals
that can be represented.
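
As a rough check on the figures quoted above, the dynamic range of a linear quantizer can be estimated as 20 log10 of the number of steps, which works out to about 6 dB per bit. The sketch below assumes this standard approximation; the function name is illustrative.

```python
import math

def linear_pcm_dynamic_range_db(bits):
    """Approximate dynamic range of linear PCM: the ratio of the largest
    to the smallest representable amplitude, expressed in decibels."""
    steps = 2 ** bits
    return 20 * math.log10(steps)

for bits in (8, 12, 16):
    print(bits, 2 ** bits, round(linear_pcm_dynamic_range_db(bits), 1))
```

For 16 bits this gives roughly 96 dB, in line with the figure cited for CD-quality audio.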
If the appropriate filtering is used in the digitization and reconstruction
processes, the deviation between the actual amplitude and the integer
value assigned shows up as noise in the signal, called quantization noise
or sometimes quantization distortion. The number of quantization steps
and their spacing determine how much quantization noise is added to the
output. The more steps, the closer the quantized value will be to the
actual amplitude, and the lower the quantization noise. The more
amplitude steps used, the more bits needed, and the higher the bit-rate
needed to represent the signal in digital form.


Again, higher sampling rates preserve more of the original signal and
generate more data. The sampling rate should be chosen based on the
frequency content of the signal and the required fidelity. General audio
signals, including music, may contain frequencies up to (and beyond) 20
kHz, the upper limit of human hearing. Accordingly, CD audio uses a
44.1 kHz sampling rate (44,100 times per second). Human speech
contains frequencies up to about 7 kHz, and a sampling rate of 16 kHz is
sufficient to digitize speech. Conventional-band telephone speech
preserves only part of the speech band, frequencies up to about 3500 Hz,
because most of the information in human speech is contained in this
frequency range.
Telephone signals are sampled at 8 kHz, which records an eight-bit
sample every 125 µs (microseconds). This results in a bit rate of 64
kbit/s.
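
The arithmetic behind these telephone figures is simple enough to verify directly. The helper names below are illustrative, not taken from any standard.

```python
def pcm_bit_rate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of an uncompressed PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def sample_interval_us(sample_rate_hz):
    """Time between successive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz

# Telephone speech: 8 kHz sampling, eight bits per sample.
print(pcm_bit_rate_bps(8000, 8))   # bits per second
print(sample_interval_us(8000))    # microseconds between samples
```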
Compression techniques can reduce the amount of data needed to
represent a digital signal without sacrificing too much quality. A simple
type of compression employs nonlinear amplitude steps. The basic
digital encoding standard used for telephone voice, G.711, uses
amplitude quantization defined by one of two logarithmic functions, the
A-law and the µ-law. The logarithmic steps allow an eight-bit
quantization to encode a dynamic range similar to that for a twelve-bit
linear quantization.
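
The effect of logarithmic steps can be illustrated with the continuous µ-law curve. This is a sketch under stated assumptions: G.711 itself specifies a segmented approximation of the curve rather than the formula below, and the 8-bit helper functions and the test amplitude are hypothetical.

```python
import math

MU = 255  # the constant used by the mu-law variant

def mu_law_compress(x, mu=MU):
    """Map a linear amplitude in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def encode_8bit(x):
    """Quantize the companded value to a signed integer code (illustrative)."""
    return round(mu_law_compress(x) * 127)

def decode_8bit(code):
    return mu_law_expand(code / 127)

quiet = 0.01                       # a low-level sample, 1% of full scale
log_error = abs(decode_8bit(encode_8bit(quiet)) - quiet)
linear_error = abs(round(quiet * 127) / 127 - quiet)
print(log_error, linear_error)
```

For quiet samples the logarithmic code lands much closer to the original than a linear code of the same width, which is the sense in which eight logarithmic bits can stand in for a wider linear word.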
Higher compression can be achieved using Digital Signal Processing
(DSP) techniques. A common method is to encode the difference
between one sample and the next. For instance, Adaptive Differential
PCM (ADPCM), a technique used with speech and audio signals,
reduces the bits needed by encoding the difference between one PCM
sample and the next, with a step size that adapts to the signal. Many
video codecs use a differential encoding technique. To compute the
difference between samples, a codec must wait until the second sample
is obtained, which increases the time taken for encoding. Another
disadvantage of differential techniques is that they take longer to recover
from lost data than do nondifferential techniques.
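
A minimal sketch of plain differential coding shows both the saving and the recovery problem. It is deliberately simpler than ADPCM: the step size does not adapt, and the sample values are made up for illustration.

```python
def delta_encode(samples):
    """Encode each sample as its difference from the previous one."""
    previous = 0
    diffs = []
    for s in samples:
        diffs.append(s - previous)
        previous = s
    return diffs

def delta_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    value = 0
    samples = []
    for d in diffs:
        value += d
        samples.append(value)
    return samples

samples = [10, 12, 15, 15, 11]      # hypothetical PCM values
diffs = delta_encode(samples)       # small numbers need fewer bits
assert delta_decode(diffs) == samples

# Simulate one lost difference: every later sample is now wrong.
damaged = list(diffs)
damaged[2] = 0
print(delta_decode(damaged))
```

Because the decoder accumulates differences, a single lost value shifts everything that follows until the stream is resynchronized, whereas a lost PCM sample affects only itself.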

Basic Characteristics of Codecs


Codecs in telecommunications play one of two roles. First, as discussed in
the Sidebar, they convert analog signals to digital data that can be sent
across a network or processed further by Digital Signal Processors (DSPs).
Second, they are used to convert one type of digital signal to another.
Devices fulfilling this second role are more properly called transcoders,
although this term is rarely used. Later, this chapter discusses the effects of
using such codecs in the transport media path, which is indeed called
transcoding. In almost all situations, the analog-to-digital conversion is
done using a linear codec, and a codec of the second type is applied to the
output of the first, to transcode the signal to another format.
Certain basic information describes a codec. The basic approach
determines the codec type. Parametric characteristics include sampling
rate, data rate (bit rate), encoding delay, frame size, look-ahead, and
compression ratio. Finally, the baseline coding performance indicates
how well the codec reproduces the input sound signal.

Types of Codecs
Codecs are designed to address particular signal domains (for example,
audio and video signals). Within each domain, codecs can be designed to
handle a broad range of signal types, or they can be tailored to optimize
performance on a particular type of signal. In the audio domain, there are
general audio codecs, others specifically for speech signals, and others
specifically for music. Codecs can also be classified by the format of the
output data. Basic audio codecs that directly represent the amplitude of the
analog signal (such as PCM codecs) are called waveform codecs. G.711
and G.726, two telephony codecs, are both waveform codecs. The bit
stream (the digital representation) of a waveform codec contains explicit
information about the amplitude and frequency of the sound signal. These
are sometimes called sample-based codecs. Other codecs work on a
principle that groups information together in bundles called frames. Frame
size for speech and audio codecs is usually given in terms of the duration of
signal contained in the frame, say, 10 or 20 ms. Before the codec can
encode the frame, it has to wait for the entire frame of signal to collect in a
buffer.
Video codec frames are based on the natural succession of images making
up the original signal, each frame corresponding to one image. A video
frame thus refers to either the data corresponding to the frame, or the image
itself. The frame rate used by a video codec is a key determinant of the
smoothness of motion depicted in the playback.
Some frame-based codecs also look at the signal following the current
frame, which means they wait a little more before they begin processing the
frame. This little extra bit of signal is called (not surprisingly) the look-
ahead. The look-ahead for speech codecs is around 5–10 ms. Look-ahead
in video codecs is usually the entire following frame; this will be 40 ms for
full-motion frame rate.
Codecs can also be classified by the type of algorithm they use. Common
types in this classification include PCM and ADPCM (see the Sidebar for a
discussion of both of these approaches); sub-band coding, in which the
signal is separated into multiple narrow frequency bands, each of which is
encoded separately; and Code Excited Linear Prediction (CELP). CELP
codecs are very common in voice telephony, and are described in greater
detail in the following section.
The sampling rate describes how often a digital value is defined from the
instantaneous value of the analog signal. The data rate or bit rate of a
codec refers to the number of bits per second that are needed to transfer the
signal from the encoder to the decoder. These terms are described in more
detail in the Sidebar above. Codecs used to transcode the signal from one
digital format to another generally inherit the sampling rate of the original
analog-to-digital (A/D) conversion, although downsampling (changing to
a lower sampling rate) is possible.
Some codecs operate at one data rate; others have different rates available.
Constant bit-rate (CBR) codecs carry on at the same rate no matter what
the signal (or even when there is no signal). Variable bit-rate (VBR) codecs
can adjust their rate as coding proceeds. A VBR codec can select a bit-rate
based on the content of the signal being encoded (voiced vs. unvoiced
speech, speech vs. silence, full video scene change vs. a static image), or
based on other factors such as the channel quality or capacity.
All codecs take a finite time to encode and decode a signal. This time is the
encoding delay. Waveform codecs are very fast (microseconds), while
frame-based codecs have a significant delay built into them that cannot be
reduced. This delay minimum is the algorithmic delay, which consists of
the frame length plus the look-ahead, if the codec uses one. The encoding
delay consists of the algorithmic delay plus additional time needed by a
finite speed processor to complete the processing. For practical reasons,
this delay is estimated as twice the frame size plus the look-ahead.

Compression
Linear encoding generates a lot of data, and transferring or storing them
uses a lot of capacity. To reduce the volume of data, and hence the capacity
needed to handle it, compression techniques are employed. Compression
can be lossless or lossy; lossless compression is called for where it is
necessary to restore the exact signal content. For voice and video
telecommunications, lossless compression does not sufficiently reduce the
data rate, so lossy methods are used. Some simple compression techniques,
such as logarithmic coding, restriction of dynamic range, and differential
coding, are described in the sidebar.
More complex techniques can compress the signal more efficiently. Many
coding algorithms include a modeling process that draws on information or
assumptions about the characteristics of the signal, the
channel, or human perception. Speech codecs model the acoustics of the
human speech production apparatus. Assuming a signal is speech limits the
number of different kinds of sounds that the codec needs to reproduce.
Speech codecs are designed to process speech signals, and they do it well;
on the other hand, they usually perform poorly with nonspeech signals such
as noise and music. Audio codecs are aimed at a broader range of signals,
and are less likely to put different types of sounds at a disadvantage.
One of the most common compression methods is Code Excited Linear
Prediction (CELP). CELP codecs are frame-based. A CELP coding
algorithm deconstructs the signal into two parts: a spectral model
component and a residual component. The spectral model component is a
digital filter. The residual is the left over part of the signal not accounted for
by the filter model. When the residual component of the signal is passed
through the filter, the segment of speech contained in the frame is
reproduced. The filter parameters are quantized, and the quantization
indices (step numbers) for each parameter are obtained. A table called the
codebook contains numbered entries corresponding to choices for the
residual. The encoder determines the codebook entry that best matches the
residual. These quantization and codebook indices make up the encoded
data that are sent to the decoder. Since the encoder and decoder use the
same codebook, the decoder can easily look up the codebook entry
corresponding to the residual. The quantization indices are used to
reconstruct the spectral model filter. The decoder then filters the residual
codebook entry to reproduce the signal segment for that frame.
Coding distortion occurs because the codebook entry is not an exact match
for the actual residual. The subjective quality of a CELP codec depends on
both the size of the codebook, which determines the number of signal
segments that could be used to represent the residual, and how well the
distortion takes advantage of the “blind spots” in the human perception of
distortion (some types of distortion are less apparent or less irritating). The
size of the codebook affects the bit rate needed to operate the codec.
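
The codebook idea can be illustrated with a toy search. This is a heavily simplified sketch, not a CELP implementation: it leaves out the spectral model filter entirely, uses a made-up four-entry codebook, and matches entries by plain squared error. All names and values are hypothetical.

```python
def best_codebook_index(residual, codebook):
    """Encoder side: index of the entry that best matches the residual
    (smallest sum of squared differences)."""
    def squared_error(entry):
        return sum((r - c) ** 2 for r, c in zip(residual, entry))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))

def bits_per_index(codebook):
    """Bits needed to send one codebook index to the decoder."""
    return max(1, (len(codebook) - 1).bit_length())

# Hypothetical shared codebook; encoder and decoder hold identical copies.
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, -1.0],
]

residual = [0.9, 0.1, -0.8, 0.2]
index = best_codebook_index(residual, CODEBOOK)   # sent to the decoder
decoded = CODEBOOK[index]                         # decoder's table lookup
distortion = sum((r - d) ** 2 for r, d in zip(residual, decoded))
print(index, bits_per_index(CODEBOOK), distortion)
```

The leftover distortion is the mismatch between the residual and the chosen entry. A larger codebook offers closer matches but needs more bits per index, which is the bit-rate trade-off described above.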
The efficiency of compression is quantified in the compression ratio, which
is the ratio of the data rate of the compression codec compared to either the
uncompressed signal or some standard digital process. The standard digital
process for telephony voice is G.711 at 64 kb/s, which is the rate used for
an individual channel (DS0) in a TDM network. Audio and video
compression is usually compared to the rate of the linearly encoded signal.
For audio, this is 16-bit linear PCM (used in CDs). For video there is no de
facto standard; the data rate of a linear signal will depend on the frame rate,
display size, and other characteristics of the original analog signal.
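
As a small worked example, the compression ratio can be computed as the reference rate divided by the codec rate. The direction of the ratio (reference over codec) is an assumption here, and the function name is illustrative. G.729 at 8 kb/s, measured against the G.711 reference of 64 kb/s, serves as the example.

```python
def compression_ratio(reference_bps, codec_bps):
    """Reference data rate divided by the compressed data rate."""
    return reference_bps / codec_bps

G711_BPS = 64_000   # telephony reference: one DS0 channel
G729_BPS = 8_000

print(compression_ratio(G711_BPS, G729_BPS))   # expressed as N:1
```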

Coding impairments
With the exception of delay, impairments associated with the digitization
and compression of audio and video signals are similar for real-time and
non–real-time operation. The performance of individual codecs is
determined by the amount of distortion they add to the target signal, as well
as how they behave with any unwanted signal components (such as noise),
how much the signal degrades when it is passed through the codec multiple
times, and how disruptive data loss is to the output signal.
Constraints on delay, however, put limits on our ability to avoid or mitigate
the other impairments. This section discusses coding impairments, their
impact on real-time applications, and what if any steps can be taken to
manage their effects.

Encoding distortion
As discussed above, digitization and compression always reduce the
information content of the signal. This can manifest itself as distortion
(change in the shape of the waveform), as an increase in the noise floor
(“hiss”), or as the addition of so-called coding artifacts. For PCM codecs,
the amount of distortion depends on the sampling rate, quantization step
size, and whether any
compression techniques (such as logarithmic companding or differential
encoding) are applied. Frame-based codecs that aim for higher
compression will add more distortion. It is often assumed that the lower a
codec's bit rate, the more encoding distortion it will add. While bit rate has
some direct relationship with distortion (see description of the CELP
coding in the previous section), advances in coding technology have made
successive generations of low bit-rate codecs much better than previous
generations. Comparisons across different technologies (differential PCM
vs. CELP, say) do not follow such a simple relationship, either. (The reader
can inspect data rate vs. coding impairment for various codecs in
Table 5-2).
Coding distortion in waveform codecs does not depend on signal type.
CELP codecs, however, because they are generally tuned to a particular
type of input signal, distort different signals in different ways. CELP-based
speech codecs perform poorly with nonspeech signals, including
background noise, DTMF¹ tones, and music. This means that CELP codecs
often require a DTMF work-around to detect, transfer, and reconstruct the
tone without putting the signal through the CELP encoder. Music on hold
will be significantly degraded by CELP compression. CELP codecs also
vary in their performance with different voices. Because of the way they
work, CELP codecs work best with lower-frequency voices, so they
generally reproduce men’s voices better than women’s and children’s
voices.
Given this dependence on input signal, evaluating encoding distortion of
CELP and other compression codecs can be a complex process. The most
reliable method is formal subjective testing. This is usually done with
listening tests, so that the codec’s performance on many different input
signals can be examined. Listeners in these tests rate the quality on a scale
of one (bad) to five (excellent), and the resulting average is known as the
Mean Opinion Score, or MOS.

1. Dual-tone multifrequency (DTMF) tones were invented to pass some signaling, such as the number dialled,
over analog equipment. They are also used by network and proprietary features such as access to
voice mail, credit card number entry, and so on. Network transparency to DTMF is required for these
features to work.

At least three different approaches to estimating encoding distortion are
used: (1) using a MOS from one test case (or a weighted average from
several) from formal subjective evaluation,² (2) estimating MOS using an
objective quality estimation technique such as P.862 (PESQ*) (currently
available for speech codecs only), or (3) assigning a value indicating the
general extent of impairment, derived from a prescribed subjective test.
The first is an older strategy and works well if the codecs of interest were
directly compared in the same subjective study. The main limitation is that
no bench testing can be done using this method. The second approach can
make measurements on arbitrary systems or components. However, these
methods depend on complex algorithms that are used to obtain the
distortion estimates, and they are not guaranteed to map onto the values
that would have been obtained in a subjective test, since the models are
incomplete. The final approach is used in the ITU E-Model,³ where
subjective test results are used to generate an Equipment Impairment (Ie)
value for each codec. The Ie value is then used in the modeling. Ie is cited
in Table 5-2 below as an indicator of the baseline coding quality of each
codec described. All these methods are discussed in more detail in
Chapter 3.

Encoding delay
While encoding by a waveform codec is virtually instantaneous, encoding
by a frame-based codec may introduce a significant delay. Before a frame
can be processed, a frame's worth of speech must collect in the buffer.
Where a look-ahead is used, the encoding window stays a fixed time ahead
of the current frame, and this time is added to the delay.
The encoding delay can be calculated as (frame size + look-ahead +
queuing delay + processing delay). The queuing delay is the time between
a complete frame of speech becoming available and when that frame is
submitted to CPU for processing. Processing delay is the time taken for the
processor to execute the algorithm for that frame. Queuing and processing
delay depend on a particular implementation and processor. A conventional
formula for estimating encoding delay in the absence of a specific
implementation is (2 x frame size) + (look-ahead). This formula assumes a
worst-case scenario that the sum of queuing delay and processing delay is
equal to the frame size, which is the maximum tolerable delay for real-time
operation. It is based on considerations of efficiency: a powerful processor
could encode each frame very quickly, resulting in a delay only a few
microseconds longer than the framing delay. However, the processor would
then sit idle until the next frame is ready. This would require a relatively
expensive DSP. Instead, systems are often designed around a processor that
is powerful enough to finish the processing of one frame just as the next
one is ready. Using optimal scheduling and some other techniques, system
delay can be reduced to (frame size + look-ahead + processing delay).

2. The tendency of authors to cite “the” MOS associated with a particular codec is misguided. No specific
MOS can be assigned to a codec. Different input signals will return different scores, and a change to
the test cases or reference cases used can shift the scores. It is the pattern of results for input signal
types with one codec and for different codecs that is key to understanding a codec’s performance.
This is discussed further in “Chapter 3 Voice Quality”.
3. Additional details on the definition and operation of the ITU E-Model are covered in
“Chapter 3 Voice Quality”.

Decoding is typically a small fraction of encoding time for CELP codecs.
Packet Loss Concealment (PLC) may add a few milliseconds.
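
The delay formulas in this section can be compared side by side with a short calculation. The function names are illustrative, and the 10 ms frame, 5 ms look-ahead and 3 ms processing figures are hypothetical values picked from the ranges mentioned earlier in the chapter, not measurements of any particular codec.

```python
def algorithmic_delay_ms(frame_ms, lookahead_ms=0.0):
    """Irreducible minimum: frame length plus look-ahead."""
    return frame_ms + lookahead_ms

def encoding_delay_ms(frame_ms, lookahead_ms, queuing_ms, processing_ms):
    """Full formula: frame + look-ahead + queuing + processing."""
    return frame_ms + lookahead_ms + queuing_ms + processing_ms

def estimated_encoding_delay_ms(frame_ms, lookahead_ms=0.0):
    """Conventional estimate when the implementation is unknown:
    assumes queuing + processing together equal one frame."""
    return 2 * frame_ms + lookahead_ms

frame, lookahead = 10.0, 5.0   # hypothetical frame-based speech codec
print(algorithmic_delay_ms(frame, lookahead))
print(estimated_encoding_delay_ms(frame, lookahead))
# With optimal scheduling the queuing term drops out:
print(encoding_delay_ms(frame, lookahead, queuing_ms=0.0, processing_ms=3.0))
```

The conventional estimate is simply the full formula with the queuing and processing terms replaced by one frame length.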
Because of the integration of functions in the DSP chip, it is not always
possible to assess the delay associated with individual steps in the
processing. Integration allows more parallel processing and thus provides
the opportunity to reduce the end-to-end delay; on the other hand, it makes
it more difficult to partial out the contributions of the different functions to
the end-to-end delay.

Performance with Missing Packets


VoIP can benefit from packet loss mitigation techniques. Some packet loss
concealment can be built into or added to the decoder, and other techniques
(an adaptive jitter buffer, interleaving of frames in the packet flow, or
sending of duplicate information) are external to the codec.
For speech and audio, the effects of gaps of 40-60 ms or less will depend
on the particular codec and PLC technique. For example, lost data may
cause codecs using adaptive techniques to deconverge, with the result that
impairment will propagate forward in time, lasting until the adaptive
features restabilize. As well, PLC algorithms can distinguish themselves
with their ability to smooth and repair the signal. PLCs are designed to
mute the output when faced with long bursts of loss, since it is not possible
to repair the speech in such cases.
Some speech coding standards designate a PLC as part of the standard.
Other codecs require an external PLC. For codecs requiring an external
PLC, concealment algorithms may be commercially available. These will
be marketed with optimistic claims about the effectiveness of the repair
process. Some methods integrate PLC with an adaptive jitter buffer.
Adaptive jitter buffers prevent some packet loss by trading it for delay
(covered in Chapter 3). The majority of improvement from these methods
may be attributable to the adaptation of the jitter buffer.⁴
While it is unlikely that any methods can live up to the more extreme
claims, such methods will improve performance of VoIP running over
best-effort networks compared to VoIP without mitigation or concealment. VoIP
running on a managed network will likely require little assistance from
mitigation, provided loading and jitter sources are properly controlled.
Mitigation techniques provide useful “insurance” for call paths running
over the Internet or other best effort networks, where these factors are not
rigorously controlled.

Silence Suppression
Silence suppression is a technique that conserves network capacity by
identifying only the portions of a signal containing active speech, and
sending those portions while discarding or suppressing the portions that do
not. If you're familiar with silence suppression, you've probably been
thinking of it as a VoIP feature, so you may be surprised to find it lumped
in with impairments.
Silence Suppression capitalizes on conversational turn-taking: partners
alternately talk, then listen. On average, each voice path in a two-party call
is active 40-60% of the time. This means that about half the time, a channel
carries no speech signal. To reduce the amount of data to be sent across the
network, only data that encodes actual speech is sent. Data associated with
silent intervals between utterances is discarded.
Silence Suppression employs a Voice Activity Detector (VAD; also called
Speech Activity Detector or SAD) to determine whether there is speech on
the channel, and a noise estimator that samples the background noise
and sends coefficients describing the noise to the decoder end of the call.
The detector is actually configured to detect the absence of speech rather
than its presence. This inverts the logic of the detector, which provides a
kind of fail-safe: if the signal is ambiguous in some way, or the detector
fails, the decision outcome will be that speech is present. Therefore, speech
content will not be suppressed accidentally. Two VAD parameters are
important to Silence Suppression performance: (1) the detection threshold
and (2) the hang time (which determines the minimum time that data will
be sent once the algorithm has determined that a signal is present). In
addition, the network operator must decide the peak capacity of each link,
which will determine the probability that the channel becomes overfilled by
active speech data.
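
The interplay of detection threshold and hang time can be sketched with a toy detector. This is an illustration only: practical VADs use more than a single level measurement, and the frame levels, the threshold and the interpretation of hang time as a count of extra frames sent after the level last reached the threshold are all assumptions made for the example.

```python
def vad_decisions(frame_levels_db, threshold_db, hang_frames):
    """Return one send/suppress decision per frame (True = send).

    A frame is suppressed only when it is clearly below the threshold
    and the hang time has expired, so doubtful frames are sent.
    """
    decisions = []
    hang_remaining = 0
    for level in frame_levels_db:
        if level >= threshold_db:
            hang_remaining = hang_frames   # restart the hang timer
            decisions.append(True)
        elif hang_remaining > 0:
            hang_remaining -= 1            # still within the hang time
            decisions.append(True)
        else:
            decisions.append(False)        # silence: suppress this frame
    return decisions

# Hypothetical per-frame levels in dB relative to full scale.
levels = [-60, -30, -28, -55, -58, -59, -60]
sent = vad_decisions(levels, threshold_db=-40, hang_frames=2)
print(sent, sum(sent) / len(sent))
```

Lowering the threshold or lengthening the hang time sends more frames and saves less capacity; raising the threshold or shortening the hang time does the opposite, at the risk of clipping quiet speech.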
At the decoder end, the speech bursts are played as sent, but the silences
between them are filled with a synthesized background noise called
comfort noise. The level and spectrum of the comfort noise are determined
from the coefficients received from the encoder end. This works best for
stationary and quasi-stationary noise, like car interior noise or crowd
babble. Dynamic noise, such as street noise, is more difficult to match.
Differences in level between the actual noise that arrives mixed with the
speech and the comfort noise generated at the decoder create an audible
contrast. This makes it obvious that something is interfering with the signal
or is being turned on and off.

4. Some vendors of enhanced methods claim to be able to repair speech with up to 30% packet loss.
These techniques invariably compare a PLC algorithm combined with an adaptive jitter buffer to
the performance of a system using an ordinary (or no) PLC and a fixed, moderately sized jitter
buffer. This algorithm does not repair 30% packet loss. Instead, it prevents loss associated with
late packets arriving at the jitter buffer, only to be discarded because they are too late to play out.
While this greatly improves the sound reproduction, it does so by adding delay, at least temporarily.
Should the network suffer high packet loss from drops or discards in the core, repair by these
algorithms will not be much better than standard PLCs.

The use of silence suppression can cause several impairments:
• front-end clipping, where the beginnings of utterances are
  removed
• background noise contrast, where noise used to fill the silent
  periods is noticeably different from the background noise audible
  during speech
• noise pumping, where peaks in the background noise trip the
  detector and background noise is transmitted momentarily (for the
  duration of the algorithm's hang time)
• data loss, caused when the total volume of active speech exceeds
  the capacity of the link it is carried over and some data must be
  discarded
The first three impairments result from detection errors, while the last
results from a provisioning trade-off between statistical fluctuation in
speech activity and the number of channels carried over the link. Design
parameters (threshold and time constants) determine the amount of silence
detected as well as the accuracy of the decisions. In general, aggressive
settings will detect more silence and both long and short silent intervals,
even short silences within one talker's speech. Conservative settings, on the
other hand, remove only long periods of silence. At the same time, the
aggressive settings create more opportunity for errors. In quiet, speech is
easily differentiated from non-speech, so the parameter settings are not
critical for this case. Elevated background noise, however, can exceed the
threshold, preventing the detection of silence. Tuning the threshold and time
constants can prevent such errors, but this can lead to the inverse, where
lower level speech is mistaken for silence. These errors can cause audible
artifacts in the output: the front ends of words may be clipped off or quieter
sections chopped out.
The total speech load to be carried across the network will depend on the
number of channels with active speech at any one time. Within aggregated
flows, silence suppression reduces the peak bandwidth needed to carry
voice traffic. Where the number of talkers is high (as in the network core),
the distribution of peak talker data rate will be based on the statistics of
large numbers and the actual peak data rate will rarely exceed the capacity
of the link. Silence Suppression can reduce speech traffic by 30–35%;
attempting more increases the likelihood of impairments.
Silence suppression employs statistical multiplexing (statmux). Where
silence suppression is used to put more calls on a link than the system can
guarantee bandwidth for, occasionally the packet volume will exceed the
provisioned capacity, causing jitter and packet loss. Capacity engineering
for silence suppression requires a determination of (1) the capacity of the
link, (2) the expected reduction in data given the VAD settings, and (3)
determination of a tolerable rate of congestion and packet loss from
momentarily exceeding the link capacity. These can be used to compute the
maximum number of calls that can be handled with Silence Suppression
enabled. Spreadsheet calculators are available to assist in this analysis.
Modeling has shown that links carrying 24 calls or fewer do not have a
sufficiently stable peak rate to share bandwidth among voice channels
without unacceptable rates of packet loss. Low-speed links can still benefit
from the use of silence suppression, which makes room for data from other
applications to share the channel. Note that such sharing may increase jitter
where voice and nonvoice packets share a queue.
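
The capacity engineering steps above can be sketched with a simple probability model. This is a minimal sketch, assuming each talker is active independently with the same probability (a binomial model), which is a simplification that the text does not prescribe. The function names and the sample figures are hypothetical.

```python
from math import comb

def overload_probability(n_calls, activity, capacity_calls):
    """Probability that more than `capacity_calls` of `n_calls` talkers
    are active at the same instant, under the binomial assumption."""
    return sum(
        comb(n_calls, k) * activity ** k * (1 - activity) ** (n_calls - k)
        for k in range(capacity_calls + 1, n_calls + 1)
    )

def max_calls(capacity_calls, activity, tolerable_overload):
    """Largest number of calls whose overload probability stays within
    the tolerable rate."""
    if not (0 < activity <= 1) or not (0 <= tolerable_overload < 1):
        raise ValueError("activity must be in (0, 1], tolerance in [0, 1)")
    n = capacity_calls   # with no oversubscription there is no overload
    while overload_probability(n + 1, activity, capacity_calls) <= tolerable_overload:
        n += 1
    return n

# Hypothetical link: room for 100 simultaneously active voice channels,
# 50% voice activity, overload tolerated 0.1% of the time.
print(max_calls(100, 0.5, 0.001))
```

Rerunning the calculation with a small capacity shows much less room for oversubscription, which is consistent with the observation that low-capacity links do not have a sufficiently stable peak rate.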

Transcoding
Transcoding refers to the successive encoding of a digital signal by
different codecs. Transcoding can be problematic for voice quality because
of cumulative degradation to the final output speech. Transcoding may
increase both signal distortion and delay. It is important to understand
when transcoding adds impairment and where it does not.
Generally, transcoding occurs because some networks use compression
coding to save bandwidth, and the compression codecs they choose are
different. For example, many VoIP networks use G.729, an 8 kb/s
telephony standard codec, to conserve bandwidth; digital wireless networks
use low bit-rate codecs over their radio channels. In addition, some network
features such as conferencing and voice mail may also add transcoding.
A special case of transcoding, called tandeming, occurs when a signal is
encoded, decoded, and reencoded by the same codec. The impairment is
similar to that for transcoding. TIA TSB-116 offers a good discussion of
the effects of transcoding.

The arithmetic of transcoding


For conventional telephony, G.711 serves as a baseline and neutral starting
point; all other codecs are evaluated relative to G.711. Transcoding from
G.711 to another telephony codec degrades the quality, while transcoding
from another telephony codec to G.711 does not. Typically, successive
non-G.711 encodings involve decoding from one codec to G.711 and then
encoding from G.711 to the second codec. A VoIP LAN using G.729,
connecting by G.711 to a GSM network begins with G.729 coding on the
LAN, transcodes to G.711, and finally to GSM-EFR on the wireless
channel. This would technically count as two transcodings (G.729 to G.711
and G.711 to GSM-EFR). However, since the intermediate G.711 does not
increase the impairment, we may describe the transcoding simply by
counting only non-G.711 encodings. For the purposes of voice quality
estimation, this counts as one transcoding (G.729 to GSM-EFR).
The precise amount of coding distortion depends on the specific codecs
involved, the order of processing (that is, which of the two codecs was
first), whether noise is present in the signal, and for some codecs, the
spectral content of the specific voice. The E-Model Ie, defined in the
discussion above on Encoding Distortion, can help estimate the impairment
from transcoding (see footnote 5). Using the values shown in Column 8 of
Table 5-2 below, Table 5-1 computes the value of the E-Model output metric,
Transmission Rating (R), based on transcoding between pairs of codecs.
The highest possible R for conventional-band telephony is 93; the Ie values
for the two codecs in each pair are added together, and the sum is
subtracted from 93. Where at least one
of the codecs is G.711, scores for the codecs shown remain near eighty or
above. Scores that drop below eighty may have minor impairment, and
those dropping below seventy more obvious impairment. (Wireless
operates happily in this region because it has different user expectations.)
Note that the scores in the table account only for the transcoding
impairment; contributions to quality from encoding delay, as well as other
impairments from the connection have not been included.
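The arithmetic described above can be sketched in a few lines. The Ie values are copied from Table 5-2; the dictionary keys (for example `AMR-12.2`) and the function names are labels chosen for this illustration. As noted in footnote 5, the result is an approximation, and where G.726 is used repeatedly with the Synchronous Coding Adjustment it should be listed only once.

```python
R_MAX = 93  # highest R for conventional-band telephony, per the text

# Ie values from Table 5-2 (column 8); codecs with an undetermined Ie are omitted.
IE = {
    "G.711": 0,
    "G.726-32": 7,
    "G.728": 7,
    "G.729": 10,
    "G.729AB": 11,
    "G.723.1": 15,
    "GSM-EFR": 5,
    "AMR-12.2": 5,
    "AMR-7.4": 10,
    "IS-641": 10,
    "EVRC": 6,
}


def transmission_rating(*codecs: str) -> int:
    """R after successive encodings: 93 minus the sum of each codec's Ie.

    G.711 has Ie = 0, so intermediate G.711 steps do not change the result.
    """
    return R_MAX - sum(IE[c] for c in codecs)


def describe(r: int) -> str:
    """Rough reading of R using the thresholds quoted in the text."""
    if r >= 80:
        return "eighty or above"
    if r >= 70:
        return "below eighty: may have minor impairment"
    return "below seventy: more obvious impairment"
```

For the example in the text, `transmission_rating("G.729", "G.711", "GSM-EFR")` gives 93 - 10 - 0 - 5 = 78, the same value as counting the single G.729 to GSM-EFR transcoding.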
Transcoding between G.711 and G.726 can use a special feature of G.726
called the Synchronous Coding Adjustment (SCA) to limit accumulation of
distortion. Digital signals can be transcoded back and forth between G.711
and G.726 repeatedly, incurring only the degradation associated with the
lowest bit-rate used for G.726, provided the SCA is used in each successive
application of G.726. The effect of the SCA is included in the computations
for Table 5-1.

Transcoder-free operation (see footnote 6)
To maintain voice quality performance, we want to limit the number of
encodings by CELP and other low-bit-rate codecs to one. Packet networks
offer a unique opportunity to do this; speech that is already compressed can
be transported over IP without transcoding to an intermediate form.
Transcoder-free operation is a feature that ensures that speech signals can

5. Since the E-Model is an additive model, this estimate is approximate. It does not account for differences
in order of transcoding and becomes less accurate as more transcodings are combined.
6. Here, the term transcoder-free operation (TrFO) is used generically. In wireless technology standards,
TrFO refers to a specific signaling system to set up a clear TDM channel between the endpoints that
allows the encoded speech to be sent as data.


be carried through the network without encountering more than one low-
bit-rate encoding.

Table 5-1: The table entries show the effect of transcoding on predicted
voice quality in terms of Transmission Rating (R). Values are
computed by subtracting the Equipment Impairment (Ie) for
each codec from the G.711 reference of 93. Transcodings between
non-G.711 codecs are assumed to have an intermediate G.711
step between the two. G.726 is assumed to have the Synchronous
Coding Adjustment (see description of G.726, below). See
Chapter 3 for a discussion of R and Ie of the E-Model.

Several things must be in place to support transcoder-free operation. First,
the network endpoints must be able to communicate with the call server or
each other about the codecs available and which one to use. Second, since
the packet portion of the network may not know the final endpoint of any
call, gateways must be able to pass on the data to a compatible endpoint or
to decode it where it will be passed to a TDM endpoint.
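The first requirement, agreeing on a codec, can be illustrated with a simple preference-list match. This is a generic sketch: it does not model any particular signaling protocol, and the function name and codec lists are hypothetical.

```python
from typing import Optional, Sequence


def choose_common_codec(offered: Sequence[str], supported: Sequence[str]) -> Optional[str]:
    """Return the first codec in the caller's preference order that the far end supports.

    Returns None when the endpoints have no codec in common, in which case
    a gateway would have to transcode between them.
    """
    supported_set = set(supported)
    for codec in offered:
        if codec in supported_set:
            return codec
    return None


# Two endpoints that share a low-bit-rate codec can avoid transcoding.
same_standard = choose_common_codec(["AMR", "G.711"], ["AMR", "G.711"])

# A TDM-facing gateway that only handles G.711 forces a decode to G.711.
to_tdm = choose_common_codec(["AMR", "G.711"], ["G.711"])
```

Listing the compressed codec first expresses a preference for carrying the speech end to end without an intermediate form, with G.711 as the fallback.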
Where compression is used in the access, such as cellular/wireless,
transcoder-free operation can provide simultaneous benefits to both end-
user quality and bandwidth efficiency. For example, instead of converting
wireless speech back to G.711 for transport across the core network, the
low-bit-rate code can be sent instead. Where the call goes to another
wireless endpoint using the same wireless standard, the speech data can be
carried across the packet network from one base station to the other, and
sent down the second wireless channel, without incurring additional
impairments from intermediate codecs. Similarly, Enterprise LANs using
G.729 VoIP that connect via a public carrier to another similar LAN
currently decode the G.729 to G.711 for transport across the TDM link. As
the public network converts to IP infrastructure, it will be possible to
forward the G.729 packets across the public carrier IP network to the far
end G.729 decoder. Voice mail on G.729-based networks should store
G.729 code, rather than transcoding to a proprietary codec for storage,
reducing further the number of transcodings speech is subject to in the
network.

Codecs and conference calls


Audio conferencing is a well-established business communications tool.
For large Enterprise networks, as much as one-third of user call-minutes
include a conference bridge. User satisfaction with Enterprise VoIP, then,
depends on the acceptability of conference call quality. Service providers
do not have the same conference call volumes, but many operator-assisted
calls are made through a bridge.
The core of a conference bridge is an algorithm that mixes the speech
signals and sends the result to individual receivers. The selection of which
talkers to mix can be complex, but the difficulty VoIP poses for
conferencing quality is that the conference bridge can only mix PCM code.
If the signal is compressed, it must be decoded before the bridge can
process it. The mixed signal must be encoded again before being sent to the
endpoints. G.711 will not suffer impairment from the mixing itself, but
must be unpacketized and repacketized, increasing delay. To make matters
worse, many conference callers use headsets or speakerphones, which can
combine with the increased delay to produce audible echo.
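The mixing step itself is simple arithmetic on linear PCM samples, which is why compressed speech has to be decoded before the bridge can process it. The sketch below assumes 16-bit linear samples that are already decoded and time-aligned, and it sends each participant the sum of the other talkers so that nobody hears their own voice. The function name, the choice to mix every line, and the clipping strategy are illustrative assumptions, not a description of any specific bridge.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Mix one frame of 16-bit linear PCM per participant.

    Each participant receives the sum of all other participants' samples,
    clipped to the 16-bit range. All frames must have the same length.
    """
    lengths = {len(f) for f in frames.values()}
    if len(lengths) > 1:
        raise ValueError("all frames must contain the same number of samples")
    n = lengths.pop() if lengths else 0
    # Sum every talker once, then subtract each listener's own contribution.
    total = [sum(f[i] for f in frames.values()) for i in range(n)]
    return {
        who: [max(INT16_MIN, min(INT16_MAX, total[i] - own[i])) for i in range(n)]
        for who, own in frames.items()
    }
```

Because the output is again linear PCM, every stream sent to a compressed endpoint must be encoded once more after this step.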
The implications of this are clear: traditional conferencing algorithms
insert the equivalent of a TDM hop in the middle of the call. This doubles
the equipment delay, and where a compression codec is used, transcoding
will be inevitable (see footnote 7).
Adaptation of the conferencing algorithms to the VoIP environment can
relieve some of these impairments. One strategy is to send only one talker
at any time, so that the signal does not need to be mixed. This may solve
the delay and transcoding impairment problem, but the detector may
introduce other impairments, since it will have to switch quickly between
talkers, and the signal sent to participants will be very choppy. Some

7. Where the bridge sits in a legacy network, and only some lines are calling from a network using packet
technology and/or speech compression, the situation is more complex. If there is only one caller from
such a network, there will be no additional impairments over the TDM, since that signal must be
unpacketized/uncompressed at the network interface in any case. Where there is more than one line
with packet or compressed speech, the analysis applies to signals they hear from the other talkers on
such networks, regardless of whether the conversions take place at the bridge or at an intermediate
interface.


designers are introducing hybrid algorithms that mix sometimes but not
other times, depending on whether there is significant activity on more than
one line. The success of these new strategies is not yet determined, but it is
clear that the traditional conference bridge will not deliver the quality users
expect when combined with VoIP technology.
Impairment to conference calls over VoIP can be minimized by using
G.711 or G.726-32 as the VoIP codec. Delay will increase, but the increase
will be less than for compression codecs, and there will be no transcoding
impairment.

Speech Codecs for Voice Services


The design intent of the codecs that have been adopted for VoIP ranges
from general use on digital services (for example, G.711) to digital wireless
(for example, G.729, which was designed for use with digital wireless,
although it was never used for such). To date, no codec has been designed
specifically for use on packet networks. While this has meant that some
codecs have required some retrofitting, there are some potential
advantages. In particular, where the same codecs are used on different
types of networks, interworking between them is simplified. Even so, while
VoIP hasn't produced a completely new crop of special-purpose codecs,
there are still more than a few to keep track of. Some are common choices
for VoIP services, while others are used in existing networks and have
implications for interworking and end-to-end quality.

Commonly used telephony codecs


While there are many standard and proprietary speech codecs, commercial
telephony employs a comparatively small subset. Table 5-2 lists the most
commonly used telephony codecs for wireline, wireless, and VoIP. The
G.700-series codecs are ITU standards. The remaining codecs listed here
are wireless standards adopted by various standards bodies (see References
for details). Also shown are the systems each codec was standardized for or
where it is normally deployed. All these codecs use the conventional
telephony frequency band (300–3500 Hz) associated with 8 kHz sampling.
The codec type shows the dominance of the CELP technology for speech
compression coding. Other defining characteristics include data rate(s),
frame size, look ahead, an estimate of encoding delay, and the E-Model
Equipment Impairment (Ie) indicating the baseline coding distortion of the
codec. The sections following the table provide additional information on
design intent, usage, features, idiosyncrasies, and listening quality of the
individual codecs.
This list of codecs is not exhaustive. Other codecs are used in wireless and
IP telephony in specific national and proprietary systems.


Codec           System     Codec type   Data rate    Frame size  Look-ahead  Delay, implementation  E-Model Impairment
(Standard)                              (kb/s)       (ms)        (ms)        estimate (ms)          factor (Ie) [1]
G.711                      PCM          64           —           0           0.125                  0
G.726                      ADPCM        32, 24, 16   —           0           0.250                  7, 25, 50
G.728                      LD-CELP      16           0.625       0           1.250                  7
G.729                      CS-ACELP     8            10          5           25                     10 (VAD off)
G.729AB                    CS-ACELP     8            10          5           25                     11 (VAD on)
G.723.1                    MP-MLQ       6.3          30          7.5         67.5                   15
GSM-EFR         GSM        ACELP        12.2         20          0           40                     5
AMR Mode 1 [2]  GSM, UMTS  ACELP        12.2         20          0           40                     5
AMR Mode 4 [2]  GSM, UMTS  ACELP        7.4          20          5           45                     10
IS-641          TDMA       ACELP        7.4          20          5           45                     10
SMV [3]         CDMA       RCELP        8            20          10          50                     not determined
EVRC [3]        CDMA       RCELP        8            20          10          50                     6
733-A [3]       CDMA       QCELP        13           20          7.5         47.5                   5*
iLBC (2 rates)  IP                      15.2         20          5 [4]       45                     not determined
                                        13.33        30          10 [4]      70                     not determined
BV16            IP         proprietary  16           5           0           10                     not determined

[1] Equipment Impairment (Ie) factors given here account for codec distortion on clean channels, that is,
without packet loss or wireless RF impairments, in the computation of the E-Model. (Ie never includes
impairment from delay.)
[2] AMR has eight rates in total. AMR Mode 1 is identical to GSM-EFR, while AMR Mode 4 (7.4 kb/s) is
identical to IS-641, and is very similar to G.729.
[3] These codecs are used in North American CDMA, and each is made up of four fixed-rate codecs. Each
uses a rate determination algorithm that decides which codec is to be used at any time, depending on the
nature of the signal present and on the mode of operation. The bit-rate given is the average rate for the
highest quality operating mode.
[4] This delay is not technically a look-ahead, since it occurs in the decoder rather than the encoder.
However, it has a similar effect on the overall encoding delay.

Table 5-2: Summary of telephony codec information

G.711
• The workhorse of the PSTN for digital trunking and switching
• The best quality conventional-band codec
• Handles nonspeech as well as speech
• Two coding laws are defined for G.711: A-law and µ-law; the voice
  quality produced by these two coding laws is very similar;
  transcoding from one to the other adds distortion equivalent to
  somewhat less than one unit of R or Ie.
• The two coding laws do not interwork (a signal encoded by one
  cannot be decoded by the other), but the data can be translated using a
  simple lookup table.
• A-law coding is used in most of the world and on international
  connections; µ-law is used in North America.
• Requires external packet loss concealment and silence suppression,
  which are easily added.

G.726
• 32 kb/s rate is commonly used for compression in TDM networks,
  private networks, undersea cables, and satellite links.
• Specified for low-power (in-building) digital wireless systems, such
  as CT2 and DECT
• Common in ATM and FR environments, but not found in many
  VoIP systems yet
• Sounds slightly raspier on active speech than G.711. The noise
  floor is slightly higher, and is acceptable for both public and private
  networks.
• Quality of the lower rates (24 and 16 kb/s) is not generally acceptable
  for commercial telecommunications, although these rates are
  sometimes used in private networks.
• The Synchronous Coding Adjustment (SCA) is used to avoid
  cumulative quantization distortion from multiple conversions
  between G.711 and G.726.
• More sensitive to data loss than G.711 because the decoder can lose
  its adaptive reference, and it takes a finite time to reconverge.
• Requires external packet loss concealment and silence suppression,
  which are easily added.

G.729, G.729A (see footnote 8):
• G.729A (that is, Annex A) is a reduced-complexity version of
  G.729

8. When reference is made to G.729, it is almost always the 8 kb/s rate that is intended. However, other
rates are defined in G.729 Annexes. G.729A is a reduced complexity version of the 8 kb/s codec
defined in the main body of the standard. In this book, references to G.729 without qualification may
be taken to mean the 8 kb/s algorithms, either G.729, G.729A, or both.


• G.729 and G.729A can be considered interchangeable; the
  differences are important to developers, but not to deployment,
  operation, or interoperability
• Both use the same decoder; therefore, they are fully interoperable.
• Often used in VoIP networks, especially Enterprise LAN-based
  systems.
• Offers acceptable quality on point-to-point calls; minor coding
  distortion on single encoding; rated similar to G.726 at 32 kb/s,
  although there is a qualitative difference in the sound; poorer
  reproduction of nonspeech signals (DTMF, music); typical CELP
  transcoding/tandeming degradation.
• Built-in packet loss concealment
• Built-in silence suppression algorithm, defined in G.729 Annex B
• When G.729A was introduced, marketing claims that G.729A was
  better than G.729 (or vice versa) were common; there is no basis for
  these claims.

G.723.1:
• Developed for use in video teleconferencing
• Early de facto standard for “shrink-wrap” VoIP applications
• Slightly poorer baseline quality than G.729, plus relatively long
  delay
• Built-in packet loss concealment
• Runs at two rates, 6.3 kb/s and 5.3 kb/s

GSM-EFR:
• GSM EFR (Enhanced Full-Rate) wireless speech coding standard.
• Baseline quality with speech signals is essentially equivalent to
  G.711
• Tandeming degradation less than for the earlier compression codecs
• Built-in silence suppression feature, called DTX (discontinuous
  transmission)


AMR:
• Adaptive Multi-Rate (AMR) codec for GSM wireless
• Developed to optimize quality over wireless channels: the coding
  rate adapts to current channel conditions. The bit rate of the speech
  codec is reduced in the face of data loss; bits freed up are
  transferred to the error protection function.
• Has eight rate modes available (half-rate operation uses the lower
  four modes).
• Top rate mode is identical to the GSM-EFR codec; half-rate mode
  is equivalent to IS-641.

iLBC, BV16:
• Selected as low bit-rate codecs for the CableLabs PacketCable
  standard.

Selecting a codec for your network


What should you consider in determining which speech codec is
appropriate for your particular network? The main selection drivers are
voice distortion, delay, and system capacity. The importance of specific
considerations will depend on the type of network (service provider, large
Enterprise, small Enterprise), what kind of calls are being made (two-party,
conferencing, wireless access, long distance), and who the primary users
are (subscribers, employees, business-to-customer, etc.). Refer also to
Chapter 20, which discusses the effect of codec selection and choice of
packetization on overall network performance. Chapter 20 also offers
design guidelines integrating codec choice into the overall packet network
planning for voice.

Voice performance
To maximize voice performance by using a codec with low distortion
and low delay, G.711 is the natural choice. G.711 will provide the best
intranetwork voice quality, and will optimize interworking with other
networks. Conferencing and voice mail performance will be similar to that
with TDM. Running G.711 with 10-ms packets will offer the best
end-to-end delay, but where capacity considerations prevent that, G.711 with
20-ms packets is a good alternative. Using G.711 with careful network
provisioning, it is possible to shift from TDM to VoIP without users being
aware of any change in the infrastructure.
Note that voice mail can suffer more from coding distortion than live
conversation, because the listener does not get a chance to ask for repetition
of any unintelligible parts. This can be a significant problem if the voice
mail system has its own compression codec, in which case the signal may
be transcoded during storage and again during playback.
Sometimes G.711 cannot be used, for instance, with low speed links from
remote sites, when teleworkers or road warriors (salesmen or other users
who are often out of the office) dial in, or where the LAN has insufficient
margin for G.711 operation. G.726-32 is the next best codec, but many
VoIP gateways have not yet implemented G.726.

Capacity
Where bandwidth is the top priority, there is a strong push to adopt the
lowest bit-rate codec. Where a codec is chosen based on bandwidth, make
sure that the quality is acceptable to your users on all their common calling
scenarios. Often when justifying a codec choice, the codec quality
considerations are limited to two-party calls over the immediate network.
Encoding delay, the kinds and quality impact of any transcoding, and
the quality of long distance calls or calls to other networks such as wireless
are rarely considered. Only after the network is up and running do user
complaints focus attention on performance shortcomings.
When selecting a low bit-rate codec, remember that the bandwidth
efficiency obtained will be less than the compression ratio as determined by
the bit rate alone. VoIP packets are small, meaning that the header accounts
for a significant portion of the bits. For example, G.729 has a compression
ratio of 8:1 (compared to G.711), but the bandwidth efficiency of G.729
packets (with one frame per packet) is approximately 4:1. For networks
running with low proportions of voice traffic compared to data traffic, the
savings in terms of percentage of overall capacity may not be worth the
cost in terms of voice performance. This trade-off must be examined
independently for each network.
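The effect of header overhead can be worked through with a short calculation. The sketch below assumes 40 bytes of IPv4, UDP, and RTP header per packet and ignores link-layer overhead and header compression; the function and constant names are illustrative. The ratio obtained depends on the packet size and on which headers are counted, so the figures it produces are illustrative rather than a reproduction of the ratio quoted above.

```python
IP_UDP_RTP_HEADER_BYTES = 40  # IPv4 (20) + UDP (8) + RTP (12); link layer ignored


def packet_bandwidth_kbps(codec_rate_kbps: float, packet_ms: float,
                          header_bytes: int = IP_UDP_RTP_HEADER_BYTES) -> float:
    """IP-layer bandwidth of one voice stream in one direction, in kb/s."""
    payload_bytes = codec_rate_kbps * packet_ms / 8.0  # kb/s * ms = bits; / 8 = bytes
    packet_bits = (payload_bytes + header_bytes) * 8.0
    return packet_bits / packet_ms  # bits per millisecond equals kb/s


# Compare the two codecs at the same packet duration (10 ms of speech per packet).
g711 = packet_bandwidth_kbps(64, 10)
g729 = packet_bandwidth_kbps(8, 10)
payload_ratio = 64 / 8          # compression ratio from bit rate alone
on_the_wire_ratio = g711 / g729  # what the network actually sees
```

Under these assumptions the on-the-wire ratio is well below the 8:1 payload ratio, because both streams carry the same fixed header on every packet.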
Where a low bit-rate codec is used, it may be advisable to specify a higher
rate for certain call scenarios that are particularly vulnerable to transcoding
degradation. Equipment features and the network architecture will
determine whether it is possible to implement contingent selection of
codec. Three-way and n-way conferencing and calls to or from cellular/
wireless networks are two situations that show unavoidable degradation
with additional low bit-rate coding.
Bandwidth calculators are useful in understanding the capacity
implications of various network provisioning choices.

Delay
Networks that will carry calls from cellular/wireless access, international
calls, or private networks with global reach must pay close attention to
delay. Your choice of codec and packetization can increase delay across the
network. Increasing the speech payload of the packets will improve
bandwidth utilization by reducing header overhead, but it does so at the
cost of increased delay. Ensure that your delay budget can handle the increase.
Table 5-2 shows some of the characteristics of the codecs most commonly
chosen for packet and wireless applications.
Here are some points to note:
• G.711 and G.726 are sample-based codecs, and do not use frames
  in their processing. The packetization delay (collection of digital
  samples for the payload) is the main source of delay with these
  codecs.
• While their encoding rates are very similar, G.729 uses a 10-ms
  frame, and G.723.1 uses a 30-ms frame. G.729 also has better
  encoding distortion performance.
• Where multiple frames per packet are used, the packetization delay
  will increase according to the number of frames that must be
  collected before the packet is sent.
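The last point can be made concrete with a small calculation of how long the sender must wait to gather the speech carried in one packet. This is a sketch under simple assumptions: it counts only collection time (frames plus look-ahead) and does not attempt to reproduce the implementation delay estimates in Table 5-2, which are larger. The function name is hypothetical.

```python
def collection_delay_ms(frame_ms: float, frames_per_packet: int,
                        look_ahead_ms: float = 0.0) -> float:
    """Time needed to gather the speech for one packet, in milliseconds.

    The encoder needs `frames_per_packet` complete frames, plus the
    look-ahead beyond the last frame, before the packet can be finished.
    Processing and transmission time are not included.
    """
    if frames_per_packet < 1:
        raise ValueError("frames_per_packet must be at least 1")
    return frame_ms * frames_per_packet + look_ahead_ms


# Frame sizes and look-ahead values are taken from Table 5-2.
g729_one_frame = collection_delay_ms(10, 1, look_ahead_ms=5)
g729_two_frames = collection_delay_ms(10, 2, look_ahead_ms=5)
g7231_one_frame = collection_delay_ms(30, 1, look_ahead_ms=7.5)
```

For a sample-based codec such as G.711, the same function applies with `frame_ms` set to the payload duration and one "frame" per packet.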

Compatibility with existing equipment and other networks


Determine what codecs are running in your existing network equipment,
including in-building wireless and voice mail. Will traffic from the packet
network be handed over to the old network? (The answer to this is almost
always yes.) If older equipment is using a particular codec, it is worth
considering this for the new equipment as well. Alternatively, consider
bringing the older network in line with the new equipment. Where neither
is possible, try to select a codec for the new network that will minimize
transcoding distortion and delay. Again, G.711 gets top marks for
compatibility and interworking.

Robustness to packet loss


Where robustness to packet loss is an issue, the codec implementation must
include packet loss mitigation. For codecs without a built-in packet loss
concealment, an external PLC must be added. If packet loss from network
jitter will be a factor, then an adaptive jitter buffer will also contribute to
final quality. Commercial enhanced packet loss mitigation techniques may
contribute additional quality over a standard PLC plus adaptive jitter buffer.
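To show what an external PLC does in principle, the sketch below fills each lost frame by repeating the last good frame at reduced level, fading toward silence during a burst of loss. This is a deliberately simple technique chosen for illustration; it is not the algorithm of any standard or commercial PLC, and the function name and `decay` parameter are assumptions.

```python
from typing import List, Optional

Frame = List[int]


def conceal(frames: List[Optional[Frame]], frame_len: int, decay: float = 0.5) -> List[Frame]:
    """Fill lost frames (None) by repeating the last good frame, attenuated.

    Each consecutive lost frame is scaled by a further factor of `decay`,
    so a long burst of loss fades out instead of repeating at full level.
    """
    out: List[Frame] = []
    last_good: Frame = [0] * frame_len  # silence until the first frame arrives
    gain = 1.0
    for frame in frames:
        if frame is None:
            gain *= decay
            out.append([int(s * gain) for s in last_good])
        else:
            last_good, gain = frame, 1.0
            out.append(list(frame))
    return out
```

The listener receives a continuous signal either way, which is exactly why a well-repaired loss may go unnoticed.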
In some cases, packet loss concealment can work so well that it is difficult
to tell that there is speech missing from the received signal. While this gets
high marks for low distortion where the listener does not know what was
actually said, there are implications for users where a received signal is so
well repaired that there is no indication that there is missing information.
Missing words can make the modified utterance nonsensical, in which case
a user will ask for clarification. Sometimes, however, a misplaced “no” or a
missing “thousand” can significantly alter the meaning, without leaving
any clue that something is wrong. It may be possible to repair data losses
too well!

                 Highest priority is:
Rank   bandwidth savings       low distortion   low delay
1      G.729/G.729A/G.723.1    G.711-64         G.711-64
2      G.726-32                G.726-32         G.726-32
3      G.711-64                G.729/G.729A     G.729/G.729A
4                              G.723.1          G.723.1
Table 5-3: Codec rankings for three different selection priorities. Rank one is the best fit for
the given selection priority.

Restrictions on transcoding
For equivalent-to-TDM wireline access, there can be no transcoding to
frame-based compression codecs. For equivalent-to-2G wireless mobile-to-
land (2G being current digital cellular operating with TDM backhaul), only
one frame-based encoding can be tolerated.

Audio Codecs
The term audio codec can refer to all codecs intended to digitize sound, but
it often refers specifically to codecs intended to handle all signal types, or
especially nonspeech signals such as music. In general, the operation of an
audio codec is similar to that of a speech codec. Audio codecs are likely to
model the human auditory system, as compared to speech codecs, which
often model the human vocal tract.
Audio codecs are frequently used in IP applications and computer-based
audio applications. Audio streaming and exchange of compressed music
files are two common ones. Such applications are less likely than speech to
be real time, although some components of multimedia applications such
as games may include audio signals. Commonly used audio codecs are
summarized below.
Some general purpose audio codecs combine two different algorithms, one
for speech and one for nonspeech. This arrangement is essentially two
codecs, each operating on the portions of the signal for which it is best
equipped. A detection scheme is used to determine which algorithm is
appropriate for any particular signal. This technique allows the codec to
obtain better quality for a given compression (or higher compression for a
given quality) than might be achieved with a single coding algorithm, since
the use of specialized algorithms allows the codec to make simplifying
assumptions about the characteristics of the content.


Audio codecs offer multiple data rates. For some codecs, the higher rates
offer lossless compression. Recall that lossy compression is generally used
for real-time applications such as telephony. Lossless compression allows
complete recovery of the original; thus, there is no coding distortion.
However, there are limits to the amount of compression that can be
achieved with lossless techniques, and further compression must be lossy.
In the realm of streaming (and stored) audio, it is important to differentiate
between codecs, file formats, and players. Codecs convert the audio signal
from one form to another. The file format is a defined structure that
provides information needed by a player to parse the incoming data. (An
example of a format is .wav.) The format includes such information as the
codec and data rate used for compression. A player is a software device
used to play back the audio signal, and contains one or more decoders
associated with different codecs. The player is equipped to read the format
information and select the right decoder and any settings to restore the
analog output.
Some commonly used audio codecs are royalty-free standards. Others offer
a licence-free decoder, but licence the encoder to content providers. Some
audio codecs commonly used for streaming and music file exchange are
described below.

MP3
The MP3 codec is the de facto standard for music files stored and played in
the computer environment. It is closely associated with the Internet because
of music file sharing and the associated copyright disputes. It was
developed as an audio codec for digital video, which is hidden in its name:
MP3 stands for Moving Picture Experts Group (MPEG) 1, Layer 3.
The MP3 codec is used to compress audio files to reduce the space needed
to store them. MP3 compresses to several different data rates, which trade
off fidelity for file size. Because MP3 is used for listening only, encoding
delay is not an important factor.

MPEG-4 AAC
This new audio codec is part of the latest MPEG video coding standard. It
has been predicted that MPEG-4 AAC will displace MP3 as the de facto
music file standard because of its improved quality and features.

Ogg Vorbis
Ogg Vorbis is a creation of the Xiph.Org Foundation, a nonprofit developer
of tools for the Internet. It consists of two separate tools: the Vorbis codec,
which is a free-form variable bit-rate codec, and the Ogg transport
mechanism, which supplies free-form framing, sync, positioning and error
correction. Both the Vorbis codec and the Ogg transport mechanism are
available royalty-free. The Vorbis codec can be used with RTP, rather than
Ogg, as the transport protocol.

RealAudio
RealAudio* is a proprietary coding system that offers free use of its
decoder, in the form of RealPlayer*. RealPlayer is used extensively for
audio streaming and audio clips offered over the Internet. RealAudio (and
the associated RealPlayer) includes a number of decoders ranging from
very low to very high bit rates. RealAudio 10, the release current at this
writing, uses data rates ranging from 12 to 800 kb/s (recall that monaural
linear PCM requires 705 kb/s). The codecs sport various rates, frequency
bands, and options that optimize for specific audio content and application
(for example, speech/music, mono/stereo, Dolby*). RealAudio 10 includes
the MPEG-4 AAC codec as one of its operating modes.
RealAudio includes a packet loss concealment feature. In addition, the
Surestream feature can dynamically adapt to changes in the available bit
rate to maintain a continuous playout even where the access channel may
be intermittently shared.

Wave
Wave is an audio data file format, not a codec. The Wave format is a
version of an Interchange File Format (IFF, a standard established for all
kinds of data, from sound files to pictures to musical scores). Wave files are
designated as .wav. They include information about the type of data in the
file, how the data is encoded, the length of the file, and so on. The format
also specifies how the data is structured (chunked) inside the file, so that
players that read the file know what setup to use and how to parse the data.
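The distinction between the file format and the codec can be seen by reading the format information directly. The sketch below uses the `wave` module from the Python standard library, which handles uncompressed PCM Wave files; the helper names `describe_wav` and `make_test_wav` are illustrative.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read the format information a player needs from a Wave file held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
            "compression": wav.getcomptype(),  # "NONE" means linear PCM
        }


def make_test_wav(sample_rate_hz: int = 8000, n_samples: int = 160) -> bytes:
    """Build a small mono 16-bit PCM Wave file containing silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * n_samples)
    return buf.getvalue()


info = describe_wav(make_test_wav())
```

The dictionary returned describes how the data is encoded and laid out; the audio samples themselves sit in a separate chunk of the same file.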

Video Codecs
Analogous to speech and audio codecs, video codecs convert standard
analog video signals into digital Pulse Code Modulated (PCM) signals or
compress them. Because of the much greater information content of a
video signal, compression is even more important for video than for audio.
The nature of the analog video signal, the way video information is used
and transmitted, and the characteristics of human visual perception all
place special requirements on video codecs. Some background on the
analog video signal will assist in understanding the general digitization
process. The sidebar below describes how analog video signals are
generated and the formats used to transfer the signals to local and remote
receivers.

Copyright © 2004 Nortel Networks Essentials of Real-Time Networking


108 Section I: Real-Time Applications and Services

Sidebar: Analog video signals


A video signal represents the instantaneous brightness and colour
information of each element or “spot” of an image, as captured with a
camera or other video source. An image consists of a large array of such
picture elements or pixels, and motion or other changes are reproduced
by presenting updated images at a regular rate. The basic quality of the
image depends on the size of the image, the density of pixels used to
record it, and the rate at which the image is updated. Full-motion video,
especially the NTSC (North American) or PAL (European) broadcast TV
standards, has been developed so that the presentation rate of the images
is fast enough to offer an acceptable representation of the actual event.
(High definition television—HDTV, which operates differently from the
system described here—offers a significant improvement in image
quality over these standards.)
By convention, the image is scanned along each row of pixels, starting at
the top left and finishing at the bottom right of the image (as seen by the
viewer). Each scan provides a complete image called a frame, and the
rate at which these are created is known as the frame rate. For
television, the frame rate is 25 or 30 times per second. The scanning is
carried out in one of two ways: interlaced or progressive. Interlaced
scans proceed by scanning odd numbered lines, followed by even
numbered lines. Each of these scans of half of the lines is known as a
field. Progressive scanning takes each line in sequence. Interlacing is
applied to reduce flicker and is a very effective means of doing so.
Without interlacing, analog broadcast TV would have unacceptable
levels of flicker.
Interlace, however, creates other problems when capturing and
displaying moving objects, such as comb effects on horizontally moving
vertical lines. Storage and processing capability in modern display
devices allows a full frame to be generated from each interlace field.
This can then be presented at double the frame rate to reduce flicker and
at the same time avoid the problem of interlace artifacts. Computer
displays use progressive scan; they avoid flicker by using a higher frame
or refresh rate, which is at least twice that of TV.



Chapter 5 Codecs for Voice and Other Real-Time Applications 109



In high quality systems used for generation of broadcast quality video,
the sensor in a camera or imaging device usually consists of three
separate sensing devices, one for each of the three primary colours: red,
green, and blue (RGB). The incident light is optically separated into
three identical paths. An optical filter is inserted in each light path,
allowing red, green, or blue light to reach the sensor array. This same
effect is achieved (with lower definition) in consumer-grade video
cameras by using a single sensor with a colour filter of diagonal stripes
applied in the optical path. Processing is then applied to extract the
luminance (brightness) and chrominance (color-value) information.
Getting the image information from the camera to the display
RGB format provides the highest quality image reproduction, but also
requires high bandwidth to convey the information to the receiver. Two
alternative signal formats have been defined to reduce the bandwidth
required: component video and composite video.
Like RGB, component video uses three information channels, but
compresses the information by employing difference signals to carry the
color information. The three channels are designated as Y (Luminance),
CR (Red minus Luminance difference signal), and CB (Blue minus
Luminance difference signal). By keeping the color and intensity
information separate, component video retains more of the original
image quality than does composite video by avoiding the noise
associated with imperfect separation of the signals.
Component video requires three parallel signal paths and is used in
broadcast production facilities and for distribution of video signals
locally, for example between a DVD player and a display device. Due to
the requirement for three parallel channels, component video is never
broadcast.
Composite video consolidates all the image information into a single
waveform. Composite video consists of a luminance signal to which is
added a composite sub-carrier carrying the chrominance signal. This
chrominance signal is within the frequency band of the luminance signal
and therefore the combined signal can be broadcast using a single carrier
transmitter (see footnote 1). A sub-carrier is a higher frequency component
superimposed on the primary carrier, which can be modulated
independently of the main carrier. The sub-carrier is a well defined fixed
frequency and can therefore be separated from the primary carrier in the
receiver.



Composite video loses some information in the consolidation process,
and the separation of signals in the receiver results in increased noise in
the luminance and color signals from imperfect separation. However, the
single waveform is a significant benefit to the design of distribution
systems, and the composite format is used for all analog broadcast and
cable TV.
To maintain accurate information as to which picture element is being
transmitted at any time, the signal contains indicators of the beginning of
each frame and the beginning of each line. This information consists of
an excursion of the signal amplitude into a region below that needed for
full black signal. A synchronization pulse occurs before the picture
information in each line. A longer pulse indicates the end of a field, in
the case of interlaced information, or a frame, in the case of progressive
scan. The period between lines, fields, or frames also provides blanking.
This “blacker than black” state ensures the “flyback” (the return scan of
the electron gun in a cathode ray tube [CRT] from the end of one line to
the beginning of the next) remains invisible on CRT display devices.
Gamma Correction
A nonlinearity known as Gamma correction was introduced to
compensate for non-linearities in brightness response of the receiver.
Differences in low level intensities are represented by proportionally
greater differences in signal level than similar changes in intensity at
higher intensity levels. Introducing this nonlinearity into the original
signal (that is, at the broadcast end) made the signal more robust to noise
picked up in the broadcast channel. It remains part of television
standards even though advances in the quality of broadcast and receiving
equipment mean that it is no longer necessary for its original purpose.
However, Gamma correction has provided benefits for videotape
recording as well as for digitization, where it reduces the number of
quantization steps needed to obtain good resolution for low intensity
signals.
1. The color signal is added to the subcarrier using a technique called quadrature
amplitude modulation, a combination of amplitude and phase modulation that
allows the sub-carrier to be modulated with two independent signals. A reference
signal called the color burst is added at the beginning of each scanned line, between
the synchronization pulse and the video information. This reference signal consists
of ten cycles of the unmodulated sub-carrier to be used as a phase reference for the
color subcarrier enabling both colour difference signals to be reliably extracted.
The frequency of the color subcarrier was carefully chosen to work well with either
a color or monochrome receiver. For NTSC, it is nominally 3.58 MHz and for PAL
it is nominally 4.43 MHz.


Digitizing Video signals


Analog video signals may consist of separate signals for luminance and
color, or a single composite signal. These analog signals can be digitized,
just like the analog sound signals discussed earlier. Any particular video
codec will operate on only one of the three standard formats: RGB,
Component, or Composite. The input signals are not interchangeable. As
well, digitizing RGB or Component video generally yields a better
reproduction quality than digitizing composite video because the analog
signal is higher quality to begin with. Given the higher bandwidth
associated with RGB signals, however, video codecs for RGB input are
rare.
A video codec designed for component video is really a set of three codecs,
one for each of the Y, CR, and CB signals. In most cases, the sampling rate
used for the two colour difference signals (CR, CB) is lower than that used
for the luminance signal (Y). Established coding standards for component
video codecs define a sampling rate of 13.5 MHz for the luminance signal.
This frequency was selected to be compatible with both the NTSC and PAL
systems. The chrominance (color difference) signals are sampled at half of
this frequency (6.75 MHz). Because the frequency of 13.5 MHz was
chosen to be approximately four times the NTSC subcarrier frequency of
3.58 MHz, this sampling is referred to as 4:2:2: the luminance sampling
rate is approximately four times the nominal colour subcarrier frequency,
and the sampling rate for each of the two colour difference signals is
approximately twice that frequency. The frequency 13.5 MHz has the
additional advantage of compatibility with both 525-line NTSC systems
and 625-line PAL systems. The digital bit streams formed from the three
components are transmitted sequentially, frame-by-frame, in a well-defined
pattern.
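The sampling rates above translate directly into an uncompressed data rate. The sketch below is a worked example under stated assumptions: it counts every sample clock (including blanking intervals) and tries word lengths of eight and ten bits per sample. The function and constant names are illustrative.

```python
LUMA_RATE_HZ = 13_500_000     # Y sampling rate, from the text
CHROMA_RATE_HZ = 6_750_000    # each of CR and CB: half the luminance rate

def raw_bit_rate_mbps(bits_per_sample):
    """Uncompressed 4:2:2 bit rate in Mb/s, counting every sample clock."""
    samples_per_second = LUMA_RATE_HZ + 2 * CHROMA_RATE_HZ
    return samples_per_second * bits_per_sample / 1e6

for bits in (8, 10):
    print(f"{bits}-bit 4:2:2: {raw_bit_rate_mbps(bits):.0f} Mb/s")
```

Rates of this order (hundreds of Mb/s) illustrate why compression matters even more for video than for audio.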
Unlike the speech and audio codecs described above, video codecs
invariably rely on linear step sizes for quantization. This means that equal
differences in signal level equate to equal levels of digitized bit patterns.
However, because the signal levels themselves are not linear, the actual
luminance levels are not represented in linear steps. This nonlinearity,
called the Gamma correction, was defined as part of the analog video
standard to improve reproduction of low brightness levels (see Sidebar on
Analog Video Signals, above). The existence of the Gamma correction in
the analog signal means that an acceptable dynamic range and noise
performance can be obtained with only eight-bit quantization. Without the
Gamma correction, a quantization using fourteen bits or more would have
been necessary.
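The effect of quantizing a Gamma-corrected signal in linear steps can be illustrated numerically. The sketch below assumes a simple power law with an exponent of 2.2; that exponent and the function names are assumptions of this example and are not specified in the text. It compares the luminance change between adjacent eight-bit codes at dark, mid, and bright levels with the constant step that linear coding of luminance would give.

```python
GAMMA = 2.2   # assumed exponent for illustration; not specified in the text

def to_luminance(code, bits=8):
    """Decode a Gamma-corrected integer code to linear luminance in [0, 1]."""
    return (code / (2 ** bits - 1)) ** GAMMA

def luminance_step(code, bits=8):
    """Linear-luminance difference between two adjacent codes."""
    return to_luminance(code + 1, bits) - to_luminance(code, bits)

linear_step = 1 / 255   # step size if 8-bit codes were spaced evenly in luminance
for code in (10, 128, 250):
    print(code, f"{luminance_step(code):.6f}", f"(linear: {linear_step:.6f})")
```

Under this assumption the steps near black are much finer than a linear eight-bit scale would provide, which is the property that allows fewer bits to be used.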
Video signals require a large amount of data to describe each frame.
Fortunately, they contain substantial temporal and spatial redundancy,
providing the opportunity for significant compression. Video compression
is lossy, but with careful design of the compression technique, these losses
will not be detrimental to the perceived image.
Processing is done to identify redundancies between adjacent frames. For
example, a static background need not be updated until it changes.
A moving object either remains approximately static on the screen while
being tracked by the camera, in which case the background will move, or
the object moves across the display and the background remains static.
Compression algorithms also look for a number of other common types of
movement to facilitate compression. Such analyses identify redundant
information that can be removed, and allow a high proportion of the
remaining information to be coded in differential terms.
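The idea of updating only what has changed between adjacent frames can be sketched as follows. This is a deliberately simplified illustration, not the method of any particular codec: each frame is a one-dimensional list of sample values, blocks are compared by sum of absolute differences, and the block size, threshold, and function names are all assumptions.

```python
def changed_blocks(prev, curr, block=4, threshold=0):
    """Return (start_index, samples) for each block of curr that differs from prev."""
    updates = []
    for start in range(0, len(curr), block):
        old = prev[start:start + block]
        new = curr[start:start + block]
        # Sum of absolute differences is a simple measure of change.
        if sum(abs(a - b) for a, b in zip(old, new)) > threshold:
            updates.append((start, new))
    return updates

def apply_updates(prev, updates):
    """Rebuild the current frame from the previous frame plus the updates."""
    frame = list(prev)
    for start, samples in updates:
        frame[start:start + len(samples)] = samples
    return frame

frame1 = [10] * 16               # static background
frame2 = list(frame1)
frame2[5:7] = [200, 210]         # a small object appears
updates = changed_blocks(frame1, frame2)
print(updates)                               # only one block is sent
print(apply_updates(frame1, updates) == frame2)
```

Only one of the four blocks needs to be transmitted for the second frame; the static background is reused from the first.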
Video codecs use either eight or ten-bit encoding. Composite video is
normally eight-bit encoded. Higher definition signals are sometimes ten-bit
encoded, in which case the most significant eight bits are regarded as the
integer part and the least significant two bits are regarded as fractional.
This allows decoding equipment designed to handle eight-bit streams to
handle the ten-bit words by simply truncating them to eight bits.
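The truncation described above amounts to a two-bit right shift. A minimal sketch, with illustrative function names:

```python
def ten_to_eight(sample10):
    """Truncate a 10-bit sample to 8 bits by discarding the two fractional bits."""
    if not 0 <= sample10 <= 0x3FF:
        raise ValueError("expected a 10-bit value")
    return sample10 >> 2

def split_ten_bit(sample10):
    """Return (integer_part, fractional_part) of a 10-bit sample."""
    return sample10 >> 2, (sample10 & 0b11) / 4

sample = 0b1011001110          # top 8 bits: 10110011, bottom 2 bits: 10
print(ten_to_eight(sample))
print(split_ten_bit(sample))
```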
Synchronization of sound and video applies to both analog and digital
signals. However, it is perhaps more of a problem in digitized video
because of differences in encoding delay for speech/audio codecs and
video codecs. Audio and video signals may be separated, which is
beneficial where the identity of the far end receiver is not known, because it
may be possible to capture and play the audio component even where there
is no video receiver and display. However, separate streams for audio and
video must be synchronized for playback within certain tolerances;
otherwise, the quality of the playback is reduced. For full motion video, the
audio signal should not lead the video image by more than 20 ms, nor trail
it by more than 40 ms.
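The tolerance quoted above is asymmetric, which is easy to get wrong in code. The check below is a sketch: the two limits come from the text, while the function name and the convention that both arguments are actual playout times in milliseconds are assumptions.

```python
MAX_AUDIO_LEAD_MS = 20   # audio ahead of video, from the text
MAX_AUDIO_LAG_MS = 40    # audio behind video, from the text

def av_skew_ok(audio_play_ms, video_play_ms):
    """Check audio/video skew for material meant to be presented together.

    Both arguments are the actual playout times of an audio sample and of
    the video frame it belongs with. A negative skew means audio leads.
    """
    skew = audio_play_ms - video_play_ms
    return -MAX_AUDIO_LEAD_MS <= skew <= MAX_AUDIO_LAG_MS

print(av_skew_ok(1000, 1015))   # audio leads by 15 ms
print(av_skew_ok(1000, 1030))   # audio leads by 30 ms
print(av_skew_ok(1040, 1000))   # audio trails by 40 ms
```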

Common Video Coding Standards in IP Applications


Some popular products in use today are listed below. However, this list
changes rapidly, so these should be considered examples only. There are over
one hundred video codec products available.

File formats, players, encoders and decoders, streams


Care must be taken not to confuse codecs (compression algorithms and
procedures) with file formats. The MPEG and ITU (H.261, H.262, H.263,
H.264) standards define the syntax of a video stream together with methods
for reconstructing the video stream in a form suitable for presentation. The
latest version of the stream format is known as Advanced Video Coding or
AVC. ITU-T has standardized AVC as H.264.
At the highest level there are players. Players are software devices that
contain one or more decoders, and that determine from information in the
file which decoder to use. There are three players in common use today:
Apple’s* QuickTime*, Windows Media* Player, and RealPlayer*. These
are described below.
Players take streamed information (information being presented in real
time) or information in a recorded file and convert it back to a form suitable
for presentation to a driver and hence to a display device. These players can
accept information in many common file and stream formats. File and
stream formats contain not only the information to be presented, but also
information telling the player how to decode the information in the stream.
Among other things, the information indicates which decoder and data rate
to use. Typically, files consist of frames (file frames, not to be confused
with video frames) with a file header containing the control information
followed by the video data. Most players can begin decoding at the start of
any frame and can ignore any partial frame data in the bit stream prior to
the next start-of-frame.
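The ability to begin decoding at any frame boundary can be sketched with a toy container. Everything about the layout here is hypothetical (a two-byte start-of-frame marker, a one-byte payload length, then the payload) and does not correspond to any real file or stream format. The point is only that a receiver joining part-way through discards bytes until the next marker.

```python
SYNC = b"\xA5\x5A"   # hypothetical start-of-frame marker, not from a real format

def build_frame(payload):
    """Hypothetical file frame: marker, 1-byte payload length, payload."""
    return SYNC + bytes([len(payload)]) + payload

def frames_from(stream):
    """Yield payloads of complete frames, skipping bytes before the first marker."""
    pos = stream.find(SYNC)
    while pos != -1 and pos + 3 <= len(stream):
        length = stream[pos + 2]
        end = pos + 3 + length
        if end > len(stream):
            break                          # trailing partial frame
        yield stream[pos + 3:end]
        pos = stream.find(SYNC, end)

full = build_frame(b"one") + build_frame(b"two") + build_frame(b"three")
joined_midway = full[4:]                   # receiver tunes in part-way through frame 1
print(list(frames_from(joined_midway)))
```

This toy version would be confused by a payload that happens to contain the marker bytes; real formats must guard against that case, which is not addressed here.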
Even when they only provide the decoding function, video decoders are
usually referred to as codecs. The need for a decoder assumes the prior use
of an encoder. Decoders are sometimes referred to by the coding standards
they implement. For example, H.261 and H.263 are video codecs, as is
MPEG. Strictly, MPEG is both a file format and a codec. The MPEG
standard includes enough detail to define a file or stream that an MPEG
decoder can play. H.261 has been largely superseded by H.263. MPEG is
included here in the codec section, but is more strictly a standard defining
the data structure that a codec will act upon.

Video Codecs
Sorenson. Sorenson is a proprietary codec from Sorenson Media*. Apple
uses it in their QuickTime player. It has a number of encoding features
including bidirectional (B-frame) encoding. In addition, it
can drop B frames when needed due to short-term
restrictions on data rate. This makes it very tolerant of variations in data
rate. It also automatically determines video frames where new scenes
begin, based on how much video changes between adjacent frames, and
flags these as key frames, which are then used to begin a sequence of
decoding.
Cinepak*. Cinepak was an early codec that was rapidly established as a
standard. Cinepak is an asymmetric codec: it has a long
compression time and cannot accept video input in real time, but it can
decode in real time. This ensures a smooth playout of streamed material.
Cinepak is included with QuickTime. Like Sorenson, it identifies key
frames based on the difference between adjacent video frames, and flags
these to indicate the beginning of a decoding sequence.
H.263. H.263 was designed for and is primarily used for video
conferencing and is a symmetrical real-time codec. It is limited to displays
of 352x288 pixels or smaller and when acting on material of larger size, it
scales accordingly, losing resolution. It is optimized for low data rates and
relatively low motion (slow), with little change between successive video
frames. The compression technique used in H.263 is Discrete Cosine
Transform (DCT) with motion compensation. It works well for its intended
use as a video conferencing standard and codec.
Indeo* (formerly “Intel* Indeo” but now owned by Ligos* Corporation).
Indeo is a range of codecs based on DCT and, in later versions, on
wavelets, another compression technology. It is used for gaming and
streamed material. It is included as part of the QuickTime player, and was
once provided as part of the Microsoft* operating systems, but is no longer.
MPEG-2. MPEG-2 is the de facto standard for broadcast quality video. It
provides excellent image quality and resolution. A subset of MPEG-2 is the
standard for DVD video. The compression ratio for MPEG-2 is variable
between 50 and 100, averaging about 70.
Playback is normally by means of a hardware decoder built into a DVD
player or cable broadband access set-top box. It offers a wide range of
options in terms of data rates and display definition levels. The
compression scheme used is Discrete Cosine Transform (DCT) with
forward and bi-directional motion prediction.
MPEG-4. MPEG-4 is based on, and similar to, H.263. However, it is
designed to deliver interactive multimedia across networks. It is capable of
dealing with objects within the video frames as well as video frames
themselves. The compression scheme is DCT with motion prediction.
MPEG-4 AVC (MPEG-4 Part 10). Although it is part of MPEG-4,
Advanced Video Coding (AVC) operation is closer to MPEG-2 than it is to
the other parts of MPEG-4. In particular, it does not incorporate the
object processing features of the MPEG-4 suite. AVC requires more
processing power than MPEG-2, but offers several advantages. It offers
higher compression than MPEG-2, works better with packet-based
networks, and is more robust to packet loss. AVC has been standardized by
ITU-T as H.264.
Dicas*. Dicas produces a range of codecs for MPEG under the brand name
Mpegable(TM), which is MPEG-based video coding technology with a
specific focus on MPEG-4 and MPEG-4 AVC/H.264.
DivX*. The DivX decoder is offered as freeware for personal use. DivX is
often used for video clips traded over the Internet.
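Two mechanisms mentioned in the codec descriptions above, flagging key frames where the picture changes sharply and dropping B frames under a short-term rate restriction, can be sketched as follows. The algorithms inside proprietary codecs such as Sorenson and Cinepak are not described in the text, so this is an illustration of the general idea only. The threshold, the frame representation, and the function names are assumptions.

```python
def flag_key_frames(frames, threshold):
    """Return indices of frames that start a new scene.

    A frame is flagged when its mean absolute difference from the previous
    frame exceeds threshold; frame 0 is always a key frame.
    """
    keys = [0]
    for i in range(1, len(frames)):
        prev, curr = frames[i - 1], frames[i]
        mad = sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
        if mad > threshold:
            keys.append(i)
    return keys

def drop_b_frames(frame_list, budget_bits):
    """Drop B frames, last first, until the total size fits the budget.

    frame_list holds (frame_type, size_in_bits) tuples. Only B frames are
    dropped, on the assumption (true of classic MPEG-style coding) that no
    other frame is predicted from them.
    """
    kept = list(frame_list)
    total = sum(size for _, size in kept)
    for i in range(len(kept) - 1, -1, -1):
        if total <= budget_bits:
            break
        if kept[i][0] == "B":
            total -= kept[i][1]
            del kept[i]
    return kept

video = [[10, 10, 10, 10], [11, 10, 10, 10],
         [200, 190, 180, 170], [201, 190, 180, 170]]
print(flag_key_frames(video, threshold=50))

group = [("I", 100), ("B", 20), ("B", 20), ("P", 50), ("B", 20)]
print(drop_b_frames(group, budget_bits=180))
```

If the budget cannot be met by dropping B frames alone, this sketch simply returns the remaining I and P frames; a real encoder would need a further fallback.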

Players
QuickTime. QuickTime is a comprehensive system from Apple and is
among the most popular coding and decoding systems available today. It
handles video, still images, music, and speech
(telephony), and is compatible with formats designed for wireless. The
decoder is available free for all major platforms. QuickTime contains
technologies from Sorenson and Indeo among others.
The QuickTime architecture supports progressive download, or “fast-start,” which
allows users to start playing video and audio before the entire file has
downloaded. It also has a wide range of features, such as text tracks for
subtitling, Virtual Reality panorama, and object support.
RealPlayer*. Similar to the RealAudio player described under Audio
Codecs, above, RealPlayer is a comprehensive system including a player,
or decoder, available free, as well as encoders and a Digital Rights
Management system. It is available for all major platforms and is very
popular for viewing streamed information over the Internet and wireless networks.
RealNetworks*, Inc. claims to provide “the universal platform for the
delivery of any digital media from any point of origin across virtually any
network to any person on any Internet-enabled device anywhere in the
world.”
RealMedia* is particularly suitable for long-duration or live-broadcast
time-based media. Subjective performance is significantly influenced by
the performance of the platform used for decoding, typically a
computer.
Windows Media Player. Windows Media Player is a comprehensive
system incorporating codec technology from a number of third parties.
Windows Media Player is widely available as a component of Microsoft
operating systems. The player accepts media formatted in all common
formats. The soft decoders are downloaded as needed. Due to the
widespread distribution of Windows Media Player, it is becoming a de
facto standard for streamed and stored video files. The Windows Media 9
video codec has been standardized by the Society of Motion Picture and
Television Engineers (SMPTE) as VC-1.


What you should have learned


This chapter provided background on speech, audio, and video codecs
required for understanding their use in real-time applications. The basic
characteristics of codecs were defined, including sampling rate, data rate,
encoding delay, frame size, look-ahead, and compression ratio. Types of
codecs such as waveform and CELP were described.
Compression is an important aspect of coding for all analog media.
Compression allows more efficient use of network bandwidth, but
introduces impairments such as distortion and delay. Among speech
codecs, low bit-rate codecs provide higher compression ratios, but sacrifice
general utility, since they operate better on some signals (speech) than
others (tones, music), and differ in how well they cope with unwanted
signal components such as noise. Transcoding between compression
codecs can degrade quality, and codecs can react differently to data loss
(packet loss). Mitigation techniques like packet loss concealment were covered.
Some codecs come equipped with silence suppression. How silence
suppression works, when and where silence suppression can improve
efficiency, and associated impairments were described. Effects of
background noise on silence suppression performance were addressed. The
concept of comfort noise was introduced.
The effects of transcoding on call quality were addressed in detail. The
reader was shown how to identify transcodings that increase distortion
versus those that do not. Transcoder-free operation is a goal for packet
equipment, since it both minimizes the number of transcodings and
maximizes the bandwidth efficiency of the network. Conference bridges
introduce transcoding, since the signals must be decoded to PCM so that
they can be mixed. Some bridges avoid mixing, but can introduce other
impairments because of this.
A detailed review of telephony speech codecs was provided. Codec
selection for a voice services network can be based on voice performance
(minimizing distortion and delay), bandwidth efficiency, or other criteria.
A short review of audio (non-speech) codecs was provided, including a
summary of commonly used codecs for audio streaming.
The general digitization of video signals and the operation of video codecs
were discussed. The general methods used to identify and remove
redundancy were described. Synchronization of audio and video is
important. The differences between file formats, codecs, players, and
streams were defined. A sampling of video codec standards and commonly
used players for video streaming was offered.


References
General References:
ITU-T Recommendation P.862, Perceptual evaluation of speech quality
(PESQ): An objective method for end-to-end speech quality assessment of
narrow-band telephone networks and speech codecs, Geneva: International
Telecommunication Union Telecommunication Standardization Sector
(ITU-T), 2001.
ITU-T Recommendation G.107, The E-Model, a computational model for
use in transmission planning, Geneva: ITU-T, 1998.
Codec Standards:
733-A, ANSI/TIA-733-A-2004, High Rate Speech Service Option 17 for
Wideband Spread Spectrum Communications Systems,
Telecommunications Industry Association, 2004.
BV16, J. Chen et al., BroadVoice™16 Speech Codec Specification, Version
1.2. October, 2003. (For further information, contact PacketCable, Cable
Television Laboratories, Inc.)
EVRC, ANSI/TIA-127-A-2004, Enhanced Variable Rate Codec Speech
Option 3 for Wideband Spread Spectrum Digital Systems,
Telecommunications Industry Association, 2004.
ITU-T Recommendation G.711, Pulse code modulation (PCM) of voice
frequencies, Geneva: ITU-T, 1988.
ITU-T Recommendation G.722, 7-kHz Audio-Coding within 64 kBit/s,
Geneva: ITU-T, 1989.
ITU-T Recommendation G.722.1, 7kHz Audio - Coding at 24 and 32 kb/s
for hands-free operation in systems with low frame loss, Geneva: ITU-T,
1999.
ITU-T Recommendation G.723.1, Dual-rate speech coder for multimedia
communications, (includes Annex A: Silence Suppression, and Annex C:
Channel Coding Scheme for use in wireless applications.) Geneva: ITU-T,
1996.
ITU-T Recommendation G.726, 40, 32, 24, 16 kbit/s Adaptive Differential
Pulse Code Modulation (ADPCM), (includes Annex A: Extensions of
Recommendation G.726 for Use with Uniform-Quantized Input and
Output-General Aspects of Digital Transmission Systems.), Geneva: ITU-
T, 1990.
ITU-T Recommendation G.728, Coding of Speech at 16 kBit/s Using Low-
Delay Code Excited Linear Prediction, Geneva: ITU-T, 1992.

ITU-T Recommendation G.729, Coding of Speech at 8 kbit/s Using
Conjugate-Structure Algebraic-Code-Excited Linear-Prediction
(CS-ACELP), (includes Annex A, Reduced Complexity 8 kbit/s CS-ACELP
Speech Codec, and Annex B, Silence Compression Scheme for G.729
Optimized for Terminals Conforming to Recommendation V.70.), Geneva:
ITU-T, 1996.
GSM-AMR, GSM 06.71 Version 7.0, EN 301 703 v7.0.1, Sophia
Antipolis. Digital Cellular Telecommunications System (Phase 2+);
Adaptive Multi-Rate (AMR); Speech Processing Functions; General
Description, (includes GSM AMR codec, VAD, DTX, NS, and other GSM
speech processing features), France: European Telecommunications
Standards Institute (ETSI), 1999.
GSM-EFR, EN 301 243 V4.0.1, Sophia Antipolis, Digital Cellular
telecommunications system (Phase 2): Enhanced Full Rate (EFR) speech
processing functions; general description (GSM 06.51 version 4.0.1),
(includes GSM EFR codec, VAD, DTX, and other GSM speech processing
features), France: European Telecommunications Standards Institute
(ETSI), 1997.
ITU-T Recommendation H.261, Video codec for audiovisual services at p x
64 kbit/s, Geneva: ITU-T, 1993.
ITU-T Recommendation H.263, Video coding for low bit rate
communication, Geneva: ITU-T, 1998.
ITU-T Recommendation H.262, Information technology-Generic coding of
moving pictures and associated audio information: Video, Geneva: ITU-T,
2000.
ITU-T Recommendation H.264, Advanced video coding for generic
audiovisual services, Geneva: ITU-T, 2003.
iLBC, Speech Codec Fixed Point Reference Code for PacketCable, Version
1.0.3, October 2003. (For further information, contact PacketCable, Cable
Television Laboratories, Inc. or see
http://www.ilbcfreeware.org/)
IS-641, TIA/EIA IS-641-A, TDMA Cellular/PCS Radio Interface
Enhanced Full-Rate Voice Codec, Revision A, Telecommunications
Industry Association, 1998.
MP3, ISO/IEC 11172-3, Information technology—Coding of moving
pictures and associated audio for digital storage media at up to about 1,5
Mbit/s. Part 3: Audio, Geneva: ISO, 1993.
MPEG-2, ISO/IEC 13818, Information technology—Generic coding of
moving pictures and associated audio information, (The complete suite
consists of nine parts), Geneva: ISO, 2000.

MPEG-4, ISO/IEC 14496, Coding of audio-visual objects, (The complete
suite consists of nineteen parts), Geneva: ISO, 2004.
MPEG-4 AVC, ISO/IEC 14496-10, Coding of audio-visual objects, Part
10, Geneva: ISO, 2003.
SMV, C.S0030-0 v3.0, Selectable Mode Vocoder (SMV) Service Option for
Wideband Spread Spectrum Communication Systems. 3rd Generation
Partnership Project 2, 2004.
Further Information on MPEG video and audio coding:
John Watkinson, The MPEG Handbook, Oxford England/Burlington, MA:
Focal Press, 2001.


Section II:
Legacy Networks
Legacy networks rely largely on Time Division Multiplexing (TDM) and
Synchronous Optical NETwork (SONET) technologies. Time Division
Multiplexing networks were designed and built specifically for one real-
time application, namely Voice. TDM networks now carry data as well as
voice, but make no distinction between real time and non–real time
applications because all data are treated as real-time. TDM is exceptionally
good at delivering real-time service, and Chapter 6 reviews how it achieves
such high performance.
TDM uses time slots to combine individual calls together on faster links in
the network core. Optical networking provides very fast links, and the
SONET standard was developed to facilitate interworking between optical
networks. Synchronous Digital Hierarchy (SDH) is the international
equivalent of SONET.
SONET is not just for Telcos—virtually all optical Layer 1 uses SONET.
Large Enterprises and even small Enterprises use SONET for long (> 15
km) distances. SONET is agnostic about the form of the data riding on it,
and continues to be used as Layer 1 for IP links.


Chapter 6
TDM Circuit-Switched Networking
Stephen Dudley

Figure 6-1: Transport path diagram (labels in the figure: Signaling, Voice,
SS7, T1, ISDN, TCAP, ISUP, MF, Q.931, G.711, Voice codec, SCCP, MTP3,
MTP2, MTP1, AAL1/2, ATM, Voice Switching, SONET / TDM)
For the TDM Network, our transport path diagram has components that
don't exist in the packet switching network as well as components that we
would recognize in the other diagrams in the book. Shown in the diagram
are the following components:
The voice trunk signaling mechanisms most commonly used in the
network (MF, ISDN, SS7).
The voice path, which always uses a G.711 codec when the Public
Switched Telephone Network (PSTN) is employed.
Other speech codecs besides G.711 can be used, but they have to be set up
on a dedicated path without PSTN switching. One example, using an ATM
transport, is shown in Figure 6-1. When using this kind of dedicated
connection, a wide variety of codecs can be used.


Concepts covered
Why the TDM Network is built around a 64 kb/s channel.
How a telephone call proceeds through the network.
How the digital switch uses a dedicated connection to end users
(line) and a nondedicated connection between switches (trunk).
How the digital switching network uses a switch hierarchy with a
trunking overlay of high usage to minimize the information
maintained in call routing tables.

TDM principles
The public telephone network was being converted to digital in the 1970s.
Even at that time, data rates for digital signals were fast enough that a
single transmission facility could send far more data than was needed to
support a single voice conversation. The technique of multiplexing multiple
digital streams onto a single facility by assigning each stream to a
particular block of time in round robin fashion, called Time Division
Multiplexing (TDM), became the basis for the Public Switched Telephone
Network (PSTN).

Figure 6-2: Time Division Multiplexing
(Diagram: a framing sequence followed by DS0 time slots, interleaved into a single data stream.)
The diagram illustrates how multiple signal streams (called DS0 here) can
be put together with a Framing sequence to create a complete bit stream.
Each data stream is assigned a time slot to transmit. The framing sequence
contains a recognizable pattern that can be easily detected in the data
stream and that has a definite start and endpoint. By knowing the starting
point of the framing sequence, it is possible to know which bits belong to
which data streams.
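
The framing idea above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular transmission system frames its data: the one-byte marker `FRAMING` and the function names `multiplex` and `demultiplex` are assumptions made for the example, and real systems such as T1 use a much smaller framing overhead.

```python
FRAMING = b"\x7e"  # hypothetical one-byte framing marker, for illustration only


def multiplex(streams):
    """Interleave equal-length byte streams: one byte per stream per frame."""
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must be the same length")
    out = bytearray()
    for i in range(length):
        out += FRAMING              # start of frame
        for s in streams:           # round robin: slot 0, slot 1, ...
            out.append(s[i])
    return bytes(out)


def demultiplex(data, n_streams):
    """Recover each stream purely from its position after the framing marker."""
    frame_len = len(FRAMING) + n_streams
    if len(data) % frame_len:
        raise ValueError("data does not contain a whole number of frames")
    streams = [bytearray() for _ in range(n_streams)]
    for start in range(0, len(data), frame_len):
        frame = data[start:start + frame_len]
        if frame[:len(FRAMING)] != FRAMING:
            raise ValueError("framing lost")
        for slot, stream in enumerate(streams):
            stream.append(frame[len(FRAMING) + slot])
    return [bytes(s) for s in streams]
```

Note that no stream carries a header of its own. The receiver identifies each byte only by where it sits relative to the framing marker.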


Sidebar: Voice and data convergence, Round 1


Although the prime motivator for public carrier network architecture is
the efficiency that TDM brings to voice transmission, protocols were
developed to adapt TDM paths to carry data traffic as well. It is not
uncommon for networks that claim to be all data to have some portion
of the transport path over TDM links (for example, T1/E1, DS3,
OC3/OC12/OC48). Mixing of voice and data on TDM networks developed
partly because it was conceptually simple to do, but also because TDM
networks extend everywhere. Once the protocols were implemented, it
was possible to send data anywhere voice could go.

64 kb/s building block data rate


The TDM network was built on the assumption that it would be carrying
digital streams from voice conversations. The standard codec used in TDM
networks is G.711, which runs at 64 kb/s. Each G.711 sample is an eight-bit
byte (or word), representing the amplitude of the waveform where
sampling occurs. A total of 8000 samples are taken per second to create the
digital stream for one voice channel. The choice of 8000 samples per
second is based on the need to be able to reconstruct signals of up to 4000
Hz.
8 bits per sample × 8000 samples per sec = 64,000 bits/s = 64 kb/s
Since calls are duplex, each call requires two 64-kb/s paths, one in
each direction.
Note: The minimum sampling rate is double the frequency of the highest
frequency being carried.
Higher and lower data rates are possible on a TDM network by subdividing
or combining time slots. Higher rates are used to transmit high quality
signals such as studio quality audio (which are used, for example, in TV
and radio interviews with people in remote locations). The lower rates are
used where transport facilities have a higher cost (for example, long-haul
trunks) or where there are a limited number of channels compared to the
number of people who use them (for example, undersea cable). As an
example, a commonly used lower rate is the G.726 codec, running at
32 kb/s. A 64 kb/s channel can carry the data associated with two 32 kb/s
signals.
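
The arithmetic in this section can be checked with a short sketch. The function names are assumptions made for the example; the figure of 4 bits per sample for 32 kb/s G.726 follows from dividing its rate by the same 8000 samples per second.

```python
SAMPLE_RATE_HZ = 8000  # samples per second for one voice channel


def min_sample_rate_hz(highest_frequency_hz):
    """Minimum sampling rate: double the highest frequency being carried."""
    return 2 * highest_frequency_hz


def channel_rate_bps(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Bit rate of one direction of one voice channel."""
    return bits_per_sample * sample_rate_hz


g711_bps = channel_rate_bps(8)   # 8 bits x 8000 samples/s
g726_bps = channel_rate_bps(4)   # 4 bits x 8000 samples/s
signals_per_ds0 = g711_bps // g726_bps
duplex_call_bps = 2 * g711_bps   # one 64 kb/s path in each direction
```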

Multiplexing
A channel capable of carrying one call is called a DS0. The faster the
transmission rate, the more calls can be combined onto it. Combining
channels together is called multiplexing, and the output of one
multiplexer can be used as an input to other multiplexers. Because
equipment from different vendors must interwork, a set of standard
multiplexing rates has been developed.
Figure 6-3 shows how DS0 channels are combined into larger chunks
within the TDM hierarchy. The number of channels being combined varies
slightly, depending on whether you are in North America or elsewhere in
the world. In North America, 24 DS0s are combined into a DS1, also called
a T1. In Europe and much of the rest of the world, 32 DS0s are
combined into an E1. Similarly, T1s and E1s are combined into higher
capacity facilities. The standard sizes of higher capacity links are given in
Figure 6-3.
Note: A T1 (that is, Trunk Level 1) was originally an outside plant
transmission system (with analog repeaters) and did not refer to the digital
interfaces on multiplexers inside the telephone office, which are called
DS1. The terms are often used as if they mean the same thing, and we will
use them interchangeably in this chapter as well.
In Figure 6-3, SONET systems are indicated as yellow boxes. Systems
shown in green employ asynchronous, rather than synchronous
transmission technologies. They eventually are mapped into the lowest
level of the SONET hierarchy at the OC-3 or STM-1 level. For example,
three DS3s are mapped into an OC-3 (North America) and three E3s are
mapped into an STM-1. The OC-3 and STM-1 have identical data rates.
Knowing how many voice channels fit into each level of the hierarchy can
be used to calculate the total payload of each system. The chart in
Figure 6-3 shows the following information:
DS1 = 24 channels
DS3 = 24 channels × 28 DS1/DS3 = 672 channels
OC-3 = 24 channels × 28 DS1/DS3 × 3 DS3/OC-3 = 2016 channels
OC-12 = 24 channels × 28 DS1/DS3 × 12 DS3/OC-12 = 8064 channels
OC-192 = 24 channels × 28 DS1/DS3 × 192 DS3/OC-192 = 129,024 channels
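
The same calculation can be written as a small sketch that works for any OC level, assuming (as the list above does) that an OC-n carries n DS3s. The constant and function names are chosen for the example.

```python
DS0_PER_DS1 = 24   # North American hierarchy
DS1_PER_DS3 = 28


def voice_channels_per_ds3():
    """Number of DS0 voice channels carried by one DS3."""
    return DS0_PER_DS1 * DS1_PER_DS3


def voice_channels_per_oc(level):
    """Number of DS0 voice channels in an OC-<level>, at one DS3 per level."""
    return voice_channels_per_ds3() * level


payloads = {f"OC-{n}": voice_channels_per_oc(n) for n in (3, 12, 48, 192)}
```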


Figure 6-3: SONET/SDH hierarchy

The importance of clock rate and synchronization in TDM


TDM relies on both a calibrated clock rate and exact synchronization
among all the clocks in the network. In some ways, TDM data travels like a
headerless packet: only the data is sent. The network knows which bit
belongs to which data stream solely from the position of the time slot that
the bit occupies in the data stream. TDM transmission systems are
essentially point to point. The TDM network delivers the data in a specific
time slot to a specific destination, and that destination is configured offline
from the data transport process. The only way to determine a destination
for a data stream flexibly is to use a set of switches that know how to
select transmission facilities and time slots to reach that destination. On
the Public Switched Telephone Network, digital voice switches handle that
role; in a data network, a router or ATM switch can handle it.


Principles of digital switching, voice switches


Almost everyone in the world can be reached by telephone, and it is almost
certain that their connection is to a digital switch. Some understanding of
the way that digital switches work makes it easier to build a network that
accommodates this interface. To understand how digital switching works, it
is probably best to start with the familiar wireline telephone set and look at
a typical call flow.
Figure 6-4 illustrates call progression from off hook to talk path
establishment.
Each telephone set has dedicated hardware (probably a line card) in the
digital switch that is responsible for managing signaling and transporting
speech information.
When the telephone set goes off hook, the digital switch detects that the
user wants to make a call and applies dial tone to the line. The user hears
the dial tone and starts dialing digits. Those digits are received by the
switch and the connection to the called party is set up. The switch then
sends a ringing tone to that party and a ringback tone to the calling party.
When the called party answers the phone, the talk path is completed
between them.

Figure 6-4: Call progress
(Diagram: message flow between the calling party, the digital switch, and the called party. Off hook (start signal), dial tone, dialed digits, ringing and ringback, phone rings, answer phone, conversation.)


For almost all residential wireline (as opposed to wireless) telephone
company customers, conversion from analog to digital occurs in the line
card at the telephone switch (for wireless, analog-to-digital conversion is in
the handset). After conversion, the user's voice is transmitted digitally on a
TDM network of some kind. At the far end, the terminating switch converts
digital to analog and transmits the signals to the user's telephone. In
between, the signal is digital and, in a traditional telephony environment, in
the format defined by G.711.
In switching terminology, each 64 kb/s channel running between switches
is called a trunk. The connection between the switch and the telephone is
called a line. Each wireline telephone customer has a dedicated pair of wires
that runs to the customer's location, either all the way from the telephone
Central Office or PBX or from Digital Loop Carrier (DLC) equipment located
in the customer's neighborhood. Figure 6-5 shows the differences between
a trunk and a line.

Figure 6-5: The difference between a trunk and a line
(Diagram: in the first case, a wire (the line) runs from the user phone to a digital switch, and a 64 kb/s channel (the trunk) runs between digital switches. In the second case, the wire runs from the user phone to a Digital Loop Carrier, with 64 kb/s channels from there to the digital switch and between digital switches.)


Figure 6-6 illustrates switching from line to line, line to trunk, or trunk to
line. In its simplest form, the switch provides the interconnection between
all of the lines (which connect to telephones), and all of the trunks (which
connect to other switches). For every call initiated by a user on that switch,
the user's line must be connected to either another user's line in the switch
or with a trunk leaving the switch. For every call arriving at a switch on a
trunk, that call must either be connected to a user's line on that switch or to
another trunk for transmission to another switch.


Figure 6-6: Switching from line to line, line to trunk, or trunk to line
(Diagram panels: line to line, trunk to trunk, line to trunk, trunk to line.)


The concept of lines and trunks does not exist for VoIP calls. For a VoIP
caller to reach users of traditional phones, a gateway must be provided to
convert the VoIP data stream into a 64 kb/s channel on a trunk that connects
to the switch. A discussion of some of the different types of trunks is
included near the end of this chapter.

Sizing network connections, trunks, from a switch


Every user of the switch has an individual line appearance in the switch.
There are not, however, as many trunks as there are lines. There are only as
many trunks between any two switches as the traffic volume between them
requires.

Sidebar: Erlangs
The number of trunks needed to support a given calling volume was
first studied by the Danish engineer A. K. Erlang. Because of the
statistical nature of call arrivals, it is not possible to add up the total
number of minutes or seconds that people want to talk on the phone and,
with a little arithmetic, calculate the number of trunks needed. However,
since it is based on statistics, the relationship between the number of
trunks needed to support a given calling volume 99% of the time or
99.9% of the time is always the same; so, traditionally, these values have
been captured in tables. As you might expect, they are often called
Erlang tables or Poisson tables because of the basic distributions
involved.
The one piece of direct arithmetic that does go into an Erlang table is the
calling volume. It is either measured in units of call seconds or in units
of (surprise!) Erlangs. One Erlang is one trunk continuously used for one
hour. This measurement is directly equivalent to 3600 call seconds (60
seconds X 60 minutes = 3600 call seconds). And 3600 call seconds = 1
Erlang.
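
The values in such tables can also be computed directly. The sketch below uses the Erlang B formula, which is one common model (it assumes that blocked calls are simply cleared) and is not necessarily the model behind any particular table. The function names are chosen for the example.

```python
def erlangs_from_call_seconds(call_seconds, period_seconds=3600):
    """Convert call seconds measured over a period (one hour by default) to Erlangs."""
    return call_seconds / period_seconds


def erlang_b(traffic_erlangs, trunks):
    """Probability that a call finds all trunks busy (Erlang B, iterative form)."""
    blocking = 1.0  # with zero trunks every call is blocked
    for n in range(1, trunks + 1):
        blocking = (traffic_erlangs * blocking) / (n + traffic_erlangs * blocking)
    return blocking


def trunks_needed(traffic_erlangs, max_blocking):
    """Smallest number of trunks whose blocking probability meets the target."""
    trunks = 0
    while erlang_b(traffic_erlangs, trunks) > max_blocking:
        trunks += 1
    return trunks
```

For example, `trunks_needed(10, 0.01)` asks how many trunks carry 10 Erlangs of busy hour traffic while completing 99% of call attempts.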

The telephone company identifies how many trunks are needed by
measuring how much call traffic is associated with the busy hour. Traffic
fluctuates over the course of the day, and the hour with the most traffic is
called the busy hour. Traffic also fluctuates from day to day in both a
statistical and a nonstatistical way. In North America, the busiest hour of
the whole year is likely to be sometime on Mother's Day. Almost all
telephone networks are engineered to deliver 99% or more of all traffic in
that busiest hour of the year.

Control, call routing


Call routing in a telephone system is based on the switch knowing which
trunks can be used to route a call to a specific telephone number. Part of
this information is kept in tables inside each switch, and part of it is
information built into the architecture of the entire switching system.
Digital voice switches carry routing information about a very small number
of destinations. The telephone network optimizes the path between
endpoints through the offline traffic engineering practice of setting up
high usage trunks and through the deliberate distribution of routing
information between switches in a hierarchical manner. If a high usage
trunk path is unavailable at a given switch, the switch uses a trunk destined
for the switch at the next higher level of the hierarchy. This continues until
the call reaches a point where tables within the switch carry routing
information about the destination digital switch (called an end office), the
area code, or the country code that the user dialed. At the international
gateway level, all country codes are data-filled, so every call is guaranteed
to receive, as a minimum, information about how the far end can be reached.
An optimum path is not calculated for each call. Although protocols for
dynamically maintaining routing information between telephone switches
have been developed, they have not been widely deployed. From a data
networking paradigm, most of the information in digital switch tables looks
something like static routes, in that it is manually provisioned. It is,
however, much more flexible than static routes, in that it can be provisioned
to have an extremely complex route selection algorithm that accommodates
a wide variety of load sharing and failover mechanisms. Unlike the data
networking example, the digital switching world also has a reliable
mechanism for detecting end-to-end path availability so the reliability of
the routing process is better. These differences between the digital
switching and IP routing approaches to routing are important to note for
engineers who need to implement real-time networking solutions on a data
network. What is not there in each approach can sometimes be as
important as what is there.
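
The route selection described in this section can be sketched as a table lookup with a fallback. Everything here is a hypothetical illustration: the table layout, the trunk group names, and the function `select_trunk_group` are invented for the example and do not describe any real switch's provisioning.

```python
def select_trunk_group(dialed_digits, routes, idle_trunks, final_group):
    """Pick a trunk group for a call.

    routes       maps a dialed-digit prefix to an ordered list of high usage
                 trunk groups (manually provisioned, like static routes).
    idle_trunks  maps a trunk group name to its number of idle trunks.
    final_group  is the trunk group toward the next higher switch in the
                 hierarchy, used when no high usage trunk is available.
    """
    candidates = []
    # Longest matching prefix wins, so a specific entry overrides a general one.
    for length in range(len(dialed_digits), 0, -1):
        prefix = dialed_digits[:length]
        if prefix in routes:
            candidates = list(routes[prefix])
            break
    candidates.append(final_group)  # hierarchy is always the last choice
    for group in candidates:
        if idle_trunks.get(group, 0) > 0:
            return group
    return None  # no idle trunk anywhere: the call is blocked


# Hypothetical provisioning data for one switch.
routes = {"613555": ["hu_office_a"], "613": ["hu_area_613"]}
idle_trunks = {"hu_office_a": 0, "hu_area_613": 2, "to_tandem": 5}
```

With this data, a call to 613 700 1234 takes the high usage group for area 613, while a call to 613 555 1234 finds its high usage group busy and overflows to the tandem.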

Signaling
Two types of signaling systems must exist within the switch, one for lines
and one for trunks. Line side signaling communicates with telephone sets.
There are dozens of signaling protocols that might be used, each specific to
a particular country, type of telephone switch, or telephone set. Trunk side
signaling systems communicate between switches. There are three major
categories of signaling systems:
per trunk
ISDN
SS7
There are perhaps hundreds of per trunk signaling systems in use
throughout the world. These are the oldest kinds of signaling systems. They
steal bits from the data stream on the trunk to send signaling information
such as going on hook (hanging up). These allow point-to-point
communication between switches that share trunks.
ISDN signaling systems came along at a time when people were trying to
simplify signaling and build a universal network. You might expect that
you could just specify an ISDN connection and be able to count on the
signaling protocol, but there are dozens of ISDN variants. ISDN signaling
steals one or two trunks from a T1 or E1 to send signaling information for
all of the calls on that T1 or E1 (or on a group of as many as eight T1s).
Signaling System 7 is the closest thing that there is to a universal signaling
system between telephone switches. Entire countries often use the same
variant, usually some flavor of the ANSI or ETSI standards. SS7 signaling
relies on an entirely separate signaling network, very closely monitored, to
send signaling information between switches.

Per Trunk signaling

Figure 6-7: Trunking - Per Trunk Signaling
(Diagram: Switch A with call control and trunk interfaces (Trk) that use per trunk signaling, with trunks to other switches. The IP network side is marked "No Gateways for Per Trunk Signaling".)


When trunk signaling is done on a per trunk basis, each trunk (that is,
64 kb/s channel) donates some of its bits to provide a signaling channel.
This reduces the effective bandwidth of the trunk to 56 kb/s. Each switch
makes an independent routing decision to select a trunk bound for the next
switch and then passes call routing information to that switch.
From a practical perspective, there are no known gateways to/from IP
networks that support any of the per trunk signaling mechanisms.
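
The 56 kb/s figure follows from treating one of the eight bits in every sample as unavailable for user data. The sketch below shows only that arithmetic; the function name and the `bits_unavailable` parameter are assumptions made for the example.

```python
SAMPLES_PER_SECOND = 8000
BITS_PER_SAMPLE = 8


def usable_rate_bps(bits_unavailable=1):
    """Data rate left on a trunk when some bits per sample cannot carry user data."""
    return (BITS_PER_SAMPLE - bits_unavailable) * SAMPLES_PER_SECOND


per_trunk_signaling_bps = usable_rate_bps(1)  # 7 bits x 8000 samples/s
clear_channel_bps = usable_rate_bps(0)        # all 8 bits available
```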


ISDN signaling
Robbing bits from every channel to convey signaling information had
limitations for data communications and setting up an end-to-end
connection was slow. Another mechanism to convey signaling information
is to rob a complete channel from a DS1 or E1 for signaling purposes to
allow the other channels to be delivered at full rate (that is, 64 kb/s). This
scheme is implemented in the Primary Rate Interface (PRI; see note 1) of ISDN
(Integrated Services Digital Network). A PRI is essentially one DS1 or one
E1. For a DS1, channel 23 (the last one in the 0-23 sequence) is used for
signaling. For an E1, time slot 16 is used (time slot 0 carries framing). These
channels are called D (data) channels and the other 23 (or 30) are called
B (bearer) channels.

Figure 6-8: Trunking - ISDN
(Diagram: a switch with call control, trunk interfaces (Trk), and D channel handlers (D), connected by PRI (Primary Rate Interface) trunks to a PRI gateway on the IP network.)


A digital switch that supports ISDN signaling will have some kind of a D
channel handler to relay messaging to the call control function of the
switch. As with the per trunk signaling mechanisms, the call setup process
progresses from switch to switch over the same transport facilities used to
carry the bearer channels.
To complete a call, each switch must find a path to the next switch in the
chain that leads towards the far end user. The first switch finds an idle
trunk, marks it so that nobody else can use it, and signals the far end switch
over the D channel. The signaling protocol used for ISDN is Q.931.

Q.931
The signaling protocol used by ISDN facilities is Q.931. The Q.931
protocol defines the messages sent over the D channel, both in terms of the
message format and the message sequencing. Figure 6-9 illustrates a
typical call connection and tear-down sequence between two ISDN phones
connected to a Private Automatic Branch Exchange (PABX).

1. Note that if someone has an ISDN phone, it would not have PRI signaling. Instead, it would have BRI
signaling. A Basic Rate Interface (BRI) is two 64 kb/s bearer channels and one 16 kb/s D channel.
There are D channel handlers for line side peripherals as well that permit signaling information to be
relayed to the call control functions of the switch.

Figure 6-9: Q.931 Signaling
(Diagram: message flow between the originating switch, the tandem switch, and the terminating switch. Off hook and dial; Setup; Proceeding; ringing and Alerting; answer and Connect; Connect Acknowledge; talk path established; hang up; Disconnect; Release; Release Complete.)


Besides having a D channel designated on every DS1 or E1, a single
64 kb/s channel can also convey signaling information for more than 23
channels. This mode of operation is called Non-Facility Associated
Signaling (NFAS). In some switches, up to eight PRIs can use the same
signaling channel. The way that Non-Facility Associated Signaling
participates in call processing is almost identical to Facility Associated
Signaling. The only difference is that, rather than each DS1 or E1 using its
own D channel, one D channel provides the signaling channel for
multiple DS1s or E1s.
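
The effect of NFAS on capacity can be seen with a little arithmetic. This sketch covers the DS1 case only and assumes a single shared D channel by default; the function names and the `d_channels` parameter are chosen for the example.

```python
CHANNELS_PER_DS1 = 24


def bearer_channels_fas(ds1_count):
    """Facility Associated Signaling: every DS1 gives up one channel as its D channel."""
    return ds1_count * (CHANNELS_PER_DS1 - 1)


def bearer_channels_nfas(ds1_count, d_channels=1):
    """Non-Facility Associated Signaling: the whole group shares d_channels D channels."""
    return ds1_count * CHANNELS_PER_DS1 - d_channels


# Eight PRIs, the largest group mentioned in the text.
fas_total = bearer_channels_fas(8)
nfas_total = bearer_channels_nfas(8)
```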

SS7 signaling
The last way that signaling information can be provided is through an
entirely outboard communication system. This method is the way that
CCS7 (Common Channel Signaling System number 7) is implemented.
Figure 6-10 shows the interconnection of switches by way of the CCS7
Network.


Figure 6-10: Trunking - SS7
(Diagram: Switch A and Switch Z, each with call control, ISUP trunk interfaces (Trk), and an SS7 interface, joined by ISUP (ISDN User Part) trunks. SS7 signaling links connect each switch to the SS7 network of Signal Transfer Points (STP). A gateway labeled PRI Gateway sits between Switch A and the IP network, and Switch A also has trunks to other switches.)


Each switch in the network is referred to as a Service Switching Point
(SSP) and it sends signaling messages to other switches by way of a
network of Signal Transfer Points (STP). Databases (not shown) can exist
in the network to help with tasks like translating an 800 number dialed by a
caller to a local number. These databases are called Service Control Points
(SCP).
There are several IP to SS7 gateways available on the market today.
Generally, these systems are capable of supporting a very large number of
calls (thousands to tens of thousands). Using them requires connecting to
the SS7 network with appropriate signaling links. (See the following
paragraphs for more details.)
If the messages being sent are related to call setup, they will use the ISDN
User Part protocol (ISUP) as shown in the transport path diagram at the
beginning of the chapter. All other messages, such as queries for call
forwarding and local number portability, will use the Transaction
Capabilities Application Part (TCAP) protocol.
Common Channel Signaling System number 7 (that is, SS7 or C7) is a
global standard for telecommunications defined by the International
Telecommunication Union (ITU) Telecommunication Standardization
Sector (ITU-T). The standard defines the procedures and protocol by which
network elements in the Public Switched Telephone Network (PSTN)
exchange information over a digital signaling network to effect wireless
(cellular) and wireline call setup, routing and control. The ITU definition of
SS7 allows for national variants such as the American National Standards
Institute (ANSI) and Bell Communications Research (Telcordia
Technologies) standards used in North America and the European
Telecommunications Standards Institute (ETSI) standard used in Europe.
The SS7 network and protocol are used for the following functions:
basic call setup, management, and tear down
wireless services such as Personal Communications Services
(PCS), wireless roaming, and mobile subscriber authentication
Local Number Portability (LNP)
toll-free (800/888) and toll (900) wireline services
enhanced call features, such as call forwarding, calling party name/
number display, and three-way calling
Common Channel Signaling is implemented by giving each CCS7 enabled
switch in an area a connection to a pair of STPs that
serve the local area. That pair of STPs connects to each other and to other
pairs of STPs in a hierarchy of connections that span a region. Switches
that serve as gateways can have connections to a pair of STPs that serve a
different area or region.
Common Channel Signaling participates in call processing on a
switch-by-switch basis. The switches talk directly to each other
through the CCS7 network. Each switch along the path receives a request
to participate in the creation of an end to end circuit. Each switch must
make a routing decision and determine if it has trunks available to
participate in that circuit. However, the time it takes to signal each switch
along the path is very short, since each switch along the path needs to
identify only whether a trunk is available before the initial address message
is forwarded.
Each signaling point in the network has a unique address called a Point
Code. Point Codes are assigned in a hierarchical manner. Switches are
grouped together into a cluster, and clusters form a network. The Point
Code shows the Network Identifier, the Network Cluster Identifier and the
Cluster Member number.
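
The three-part structure can be illustrated with a short sketch. It assumes the 24-bit ANSI layout, with eight bits each for network, cluster, and member, written in the usual network-cluster-member decimal form. ITU point codes use a different length and split. The function names are chosen for the example.

```python
def pack_point_code(network, cluster, member):
    """Combine the three fields into one 24-bit value (ANSI-style layout assumed)."""
    for value in (network, cluster, member):
        if not 0 <= value <= 255:
            raise ValueError("each field must fit in 8 bits")
    return (network << 16) | (cluster << 8) | member


def unpack_point_code(point_code):
    """Split a 24-bit point code into (network, cluster, member)."""
    return ((point_code >> 16) & 0xFF, (point_code >> 8) & 0xFF, point_code & 0xFF)


def format_point_code(point_code):
    """Render a point code in network-cluster-member decimal form."""
    return "-".join(str(part) for part in unpack_point_code(point_code))
```

Because the network and cluster fields come first, all members of a cluster share a common prefix, which is what allows signaling routes to be provisioned per cluster or per network instead of per switch.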

Signaling links
SS7 messages are exchanged between network elements over 56 or 64 kb/s
bidirectional channels called signaling links. From the perspective of
setting up a connection to a VoIP gateway, two or more of these links will
need to be set up. The company that owns the SS7 network will require that
the equipment being connected goes through a rigorous certification
process. The company depends heavily on its network and needs to be
sure that any equipment attached to it, and the messages that equipment
sends, behave in an expected manner, and that unwanted messages do not
flood the network.


SS7 Message Flow - ISUP


The following figure illustrates at a high level the signaling messages sent
to set up a call on the SS7 network. Testing for continuity is optional but is
a common practice for large carriers.

Figure 6-11: SS7 Call Setup Signaling
(Diagram: message flow between the originating switch and the terminating switch. Off hook; Initial Address Message (IAM); continuity loop and Continuity Message; ringing and Address Complete Message (ACM); off hook and Answer Message (ANM); talk path established; hang up; Release; Release Complete.)


What you should have learned


Voice channels on the Public Switched Telephone Network (PSTN) are
sampled 8000 times per second. This produces, with eight bit samples, a
channel of 64 kb/s. All TDM equipment carries 64 kb/s channels or
multiples of 64 kb/s channels. Every bit in a TDM stream occupies a
predetermined timeslot.
Conversion from analog to digital is often done at the digital switch.
Signaling (off hook, dialed digits) is needed for a digital switch to know
how to handle a particular call from a telephone set.
A digital switch has a dedicated line for every telephone set. It has
nondedicated trunks to connect between switches. Trunks go from one
switch to another.
The number of trunks needed to handle interswitch communication is
calculated from calling volume statistics using Erlang or Poisson tables.
A digital switch maintains its call routing tables (almost always) as manual
entries. In order to minimize the size of these tables, the switch takes
advantage of a hierarchy of switches within the network that maintain
information about portions of the numbering plan. A switch maintains its
local connection information, plus information about high usage routes. At
the apex of the hierarchy are international gateways that contain
information about all country codes in the world.
ISDN signaling uses Q.931 signaling.
SS7 signaling requires the setup of signaling links between switches and
Signal Transfer Points in an out-of-band signaling network.


Chapter 7
SONET/SDH
Anthony Lugo

Figure 7-1: Transport path diagram
(Diagram labels, as far as they can be recovered: media related (audio, video, voice, codec, RTP, RTCP, RTSP); session and gateway control (H.323, SIP, H.248 / MGCP / NCS); UDP / TCP / SCTP; IP; QoS; resiliency; MPLS; AAL1/2, AAL5, ATM, FR, Ethernet, packet cable, DOCSIS; xDSL, cellular, fiber, copper, WLAN, HFC; with SONET / TDM at the base.)

Concepts covered
The evolution of SONET, which is driven by the next generation
applications and services of the telecommunications market.
The SONET protocol functionality within the network layers.
SONET Network element types and attributes.
The variety of flexible solutions that ensure network survivability,
redundancy and exceptional reliability.
The significant role synchronization plays within the SONET world.


Introduction
Today we deliver information at the speed of light, and the fiber optic
medium has proven to be a viable solution as a delivery agent.
Synchronous Optical Network (SONET) is an optical standard that permits
multiple vendors to deliver data, voice and multimedia applications to
residential and business customers. The purpose of this chapter is to define
and illustrate the concept of Synchronous Optical Network (SONET),
allowing the reader to understand the associated SONET structure and how
it can deliver value added applications to the telecommunications industry,
benefiting the consumer with new services.
Figure 7-2 illustrates the progressive evolution of SONET, beginning with
the foundation of traditional private line services growing into the optical
private line service and entering the new market demand for the SONET
next generation data applications and services.

Figure 7-2: SONET service application evolution
(Diagram: service flexibility and development plotted against network service evolution. Traditional private line service: DS0, DS1, DS3. Optical private line service: OC-3c/STM-1, OC-12c/STM-4, OC-48c/STM-16, OC-192c/STM-64. SONET next generation data services: Fast Ethernet, ATM, ESCON, RPR, SAN, FDDI, Fibre Channel, HDTV video, GE LAN, WAN, 10 Gigabit Ethernet.)

Overview
The SONET network has evolved over the years in response to the demands
of new applications and services. Pressure on the telecommunications
industry to meet the demands of the consumer market has moved the
traditional legacy SONET time division multiplexing networks toward the
next generation of SONET.
SONET enters a new stage of growth with the telecommunications market
striving for reduced capital and operating expenditures. Growing demand
for high-speed Internet access, network security, and new application
delivery, together with the full drive of a competitive market, calls for
greater use of the SONET network infrastructure, which has been critical
for successful business operation.


What once was LAN, MAN or WAN has been blended into the SONET
fiber cloud, creating flexible, secure applications, such as ATM, Storage
Area Networks (SAN) and high bandwidth applications.
As applications and services have evolved, so has the optical network
backbone infrastructure. In the early development of the legacy optical
network, most configurations were simple ring or linear applications.
Today's optical network involves more complexity, as a network element
has evolved into a HUB or meshed network creating more flexibility and
density per Network Element (NE). A single NE that used to add and drop
traffic for a single BLSR ring, can now terminate multiple different types
of network configurations, such as a BLSR, UPSR, Linear 1+1, and 0:1
unprotected.
This advancement in the Optical Networks arena allows for the
interconnecting or meshing of networks, to a scale no one could have
foreseen within a multivendor environment. What used to be a separation
of networks in terms of LAN, MAN, and WAN, is now seamless, due to the
SONET optical network backbone.

SONET a practical introduction


As technology has evolved, so has the complexity of the SONET network
infrastructure, which has proved flexible enough to deliver a variety of
applications to meet market demand. SONET is an acronym for Synchronous
Optical Network and is a standard for optical communications. The SONET
standard was initiated by Telcordia (formerly known as Bellcore) on behalf
of the Regional Bell Operating Companies (RBOCs) for the following
reasons:
• multivendor environment (mid-span meet)
• positioning the network for transport of a multitude of services
• synchronous networking
• enhanced Operations, Administration, Maintenance, and Provisioning
  (OAM&P)
• bandwidth management capabilities
In summary, SONET provides a solution with multivendor internetworking
capabilities. Because it is a synchronous format, SONET also allows a
single platform for multiplexing and demultiplexing. Thanks to its
bandwidth flexibility, SONET can carry many other services, such as
high-speed packet-switched services, ATM, Gigabit Ethernet, Resilient
Packet Rings (RPR), Generic Framing Procedure (GFP), ESCON, and Fibre
Channel, while still supporting the existing DS1 and DS3 digital signal
infrastructure.


SONET network migration


The vision of an all-SONET/SDH network includes direct optical interfaces
on the digital switches and direct optical interconnection between all the
network elements that make up the fiber transport network.
This network minimizes and simplifies the equipment required for
bandwidth management and is not restricted to the transport of DS1s and
DS3s. The transport infrastructure can now be used for new services,
including video, data, SAN, ESCON, Fibre Channel, ATM, and Gigabit
Ethernet. These new service applications were developed on the existing
SONET transport infrastructure, while the traditional digital voice
switches remain in place.

The copper DS0-DS1-DS3 (traditional services)


SONET/SDH transports the asynchronous data streams that existed prior to
the introduction of SONET. In North America, the standard pre-SONET
digital hierarchy consisted mainly of the DS1 (1.544 Mb/s) and the DS3
(44.736 Mb/s). The DS1 is the main interface to digital voice switches
and channel banks, whereas the DS3 is the main interface to fiber optic
transmission systems. The M13 multiplexer links the two together.
A similar hierarchy exists in most of the rest of the world. The 2.048 Mb/s
(E1) interface is the key digital switch interface, whereas most
pre-standard fiber optic transmission systems carry the 139 Mb/s signal.

ATM and other traffic services


One of the important benefits of SONET is its ability to position the
network for carrying new revenue-generating services. With its modular,
service-independent architecture, SONET provides vast capabilities in
terms of service flexibility.
High-speed packet-switched services, LAN transport, and High-Definition
Television (HDTV) are examples of new SONET-supported services.
Many of these broadband services may use Asynchronous Transfer Mode
(ATM), a fast-packet-switching technique that uses short, fixed-length
packets called cells.
ATM multiplexes the payload into cells that may be generated and routed
as necessary. Because of the bandwidth capacity it offers, SONET is a
logical carrier for ATM.

Optical Ethernet applications


Optical Ethernet applications include the GbE application and the RPR
application, both explained below.


GbE application
Ethernet service at the Layer 2 level, using a SONET/SDH circuit
connecting multiple sites, can offer point-to-point Ethernet connectivity for
data centers, remote data backup sites, and servers without the addition of
dedicated data equipment. It can broaden a service provider's offering and
increase revenue potential. GbE service also benefits from the Layer 1
SONET/SDH protection schemes.
GbE support also makes more efficient use of the bandwidth of the 10 G
signal, because payloads of different sizes can carry the GbE traffic and
different types of services can be mixed within the same optical link.

RPR application
Resilient Packet Ring (RPR) is a distributed switch application (IEEE
802.1D bridge functionality) that is connectionless and packet-based, and
that provides a shared-bandwidth networking solution for Ethernet traffic
over a SONET/SDH backbone. RPR has 10/100/1000 Base-X capabilities and is
an efficient, carrier-grade method of connecting to routers and LANs using
a Layer 2 add/drop/pass-through technology.

SONET terminology
This section describes SONET terminology including SONET level rates
and SONET layers and architecture.

SONET level rates


In brief, SONET defines Optical Carrier (OC) levels and electrically
equivalent Synchronous Transport Signals (STSs) for the fiber-optic based
transmission hierarchy. The standard SONET line rates and STS-equivalent
formats are shown in Figure 7-3. The fiber-optic rates shown in bold appear
to be the most widely supported by both network providers and vendors.
The basic building block of SONET is an STS-1 signal, which has a bit rate
of 51.84 Mbit/s. Higher-level signals are integer multiples of the base rate.


An STS-N has exactly N times the rate of 51.84 Mbit/s (for example, an
STS-12 is exactly 12 x 51.84 = 622.08 Mbit/s).

SONET               SDH        Rate
OC-1 / STS-1        STM-0      51.84 Mb/s
OC-3 / STS-3        STM-1      155.52 Mb/s
OC-12 / STS-12      STM-4      622.08 Mb/s
OC-48 / STS-48      STM-16     2488.32 Mb/s
OC-192 / STS-192    STM-64     9953.28 Mb/s
OC-768 / STS-768    STM-256    39813.12 Mb/s

Figure 7-3: SONET/SDH rates
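
The rule that an STS-N runs at exactly N times the STS-1 rate is easy to
check numerically. The following Python sketch reproduces the rates listed
in Figure 7-3 from the 51.84 Mb/s base rate. The function name
sts_rate_mbps is ours, chosen for illustration only, and exact fractions
are used to avoid floating-point rounding.

```python
from fractions import Fraction

# STS-1 base rate from the text: 51.84 Mb/s, held as an exact fraction.
STS1_MBPS = Fraction(5184, 100)


def sts_rate_mbps(n: int) -> Fraction:
    """Return the STS-N / OC-N line rate in Mb/s as N times the STS-1 rate."""
    if n < 1:
        raise ValueError("N must be a positive integer")
    return n * STS1_MBPS


if __name__ == "__main__":
    for level in (1, 3, 12, 48, 192, 768):
        rate = float(sts_rate_mbps(level))
        print(f"OC-{level} / STS-{level}: {rate:.2f} Mb/s")
```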

SONET layers and architecture


The SONET frame format is segmented into four interfacing layers: the
photonic, section, line, and path layers. SONET provides considerable
overhead information. The section and line layers create the transport
overhead, and the customer payload is carried in the Synchronous Payload
Envelope (SPE), which contains the path layer. The path layer can then
carry STS path payloads and/or Virtual Tributary (VT) path payloads.
Figure 7-4 illustrates the SONET frame format with STS payloads, and
Figure 7-5 depicts the SONET frame format with VT payloads.

Figure 7-4: SONET FRAME FORMAT with STS path payload

Figure 7-5: SONET FRAME FORMAT with STS-1/VT Path Payload
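
The split between transport overhead and SPE can be made concrete with a
little arithmetic. The sketch below assumes the standard STS-1 frame
dimensions, which this chapter does not state explicitly: 9 rows by 90
byte-wide columns, sent 8000 times per second, with 3 columns of transport
overhead (section plus line) and 1 column of path overhead inside the SPE.
Treat these constants and the helper name rate_mbps as assumptions made for
illustration.

```python
# Assumed standard STS-1 frame geometry (not stated in the text above).
ROWS = 9
COLUMNS = 90
FRAMES_PER_SECOND = 8000          # one frame every 125 microseconds
TRANSPORT_OVERHEAD_COLUMNS = 3    # section + line overhead
PATH_OVERHEAD_COLUMNS = 1         # carried inside the SPE


def rate_mbps(columns: int) -> float:
    """Bit rate in Mb/s contributed by the given number of frame columns."""
    return columns * ROWS * 8 * FRAMES_PER_SECOND / 1e6


line_rate = rate_mbps(COLUMNS)
transport_overhead = rate_mbps(TRANSPORT_OVERHEAD_COLUMNS)
spe_rate = rate_mbps(COLUMNS - TRANSPORT_OVERHEAD_COLUMNS)
path_overhead = rate_mbps(PATH_OVERHEAD_COLUMNS)

if __name__ == "__main__":
    print(f"STS-1 line rate:     {line_rate:.3f} Mb/s")
    print(f"Transport overhead:  {transport_overhead:.3f} Mb/s")
    print(f"SPE (incl. path OH): {spe_rate:.3f} Mb/s")
    print(f"Path overhead:       {path_overhead:.3f} Mb/s")
```

Under these assumptions the full frame works out to the 51.84 Mb/s STS-1
rate quoted earlier, which is a useful consistency check on the geometry.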

Photonic layer
The optically transmitted SONET signal is referred to as an OC-N. This
layer is primarily responsible for the electrical-to-optical conversion. The
OC-N is essentially the optical equivalent of the STS-N. However, the
STS-N terminology is used when referring to the SONET format.


Section layer
The section layer transports the STS-N frames and the section overhead
across the photonic layer. Its functions include performance monitoring,
local orderwire, and Section Data Communication Channels (SDCC). The SDCC
provides a communications path between a centralized Operations System
(OS) and the various network elements.

Line layer
The line layer is responsible for transporting the SPE (customer payload)
and the line overhead. Its functions include Line Data Communication
Channels (LDCC), express orderwire, performance monitoring, protection
switching signaling, and line alarms.

Path layer
The path layer transports customer traffic at the DS-1, DS-3, DS-1VT,
DS-3VT, and video levels for the Path Terminating Equipment. The path
layer transports the customer payload and the path overhead to the
terminating SONET/SDH equipment.
Customer payload can be mapped into the path layer as an STS-level payload
or as an STS/VT-level payload, as illustrated in Figure 7-4 and
Figure 7-5.
In addition to the STS-1 path-level base format, SONET also defines
synchronous formats at sub-STS-1 levels, known as Virtual Tributaries
(VTs). VTs are synchronous signals used to transport low-speed signals.
The VT is designed to carry (asynchronous) payloads, such as the DS1, that
require considerably less than 50 Mb/s of bandwidth. The DS1 is such an
important payload that the entire SONET format can be traced back to the
need for DS1 transport. There are seven VT Groups within an STS-1 SPE, and
different groups may contain different VT sizes within the same STS-1 SPE.
The structured STS-1 signal has VT payloads and VT path overhead that
together constitute the VT SPE, similar to the STS SPE.
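
To see how seven VT Groups translate into tributary counts, consider the
following Python sketch. It assumes the standard VT sizes and column widths
(VT1.5, VT2, VT3, and VT6 occupying 3, 4, 6, and 12 columns of a 12-column
group), which are not given in this chapter, and it assumes that each group
holds a single VT size. The function names are hypothetical.

```python
# Assumed standard VT structure (not stated in the text above).
ROWS = 9
FRAMES_PER_SECOND = 8000
COLUMNS_PER_GROUP = 12
GROUPS_PER_STS1 = 7          # from the text: seven VT Groups per STS-1 SPE
VT_COLUMNS = {"VT1.5": 3, "VT2": 4, "VT3": 6, "VT6": 12}


def vts_per_group(vt_size: str) -> int:
    """Number of VTs of one size that fit in a single VT Group."""
    return COLUMNS_PER_GROUP // VT_COLUMNS[vt_size]


def vt_rate_mbps(vt_size: str) -> float:
    """Bit rate of one VT, including its own overhead, in Mb/s."""
    return VT_COLUMNS[vt_size] * ROWS * 8 * FRAMES_PER_SECOND / 1e6


def tributaries_in_sts1(group_plan: list[str]) -> dict[str, int]:
    """Count VTs per size for a plan that names the VT size of each group."""
    if len(group_plan) != GROUPS_PER_STS1:
        raise ValueError("an STS-1 SPE has exactly seven VT Groups")
    counts: dict[str, int] = {}
    for vt_size in group_plan:
        counts[vt_size] = counts.get(vt_size, 0) + vts_per_group(vt_size)
    return counts


if __name__ == "__main__":
    # All seven groups carrying VT1.5 (the DS1 container).
    print(tributaries_in_sts1(["VT1.5"] * 7))
    # A mixed plan: five groups of VT1.5 and two groups of VT2.
    print(tributaries_in_sts1(["VT1.5"] * 5 + ["VT2"] * 2))
```

With every group set to VT1.5, the plan yields 28 tributaries, which matches
the familiar figure of 28 DS1s per STS-1.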

Summary
All SONET network elements have section and photonic layer
functionality. However, not all have the higher layers. A network element is
classified by the highest layer supported on the interface. Thus, a network
element with path layer functionality is referred to as Path Terminating
Equipment, either VT PTE or STS PTE.


Similarly, a network element with line layer functionality but no higher is
Line Terminating Equipment. Finally, a network element with section layer
functionality but no higher is Section Terminating Equipment.
Figure 7-6 shows the complete end-to-end attributes of the SONET
overhead path mappings.

[Figure 7-6 shows four network elements in a chain: Path Terminating
Equipment (PTE), Section Terminating Equipment (STE), Line Terminating
Equipment (LTE), and Path Terminating Equipment (PTE). DS1/VT and DS3
payloads enter and leave at the two PTEs. Every element terminates the
photonic and section layers. Beneath the chain, the diagram marks three
section spans, two line spans, one STS path, and one VT path running end
to end.]

Figure 7-6: Section, line and path illustration
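
The classification rule from the summary (an element is named after the
highest layer it supports) can be sketched in a few lines of Python. The
function name classify_network_element and the string labels are
illustrative choices, not part of any SONET specification.

```python
# Layers ordered from lowest to highest, as described in the text.
LAYERS = ("photonic", "section", "line", "path")

EQUIPMENT_CLASS = {
    "section": "Section Terminating Equipment (STE)",
    "line": "Line Terminating Equipment (LTE)",
    "path": "Path Terminating Equipment (PTE)",
}


def classify_network_element(supported_layers) -> str:
    """Classify a network element by the highest layer it supports."""
    layers = set(supported_layers)
    unknown = layers - set(LAYERS)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    # Per the text, every SONET element has photonic and section layers.
    if not {"photonic", "section"} <= layers:
        raise ValueError("photonic and section layers are mandatory")
    highest = max(layers, key=LAYERS.index)
    return EQUIPMENT_CLASS[highest]


if __name__ == "__main__":
    print(classify_network_element(["photonic", "section"]))
    print(classify_network_element(["photonic", "section", "line"]))
    print(classify_network_element(["photonic", "section", "line", "path"]))
```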

The network element


The network element is the device that accepts the customer traffic
payload so that it may be transported on the SONET network
infrastructure. Although Network Elements (NEs) are compatible at the
OC-N level, they may differ in features from vendor to vendor. SONET
does not restrict manufacturers to providing a single type of product, nor
does it require them to provide all types.
The main types of SONET NE are the terminal, the Add/Drop Multiplexer
(ADM), the drop and repeat node (repeater), the broadband digital
cross-connect, and the wideband digital cross-connect.

Terminal
A SONET Line Terminating Equipment takes in a number of electrical
signals (tributaries) and transmits a single electrical or optical signal.

Add/Drop Multiplexer (ADM)


A single-stage multiplexer/demultiplexer can multiplex various inputs into
an OC-N signal. At an add/drop site, only those signals that need to be
accessed are dropped or inserted. The remaining traffic continues through
the network element without requiring special pass-through units or other
signal processing.

Drop and repeat (repeater)


SONET enables drop and repeat (also known as drop and continue), which
is a key capability in both telephony and cable TV applications. With drop
and repeat, a signal terminates at one node, is duplicated (repeated), and is
then sent to the next node and to subsequent nodes.

Broadband digital cross-connect


A SONET cross-connect accepts various optical carrier rates, accesses the
STS-1 signals, and switches at this level. It is ideally used as an optical
SONET hub. One major difference between a cross-connect and an add-
drop multiplexer is that a cross-connect may be used to interconnect a
much larger number of STS-1s. The broadband cross-connect can be used
for grooming (consolidating or segregating) of STS-1s or for broadband
traffic management.

Wideband digital cross-connect


This SONET NE type is similar to the broadband cross-connect except that
the switching is done at sub-rate STS/VT levels (similar to DS-1/DS-2
levels). It is similar to a DS-3/1 cross-connect because it accepts DS-1s and
DS-3s and is equipped with optical interfaces to accept optical carrier
signals. This is suitable for DS-1 level grooming applications at hub
locations.

Network configurations
SONET network elements can be combined into a variety of network
configurations. The types of payload a customer needs to carry may call
for particular types of network configuration.
The SONET NE can be part of the three traditional configurations: linear,
Bidirectional Line Switched Ring (BLSR), and Unidirectional Path
Switched Ring (UPSR).
As technology evolves, so does the network topology. SONET
next-generation NEs can support not only each of the three traditional
topologies, but all of them in a single box, which makes optical hub and
meshed configurations possible.


Linear configuration
A linear configuration can be of the 1+1 type or the 1:N type. Protection
switching in response to a single failure at the STS/OC-N level takes
place on the order of 50 ms.

1+1 Protected
The most basic protection system is a linear 1+1 system (see Figure 7-7).
The term linear differentiates it from ring systems, and 1+1 indicates
that there is one working fiber and one standby fiber, with the traffic in
both directions permanently bridged onto both the working and the standby
fiber.

[Figure 7-7 shows two terminals connected by a linear 1+1 system: one
working line and one protection line.]

Figure 7-7: 1+1 protected

1: N Protected
In a 1:N configuration, there is one protection facility for several working
facilities (from one to fourteen). Figure 7-8 illustrates the 1:N protection
architecture. If one of the working lines detects a signal failure or line
degradation, the working traffic is switched to the protection line.


[Figure 7-8 shows two terminals connected by 1:N protection: working lines
#1, #2, through #N (N <= 14) and a single shared protection line.]

Figure 7-8: 1:N protected
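
The behavior of a 1:N group can be sketched as a small state machine: one
protection line, up to fourteen working lines, and a switch to protection
when a working line fails. The Python class below is a minimal sketch under
simplifying assumptions of our own. It ignores priorities, signal-degrade
thresholds, and revertive timers, and the class and method names are
hypothetical.

```python
class OneForNProtection:
    """Toy model of a linear 1:N protection group."""

    MAX_WORKING = 14  # from the text: one to fourteen working facilities

    def __init__(self, n_working: int) -> None:
        if not 1 <= n_working <= self.MAX_WORKING:
            raise ValueError("N must be between 1 and 14")
        self.n_working = n_working
        # Working line currently carried on the protection line, if any.
        self.protected_line = None

    def _check(self, line: int) -> None:
        if not 1 <= line <= self.n_working:
            raise ValueError(f"no such working line: {line}")

    def report_failure(self, line: int) -> bool:
        """Switch the failed line to protection; return True if protected."""
        self._check(line)
        if self.protected_line is None:
            self.protected_line = line
            return True
        # Protection is already in use; only its current user is protected.
        return self.protected_line == line

    def report_repair(self, line: int) -> None:
        """Release the protection line when its user has been repaired."""
        self._check(line)
        if self.protected_line == line:
            self.protected_line = None


if __name__ == "__main__":
    group = OneForNProtection(n_working=3)
    print(group.report_failure(2))  # line 2 takes the protection line
    print(group.report_failure(3))  # a second failure cannot be protected
    group.report_repair(2)
    print(group.report_failure(3))  # protection is free again
```

The second failure in the usage example illustrates the trade-off of 1:N
against 1+1: protection bandwidth is shared, so only one working line can
be protected at a time.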

Ring configurations
The telecommunications market demands the utmost quality from a
SONET infrastructure, and network performance and network survivability
are important. Survivable rings and route diversity have become almost a
necessity for a SONET infrastructure, and Unidirectional Path Switched
Rings (UPSR) and Bidirectional Line Switched Rings (BLSR) are solutions
that meet these demands.
The two primary types of ring configurations are as follows:
• UPSR
  Dedicated protection bandwidth
  Bellcore GR-1400-CORE
• BLSR
  Shared protection bandwidth
  Bellcore GR-1230-CORE
A comparison of UPSR and BLSR is shown in Figure 7-9.


[Figure 7-9, titled "A Comparison of BLSR and UPSR", shows two four-node
rings (NE 1 through NE 4), one a BLSR (Bellcore GR-1230-CORE) and one a
UPSR (Bellcore GR-1400-CORE; the diagram also carries the label TA-496).
In the BLSR, STS-1 channel 1 is reused on different spans. In the UPSR,
STS-1 channel 1 and STS-1 channel 2 are each used in all spans.]

BLSR
• Bidirectional flow enables timeslots to be reused around the ring
• Total BLSR network capacity is always greater than or equal to the
  capacity of UPSR
• Total 2F-BLSR network capacity* = (OC-N rate ÷ 2) x # nodes
  *4F-BLSR capacity = (OC-N rate) x # nodes

UPSR
• Unidirectional flow requires dedicated timeslots around the entire ring
• Total UPSR network capacity cannot exceed the optical line rate
• So total UPSR bandwidth = OC-N rate of the main optical line

Figure 7-9: UPSR vs. BLSR


UPSR and BLSR both provide route diversity and 100% survivability
against fiber cuts and node failures. Service restoration is provided by
interworking of the SONET network elements in the ring. The two main
differences between UPSR and BLSR are the total network capacity and
the protection switching process of the network elements.
BLSR can support more than one connection on the ring using the same
channels, as long as the connections do not overlap on any spans. This is
not true of the UPSR, which uses an entire channel in both directions, all
the way around the ring for each and every connection. Therefore, in
general, BLSR can support more connections than UPSR, even if they are
both operating at the same line rate.
The exception is when all of the connections have one end at the same
node, in which case, the BLSR and the UPSR have equivalent capacity.
Thus, UPSRs are often used in access networks, in which all of the traffic is
homing in on the local switching office.
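
The capacity relations given with Figure 7-9 can be worked through for a
concrete ring. The Python sketch below expresses capacity in STS-1 units
and applies those relations directly: a UPSR is limited to the OC-N line
rate, a two-fiber BLSR to (OC-N ÷ 2) times the number of nodes, and a
four-fiber BLSR to OC-N times the number of nodes. The BLSR figures are
upper bounds that assume connections never overlap on a span. The function
names are ours.

```python
def upsr_capacity_sts1(oc_n: int) -> int:
    """Total UPSR capacity in STS-1s: bounded by the optical line rate."""
    return oc_n


def blsr_capacity_sts1(oc_n: int, nodes: int, fibers: int = 2) -> int:
    """Upper bound on total BLSR capacity in STS-1s (no overlapping spans)."""
    if nodes < 2:
        raise ValueError("a ring needs at least two nodes")
    if fibers == 2:
        if oc_n % 2:
            raise ValueError("this sketch expects an even OC-N for 2F-BLSR")
        return (oc_n // 2) * nodes   # half of each fiber is protection
    if fibers == 4:
        return oc_n * nodes          # separate working and protection fibers
    raise ValueError("fibers must be 2 or 4")


if __name__ == "__main__":
    oc_n, nodes = 48, 4
    print("UPSR:   ", upsr_capacity_sts1(oc_n), "STS-1s")
    print("2F-BLSR:", blsr_capacity_sts1(oc_n, nodes, fibers=2), "STS-1s")
    print("4F-BLSR:", blsr_capacity_sts1(oc_n, nodes, fibers=4), "STS-1s")
```

For a four-node OC-48 ring this gives 48 STS-1s for the UPSR against up to
96 and 192 for the two-fiber and four-fiber BLSR. When all traffic homes on
one node, spans cannot be reused and the BLSR advantage disappears, as
noted above.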

Ring network survivability


Ring-based APS provides an increased level of survivability for SONET
networks, by allowing as much traffic as possible to be restored, even in the
event of a cable cut or a node failure.


In a ring, a number of network elements are connected in a closed loop. In
the event of a disruption at any point on the ring, the affected traffic can
be restored by rerouting it the other way around the ring. In the BLSR, half
of the capacity in each direction is assigned to working traffic and half is
assigned to protection. In a two-fiber ring, half of the STS-1s on each fiber
are assigned to working and half to protection. In a four-fiber ring, two
fibers, one in each direction, are assigned to working and the other two are
assigned to protection. Bidirectional flow enables timeslots to be reused
around the ring. Thus, the total BLSR capacity is always greater than or
equal to the capacity of a UPSR.
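
The idea of restoring traffic by sending it the other way around the ring
can be sketched as follows. This Python example is a deliberate
simplification: it models nodes as integers 0 to n-1, treats a failure as
the loss of one span, and picks the surviving direction for a connection.
It does not model the actual BLSR or UPSR switching protocols, and all
names are hypothetical.

```python
def spans_on_path(ring_size: int, src: int, dst: int, step: int) -> list:
    """List the spans crossed going from src to dst in one direction.

    step is +1 for clockwise and -1 for counterclockwise. Each span is a
    frozenset of the two adjacent node numbers it connects.
    """
    spans = []
    node = src
    while node != dst:
        nxt = (node + step) % ring_size
        spans.append(frozenset((node, nxt)))
        node = nxt
    return spans


def route(ring_size: int, src: int, dst: int, failed_span: tuple):
    """Return (direction, spans) for a path that avoids the failed span."""
    if ring_size < 3:
        raise ValueError("this sketch expects at least three nodes")
    if src == dst:
        raise ValueError("source and destination must differ")
    failed = frozenset(failed_span)
    clockwise = spans_on_path(ring_size, src, dst, +1)
    if failed not in clockwise:
        return "clockwise", clockwise
    # The working direction is cut, so go the other way around the ring.
    return "counterclockwise", spans_on_path(ring_size, src, dst, -1)


if __name__ == "__main__":
    direction, spans = route(4, src=0, dst=1, failed_span=(0, 1))
    print(direction, [sorted(s) for s in spans])
```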

Optical hub and meshed configuration


Most existing asynchronous systems are suitable only for point-to-point
operation, but SONET supports an optical multipoint hub or meshed
configuration. An optical hub is an intermediate site from which traffic is
distributed to three or more spurs. The hub allows the four nodes or sites
to communicate as a single network instead of three separate systems.
Optical hubbing reduces requirements for back-to-back multiplexing and
demultiplexing, and helps realize the benefits of traffic grooming.
Depending on the vendor, optical hubbing can also support many different
configurations from the SONET hub itself, that is, multiple BLSR, multiple
UPSR, and multiple 1+1 configurations.
A SONET network element can provide this level of flexibility, allowing
for tremendous savings in Capex and Opex. Network providers no longer
need to own and maintain customer-located equipment.
Figure 7-10 compares a hub pattern with a meshed pattern network
infrastructure.

[Figure 7-10 shows two diagrams side by side: a hub pattern and a uniform
mesh pattern.]

Figure 7-10: Comparison between hub and meshed pattern


A meshed pattern goes beyond a simple network-element-to-network-element
topology: it offers an interconnection to each and every node within the
mesh, providing network survivability, redundancy, and exceptional
reliability.
The added flexibility of today's optical network infrastructures provides
tremendous opportunities in terms of services and applications. Consider
the network shown in Figure 7-11, which can provide essentially every type
of SONET topology within a customer's network infrastructure, enabling a
wide array of applications and services to meet customers' needs.

[Figure 7-11 shows an example network that combines a BLSR, a UPSR, a hub
pattern, a uniform mesh pattern, and several linear 1+1 links.]

Figure 7-11: Example network

Synchronization

Synchronous vs. asynchronous


Traditionally, transmission systems have been asynchronous, with each
terminal in the network running on its own clock. In digital transmission,
“clocking” is one of the most important considerations. Clocking means
using a series of repetitive pulses to keep the bit rate of data constant and to
indicate where the ones and zeroes are located in a data stream.
Asynchronous multiplexing uses multiple stages. Signals such as
asynchronous DS-1s are multiplexed; extra bits are added (bit stuffing) to
account for the variations of each individual stream and are combined with
other bits (framing bits) to form a DS-2 stream. Bit stuffing is used again to
multiplex up to DS-3. The DS-1s are neither visible nor accessible within a
DS-3 frame. DS-3s are multiplexed up to higher rates in the same manner.
At the higher asynchronous rate, they cannot be accessed without
demultiplexing.
In a synchronous system, such as SONET, the average frequency of all
clocks in the system will be the same (synchronous) or nearly the same
(plesiochronous). Every clock can be traced back to a highly stable
reference supply. Thus, the STS-1 rate remains at a nominal 51.84 Mbps,
allowing many synchronous STS-1 signals to be stacked together when
multiplexed without any bit stuffing.
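
The cost of bit stuffing shows up directly in the rates. The Python sketch
below compares each asynchronous stage with the sum of its tributaries and
contrasts this with synchronous STS multiplexing. The DS1 and DS3 rates
come from this chapter. The DS2 rate of 6.312 Mb/s and the grouping of four
DS1s per DS2 and seven DS2s per DS3 are standard North American values that
the chapter does not state, so treat them as assumptions. Rates are held as
integers in kb/s to keep the arithmetic exact.

```python
# Rates in kb/s. DS2 and the 4:1 / 7:1 groupings are assumed standard values.
DS1_KBPS = 1_544
DS2_KBPS = 6_312
DS3_KBPS = 44_736
STS1_KBPS = 51_840


def extra_bits_kbps(aggregate_kbps: int, tributary_kbps: int, count: int) -> int:
    """Rate added by multiplexing (stuffing plus framing bits) in kb/s."""
    return aggregate_kbps - count * tributary_kbps


if __name__ == "__main__":
    # Asynchronous stages: each adds stuffing and framing bits.
    print("DS1 -> DS2 adds", extra_bits_kbps(DS2_KBPS, DS1_KBPS, 4), "kb/s")
    print("DS2 -> DS3 adds", extra_bits_kbps(DS3_KBPS, DS2_KBPS, 7), "kb/s")
    print("28 DS1 -> DS3 adds", extra_bits_kbps(DS3_KBPS, DS1_KBPS, 28), "kb/s")
    # Synchronous multiplexing: an STS-3 is exactly three STS-1s.
    print("STS-1 -> STS-3 adds", extra_bits_kbps(3 * STS1_KBPS, STS1_KBPS, 3), "kb/s")
```

The zero in the last line is the point of the comparison: because every
STS-1 runs from a traceable clock, STS-N multiplexing needs no stuffing
bits between stages.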

Understanding synchronization
SONET-based equipment derives many of its basic attributes from
synchronous operation. Synchronization is required in networks that
contain:
• Add/Drop Multiplexers
• Terminals
• Synchronous tributaries
These configurations require synchronization among the network elements
to avoid the effects of repositioning the SONET synchronous transport
signal pointer within the frame. When a network element is synchronized,
all synchronous tributaries and high-speed signals generated by that
network element are synchronized to its timing source. Normally, one
network element in a UPSR is externally timed. To protect the network
timing against complete nodal failure, two network elements in a UPSR
can be externally timed.

Network element timing methods


Generally, each network element is synchronized by one of the following
methods:
• Internal timing
• External timing
• Line timing
Figure 7-12 illustrates the different timing options for a typical network
element.


Figure 7-12: Network Element Timing Mode Examples

Internal timing
A SONET-compliant free-running clock produced within the network
element provides internal timing. Network elements with certain circuit
packs can provide timing signals of Stratum 3 (ST3) quality.

External timing
An external timing signal is obtained from a building-integrated timing
supply (BITS) clock of ST3 or better. ST1 reference quality would be the
preferred level of timing for SONET network elements.

Line timing
Line timing is derived from an incoming SONET frame (OC-3, OC-12,
OC-48, and OC-192), DS1 facility or EC-1 facility.

Transport line timing


Transport line timing is shown in Figure 7-12, example (c). When using
transport line timing, a network element derives timing from a received
transport signal. Possible sources of transport line timing are OC-3, OC-12,
OC-48, and OC-192 facilities.


Tributary line timing


Tributary line timing is shown in Figure 7-12, example (d). When using
tributary line timing, a network element derives timing from a received
tributary signal. Possible sources of tributary line timing are OC-3, OC-12,
OC-48, DS1, and EC-1 facilities.
When the network element timing mode is set to line timing (no distinction
is made between transport and tributary on the user interface), it selects
one of up to two provisioned timing sources (primary and secondary timing
references) as the active timing reference. The network element uses this
signal to synchronize the outgoing transport signals in all directions and
the synchronous tributaries that it terminates. The best-quality signal is
selected on the basis of the stability of the transport signal, the
synchronization message, and any incoming synchronization status
provisioned by the user.

Stratum clocks
Stratum clocks are stable timing reference signals that are graded
according to their accuracy. American National Standards Institute (ANSI)
standards have been developed to define four levels of stratum clocks.
The accuracy requirements of these stratum levels are shown in
Figure 7-13.

Figure 7-13: ANSI Stratum Clock Accuracy Requirements
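
To give a feel for what these accuracy grades mean in practice, the Python
sketch below estimates how long two free-running clocks take to drift apart
by one 125 µs frame. The accuracy values used are the figures commonly
quoted for the ANSI stratum levels (1 x 10^-11, 1.6 x 10^-8, 4.6 x 10^-6,
and 32 x 10^-6 for Stratum 1 through 4). They are entered here from general
knowledge rather than from this chapter, so verify them against the
standard before relying on them. The function name is hypothetical.

```python
# Commonly quoted free-run accuracies per stratum level (assumed values).
STRATUM_ACCURACY = {1: 1e-11, 2: 1.6e-8, 3: 4.6e-6, 4: 32e-6}
FRAME_SECONDS = 125e-6  # duration of one SONET frame


def seconds_per_frame_slip(stratum_a: int, stratum_b: int) -> float:
    """Worst-case time for two free-running clocks to drift one frame apart.

    Assumes the clocks sit at opposite ends of their accuracy ranges, so the
    relative frequency offset is the sum of the two accuracies.
    """
    worst_offset = STRATUM_ACCURACY[stratum_a] + STRATUM_ACCURACY[stratum_b]
    return FRAME_SECONDS / worst_offset


if __name__ == "__main__":
    days = seconds_per_frame_slip(1, 1) / 86_400
    print(f"Two Stratum 1 clocks: one frame of drift in about {days:.1f} days")
    secs = seconds_per_frame_slip(3, 3)
    print(f"Two Stratum 3 clocks: one frame of drift in about {secs:.1f} s")
```

The contrast between days and seconds suggests why a Stratum 1 traceable
reference is preferred and why free-running on a Stratum 3 clock is treated
as a fallback.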

Timing loops
A timing loop is created when a clock is synchronizing itself, either
directly or through intermediate equipment. A timing loop causes excessive
jitter and can result in traffic loss. Timing loops can be caused by a
hierarchy violation, or by having clocks of the same stratum level
synchronize each other. In a digital network, timing loops can be caused
during the failure of a primary reference source, if the secondary reference
source is configured to receive timing from a derived transport signal
within the network.
A timing loop can also be caused by an incorrectly provisioned
Synchronization Status Message (SSM) for some of the facilities in a linear
or ring system. Under normal conditions, if there is a problem in the system
(for example, a pulled fiber), the SSM functionality will heal the timing in
the system. However, if the SSM is incorrectly provisioned, the system
might not be able to heal itself and might isolate part of itself in a
timing loop.
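
A timing loop is a cycle in the "takes timing from" relationship between
clocks, so a provisioning plan can be checked for loops with a simple graph
walk. The Python sketch below is illustrative only: the dictionary layout,
the node names, and the function find_timing_loop are our own assumptions
and do not correspond to any vendor tool.

```python
def find_timing_loop(timing_source: dict, start: str):
    """Follow timing references from start; return the loop if one exists.

    timing_source maps each network element to the element it derives
    timing from, or to None when it is timed externally (for example,
    from a BITS clock).
    """
    visited = []
    node = start
    while node is not None:
        if node in visited:
            # The chain has come back on itself: a timing loop.
            return visited[visited.index(node):]
        visited.append(node)
        node = timing_source.get(node)
    return None  # the chain ends at an external reference


if __name__ == "__main__":
    # Normal operation: A is externally timed, B and C are line timed.
    healthy = {"A": None, "B": "A", "C": "B"}
    print(find_timing_loop(healthy, "C"))

    # A loses its primary reference and falls back to a signal from C,
    # which is itself derived from A through B.
    after_failure = {"A": "C", "B": "A", "C": "B"}
    print(find_timing_loop(after_failure, "A"))
```

The second case mirrors the failure scenario described above, in which a
secondary reference is configured to take timing from a transport signal
derived within the same network.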

Synchronization-status messages
Synchronization-Status Messages (SSM) indicate the quality of the timing
signals currently available to a network element. The timing sources that
can be provisioned in a network element include external timing from a
BITS clock, timing derived from SONET interfaces, and the internal clock
of the network element. A network element can select the better of the two
timing signals provided by the primary and secondary timing sources
provisioned by the user. The selection is based on the quality values carried
in the SSMs.
Figure 7-14 provides an example of a network showing the synchronization
flow, head-end network element, synchronization boundary, and
synchronization status messaging.

Figure 7-14: SSM Signal Flow Example
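
Selecting the better of the primary and secondary references from their
SSM quality values can be sketched as a simple ranking. In the Python
example below, the quality labels and their order (PRS best, DUS meaning
"do not use") follow the ordering commonly cited for SONET SSMs, but the
table is abbreviated and should be read as an assumption. The fallback to
the internal clock and the function name select_reference are likewise
illustrative.

```python
# Abbreviated, assumed SSM quality ranking: lower number means better.
SSM_RANK = {"PRS": 1, "STU": 2, "ST2": 3, "ST3": 4, "SMC": 5, "DUS": 6}


def select_reference(primary_ssm, secondary_ssm) -> str:
    """Pick the active timing reference from two provisioned sources.

    Each argument is the SSM quality received on that source, or None if
    the source has failed. Returns "primary", "secondary", or "internal".
    """
    candidates = [
        (name, quality)
        for name, quality in (("primary", primary_ssm), ("secondary", secondary_ssm))
        if quality is not None and quality != "DUS"
    ]
    if not candidates:
        return "internal"  # no usable reference: fall back to the NE's clock
    # min() keeps the first of equally ranked candidates, so ties go to primary.
    return min(candidates, key=lambda item: SSM_RANK[item[1]])[0]


if __name__ == "__main__":
    print(select_reference("ST3", "PRS"))   # secondary is better
    print(select_reference("PRS", "PRS"))   # tie: stay on primary
    print(select_reference("DUS", None))    # nothing usable
```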


What you should have learned


The reader should understand how SONET has evolved into its role of
supporting today's next-generation SONET applications.
The basic building block of SONET is the STS-1 signal, which runs at
51.84 Mb/s. The SONET frame format is segmented into four layers:
the photonic, section, line, and path layers.
The Synchronous Payload Envelope (SPE) contains the actual STS and/or
Virtual Tributary (VT) customer traffic payload.
SONET network element types include the terminal, the add/drop
multiplexer (ADM), the repeater, the broadband digital cross-connect, and
the wideband digital cross-connect. From these network element types,
numerous network configurations can be created.
The traditional network configurations are linear, BLSR, and UPSR. BLSR
is specified in Bellcore GR-1230-CORE and UPSR in Bellcore GR-1400-CORE.
The two main differences between BLSR and UPSR are the protection
switching process and the total network capacity.
Synchronization is needed for terminals, ADMs, and synchronous
tributaries. The main network element timing schemes are internal timing,
external timing, line timing, and tributary timing. Stratum clocks are
accurate synchronization reference signals, and the SSM indicates the
current quality level of the timing source for a network element. A timing
loop is created when a clock is synchronizing itself, either directly or
through intermediate equipment.


Section III:
Protocols for Real-Time Applications
Protocols are basic building blocks of packet networks. Section III
introduces protocols essential to real-time operation: protocols for media
transport and transport control, protocols for call and session setup, and
protocols and mechanisms that help real-time and non-real-time services
successfully share network resources. The section begins with a discussion
of the Real-Time Transport Protocol (RTP) and its associated protocols, the
Real-Time Control Protocol (RTCP) and the Real-Time Streaming Protocol
(RTSP). The characteristics and use of these protocols are described, and
similarities and differences with TCP and HTTP are highlighted. Chapter 9
deals primarily with SIP and H.323. These are the session control
protocols that establish sessions, or call setup messaging, required in
real-time communications. The different setup protocols provide the same
function, but accomplish it in different ways. SIP works between endpoints,
while H.323 places the call setup intelligence in a centrally located
device.
The final chapter in this section covers the strategies and mechanisms
available in IP networks to provide Quality of Service (QoS). The chapter
explains what the different components, such as shapers and schedulers,
contribute to differentiating flows and prioritizing forwarding. The various
components are intended to ensure that real-time traffic is transported
across the network within performance limits, thereby maintaining the
expected Quality of Experience (QoE). Later sections will address the
practical implementation of these techniques.


Chapter 8
Real-Time Protocols: RTP, RTCP,
RTSP
Hung-Ming (Fred) Chen

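[Figure 8-1 is a layered transport path diagram, drawn as the view from
the real-time application perspective down to the network. Media-related
blocks: audio, video, and voice codecs over RTP, with RTCP and RTSP as
application control. Session-related blocks: session control (H.323, SIP)
and gateway control (H.248 / MGCP / NCS). Below these: UDP / TCP / SCTP;
IP, with QoS and resiliency alongside; MPLS, ATM (AAL1/2, AAL5), FR,
Ethernet, and packet cable DOCSIS; then xDSL, cellular, fiber, copper,
WLAN, and HFC; and SONET / TDM at the bottom.]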
Figure 8-1: Transport path diagram

Concepts covered
• The purpose and operation of RTP and RTCP
• RTP relays and how they operate: RTP mixer, RTP translator
• Comparison of RTP and TCP
• The purpose and operation of RTSP
• Real-time aspects of streaming applications
• How RTP/RTCP and RTSP combine to form a complete package for
  management of media streaming
• Comparison of RTSP and HTTP


Introduction
Today, real-time and near–real-time applications are becoming very
common on IP networks and are being used for many different purposes
spanning both conversational (two or more parties) personal communication
and streaming (typically one-way) applications. VoIP and IP Telephony are
becoming popular. Radio stations and TV channels now offer streaming of
live and archived programming over the Internet. Corporations use
streamed messages to promote new products and to provide education and
product documentation to customers. There is great promise for new
multimedia real-time services from converged networks. A set of protocols
has been developed to address the requirements of transporting the content
of these real-time/near-real-time services and to offer basic control for
streaming services. These are the Real-Time Transport Protocol (RTP), the
Real-Time Control Protocol (RTCP), and the Real-Time Streaming
Protocol (RTSP).
Successful Real-Time streaming applications must coordinate several
protocols: HTTP, RTP, RTCP, and RTSP. HTTP is used to retrieve the
presentation description. RTSP uses the description to set up and tear down
the sessions. RTP transports the content to the end device, and RTCP is
used to report transmission statistics back to the server so that the sender
can adapt to network conditions.
In this chapter, we discuss features and operations of these protocols, and
the relationships between them. In addition, we compare RTP with TCP,
and RTSP with Hypertext Transfer Protocol (HTTP) to identify operational
similarities and differences.

Real-Time Transport Protocol (RTP)


The Real-Time Transport Protocol (RTP) is a lightweight protocol with
properties intended to assist delivery of real-time application data. Features
include timing information, loss detection, security, and content
identification. Although it is intended to operate over UDP/IP, the
developers have tried to make RTP independent of the underlying
network and transport protocols, such as Connectionless Network Protocol
(CLNP), Internetwork Packet Exchange (IPX), UDP, TCP, IP, Frame Relay, or
ATM. In particular, RTP is designed to meet the requirements of
multiparticipant multimedia conferences. In cases where packets are sent to
a large distribution list, a companion protocol, the Real-Time Control
Protocol (RTCP), adds some features that improve operation. A primary RTCP
function is to provide feedback that the application or the network service
provider can use to monitor and respond to transport difficulties.
RTP can, however, be used without RTCP if preferred.
RTCP is discussed in more detail later in this chapter.


Other applications can use RTP as well, for such functions as
storage of continuous data, interactive distributed simulation, active badge
tracking systems, and control and measurement.
RTP was defined by the IETF in RFC 1889 and revised in RFC 3550. The
International Telecommunication Union (ITU) has adopted RTP as one of
its standards for multimedia data streaming (H.225.0¹). Many standard
protocols use RTP to transport real-time content, including H.323 and SIP
for IP telephony, point-to-point, and video conferencing applications,
RTSP for streaming applications, and Session Announcement Protocol/
Session Description Protocol (SAP/SDP) for pure multicast applications.
Its data format provides important information for operation and control of
real-time applications.

Features
RTP was designed with several goals in mind. First, it is intended to be a
lightweight and efficient protocol that supports Application Level Framing
(ALF) and integrated layer processing. Second, one flexible mechanism
was provided rather than several dedicated algorithms. Third, RTP is
protocol-neutral, allowing it to integrate with various lower-layer
protocols, such as UDP/IP, IPX, and ATM-AALx. Fourth, the variable-length
Contributing Source Identifier (CSRC) list provides a scalability
mechanism for communicating with a large number of sources. Fifth,
partitioning of the control and transport functions into separate protocols
simplifies the operation. Finally, RTP provides secure transport
through support of encryption and authentication protocols.
The RTP packet header provides information on packet sequence
numbering, type of payload, timestamping, and delivery monitoring. The
sequence number can be used to identify missing packets and to reorder
out-of-sequence packets. The payload type is indicated using a profile that
identifies all the types of data that may be used by the application during
the session. The timestamp corresponds to the sampling instant of the
first data sample in the packet. Timestamping permits inter- and intra-media
synchronization, such as time-alignment of audio and video signals
in a film (lip sync), since the audio and video components may be
transported as separate RTP streams. RTCP handles the delivery
monitoring function, sending reports that inform the application about the
status of the network, such as lost packets, interarrival jitter, and
other statistics. The feedback allows coding and transmission settings to be
adjusted to optimize the quality of the application or service.
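The header fields described above can be illustrated with a minimal sketch
that decodes the twelve-byte fixed RTP header. The field layout follows
RFC 3550; the names RtpHeader and parse_rtp_header and the sample packet
values are hypothetical and chosen only for illustration.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (layout per RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("the fixed RTP header is 12 bytes long")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Hypothetical packet: version 2, payload type 4, sequence number 17,
# timestamp 640, SSRC 7.
sample = struct.pack("!BBHII", 0x80, 4, 17, 640, 7)
header = parse_rtp_header(sample)
```

Any CSRC entries and the payload itself follow the fixed header and are
not decoded in this sketch.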

1. H.225.0 is a component of the H.323 standards family.


Sequence numbering of packets


The sequence numbers provided by RTP support play-out functions in the
application. RTP does not contain any functions of its own to address
packet play-out or sequencing problems. The sequence numbers can be
used to identify gaps in the data or packets arriving at the buffer out of
sequence. This arrangement gives applications great flexibility to handle
packet aberrations, and keeps RTP a very thin protocol.
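Since RTP leaves gap handling to the application, a receiver needs its own
comparison logic. The following is a minimal sketch, assuming the 16-bit
sequence number of RFC 3550, which wraps from 65535 back to 0. The function
names are hypothetical.

```python
def sequence_gap(previous: int, current: int) -> int:
    """Signed distance from previous to current in 16-bit sequence space."""
    delta = (current - previous) & 0xFFFF
    # Distances of half the number space or more are treated as backward steps.
    return delta - 0x10000 if delta >= 0x8000 else delta


def classify(previous: int, current: int) -> str:
    """Describe how the current packet relates to the previous one."""
    gap = sequence_gap(previous, current)
    if gap == 1:
        return "in order"
    if gap > 1:
        return f"{gap - 1} packet(s) missing"
    if gap == 0:
        return "duplicate"
    return "late or out of order"
```

What the application does with each outcome (conceal, wait, or discard) is
a play-out policy decision and is outside RTP.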

Payload type identification


The payload type field provides information for applications to identify
what type of coding is being used, the sampling rate, the number of
channels, and other information needed for playback. A source sends only
one payload type at a time; RFC 3550 allows the source to change the payload
type during a session, but the field is not meant for multiplexing separate
media streams. IETF standards called Profiles define
generic fields within the RTP header, including payload type values
assigned to particular coding schemes. Several RTP profiles have been
defined, such as RFC 3551 (the RTP A/V Profile for audio and video
conferences with minimal control) and RFC 3711 (the Secure RTP A/V Profile).
The RTP A/V Profile defines audio and video applications. There are two
main categories of payload type specified in RTP A/V profile: static and
dynamic. Static payload types are those that are most commonly used, and
the payload type values are the same for everyone. Codecs with values
already assigned include G.723.1 (PT=4), JPEG (PT=26), and H.263
(PT=34). Dynamic payload types map a payload type value to a specific audio
or video encoding only for the duration of a session. The available values
for dynamic payload types range from 96 to 127. The mapping for a dynamic
payload type is conveyed outside RTP, through a session description (such as
an announcement or invitation) or another signaling protocol.
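The distinction between static and dynamic payload types can be sketched as
a simple lookup. The static values are the three quoted in the text; the
function name, the session map, and the encoding name "example-codec" are
hypothetical.

```python
from typing import Dict, Optional

# Static assignments quoted in the text (RTP A/V profile).
STATIC_PAYLOAD_TYPES = {4: "G.723.1", 26: "JPEG", 34: "H.263"}
DYNAMIC_RANGE = range(96, 128)  # 96 to 127 inclusive


def resolve_payload_type(pt: int, session_map: Optional[Dict[int, str]] = None) -> str:
    """Return the encoding name for a payload type value."""
    if pt in STATIC_PAYLOAD_TYPES:
        return STATIC_PAYLOAD_TYPES[pt]
    if pt in DYNAMIC_RANGE:
        # Dynamic values mean something only within one session; the binding
        # is assumed to come from the session description.
        if session_map and pt in session_map:
            return session_map[pt]
        return "dynamic (no binding for this session)"
    return "not covered by this sketch"


# Hypothetical binding negotiated for one session.
session_bindings = {96: "example-codec"}
```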
Typically, a profile is used for an application within a particular RTP
session. A multimedia application uses more than one session to deliver
content. For instance, a multimedia application transports audio and video
streams in concurrent sessions. Synchronization of the different sessions at
the output requires the global time information (the NTP timestamp in the
RTCP sender report) provided in RTCP.
The payload format specifications define how payload data is carried in RTP.
For instance, the payload format for MPEG1/MPEG2 video is specified in RFC
2250. Each static payload type defined in RTP/AVP has its own
payload format. Payload types not included in the RTP A/V profile can be
registered as MIME subtypes.

Timestamps of packets
The timestamp captures the sampling time of the first octet (sample) of the
packet. The timestamp increases monotonically according to the clock of
the source and the size of the payload. Receivers use the information to
check for gaps in the data and for out-of-order packets. Timestamps can be
used to synchronize flows from different sources, such as different end
systems or different sessions (audio and video) within a multimedia
application. However, RTP itself does not provide the mechanisms to do
this; the applications must contain the appropriate functionality to use this
information (and global time information within RTCP messages) to
synchronize streams.
For audio, the packet size (the number of samples in the packet, based on the
interval of packetization and the sampling rate) determines the increment
of the timestamps. For instance, an audio stream using a sampling rate of
32 kHz and a 20 ms packet will have a timestamp increment of 640 between two
consecutive packets. If silence suppression is used and no packet is sent,
the timestamp increments nevertheless, so the timestamp on the next
speech packet that is sent will include any silent interval.
For video, timestamps vary for different conditions. In general, timestamps
increase according to the nominal frame rate. For instance, with the 90 kHz
clock commonly used for video, timestamps increase by 3,000 for each frame
of a 30-frame/sec video and by 3,600 for each frame of a 25-frame/sec
video. When a video frame is segmented across
several RTP packets, all the packets are marked with the same timestamp.
Where an atypical system is used, such as a special codec, or the
application cannot determine the frame number, the timestamps might need
to be computed from the system clock.
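The timestamp arithmetic in the audio and video examples can be checked with
a short sketch. The 90 kHz video clock is an assumption taken from common
RTP practice (it is the rate that yields the 3,000 and 3,600 figures), and
the function names are hypothetical.

```python
def audio_timestamp_increment(sampling_rate_hz: int, packet_ms: int) -> int:
    """Samples per packet, which is also the timestamp step between packets."""
    return sampling_rate_hz * packet_ms // 1000


def video_timestamp_increment(frames_per_second: int, clock_hz: int = 90_000) -> int:
    """Timestamp step per frame for a given nominal frame rate."""
    return clock_hz // frames_per_second


def timestamp_after_silence(last_timestamp: int, increment: int,
                            suppressed_packets: int) -> int:
    """Timestamp of the next packet sent after some packets were suppressed.

    The timestamp keeps advancing during silence, so the step covers the
    suppressed intervals plus one normal increment (32-bit wraparound).
    """
    return (last_timestamp + increment * (suppressed_packets + 1)) % 2**32
```

For example, audio_timestamp_increment(32000, 20) gives 640, matching the
32 kHz, 20 ms case in the text.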
Timestamps and sequence numbers allow the application to play out the
audio and video packets with the correct timing, even where silence
suppression is used (timestamp), and to detect and compensate for missing
packets (sequence number). When RTP packets have the same timestamp
(that is, a video frame segments into several RTP packets), the sequence
numbers are used to determine the appropriate order for decoding and play-
out.

RTP relay
RTP Relay agents are frequently used to translate payload formats for
flows where the two end systems cannot exchange packets directly. There
are two classes of RTP Relay agents: RTP translator and RTP mixer.

RTP mixer
Several RTP flows can be merged into a single flow with the help of RTP
mixers. For example, where the original sessions require more bandwidth
than the network can provide, two audio streams can be combined into a
single, more efficient flow. Synchronization is derived from the
contributing flows according to the content. The RTP mixer also assigns a
new source identification for the combined stream. An RTP mixer can
therefore greatly reduce bandwidth consumption, which is especially helpful
with low-speed dial-up access networks. Figure 8-2 shows the mixer
combining two individual sessions (SSRC = 7 and SSRC = 36) into one
combined session with SSRC = 42.
Figure 8-2: Example of mixer (streams with SSRC=7 and SSRC=36 from End
Systems X and Y reach the RTP mixer over high-speed links; the mixer sends
one stream with SSRC=42 and CSRC list = {7, 36} to End System Z over
low-speed links)
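As a rough sketch of the mixer example, the header of the combined stream
carries the mixer's own SSRC and lists the original sources in the CSRC
list. The layout follows RFC 3550; the function name, the default payload
type, and the sequence and timestamp values are hypothetical.

```python
import struct
from typing import List


def build_mixed_header(seq: int, timestamp: int, mixer_ssrc: int,
                       contributing: List[int], payload_type: int = 0) -> bytes:
    """Build an RTP header for a mixed stream, including its CSRC list."""
    if len(contributing) > 15:
        raise ValueError("the CSRC count field is 4 bits, so at most 15 entries")
    first_byte = (2 << 6) | len(contributing)  # version 2, CSRC count
    fixed = struct.pack("!BBHII", first_byte, payload_type & 0x7F,
                        seq & 0xFFFF, timestamp & 0xFFFFFFFF, mixer_ssrc)
    # Each contributing source identifier is a 32-bit value after the
    # fixed header.
    csrc_list = b"".join(struct.pack("!I", ssrc) for ssrc in contributing)
    return fixed + csrc_list


# Values from the mixer example: SSRC 42 with contributing sources 7 and 36.
mixed = build_mixed_header(seq=1, timestamp=160, mixer_ssrc=42,
                           contributing=[7, 36])
```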

RTP translator
An RTP translator performs a similar function, but maintains the individual
RTP flows instead of combining RTP sessions into a single one. The source
identifiers are preserved so the flows can be separated again downstream. The
RTP translator can be used to convert media encodings, duplicate multicast
streams into unicast streams, and filter RTP flows at the application level,
as in firewall services. Figure 8-3 shows two translators placed
at the ends of a tunnel for secured media distribution. In this diagram, the
two end systems use separate sessions (SSRC=7 and SSRC=36) end-to-end,
including the encrypted portion of the channel.

Figure 8-3: Example of translator (streams with SSRC=7 and SSRC=36 pass
between End Systems W and X on one side and End Systems Y and Z on the
other through two RTP translators, with the SSRC values unchanged)

RTP Limitations
RTP is often criticized for its heavy overhead. For each packet, the
combination of the IP, UDP, and RTP control information adds up to forty
bytes (twenty bytes IP, eight bytes UDP, and twelve bytes RTP). The
problem of overhead is especially important on low-speed links, such as
dial-up access. For instance, for a 64 kb/s access link, the serialization
time of the forty-byte overhead (320 bits) is 5 ms per packet. Some
solutions have been proposed to relieve cumbersome overhead, such as use of
header compression or definition of a light version of RTP.
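The overhead arithmetic can be worked through in a few lines: forty bytes
is 320 bits, which takes 5 ms to serialize at 64 kb/s. The function names
and the twenty-byte payload used below are hypothetical illustration
values.

```python
IP_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12
OVERHEAD_BYTES = IP_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES  # 40


def serialization_delay_ms(num_bytes: int, link_bps: int) -> float:
    """Time needed to clock num_bytes onto a link of link_bps bits/second."""
    return num_bytes * 8 * 1000 / link_bps


def header_share(payload_bytes: int) -> float:
    """Fraction of each packet consumed by IP, UDP, and RTP headers."""
    return OVERHEAD_BYTES / (OVERHEAD_BYTES + payload_bytes)


overhead_delay = serialization_delay_ms(OVERHEAD_BYTES, 64_000)
share_for_small_payload = header_share(20)  # hypothetical 20-byte payload
```

With a twenty-byte payload, two thirds of every packet is header, which is
why header compression matters most for low-bit-rate codecs on slow links.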
To maintain a thin protocol, the design of RTP does not provide any
mechanism to ensure quality of service or timely delivery of packets. It
relies on lower-layer mechanisms to guarantee real-time performance.
Similarly, RTP itself takes no action on missing or misordered packets and
does not reorder packets arriving out of order. It supplies the sequence
numbers and relies on applications to handle these problems.
Firewalls present a difficulty for RTP. The purpose of a firewall is to
prevent unauthorized access. RTP uses UDP to transport its packets, and most
deployed firewalls do not allow UDP traffic to pass through. RTP proxies
have been introduced to make RTP compatible with firewalls.

Real-Time Control Protocol


RTCP is the control protocol defined to work in conjunction with RTP. It
provides the measurement and statistics feedback information for an
originator to adjust the delivery of media flows. It was originally
standardized in RFCs 1889 and 1890; these were updated in RFC 3550 and
3551, respectively.
The main features of RTCP include:
Provide quality feedback to senders and receivers, such as the
Network Time Protocol (NTP) timestamp, cumulative number of
packets lost, and interarrival jitter;
Identify RTP participants with canonical names (CNAMEs, also
called persistent transport-level identifiers); for example, a
multimedia application requires the CNAME to associate multiple
session data streams from a participant;
Determine feedback intervals for scalability; in particular, for large
group conference sessions, where each participant sends an RTCP
report to all the others;
Provide basic session control information for participants; for
instance, in sessions allowing participants to join and leave the
session without extensive authentication control and/or parameter
negotiation (optional).

QoS monitoring and congestion control


Periodically, the RTP session endpoints report statistics via RTCP. This
function provides useful feedback to applications regarding the timeliness
and completeness of packet delivery. The reports include statistics on the
number of packets sent, the number of packets lost in the network, and
packet jitter. The recipient of these quality reports uses the information to
adjust the system or to diagnose problems. For instance, a sender can use
the statistics to modify its transmission settings. Receivers can determine
whether problems are local or remote. Network and service providers can
use the RTCP information to monitor network performance, and in
particular, performance with multicast connections.
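One of the reported statistics, interarrival jitter, is a running estimate
rather than a single measurement. The sketch below follows the estimator
given in RFC 3550, with arrival times and RTP timestamps expressed in the
same timestamp units. The function name and the sample values are
hypothetical.

```python
def update_jitter(jitter: float, prev_arrival: int, prev_timestamp: int,
                  arrival: int, timestamp: int) -> float:
    """Update the interarrival jitter estimate with one more packet."""
    # Difference between the spacing at the receiver and at the sender.
    transit_delta = (arrival - prev_arrival) - (timestamp - prev_timestamp)
    # Smooth with a gain of 1/16 so single outliers do not dominate.
    return jitter + (abs(transit_delta) - jitter) / 16.0


# Packets sent 160 ticks apart; the second arrives 176 ticks after the first.
estimate = update_jitter(0.0, prev_arrival=0, prev_timestamp=0,
                         arrival=176, timestamp=160)
```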

Translating source identifiers to real world names


RTCP provides a persistent transport-level canonical name for a source.
The typical format of a canonical name looks like
“smallgroup.biggroup.com.” This identifier will not change even when the
synchronization source identifier (SSRC) changes during a transaction. It
also provides binding across multiple media tools from a single user. The
receiver can use the identifier to synchronize different sessions, such as
audio and video flows.

Ensuring scalability
For a conferencing or multicast session, there is an inevitable tradeoff
between adding more and more participants and preventing overwhelming
network bandwidth requirements. This suggests that RTCP control
messages should be limited to a small fraction, say five percent, of the
overall session traffic. During video conferencing, each endpoint sends
control packets to other endpoints; thus, every endpoint can keep track of
the number of participants. Depending on the number of participants and
proportional traffic allowed for RTCP control messages, each member can
calculate the frequency with which to send RTCP packets. In addition, it is
suggested that at least 25% of the RTCP bandwidth be reserved for
senders, so that new receivers can promptly learn the senders' canonical
names.
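The scaling rule can be sketched as a simplified interval calculation. This
is not the full RFC 3550 algorithm: it omits the separate sender and
receiver shares and the randomization of the interval. The five-second
floor is the minimum interval recommended in RFC 3550; the function name
and the sample figures are hypothetical.

```python
def rtcp_interval_seconds(participants: int, session_bw_bps: float,
                          avg_rtcp_packet_bytes: float,
                          rtcp_fraction: float = 0.05,
                          minimum_s: float = 5.0) -> float:
    """Seconds each participant should wait between its RTCP reports."""
    # Share of the session bandwidth set aside for RTCP, in bytes/second.
    rtcp_bytes_per_second = session_bw_bps * rtcp_fraction / 8
    # All participants share that budget, so the interval grows with the
    # group size.
    interval = participants * avg_rtcp_packet_bytes / rtcp_bytes_per_second
    return max(interval, minimum_s)


small_group = rtcp_interval_seconds(4, 64_000, 100)
large_group = rtcp_interval_seconds(1000, 64_000, 100)
```

With these sample figures, a four-party call stays at the five-second
floor, while a thousand-member session spaces reports about 250 seconds
apart.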

Conveying minimal session control information


RTCP can serve as an optional channel for all participants to exchange
session control information. Normally, this is done by the Session Control
Protocol (SCP) with heavy information exchange between client and
server. RTCP provides a convenient channel with loose control for
exchanging information between participants. For example, participant
information supplied by RTCP can be used by the user interface to
display participants' names in the application GUI. This can be especially
useful where users can join and leave sessions casually, without
authentication.


RTP and TCP


As mentioned above, RTP is a thin protocol. To achieve this, it was
designed without many of the properties normally associated with transport
protocols. Missing functions include delivery acknowledgement, reliable
end-to-end communications (acknowledgement and resend capabilities), and
flow/congestion control. RTP runs on end systems and provides a
demultiplexing capability; however, it lacks
reliability assurance as well as flow and congestion controls. Since RTP is
usually implemented within an application, it can rely on the application or
lower-layer protocols to provide these missing properties.
In contrast, Transmission Control Protocol (TCP) is a full-function
transport protocol. Even where TCP is implemented as a part of an
application, these functions are available. In fact, it is the operation of
these reliability and congestion control features that makes TCP unsuitable
for real-time applications.
Real-time applications cannot wait for far-end acknowledgement of receipt
of packets, and TCP's tendency to throttle back the packet rate when faced
with congestion is incompatible with real-time data delivery requirements.
Real-time applications require continuous and timely delivery of
information. Reliable communication guarantees that all packets arrive by
retransmitting lost packets. The acknowledgement and retransmission increase
delay dramatically; the media output from the receiver would be held up
until a late packet is finally received or the process times out.
TCP uses an adaptive window (the congestion window) to adapt its sending
rate to network congestion. The size of the window determines
how many packets TCP will send before waiting for an acknowledgement.
Initially, the window is small (slow start), and increases exponentially as
acknowledgement (ACK) messages are received. The window size will
converge on a balance between sending rate and received ACK messages,
after which it stabilizes until conditions change (self-clocking). If a
packet loss is detected, the TCP congestion control mechanism decreases the
congestion window by half. In contrast, real-time applications cannot
tolerate a sudden change of bandwidth; their bandwidth must persist until a
lower rate can be properly negotiated.
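The window behavior can be illustrated with a toy simulation, counted in
packets per round trip. It is a deliberately simplified sketch and not any
real TCP implementation: the slow-start threshold, the one-packet linear
growth after that threshold, and the function name are assumptions added
for illustration.

```python
from typing import List, Set


def simulate_cwnd(rounds: int, loss_rounds: Set[int], ssthresh: int = 16) -> List[int]:
    """Return the congestion window at the start of each round trip."""
    cwnd = 1
    history = []
    for round_number in range(rounds):
        history.append(cwnd)
        if round_number in loss_rounds:
            # Loss detected: halve the window.
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: the window doubles every round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Assumed linear growth once the threshold is reached.
            cwnd += 1
    return history


trace = simulate_cwnd(rounds=8, loss_rounds={5})
```

The drop in the trace after the loss round is the kind of abrupt rate
change that a real-time media flow cannot absorb.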
In addition, TCP cannot support multicast. Media streaming applications
are multicast in nature. Neither does TCP carry any timestamp or encoding
information to support synchronization and timely play-out. We can see,
then, that TCP and RTP fill very different roles, and TCP cannot replace
RTP to distribute real-time flows. Table 8-1 lists the comparison of
features between TCP and RTP.


Service                    TCP    RTP
Session protocol
Reliable connection        Yes    No
Flow/Congestion control    Yes    No
Error recovery             Yes    No
Multicast                  No     Yes
Timing synchronization     No     Yes
Table 8-1: Service comparison of TCP and RTP

Real-Time Streaming Protocol (RTSP)


Real-Time Streaming Protocol (RTSP) (RFC 2326) is an application-level
protocol that provides a command channel between the user and the media
server. It offers an extensible framework for controlling delivery
of streaming data, whether from stored content, live broadcast, or
multicast. RTSP does not handle the data delivery (which is left to a
transport protocol, such as RTP), but acts as a network “remote control” for
streaming media servers. It provides functionality, such as play, fast
forward, reverse, pause, and stop, similar to VCR functions.
RTSP also plays an important role in interoperability of streaming media
services. It offers a standard method for clients and servers from multiple
vendors to coordinate delivery of multimedia streaming contents. Many
components are required to establish a media streaming flow, including
players, servers, and encoders. RTSP is only one of the components needed
for end-to-end delivery of media. For example, RTSP can deliver
commands to retrieve RealAudio data formatted for a QuickTime player,
but does not guarantee the data can be successfully played on that player.

Functions
RTSP is more a framework than a protocol. The control
mechanisms include session establishment, session termination, and
authentication. RTSP is designed to carry “VCR-style” commands and
to coordinate with RTP to control and deliver media data. Therefore, RTSP
takes advantage of RTP features, such as selection of different delivery
channels, including UDP, TCP and IP multicasting. RTSP can control
multiple delivery sessions simultaneously.


RTSP supports the following operations:


Retrieval of media from media server: A client can invoke a
media presentation by sending an HTTP request to the media
server. The media server replies with an associated presentation
description. The presentation description will include either a
multicast address and ports or a unicast destination address that the
client may use.
Invitation of a media server to a conference: For distribution
purposes, a media server can be “invited” to join an existing
multicast/broadcast presentation. The invited media server can
either distribute the presentation as a whole or in part, or record the
presentation for later distribution. The media servers participating
in the presentation can take turns controlling the presentation. For
example, if a server is scheduled for maintenance, then another
media server can take over the duty to deliver streams.
Addition of media to an existing presentation: During a live
presentation, the server can tell the client about additional media
becoming available. For example, a poll for audience feedback in
an online event could take advantage of this operation.

Methods and operation

RTSP methods
An RTSP request takes the same form as an HTTP request. However, while HTTP
requests are limited to content download or stop, RTSP can request any of
several actions, referred to as methods, which are specified in the header.
The four main commands for real-time services are:
SETUP: Requests that the server establish a session with the
requesting client. The SETUP request for a URI specifies the
transport mechanism to be used for the streamed media. The
available transport parameters of the client are given in the transport
header. Upon receiving the options, the server responds with the
selected transport parameters.
PLAY: Requests that the server begin transmitting streaming data
over the specified transport channel.
PAUSE: Requests that the server suspend transmission, but keep
the session open and wait for another command.
TEARDOWN: Requests that the server stop sending data and
terminate the session. The resources used by the media stream are
all released.


Other commands and messages are available in RTSP.


OPTIONS: Request to receive a list of available commands
(required).
DESCRIBE: Get the description of a presentation or media object
from server (recommended).
ANNOUNCE: Either send the description of a presentation or
media object to server or update session description from server
(optional).
GET_PARAMETER: Access the parameters of a presentation or
stream (optional).
SET_PARAMETER: Set the value of a parameter in a
presentation or stream (optional).
REDIRECT: Request that the client connect to a different server
(optional).
RECORD: Request that the server record a presentation, such as
video/audio conferencing (optional).
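To make the HTTP-like form of these methods concrete, the following sketch
formats an RTSP request as text. The request line, CSeq header, and
Transport header follow RFC 2326; the function name and the client port
numbers are hypothetical, and the URL is the one shown in Figure 8-4.

```python
from typing import Dict, Optional


def build_rtsp_request(method: str, url: str, cseq: int,
                       headers: Optional[Dict[str, str]] = None) -> str:
    """Format an RTSP/1.0 request with a CSeq header and optional headers."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # As in HTTP, header lines end with CRLF and an empty line ends the
    # header section.
    return "\r\n".join(lines) + "\r\n\r\n"


setup_request = build_rtsp_request(
    "SETUP",
    "rtsp://stream.server.net/foo/presentation.abc",
    cseq=1,
    headers={"Transport": "RTP/AVP;unicast;client_port=5000-5001"},
)
```

A PLAY, PAUSE, or TEARDOWN request has the same shape and would also carry
the Session header returned by the server in its SETUP response.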

Operation
RTSP, the so-called “Internet VCR remote control protocol”, provides a
method of emulating VCR commands. It does not handle the transport of
streaming data between server and client, but relies on a transport protocol
to deliver it. RTSP works with other transport protocols, but is generally
combined with RTP and RTCP. Coordination of RTSP with RTP/RTCP
provides complete functionality for transport and control of streaming
media. Figure 8-4 shows a typical example in which RTSP and RTP are
used to stream stored content from a media server. The relationships
among HTTP, RTSP and RTP are shown as well.


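Figure 8-4: Streaming media retrieving with RTSP (the web browser obtains a
meta file containing rtsp://stream.server.net/foo/presentation.abc from the
web server over HTTP and passes it to the media player; the media player
exchanges the meta file and streaming commands with the content server over
RTSP and receives audio/video content over RTP)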
To play a media presentation, the client (web browser) sends a request
URL with the required configuration parameters to the web server. The web
server responds with a presentation description containing information
about the media server and other required parameters. Meanwhile, the web
browser brings up the media player on the client processor. Upon
receipt of the presentation description, the media player sends the SETUP
request to the media server with the available transport parameters. The
media server chooses transport settings and responds to the request. The
player then sends the PLAY command to request the start of delivery of media
streams. During the play-out period, reports about the streaming data
reception might be sent back periodically to the media server. The PAUSE or
TEARDOWN commands can be sent to the media server either during the play-out
or at the end of the presentation to temporarily interrupt or terminate the
presentation, respectively. A walkthrough of the operation for a multimedia
presentation is illustrated in Figure 8-5. Note that the Web Browser and the
Media Player are both on a client endpoint.
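Figure 8-5: Typical multimedia presentation walkthrough (message sequence:
HTTP GET from the web browser to the web server, answered with the
presentation description; internal communication from the web browser to the
media player; then SETUP, PLAY, RTP audio/video with RTCP, PAUSE, and
TEARDOWN between the media player and the media server)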


RTSP control commands can be sent over either TCP or UDP. The
sequence of control requests affects play out behavior. For example, a
PAUSE request would not take effect before a PLAY command has been
executed. For smooth operation, it is helpful to have confirmation of
control commands. Where UDP is used, acknowledgement/retransmission
mechanisms need to be provided by the application to ensure that control
requests are received by the media server. Therefore, TCP is used most
often for transport layer communications.
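The dependence of play-out behavior on request order can be sketched as a
small state table. The state names Init, Ready, and Playing come from the
RTSP state machine in RFC 2326; the table is a simplification that covers
only the four main methods, and the function names are hypothetical.

```python
from typing import Iterable, Optional

# (current state, method) -> next state; combinations that are missing are
# treated as not valid in that state.
TRANSITIONS = {
    ("Init", "SETUP"): "Ready",
    ("Ready", "PLAY"): "Playing",
    ("Playing", "PAUSE"): "Ready",
    ("Ready", "TEARDOWN"): "Init",
    ("Playing", "TEARDOWN"): "Init",
}


def next_state(state: str, method: str) -> Optional[str]:
    """Return the state after the method, or None if it has no effect here."""
    return TRANSITIONS.get((state, method))


def run_session(methods: Iterable[str]) -> str:
    """Apply a sequence of methods, rejecting any that arrive out of order."""
    state = "Init"
    for method in methods:
        new_state = next_state(state, method)
        if new_state is None:
            raise ValueError(f"{method} is not valid in state {state}")
        state = new_state
    return state
```

A PAUSE sent before any PLAY finds the session in the Ready state, where
the table has no entry for it, which mirrors the example in the text.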

Performance
As discussed in Chapter 2, streaming media to a client is not considered a
real-time process. Because the media path is one-way, the QoE of the
streaming session is not limited by response time, and the delay can be as
high as several seconds without affecting user performance. The addition
of the remote commands of RTSP changes all this. The use of interactive
commands demands a certain level of responsiveness from the system, and
so the session control becomes a real-time application.
Since UDP can transport RTSP control requests, reliable communication
is not guaranteed within RTSP. Where a session has been paused and a
subsequent PLAY command is lost, the media server may appear to be
frozen. For UDP operation, it is often left to the user to realize the loss
of a control request and repeat the command. This can affect the QoE of the
session.
RTSP is not robust to hardware or software failures. For example, consider
an unplanned reboot of a home PC system where a user is running a
streaming session. The session will be lost on the PC side, and the media
player cannot request that the server terminate the stream. The media
server continues to send streaming data to the client, occupying
bandwidth in the downstream direction. Even when the PC reboots, there is
no mechanism specified that allows the media player to recapture or
terminate the session. So far, RTSP does not specify how to recover a lost
state with the session identifier.

RTSP and HTTP


The syntax and operation of RTSP are intentionally similar to HTTP/1.1,
using the same framework to parse the control requests.
The scheme of the RTSP Uniform Resource Locator (URL) determines the
transport used to deliver the control requests. The
scheme “rtsp:” requires a reliable transport protocol, such as TCP, to
deliver the control requests, while the scheme “rtspu:” specifies that an
unreliable protocol is used to establish the channel. When a secured
TCP connection is required, the scheme “rtsps:” is used to deliver
commands with the Transport Layer Security (TLS) protocol. For example, a
valid RTSP URL, “rtspu://foo.media.net:5150”, will transmit a control
request to server “foo.media.net” on port number 5150 using an
unreliable transport protocol.
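The mapping from URL scheme to transport can be sketched with the Python
standard library URL parser. The default port 554 for "rtsp" and "rtspu" is
taken from RFC 2326; the function name and the transport descriptions are
hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

SCHEME_TRANSPORT = {
    "rtsp": "reliable transport (for example, TCP)",
    "rtspu": "unreliable transport (for example, UDP)",
    "rtsps": "reliable transport secured with TLS",
}
# Default port from RFC 2326; no default is assumed here for rtsps.
DEFAULT_PORTS = {"rtsp": 554, "rtspu": 554}


def describe_rtsp_url(url: str) -> Tuple[Optional[str], Optional[int], str]:
    """Return (host, port, transport description) for an RTSP URL."""
    parts = urlsplit(url)
    if parts.scheme not in SCHEME_TRANSPORT:
        raise ValueError(f"not an RTSP scheme: {parts.scheme!r}")
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port, SCHEME_TRANSPORT[parts.scheme]


example = describe_rtsp_url("rtspu://foo.media.net:5150")
```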
There are several reasons that RTSP mirrors HTTP/1.1. First, any new
extensions of HTTP/1.1 apply to RTSP automatically with little or no
adjustment. Second, HTTP or MIME parsers can be adapted to parse RTSP
methods without significant modification. Finally, HTTP mechanisms for
web security, caches, and proxies extend to RTSP.
One significant difference between HTTP/1.1 and RTSP is state
maintenance. The order of RTSP control requests influences the operation
of media presentations. Media servers are designed to serve many sessions
with different presentations, and, therefore, must maintain the “session
state.” This allows servers to match RTSP requests with the relevant
presentations. Web browsing relies on user command and is stateless.
A final difference between HTTP/1.1 and RTSP is the interaction between
client and server. In HTTP, the client issues a request and the server
responds by sending data to the client. However, RTSP allows the server to
send a request to the client. For instance, if RTCP feedback indicates
glitches on the network to the media server, it can issue a request to change
transport parameters to adjust the streaming data. HTTP has no mechanism
to make such adjustments, relying instead on the TCP transport protocol
reliability features.
Fundamentally, RTSP inherits most features from HTTP, with minor
enhancements to meet the requirements of real-time applications. As
mentioned above, RTSP needs to maintain session state; it also offers a
mechanism for servers to initiate requests, and it always uses an absolute
URI (Uniform Resource Identifier). The similarities and differences between
RTSP and HTTP are summarized in Table 8-2.

Property                         HTTP   RTSP
Text based                       Yes    Yes
MIME headers                     Yes    Yes
Request line + headers + body    Yes    Yes
Status code                      Yes    Yes
Security                         Yes    Yes
URL format                       Yes    Yes
Content negotiation              Yes    Yes
State maintenance                No     Yes
Request from server              No     Yes
Always absolute URI              No     Yes


Table 8-2: Comparison of properties of HTTP and RTSP


What you should have learned


• The Real-time Transport Protocol (RTP) is efficient, flexible,
  protocol-neutral, and scalable. Control functions are separated from the
  transport mechanism with RTCP. Security mechanisms are provided with
  help from other protocols.
• RTP supports payload type identification, sequence numbering,
  timestamping, and delivery monitoring. RTP mixers and translators serve
  as relay agents that process RTP packets in the network core.
• RTP has heavy overhead and firewall traversal problems and does not
  guarantee performance or end-user quality.
• The RTP Control Protocol (RTCP) provides feedback statistics to the
  sending end, identifies RTP participants, determines the frequency of
  RTCP feedback reports, and provides basic session control information
  for participants.
• RTP provides multicast and timing synchronization mechanisms. It does
  not have the congestion control, receive acknowledgement, or error
  recovery mechanisms of TCP. The delay associated with the operation of
  these mechanisms is incompatible with real-time application
  performance.
• The Real-Time Streaming Protocol (RTSP) uses a framework similar to
  HTTP, with extensions for delivering real-time data. It is designed to
  deliver “VCR-like” commands and integrates with RTP to deliver media
  content.
• Successful real-time streaming applications must coordinate HTTP, RTP,
  RTCP, and RTSP. HTTP is used to retrieve the presentation description.
  RTSP uses the description to set up and tear down the sessions. RTP
  transports the contents to the end device, and RTCP is used to report
  transmission statistics back to the server.


References
RFC 1889, “RTP: A transport protocol for Real-Time applications,” IETF,
1996 (replaced by RFC 3550).
RFC 3550, “RTP: A transport protocol for Real-Time applications,” IETF,
2003.
RFC 1890, “RTP profile for audio and video conferences with minimal
control,” IETF, 1996 (replaced by RFC 3551).
RFC 3551, “RTP profile for audio and video conferences with minimal
control,” IETF, 2003.
RFC 2326, “Real-Time streaming protocol (RTSP),” IETF, 1998.
RFC 3830, “MIKEY: Multimedia Internet KEYing,” IETF, 2004.
RFC 2250, “RTP payload format for MPEG1/MPEG2 video,” IETF, 1998.
RFC 2586, “The Audio/L16 MIME content type,” IETF, 1999.


Chapter 9
Call Setup Protocols: SIP, H.323, H.248
François Audet

[Figure: the protocol stack viewed from the real-time application
perspective. Media-related protocols (audio, video, and voice codecs over
RTP, with RTCP and RTSP), session control (H.323, SIP), and gateway control
(H.248/MGCP/NCS) run over UDP/TCP/SCTP and IP. Below IP are MPLS, ATM
(AAL1/2, AAL5), FR, Ethernet, and cable DOCSIS, carried on xDSL, cellular,
fiber, copper, WLAN, HFC, and SONET/TDM. QoS and resiliency are shown
alongside the lower layers.]
Figure 9-1: Transport path diagram

Concepts covered
H.323
SIP
H.248/MEGACO, MGCP, NCS/J.162

Introduction
Communications over the global IP network use three different types of
real-time and control protocols:
• Real-time protocols for transporting and controlling media streams
  (RTP/RTCP)
• Peer-to-peer control protocols for establishing “calls” or “sessions”
  (H.323, SIP)
• Master/slave gateway control protocols (H.248/MEGACO, MGCP)
Let us examine what makes communicating over an IP network so special.
People are quite familiar with client/server applications such as browsing
the web using HTTP, transferring files with FTP, accessing databases with
LDAP, accessing information using XML/SOAP, and monitoring devices
through SNMP. In contrast, communication is inherently a peer-to-peer
application. It requires protocols that let individuals communicate freely
with other individuals, without previous knowledge of each other’s
location, capabilities, and status. Furthermore, these protocols need to be
very scalable in order to serve a number of users that could ultimately
equal the world’s population, and a potentially even greater number of
physical devices.
The first type of protocol relates to the transport of real-time media,
which is what early attempts at communicating over an IP network focused
on. Fortunately, a single standard emerged for transporting real-time media
over IP: the Real-time Transport Protocol (RTP) and its associated RTP
Control Protocol (RTCP).
While this in itself was no simple task, it avoided some of the most difficult
aspects of communicating over an IP network. Early voice over IP
communication usually involved the two participants phoning each other
(with their PSTN phones) or e-mailing, in order to exchange their IP
addresses. They would also agree on a common codec.
The second type of protocol is required by modern communication over an
IP network. These peer-to-peer protocols perform the following tasks:
• Locate the other participant(s), and get the IP addresses necessary
  to establish the session (call)
• “Invite to the session” or “call” the participant(s)
• Negotiate the type of session/call (for example, audio, video, text
  messaging, whiteboarding, application sharing)
• Establish and modify the session/call
• Create and remove media streams
• Clear the session/call
Two peer-to-peer protocols stand out for communication in the global IP
network: SIP and H.323.


Both H.323 (an ITU-T protocol) and SIP (an IETF protocol) were created
to address those requirements. There are consequently a lot of similarities
between the protocols. Furthermore, H.323 and SIP rely on other protocols
defined by the IETF and the ITU-T. For example, both protocols use
transport protocols such as RTP/RTCP, UDP, TCP and IP (IETF protocols),
and both protocols use voice codecs such as G.711 and G.729 (ITU-T
Recommendations). There are of course differences between the two, as they
evolved from two very different backgrounds.
The third type of protocol consists of gateway control protocols. They are
master/slave protocols that a gateway controller needs in order to control
slave gateway devices. The main gateway control protocols used today are
MEGACO/H.248, MGCP and NCS/J.162.
The main difference between peer-to-peer and master/slave protocols is in
the way intelligence is distributed between the network edge devices and
network-based servers.
The master/slave approach, exemplified by Megaco/H.248, MGCP, and
NCS/J.162, allows network gateway functions to be distributed or decomposed
into intelligent (master) and nonintelligent (slave) parts. Application
intelligence, such as call control, is contained in the functional control
servers (masters), which also implement the peer-level protocols to interact
with other functional elements in the system and manage all feature
interactions. These control servers then drive a large number of simple
slave devices that are optimized for their specific interface function and
devoid of application complexity. The slave devices are therefore lower in
cost and need not change as new services and features are introduced at the
control servers.
A communication network can be composed of both peer-to-peer and
master/slave elements, along with the real-time transport protocol.
Figure 9-2 illustrates the three types of protocols.


[Figure: two peer entities signal each other using SIP or H.323; a master
entity controls a slave entity using H.248/Megaco, MGCP, etc.; media is
carried over RTP/RTCP.]
Figure 9-2: Three types of protocols

H.323

Architecture
H.323 has a well-defined architecture, with well-defined components,
functions and protocols.
H.323 defines the following physical components:
• Gatekeeper: The gatekeeper provides routing and call control
  services to H.323 endpoints. It cannot generate or receive calls.
• Endpoint: An endpoint can make or receive calls. Endpoints are of
  one of the following three subtypes:
  • Terminal: A terminal can be an IP phone, a PC, a PDA, a set-top
    box providing video-conferencing, a voice-mail system or any other
    device offering H.323 services to the end user.
  • Gateway: Gateways provide an interface to non-H.323 networks,
    such as the GSTN or a SIP network.
  • Multipoint Control Unit (MCU): MCUs support multipoint
    conferences and must contain a Multipoint Controller (MC). They
    can also contain a Multipoint Processor (MP).
Physical components can be collocated. For example, it is very common to
have devices that are a Gatekeeper, a Multipoint Control Unit, and a
Gateway simultaneously. All Endpoints behave the same way at the
protocol level. It is not terribly important whether a particular device is
classified as a terminal, a gateway or an MCU (or a combination of these);
it is important, however, that they are “Endpoints” and not Gatekeepers.
Sometimes, the line can be a little blurry. For example, an IP PBX or TDM
switch could be any of them, depending on your point of view. When
traditional PBXs and TDM switches started to incorporate IP connectivity
through IP “trunks,” they were described as “H.323 Gateways” because they
interfaced between H.323 on one side, and TDM on the other. PBXs and
TDM switches then evolved into full-blown “IP PBXs,” “Soft Switches”
and “Call Servers.” They even became able to control IP phones directly.
Often, these IP phones are slave devices of the Call Server. In H.323
language, the Call Server is the Endpoint. Figure 9-3 shows the H.323
components.

[Figure: H.323 components.
• Gatekeeper: address translation (IP, telephone); admission control;
  cannot generate or terminate calls
• Endpoints (Terminal, Gateway, MCU): can make or receive calls
• Terminals: IP phones, PCs, PDAs, set-top boxes
• Gateway: interworking with other multimedia terminals and the GSTN
• Multipoint Control Unit (MCU): support for multipoint conferences]
Figure 9-3: H.323 components
In addition, H.323 defines the following logical components:
• Multipoint Controller (MC): MCs control multipoint conference
  connections.
• Multipoint Processor (MP): MPs process and mix multiple audio/video
  channels.
MCs and MPs are logical components and not stand-alone entities. They
must reside within a physical component, such as a Terminal, a Gateway, a
Multipoint Control Unit or a Gatekeeper.
H.323 introduces the concept of a “Zone.” A Zone consists of one (and
only one) Gatekeeper, along with all the endpoints that are registered to
that Gatekeeper (see Figure 9-4). A Zone is independent of geography; it
can span multiple LAN segments and can include Endpoints anywhere on
the Internet.


[Figure: a Zone contains exactly one Gatekeeper (GK) together with the
Terminals, MCUs, and Gateways registered to it.]
Figure 9-4: H.323 zone


A gatekeeper is a very important component of an H.323 network. It is the
entity responsible for enabling connectivity between endpoints. A
Gatekeeper performs the following mandatory functions:
• Address translation: This is the main role of the Gatekeeper.
  Address translation is the process by which an endpoint can locate
  another endpoint, using a well-known address. That well-known
  address is typically a “phone number,” although other types of
  addresses are also used (for example, an H.323-ID, which is an
  abstract name, an e-mail address, or an H.323 URI). The role of the
  Gatekeeper is to determine the IP address to be used for
  communication with an endpoint. Without a Gatekeeper, every
  endpoint would need to know the IP address of every other
  endpoint in the network, which is not practical in a real-life
  deployment. The process by which an Endpoint makes itself known
  to its Gatekeeper is known as “Registration”: it allows the
  Gatekeeper to associate an abstract address with the IP address of
  the Endpoint.
• Admission control: A gatekeeper may allow or deny a call from
  one Endpoint to another based on any criteria. In a trivial
  configuration, Admission Control can be a null function (that is, all
  calls are allowed). A gatekeeper may verify security credentials
  before allowing a call, or allow calls only between certain users.
• Bandwidth control: Bandwidth control allows terminals to change
  bandwidth usage mid-call, for example, to change codecs or
  add a video conferencing channel to an existing voice call. In
  practice, this function is very often a null function: the protocol is
  still in place, but the requests are always granted.
• Zone management: Zone Management refers to the process by
  which a Gatekeeper performs the previous tasks for all Terminals,
  MCUs, and Gateways within its control. For example, it allows for
  assigning a numbering plan to the various components within a
  zone.
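The first two mandatory functions can be pictured with a small Python sketch. It is a toy model under stated assumptions, not H.323 code: the class and method names are invented for the example, the registry is a plain dictionary, and the phone numbers and IP address are illustrative values.

```python
class Gatekeeper:
    """Toy gatekeeper: registration, address translation, admission control."""

    def __init__(self, allow=lambda caller, called: True):
        self.registry = {}   # alias (for example a phone number) -> (ip, port)
        self.allow = allow   # admission policy; the default grants every call

    def register(self, alias, ip, port):
        """Registration: associate an abstract address with an IP address."""
        self.registry[alias] = (ip, port)

    def admit(self, caller_alias, called_alias):
        """Return the called endpoint's address, or None if the call is denied."""
        if called_alias not in self.registry:
            return None  # address translation failed
        if not self.allow(caller_alias, called_alias):
            return None  # admission control rejected the call
        return self.registry[called_alias]


gk = Gatekeeper()
gk.register("+14085551212", "10.10.0.2", 1720)
print(gk.admit("+14085550000", "+14085551212"))
```

Passing a different allow function models a gatekeeper that restricts calls to certain users; leaving the default in place models the "null function" case in which all calls are allowed.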

In the H.323 architecture, the Gatekeeper can perform additional functions.
In fact, it can do pretty much whatever it likes, provided it doesn't break
the protocol. As a result, many of these functions (such as call
authorization, alias address modification, maintaining call logs, and
billing) are not formalized in the specification. One key optional function
of a gatekeeper is to handle all call signaling from endpoints itself,
effectively acting as a middleman. This will be described later.
H.323 is really an “umbrella” standard (called a “Recommendation” in
ITU-T jargon). It describes the role of Gatekeepers and Endpoints, using a series
of protocols defined in other Recommendations. An H.323 Endpoint and
its relationship with recommendations and standards under the scope of
H.323 are illustrated in Figure 9-5.

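[Figure: an H.323 endpoint and the scope of H.323. Video I/O equipment
connects to the video codec (H.261, H.263); audio I/O equipment connects to
the audio codec (G.711, G.723.1, G.729, etc.); user data applications use
T.120, etc.; the system control user interface connects to system control,
which comprises system and media control (H.245), call control (H.225.0
(Q.931), H.450.X), and RAS control (H.225.0). All of these pass through the
H.225.0 layer to the transport layer (RTP/RTCP, UDP, TCP, IP; also IPX,
ATM, etc.) and the network interface.]
Figure 9-5: H.323 endpoint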


The H.323 protocol stack is illustrated in Figure 9-6.


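[Figure: H.323 protocol stack. H.225.0 call control (Q.931) and H.245 run
over TCP or UDP/H.323 Annex E; H.225.0 RAS, RTCP, and RTP (audio/video
streams) run over UDP; everything runs over IP and the physical layer.]
Figure 9-6: H.323 protocol stack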


The H.225.0 (RAS), H.225.0 (Q.931) and H.245 protocols are the core
control protocols of H.323. These protocols are encoded using Abstract
Syntax Notation One (ASN.1) with the Packed Encoding Rules (PER).
Audio and video streams are transported over RTP/RTCP, over UDP and IP.

H.225.0 (RAS)
H.225.0 (RAS) is the Registration, Admission and Status protocol.
RAS is used only in environments containing a Gatekeeper, but since most
environments these days have a Gatekeeper, it is almost always required.
RAS is the protocol used between an Endpoint and its Gatekeeper, and
between a Gatekeeper and another Gatekeeper, to perform the tasks that are
not, strictly speaking, part of call establishment.
RAS includes messages that are independent of individual calls.
Gatekeeper Discovery messages are normally sent to a well-known
multicast address, to allow endpoints to automatically discover their
Gatekeeper. This is useful in LAN environments, for plug-and-play
operation. Registration allows an Endpoint to make its presence known to
its Gatekeeper, allowing others to route calls to that particular endpoint.
Other messages are call related. Admission Requests (ARQ) are used prior
to a call, to get permission from the Gatekeeper to make the call; the
Gatekeeper confirms the ARQ by telling the endpoint where to place
the call.
There are other, less-used messages, such as bandwidth request
messages, for environments where the gatekeeper manages bandwidth
utilization in the network, as well as messages for status requests.
Another type of RAS message is the Location Request (LRQ). These
messages are normally used between gatekeepers to locate endpoints across
different gatekeeper zones.
H.225.0 Annex G expands the concept of Gatekeeper-to-Gatekeeper
communication a bit more, by adding extra messages and procedures
across different administrative domains (for example, different public
operators).
RAS messages are carried over UDP. Typically, a well-known UDP port is
used for RAS, but it is possible to configure a different port if required.
UDP Port 1718 is used for multicast Gatekeeper communication, and UDP
Port 1719 is used for unicast Gatekeeper communication. Multicast
Gatekeeper communication can be used for Gatekeeper discovery and
Location Request. However, because of scalability and security issues,
multicast is not widely used in H.323.
Figure 9-7 illustrates an Endpoint registering with its Gatekeeper, then
making an admission request. Its Gatekeeper cannot locate the called
endpoint, so it sends a location request to another Gatekeeper.

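Participants: Endpoint A, Gatekeeper-A, Gatekeeper-B, Endpoint B
Endpoint A → Gatekeeper-A: ARQ (+1 408 555 1212)
Gatekeeper-A → Gatekeeper-B: LRQ (+1 408 555 1212)
Gatekeeper-B → Gatekeeper-A: LCF (address for Q.931 signalling)
Gatekeeper-A → Endpoint A: ACF (address for Q.931 signalling)
Endpoint A → Endpoint B: H.225.0 (Q.931) Call Setup establishment
Endpoint B → Gatekeeper-B: ARQ (+1 408 555 1212)
Gatekeeper-B → Endpoint B: ACF

Figure 9-7: H.323 call walk-through with slow start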
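The admission and location exchange in this walk-through can be mimicked with a short Python sketch. It models only the lookup logic: the class ZoneGatekeeper, its methods, and the addresses are hypothetical, and real RAS messages are ASN.1/PER structures carried over UDP rather than method calls.

```python
class ZoneGatekeeper:
    """Toy gatekeeper that asks neighbor gatekeepers when a lookup fails."""

    def __init__(self, name):
        self.name = name
        self.registry = {}    # alias -> (ip, port) of endpoints in this zone
        self.neighbors = []   # other gatekeepers that can be sent an LRQ

    def locate(self, alias):
        """Handle an LRQ: return an address (LCF) or None (reject)."""
        return self.registry.get(alias)

    def admit(self, alias):
        """Handle an ARQ: return an address (ACF) or None (reject)."""
        address = self.registry.get(alias)
        if address is None:
            # Not in this zone: send an LRQ to each neighbor in turn.
            for neighbor in self.neighbors:
                address = neighbor.locate(alias)
                if address is not None:
                    break
        return address


gk_a = ZoneGatekeeper("Gatekeeper-A")
gk_b = ZoneGatekeeper("Gatekeeper-B")
gk_a.neighbors.append(gk_b)
gk_b.registry["+14085551212"] = ("10.10.0.2", 1720)  # Endpoint B, illustrative

print(gk_a.admit("+14085551212"))
```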

H.225.0 (Q.931)
H.225.0 (Q.931) is used for Call Control. It allows for the establishment of
a call between endpoints.
Once an endpoint has obtained permission from its gatekeeper to make a
call, as per the RAS Admission procedures, it attempts to establish the call
using H.225.0 (Q.931) procedures.
H.225.0 (Q.931) is transported over TCP. The TCP port to be used is
exchanged as part of the RAS procedures. Typically, the well-known TCP
Port 1720 is used, although it is possible to use a different port.
H.225.0 (Q.931), as its name suggests, is based on the ISDN Q.931
protocol. However, H.225.0 (Q.931) is not Q.931. Many messages have
been eliminated (for example, DISCONNECT and RELEASE), and the
procedures are much looser than in proper Q.931, for example with regard
to timers. Furthermore, unlike proper Q.931, H.225.0 (Q.931) does NOT
establish a bearer channel (that is, a media channel); it only establishes a
relationship between the endpoints. This relationship is referred to as a
“call” in H.323. The media is established using the H.245 protocol
described in the next section.
Figure 9-8 illustrates a typical Call Setup and Call tear down.

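Calling endpoint → Called endpoint: SETUP
Called endpoint → Calling endpoint: CALL PROCEEDING
Called endpoint → Calling endpoint: ALERTING
Called endpoint → Calling endpoint: CONNECT
Calling endpoint ↔ Called endpoint: H.245 session
RELEASE COMPLETE (sent by the endpoint that clears the call)

Figure 9-8: Call setup and call tear down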
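The message order in this figure can be expressed as a small state table. The Python sketch below is an illustration only: the state names and the function run_call are invented for the example, and the table assumes that CALL PROCEEDING and ALERTING may be skipped, which the figure itself does not state.

```python
# Allowed H.225.0 (Q.931) messages per call state, and the state they lead to.
TRANSITIONS = {
    "idle": {"SETUP": "initiated"},
    "initiated": {"CALL PROCEEDING": "proceeding",
                  "ALERTING": "alerting",
                  "CONNECT": "connected"},
    "proceeding": {"ALERTING": "alerting", "CONNECT": "connected"},
    "alerting": {"CONNECT": "connected"},
    "connected": {},
}


def run_call(messages):
    """Replay a message sequence and return the final call state."""
    state = "idle"
    for message in messages:
        if message == "RELEASE COMPLETE" and state != "idle":
            state = "idle"  # a single message tears the call down
        elif message in TRANSITIONS[state]:
            state = TRANSITIONS[state][message]
        else:
            raise ValueError(f"{message} is not valid in state {state}")
    return state


print(run_call(["SETUP", "CALL PROCEEDING", "ALERTING", "CONNECT"]))
```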


H.225.0 (Q.931) messages also carry, in the User-User Information
Element (UUIE), a large amount of ASN.1/PER-encoded protocol
information. It is important to note that most of the information used by
H.323 for call control comes from the UUIE, as opposed to the Q.931
Information Elements themselves. In many cases, the Q.931 information is
simply irrelevant (Bearer information, for example), or it may
conflict with the UUIE information, in which case arcane rules are
defined to make sense of it.

Sidebar: Should the phone number be in the UUIE or the Q.931
information elements?
One of the major sources of confusion and interoperability problems in
H.323 is the fact that when an address (called or calling) is a
phone number, there are two places in the same message (for example, a
SETUP message) to put the phone number! Since H.323 was meant to
support arbitrary types of addresses (including phone numbers), a
phone number can be included in the AliasAddress in the UUIE in
SETUP. Since H.225.0 (Q.931) inherits information elements from
Q.931, phone numbers can also be in the Calling party number, Called
party number and Connected number information elements. Since H.323
was written by a committee, instead of choosing one, or the other, or
both, they came up with the following rule:
• Public numbers (that is, E.164 numbers) are to be included in the
  Q.931 information element
• Private and unknown numbers (such as dialed digits) are to be
  included in the UUIE AliasAddress
To add insult to injury, prior to H.323 version 4, dialed digits were called
e164 in the protocol, even when they were not E.164 numbers. They were
renamed dialedDigits in H.323v4.
Needless to say, many products on the market do not conform. It is not
uncommon to see implementations that use the Q.931 information elements
for everything, the UUIE AliasAddress for everything, or even
implementations that populate the numbers in both places.
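The placement rule in this sidebar can be written down as a few lines of Python. The sketch uses a plain dictionary as a stand-in for a SETUP message; the function name and the dictionary keys are hypothetical and do not reproduce the actual ASN.1 structure, and the caller is assumed to already know whether the number is a public E.164 number.

```python
def place_called_number(number, is_public_e164):
    """Return a SETUP skeleton with the called number placed per the rule."""
    setup = {"q931": {}, "uuie": {"aliasAddresses": []}}
    if is_public_e164:
        # Public (E.164) numbers go in the Q.931 information element.
        setup["q931"]["calledPartyNumber"] = number
    else:
        # Private and unknown numbers go in the UUIE AliasAddress.
        setup["uuie"]["aliasAddresses"].append({"dialedDigits": number})
    return setup


print(place_called_number("14085551212", is_public_e164=True))
print(place_called_number("5551212", is_public_e164=False))
```

An implementation that must interoperate with nonconforming products may also need to read the number from either location on receipt, which this sketch does not attempt.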

The H.225.0 (Q.931) signaling channel can be established in one of two
modes, depending on the Gatekeeper's preference, which is
communicated through the RAS protocol:
• Gatekeeper-routed model (Figure 9-9): In this model, the H.225.0
  (Q.931) messages are routed through the Gatekeeper. This allows
  the Gatekeeper to monitor the status of the call, perform accurate
  billing, and modify the call. It also allows for funnelling the
  signaling to a specific location. This model is widely used by
  carriers and service providers.

[Figure: each Endpoint exchanges H.225.0 (RAS) with its Gatekeeper, and the
Gatekeepers exchange H.225.0 (RAS) with each other; H.225.0 (Q.931) call
signaling passes from Endpoint to Endpoint through the Gatekeepers.]
Figure 9-9: Gatekeeper routed model

• Direct-routed model (Figure 9-10): In this model, the H.225.0
  (Q.931) messages are routed directly between the endpoints,
  bypassing the Gatekeeper. This allows for greater scalability, as the
  Gatekeeper does not participate in the calls themselves, but it does
  not allow the Gatekeeper to participate in call control. This model is
  widely used in enterprises when centralized control is not
  required.

[Figure: each Endpoint exchanges H.225.0 (RAS) with its Gatekeeper, and the
Gatekeepers exchange H.225.0 (RAS) with each other; H.225.0 (Q.931) call
signaling passes directly between the Endpoints.]
Figure 9-10: Direct-routed model

H.245
After the H.225.0 (Q.931) call has been set up, the endpoints can establish
the H.245 control channel in order to establish and control media sessions.
An H.245 IP address and port number are exchanged as part of the H.225.0
(Q.931) protocol for carrying H.245 over TCP. Unlike H.225.0 (RAS) and
H.225.0 (Q.931), H.245 uses a dynamic port; there is no default H.245 port.
H.245 starts with a process called Terminal Capability Negotiation
(Figure 9-11), by which the Endpoints exchange their capabilities with
each other, such as voice, video, fax, data collaboration (T.120), and DTMF
transport. The exchange also describes preferences; for example, G.711 may
be preferred but G.729 also allowed. It also negotiates a maximum payload
size for each voice codec: a default value of 20 ms is normally used.

Endpoint A → Endpoint B: TerminalCapabilitySet (G.711, G.729)
Endpoint B → Endpoint A: TerminalCapabilitySetAck
Endpoint B → Endpoint A: TerminalCapabilitySet (G.729, G.711)
Endpoint A → Endpoint B: TerminalCapabilitySetAck

Figure 9-11: Terminal capability negotiation
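Once both capability sets are known, each endpoint can decide which codec to use for the channel it will open. The Python sketch below shows one plausible selection policy and is an assumption, not the H.245 procedure itself: it takes the sender's own preference order and picks the first codec that the remote side listed. The function name is invented for the example.

```python
def choose_forward_codec(local_preferences, remote_capabilities):
    """Pick the first locally preferred codec that the remote side supports."""
    for codec in local_preferences:
        if codec in remote_capabilities:
            return codec
    return None  # no codec in common: no media channel can be opened


# Capability sets taken from Figure 9-11.
print(choose_forward_codec(["G.711", "G.729"], ["G.729", "G.711"]))
print(choose_forward_codec(["G.729", "G.711"], ["G.711", "G.729"]))
```

With these inputs the two endpoints would pick different codecs for their forward channels, which is the kind of asymmetric outcome the master/slave determination described next is meant to resolve.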


The next step is to perform the master/slave determination process
(Figure 9-12). This process will result in the random assignment of a
master and a slave in a two-way call. This is purely a mechanism for
conflict resolution when, due to the asymmetric nature of connection setup,
two endpoints simultaneously try to open incompatible codecs with each
other in both directions. This is particularly useful since asymmetric codec
operation (for example, G.729 in one direction, G.711 in the other) is often
not supported.

Endpoint B → Endpoint A: MasterSlaveDetermination
Endpoint A → Endpoint B: MasterSlaveDetermination
Endpoint A → Endpoint B: MasterSlaveDeterminationAck (Slave)
Endpoint B → Endpoint A: MasterSlaveDeterminationAck (Master)

Figure 9-12: Master/slave determination
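The random assignment can be sketched in Python as a comparison of two random numbers. This is a deliberate simplification: the actual H.245 procedure also takes the terminal type into account and compares the numbers using modular arithmetic, neither of which is modeled here. The function names and the 24-bit range are assumptions made for the example.

```python
import random


def determine_roles(number_a, number_b):
    """Return (role_a, role_b), or None when the numbers tie and must be redrawn."""
    if number_a == number_b:
        return None
    return ("master", "slave") if number_a > number_b else ("slave", "master")


def negotiate(rng=random):
    """Draw random numbers for both endpoints until the result is decisive."""
    while True:
        roles = determine_roles(rng.randrange(2**24), rng.randrange(2**24))
        if roles is not None:
            return roles


print(negotiate())
```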


After the terminal capability set negotiation and the master/slave
determination procedures are completed, the endpoints are free to open
media channels (Figure 9-13). Each endpoint opens a forward media
channel independently. When opening a media channel, each endpoint asks
the other endpoint which IP address and dynamically assigned RTP/UDP port
to send the media to.

Endpoint B → Endpoint A: OpenLogicalChannel (G.711)
Endpoint A → Endpoint B: OpenLogicalChannel (G.711)
Endpoint A → Endpoint B: OpenLogicalChannelAck
                         (G.711, RTP/RTCP=192.168.0.1:5200/5201)
Endpoint B → Endpoint A: OpenLogicalChannelAck
                         (G.711, RTP/RTCP=10.10.0.2:4312/4313)
Endpoint B → Endpoint A: RTP/RTCP Media (G.711)
Endpoint A → Endpoint B: RTP/RTCP Media (G.711)

Figure 9-13: Open media channels
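The key point of this exchange is that the receiver of an OpenLogicalChannel answers with its own receive address. The Python sketch below illustrates that idea using the addresses from Figure 9-13; the Endpoint class and its methods are hypothetical, and the RTCP port is assumed to be the RTP port plus one, as in the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    ip: str
    rtp_port: int
    send_to: dict = field(default_factory=dict)  # codec -> (ip, rtp, rtcp)

    def on_open_logical_channel(self, codec):
        """Answer an OpenLogicalChannel with our own receive address (the Ack)."""
        return (codec, self.ip, self.rtp_port, self.rtp_port + 1)

    def open_forward_channel(self, peer, codec):
        """Open our forward channel and remember where the peer wants media."""
        ack_codec, ip, rtp, rtcp = peer.on_open_logical_channel(codec)
        self.send_to[ack_codec] = (ip, rtp, rtcp)


a = Endpoint("192.168.0.1", 5200)
b = Endpoint("10.10.0.2", 4312)
a.open_forward_channel(b, "G.711")   # A learns B's RTP/RTCP address
b.open_forward_channel(a, "G.711")   # B learns A's RTP/RTCP address
print(a.send_to, b.send_to)
```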


H.245 allows for many mid-call control commands. There are H.245
commands for opening video, T.120 data conferencing or fax channels. It
also has commands for camera control, sending in-band DTMF, resource
reservation, and many others.
One very important building block in H.245 is the “third-party pause and
reroute” procedure. Third-party pause and reroute is a process by which an
endpoint (or gatekeeper) can request another endpoint to stop transmitting
media and close all its open channels. The endpoint or gatekeeper
initiating third-party pause and reroute does so by sending a Terminal
Capability Set with no capabilities listed whatsoever. The receiver of the
“empty capability set” interprets this as the other entity telling it that it
cannot support any media channels at all, and thus closes all its forward
media channels. Note that the entity that sent the empty capability set does
not have to close its forward media channels, but it may. This process is
the “pause.”
Later on, the entity that sent the empty capability set can send a proper
terminal capability set. This “full terminal capability set” contains the
capabilities of the endpoint and may be different from the original terminal
capability set. The full terminal capability set triggers master/slave
determination and ultimately the opening of media channels. The media
channels may be the same as the original ones, or they can be different. It
is essentially a mid-call “slow start.” Third-party pause and reroute is a
powerful generic mechanism allowing endpoints and gatekeepers to
implement many traditional telephony features such as transfer, ad hoc
conferencing and call hold. Because it is a simple generic “primitive,” it is
extremely popular and does not require each feature to be implemented
one-by-one, as with H.450. Figure 9-14 illustrates a transfer feature
implemented using third-party pause and reroute.
[Figure: message sequence among H.323 Endpoint A, H.323 Transferor B, and
H.323 Endpoint C.
1. A and B have an established voice call. B initiates the call transfer
   to C.
2. B sends A a TerminalCapabilitySet (empty), which A acknowledges. The
   audio connection with node A is closed: CloseLogicalChannel and
   CloseLogicalChannelAck are exchanged for both the A<-B and A->B
   channels.
3. B makes a new call to node C: Setup (fastStart), CallProceeding, Alert,
   Connect (fastStart). A speech path exists between B and C. B and C
   exchange TerminalCapabilitySet (B) and TerminalCapabilitySet (C) with
   their acknowledgements, followed by master/slave negotiation.
4. B now completes the transfer. B sends C a TerminalCapabilitySet (empty),
   which C acknowledges, and the audio connection with node C is closed
   with CloseLogicalChannel and CloseLogicalChannelAck in both directions.
5. TerminalCapabilitySet (C) is delivered to A and TerminalCapabilitySet (A)
   to C, each acknowledged. MasterSlaveDetermination and
   MasterSlaveDeterminationAck follow, then OpenLogicalChannel (A->C) and
   OpenLogicalChannel (A<-C) with their acknowledgements.
6. A speech path exists between A and C.]
Figure 9-14: Transfer using third party pause and re-route
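How an endpoint reacts to empty and full capability sets can be sketched in Python. The class below is a hypothetical model, not an H.245 implementation: it returns the names of the messages the endpoint would send next, and on a full capability set it simply opens a channel with the first codec listed, which is an assumption made to keep the example short.

```python
class PausableEndpoint:
    """Toy endpoint that reacts to empty and full terminal capability sets."""

    def __init__(self, open_codecs):
        self.forward_channels = set(open_codecs)

    def on_terminal_capability_set(self, capabilities):
        """Return the messages this endpoint would send in response."""
        if not capabilities:
            # Empty capability set: the "pause". Close every forward channel.
            closed = sorted(self.forward_channels)
            self.forward_channels.clear()
            return [f"CloseLogicalChannel({codec})" for codec in closed]
        # Full capability set: redo master/slave determination, then reopen.
        codec = capabilities[0]
        self.forward_channels.add(codec)
        return ["MasterSlaveDetermination", f"OpenLogicalChannel({codec})"]


endpoint_a = PausableEndpoint(["G.711"])
print(endpoint_a.on_terminal_capability_set([]))                  # pause
print(endpoint_a.on_terminal_capability_set(["G.729", "G.711"]))  # reroute
```

Because the endpoint only reacts to capability sets, the entity driving the transfer needs no feature-specific signaling, which is the appeal of the mechanism described above.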


FastStart and H.245 tunneling


The H.245 procedures are very powerful and were designed to allow for the
largest variety of terminals. In particular, they allow for complex
capability negotiation for video conferencing equipment. However, the
procedures are heavy and slow and are not always suitable for IP telephony.
The FastStart procedures were defined to address this problem. The normal
H.245 connection setup procedures are sometimes informally called “slow
start,” as opposed to the FastStart procedures described below.
Starting with H.323 version 2, the “fastStart” (that is, fast connect)
procedures were defined in order to speed up the process of setting up a
call, and getting the participants “talking” as fast as possible.
FastStart is a process that allows for combining call setup and connection
setup in one round-trip. See Figure 9-15.
FastStart allows the caller to include, in the SETUP message (the first
H.225.0 (Q.931) message), a list of proposed media channels that it
wishes to open. This proposal consists of a list of H.245
OpenLogicalChannel elements, that is, the same H.245 messages used for
opening channels with the standard non-FastStart procedures. The
OpenLogicalChannel elements, however, are interpreted differently. Instead
of being requests to open specific media channels, they are a proposal to
open one or more media channels from a list of many. The proposal includes
both forward (caller to callee) and reverse (callee to caller) media
channels. Each OpenLogicalChannel includes the media type, codec, IP
address and RTP/RTCP ports for reverse channels, and the media type and
codec for forward channels. The callee must pick the OpenLogicalChannel
elements it accepts from the list and return them to the caller without
modifying them, except for providing the IP address and RTP/RTCP ports
for the forward channel. The fastStart answer is provided in any message
sent from the callee to the caller in response to the SETUP message, up to
and including the CONNECT message.

    Caller -> Callee:   SETUP (+1 408 555 1212, h245Tunnelling)
                        fastStart(G.711, G.729, RTP/RTCP=192.168.0.1:5200/5201)
    Callee -> Caller:   CALL PROCEEDING
    Callee -> Caller:   ALERTING
    Callee -> Caller:   CONNECT
                        fastStart(G.711, RTP/RTCP=10.10.0.2:4312/4313)
    Caller <-> Callee:  RTP/RTCP Media (G.711)

Figure 9-15: FastStart
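
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `ProposedChannel` type and the `answer_fast_start` function are hypothetical names, real OpenLogicalChannel elements are ASN.1 structures rather than Python objects, and the sketch assumes the callee wants the same codec in both directions.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Set, Tuple

Address = Tuple[str, int, int]  # (IP address, RTP port, RTCP port)


@dataclass(frozen=True)
class ProposedChannel:
    """Hypothetical stand-in for one OpenLogicalChannel element."""
    direction: str                      # "forward" (caller to callee) or "reverse"
    codec: str
    address: Optional[Address] = None   # proposed by the caller for reverse channels only


def answer_fast_start(proposal: List[ProposedChannel],
                      supported: Set[str],
                      local_address: Address) -> List[ProposedChannel]:
    """Pick one forward and one reverse channel from the caller's proposal.

    The chosen elements are returned unchanged, except that the callee
    fills in its own address on the forward channel. An empty list means
    that nothing in the proposal was acceptable.
    """
    codecs_in_order = list(dict.fromkeys(ch.codec for ch in proposal))
    for codec in codecs_in_order:
        if codec not in supported:
            continue
        forward = next((ch for ch in proposal
                        if ch.codec == codec and ch.direction == "forward"), None)
        reverse = next((ch for ch in proposal
                        if ch.codec == codec and ch.direction == "reverse"), None)
        if forward is not None and reverse is not None:
            return [replace(forward, address=local_address), reverse]
    return []


caller = ("192.168.0.1", 5200, 5201)
proposal = [
    ProposedChannel("forward", "G.711"),
    ProposedChannel("reverse", "G.711", caller),
    ProposedChannel("forward", "G.729"),
    ProposedChannel("reverse", "G.729", caller),
]
answer = answer_fast_start(proposal, {"G.711"}, ("10.10.0.2", 4312, 4313))
```

With the values from Figure 9-15, the answer contains the G.711 forward channel carrying the callee's address and the G.711 reverse channel exactly as proposed.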


It is important to note that with FastStart, the caller proposes both the
forward and reverse media channels, and the callee chooses which channels
to use, for both directions. It is thus very different from the non-FastStart
procedures, which are unidirectional (each side sets up its own forward
media channels). It is also very important to note that the callee ultimately
makes the decision, but cannot change anything in the proposal.
Another key aspect of FastStart is that it does not include a real capability
negotiation. The list of proposed channels does not describe all the
capabilities of the terminal; it describes only what the caller is willing to
open at that particular time. If an endpoint is setting up a voice call, for
example, there will be no indication that the endpoint supports video, T.120
data conferencing, fax, or even sending DTMF digits. FastStart’s only goal is to
get the endpoints talking as fast as possible. For this reason, most FastStart
calls are followed by the opening of an H.245 channel in order to
perform proper terminal capability negotiation. The process looks exactly
like a “slow start,” except that no media channels are opened after
the terminal capability set negotiation and master/slave determination
procedures, and the endpoints act as if the media channels had been
opened using slow start instead of fastStart (they “inherit” the media
channels). The H.245 channel enables the endpoints to modify the call at a
later time by adding a video channel, switching to fax transport,
redirecting media using the pause and reroute procedures, or even sending
DTMF. None of these things would be possible if the H.245 channel were
not set up. Most practical H.323 implementations today perform a
fastStart channel opening and follow it up with the standard “slow
start” procedures for terminal capability negotiation.
One of the early criticisms of H.323 was that it required two control
channels, each with its own TCP connection: the H.225.0 (Q.931)
channel and the H.245 channel. To make things worse, the H.245 channel
used a dynamically assigned address and was thus difficult to track. For this
reason, H.245 tunneling was defined in H.323 version 2. With H.245 tunneling,
no separate TCP connection is used for H.245; instead, the H.245 messages are
embedded (“tunneled”) inside H.225.0 (Q.931) messages.
It should be noted that after fastStart, the H.245 channel can be set up
either through H.245 tunneling or as a separate H.245 channel.

Call walk-through
Figure 9-16 illustrates a complete call walk-through including H.225.0
(RAS), H.225.0 (Q.931) and H.245, using the normal (“slow start”)
procedures, with the gatekeepers operating in direct routed mode.


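    Participants: Endpoint A (192.168.0.1), Gatekeeper A, Gatekeeper B,
                  Endpoint B (10.10.0.2)

    Endpoint A -> Gatekeeper A:    ARQ (+1 408 555 1212)
    Gatekeeper A -> Gatekeeper B:  LRQ (+1 408 555 1212)
    Gatekeeper B -> Gatekeeper A:  LCF (Q.931=10.10.0.2:1720)
    Gatekeeper A -> Endpoint A:    ACF (Q.931=10.10.0.2:1720)
    Endpoint A -> Endpoint B:      SETUP (+1 408 555 1212)
    Endpoint B -> Gatekeeper B:    ARQ (+1 408 555 1212)
    Gatekeeper B -> Endpoint B:    ACF (Q.931=10.10.0.2:1720)
    Endpoint B -> Endpoint A:      CALL PROCEEDING
    Endpoint B -> Endpoint A:      ALERTING
    Endpoint B -> Endpoint A:      CONNECT (H.245=10.10.0.2:8878)
    Endpoint A -> Endpoint B:      TerminalCapabilitySet (G.711, G.729)
    Endpoint B -> Endpoint A:      TerminalCapabilitySetAck
    Endpoint B -> Endpoint A:      TerminalCapabilitySet (G.729, G.711)
    Endpoint B -> Endpoint A:      MasterSlaveDetermination
    Endpoint A -> Endpoint B:      TerminalCapabilitySetAck
    Endpoint A -> Endpoint B:      MasterSlaveDetermination
    Endpoint A -> Endpoint B:      MasterSlaveDeterminationAck (Slave)
    Endpoint B -> Endpoint A:      MasterSlaveDeterminationAck (Master)
    Endpoint B -> Endpoint A:      OpenLogicalChannel (G.711)
    Endpoint A -> Endpoint B:      OpenLogicalChannel (G.711)
    Endpoint A -> Endpoint B:      OpenLogicalChannelAck
                                   (G.711, RTP/RTCP=192.168.0.1:5200/5201)
    Endpoint B -> Endpoint A:      OpenLogicalChannelAck
                                   (G.711, RTP/RTCP=10.10.0.2:4312/4313)
    Endpoint A <-> Endpoint B:     RTP/RTCP Media (G.711)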

Figure 9-16: H.323 Call walk-through with slow start


Figure 9-17 illustrates a call walk-through with fastStart, using H.245
tunneling and including the terminal capability negotiation, with the
gatekeepers operating in direct routed mode.


    Participants: Endpoint A (192.168.0.1), Gatekeeper A, Gatekeeper B,
                  Endpoint B (10.10.0.2)

    Endpoint A -> Gatekeeper A:    ARQ (+1 408 555 1212)
    Gatekeeper A -> Gatekeeper B:  LRQ (+1 408 555 1212)
    Gatekeeper B -> Gatekeeper A:  LCF (Q.931=10.10.0.2:1720)
    Gatekeeper A -> Endpoint A:    ACF (Q.931=10.10.0.2:1720)
    Endpoint A -> Endpoint B:      SETUP (+1 408 555 1212, h245Tunnelling)
                                   fastStart(G.711, G.729,
                                   RTP/RTCP=192.168.0.1:5200/5201)
    Endpoint B -> Gatekeeper B:    ARQ (+1 408 555 1212)
    Gatekeeper B -> Endpoint B:    ACF (Q.931=10.10.0.2:1720)
    Endpoint B -> Endpoint A:      CALL PROCEEDING
    Endpoint B -> Endpoint A:      ALERTING
    Endpoint B -> Endpoint A:      CONNECT
                                   fastStart(G.711, RTP/RTCP=10.10.0.2:4312/4313)
    Endpoint A <-> Endpoint B:     RTP/RTCP Media (G.711)
    Endpoint A -> Endpoint B:      FACILITY {TerminalCapabilitySet (G.711, G.729),
                                   MasterSlaveDetermination}
    Endpoint B -> Endpoint A:      FACILITY {TerminalCapabilitySet (G.729, G.711),
                                   MasterSlaveDetermination}
    Endpoint B -> Endpoint A:      FACILITY {MasterSlaveDeterminationAck}
    Endpoint A -> Endpoint B:      FACILITY {MasterSlaveDeterminationAck}

Figure 9-17: H.323 Call walk-through with fastStart

Other H.323-related protocols


H.323 is an umbrella standard, and H.323 systems may use a large number of
protocols. In addition, H.323 has a very large number of optional Annexes and
Appendices that build on top of the core H.323 protocols. Here is a list of
the most widely used H.323 Annexes and Appendices:
Annex C: H.323 on ATM
Annex D: Real-time Fax
Annex E: Multiplexing of control channels on TCP or UDP
Annex M: Tunneling of protocols such as QSIG and ISUP in H.323
Annex R: Robustness
Appendix II: RSVP and QoS
Appendix V: Use of E.164 and ISO/IEC 11571 Numbering Plan


There are also a number of H.323-related recommendations:


H.341: SNMP MIB for managing H.323 entities
H.235: Security for H.323 systems
H.450.X: Supplementary services
H.460.X: Generic H.323 extensions
H.450 supplementary services, such as Message Waiting Indication, Call
Transfer, and Call Forwarding, are derived from QSIG supplementary
services. However, they are different enough that they had to be rewritten
one by one. Also, H.450 supplementary services require a specific protocol
for every service. Many vendors prefer to use the generic “third-party
pause and re-route” primitives to implement most of their features.
The H.460.X generic H.323 extensions include extensions for number
portability, circuit maps, digit maps, and QoS monitoring and reporting.

SIP

Architecture
The purpose of SIP is to initiate multimedia sessions. SIP includes user
location, user availability and capability negotiation, session establishment,
and session modification.
SIP allows a user to invite others to a session. For example, Alice would
invite Bob to an IP voice call by sending an INVITE message describing
the voice codec to be used and the IP address and port where the media
stream should be sent. The INVITE message will be routed through the SIP
network (through proxies, redirect servers and other network elements),
using a location service, and will be presented to Bob. Bob will accept the
invitation and provide his own IP address and port for the media stream.
Figure 9-18 illustrates a SIP session through two SIP Proxy servers.


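    Participants: Alice, Proxy Server A, Proxy Server B, Bob

    INVITE sip:bob@biloxi.com   Alice -> Proxy Server A -> Proxy Server B -> Bob
    100 Trying                  returned by each hop that receives the INVITE
    180 Ringing                 Bob -> Proxy Server B -> Proxy Server A -> Alice
    200 OK                      Bob -> Proxy Server B -> Proxy Server A -> Alice
    ACK                         Alice -> Proxy Server A -> Proxy Server B -> Bob
    RTP/RTCP Media (G.711)      Alice <-> Bob
    BYE                         relayed hop by hop through both proxy servers
    200 OK                      returned hop by hop through both proxy servers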

Figure 9-18: SIP session walk-through using proxy servers
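
To make the INVITE concrete, the following Python sketch assembles a minimal INVITE carrying a one-codec audio description. It is an illustration rather than a complete implementation: `build_invite` is a hypothetical helper, the caller address `alice@atlanta.com` and the identifier values are placeholders, and a real UA would generate unique Call-ID, tag and branch values and handle many more header fields.

```python
def build_invite(caller: str, callee: str, local_ip: str, rtp_port: int,
                 call_id: str, branch: str, tag: str) -> str:
    """Return a minimal SIP INVITE with an SDP body offering G.711 mu-law."""
    sdp_lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {local_ip}",
        "s=-",
        f"c=IN IP4 {local_ip}",              # where the caller wants to receive media
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",     # payload type 0 is PCMU (G.711 mu-law)
        "a=rtpmap:0 PCMU/8000",
    ]
    body = "\r\n".join(sdp_lines) + "\r\n"
    user = caller.split("@")[0]
    header_lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_ip}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # A blank line separates the header fields from the message body.
    return "\r\n".join(header_lines) + "\r\n\r\n" + body


invite = build_invite("alice@atlanta.com", "bob@biloxi.com", "192.168.0.1", 5200,
                      call_id="a84b4c76e66710", branch="z9hG4bK776asdhds",
                      tag="1928301774")
```

The message is plain text, which is why SIP traces are readable directly in a packet capture.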


SIP defines a registration mechanism by which Alice's and Bob's terminals
each send a REGISTER message to a registrar, in order for their presence and
IP addresses to be known by the SIP network. The registrar works with
the location service to locate users.
SIP allows for users to register at multiple locations using multiple aliases.
Bob could, for example, have a SIP phone at work that is always registered.
He could also register dynamically with a “software phone” on his PC,
when traveling. Bob could also have his mobile phone and home phone
(either of which could be either native SIP phones, or could be traditional
phones accessed through a SIP gateway). He could also have a SIP
voicemail system.
Bob could define routing rules, used by his SIP proxies, that describe how to
reach him. For example, Bob could opt for a “sequential search” where the
SIP network would first try to reach his SIP phone at work, and if he
doesn't answer, try his home, cell phone, or soft client. Alternatively, Bob
could opt for a “parallel search” where the SIP network would ring all the
phones at once. He could also opt for a mix, such as searching for him first
at work, then on the mobile phone, soft client and home in parallel, and if
everything fails, delivering the call to the voice mail system. This
process of sequential or parallel search (called “forking”) can
also be used with multiple users, for example, in call centers.
Figure 9-19 illustrates parallel forking.


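    Participants: Alice, Proxy Server A, Bob, Bob's Mobile

    Alice -> Proxy Server A:         INVITE sip:bob@biloxi.com
    Proxy Server A -> Bob:           INVITE sip:bob@biloxi.com
    Proxy Server A -> Bob's Mobile:  INVITE sip:bob@biloxi.com
    Each INVITE is answered with 100 Trying and then 180 Ringing; the
    180 Ringing responses are relayed to Alice.
    The device that answers returns 200 OK to the proxy.
    The proxy sends CANCEL to the other device, which returns 200 OK (for the
    CANCEL) and 487 Request Terminated (for the INVITE); the proxy acknowledges
    the 487 with an ACK.
    Proxy Server A -> Alice:         200 OK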

Figure 9-19: Parallel forking


Figure 9-20 illustrates sequential forking.

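    Participants: Alice, Proxy Server A, Bob, Bob's Mobile

    Alice -> Proxy Server A:         INVITE sip:bob@biloxi.com
    Proxy Server A -> Bob:           INVITE sip:bob@biloxi.com
    Proxy Server A -> Alice:         100 Trying
    Bob -> Proxy Server A:           100 Trying
    Bob -> Proxy Server A:           180 Ringing
    Proxy Server A -> Alice:         180 Ringing
    (Timeout at Proxy Server A)
    Proxy Server A -> Bob:           CANCEL
    Bob -> Proxy Server A:           200 OK
    Bob -> Proxy Server A:           487 Request Terminated
    Proxy Server A -> Bob:           ACK
    Proxy Server A -> Bob's Mobile:  INVITE sip:bob@biloxi.com
    Bob's Mobile -> Proxy Server A:  100 Trying
    Bob's Mobile -> Proxy Server A:  180 Ringing
    Proxy Server A -> Alice:         180 Ringing
    Bob's Mobile -> Proxy Server A:  200 OK
    Proxy Server A -> Alice:         200 OK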

Figure 9-20: Sequential forking
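
The search strategies can be modeled with a short Python sketch. It is a simplified simulation, not proxy code: `parallel_search` and `staged_search` are hypothetical names, each contact is reduced to the number of seconds it takes to answer (or `None` if it never answers), and timeouts and SIP messages are not modeled.

```python
from typing import Dict, List, Optional, Tuple


def parallel_search(contacts: List[str],
                    answer_delay: Dict[str, Optional[float]]
                    ) -> Tuple[Optional[str], List[str]]:
    """Ring all contacts at once; the first one to answer wins.

    Returns the winning contact (or None) and the contacts whose pending
    INVITE would be cancelled once the winner answers.
    """
    answering = [c for c in contacts if answer_delay.get(c) is not None]
    if not answering:
        return None, []
    winner = min(answering, key=lambda c: answer_delay[c])
    cancelled = [c for c in contacts if c != winner]
    return winner, cancelled


def staged_search(stages: List[List[str]],
                  answer_delay: Dict[str, Optional[float]]) -> Optional[str]:
    """Try each stage in sequence; contacts within a stage ring in parallel.

    A purely sequential search is the special case of one contact per stage.
    """
    for stage in stages:
        winner, _ = parallel_search(stage, answer_delay)
        if winner is not None:
            return winner
    return None


# Bob's mixed rule from the text: work first, then mobile, soft client and
# home in parallel, and finally voice mail.
stages = [["work"], ["mobile", "soft-client", "home"], ["voicemail"]]
delays = {"work": None, "mobile": 8.0, "soft-client": None,
          "home": 5.0, "voicemail": 0.0}
reached = staged_search(stages, delays)
```

In this run nobody answers at work, the home phone answers before the mobile in the parallel stage, and voice mail is never tried.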


SIP does not define “services” on a one-by-one basis, which has been
the case in other protocols such as ISUP, Q.931, QSIG or H.450. Rather, it
describes a set of “primitives” in the protocol that allow feature developers
to define their own. This allows for much greater flexibility in the number
of features that can be deployed.
The core SIP specification is RFC 3261. Many other RFCs are necessary
for a functional commercial SIP implementation. For example, the
Session Description Protocol (SDP, RFC 2327) is used for media session
description, and the Real-time Transport Protocol (RTP, RFC 3550) and its
Audio and Video profile (RFC 3551) are used for transporting media.
SIP is based on an HTTP-like request/response transaction model. The
originator of a request is a Client and the response is provided by the
Server. Because of the peer-to-peer nature of communication, each user in
a communication may make requests. This means that the SIP
protocol entity acting on behalf of a user (called a User Agent) operates both
as a Client and as a Server, depending on which side is initiating the
request. The protocol itself frequently uses the terms User Agent Client
(UAC) and User Agent Server (UAS). These terms are only useful from a
protocol point of view, as physical devices will always include both, and are
simply called User Agents (UA).
Requests from a SIP UAC invoke a particular Method. A Method will
generate at least one response (ACK, described later, is the exception).
Requests and Responses are collectively called SIP Messages.
SIP Messages are encoded in text format, with a syntax defined using an
augmented Backus-Naur Form (BNF) grammar. SIP is loosely defined as a layered
protocol, although most people don't typically view it that way. The
BNF-defined syntax and grammar layer is the first layer. The second layer is
the transport layer. SIP typically uses TCP or UDP as the transport for SIP
messages, but other transports such as TLS and SCTP are also allowed.

Sidebar: UDP or TCP Transport?


The initial version of SIP 2.0 (RFC 2543) supported both UDP and TCP.
It made UDP transport mandatory and TCP optional. The rationale was
that UDP made SIP more lightweight and would allow for better
scalability when a large number of sessions are supported
simultaneously on one physical interface (for example, a large Proxy or
Gateway). A lot of early SIP implementations supported only UDP
transport. However, it became apparent with time that there are a certain
number of things that TCP does better, one of them being congestion
control. Another is that TCP has less difficulty traversing NATs and
Firewalls. Yet another is that TLS can run only on TCP, and TLS is
the standard way of securing SIP. Finally, very large messages might not be
transportable in one UDP datagram and may thus require TCP transport.
For these reasons, RFC 3261 made both UDP and TCP support
mandatory. It is expected that TCP transport will ultimately dominate.
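
The message-size concern can be illustrated with a small Python sketch. The thresholds used here (a request within 200 bytes of the path MTU, or larger than 1300 bytes when the path MTU is unknown, must use a congestion-controlled transport) are taken from RFC 3261 rather than from this sidebar, and `choose_transport` is a hypothetical helper name.

```python
from typing import Optional


def choose_transport(message_size: int, path_mtu: Optional[int] = None) -> str:
    """Pick a transport for a SIP request of the given size in bytes."""
    if path_mtu is None:
        # Path MTU unknown: stay safely below a typical 1500-byte Ethernet MTU.
        return "TCP" if message_size > 1300 else "UDP"
    # Path MTU known: leave 200 bytes of headroom for headers added in transit.
    return "TCP" if message_size > path_mtu - 200 else "UDP"


small_request = choose_transport(800)
large_request = choose_transport(1400)
```

A short INVITE fits in a single UDP datagram, while a request with a large body falls back to TCP.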


The third layer is the transaction layer described earlier, consisting of a
request and one or more responses. The fourth layer is the transaction user
layer. All SIP components (described later), except a stateless proxy, are
transaction users.
Although SIP is defined as a protocol, and the SIP entities are defined in
relation to that protocol, it is useful to understand the role of the different
entities, and how they relate to real-world equipment.

User agent (UA)


The UA is the entity generating the SIP messages on behalf of a “user.”
That user can be a human, or an automatic task or service. A UA can reside
on an IP telephone, a PC Multimedia Client, video conferencing
equipment, a voice mail server, or an Analog/TDM circuit-switched gateway.
When making a request, the UA is a UAC; when responding to a request, it
is a UAS. A UA will use SIP to communicate with another UA, a Registrar,
or any Network Element.

Registrar
A Registrar allows a UA to make its presence known to the network. It
associates an address-of-record URI with one or more contact addresses
(normally IP addresses). This binding can be done manually, or through a
dynamic mechanism called “Registration.” SIP itself does not specify how
network elements, such as proxies, will use a location service to locate the
users or services based on their URIs. There is an implication that
the registrar somehow stores the binding of address-of-record URI and
contact addresses in a “location service” and that the proxy will somehow
use that location service. In practice, the registrar and proxy are very often
collocated, or have access to the same database.
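
A minimal Python sketch of such a shared binding store follows. It is an assumption-laden illustration: the `LocationService` class and its method names are hypothetical, time is passed in explicitly to keep the example deterministic, and a real registrar would also authenticate the REGISTER request.

```python
from typing import Dict, List


class LocationService:
    """Maps an address-of-record to contact addresses with expiry times."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Dict[str, float]] = {}

    def register(self, aor: str, contact: str, expires: float, now: float) -> None:
        """Called by the registrar when it accepts a REGISTER request."""
        self._bindings.setdefault(aor, {})[contact] = now + expires

    def lookup(self, aor: str, now: float) -> List[str]:
        """Called by a proxy to find where the user can currently be reached."""
        contacts = self._bindings.get(aor, {})
        return [contact for contact, expiry in contacts.items() if expiry > now]


service = LocationService()
service.register("sip:bob@biloxi.com", "sip:bob@10.10.0.2", expires=3600, now=0)
service.register("sip:bob@biloxi.com", "sip:bob@10.10.0.7", expires=60, now=0)
contacts_early = service.lookup("sip:bob@biloxi.com", now=30)
contacts_late = service.lookup("sip:bob@biloxi.com", now=120)
```

The second contact (an example address) disappears from the lookup once its registration expires, which is how a “software phone” that is only registered occasionally drops out of the search.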

Redirect server
A Redirect server is a very simple server that responds to a session
invitation with an indication of the location of the requested
address-of-record. It is up to the UA to then reestablish the session directly
to the new URI. The Redirect Server is not involved in that second session, as
it does not stay in the signaling path. A redirect server is transaction
stateful, but it only understands that particular simple transaction.
Figure 9-21 illustrates a SIP session establishment through a SIP Redirect
server.


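    Participants: Alice, Redirect Server, Bob

    Alice -> Redirect Server:  INVITE sip:bob@biloxi.com
    Redirect Server -> Alice:  302 Moved temporarily
                               contact: sip:bob@10.10.0.2
    Alice -> Redirect Server:  ACK
    Alice -> Bob:              INVITE sip:bob@10.10.0.2
    Bob -> Alice:              100 Trying
    Bob -> Alice:              180 Ringing
    Bob -> Alice:              200 OK
                               contact: sip:bob@10.10.0.2
    Alice -> Bob:              ACK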

Figure 9-21: Redirect server

Proxy server
SIP proxies are elements that route SIP requests to user agent servers and
SIP responses to user agent clients. Because a proxy routes SIP messages, it
stays in the signaling path. A proxy server has a lot of flexibility in the
amount of “state” information it keeps, depending on what its function is.
One key point is that a proxy server is only allowed to modify very
specific parts of the SIP messages, mainly those related to routing. SIP
forbids proxies from modifying most of the message content (for example, the
SDP must not be modified). SIP was very much written with end-to-end
transparency in mind. Figure 9-22 illustrates a SIP session establishment
through a SIP Proxy server.

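    Participants: Alice, Proxy Server, Bob

    Alice -> Proxy Server:  INVITE sip:bob@biloxi.com
    Proxy Server -> Bob:    INVITE sip:bob@biloxi.com
    Proxy Server -> Alice:  100 Trying
    Bob -> Proxy Server:    100 Trying
    Bob -> Proxy Server:    180 Ringing
    Proxy Server -> Alice:  180 Ringing
    Bob -> Proxy Server:    200 OK
    Proxy Server -> Alice:  200 OK
    Alice -> Proxy Server:  ACK
    Proxy Server -> Bob:    ACK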

Figure 9-22: Proxy server

Stateless proxy server


A stateless proxy server is the simplest form of proxy server; it does not
maintain transaction state. It is basically a message router that forwards
requests and responses blindly. It does not understand the concept of a
transaction, let alone a session (a call).

Transaction stateful proxy server


A transaction stateful proxy server is aware of the transaction state, that is,
the request and its responses. It does not understand the concept of a session.

Call stateful proxy server


A call stateful proxy server maintains session state awareness. It
understands, for example, that an INVITE transaction starts a call, and
that a BYE transaction terminates it.

Back-to-back user agent (B2BUA)


As its name implies, a B2BUA is simply an entity that completely
terminates a SIP session on one side, and starts another one on the other
side. A B2BUA is technically not an entity precisely defined in SIP.
The concept emerged because many implementers realized that numerous
useful SIP services cannot be implemented while preserving SIP message
transparency (as a proxy would). B2BUAs can do pretty much anything
they wish, provided that they look like a UA on both sides. They are much
more complicated than call stateful proxy servers, as they maintain call
state for two sessions and correlate them. They can be used for
Application servers (for example, call centers, conference bridges, and
voice mail), as well as for special routing services where the IP addresses
of the UAs are hidden from each other. Figure 9-23 illustrates a sample SIP
session establishment initiated by a B2BUA.

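    Participants: UA A, B2BUA, UA B

    Session A (left side of B2BUA):
    B2BUA -> UA A:  INVITE sip:bob@biloxi.com
    UA A -> B2BUA:  100 Trying
    UA A -> B2BUA:  180 Ringing
    UA A -> B2BUA:  200 OK
    B2BUA -> UA A:  ACK

    Session B (right side of B2BUA):
    B2BUA -> UA B:  INVITE sip:bob@biloxi.com
    UA B -> B2BUA:  100 Trying
    UA B -> B2BUA:  180 Ringing
    UA B -> B2BUA:  200 OK
    B2BUA -> UA B:  ACK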

Figure 9-23: B2BUA


Different SIP network elements can be used for different purposes. A
Redirect server can be used for a completely distributed routing system.
Proxies are very useful for “funnelling” signaling through a specific
location. A Stateless Proxy Server or a Transaction Stateful Proxy Server
can be used for a simple routing service where billing and call detail
records are not necessary. A Call Stateful Proxy Server can be used for a
routing service where billing or call detail records are required.
B2BUAs can be used for Application Servers.
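
The reason call state matters for billing can be shown with a small Python sketch. It assumes a hypothetical `CallStateTracker` that a call stateful proxy would feed with the method, final status code and timestamp of each completed transaction; the record format is invented for the example.

```python
from typing import Dict, List


class CallStateTracker:
    """Turns INVITE and BYE transactions into call detail records."""

    def __init__(self) -> None:
        self.active: Dict[str, float] = {}       # Call-ID -> answer time
        self.records: List[dict] = []            # completed calls

    def observe(self, call_id: str, method: str, status: int,
                timestamp: float) -> None:
        if method == "INVITE" and 200 <= status < 300:
            # A successful INVITE transaction starts the call; a later
            # re-INVITE on the same call must not restart the clock.
            self.active.setdefault(call_id, timestamp)
        elif method == "BYE" and call_id in self.active:
            # A BYE transaction terminates the call.
            start = self.active.pop(call_id)
            self.records.append({"call_id": call_id, "start": start,
                                 "duration": timestamp - start})


tracker = CallStateTracker()
tracker.observe("a84b4c76e66710", "INVITE", 200, timestamp=10.0)
tracker.observe("a84b4c76e66710", "INVITE", 200, timestamp=40.0)   # re-INVITE
tracker.observe("a84b4c76e66710", "BYE", 200, timestamp=70.0)
```

A stateless or transaction stateful proxy sees the same messages but has nowhere to keep the answer time, so it cannot produce the duration.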

SIP messages
SIP Messages are defined using a syntax inspired by HTTP/1.1; however,
SIP is not an extension of HTTP.
Requests contain a method name and a Request-URI (Uniform Resource
Identifier). The Request-URI indicates the user or service to which the
request is being addressed (see Table 9-1).

SIP Request Methods

    Defined in RFC 3261:    Defined in other RFCs:

    REGISTER                PRACK
    INVITE                  REFER
    ACK                     SUBSCRIBE
    BYE                     NOTIFY
    CANCEL                  INFO
    OPTIONS

Table 9-1: SIP request methods


Responses include a response code and phrase. The phrase is a textual
explanation of the response, with no particular meaning from a protocol
point of view (see Table 9-2).

SIP Responses

    1xx: Provisional. Most common are:
        100 “Trying”
        180 “Ringing”
        183 “Session Progress”
    2xx: Success. Most common is:
        200 “OK”
    3xx: Redirection. Most common is:
        302 “Moved Temporarily”
    4xx: Client Error. Most common are:
        404 “Not Found”
        486 “Busy Here”
    5xx: Server Error
    6xx: Global Failure

Table 9-2: SIP responses


All nonprovisional responses are called final responses.
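
The response classes translate directly into code. The sketch below is a small illustration; `classify_response` is a hypothetical helper, and it relies only on the first digit of the status code, as the table does.

```python
from typing import Tuple

RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_response(status_code: int) -> Tuple[str, bool]:
    """Return the response class and whether the response is final."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    response_class = RESPONSE_CLASSES[status_code // 100]
    is_final = status_code >= 200        # every non-1xx response is final
    return response_class, is_final


ringing = classify_response(180)
busy = classify_response(486)
```

A UA that receives an unknown code can still react sensibly, because the class alone tells it whether the transaction is over and roughly why.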
SIP Messages include Header fields similar to HTTP header fields. They
include “To:”, “From:”, and “Contact:”.
REGISTER is used by a UA to dynamically provide its registrar with an
association between its address and its address-of-record URI, and to make
its presence known.
A session is established by a UA sending an INVITE request. After
receiving one or more provisional responses, the UA will receive a final
response. In a typical session, the first provisional response will be a 100
Trying, followed typically by a 180 Ringing. When the session is
established (that is, when the call is answered), a 200 OK final response is
received. The sender of the INVITE has to acknowledge the final response
with an ACK message.
An established session can be modified by sending an INVITE, modifying
the parameters of an existing call. This type of INVITE is known as a
“re-INVITE” and can be sent by either side, regardless of which side sent
the initial INVITE. That re-INVITE may modify the SDP for example, to
add a video channel to an audio session or to change a voice codec.
A session is terminated by sending a BYE message.
An OPTIONS request can be used to query the capabilities of another UA
at any time, inside the context of an established session or outside of it.
INVITE transactions are acknowledged by the sender of the INVITE, which
sends an ACK request. ACK is a special type of method, since it does not
trigger a response.


Since SIP does not mandate a reliable transport, reliability is handled at the
SIP level itself. INVITE requests are retransmitted using an exponential
back-off until there is a response, or until there is a timeout. Final responses
to INVITE are also repeated until acknowledged by an ACK, or until there is
a timeout.
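
The exponential back-off can be worked out with a short Python sketch. The timer values are not given in this chapter; the sketch assumes the RFC 3261 defaults for unreliable transports, namely a first retransmission after T1 = 500 ms, a doubling interval after each retransmission, and a transaction timeout of 64 × T1. The function name is hypothetical.

```python
from typing import List, Tuple


def invite_retransmit_schedule(t1: float = 0.5) -> Tuple[List[float], float]:
    """Return the retransmission times (seconds after the first INVITE)
    and the time at which the client gives up."""
    timeout = 64 * t1
    times: List[float] = []
    interval = t1
    elapsed = t1
    while elapsed < timeout:
        times.append(elapsed)
        interval *= 2            # exponential back-off
        elapsed += interval
    return times, timeout


times, timeout = invite_retransmit_schedule()
```

With the default T1, an unanswered INVITE is retransmitted six times before the transaction times out 32 seconds after the original request.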

SDP and the offer/answer model


The Session Description Protocol (SDP) is a text-based protocol used to
describe the media characteristics of a session. The SDP is included in a
SIP message as a MIME Body.
SDP includes information such as the IP address of a media
termination. It is also used to describe the type of media (for example,
audio or video), the codec to be used (G.729, G.711, and so on), the transport
(RTP for multimedia), as well as other details of a session. Other
capabilities that can be described in SDP include parameters for
transmission of DTMF and audio tones in-band in special RTP packets
using RFC 2833, multicast backbone information, and timing information.
SDP also supports IPv4 and IPv6, as well as other transports such as ATM.
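
Because SDP is line-oriented text, the fields mentioned above are easy to extract. The following Python sketch is a deliberately partial parser written for illustration: `parse_sdp` is a hypothetical helper that only understands the `c=`, `m=` and `a=rtpmap` lines, and the sample description is invented using the addresses from this chapter's figures.

```python
def parse_sdp(sdp: str) -> dict:
    """Extract connection addresses, media lines and codec mappings."""
    session = {"connection": None, "media": []}
    for line in sdp.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue
        kind, value = line[0], line[2:].strip()
        if kind == "c":
            address = value.split()[-1]
            if session["media"]:
                session["media"][-1]["connection"] = address   # media-level override
            else:
                session["connection"] = address                # session-level default
        elif kind == "m":
            media_type, port, transport, *formats = value.split()
            session["media"].append({
                "type": media_type,
                "port": int(port.split("/")[0]),
                "transport": transport,
                "formats": formats,
                "connection": session["connection"],
                "rtpmap": {},
            })
        elif kind == "a" and value.startswith("rtpmap:") and session["media"]:
            payload, encoding = value[len("rtpmap:"):].split(None, 1)
            session["media"][-1]["rtpmap"][payload] = encoding
    return session


sample = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.168.0.1",
    "s=-",
    "c=IN IP4 192.168.0.1",
    "t=0 0",
    "m=audio 5200 RTP/AVP 0 18",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:18 G729/8000",
]) + "\r\n"
parsed = parse_sdp(sample)
```

The parsed result says where to send audio and which payload types map to G.711 mu-law and G.729, but, as the text goes on to explain, it does not say whether this is an offer, an answer, or a capability list.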
SDP is used by many protocols, including the Media Gateway Control
Protocol (MGCP), the Gateway Control Protocol (MEGACO/H.248), the
Real-Time Streaming Protocol (RTSP), the Session Announcement
Protocol (SAP), and of course SIP.
As its name indicates, SDP merely describes the characteristics of a
session. It does not describe how and when media sessions are established.
Merely looking at an SDP description is not sufficient to know whether it
represents an existing media session, a request to open media channel(s), or
the capabilities of a device. SDP relies on the protocol using
it to establish the media sessions and to infer the meaning of the SDP itself.
This means that one cannot simply pass along an SDP from one protocol to
another.
Early versions of SIP were very “loose” in the way media sessions were
established, which led to many interoperability problems. Some
implementations considered the SDP in response to the initial SDP in the
invitation to be a description of capabilities of the UA, others considered it
to be the selected media session. After many years of confusion, SIP finally
agreed on the Offer/Answer model.
The Offer/Answer model in theory, could be used by any protocol that uses
SDP. At this point however, it is only used by SIP.
In the Offer/Answer model, a UA sends an SDP in a SIP message to
another UA with which it is involved in a session. That initial SDP is called
the Offer. The Offer contains a proposal for a media session. It will indicate
which types of media streams the offering UA is proposing to open: audio,
video, application sharing, or a combination of them. It also describes the
characteristics of these media streams, for example, which codecs it
supports for an audio call (for example, G.711 or G.729). It will also
describe the IP addresses and ports to be used for receiving the media. The
offer can also indicate whether a stream is bidirectional (send/receive) or
unidirectional (send or receive). An offer can consist of multiple
alternatives. An offerer may, for example, indicate that it is willing to set up
a bidirectional G.711 audio stream or a G.729 audio stream (but not both at
the same time), and an H.261 video session. In summary, the Offer is a list
of proposed media stream alternatives.
The UA receiving the Offer may accept or reject it. Acceptance of an
Offer is done by sending an Answer. The Answer is based on the Offer: it
must pick streams from the Offer. As such, it does not necessarily
represent the full set of capabilities of the answerer; it represents what the
answerer is willing to establish. The answer cannot add streams that were
not included in the offer. The answerer is expected to add only information
that was not known by the offerer (for example, the IP addresses and ports
where the media streams should be sent).
Typically, an answerer would pick the one session description it is willing
to accept, for example, pick one voice codec from a list of alternatives.
However, it is allowed to select more than one, in which case it is leaving
the offerer the choice to use any of them (and even to alternate between
them without warning). If the answerer is not willing to receive voice
alternating from one codec to another without warning, it has to explicitly
pick one.
After an Offer/Answer exchange is completed, both ends can establish
media sessions as per the negotiated parameters.
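
The answerer's rules can be sketched in Python as follows. This is a simplified model, not an SDP implementation: `build_answer` and the dictionary layout are hypothetical, the answerer always picks a single codec per stream, and a declined stream is marked with port 0, which is the convention used by the SIP offer/answer specification (RFC 3264) rather than something stated in this chapter.

```python
from typing import Dict, List


def build_answer(offer: List[dict], supported: Dict[str, List[str]],
                 local_ip: str, first_port: int) -> List[dict]:
    """Answer each offered stream, in order, without adding any stream."""
    answer = []
    port = first_port
    for stream in offer:
        # Keep the offerer's preference order among the codecs both sides support.
        common = [codec for codec in stream["codecs"]
                  if codec in supported.get(stream["type"], [])]
        if common:
            answer.append({"type": stream["type"], "codecs": common[:1],
                           "ip": local_ip, "port": port})
            port += 2        # leave the odd port for RTCP
        else:
            answer.append({"type": stream["type"], "codecs": stream["codecs"],
                           "ip": local_ip, "port": 0})     # stream declined
    return answer


offer = [
    {"type": "audio", "codecs": ["G.711", "G.729"], "ip": "192.168.0.1", "port": 5200},
    {"type": "video", "codecs": ["H.261"], "ip": "192.168.0.1", "port": 5300},
]
answer = build_answer(offer, {"audio": ["G.729", "G.711"]}, "10.10.0.2", 4312)
```

Here the answerer accepts the audio stream with G.711, adds only its own address and port, and declines the video stream it cannot handle.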
In a SIP session, the SDP is carried as a MIME Body within a SIP message.
Typically, the Offer is included in the INVITE. The answer must be in the
first reliable nonfailure response, that is, the 2XX, or an 18X if reliability
of provisional response is used. See Figure 9-24.


    Participants: Alice (192.168.0.1), Bob (10.10.0.2)

    Alice -> Bob:   INVITE sip:bob@biloxi.com, Offer=SDP (A)
                    SDP (A): G.711, G.729, RTP/RTCP=192.168.0.1:5200/5201
    Bob -> Alice:   100 Trying
    Bob -> Alice:   180 Ringing
    Bob -> Alice:   200 OK, Answer=SDP (B)
                    SDP (B): G.711, RTP/RTCP=10.10.0.2:4312/4313
    Alice -> Bob:   ACK
    Alice <-> Bob:  RTP/RTCP Media (G.711)

Figure 9-24: Offer/answer in SIP session establishment


If the offer is not in the INVITE, the offer will come from the UAS in the
first reliable non-failure response (that is, in the same message that would
normally be used for the answer), and the answer will be in the
acknowledgment for that message. In a typical session, the Offer is in the
INVITE, and the answer is in the 200 when the far-end “answers” the call.
See Figure 9-25.

    Participants: Alice (192.168.0.1), Bob (10.10.0.2)

    Alice -> Bob:   INVITE sip:bob@biloxi.com
    Bob -> Alice:   100 Trying
    Bob -> Alice:   180 Ringing
    Bob -> Alice:   200 OK, Offer=SDP (A)
                    SDP (A): G.711, G.729, RTP/RTCP=192.168.0.1:5200/5201
    Alice -> Bob:   ACK, Answer=SDP (B)
                    SDP (B): G.711, RTP/RTCP=10.10.0.2:4312/4313
    Alice <-> Bob:  RTP/RTCP Media (G.711)

Figure 9-25: Offer/answer in SIP session establishment, delayed


It is possible for either side involved in the session to modify a session by
sending another Offer, typically in a re-INVITE, which will trigger another
Answer. However, this can only be done after the initial offer has been
answered.
The IETF MMUSIC working group is currently defining a next-generation
protocol for session description and capability negotiation called SDPng.
Unlike the current SDP, it will have a clear distinction between capabilities
and session parameters. Instead of defining its own syntax, SDPng uses XML.

Early media, PRACK and UPDATE


In order to interwork with the PSTN, it is necessary to support the concept
of “early media.” Early media is used in the PSTN to provide media before
a call is accepted. A typical usage of early media is to provide “in-band”
ringing tones or an announcement. Since early media occurs before the call
is accepted, it is not possible to wait for a 200 OK. The concept of early
media in SIP allows for an answer to be provided in an 18X provisional
response (typically a 183 Session Progress), thus allowing media to be
provided to the caller before the called party answers.
However, provisional responses are not normally sent reliably. If the 183, for
example, is lost, there will be no answer and the session will fail. The
extension for reliable provisional responses introduces the PRACK method,
which allows the 18X response to be acknowledged. This allows the sender of
the 18X message to ensure that it has been received by the caller. Figure 9-26
illustrates early media.

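    Participants: Alice (192.168.0.1), Bob (10.10.0.2)

    Alice -> Bob:   INVITE sip:bob@biloxi.com, Supported: 100rel, Offer=SDP (A)
                    SDP (A): G.711, G.729, RTP/RTCP=192.168.0.1:5200/5201
    Bob -> Alice:   100 Trying
    Bob -> Alice:   183 Session Progress, Require: 100rel, Answer=SDP (B)
                    SDP (B): G.711, RTP/RTCP=10.10.0.2:4312/4313
    Alice -> Bob:   PRACK
    Bob -> Alice:   200 OK (for the PRACK)
    Bob -> Alice:   RTP/RTCP Early Media (G.711)
    Bob -> Alice:   200 OK (for the INVITE)
    Alice -> Bob:   ACK
    Alice <-> Bob:  RTP/RTCP Media (G.711)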

Figure 9-26: Early media
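
The header fields involved in this exchange can be sketched in Python. The sketch assumes the conventions of the reliable provisional response extension (RFC 3262): the caller advertises the `100rel` option tag, the 18X response carries `Require: 100rel` and an `RSeq` number, and the PRACK echoes that number together with the INVITE's CSeq in an `RAck` header. The helper names and the dictionary representation of headers are hypothetical.

```python
from typing import Dict, Optional, Set


def option_tags(headers: Dict[str, str], name: str) -> Set[str]:
    """Split a comma-separated option-tag header into a set of tags."""
    return {tag.strip() for tag in headers.get(name, "").split(",") if tag.strip()}


def reliable_18x_headers(invite_headers: Dict[str, str],
                         rseq: int) -> Optional[Dict[str, str]]:
    """Headers the called UA adds to an 18X it wants acknowledged,
    or None if the caller did not advertise support for 100rel."""
    tags = (option_tags(invite_headers, "Supported")
            | option_tags(invite_headers, "Require"))
    if "100rel" not in tags:
        return None
    return {"Require": "100rel", "RSeq": str(rseq)}


def prack_rack_header(rseq: int, invite_cseq: int) -> str:
    """RAck header the caller places in the PRACK for that 18X."""
    return f"RAck: {rseq} {invite_cseq} INVITE"


extra = reliable_18x_headers({"Supported": "100rel"}, rseq=1)
rack = prack_rack_header(rseq=1, invite_cseq=1)
```

If the caller had not listed `100rel`, the called UA could only send the 183 unreliably and would have to carry the answer again in the 200 OK.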


The PRACK method has other usages as well, since it applies to provisional
responses regardless of early media. For example, in a case where no
in-band early media is used, the caller may very well want to ensure that the
180 Ringing with no SDP is delivered reliably, so that proper
alerting treatment is provided to the caller (that is, to let the caller know
that the “phone is ringing”). Another usage for PRACK is to allow other
preconditions to be met before setting up the media session; for example,
resources may need to be reserved (through RSVP, or a circuit-based
transport like ATM).
Another twist on early media is that it is sometimes necessary to modify an
Offer before the call is answered (for example, when a preanswer
announcement is provided). Since modifying a session through a
re-INVITE is not allowed by SIP before the first INVITE is accepted
through a 200 OK, the UPDATE method was introduced. It allows a client to
update the parameters of a session (such as the Offer or Answer) but has no
impact on the state of a dialog. In that sense, it is like a re-INVITE, but
unlike a re-INVITE, it can be sent before the initial INVITE has been
completed. For backward compatibility reasons, the IETF preferred to
introduce this new method rather than modify the INVITE method to allow
this behavior.
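
The choice between the two methods can be summarized in a few lines of Python. This is a simplified decision sketch: `method_for_session_change` is a hypothetical helper, and the assumption that the peer advertises UPDATE support (for example, in an Allow header) comes from the UPDATE specification (RFC 3311), not from this chapter.

```python
from typing import Optional


def method_for_session_change(initial_invite_completed: bool,
                              peer_allows_update: bool) -> Optional[str]:
    """Pick the request used to change session parameters, or None if the
    change has to wait until the initial INVITE completes."""
    if initial_invite_completed:
        return "INVITE"          # sent inside the dialog, that is, a re-INVITE
    if peer_allows_update:
        return "UPDATE"          # usable before the call is answered
    return None


before_answer = method_for_session_change(False, True)
after_answer = method_for_session_change(True, True)
```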

REFER
The REFER method allows a UA to “refer” another UA to the resource
provided in the REFER request, and allows the referring UA to be informed of
the result of the referral. It is a very generic and powerful primitive that
has a very large number of usages.
For example, it can be used to implement a call transfer feature. If Alice
is in a call with Bob and decides Bob needs to talk to
Carol, Alice can tell her SIP UA to send a REFER to Bob's UA with
Carol's Contact information. Bob's UA will attempt to call Carol. Bob's UA
will then send a notification to Alice's UA reporting whether it succeeded in
reaching the contact. Figure 9-27 details this scenario.

[Message sequence diagram between Alice, Bob and Carol: INVITE
sip:bob@example.com; 100 Trying; 180 Ringing; 200 OK; ACK; re-INVITE
sip:bob@example.com (hold existing call); 200 OK; REFER sip:bob@biloxi.com
(Refer-To: sip:carol@example.com); 202 Accepted; INVITE
sip:carol@example.com; 100 Trying; 180 Ringing; 200 OK; ACK; NOTIFY;
200 OK; BYE; 200 OK.]

Figure 9-27: Alice calls Bob and transfers the call to Carol


REFER can also be used to refer to other resources, for example, a
conference bridge. REFER can also be used outside the context of a
session. Alice could tell her UA to send a REFER to Bob with Carol's
Contact information. Bob will be asked by his UA if he wishes to talk to
Carol before the call is attempted from Bob to Carol.
REFER is not limited to SIP. When a REFER refers to a SIP resource, the
Refer-To header contains a SIP URI. However, it is possible to refer to any
type of URI. For example, one could refer to an HTTP URI (useful for
implementing a Web push), an H.323 URI, or even an e-mail address.
REFER is a very versatile and powerful tool, so it is essential that it
not open security holes. One could imagine unsolicited REFERs being
sent to force people to call others. For this reason, it is very important
when using REFER to have proper mechanisms to ensure that the requests are
valid. In many cases, this means that the receiver of the REFER request
will have to explicitly accept the REFER before the resource is accessed
(through a pop-up window, for example). In other cases, the sender of the
REFER must be authenticated and authorized.
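The two safeguards just described can be combined into a simple acceptance policy. The sketch below is only an illustration under stated assumptions: the `ReferRequest` class, the `trusted_senders` set and the `ask_user` callback are hypothetical names invented for this example, and how a real UA authenticates a sender is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class ReferRequest:
    sender: str          # who sent the REFER
    refer_to: str        # the URI the receiver is asked to contact
    authenticated: bool  # result of whatever authentication the UA performed


def should_follow_refer(request, trusted_senders, ask_user):
    """Decide whether the referred resource may be contacted.

    An authenticated and authorized sender is accepted directly; every
    other request needs explicit confirmation from the user.
    """
    if request.authenticated and request.sender in trusted_senders:
        return True
    return bool(ask_user(request))


trusted = {"sip:alice@example.com"}
transfer = ReferRequest("sip:alice@example.com", "sip:carol@example.com", True)
unsolicited = ReferRequest("sip:unknown@example.net", "sip:premium@example.net", False)

# The lambda stands in for a pop-up window where the user declines.
print(should_follow_refer(transfer, trusted, lambda req: False))
print(should_follow_refer(unsolicited, trusted, lambda req: False))
```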

SUBSCRIBE/NOTIFY and SIMPLE


The SUBSCRIBE/NOTIFY mechanism allows a UA to request asynchronous
notification of events through a subscription. It is a very generic tool
that can be used for many purposes. The SUBSCRIBE method allows a UA to
request that it be informed of the events named in the SUBSCRIBE request.
The NOTIFY method is the mechanism by which the UA is informed of the
event.
The event can be reported by a third party. The SUBSCRIBE/NOTIFY
mechanism is generic enough to apply to a variety of applications.
Examples of applications that can use the SUBSCRIBE/NOTIFY mechanism
include:
Automatic Callback Service (based on terminal service)
Presence (based on “Friends lists”)
Message Waiting Indication (based on mailbox state change)
Keypad presses (based on user input)
Conference service (based on conference status)
In certain cases, an “implicit” subscription exists; for example, REFER
means that the sender of the REFER will be notified of the status of its
request with a NOTIFY message without having to send a SUBSCRIBE
message.
A NOTIFY message includes a message body whose content and format
depend on the application. Very often, it will be an XML description.
SIP for Instant Messaging and Presence Leveraging Extensions (SIMPLE)
makes use of the SUBSCRIBE/NOTIFY methods to support Presence
(similar to a “friend or buddy list”). SIMPLE also defines procedures for
Instant Messaging, which is independent of Presence. Figure 9-28
illustrates a simple example where Alice subscribes to Bob's presence
status and is informed of two status changes.

[Message sequence diagram between Alice and Bob: SUBSCRIBE (Bob's Presence
Status); 200 OK; Bob sets his status to "unavailable"; NOTIFY
(status=unavailable); 200; Bob sets his status to "available"; NOTIFY
(status=available); 200.]

Figure 9-28: Subscription to presence status and notification
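The subscription flow in Figure 9-28 can be modeled in a few lines. This is a toy in-memory sketch, not SIP code: the `PresenceNotifier` class and its method names are invented for illustration, a SUBSCRIBE is reduced to registering a callback, and each NOTIFY is reduced to calling it.

```python
class PresenceNotifier:
    """Holds one user's presence state and the callbacks of its subscribers."""

    def __init__(self, user):
        self.user = user
        self.status = "available"
        self.subscribers = []

    def subscribe(self, notify):
        """Model a SUBSCRIBE: remember the callback and answer 200."""
        # A real notifier would also send an immediate NOTIFY carrying the
        # current state; that step is left out here to mirror the figure.
        self.subscribers.append(notify)
        return 200

    def set_status(self, status):
        """Model a state change: one NOTIFY per subscriber."""
        self.status = status
        for notify in self.subscribers:
            notify(self.user, status)


alice_inbox = []                      # NOTIFYs received by Alice
bob = PresenceNotifier("bob")
bob.subscribe(lambda user, status: alice_inbox.append((user, status)))
bob.set_status("unavailable")
bob.set_status("available")
print(alice_inbox)
```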

SIP-T
SIP for Telephones (SIP-T) is a set of practices describing the use of SIP
for interoperability with SS7 PSTN gateways. The main idea behind SIP-T
is that SS7 PSTN gateways can use SIP to set up “calls” and maintain total
PSTN transparency by allowing ISUP to be encapsulated in SIP messages
as a MIME body. SIP-T is thus SIP and ISUP at the same time. It is an
“Inter Call Server” protocol. Tunneling ISUP inside SIP messages has an
advantage over backhauling ISUP independently, as it intrinsically
maintains the association between the SIP session and the PSTN call.
ISUP messages are tunneled in the corresponding SIP messages when possible
(for example, an IAM in an INVITE), and in the INFO method, defined for
this purpose, when no other SIP message is appropriate. SIP-T is well
suited to Carrier equipment migrating from a TDM SS7 architecture to a SIP
IP architecture. As such, SIP-T is largely limited to “Carriers”, as ISUP
is not widely deployed in Enterprises.
Figure 9-29 illustrates the concept of tunneling ISUP in SIP.

[Diagram: two SIP UAs and two PSTN Gateways exchange SIP across the IP
Network; each PSTN Gateway provides ISUP termination and exchanges ISUP
with the PSTN.]

Figure 9-29: SIP-T


Other protocols, such as Q.931, QSIG, or proprietary PBX protocols, can also
be tunneled in SIP using the same mechanism as SIP-T (although it
wouldn't really be called SIP-T).
SIP-T has the advantage that it is an addition to SIP. A UA that does not
understand ISUP (such as an Enterprise SIP Gateway or a SIP phone) could
ignore the tunneled ISUP information and treat the call as a normal SIP
session.
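To make the tunneling idea concrete, the sketch below builds a two-part MIME body carrying both an SDP description and an ISUP message, then shows a receiver that reads only the SDP part and leaves the ISUP part untouched. It is a simplified illustration under several assumptions: the boundary string and sample payloads are invented, the ISUP payload is shown as hexadecimal text although real ISUP is binary, and the Content-Type parameters that a real SIP-T message would carry are omitted.

```python
BOUNDARY = "sipt-boundary"  # hypothetical multipart boundary


def build_sipt_body(sdp, isup_hex):
    """Build a multipart body with an SDP part and a tunneled ISUP part."""
    parts = [("application/sdp", sdp), ("application/ISUP", isup_hex)]
    lines = []
    for content_type, payload in parts:
        lines.append(f"--{BOUNDARY}")
        lines.append(f"Content-Type: {content_type}")
        lines.append("")            # blank line separates headers from payload
        lines.append(payload)
    lines.append(f"--{BOUNDARY}--")  # closing delimiter
    return "\r\n".join(lines)


def split_parts(body):
    """Return {content type: payload} for each part of the multipart body."""
    parts = {}
    for chunk in body.split(f"--{BOUNDARY}"):
        chunk = chunk.strip("\r\n")
        if chunk in ("", "--"):      # preamble or closing delimiter
            continue
        header, _, payload = chunk.partition("\r\n\r\n")
        content_type = header.split(":", 1)[1].strip()
        parts[content_type] = payload
    return parts


body = build_sipt_body("v=0\r\nc=IN IP4 192.168.0.1", "01001020")
parts = split_parts(body)

# A UA that does not understand ISUP simply uses the SDP part.
print(parts["application/sdp"])
```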
However, with tunneled-protocol approaches such as SIP-T, the tunneled
protocol must not conflict with SIP. Great care has to be taken to ensure
consistency. Over time, useful ISUP features are likely to have an
equivalent or superior version defined natively in SIP. SIP-T can thus be
viewed as a “transition” protocol (although with a potentially very long
transition).

Other SIP-related protocols and capabilities


There are numerous other protocols and capabilities related to SIP in
existence or in development.
A mechanism for compressing SIP signaling is defined for environments
where the size of SIP messages is a concern (including for example low
bandwidth wireless interfaces).
Scripting languages such as the Call Processing Language (CPL) can be
used to define services and policies on a SIP proxy or redirection server or
B2BUA.

ENUM allows an ENUM client to perform a DNS query on a domain name derived
from an E.164 telephone number (the digits of the number in reverse order).
The ENUM server then responds with contact information consisting of
Uniform Resource Identifiers (URIs). Since this URI would typically be a
SIP URI, ENUM allows a SIP user to be reached by knowing only a phone
number. This makes it possible not only to bypass the PSTN, but also to use
SIP's advanced multimedia capabilities. With ENUM, an Enterprise could
publish the E.164 numbers that can be directed to its corporate proxy
server, which would allow anybody to reach any of those numbers through
SIP.
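The mapping from a telephone number to the name that an ENUM client looks up in DNS is mechanical: keep only the digits, reverse them, separate them with dots, and append the `e164.arpa` suffix. The short sketch below shows that transformation only; the function name and the sample number are illustrative, and the DNS query itself is not performed.

```python
def enum_domain(e164_number):
    """Convert an E.164 number such as '+1-613-555-0123' to its ENUM domain name."""
    digits = [ch for ch in e164_number if ch.isdigit()]  # drop '+', '-', spaces
    return ".".join(reversed(digits)) + ".e164.arpa"


print(enum_domain("+1-613-555-0123"))
```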
KPML is a stimulus protocol currently being defined that uses the
SUBSCRIBE/NOTIFY mechanism to report key presses on a phone to a third
party for providing services. For example, it would allow a service
provider to offer a “calling card” service based on key presses on a SIP
phone, even if the entity offering the service is not in the media path
(and could therefore not use the in-band RFC 2833 DTMF in the RTP stream).
Other SIP protocols are being defined for Conferencing. SIP is
continuously being built upon.

Comparison of SIP and H.323


H.323 and SIP are two very different protocols. H.323 uses binary ASN.1/PER
encoding, which puts most of the complexity in the encoding and
decoding of the messages, while SIP uses a text representation, which puts
most of the complexity in the parsing. Countless studies have been
published showing why one protocol is superior to the other. Very often,
these studies were based on comparisons with older versions of the
protocols that lacked certain features. Over time, H.323 and SIP fed on
each other, each adopting capabilities of the other protocol that it lacked.
This proved to be a very beneficial process, as it resulted in two more
powerful and robust protocols. It also ensured that interoperability between
the two was feasible, even if it is not necessarily easy. On the efficiency and
performance front, the two protocols are now very similar because H.323
became more streamlined with time and SIP had to add extra protocol
mechanisms to be more usable.
It is sometimes useful to draw the following parallels (see Table 9-3).

H.323                                             SIP

H.225.0                                           SIP
H.245                                             SDP
Endpoint                                          User Agent (UA)
Gatekeeper (for Registration)                     Registrar
Gatekeeper (Direct-routed model)                  Redirect server
Gatekeeper (GK-routed model)                      Proxy Server
Call                                              Session
Registration (RRQ)                                Registration (REGISTER)
Admission Request (ARQ)                           INVITE
SETUP                                             INVITE
CALL PROCEEDING                                   100 Trying
Out-of-band ALERTING                              180 Ringing (no SDP)
In-band ALERTING (Progress Indicator)             183 Session Progress (with SDP)
CONNECT                                           200 OK
RELEASE COMPLETE                                  BYE
H.450 Transfer Request or FACILITY redirection    REFER
FastStart                                         Offer/Answer start in first INVITE
SlowStart                                         Offer/Answer start in response to INVITE
H.245 Capability negotiation                      OPTIONS
Third party pause and re-route (TCS=0)            Re-INVITE with sendonly/inactive
H.245 media channel modifications                 Re-INVITE/UPDATE
H.323 Annex M - QSIG & ISUP Tunneling             SIP-T and ISUP & QSIG Tunneling

Table 9-3: Parallels between the protocols
H.323 is a well-defined protocol suite with roots in both video conferencing
(H.320) and telecommunications (Q.931). These roots make it palatable to
software writers who are familiar with one or the other sector.
SIP is a simpler and more flexible protocol that has its roots in Internet
protocols and, as such, is much more palatable to software writers who
write for the Internet. SIP also makes it easier to integrate with other
software applications than H.323. SIP is a better protocol for simple
applications that can conceivably use a smaller subset of SIP than they
could of H.323.
H.323 includes well-defined procedures for video conferencing, while SIP
is starting to include the protocols necessary for commercial video
conferencing. H.245 is a much more feature-rich protocol than SDP.
H.323 has a larger installed base today. SIP does not have a large installed
base yet, but most of the development dollars are being spent on SIP, and
not H.323.
Today, the overwhelming majority of protocol development taking place at
the Standards level is on SIP and SIP-related protocols. H.323 is essentially
in a maintenance mode, and only minor additions are actively worked on.
The amount of work devoted to SIP is blossoming. The original IETF
MMUSIC Working Group was so swamped with SIP work that it spun off
the SIP Working Group, which itself spun off a SIPPING Working Group
(SIP focuses on the core SIP protocol, while SIPPING focuses on noncore
aspects). There is now a SIMPLE Working Group for SIP Presence and
Instant Messaging, and an XCON Working Group for Centralized Conferencing.
Certain groups are also defining extensions or writing protocols with SIP in
mind, and not H.323. Some examples are Firewall and NAT traversal solutions
in the MIDCOM, MMUSIC and NSIS Working Groups, or telephone-number
related work in the ENUM or IPTEL Working Groups. Other features, such
as Presence and Instant Messaging as defined by the SIMPLE Working
Group, can have “equivalents” in H.323, but are much more appropriate in a
SIP environment. IPv6 transition mechanisms are also more usable by SIP
than H.323.
At the development level, the situation is similar. R&D investments are
pouring into SIP, while H.323 development is more in the established phase.
SIP will certainly become the most pervasive protocol for most people and
most applications, but there will still be a vast number of H.323 systems
out there, and H.323 may well remain dominant in certain markets for the
foreseeable future.

Gateway Control protocols

Figure 9-30: Network topology

Sidebar: Evolution of gateway control protocols


Gateway control protocols have had a long, confusing, and convoluted
history. This complicated evolution is not well understood, nor has it
been well represented in the press. The next few paragraphs clearly
outline the evolution of gateway control protocols and provide a context
for Megaco/H.248.
Beginning in early 1998, several competing gateway control approaches
were proposed by many vendors and researchers. While the need was
clear, there was no consensus on the best approach.
Toward the end of 1998, the IETF formed the MEdia GAteway COntrol
(Megaco) Working Group with the strong mandate to provide an open
standard for IP-based gateway device control, using master/slave
principles. Nortel was, and still is, Chair of this Working Group.
There were several proposals brought forward and discussed, the early
leaders being MDCP (media device control protocol) and MGCP (media
gateway control protocol). Although some early products were deployed
using MGCP, none of the early proposals were accepted in their entirety
by IETF Megaco WG or other open standards bodies.

Instead, key aspects of the MGCP and MDCP proposals, along with
many other inputs, were integrated into the Megaco WG development
stream, choosing the best aspects from each. The end result was that, by
March 1999, the Megaco Protocol was created.
Parallel to the efforts of the IETF, the ITU-T was evaluating a number of
options, and in May 1999, the ITU-T Study Group 16 (SG-16) initiated
an H-series gateway control protocol project, at that time called H.GCP,
and later designated H.248.
The IETF and ITU SG-16 began to work on a compromise approach
between the Megaco Protocol and H.GCP. In the summer of 1999, an
agreement was reached between the two organizations to create one
international standard—Megaco/H.248 Protocol was born. During the
following year, considerable effort was expended to perfect the solution,
and Megaco/H.248 Protocol was approved by both standards bodies in
June 2000.
While the IETF and ITU SG-16 were developing their compromise
approach, the IETF was also asked by MGCP supporters to release a
Request for Comments document on MGCP, for information purposes
(Informational RFC). In October 1999, the document was released, but it
is not an “official” IETF standard, nor was it accepted by the ITU-T
SG-16. Although currently supported by the International Softswitch
Consortium (ISC), MGCP has failed to achieve consensus in the open
standards bodies and is therefore not a true, open standard.
The PacketCable* NCS protocol also evolved out of the MGCP
approach, and NCS1.0 became the de facto standard for analog phone
control behind cable modem networks. In the first quarter of 2000,
PacketCable was successful in having the NCS specification accepted by
the ITU-T SG-9, which saw its transition from de facto standard to
legitimate standard for analog telephone control behind cable access, as
Recommendation J.162.
Megaco/H.248 remains the only truly open international standard for
master/slave control of all media gateway device types. Megaco/H.248
continues to be further improved and developed over time, with ongoing
efforts in both IETF and ITU-T as well as other standards organizations.

Megaco/H.248 overview
As shown in Figure 9-31, the architecture of Megaco/H.248 is based on the
media gateway control (MGC) layer, the media gateway (MG) layer, and
the Megaco/H.248 Protocol itself.

Figure 9-31: Megaco / H.248 gateway control architecture


The MGC layer contains all call control intelligence and implements call
level features such as forward, transfer, conference, and hold. This layer
also implements any peer-level protocols for interaction with other MGCs
or peer entities, manages all feature interactions, and manages any
interactions with signaling, such as SS7.
The MG layer implements media connections to and from the packet based
network (IP or ATM), interacts with those media connections and streams
through application of signals and events, and also controls gateway device
features, such as user interface. This layer has no knowledge of call level
features and acts as a simple slave, but it is the Media Gateway that
establishes the end to end connection, not the Media Gateway Controller.
The Megaco/H.248 Protocol drives master/slave control of the MG by the
MGC. It provides connection control, device control, and device
configuration. Because the Megaco/H.248 Protocol is separate from, and
independent of the peer call control protocol (for example, SIP or H.323),
different systems can be used at the call control level with minimal cost
impact on the gateway control layer.

Call walk through


Figure 9-32 illustrates a simplified message flow for the Megaco protocol.
The Media Gateway detects the off-hook condition and advises the Media
Gateway Controller (MGC). The MGC instructs the Media Gateway to
apply dial tone and to collect digits for the dialed number. The MGC then
advises the far end (Service Data Point) how to set up the call. While
waiting for the far end to answer, the MGC will receive and pass on the
Service Data Point information needed to establish the call and will signal
the Media Gateway that audible ringing has been established. When the far
end answers, the MGC will be notified and will pass this on to the Media
Gateway to establish the two-way talk path.

[Message sequence diagram between the Media Gateway and the Media Gateway
Controller: Lift Handset; Notify: "Off Hook"; Req Notify: "dial tone";
Dial Tone; Dial Digits; Notify: "digits"; Create Connection; Ack:
Connection Service Data Point Info; Call Setup to far end (multiple
protocol options); Modify Connection: Far end Service Data Point info,
signal ringing; Audible Ringing; Far End Answers; Modify Connection:
Send-Receive; IP Message Path and Talk Path; Hang up; Notify: "on hook";
Delete Connection.]

Figure 9-32: Megaco / H.248 Call walk through
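The master/slave split in this walk-through can be pictured as a lookup from events reported by the Media Gateway to the instructions the Media Gateway Controller sends back. The sketch below is only a reading aid: the event names and instruction strings are labels borrowed from Figure 9-32 (the "connection ack" and "far end ringing" keys are invented shorthand), not protocol syntax, and a real MGC keeps far more state per call.

```python
# Hypothetical labels: MG event -> instructions the MGC issues in response.
MGC_REACTIONS = {
    "off hook": ["Req Notify: dial tone"],
    "digits": ["Create Connection"],
    "connection ack": ["Call Setup to far end"],
    "far end ringing": ["Modify Connection: far-end info, signal ringing"],
    "far end answers": ["Modify Connection: Send-Receive"],
    "on hook": ["Delete Connection"],
}


def mgc_react(event):
    """Return the instructions for one event; unknown events yield none."""
    return MGC_REACTIONS.get(event, [])


def walk_call(events):
    """Replay a list of events and collect (event, instruction) pairs."""
    log = []
    for event in events:
        for instruction in mgc_react(event):
            log.append((event, instruction))
    return log


call = ["off hook", "digits", "connection ack",
        "far end ringing", "far end answers", "on hook"]
for event, instruction in walk_call(call):
    print(f"{event:16} -> {instruction}")
```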

The Megaco/H.248 protocol in more detail


Megaco/H.248 uses a connection/resource model to describe the logical
entities or objects within the Media Gateway (MG) that can be controlled
by the Media Gateway Controller (MGC). It is fundamentally based on two
key concepts, termination and context.

Terminations
Terminations identify media flows or resources, implement signals,
generate events, have properties and maintain statistics. They can be
permanent (provisioned) or transient (ephemeral). All signals, events,
properties, and statistics are defined in packages that are associated with
the individual terminations.

Contexts
As shown in Figure 9-33, context (C) refers to associations between
collections of terminations (T), defines the communication between the
terminations, and acts as a mixing bridge. A context can contain more than
one termination and can be layered to support multimedia.

Figure 9-33: Megaco / H.248 connection / resource model


The command structure of Megaco/H.248 has only seven commands (counting
AuditValue and AuditCapabilities together as Audit): Add, Subtract,
Modify, Move, Notify, AuditValue/AuditCapabilities, and ServiceChange.
One level of event/signal embedding is allowed, such that trigger events
can cause reflex actions in the MG.
Commands between the MGC and the MG are grouped together into
transactions using simple construction rules that result in small messaging
overhead. Commands use descriptors to group related data elements. Only
those descriptors needed for the particular intended operations need to be
sent along with commands.
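The relationship between contexts and terminations, and the effect of the Add, Subtract and Move commands on it, can be sketched as a small in-memory model. The class and termination names below are invented for illustration; the model assumes that an Add without a context creates a new one and that a context disappears when its last termination leaves, and it ignores descriptors, transactions and media handling entirely.

```python
class MediaGatewayModel:
    """Toy model of contexts as sets of termination names."""

    def __init__(self):
        self.contexts = {}        # context id -> set of termination names
        self._next_context = 1

    def add(self, termination, context_id=None):
        """Add a termination; with no context given, create a new context."""
        if context_id is None:
            context_id = self._next_context
            self._next_context += 1
            self.contexts[context_id] = set()
        self.contexts[context_id].add(termination)
        return context_id

    def subtract(self, termination, context_id):
        """Remove a termination; an emptied context is deleted."""
        members = self.contexts[context_id]
        members.discard(termination)
        if not members:
            del self.contexts[context_id]

    def move(self, termination, from_context, to_context):
        """Move a termination from one existing context to another."""
        self.contexts[to_context].add(termination)
        self.subtract(termination, from_context)


mg = MediaGatewayModel()
call = mg.add("line/1")           # analog line joins a new context
mg.add("rtp/1", call)             # RTP stream is bridged with it
print(sorted(mg.contexts[call]))
mg.subtract("line/1", call)
mg.subtract("rtp/1", call)        # last termination leaves: context is gone
print(call in mg.contexts)
```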

Megaco/H.248: connection model


Based on two key concepts: Termination and Context
Terminations represent media connections to/from the packet
network, allow signals to be applied to the media connections, and
events to be received from the media connections
Contexts implement bridging and mixing of the media flows
between terminations
Only seven commands: Add, Subtract, Move, Modify, Notify,
Audit, ServiceChange
Grouping of commands into transactions, using flexible
construction rules
Commands use descriptors to group related data elements
Package extension mechanism provides a clear, simple and openly
extensible method to specify signals, events, properties and
statistics on terminations
Profile interoperability mechanism defines MG organization and
specific selection of optional elements for particular applications

Sidebar: Terminations
All signals and events are assumed to occur at a specific termination and
they provide a mechanism for interacting with the remote entity
represented by that termination. Specific signals and events are defined
in packages. Examples of signals include tone generation, playing of
announcements, and the display of caller identity. Examples of events
include line off-hook, DTMF digit received, and fax tone detected.
Properties are defined within the Megaco/H.248 Protocol in two ways.
The term can be assigned to any piece of information that may be placed
into a descriptor in either a request or a response. The term can also
apply to package definition where properties act as state, configuration,
or other semistatic information regarding the termination to which the
package is attached.
Statistics can be accumulated at particular terminations and returned
from the Media Gateway (MG) to the Media Gateway Controller (MGC)
to provide information relevant to monitoring of the MG, network
performance or user activity. Statistics are also defined in packages.
Examples of statistics include the number of bytes sent and received while
in a context, the duration of a termination in a context, packet loss rate
and other operational measurements.

Sidebar: Packages and profiles


Packages are the primary extension mechanism within Megaco/H.248.
Packages define new termination behavior by way of additional
properties, events, signals and statistics. Packages use the Internet
Assigned Numbers Authority (IANA*) registration processes. New
packages can be defined based on existing packages (“extends”
mechanism, version mechanism).
Profiles define applications of Megaco/H.248 at the MG, including
package/termination organization and requirements, specific selections
of optional elements (such as transport and encoding), and any other
behavior definition required for the application. Profiles are application
level agreements between the MGC and MG that specify minimum
capabilities for improved interoperability and reduced complexity/cost.

What you should have learned


There are distinctions between real-time transport protocols (such as
RTP/RTCP), peer-to-peer call/session control protocols such as H.323 and
SIP, and gateway control protocols such as MEGACO/H.248, MGCP, and
NCS/J.162. Each has distinct advantages in a converged architecture.
It is paramount that you can clearly articulate call flows when designing for
real-time applications. H.225.0 (RAS) is used for communications between
an endpoint and a Gatekeeper, or between Gatekeepers. RAS performs
registration, address translation, admission control, bandwidth control and
zone management. Following admission control, H.225.0 (Q.931) is used
for establishing a call between endpoints. A gatekeeper-routed model, or a
direct-routed model can be used. After the call is established, an H.245
connection is established. This allows for Terminal Capability Negotiation,
Master/Slave Determination, and eventually set up of logical channels for
media (typically on RTP). FastStart is a popular mechanism for performing
the opening of the media channel in the H.225.0 (Q.931) phase without
resorting to a separate H.245 channel. During a call, H.245 procedures,
such as third-party pause and rerouting can be used to modify the
characteristics of the media streams.
A SIP User Agent may make its presence known to a SIP network by
registering with a Registrar. A SIP session is established by a SIP User
Agent sending an INVITE message, which is routed by SIP proxy or
redirect servers to another User Agent. B2BUAs can be used by Application
Servers for implementing special services. The User Agent receiving the
invitation may accept it with a 200 OK response. In order to establish
media sessions, the Offer/Answer model is used with the Session
Description Protocol included as a MIME body in the SIP messages. After
a session is established, media can be renegotiated using re-invitation and
updates. Reliable provisional responses can be used for things such as
providing early media. REFER allows for requesting that a User
Agent contact another resource, for example, another SIP User Agent,
thereby performing a transfer service. Subscription and notification can be
used to report events to a third party, enabling important features such as
Presence, part of SIMPLE, which also includes Instant Messaging.
Gateway Control protocols include H.248/MEGACO, MGCP, NCS and
J.162. The gateway control protocol runs between the Media Gateway
Controller and the Media Gateway. The call control intelligence resides in
the MGC. The media connection is on the MG. The MG is a slave device
while the MGC is a master device.
MEGACO/H.248 uses “terminations” to identify media flows or resources,
implement signals, generate events, have properties and maintain
statistics. A “Context” refers to associations between collections of
terminations, defines the communication between the terminations, and
acts as a mixing bridge. MEGACO/H.248 has seven commands: Add,
Subtract, Move, Modify, Notify, Audit and ServiceChange. Packages
provide extensibility, and Profiles provide interoperability options.

References
ITU-T Recommendation H.225.0v4, Call Signalling Protocols and Media
Stream Packetization for Packet Based Multimedia Communications
Systems, International Telecommunication Union Telecommunication
Standardization Sector (ITU-T), 2000
ITU-T Recommendation H.245v7, Control Protocol for Multimedia
Communication, ITU-T, 2000
ITU-T Recommendation H.323v4, Packet Based Multimedia
Communications Systems, ITU-T, 2000
RFC 2833, “RTP Payload for DTMF Digits, Telephony Tones and
Telephony Signals,” IETF, http://www.ietf.org/rfc/rfc2833.txt
RFC 3551, “RTP Profile for Audio and Video Conferences with Minimal
Control,” IETF, ftp://ftp.rfc-editor.org/in-notes/rfc3551.txt
RFC 3550, “RTP: A Transport Protocol for Real-Time Applications,”
IETF, ftp://ftp.rfc-editor.org/in-notes/rfc3550.txt
RFC 2326, “Real Time Streaming Protocol (RTSP),” IETF,
ftp://ftp.rfc-editor.org/in-notes/rfc2326.txt
The tel URI for Telephone Calls, IETF,
http://www.ietf.org/internet-drafts/draft-ietf-iptel-rfc2806bis-02.txt
RFC 3261, “SIP: Session Initiation Protocol,” IETF,
http://www.ietf.org/rfc/rfc3261.txt
RFC 3515, “The Session Initiation Protocol (SIP) Refer method,” IETF,
http://www.ietf.org/rfc/rfc3515.txt
RFC 3264, “An Offer/Answer Model with the Session Description Protocol
(SDP),” IETF, http://www.ietf.org/rfc/rfc3264.txt
RFC 3265, “Session Initiation Protocol (SIP) – Specific Event
Notification,” IETF, http://www.ietf.org/rfc/rfc3265.txt
RFC 2976, “The SIP INFO Method,” IETF,
http://www.ietf.org/rfc/rfc2976.txt
RFC 3262, “Reliability of Provisional Responses in the Session Initiation
Protocol (SIP),” IETF, http://www.ietf.org/rfc/rfc3262.txt
RFC 3311, “The Session Initiation Protocol (SIP) UPDATE Method,”
IETF, http://www.ietf.org/rfc/rfc3311.txt
IETF MMUSIC Working Group, IETF,
http://www.ietf.org/html.charters/mmusic-charter.html
IETF SIP Working Group,
http://www.ietf.org/html.charters/sip-charter.html
IETF SIPPING Working Group,
http://www.ietf.org/html.charters/sipping-charter.html
IETF XCON Working Group,
http://www.ietf.org/html.charters/xcon-charter.html
IETF SIMPLE Working Group,
http://www.ietf.org/html.charters/simple-charter.html

Chapter 10
QoS Mechanisms
Ralph Santitoro

[Protocol stack diagram, viewed from the real-time application perspective.
Media related: audio, video and voice codecs over RTP, with RTCP and RTSP
for control. Session related: session control (H.323, SIP) and gateway
control (H.248 / MGCP / NCS). Transport: UDP / TCP / SCTP over IP, with QoS
and resiliency. Below IP: MPLS, packet cable, AAL1/2, AAL5, ATM, FR,
Ethernet, DOCSIS; xDSL, cellular, fiber, copper, WLAN, HFC; SONET / TDM.]

Figure 10-1: Transport path diagram

Concepts Covered
Network convergence
Comparison of voice application over TDM and IP packet networks
Convergence drives the need for QoS mechanisms in packet
networks
Implementing QoS mechanisms versus adding bandwidth
Overview of QoS mechanisms – classifier, meter, marker, dropper,
shaper, scheduler
DiffServ QoS architecture
TOS and DiffServ Field Definitions and their importance in
determining the IP QoS a packet should receive.
Overview of DiffServ PHB groups, including the Expedited
Forwarding (EF) PHB, Assured Forwarding (AF) PHB, the Class
Selector (CS) PHB group and the Default Forwarding (DF) PHB.
DiffServ configuration considerations when DiffServ is
implemented differently on different products in the network.
Overview of Ethernet IEEE 802.1Q and the definition of the VLAN
ID and 802.1p user priority bits
Other simple forms of QoS (port prioritization and IP address
prioritization) that work for single application hosts
Packet fragmentation and interleaving to reduce voice packet jitter
introduced by large data packets

Introduction
Quality of Service (QoS¹) is a broad term used to describe the treatment an
application's traffic receives from the network. Quality of Service involves
a broad range of technologies, architecture, and protocols. Network
operators achieve end-to-end QoS by ensuring that network elements apply
consistent treatment to traffic flows as they traverse the network.
Today, network traffic is highly diverse and each traffic type has unique
requirements in terms of bandwidth, delay, loss and availability. With the
explosive growth of the Internet, most network traffic today is IP-based.
Having a single end-to-end transport protocol is beneficial because
networking equipment becomes less complex to maintain, resulting in
lower operational costs. This benefit, however, is countered by the fact that
IP is a connectionless protocol, that is, IP packets may take different paths
as they traverse the network from source to destination. This can result in
variable and unpredictable delay in a best-effort IP network.
IP was originally designed to deliver a packet to its destination on a
best-effort basis, with little consideration for the amount of time it
takes to get there. IP networks must now support many different types of
applications. Real-time applications, such as voice and video, require low
latency (delay) and low loss. Otherwise, the end-user quality may be
significantly affected or, in some cases, the application simply does not
function at all.
Consider a voice application. Voice applications originated on public
telephone networks using Time Division Multiplexing (TDM) technology,
which has a very deterministic behavior. On TDM networks, the voice
traffic experienced a low and fixed amount of delay with essentially no
loss. Voice applications require this type of behavior to function properly.
Voice applications also require this same level of “TDM voice” quality to
meet user expectations.

1. QoS typically deals with the measurement of parameters associated with a specific treatment. For a long
time, Quality of Service was also used to indicate the overall experience of the user or application.
However, because the objective assurance of meeting a specific parameter can sometimes result in
different levels of overall quality, the term Quality of Experience (QoE) is now used to indicate the
overall experience. For example, meeting a specific delay or jitter objective on a network might be
thought of as QoS, while QoE would deal with the user's perception of voice quality on a network
with that amount of delay and jitter.

Take this “TDM voice” application and now transport it over a best-effort
IP network. The best-effort IP network introduces a variable and
unpredictable amount of delay to the voice packets and also drops voice
packets at network congestion points. Because of this unpredictability, the
best-effort IP network does not provide the behavior that the voice
application requires. QoS mechanisms can be applied to the best-effort IP
network to make it capable of supporting VoIP with acceptable, consistent
and predictable voice quality.
IP networks are not single purpose, that is, they are not used to exclusively
carry one type of traffic such as voice (VoIP). They also transport other
types of traffic, such as best effort Internet traffic, network Operations,
Administration and Management (OAM) traffic and other applications
requiring better than best effort treatment. When supporting voice (VoIP)
over an IP network, QoS mechanisms ensure that VoIP packets do not
receive excessive delay, delay variation or loss that will result in
unacceptable voice quality to the end user. The voice application, originally
designed for the TDM network, still has the same requirements and user
expectations that must also be met over the IP network with the help of
QoS mechanisms.

QoS and Network Convergence


Since the early 1990s, there has been a movement towards network
convergence, that is, transporting all services over the same network
infrastructure. Traditionally, there were separate networks for each type of
application. However, many of these networks are being consolidated
(converged) over IP networks to reduce operational costs or improve profit
margins by enabling new applications or services.
Not too long ago, an Enterprise may have had a private TDM-based voice
network, an IP network to the Internet, an ISDN video conferencing
network, an SNA network and a multiprotocol (for example, IPX and
AppleTalk*) LAN. Similarly, a service provider may have a TDM-based
voice network, an ATM metro and SONET backbone network, and a frame
relay, TDM or ISDN access network.
Today, all data networks are converging on IP transport because the
applications have migrated towards being IP-based. The TDM-based voice
networks have also begun moving towards IP. When the different
applications had dedicated networks, QoS technologies played a smaller
role because the traffic was similar in behavior and the dedicated networks
were fine-tuned to meet the required behavior of the particular application.

With the converged network, different types of traffic are mixed and each
application often has very different performance requirements. These
different traffic types often interact unfavorably. For example, a voice
application expects to experience essentially no packet loss and a minimal,
constant amount of packet delay. The voice application operates in a
steady-state fashion with voice channels (or packets) being transmitted in
fixed time intervals. The voice application receives this performance level
when it operates over a TDM network. Now take the voice application and
run it over a best-effort IP network as Voice over IP (VoIP). The best-effort
IP network has varying amounts of packet loss and delay caused by the
amount of network congestion at any given point in time. The best-effort IP
network provides almost exactly the opposite performance required by the
voice application. Therefore, QoS technologies play a crucial role to ensure
that diverse applications can be properly supported in a converged IP
network.

Overview of QoS Mechanisms


The fundamental QoS mechanisms implemented in routers consist of:
Classifier
Marker
Policer
Shaper
Scheduler
Queue Management
Each of these mechanisms has a different effect on the QoS a packet
receives. Each is described in more detail below.

Classifier
Packets entering an interface are classified based on some filtering criteria
specified by local or network-wide QoS policy. This is done to properly
identify the application for subsequent marking with the appropriate class
of service identifier (CoS marking), after which, the packets are sent to a
rate enforcer (policer). Classifiers may filter based on OSI Layers 2-7
information, although routers most commonly support classification based
on OSI Layers 2-4 criteria.
The classifier is also useful for security purposes. By applying multiple
filters in combination, for example Layer 2, 3 and 4 filters, one can
improve the likelihood that the classified traffic does not come from an
unauthorized application or user attempting to get better QoS than
permitted by the network’s application or user QoS policies. Real-time
applications initiated from fixed function hosts, such as voice gateways,
are the simplest to classify because their IP addresses are static and
rarely changed. Mobile or easily movable devices, for example IP phones,
require more complex classification and authentication techniques since
their addresses are typically dynamically assigned.
Finally, many real-time applications use dynamically assigned port
numbers, so care must be taken to select the right combination of filters to
properly identify the application. It is important to properly classify real-
time packets so they are marked properly. This ensures that the routers in
the network provide them with the appropriate QoS treatment.

Real-Time Packet Classification


Real-time packets can be classified based on Layers 2-7 information. All
routers can classify based on the following Layers 3-4 information in the
IP and transport (TCP/UDP) headers:
Source Address
Destination Address
TOS Field
Source Port Number
Destination Port Number
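
The classification fields listed above can be combined into filters. The following is a minimal sketch in Python, not any router's actual implementation: the `Packet` and `Filter` types, the class of service names, the 10.10.10.0/24 voice subnet and the UDP port range are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Sequence


@dataclass(frozen=True)
class Packet:
    src: str        # source IP address
    dst: str        # destination IP address
    tos: int        # 8-bit TOS/DiffServ byte
    src_port: int   # transport-layer source port
    dst_port: int   # transport-layer destination port


@dataclass(frozen=True)
class Filter:
    """One classification rule; a field left as None matches anything."""
    cos: str
    src_net: Optional[str] = None
    dst_net: Optional[str] = None
    tos: Optional[int] = None
    src_ports: Optional[range] = None
    dst_ports: Optional[range] = None

    def matches(self, p: Packet) -> bool:
        checks = [
            self.src_net is None or ip_address(p.src) in ip_network(self.src_net),
            self.dst_net is None or ip_address(p.dst) in ip_network(self.dst_net),
            self.tos is None or p.tos == self.tos,
            self.src_ports is None or p.src_port in self.src_ports,
            self.dst_ports is None or p.dst_port in self.dst_ports,
        ]
        return all(checks)


def classify(p: Packet, filters: Sequence[Filter], default: str = "best-effort") -> str:
    """Return the class of service of the first matching filter."""
    for f in filters:
        if f.matches(p):
            return f.cos
    return default


# Hypothetical policy: a voice gateway subnet plus a port range, and
# traffic already carrying a TOS byte of 0xC0.
FILTERS = [
    Filter(cos="voice", src_net="10.10.10.0/24", dst_ports=range(16384, 32768)),
    Filter(cos="network-control", tos=0xC0),
]
```

Combining an address filter with a port filter, as in the first rule, reflects the point made earlier that multiple filters reduce the chance of an unauthorized host obtaining better QoS than intended.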

Marker
Once IP packets are classified, they are marked to indicate the class of
service to which they belong. This marking is done in the DiffServ/TOS
field in the IPv4 packet header and the Traffic Class octet in the IPv6
header. This marking is important because it indicates to routers how
the packets should be treated across the network. If a real-time packet is
marked improperly, a router may introduce higher delay or jitter, or drop
the packet (packet loss), under different network operating conditions.
This field was originally defined as the Type of Service (TOS) field. In
late 1998 (RFC 2474 and RFC 2475), the Internet Engineering Task Force
(IETF) published a new QoS architecture called IP Differentiated Services
(DiffServ) and redefined the TOS field as the DiffServ Field. Since the TOS
field definition has changed several times over the years, there is much
confusion surrounding this field's definition, so a bit of history is warranted.

Old TOS Field Definition


The older TOS definition in RFC 1349 consists of two subfields, namely,
the 3-bit IP Precedence field and the 4-bit TOS field. The least significant
bit of the field must be set to zero. Figure 10-2 illustrates the different
subfields.

Figure 10-2: Old TOS field definition


Of these two subfields, the TOS field was rarely used and network
deployments used only the IP Precedence field. Refer to Figure 10-3.
Legacy routers supporting IP QoS typically implement only the IP
Precedence field. The three IP Precedence bits result in eight possible
classes of service. The TOS terminology has unfortunately been widely
misused in the industry: when people refer to TOS, for example, a
network or router supporting “TOS”, they usually mean IP Precedence.

Figure 10-3: IP Precedence in TOS field

DiffServ Field Definition


The DiffServ field uses the same TOS field (byte) location in the IP packet
header, but the definition of the bits and their purpose are now quite
different. The six most significant bits (MSBs) are referred to as the
DiffServ Codepoint (DSCP) and the two least significant bits (LSBs) are
used for explicit congestion notification (ECN, RFC 3168). When ECN is
not in use, the two LSBs are set to zero and routers ignore them. ECN
enables a router that is experiencing congestion to mark packets from
ECN-capable senders instead of dropping them. The receiver echoes the
mark back to the sender, which can then reduce its transmission rate.
ECN is similar in purpose to the BECN (backward ECN) bit used in frame
relay networks.
The value of the DSCP determines the standard DiffServ class to which the
traffic belongs and the type of treatment, called a DiffServ per hop
behavior (PHB), the traffic will receive. Out of 64 possible values for the
DSCP, only 32 are currently defined for public use while the other 32 are
defined for local or experimental use. Figure 10-4 shows the new TOS
field.

Figure 10-4: DiffServ - new definition of TOS field


Twenty-one of the 32 DSCPs have been standardized by the IETF, leaving
eleven DSCPs that can be used for user-defined purposes.
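
The two readings of the same header byte can be made concrete with a few bit operations. This is a minimal sketch in Python; the function names are illustrative only and do not come from any particular product or library.

```python
def ip_precedence(tos_byte: int) -> int:
    """Old definition: the three most significant bits."""
    return (tos_byte >> 5) & 0x7


def old_tos_bits(tos_byte: int) -> int:
    """Old definition: the four TOS bits that follow IP Precedence."""
    return (tos_byte >> 1) & 0xF


def dscp(ds_byte: int) -> int:
    """DiffServ definition: the six most significant bits."""
    return (ds_byte >> 2) & 0x3F


def ecn(ds_byte: int) -> int:
    """DiffServ definition: the two least significant bits."""
    return ds_byte & 0x3


# The byte '10111000' read both ways.
value = 0b10111000
print(ip_precedence(value), old_tos_bits(value), dscp(value), ecn(value))
```

A legacy router that looks only at IP Precedence sees the top three bits of the DSCP, which is why the same byte can be interpreted, with less precision, by older equipment.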

Policer (Meter/Remarker/Dropper)
Once the incoming packets are classified, the policer uses a configured rate
and burst size to determine which packets are conformant and which are
not. Depending upon the implementation, a router may have a single rate or
dual rate policer or a combination of both. In a single rate implementation,
there is a committed information rate (CIR), and CIR-conformant
packets are assured delivery. In a dual rate implementation, there is a CIR
and an excess information rate (EIR) that determines the amount of excess
(non-CIR-conformant) traffic allowed into the network. The dual rate
policer is used for traffic that varies in packet size or arrival time, that is,
bursty traffic. The single rate policer is often used for applications that
transmit at regular intervals, for example, VoIP.
If incoming packets are not CIR conformant, they can be either remarked
to indicate higher drop precedence for a dual rate policer with a non-zero
EIR or dropped outright for a single rate policer or dual rate policer with
EIR set to zero. Since voice traffic is sent at a constant rate, a single rate
policer is sufficient. Video traffic can use either a single or dual rate policer,
since some video applications send packets at a constant rate while others
send packets at a variable rate.
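
One common way to build such a policer is with token buckets. The sketch below is written in Python under that assumption; the text does not say how any given router implements its policer. The class names, the burst parameters `cbs` and `ebs`, and the result strings are hypothetical. Time is passed in explicitly so that the behavior is easy to follow.

```python
class TokenBucket:
    """Tokens are bytes; they refill at the configured rate up to the burst size."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate = rate_bps / 8.0          # bytes per second
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)    # start with a full bucket
        self.last = 0.0                     # time of the last refill, in seconds

    def take(self, size: int, now: float) -> bool:
        # Refill for the elapsed time, then try to pay for the packet.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


class Policer:
    """Single rate when eir_bps is 0, dual rate otherwise."""

    def __init__(self, cir_bps: float, cbs: int, eir_bps: float = 0, ebs: int = 0):
        self.committed = TokenBucket(cir_bps, cbs)
        self.excess = TokenBucket(eir_bps, ebs) if eir_bps > 0 else None

    def police(self, size: int, now: float) -> str:
        if self.committed.take(size, now):
            return "conform"                # within CIR
        if self.excess is not None and self.excess.take(size, now):
            return "remark"                 # within EIR: raise drop precedence
        return "drop"                       # beyond CIR (and EIR, if any)


# Single rate policer: 8 kbps CIR with a 500 byte burst.
voice = Policer(cir_bps=8000, cbs=500)
print([voice.police(500, t) for t in (0.0, 0.0, 0.5)])
```

With EIR set to zero the second bucket is absent, so non-conformant packets are dropped outright, matching the single rate behavior described above.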

Shaper
A shaper provides a smoothing effect on bursty traffic so it can be delivered
more efficiently over lower speed interfaces. The policer provides some
shaping (buffering) and some routers implement a secondary shaper,
especially over WAN interfaces. Shaping effectively buffers (delays)
packets before they are sent. Therefore, the delay added by shaping must be
accounted for in the end-to-end delay budget for the real-time application.
Shaping is generally not recommended for real-time applications.

Scheduler
The scheduler determines how queued packets are transmitted out an
interface. There are two classes of schedulers, namely, priority schedulers
(priority queuing) and weighted schedulers. Priority schedulers always
service the highest priority queue that holds packets and continue
transmitting from it until it is empty, resulting in the least amount of
packet delay for that queue. Weighted schedulers transmit packets based on
an assigned weight.
For example, the weight could indicate a percentage of time a queue is
emptied before the next queue is serviced. There are many forms of
weighted schedulers, for example, Weighted Fair Queuing (WFQ) and
Weighted Round Robin (WRR), as well as variants of these, for example,
Deficit WRR (DWRR) and Class Based WFQ (CBWFQ).
Voice applications should use a priority scheduler. Video applications
could use a priority scheduler. However, some weighted schedulers may
also be able to support video applications. Schedulers have a direct impact
on packet delay and jitter.
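
The difference between the two scheduler classes can be sketched in a few lines of Python. This is a simplified illustration that drains queues whose contents are already known; real schedulers run continuously as packets arrive, and the weights here count packets rather than bytes or time. The function names and packet labels are hypothetical.

```python
from collections import deque
from typing import Deque, Iterator, List, Sequence


def strict_priority(queues: List[Deque[str]]) -> Iterator[str]:
    """queues[0] is the highest priority; lower queues wait until it is empty."""
    while any(queues):
        for q in queues:
            if q:
                yield q.popleft()
                break   # restart from the highest priority queue


def weighted_round_robin(queues: List[Deque[str]], weights: Sequence[int]) -> Iterator[str]:
    """Each queue may send up to its weight in packets per round."""
    while any(queues):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if not q:
                    break
                yield q.popleft()


voice = deque(["v1", "v2"])
data = deque(["d1", "d2"])
print(list(strict_priority([voice, data])))
```

Under strict priority every voice packet leaves before any data packet, which is why it gives the lowest delay to the priority queue and also why that queue is normally policed.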

Queue Management
Queue management determines how queued packets are handled as the
number of packets in a queue increases. The queue becomes fuller as more
packets from multiple sources traverse the same interface. There are two
basic forms of queue management, namely, tail drop and active queue
management (AQM). Tail drop simply drops arriving packets when the
buffer is full (or when a provisioned buffer depth is exceeded). AQM
randomly drops discard eligible packets when one or more buffer depths
are exceeded. Examples of AQM are random early discard (RED),
weighted RED (WRED) and multilevel RED (MRED).
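
The contrast between tail drop and RED-style AQM can be sketched as follows. This Python example uses the commonly described linear RED ramp in simplified form; the threshold names `min_th` and `max_th`, the maximum probability `max_p`, and the sample `u` are illustrative parameters, not values from the text.

```python
def tail_drop(depth: int, limit: int) -> bool:
    """Drop an arriving packet only when the queue is already at its limit."""
    return depth >= limit


def red_drop_probability(avg_depth: float, min_th: float, max_th: float, max_p: float) -> float:
    """Simplified RED: no drops below min_th, a linear ramp up to max_p
    between the thresholds, and every packet dropped at or above max_th."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)


def red_should_drop(avg_depth: float, min_th: float, max_th: float, max_p: float, u: float) -> bool:
    """u is a uniform random sample in [0, 1), passed in by the caller."""
    return u < red_drop_probability(avg_depth, min_th, max_th, max_p)


for depth in (10, 30, 40):
    print(depth, red_drop_probability(depth, min_th=20, max_th=40, max_p=0.1))
```

A weighted variant would keep a separate set of thresholds per drop precedence, so that discard eligible packets start being dropped at a shallower queue depth.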
Queue management has an effect on packet loss. AQM methods, in
general, are not recommended for real-time applications unless the
application can detect packet loss and readjust its transmission rate. For
example, some video applications can detect packet loss and switch to a
lower bit rate codec. When packet loss is no longer detected, the video
application can then switch back to the higher bit rate (higher quality)
codec. AQM must never be used with voice applications.
As you can see, there are many QoS mechanisms that can be used and each
has a different impact on delay, jitter and loss. Each QoS mechanism must
be tailored to the real-time application. The following sections cover this
in more detail.

Implementing QoS Mechanisms versus Adding Bandwidth


In some cases, it is possible to add sufficient bandwidth such that network
elements and connections rarely become overutilized (congested). Without
any network congestion, packets pass through the network unimpaired by
packet loss or excessive delays. Unfortunately, bandwidth discontinuities in
the network, for example, LAN to MAN/WAN, are potential congestion
points resulting in variable and unpredictable QoS. It is often impractical
(unaffordable) to add bandwidth to a network due to the additional OpEx
(monthly charge for WAN bandwidth) or CapEx (new hardware to support
higher bandwidth) that is not incurred when implementing QoS
mechanisms.

DiffServ QoS Architecture


DiffServ is a network architecture that defines different nodal forwarding
classes, their identification and their performance. These are called
DiffServ Per-Hop Behaviors (PHBs). Each PHB is identified by one or
more IETF-standardized DSCPs. The next sections describe each of the
DiffServ PHBs and PHB groups and their usage.

Class Selector DiffServ PHB Group


The Class Selector (CS) DiffServ PHB group consists of eight classes and
uses the same bit positions as the IP Precedence field in the older TOS
definition. The CS PHB group was defined to provide some level of
backward compatibility with legacy routers. The CS PHB group specifies
that there must be at least two different forwarding classes provided for
packets marked with any of the eight DSCP values. Historically, CS7
(‘111000’) and CS6 (‘110000’) marked packets are used for network
control applications and the CS0 (‘000000’) marked packets are used for
“best effort” applications. The remaining six CS DSCPs may ‘inherit’ the
PHB group performance specified in other DiffServ PHBs (described later)
or create custom forwarding behaviors. The CS DSCPs are often used to
mark signaling or control traffic for real-time applications.
The CS PHB group may be implemented using a priority or weighted
scheduler with tail drop queue management. Traffic is policed to the CIR
and excess traffic may be dropped or shaped.
Table 10-1 lists the Class Selector PHB group DSCP values.


CS Code Point Name    CS Code Point Value (in binary)
CS7                   '111000'
CS6                   '110000'
CS5                   '101000'
CS4                   '100000'
CS3                   '011000'
CS2                   '010000'
CS1                   '001000'
CS0                   '000000'
Table 10-1: Class Selector PHB group DSCP values
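
The pattern in Table 10-1 follows directly from the Class Selector codepoints occupying the old IP Precedence bit positions: each CS DSCP is the 3-bit class number followed by three zero bits. A minimal Python sketch, with an illustrative function name:

```python
def cs_dscp(class_number: int) -> int:
    """Return the 6-bit DSCP for Class Selector CS0 through CS7."""
    if not 0 <= class_number <= 7:
        raise ValueError("Class Selector number must be 0-7")
    return class_number << 3


# Reproduce the rows of Table 10-1.
for n in range(7, -1, -1):
    print(f"CS{n}  '{cs_dscp(n):06b}'")
```

Because the top three bits are unchanged, a legacy router reading IP Precedence from a CS-marked packet sees the same class number.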

Expedited Forwarding DiffServ PHB


The Expedited Forwarding (EF) DiffServ PHB provides the lowest latency
and lowest loss class and is ideally suited for VoIP, and some forms of
video such as video conferencing. The EF DSCP is represented by the
binary value ‘101110’.
The EF PHB is typically implemented using a priority scheduler with tail
drop queue management. Traffic is policed to the CIR and excess traffic is
dropped.

Assured Forwarding DiffServ PHB Group


The Assured Forwarding (AF) DiffServ PHB group consists of four
classes. Note that there is no implied ‘priority’ ordering among the four AF
PHB classes. Each AF PHB class can be engineered to the desired
performance by tuning the QoS mechanisms.
Each AF class has three drop precedence levels resulting in twelve different
DSCP values. Routers use these drop precedence values to determine
the order in which packets are discarded under network congestion. Under
congestion, the AQM mechanism randomly discards packets marked with
high drop precedence first. If congestion still exists once these packets are
depleted, packets marked with medium drop precedence are then randomly
discarded.
Table 10-2 lists the Assured Forwarding PHB DSCP values.


                   AF DSCP per class (in binary)
Drop Precedence    Class 1            Class 2            Class 3            Class 4
Low                '001010' (AF11)    '010010' (AF21)    '011010' (AF31)    '100010' (AF41)
Medium             '001100' (AF12)    '010100' (AF22)    '011100' (AF32)    '100100' (AF42)
High               '001110' (AF13)    '010110' (AF23)    '011110' (AF33)    '100110' (AF43)
Table 10-2: Assured Forwarding PHB DSCP values
The AF PHB is implemented using a weighted scheduler with AQM queue
management. Traffic is policed to the CIR, and excess traffic within the EIR
is remarked to higher drop precedence and may subsequently be dropped or
shaped. The AF PHB group may be used for some forms of real-time
traffic, for example, streaming audio and video and variable rate video
conferencing, but should never be used for voice applications.
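
The twelve AF codepoints in Table 10-2 share a regular bit layout: three bits of class, two bits of drop precedence and a trailing zero. The Python sketch below derives them from that layout; the function name is illustrative.

```python
def af_dscp(af_class: int, drop_precedence: int) -> int:
    """Return the 6-bit DSCP for AFxy, where x is the class (1-4)
    and y is the drop precedence (1 = low, 2 = medium, 3 = high)."""
    if not 1 <= af_class <= 4:
        raise ValueError("AF class must be 1-4")
    if not 1 <= drop_precedence <= 3:
        raise ValueError("drop precedence must be 1-3")
    return (af_class << 3) | (drop_precedence << 1)


# Reproduce Table 10-2, one row per drop precedence.
for y, label in ((1, "Low"), (2, "Medium"), (3, "High")):
    row = "  ".join(f"'{af_dscp(x, y):06b}' (AF{x}{y})" for x in range(1, 5))
    print(f"{label:<7}{row}")
```

A policer that remarks excess traffic to a higher drop precedence therefore changes only the two drop precedence bits and leaves the class bits untouched.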

Default Forwarding DiffServ PHB


The Default Forwarding (DF) DiffServ PHB provides ‘best effort’
performance. Since the performance of the DF PHB is unpredictable under
varying network loading (congestion) conditions, it is not recommended
for real-time applications. The DF DSCP is represented by the binary value
‘000000’.

DSCP Configuration Considerations


Routers and application hosts sometimes implement DSCP markings
differently. While the DiffServ field is an eight bit value, the DSCP is only
a six bit value. Some routers or hosts require you to configure an eight bit
value (DSCP + ‘00’). Some devices require you to configure only the
actual six bit DSCP value. In this case, the devices would append two zeros
to the six bit DSCP value. For example, the six bit DSCP value for
Expedited Forwarding (EF) is '101110' (binary), 2E (hexadecimal) or 46
(decimal). The eight bit DiffServ field value for Expedited Forwarding
(DSCP+00) is '10111000' (binary), B8 (hexadecimal) and 184 (decimal).
Furthermore, the Microsoft Windows* operating system adds the two zeros
to the six bit DSCP the opposite way that routers do. Using the previous
example with the Expedited Forwarding DSCP ('101110'), routers would
create the eight bit DiffServ field value by appending two zeros to the
DSCP value (DSCP+00) resulting in ‘10111000’ (184 decimal). Microsoft
Windows QoS Application Programming Interfaces (APIs) prepend two
zeros to the six bit DSCP (00+DSCP), resulting in ‘00101110’ (46
decimal), which is quite different from the eight bit value created by the
router. While the Windows approach may be masked by the application, the
application developer must keep these differences in mind to ensure that
Windows-based real-time applications are properly marked with the correct
DSCP value.
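
The arithmetic behind the two configuration styles is a two-bit shift. The Python sketch below reproduces the EF figures from the text and shows what happens when a six bit value is entered on a device that expects the eight bit field; the function names are illustrative.

```python
def dscp_to_ds_byte(dscp: int) -> int:
    """Append two zero bits to a 6-bit DSCP (DSCP+00)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be 0-63")
    return dscp << 2


def ds_byte_to_dscp(ds_byte: int) -> int:
    """Recover the 6-bit DSCP from the 8-bit DiffServ field."""
    return (ds_byte & 0xFF) >> 2


EF = 0b101110
print(EF, hex(EF))                                     # six bit form
print(dscp_to_ds_byte(EF), hex(dscp_to_ds_byte(EF)))   # eight bit form

# Entering 46 where the eight bit field is expected yields a different DSCP.
print(f"{ds_byte_to_dscp(46):06b}")
```

The last line shows the risk: the value 46 placed directly in the eight bit field is read as DSCP '001011', which is not EF.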

Ethernet IEEE 802.1Q


Ethernet is a critical Layer 2 technology that essentially all real-time
applications will use at some point in the network. Therefore, it is
important to understand its QoS mechanisms.
802.1Q is an IEEE Ethernet standard that adds four additional bytes to the
standard IEEE 802.3 Ethernet frame. IEEE 802.1Q describes a marking
method to identify the class of service over Ethernet and a Virtual LAN
identifier (VLAN ID) for traffic separation and classification. Essentially
all Enterprise Ethernet switches support the 802.1Q standard with the
exception of the lowest cost ones. See Figure 10-5.

Figure 10-5: 802.1p User Priority and VLAN ID in 802.1Q Tag

VLAN ID field
The VLAN ID is used to group certain types of traffic based on common
requirements. Packets marked with a particular VLAN can be classified
and the appropriate QoS mechanisms applied, for example, all packets in
VLAN 10 are IP telephony packets and are given DiffServ Expedited
Forwarding treatment.

802.1p User Priority field


The 802.1p user priority field provides three bits that can be used to
identify up to eight classes of service for packets traversing Ethernet
networks. Note that the IEEE 802.1Q standard, unlike DiffServ, does not
define what type of QoS treatment to apply to packets marked with different
802.1p values. Table 10-3 includes the definitions for each of the 802.1Q
fields.


802.1Q Field                            Description
EtherType                               Always set to 8100h for Ethernet 802.1Q
Three bit priority field (802.1p)       Value from 0-7 representing user priority
Canonical Format Indicator (CFI)        Always set to 0 for Ethernet
Twelve bit VLAN ID                      VLAN identification number
Table 10-3: 802.1Q Field Definitions
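
The four bytes described in Table 10-3 can be packed and unpacked with a few bit operations. The following Python sketch uses the standard `struct` module; the function names and the dictionary keys are illustrative choices, and the priority and VLAN values in the example are arbitrary.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def build_dot1q_tag(priority: int, vlan_id: int, cfi: int = 0) -> bytes:
    """Pack the 3-bit priority, 1-bit CFI and 12-bit VLAN ID after the EtherType."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be 0-7")
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (priority << 13) | ((cfi & 1) << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_dot1q_tag(tag: bytes) -> dict:
    """Split a 4-byte 802.1Q tag into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "cfi": (tci >> 12) & 1, "vlan_id": tci & 0x0FFF}


tag = build_dot1q_tag(priority=6, vlan_id=10)
print(tag.hex(), parse_dot1q_tag(tag))
```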
Due to the lack of QoS capabilities defined for Ethernet, Enterprises
typically implement the more robust IP QoS provided by the DiffServ
architecture. Therefore, when implementing QoS, a mapping function is
performed between DiffServ and Ethernet 802.1p to identify the DiffServ
PHBs at the Ethernet layer. Refer to Chapter 3.
Finally, when supporting VoIP, shared-media Ethernet hubs must never be
used because they add significant packet loss, unpredictable delay and
jitter. QoS mechanisms such as 802.1p and VLANs can be used for VoIP
traffic over Ethernet networks. If the Ethernet switches support Layer 3
capabilities, then QoS mechanisms such as DiffServ are the preferred
methods to provide QoS.

Host DSCP or 802.1p Marking


Some VoIP devices such as IP phones and media gateways can premark
packets with a DSCP or 802.1p value prior to transmitting them to the
network. The routers and Ethernet Layer 2 switches would then ‘trust’ the
DSCP or 802.1p marking and apply the appropriate QoS to the packets.
This approach allows for simple network administration. However, care
must be taken with this approach because a rogue user may mark their
nonvoice packets with the voice DSCP or 802.1p marking and get better
QoS than allowed per the network QoS policy. The security of this
approach can be enhanced by adding additional filters on the routers to not
only classify on the DSCP or 802.1p value, but also classify on other items
such as the UDP port number or IP address to determine that the
application is truly a voice application. Note that some products can
classify on both DSCP and Ethernet 802.1p values, which allows
additional classification choices to improve security.

Packet Fragmentation and Interleaving


Packet fragmentation divides large data packets into multiple, smaller
packet fragments, typically the size of one or two voice packets. The router
then interleaves the transmission of data packet fragments and voice
packets to ensure that voice packets experience a low and fixed amount of
delay. Most routers use a default maximum packet size, or Maximum
Transmission Unit (MTU), of 1500 bytes, which can take a considerable
amount of time to transmit over a low bandwidth (< 1 Mbps) WAN
connection. Consider a 1500 byte data packet being transmitted over a 64
kbps WAN connection. It would take 188 ms (1500 bytes x 8 bits / 64 kbps
= 187.5 ms) to serialize this data packet from the router onto the 64 kbps
connection. This same serialization delay is again added as the packet is
received at the router on the other side of the connection. In general, a
desirable one way delay goal for a voice packet (to achieve high quality
voice) is 150–200 ms. Over a 64 kbps connection, the data packet uses up
most, if not all, of the one way delay budget for the voice packet before
the first voice packet is ever transmitted!
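
The serialization delay quoted above is simple arithmetic, and the same formula can be turned around to size fragments. In this Python sketch the function names and the 10 ms target used in the example are assumptions for illustration; the text does not specify a target fragment delay.

```python
def serialization_delay_ms(size_bytes: int, link_bps: int) -> float:
    """Time to clock a packet of the given size onto a link, in milliseconds."""
    return size_bytes * 8 * 1000 / link_bps


def max_fragment_bytes(link_bps: int, target_delay_ms: int) -> int:
    """Largest fragment that serializes within the target delay."""
    return link_bps * target_delay_ms // 8000


# The example from the text: a 1500 byte packet on a 64 kbps link.
print(serialization_delay_ms(1500, 64_000))

# A hypothetical 10 ms target on the same link.
print(max_fragment_bytes(64_000, 10))
```

The same calculation shows why fragmentation matters mainly on slow links: at higher link speeds the delay for a full-size packet shrinks in proportion.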
For example, a voice packet enters a router, followed by a large data packet
that is followed by a second voice packet. The first voice packet begins to
get transmitted. Then, the first fragment of the data packet (equal in size to
a voice packet) is transmitted. Next, the second voice packet is transmitted.
Finally, the next fragment of the data packet is transmitted. If no more
packets enter the router, then the data packet fragments will continue to be
transmitted until the complete data packet is transmitted.
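
The transmission order in this example can be sketched as follows. This Python illustration assumes all packets are already waiting in the router and that voice packets and data fragments simply alternate; the function names, the packet labels and the 200 byte and 80 byte sizes are hypothetical.

```python
from itertools import zip_longest
from typing import List, Tuple

Fragment = Tuple[str, int, int]  # (packet name, fragment index, length in bytes)


def fragment(name: str, size: int, frag_size: int) -> List[Fragment]:
    """Split a packet of `size` bytes into fragments of at most `frag_size` bytes."""
    return [(name, i, min(frag_size, size - offset))
            for i, offset in enumerate(range(0, size, frag_size))]


def interleave(voice_packets: list, data_fragments: list) -> list:
    """Alternate voice packets and data fragments; whichever list is longer
    supplies the remainder of the transmission order."""
    order = []
    for v, f in zip_longest(voice_packets, data_fragments):
        if v is not None:
            order.append(v)
        if f is not None:
            order.append(f)
    return order


frags = fragment("D", 200, 80)
print(interleave(["V1", "V2"], frags))
```

Without fragmentation the second voice packet would wait for the whole data packet; with it, the wait is bounded by one fragment time.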

PPP Fragmentation and Interleaving


Most routers support PPP (Point to Point Protocol) Fragmentation. PPP
fragmentation splits larger data packets into multiple, smaller packet
fragments and encapsulates them into PPP frames before queuing and
transmission. The
packets are interleaved, as previously described, so the maximum delay a
voice packet will experience is one or two voice packet times, depending
upon the fragment size. PPP fragmentation occurs only over the PPP
connection between routers. Once the packet fragments are reassembled in
the receiving router, they are no longer fragmented unless they traverse
another PPP connection.

IP Fragmentation and Interleaving


All routers support IP fragmentation. IP fragmentation, like PPP
fragmentation, splits larger data packets into multiple, smaller IP packets,
whose size is determined by the configured IP MTU (Maximum
Transmission Unit) size. Also, like PPP interleaving, the voice and data
packets are interleaved during transmission to minimize the transmission
delay for the voice packets. Unlike PPP fragmentation, however, packets
subject to IP fragmentation remain as small packets all the way through to
the destination. This results in overall reduced throughput for the data
packets because of the additional overhead of an IP header for each smaller
packet.

Other Methods to Achieve QoS


Depending upon the application, it is sometimes possible to implement
very simple QoS mechanisms to achieve good performance for real-time
applications. In specific use cases, good QoS could be achieved by simple
methods described below.

Port Prioritization
When a VoIP gateway or IP PBX is installed in the network, it typically is
assigned a static IP address and connects to a specific port on the Ethernet
Layer 2 switch that rarely, if ever, gets changed. In this application, the L2
switch is configured to receive and transmit all incoming traffic from this
port, ahead of all other traffic entering other switch ports. If the next hop
device is a router, then it can classify and mark the voice packets with the
appropriate DSCP value for use by other routers across the network.
If the device attached to the Ethernet Layer 2 switch port is an IP phone,
then port prioritization is not recommended since IP phones may be
unplugged and moved and another user may attach to the port and receive
inappropriate or unauthorized QoS. For example, if a PC were connected to
a port configured to use port prioritization, then all of the PC's traffic
would be given high priority treatment in the switch. See Figure 10-6 for a
port prioritization example.

Figure 10-6: Port prioritization example

Traffic Prioritization using VLANs


All voice traffic can be placed into one VLAN and all other data traffic into
another VLAN. The “Voice” VLAN traffic can be prioritized over the
“Data” VLAN traffic. In some cases, this may be the easiest method if all
Ethernet switches support the 802.1Q standard for VLANs and do not
support DiffServ. VLANs can also be used to separate traffic for security
purposes. See Figure 10-7 for a VLAN prioritization example.

Figure 10-7: VLAN prioritization example

IP Address Prioritization
VoIP traffic can also be prioritized by its IP address. This approach is ideal
for devices with statically assigned IP addresses that rarely, if ever, change.
IP PBXs, VoIP gateways and call servers are VoIP devices that would have
their IP addresses statically assigned. A network administrator can
configure the routers to filter (classify) and prioritize all packets originating
from or destined to these IP addresses. See Figure 10-8 for an example of
IP address prioritization.

Figure 10-8: IP address prioritization example


In this example, any traffic entering the router from IP address 10.10.10.1
will be received and forwarded ahead of other traffic entering the router.

What you should have learned


There are many mechanisms used to achieve good QoS. Each mechanism
affects real-time application performance differently, depending upon the
configuration of the mechanisms. The TOS field definition has changed
over time and is now standardized by the DiffServ architecture. The
DiffServ code point (DSCP) determines the DiffServ PHB that is applied to
the packets in that class.
There are four standardized PHBs, namely, the Expedited Forwarding (EF)
PHB, Assured Forwarding (AF) PHB group, Class Selector (CS) PHB
group and Default Forwarding (DF) PHB. The EF PHB and CS PHB group
are best suited for voice media and signaling applications, respectively,
while the EF PHB or AF PHB group can be used for different video
applications.
DiffServ configurations vary by product so it is important to understand the
differences in implementations. Ethernet 802.1Q provides the VLAN ID
and 802.1p user priority bits to identify real-time traffic transported via
Ethernet frames.
Packet fragmentation and interleaving are required for converged networks
supporting real-time applications to minimize serialization delay over low
bandwidth WAN connections. When implemented with PPP, the
fragmentation is locally significant. When implemented with IP MTU size
adjustment, the fragmentation is global.
Finally, there are other, simple forms of QoS such as port or IP address
prioritization that can be used for hosts dedicated to a port or with a static
IP address such as a voice gateway.


References
RFC 3246, “An Expedited Forwarding PHB,” IETF, http://www.ietf.org/rfc/rfc3246.txt
RFC 3168, “The Addition of Explicit Congestion Notification (ECN) to IP,” IETF, http://www.ietf.org/rfc/rfc3168.txt
RFC 2597, “Assured Forwarding PHB Group,” IETF, http://www.ietf.org/rfc/rfc2597.txt
RFC 1349, “Type of Service in the Internet Protocol Suite,” IETF, http://www.ietf.org/rfc/rfc1349.txt
RFC 2475, “An Architecture for Differentiated Services,” IETF, http://www.ietf.org/rfc/rfc2475.txt
RFC 2474, “Definition of Differentiated Services Field (DS Field) in IPv4 and IPv6 Headers,” IETF, http://www.ietf.org/rfc/rfc2474.txt
RFC 2686, “Multi-Class Extensions to Multilink PPP,” IETF, http://www.ietf.org/rfc/rfc2686.txt
RFC 1990, “PPP Multilink Protocol (MP),” IETF, http://www.ietf.org/rfc/rfc1990.txt
ATM Forum Traffic Management Specification v4.1, ftp://ftp.atmforum.com/pub/approved-specs/af-tm-0121.000.pdf
IEEE 802.1Q, Virtual Bridged Local Area Networks, http://standards.ieee.org/getieee802/download/802.1Q-2003.pdf
Introduction to Quality of Service (QoS), http://www.nortelnetworks.com/products/02/bstk/switches/bps/collateral/56058.25_022403.pdf
RFC 3260, “New Terminology and Clarifications for DiffServ,” IETF, http://www.ietf.org/rfc/rfc3260.txt


Section IV:
Packet Network Technologies
Earlier sections dealt with the requirements of real-time applications and
the ways that TDM and SONET handle these demands. You also
learned about the protocols that handle the transport, call setup, and flow
priority management needed for real-time packet networking. Section IV
covers various transport and access technologies that can be used to
provide differentiated service to your network traffic. In a converged
network environment, it is important that the network operator understands the
media and protocols that underlie network operation and that help transfer
and maintain Quality of Service (QoS). The selected technologies and
protocols should provide a seamless fabric from one end of the network to the
other.
The section begins by describing the incumbent technologies
Asynchronous Transfer Mode (ATM) and Frame Relay. These technologies
have a large installed base and are thus very important in providing
real-time services over converged networks. ATM was designed to provide a
high degree of QoS and inherently ensures packets arrive within their QoS
bounds. Frame relay, which came into popularity just prior to ATM, has
continued to extend its functionality through the MPLS/Frame Relay
Alliance (formerly known as the Frame Relay Forum) to aid in providing
real-time differentiated services. While the specifics of these technologies
fill volumes, we offer a brief description of the basic characteristics of
ATM and Frame Relay and then focus our attention on their real-time
capabilities.
While ATM and FR technologies continue to be important, today's
networks are migrating to IP/Multiprotocol Label Switching (MPLS)
cores. Chapter 12 addresses MPLS, providing the basic concepts and
functions that help MPLS provide real-time service. The chapter introduces
the MPLS concepts associated with Label Switched Path (LSP) creation, how these
paths are set up using MPLS signaling protocols, label stacking, and the
integration of DiffServ into the MPLS EXP (experimental) bits.
Chapter 13 is about Optical Ethernet (OE), a relatively new technology that
is growing in prominence. OE combines a Layer 2 protocol (Ethernet) with
Layer 1 technologies like SONET and Dense Wavelength Division Multiplexing
(DWDM). OE can ride over long-haul fiber and has a
number of added extensions that allow it to emulate ordinary Ethernet
LANs. It also incorporates some of the redundancy functionality associated
with optical networks, such as resilient packet ring.


We continue our look into Layer 1 in Chapter 14 on Network Access. This
chapter explains several common access technologies: 2G and 3G cellular,
wireless LAN, cable modem, and DSL. This covers a lot of ground! We
have two goals for this chapter: first, to demonstrate that the network can
carry real-time applications with each kind of access technology, and
second, to show that with careful engineering, many of these technologies can and
do operate on the same network. This is an important aspect of network
convergence. We hope to encourage engineers to think about how new
services should be provided, as well as how they can optimize the
performance of packet infrastructure combined with the access
technologies already present in their networks.
The final chapter in this section describes IPv6. This topic is forward-
looking, and is of special interest to telecom professionals working in the
Asia market. Interest in IPv6 is growing and will continue to accelerate as
more and more devices become IP aware, like your refrigerator, coffee
maker, and oven. The chapter illustrates the features of IPv6 by identifying
similarities and differences with IPv4.
This is the final theoretical section of the book. Once you have mastered
these topics, you will be ready to consider implementation techniques and
network examples engineered to provide resiliency and differentiated
service for multiple traffic types and applications.


Chapter 11
ATM and Frame Relay
Sinchai Kamolphiwong

Shardul Joshi

Timothy Mendonca

Figure 11-1: Transport path diagram

Concepts Covered
Layered protocol
ATM interfaces
ATM architecture
ATM adaptation layer
QoS and services in ATM networks

Voice and telephony over ATM


Frame relay, FRF.1.2, FRF.11.1 and FRF.12
Frame relay conversion of voice into FRF.11 packet
Codecs supported by FRF.11
Use of Silence Information Descriptor (SID) and Voice Activity
Detection (VAD)
Use of Frame relay fragmentation FRF.12 to reduce serialization
delay on a slow speed link
Three areas where FRF.12 may be used effectively
Seven major engineering best practices

Introduction
The ITU (International Telecommunication Union) has chosen ATM
(Asynchronous Transfer Mode) as the switching and multiplexing
technology for carrying all signals in a high speed network. To support
such goals, network architecture was moved away from circuit switching to
a range of packet switching systems. ATM uses fixed-length packets, called
cells, and is a virtual connection-oriented system. The cell length of 53
bytes (five byte header + 48 byte information field) is an engineering
compromise, to accommodate conflicting requirements of a whole range of
traffic types, be it computer data or real-time traffic such as voice or video.
An ATM network consists of a set of ATM switches to multiplex/
demultiplex traffic streams. Each ATM switch is connected by point-to-
point ATM links. Each ATM link can accommodate several VPs (virtual
paths) and each VP may comprise a number of VCs (virtual channels). This
allows the aggregation of dissimilar types of traffic streams to be
accomplished in one ATM link.
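
To make the cell-size compromise concrete, the short Python sketch below computes the fraction of each cell taken by the header and the time a source needs to fill one 48 byte payload. The 64 kbit/s voice rate is an assumption chosen for illustration, and the function names are hypothetical.

```python
HEADER_BYTES = 5
PAYLOAD_BYTES = 48
CELL_BYTES = HEADER_BYTES + PAYLOAD_BYTES  # 53 bytes per cell


def header_overhead() -> float:
    """Fraction of every cell that is consumed by the header."""
    return HEADER_BYTES / CELL_BYTES


def fill_delay_ms(source_bit_rate: float, payload_bytes: int = PAYLOAD_BYTES) -> float:
    """Milliseconds a constant-rate source needs to fill one cell payload."""
    return payload_bytes * 8 / source_bit_rate * 1000


if __name__ == "__main__":
    print(f"header overhead: {header_overhead():.1%}")            # roughly 9.4%
    print(f"fill delay at 64 kbit/s: {fill_delay_ms(64_000):.1f} ms")  # roughly 6 ms
```

A larger payload would lower the header overhead, which favors data traffic, but would lengthen the fill delay, which hurts voice; a smaller payload does the opposite.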
Frame relay is a connection-oriented protocol that is a precursor to ATM.
The sections in this chapter that address frame relay will not go into the
basics of the technology, but will examine the evolution of the frame relay
protocol to ensure its viability in tomorrow's networks. More specifically,
the chapter will evaluate the use of the MPLS and Frame Relay Alliance
(formerly known as the Frame Relay Forum) Implementation Agreements:
FRF.11.1, the Voice over Frame Relay Agreement, and FRF.12, the Frame
Relay Fragmentation Agreement. The chapter evaluates how the two
specifications work in isolation and how efficiently they work together.
The main advantages of ATM are as follows:
ATM is a connection-oriented network, with each connection setup
associated with its QoS (quality of service) requirements, for
example, delay, loss and cell delay variation. Because ATM offers a high
guaranteed QoS, real-time communications, for
example, voice and video traffic, are suitable for carriage over ATM
networks, as shown in Figure 11-2.
With ATM, the incoming traffic channels are aggregated using
statistical multiplexing into one communication link. High system
utilization is easily obtained.
ATM supports multiple priority levels for different traffic types.
ATM offers the opportunity for all traffic sources to use resources
fairly, regardless of distance and the number of connections.

Figure 11-2: Real-time communication over ATM networks

Layered protocol
To compare ATM to other protocols, OSI (Open Systems Interconnection)
is an appropriate model to use as a reference. The ATM layer is above the
physical layer (Layer 1), and provides transport functions required for the
switching and flow control of ATM cells. In this context, “transport” refers
to the use of ATM switching and multiplexing techniques at the data link
layer (that is, Layer 2 of the OSI model), as shown in Figure 11-3, to
convey end-user traffic from source to destination within a network.


Figure 11-3: ATM in the data link layer of OSI model


ATM is a transport technology and it must interoperate with other network
technologies. The ATM Forum¹ considered how ATM technology would
interoperate with the installed base of Ethernet and Token Ring equipment,
data networking protocols, and legacy applications. This is a valid concern,
since the ATM architecture differs fundamentally from legacy LAN
technologies; for example, Ethernet is connectionless while ATM is a
connection-oriented technology. Therefore, to use ATM for practical data
networking, there must be some way of adapting existing network layer
protocols (working on legacy LANs), such as IP, to the connection-oriented
paradigm of ATM. To meet this end, ATM (both AAL and ATM layer) is
inserted into the MAC (Media Access Control) sublayer, under the LLC
(Logical Link Control) sublayer of IEEE 802.2, as shown in Figure 11-4.

Figure 11-4: Overlay of ATM in IEEE 802.2


A widely used network is the Internet, which employs the Internet Protocol
(IP). The TCP/IP protocol suite has been integrated into all of the most
popular operating systems. Recently, ATM has often been used as a
link-layer technology for both local and international regions of the Internet. A
special AAL (ATM adaptation layer) type, called AAL-5, has been
developed to allow TCP/IP to interface with ATM, as shown in Figure 11-5.
Fundamentally, the network layer sees ATM as a data link protocol. At
the IP-ATM interface, AAL-5 prepares IP datagrams for ATM transport.

1. ATM Forum (http://www.atmforum.com/) is an implementers' agreement body that defines the elements of ATM services.

Application Layer (HTTP, FTP, etc.)
Transport Layer (TCP or UDP)

Network Layer (IP)


AAL-5 (ATM Adaptation Layer)
ATM Layer
ATM Physical Layer

Figure 11-5: Internet-over-ATM protocol stack

ATM interfaces
The ATM standard defines two main types of interface in ATM networks:
User-to-Network Interface (UNI)
Network-to-Network Interface (NNI)

User-to-network interface
ATM can be used within both a private network and a public network; the interfaces are
referred to as the private User-to-network interface (UNI) and the public UNI,
respectively, as shown in Figure 11-6. The public UNI is used to
interconnect an ATM user (or ATM terminal) with an ATM switch
deployed in a public service provider's network, while the private UNI is
used to interconnect an ATM user with an ATM switch that is managed by
a private organization, for example, a computer center. Both UNIs share an
ATM layer specification, but may utilize different physical media. The
primary distinction between these two UNIs is physical reach. There are
also some functional differences between the public and private UNIs, due
to the requirements associated with each interface. For example, the
administration of the private UNI across all domains in an organization
may follow a local management scheme that does not need to
consider interconnection issues in detail.

Network-to-network interface
The Network-to-network interface (NNI) is the interface between ATM switches.
There are two types of NNI, public NNI and private NNI. The public NNI,
also known as the Broadband Intercarrier Interface (BICI), defines an
interconnect interface between public ATM switches. The private NNI
(PNNI) defines an interconnect interface between ATM switches managed
by private networks.
The term “UNI” is used generically to refer to both public
and private UNI. Similarly, the term “NNI” can refer to either private or
public NNI. The term “ATM switch” refers to both public and private
switches. Where necessary, a specific term is used.
Figure 11-6: ATM over private and public UNI and NNI

ATM architecture
The B-ISDN protocol reference model has been defined in ITU-T
Recommendation I.121, as shown in Figure 11-7. The model contains both
horizontal and vertical structures. The horizontal structure consists of four
main layers:
Higher layer, which specifies functions for applications.
ATM Adaptation layer (AAL). The AAL is concerned with a number of
processes necessary to transform the user data stream into a format
suitable for ATM, such as segmentation/reassembly of higher
layer Protocol Data Units (PDUs) into ATM cells. The AAL is
divided into two sublayers, the convergence sublayer (CS) and the
segmentation and reassembly sublayer (SAR).
ATM layer, which specifies the ATM structure and functions at the cell
level (more details of AAL and ATM layers will be given in the
next section).
Physical layer, which specifies media technology dependent issues.


Higher Layer
ATM Adaptation Layer (AAL)
    Convergence sub-layer (CS)
    Segmentation and reassembly sub-layer (SAR)
ATM Layer
Physical Layer
    Transmission convergence sub-layer
    Physical media-dependent sub-layer

Figure 11-7: ATM protocol layer

ATM layer
There are two different formats of ATM cell header, one for use at the
User-to-Network Interface (UNI), and the other one for use at the Network-
to-Network Interface (NNI), as shown in Figure 11-8.
Byte   (a) UNI header       (b) NNI header
1      GFC | VPI            VPI
2      VPI | VCI            VPI | VCI
3      VCI                  VCI
4      VCI | PTI | CLP      VCI | PTI | CLP
5      HEC                  HEC
6-53   Payload (48 bytes)   Payload (48 bytes)

CLP  Cell loss priority          PTI  Payload type
GFC  Generic flow control        VCI  Virtual channel identifier
HEC  Header error control        VPI  Virtual path identifier
NNI  Network-network interface   UNI  User-network interface
Figure 11-8: ATM cell headers: (a) UNI (User-Network Interface) and (b) NNI
(Network-Network Interface)
At the UNI, the header contains a four bit generic flow control (GFC) field,
a 24 bit label field containing virtual path identifier (VPI) and virtual
channel identifier (VCI) subfields (eight bits for the VPI and sixteen bits
for the VCI), a three bit payload type (PT) field, a one bit cell loss priority
(CLP) field, and an eight bit header error control (HEC) field, for a total of
40 bits (five bytes). The cell header for an NNI cell
is identical to that for the UNI cell, except that it lacks the GFC field; these
four bits are used for an additional four VPI bits in the NNI cell header, as
shown in Figure 11-8 (b).
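
The header layout can be illustrated with a short Python sketch that packs and unpacks the five header bytes. It assumes field widths of 4 (GFC), 8 or 12 (VPI), 16 (VCI), 3 (PT), 1 (CLP), and 8 (HEC) bits, which together fill 40 bits. The function names are hypothetical, and the HEC is simply passed in by the caller rather than computed, which is a simplification.

```python
def pack_header(vpi: int, vci: int, pt: int, clp: int, hec: int = 0, gfc=None) -> bytes:
    """Pack a 5-byte ATM cell header; pass gfc for a UNI header, omit it for NNI."""
    vpi_bits = 12 if gfc is None else 8
    if not (0 <= vpi < (1 << vpi_bits) and 0 <= vci < (1 << 16)
            and 0 <= pt < 8 and clp in (0, 1) and 0 <= hec < 256):
        raise ValueError("header field out of range")
    if gfc is not None and not 0 <= gfc < 16:
        raise ValueError("GFC out of range")
    # The first four bytes form one 32-bit word, most significant field first.
    word = (vpi << 20) | (vci << 4) | (pt << 1) | clp
    if gfc is not None:
        word |= gfc << 28
    return word.to_bytes(4, "big") + bytes([hec])


def unpack_header(header: bytes, uni: bool = True) -> dict:
    """Split a 5-byte header back into its fields."""
    word = int.from_bytes(header[:4], "big")
    fields = {
        "vci": (word >> 4) & 0xFFFF,
        "pt": (word >> 1) & 0x7,
        "clp": word & 0x1,
        "hec": header[4],
    }
    if uni:
        fields["gfc"] = word >> 28
        fields["vpi"] = (word >> 20) & 0xFF
    else:
        fields["vpi"] = word >> 20  # the NNI uses all 12 leading bits for the VPI
    return fields
```

For example, `pack_header(vpi=5, vci=1, pt=0, clp=0, gfc=0)` yields a UNI header, while leaving out `gfc` yields an NNI header whose VPI may be as large as 4095.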

ATM virtual channel connection


A single ATM physical link can support multiple Virtual Paths (VP), and
each VP can contain multiple Virtual Channels (VC). In most practical
cases, VPs are used between switches because of manageability of the
ATM network. Figure 11-9 shows the relationship between a physical
transmission link, VP, and VC.

Figure 11-9: Relationship of physical transmission link, virtual path (VP),
and virtual channel (VC)
ATM is a connection-oriented network, and it must create an explicit
path between end points before transmitting any information. A virtual
channel connection (VCC, sometimes called virtual channel link [VCL]) is
a logical, unidirectional connection between two points, which can
be a pair of ATM switches or an ATM switch and the end user. A VCC is
identified on each link by a VCI (virtual channel
identifier) value in the ATM cell header. Although a VCC is
unidirectional, VCCs are always used in pairs, one VCC for each direction.
Thus, bidirectional communication requires a pair of VCCs that share the
same physical path through the network. The concepts of “VC” and “VCC”
are almost the same. The acronym VC is mostly used in a generic context
while VCC or VCL is used in a more specific way.
A virtual path (VP) defines a number of virtual channels, as a unit of
observation, for unidirectional traffic between an ATM end-system and an
ATM switch. Connections relying on either VC or VP are logically equal.
A virtual path connection (VPC, or a virtual path link [VPL]) defines a
logical end-to-end connection between two ATM end users or ATM
switches. A VPC can be established permanently, semipermanently, or
dynamically. A virtual path identifier (VPI) in the ATM cell header is used
to identify any VPCs. Similar to VC and VCC, the acronym VP is often
used in a generic context while VPC or VPL is used in a more specific way.
The primary function of the ATM layer is VPI/VCI translation. As ATM
cells arrive at ATM switches, the VPI and VCI values contained in their
cell headers are examined by the switch to determine which out-port should
be used to forward the cell. In the process, the switch translates the VPI and
VCI values of the original cell (received from the input-port) into new
outgoing VPI and VCI values, which are used in turn by the next ATM
switch to send the cell toward its intended destination. The table used to
perform this translation is initialized during the establishment of the call.
An ATM switch may either be a VP switch, in which case it only translates
the VPI values contained in cell headers, or it
may be a VP/VC switch, in which case it translates the incoming VPI/VCI pair
into an outgoing VPI/VCI pair (both cases are shown in Figure 11-10). In a VP switch,
the VCI numbers carried within a particular VP are not changed even if
the VPI number of the VP is changed. Since VPI and VCI values do
not represent a unique end-to-end virtual connection, they can be reused at
different switches through the network. The VPI and VCI are local labels
between each switch pair for a given connection. This is important, because
the VPI and VCI fields are limited in length and would be quickly
exhausted if they were used simply as destination addresses.

Figure 11-10: VP and VC switching in ATM switch


Virtual paths are an important component of traffic control and resource
management in ATM networks. Advantages of using VP switching are as
follows:
Simplifying call admission control (CAC)
Statistical multiplexing of the carried VCs makes effective use of
the bandwidth of a VP
Dynamic paths and routing are feasible because the VP logical
structure can be rearranged easily
Logical transport is separated from physical transport. In other
words, the logical path is independent of the actual physical connection

VPs may be routed through an ATM switch by reference to the VP number,
or they may terminate in the ATM switch. A VP entering an end-point
always terminates at the end-point.
Figure 11-11 shows a sample scenario of how VPI/VCI translation in ATM
switches works. In this example, ATM switch 2 maintains a table of VPI
translations between VPI-in and VPI-out values; for example, if VPI-in is
five, it is translated to seven for VPI-out. All ATM switches in this
sample scenario are VP switches, so only VPI values are changed.
Figure 11-11: A sample scenario of VPI/VCI translation in ATM switches
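
The translation step can be sketched in Python as a simple table lookup. This is an illustrative sketch only: the port names, the table layout, and the VP/VC table entry are hypothetical, and only the VPI 5 to VPI 7 mapping is taken from the scenario described above.

```python
# VP switch table: (in_port, VPI-in) -> (out_port, VPI-out).
# Port names are hypothetical; the 5 -> 7 mapping follows the sample scenario.
VP_TABLE = {("port-1", 5): ("port-2", 7)}

# VP/VC switch table: (in_port, VPI-in, VCI-in) -> (out_port, VPI-out, VCI-out).
# This entry is invented purely to show the shape of the lookup.
VC_TABLE = {("port-1", 9, 3): ("port-3", 7, 4)}


def vp_switch(in_port: str, vpi: int, vci: int):
    """Translate only the VPI; every VCI inside the VP passes through unchanged."""
    out_port, out_vpi = VP_TABLE[(in_port, vpi)]
    return out_port, out_vpi, vci


def vc_switch(in_port: str, vpi: int, vci: int):
    """Translate the whole VPI/VCI pair, as a VP/VC switch would."""
    return VC_TABLE[(in_port, vpi, vci)]
```

With this table, cells arriving with VPI=5 and VCI 1, 2, or 3 all leave with VPI=7 and their original VCI, which is the behavior described for a VP switch. Because the labels are looked up per switch, the same VPI value can be reused elsewhere in the network.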


In conclusion, the main functions of the ATM layer are as follows:
ATM cell header processing (for example, PTI field, HEC)
VCI/VPI translation (for example, cell switching from in-port to
out-port)
ATM cell multiplexing (for example, multiplexing among different
ATM connections)
Flow control (for example, GFC, traffic shaping)
ATM is connection-oriented, which implies the need for ATM-specific
signaling protocols and addressing structures, as well as protocols to route
ATM connection requests across the ATM network. However, these topics
are outside the scope of this book.

AAL (ATM adaptation layer)


The AAL relays information received from higher layers into ATM cells in
the ATM layer. The AAL provides mapping of applications to ATM service
types. The AAL plays a key role in the ability of an ATM network to support a
variety of applications and services. There are five types of AAL: AAL-0,
AAL-1, AAL-2, AAL-3/4, and AAL-5. The AAL is divided into two
sublayers, the convergence sublayer (CS) and the segmentation and reassembly
sublayer (SAR), as shown in Figure 11-12.
Figure 11-12: Sublayers in AAL (ATM adaptation layer): CS (convergence
sublayer) and SAR (segmentation and reassembly sublayer)
The main functions of AAL are as follows:
Segmentation of AAL-PDU to ATM cells at the source node and
reassembly of ATM cells at the destination node
Detection of lost, erroneous, and misinserted cells. There is no
re-transmission mechanism.
Recovery of the source clock frequency, if required (for example,
AAL-1).
Transfer of timing between source and destination, if required (for
example, AAL-2).
Handling of cell delay variations, if required (for example, AAL-1).
AAL is implemented in the ATM end systems (for example, entry and exit
nodes in ATM networks), not in the intermediate ATM switches. Thus, the
AAL layer is analogous to a transport layer for applications.

AAL structure
The user data (for example, user-PDU) from a higher layer is first
encapsulated in a common part convergence sublayer protocol data unit
(CS-PDU) in the convergence sublayer (CS), as shown in Figure 11-13. In
this sublayer, the CS-PDU header and trailer are added. Typically, the
CS-PDU is much too large to fit into the payload of a single ATM cell.
As a result, the CS-PDU has to be segmented at the ATM source and
reassembled at the ATM destination. This function is handled by the SAR
sublayer. The SAR sublayer segments the CS-PDU and adds a SAR
header and trailer to form the SAR-PDU, a block of 48 bytes. The
SAR-PDU is passed to the ATM layer, where it becomes the payload of an ATM
cell.

Figure 11-13: AAL structure

AAL-0
AAL-0 is the null function (CS and SAR are each an empty function). Cells
from the higher layer are transferred, through the AAL-0 service interface,
directly to the ATM layer service.

AAL-1
AAL-1 has been standardized in both the ITU-T and ANSI since 1993, and
is incorporated in the ATM Forum specifications for circuit emulation
services (CES). AAL-1 supports constant bit rate (CBR) services with a
fixed timing relation between source and destination users, that is, synchronous
traffic (for example, uncompressed voice). The AAL-1 service is offered by
most ATM equipment manufacturers. AAL-1 provides the following
services to the AAL user:
Transfer of service data units with a constant source bit rate and the
delivery of them with the same bit rate
Transfer of timing information between source and destination;
an explicit time indication is provided by inserting a timestamp in the
CS-PDU
Source clock recovery at the receiver by monitoring the buffer
fill level (if needed)
Detection of lost or misinserted cells


The SAR sublayer defines a 48-byte protocol data unit (SAR-PDU). The
SAR-PDU contains 47 bytes of user data (of which one byte can be used
for a pointer), four bits for a sequence number (SN), and four bits for
sequence number protection (SNP). The SNP field carries a CRC
(cyclic redundancy check) value to detect errors in the SN field.
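
A quick calculation shows what this format costs on the link. The Python sketch below assumes a 64 kbit/s uncompressed voice source and 47 bytes of user data in every cell (no pointer byte); both the source rate and the function names are assumptions made for illustration.

```python
CELL_BYTES = 53          # 5 byte header + 48 byte SAR-PDU
AAL1_USER_BYTES = 47     # user data per SAR-PDU when no pointer byte is used


def aal1_cell_rate(source_bit_rate: float) -> float:
    """Cells per second needed to carry a constant bit rate source over AAL-1."""
    return source_bit_rate / (AAL1_USER_BYTES * 8)


def aal1_link_bit_rate(source_bit_rate: float) -> float:
    """Bit rate consumed on the ATM link, including SAR and cell header overhead."""
    return aal1_cell_rate(source_bit_rate) * CELL_BYTES * 8


if __name__ == "__main__":
    rate = 64_000
    print(f"{aal1_cell_rate(rate):.1f} cells/s")           # about 170.2 cells/s
    print(f"{aal1_link_bit_rate(rate) / 1000:.1f} kbit/s")  # about 72.2 kbit/s
```

Under these assumptions, a 64 kbit/s voice channel occupies roughly 72 kbit/s of link capacity, since 6 of every 53 bytes are header or SAR overhead.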

AAL-2
AAL-2 is designed to handle variable bit rate (VBR) services. AAL-2
needs a different mechanism from AAL-1, since VBR traffic behavior
differs from CBR. Because the bit rate of VBR traffic varies, the cell interval
time is not constant. A maximum delay time must therefore be defined in the
AAL-2 packetization mechanism.
Because VBR traffic does not have a constant bit rate, there is a chance
that the SAR-PDU may not be filled during a particular time interval. The
delay may be large if the SAR-PDU waits until the payload is complete.
The SAR-PDU therefore contains the following information: six bits of SAR-PDU
header, sixteen bits of SAR-PDU trailer, and 362 bits of SAR-PDU
payload. Based on a new service requirement in the AAL layer, the ATM
Forum and ITU-T Study Group (SG) 13 have discussed a new AAL-2 to provide
efficient transport of low bit rate voice with a very small transfer
delay across the network.

AAL-3/4
Originally, AAL-3 and AAL-4 were separate. AAL-3 was intended to support
connection-oriented service over ATM, while AAL-4 supported connectionless
operation. However, the remaining functions were similar. Therefore, AAL-3
and AAL-4 were combined. As a result, AAL-3/4 supports both
connection-oriented and connectionless services. The SAR-PDU consists of sixteen
bits of header, sixteen bits of trailer, and 44 bytes of payload.

AAL-5
AAL-5 is a low-overhead AAL that is mainly used to transport IP
datagrams over ATM networks. With AAL-5, the CS-PDU header is empty. As with
other AALs, the SAR function takes care of segmenting the CS-PDU
into blocks of 48 bytes, but no SAR overhead is added in AAL-5.
The PAD field ensures that the CS-PDU is an integer multiple of 48 bytes. The
length field identifies the size of the actual CS-PDU payload, so that the
user data can be retrieved at the destination.
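
The padding rule can be worked through in a few lines of Python. The sketch assumes an 8 byte CS-PDU trailer (which carries the length field and a CRC); that trailer size is not stated in this chapter and is included here as an assumption, as are the function names.

```python
CELL_PAYLOAD = 48
AAL5_TRAILER = 8  # assumption: 8-byte CS-PDU trailer holding the length field and CRC


def aal5_pad_length(user_bytes: int) -> int:
    """Bytes of PAD needed so that user data + PAD + trailer is a multiple of 48."""
    return -(user_bytes + AAL5_TRAILER) % CELL_PAYLOAD


def aal5_cell_count(user_bytes: int) -> int:
    """Number of ATM cells needed to carry one AAL-5 CS-PDU."""
    total = user_bytes + aal5_pad_length(user_bytes) + AAL5_TRAILER
    return total // CELL_PAYLOAD


if __name__ == "__main__":
    for size in (40, 41, 1500):
        print(size, aal5_pad_length(size), aal5_cell_count(size))
```

Under this assumption, a 40 byte datagram fits exactly into one cell with no padding, while a 41 byte datagram needs 47 bytes of PAD and a second cell, which is why small changes in packet size can noticeably change the bandwidth used on an ATM link.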


From a service point of view, AAL-3/4 and AAL-5 offer the same layer
functionality. The main differences between these two service types are as
follows:
The AAL-5 performs minimum error control mechanisms in
comparison to the AAL-3/4
The AAL-5 does not offer a multiplexing capability
AAL-5 is the most widely used AAL. Currently, AAL-5 is used for
the transport of IP traffic and for frame relay service. AAL-5 is also
under consideration for transporting real-time multimedia
information over ATM.

QoS and services in ATM networks


To meet the needs of a variety of users, be it for voice, data, or video, ATM
provides four classes of services:
Constant bit rate (CBR) is intended for fixed-bandwidth
transmission at Peak Cell Rate (PCR) negotiated between the user
and the network.
Variable bit rate (VBR) is intended for a user requiring a
negotiated Sustainable Cell Rate (SCR), and a maximum cell rate
bounded by PCR, greater than SCR. There are two sub-classes in
VBR, Real-Time VBR (RT-VBR) and Non–Real-Time VBR
(NRT-VBR). RT-VBR has more precisely defined requirements for
throughput and delay than NRT-VBR.
Available bit rate (ABR) is a lower priority service than CBR and
VBR, but provides an agreed Minimum Cell Rate (MCR) and will
accommodate bit rates in excess of MCR, if network bandwidth is
available. The service is intended for connection-less traffic (for
example, LAN traffic or internetworking TCP/IP traffic).
Unspecified bit rate (UBR) is the service of lowest priority, also
known as “best effort.” There are no specified traffic parameters
except PCR and CDVT (Cell Delay Variation Tolerance), no flow
control, and no specified QoS (such as cell loss, cell delay, and cell delay
variation). This service is also intended for connectionless traffic
(for example, LAN traffic or internetworking TCP/IP traffic).

Essentials of Real-Time Networking Copyright © 2004 Nortel Networks


Chapter 11 ATM and Frame Relay 265

Figure 11-14 illustrates how the bandwidth is used in ATM networks by
the four service classes [24].

Figure 11-14: Bandwidth usage of the four service classes in ATM networks
(Plot of load utilization, up to 1.0, against time, with CBR, VBR, ABR, and UBR traffic stacked from high to low priority level.)
The VBR and CBR classes are higher priority classes used to transport
real-time or high quality audio and video data. The CBR and VBR services
guarantee the negotiated throughput (and therefore the necessary
bandwidth), the maximum cell delay, and the cell delay variation. Hence, a
switch first allocates link bandwidth to these classes. The remaining
bandwidth, if any, is given to ABR and UBR traffic. To enable the ABR
service to function effectively, a suitable closed-loop flow control
mechanism must be implemented. To that end, the ATM Forum proposed a
rate-based flow control scheme. With the rate-based scheme, the network
controls the transmission rate of the sources to maximize network
performance. Thus, when resources are plentiful, the network allows a
source to increase its rate of transmission; when traffic is heavy, the
source rate is throttled to a safe value. In contrast, the UBR service has
no flow control mechanism (it is open loop) and does not specify
traffic-related service guarantees, but may be subject to a local policy in
individual switches and end systems. It is a “best effort” service.
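The closed-loop behaviour described above can be sketched as a rate update applied each time feedback reaches the source. This is a minimal illustration loosely modeled on additive-increase, multiplicative-decrease rate control; the function name, the `rif` and `rdf` factors, and their default values are assumptions for the example, not parameters taken from this text.

```python
def next_allowed_rate(acr, *, mcr, pcr, congested, rif=1 / 16, rdf=1 / 16):
    """Return the source's next allowed cell rate in cells per second.

    acr       -- current allowed cell rate
    mcr, pcr  -- negotiated minimum and peak cell rates
    congested -- True when network feedback reports congestion
    rif, rdf  -- illustrative rate increase/decrease factors (assumed values)
    """
    if congested:
        acr = acr - acr * rdf   # throttle: multiplicative decrease
    else:
        acr = acr + rif * pcr   # resources available: additive increase
    # The rate never drops below MCR and never exceeds PCR.
    return max(mcr, min(acr, pcr))


if __name__ == "__main__":
    rate = 1000.0
    for feedback in (False, False, True, True, False):
        rate = next_allowed_rate(rate, mcr=100, pcr=1600, congested=feedback)
        print(f"congested={feedback!s:5}  allowed rate={rate:.2f} cells/s")
```

The clamp at the end reflects the ABR contract: the network guarantees MCR and lets the source use more only while bandwidth is available.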
The characteristics of the four ATM services are summarized in Table 11-1.

Functions                          CBR   VBR   ABR   UBR
Bandwidth guarantee                Yes   Yes   Yes   -
Cell delay variation guarantee     Yes   Yes   -     -
Cell rate guarantee                Yes   Yes   Yes   -
Congestion feedback notification   -     -     Yes   -

Table 11-1: Characteristics of ATM service classes
Table 11-2 summarizes traffic and QoS parameters for each ATM service
class. Traffic parameters are used for negotiation during connection setup.
QoS parameters, in general, are defined in terms of measurements made on
the end-to-end connection.

Parameters             CBR      RT-VBR   NRT-VBR  UBR      ABR

Traffic parameters
PCR and CDVT(5,6)      Yes      Yes      Yes      Yes(3)   Yes(3)
SCR, MBS, CDVT(5,6)    N/A      Yes      Yes      N/A      N/A
MCR(4)                 N/A      N/A      N/A      N/A      Yes
CDVT(5)                Yes      Yes      Yes      Yes(3)   Yes(4)
CDVT(5,6)              N/A      Yes      Yes      N/A      N/A

QoS parameters
Peak-to-peak CDV       Yes      Yes      No       No       No
Mean CTD               No       No       Yes      No       No
Max CTD                Yes      Yes      No       No       No
CLR                    Yes(1)   Yes(1)   Yes(1)   No       Yes(2)

PCR: Peak Cell Rate    CDV: Cell Delay Variation    CTD: Cell Transfer Delay
CDVT: Cell Delay Variation Tolerance    CLR: Cell Loss Ratio
SCR: Sustainable Cell Rate    MCR: Minimum Cell Rate    MBS: Maximum Burst Size
Notes: (1) The CLR may be unspecified for CLP=1
(2) Minimized for sources that adjust cell flow in response to control information
(3) May not be subject to CAC and UPC procedures
(4) Represents the maximum rate at which the source can send as controlled by the control information.
(5) These parameters are either explicitly or implicitly specified for PVCs or SVCs.
(6) Different values of CDVT may be specified for SCR and PCR.

Table 11-2: The traffic and QoS parameters for the ATM service classes
However, some QoS parameters are not negotiated during connection
setup, for example, Cell Error Ratio (CER), Severely Errored Cell Block
Ratio (SECBR), and Cell Misinsertion Rate (CMR).
The QoS parameter CDV should not be confused with the connection
traffic parameter CDVT. Even though CDV is a QoS parameter, it is not
used for negotiation. CDV is introduced by cell multiplexing, when cells
from two or more connections are multiplexed onto the same output
channel: cells of a given channel may be delayed while cells of other
channels are being inserted at the output of the multiplexer. In practice,
the upper bound of CDV is expressed by CDVT. The value of CDVT is chosen
such that the output cell flow conforms to a bandwidth enforcement
mechanism.
A user of an ATM connection (a VCC or a VPC) is provided with one of a
number of QoS classes supported by the network. At VPC establishment,
the QoS is selected from the QoS classes supported by the network. Note
that a VPC may carry a number of VC links of various service classes that
require different QoS parameters. For instance, a VPC may carry different
VCC types: one may be VBR while others are ABR or UBR. To do so, the QoS
of the VPC must meet the most demanding QoS of the VC links carried, as
defined in “B-ISDN Asynchronous Transfer Mode Functional
Characteristics”. The service class associated with a given ATM connection
is indicated to the network at the time of connection establishment and
does not change for the lifetime of that ATM connection.
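The rule that a VPC must meet the most demanding QoS of the VC links it carries can be illustrated as follows. This is a sketch only: the `QoS` record, its three fields, and the use of `None` for an unspecified parameter are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class QoS:
    """Hypothetical QoS record; None means the parameter is unspecified."""
    max_ctd_ms: Optional[float] = None   # maximum cell transfer delay
    cdv_ms: Optional[float] = None       # peak-to-peak cell delay variation
    clr: Optional[float] = None          # cell loss ratio


def _tightest(values):
    """Smallest specified value (smaller is more demanding), or None."""
    specified = [v for v in values if v is not None]
    return min(specified) if specified else None


def vpc_qos(vc_links: Iterable[QoS]) -> QoS:
    """Return the QoS a VPC needs in order to satisfy every VC link it carries."""
    links = list(vc_links)
    return QoS(
        max_ctd_ms=_tightest(link.max_ctd_ms for link in links),
        cdv_ms=_tightest(link.cdv_ms for link in links),
        clr=_tightest(link.clr for link in links),
    )


if __name__ == "__main__":
    vbr = QoS(max_ctd_ms=10.0, cdv_ms=2.0, clr=1e-6)
    abr = QoS(clr=1e-4)
    ubr = QoS()
    print(vpc_qos([vbr, abr, ubr]))
```

In this example the VBR link dictates the VPC requirement, because the ABR and UBR links leave delay and delay variation unspecified.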
Table 11-3 shows example applications that are suitable for each ATM
service class.

Applications rated (Excellent, Good, Fair, or Poor) against CBR, Real-Time VBR (RT-VBR), Non-Real-Time VBR (NRT-VBR), ABR, and UBR:
Critical data
LAN interconnection, LAN emulation
Data transport/interworking (IP-FR-SMDS)
Circuit emulation - PABX
POTS/ISDN - video conference
Compressed audio
Video distribution
Interactive multimedia

Table 11-3: Example of applications for ATM service classes


Because ATM is connection-oriented, a communication channel must be
created before any information is sent. This means that a communication
path between a source and a destination needs to be established. The
connection request is sent to every ATM switch along the selected path, to
ask for resource allocation and to ensure that the QoS requested by the
user can be guaranteed. A connection request is accepted or rejected based
on the availability of resources. This process is Call Admission Control
(CAC).
The negotiated characteristics of an ATM connection are specified by the
traffic contract. The traffic contract is an agreement between a user and a
network. QoS (for example, cell delay variation, cell loss ratio), traffic
descriptors (for example, peak cell rate), and the conformance definition
(for example, conforming cells) are described in the traffic contract.
The next step is resource allocation at the call level, according to the
desired traffic contract. If the requested resources are consistent with
the resources that can be allocated, the path generation step is executed.
Otherwise, alternative paths are selected. The allocated resources on the
path are reserved, and virtual channels and virtual paths are assigned.
During these steps, however, if any condition cannot be met, the
alternative path selection process is performed. If no alternative path is
found, the call is rejected.
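The accept, retry, or reject logic described above can be sketched as follows. This is a deliberately simplified model that admits a call only if every link on a candidate path has enough free capacity for the requested rate; the function name and data layout are assumptions, and real CAC algorithms are considerably more elaborate.

```python
def admit_call(requested_cps, candidate_paths, free_capacity):
    """Try each candidate path in order and reserve capacity on the first fit.

    requested_cps   -- cell rate requested in the traffic contract (cells/s)
    candidate_paths -- list of paths, each a list of link names
    free_capacity   -- dict mapping link name to unreserved capacity (cells/s)

    Returns the admitted path, or None if the call is rejected.
    """
    for path in candidate_paths:
        # Every link on the path must be able to carry the requested rate.
        if all(free_capacity[link] >= requested_cps for link in path):
            for link in path:
                free_capacity[link] -= requested_cps   # reserve resources
            return path
        # Otherwise fall through to the next (alternative) path.
    return None   # no alternative path found: reject the call


if __name__ == "__main__":
    capacity = {"A-B": 100, "B-C": 50, "A-D": 100, "D-C": 100}
    paths = [["A-B", "B-C"], ["A-D", "D-C"]]
    print(admit_call(80, paths, capacity))   # first path lacks capacity on B-C
    print(admit_call(80, paths, capacity))   # nothing left that fits
```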
After the CAC process is complete (see Figure 11-15), and the requested
path is created, data may be sent over this communication channel. To
uphold the user contracts agreed during CAC, the cell-level process
monitors and enforces traffic according to the contract parameters.
Violations by a user may cause the established channel to be rejected. To
that end, the Generic Cell Rate Algorithm (GCRA) is used to classify each
arriving cell as either conforming or nonconforming. A conforming cell is
admitted, whereas a nonconforming cell may be blocked. A widely used
mechanism to shape the traffic is the 'leaky bucket', as shown in
Figure 11-16.
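The conformance test can be sketched with the GCRA in its virtual-scheduling form, which is equivalent to a continuous-state leaky bucket. The class and attribute names below are illustrative; `increment` stands for the nominal inter-cell time (the reciprocal of the contracted rate) and `limit` for the tolerance, for example the CDVT.

```python
class GCRA:
    """Minimal sketch of GCRA(increment, limit), virtual-scheduling form."""

    def __init__(self, increment, limit):
        self.increment = increment   # nominal spacing between cells
        self.limit = limit           # how early a cell may arrive
        self.tat = None              # theoretical arrival time of next cell

    def conforms(self, arrival):
        """Return True if a cell arriving at time `arrival` is conforming."""
        if self.tat is None:
            self.tat = arrival       # first cell defines the schedule
        if arrival < self.tat - self.limit:
            return False             # too early: nonconforming, state unchanged
        # Conforming: schedule the next theoretical arrival time.
        self.tat = max(arrival, self.tat) + self.increment
        return True


if __name__ == "__main__":
    policer = GCRA(increment=10, limit=2)
    for t in (0, 10, 18, 25, 40):
        print(t, policer.conforms(t))
```

A nonconforming cell leaves the state untouched, so a burst that violates the contract does not penalize the cells that follow it at the contracted rate.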
Figure 11-15: Call admission control (CAC) flow
(Flowchart: a call setup request is checked against network load conditions, then passes through the path generation phase, a check of the load condition of trunks in the path, the link allocation phase, and the VC and VP assignment phase before connection establishment and cell-level traffic policing. A failed check triggers alternate path selection; if that is unsuccessful, the call is rejected.)


Figure 11-16: Traffic shaper and leaky bucket
(Diagram: traffic sources feed a multiplexer through the AAL, ATM, and physical layers; at the UNI, a cell buffer and a token generator form a continuous-state leaky bucket that releases cells with tokens, identified by VCI/VPI, toward the ATM switch.)

Voice and telephony over ATM


The ultimate requirement of the next generation network is to handle
packetized voice and data in a converged manner. There are numerous
technical and commercial reasons why separate investments in voice and
data infrastructure are not feasible. Packet voice may be applied to
“Voice over the Internet”, but the primary interest will be in voice
carried over business quality packet infrastructure. Next generation
networks are not just a PSTN replacement, but at a minimum they must
provide voice quality and reliability equivalent to today's PSTN.
Traditionally, voice is carried over leased lines and Time Division
Multiplexing (TDM) circuits. ATM is designed to support delay-sensitive
traffic, for example, using AAL-1 or AAL-2, so it is appropriate to deploy
ATM in place of TDM. The use of AAL-1 was subsequently extended to allow
replacement of 64 kbps circuits (traditional digital voice circuits),
providing a means to convey voice on ATM backbones instead of TDM
infrastructures. ATM support for CBR based on AAL-1 is known as Circuit
Emulation Service (CES). Two types of CES have been standardized:
• Unstructured circuit emulation service
• Structured circuit emulation service


The unstructured circuit emulation service, also known simply as circuit
emulation service, uses the AAL-1 service connection to carry a full T1 or
E1 link between two points in the network.
For example, the connection may be used to carry end-to-end transport
circuits between PABXs, as shown in Figure 11-17.

Figure 11-17: Circuit emulation service (CES) using AAL-1
(Diagram: two PABXs, each attached by a T1/E1 link to an ATM switch containing a voice codec and the AAL, ATM, and PHY layers; the switches are joined across the ATM network by an AAL-1 circuit emulation connection.)


The structured circuit emulation service employs AAL-1 to create a
connection with N x 64 kbps circuits, as opposed to a full T1 or E1.
Overall, the structured circuit emulation service offers improved
granularity for establishing fractional T1 or E1 circuits over ATM
networks.
However, although AAL-1 is offered by many vendors, it should not be
considered an optimum solution for voice and telephony over ATM
(VTOA) for the following reasons:
• Only a single user of the AAL can be supported (per virtual
  channel, configured as a point-to-point permanent connection).
• Reducing delay requires significantly more bandwidth than
  necessary, especially when its peak is high.
• There is no allowance for partially filled cells (always 48 bytes of
  payload); bandwidth is used even when there is no traffic.
• Voice is always 64 kbps (using the G.711 codec) or bundles of 64 kbps
  (N users x 64), which means that each connection requires at least
  64 kbps.
• No standard mechanisms exist in the AAL-1 structure for
  compression, silence detection/suppression, or idle channel removal.
• The bandwidth cannot easily be changed to meet new application
  requirements.
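The trade-off between delay and bandwidth in the list above can be quantified with a little arithmetic. The sketch below assumes that AAL-1 leaves 47 bytes of each 48-byte cell payload for voice samples (one byte is AAL-1 overhead) and asks what would happen if cells were sent with fewer voice bytes to cut the fill delay; the function name and the partial-fill scenario are illustrative assumptions.

```python
CELL_BYTES = 53           # 5-byte header + 48-byte payload
AAL1_PAYLOAD_BYTES = 47   # assumed: 48-byte payload minus 1 byte of AAL-1 overhead


def aal1_voice_cost(voice_bytes_per_cell=AAL1_PAYLOAD_BYTES, voice_bps=64_000):
    """Return (fill_delay_ms, line_rate_bps) for one voice channel over AAL-1."""
    if not 1 <= voice_bytes_per_cell <= AAL1_PAYLOAD_BYTES:
        raise ValueError("voice_bytes_per_cell must be between 1 and 47")
    bits_per_cell = voice_bytes_per_cell * 8
    fill_delay_ms = bits_per_cell / voice_bps * 1000   # time to collect the samples
    cells_per_second = voice_bps / bits_per_cell
    line_rate_bps = cells_per_second * CELL_BYTES * 8  # whole cells on the wire
    return fill_delay_ms, line_rate_bps


if __name__ == "__main__":
    for fill in (47, 32, 16):
        delay, rate = aal1_voice_cost(fill)
        print(f"{fill} voice bytes/cell: delay {delay:.3f} ms, line rate {rate / 1000:.1f} kbps")
```

Filling a cell completely with one 64 kbps channel takes just under 6 ms; sending cells sooner shortens that delay but multiplies the number of cells, and therefore the bandwidth, needed for the same voice stream.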
As mentioned above, AAL-1 has some limitations and may not operate
efficiently when carrying low bit rate voice traffic. The ATM Forum and
ITU-T Study Group (SG) 13 discussed a new AAL service that can carry low
bit rate voice information. The primary requirement on the new AAL is to
provide efficient transport of small native packets over ATM networks in
a way that allows a very small transfer delay across the ATM network,
while still allowing the receiver to recover the original packets. To
this end, the new AAL-2, with multiplexing capability, was created. The
new AAL-2 provides the following advantages when compared with AAL-1:
• Efficient bandwidth usage through VBR service
• Support for voice compression and silence detection/suppression
• Support for idle voice channel detection
• Multiple user channels with varying bandwidth on a single ATM
  connection
• VBR ATM traffic class, which allows the bandwidth to change
Figure 11-18 shows an example of how the new AAL-2 multiplexes several
voice channels into one channel (one virtual channel (VC) in ATM
context). Each 64 kbps voice channel is compressed by using the
CS-ACELP scheme with silence suppression. During a silent period there
is no data to send. The new AAL-2 is suitable for compressed wireline
telephony, wireless telephony, and wireless data. Some performance results
have been evaluated.
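The benefit of multiplexing short voice packets can be estimated by counting cells. The sketch below assumes the AAL-2 framing of a 3-byte header per voice packet and a 1-byte start field per cell, with packets allowed to continue into the next cell; the 10-byte packet size used in the example corresponds to one 10 ms CS-ACELP (G.729) frame. Function names are illustrative.

```python
import math

CELL_PAYLOAD_BYTES = 48
START_FIELD_BYTES = 1      # assumed per-cell AAL-2 overhead
PACKET_HEADER_BYTES = 3    # assumed per-packet AAL-2 overhead
USABLE_BYTES_PER_CELL = CELL_PAYLOAD_BYTES - START_FIELD_BYTES


def aal2_cells(payload_sizes):
    """Cells needed to carry the given voice packets multiplexed on one VC."""
    total = sum(PACKET_HEADER_BYTES + size for size in payload_sizes)
    return math.ceil(total / USABLE_BYTES_PER_CELL)


def unmultiplexed_cells(payload_sizes):
    """Cells needed if every voice packet travels alone in its own cell."""
    return len(payload_sizes)


if __name__ == "__main__":
    # Three active channels, one 10-byte compressed frame each.
    frames = [10, 10, 10]
    print("multiplexed:", aal2_cells(frames), "cell(s)")
    print("one packet per cell:", unmultiplexed_cells(frames), "cell(s)")
    # A channel in a silent period contributes no packet at all.
    print("one channel silent:", aal2_cells([10, 10]), "cell(s)")
```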

Voice 1 Voice 2 Voice 3


64 kbps PCM 64 kbps PCM 64 kbps PCM

Codec Codec Codec


Voice payload

Voice channel
AAL Layer 1 2 3 1

ATM cell header


ATM Layer
ATM cell payload

Figure 11-18: Voice and telephony multiplexing over ATM using AAL-2

Frame relay and FRF.11/12


Frame relay is a connection-oriented protocol that is widely used in many
Enterprise networks for connectivity from customer edge (CE) to provider
edge (PE). The Frame Relay Forum (now known as the MPLS and Frame
Relay Alliance) has been very active through the years in developing and
expanding frame relay technology to remain current with business
requirements. It is important to note that frame relay is essentially an
access technology in today's environment. The PE will encapsulate or
interwork the frame relay frame.
The two major Frame Relay Forum specifications important to
time-sensitive traffic are FRF.11 and FRF.12. FRF.11 is the Voice over
Frame Relay Implementation Agreement. This specification deals with how to
transport voice data over a frame relay network. FRF.11's focus is to
extend the application support of frame relay to include digital voice
payloads. FRF.12 is the Frame Relay Fragmentation Implementation
Agreement. This specification deals with fragmenting long frames into a
sequence of shorter frames. One of its primary benefits is that it reduces
delay and delay variation for time-sensitive packets when non–real-time
and time-sensitive packets are sent across the same interface.

FRF.11.1
As mentioned above, FRF.11.1 is the Voice over Frame Relay
Implementation Agreement. This specification deals with a number of
different concepts. This section concentrates on the primary ones: the
transport of voice within the frame relay payload, the support of
various codecs, and the effective utilization of low-bandwidth connections.

Transporting voice via frame relay


In voice over frame relay networks, a new edge device must be introduced
to perform the voice coding. These devices are collectively called voice
frame relay access devices (VFRAD), and are positioned between the PBX and
the frame relay network. The VFRAD consolidates the normal FRAD (frame
relay access device) and voice functionality, and multiplexes all of the
traffic types onto the frame relay connection. While the FRF.11 spec
supports the majority of common compression algorithms, it is important to
ensure that the VFRAD supports the selected codec. The VFRAD uses the
frame relay user-to-network interface (FRUNI) as the preferred interface
type for voice, voice signaling, and data.
FRF.11 also includes support for the silence information descriptor (SID),
which is a frame that identifies the end of voice activity and relays the
general comfort noise parameters. The SID frames are only available with
PCM and G.726 (ADPCM). The SID supports a variety of voice activity
detection (VAD) and silence suppression plans.
If VAD is used, the SID frame may be transmitted after the last voice
frame. The receiver treats receipt of the SID as an explicit indication of
the end of the voice frames. The SID frames are also sent between voice
frames to maintain the comfort noise parameters with the far end. You
should not use SID independently from VAD. The codecs that are supported
through the FRF.11 specification are listed in Table 11-4. Highlighted are
the primary choices currently used in today's networks. Further
information can be found in the FRF.11 specification.


Codec       G.729      G.728     G.723.1   G.726/G.727   G.711
Algorithm   CS-ACELP   LD-CELP   MP-MLQ    ADPCM         PCM

Table 11-4: Codecs

Transporting voice with DLCI


Each frame relay PVC (DLCI) represents one logical stream or traffic flow,
and it will generally interconnect two logical points in a network. The
concept of a sub-channel extends this ability, allowing the formation of
multiple streams within a single PVC. With sub-channels, any given PVC
may support up to 255 sub-channels or streams, creating up to 255 logical
traffic lanes within the connection between two network points. The
content, also called the payload, of an individual PVC is transparent to a
frame relay network, so the implementation of sub-channels remains
compatible with existing frame relay services. As such, it becomes solely
the responsibility of the end devices to handle and manage the use of
sub-channels within the PVC content.
The use of sub-channels allows for a greater degree of traffic separation.
Sub-channels can be identified for fax, voice, and data communications.
The traffic is sent across the DLCI through a first in, first out (FIFO)
scheduler; however, the use of FRF.12 allows time-sensitive traffic to stay
within its QoS parameters. The use of sub-channels also reduces the
operating expense of having multiple frame relay links into the carrier
network. There is no need to have separate links for each type of
traffic, which increases the efficiency of the links in use and reduces
the number of connections.
This is just a high-level view of the FRF.11.1 functionality. A more
in-depth view, and specifics associated with the codecs selected, can be
found in the functional specification. It is important to remember that the
specification allows for the use of a number of different codecs. The use of
sub-channels allows for the multiplexing of different virtual traffic streams
into one DLCI. FRF.11 also incorporates the use of SID to ensure a more
efficient use of the bandwidth available. These items, combined with
FRF.12, ensure the viability of frame relay to provide voice and data
services into the future.

FRF.12 frame relay fragmentation


The FRF.12 specification, as the name states, deals with the ability to
fragment long frames into a sequence of shorter frames, which can be
reassembled by the destination data terminal equipment (DTE) or data
circuit-terminating equipment (DCE). Frame fragmentation is particularly
important when delay and delay variation need to be controlled, as in
real-time or time-sensitive applications. FRF.12 should be employed when a
mix of data and voice traffic is being sent across a single interface. When
voice traffic is partitioned, there is no need to use fragmentation, as
the voice packet size will be constant. Frame fragmentation can also help
reduce the serialization delay associated with low-speed interfaces,
thereby keeping delay variation consistent and increasing frame
interleaving. The operator can choose the size of the fragments, which
must be configured on a per-interface basis. Slow-speed interface is a
relative term, which depends on the link speed, the use of channelization,
and the amount of traffic associated with the interface.
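Serialization delay, and the fragment size needed to bound it, follow directly from the link speed. The sketch below is illustrative: the function names are assumptions, and the 10 ms delay target used in the example is a common rule of thumb rather than a value taken from FRF.12.

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time needed to clock one frame onto a link, in milliseconds."""
    return frame_bytes * 8 / link_bps * 1000


def fragment_size_bytes(link_bps, target_delay_ms):
    """Largest fragment (in bytes) whose serialization delay meets the target."""
    return int(link_bps * target_delay_ms / 1000 // 8)


if __name__ == "__main__":
    # A full-size 1500-byte data frame ahead of a voice packet on a slow link.
    print(serialization_delay_ms(1500, 64_000), "ms on a 64 kbps link")
    for speed in (64_000, 128_000, 512_000):
        size = fragment_size_bytes(speed, target_delay_ms=10)
        print(f"{speed // 1000} kbps: fragment size {size} bytes")
```

The same frame that delays voice by a large fraction of a second on a low-speed access link is negligible on a fast trunk, which is why "slow speed" is a relative term.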

Applications of fragmentation
There are three applications for fragmentation defined in the specification:
• Locally across a frame relay UNI interface between the DTE-DCE
  peers
• Locally across a frame relay NNI interface between the DCE peers
• End-to-end between two frame relay DTE peers
Frame relay User-to-Network Interface (UNI) and frame relay
Network-to-Network Interface (NNI) share the same type of data fragment
format. That is, the header information for the first two types of
fragmentation is the same: a two-octet fragmentation header that precedes
the frame relay header information.
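Fragmentation and reassembly can be sketched as follows. The example assumes that the fragmentation header carries a beginning flag, an ending flag, and a 12-bit sequence number; the exact bit layout of the two octets is not modeled, and the `Fragment` type and function names are illustrative.

```python
from dataclasses import dataclass

SEQ_MODULUS = 4096   # assumed 12-bit sequence number


@dataclass(frozen=True)
class Fragment:
    begin: bool   # set on the first fragment of a frame
    end: bool     # set on the last fragment of a frame
    seq: int      # incremented for every fragment sent
    data: bytes


def fragment(payload, max_data, first_seq=0):
    """Split one frame payload into fragments of at most max_data bytes."""
    if max_data < 1:
        raise ValueError("max_data must be at least 1")
    chunks = [payload[i:i + max_data] for i in range(0, len(payload), max_data)] or [b""]
    last = len(chunks) - 1
    return [
        Fragment(i == 0, i == last, (first_seq + i) % SEQ_MODULUS, chunk)
        for i, chunk in enumerate(chunks)
    ]


def reassemble(fragments):
    """Rebuild the payload, or return None if a fragment is lost or misordered."""
    if not fragments or not fragments[0].begin or not fragments[-1].end:
        return None
    for prev, cur in zip(fragments, fragments[1:]):
        if cur.seq != (prev.seq + 1) % SEQ_MODULUS or cur.begin or prev.end:
            return None
    return b"".join(f.data for f in fragments)


if __name__ == "__main__":
    frame = bytes(range(200))
    pieces = fragment(frame, max_data=80)
    print([(p.begin, p.end, p.seq, len(p.data)) for p in pieces])
    print(reassemble(pieces) == frame)
```

The sequence number lets the receiver detect a lost fragment and discard the partial frame instead of delivering corrupted data.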
The header information for end-to-end fragmentation is somewhat more
complex. It contains the same information as in the UNI and NNI
fragmentation schemes, but adds fields that are missing from that format:
the network layer protocol ID (NLPID), which is assigned to identify this
fragmentation header format and identifies the data content, and the
unnumbered information (UI) bits, which are associated with multiprotocol
encapsulation.
Note: Some customers may have a requirement to communicate via
frame relay to a web site that is not their own and does not support
fragmentation. In these cases the customer may not be willing to
implement fragmentation.

Frame relay between DTE and DCE peers


UNI (DTE-DCE) fragmentation is strictly local to the interface, and the
fragment size can be optimally configured to provide the proper delay and
delay variation, based upon the logical speed of the DTE interface.
Remember that in frame relay a DTE must connect to a DCE; therefore, this
type of connection is between the customer edge device and the provider
edge device. Since fragmentation is local to the interface, the network
can take advantage of the higher internal trunk speeds by transporting the
complete frames, which is more efficient than transporting a larger number
of smaller fragments. UNI fragmentation is also useful when there is a
speed mismatch between the two DTEs at the ends of a VC (virtual circuit).

Frame relay NNI between DCE peers


As the name implies, this type of fragmentation occurs between frame
relay networks, as NNI is the Network-to-Network Interface. These
inter-network links are usually fewer in number, and fragmentation allows
interleaving of frames, so that delay is minimized. When this type
of fragmentation is done between NNIs, all of the DLCIs associated with the
interface must be fragmented, according to the specification.

End-to-end fragmentation between two DTE peers


End-to-end fragmentation between two frame relay DTEs is the final type of
fragmentation method defined in the FRF.12 spec. These DTEs may be
interconnected by one or more frame relay networks. With end-to-end
fragmentation, the procedure is transparent to the frame relay networks
between the transmitting and receiving DTEs. End-to-end fragmentation is
restricted to frame relay PVCs. Its primary use is when the associated
DTEs are exchanging traffic using slower speed interfaces. This
fragmentation scheme is often used when the associated User-Network
Interface (UNI) does not support fragmentation.

Seven engineering best practices


When ATM and frame relay were originally released, they were both
called “Fast Packet” technology. The comparison was with X.25 (very slow),
which was the leading packet standard at the time. ATM was developed for
all services including real-time services, while frame relay was specifically
for data.
Frame relay can carry VoIP effectively; however, it can be a big challenge
because there is no defined QoS in basic frame relay services, only
congestion notification. There is a notion of “QoS aware” frame relay
networks, but many use proprietary technology or are based on the new
X.141 suite of specifications. The service is referred to as Dynamic
Packet Routing System (DPRS) and uses a Dijkstra-based algorithm, similar
to PNNI, to provide end-to-end QoS. Nortel has worked with the Frame Relay
Forum to standardize this in FRF.1.2 (PVC User-to-Network Interface (UNI)
Implementation Agreement). The newest frame relay networks provide
IP-aware DLCI(s) without having to use FrDTE for IP encapsulation.
Special engineering and care need to be taken when planning to
implement VoIP over a frame relay network. When dealing with frame
relay, concerns include speed of the service, access rate, segmentation,
shaping, policing, pacing, and architecture of PVCs.
In Enterprise Networks, frame relay is generally deployed as a
point-to-network technology; therefore, in most cases, the central site
runs at a different (higher) speed than other locations. It is this speed
difference that creates some of the problems. In frame relay, there is a
committed information rate (CIR) that most people reference (noted in
Figure 11-19). However, CIR is just an average value. The important
parameters of frame relay to engineer properly and understand are the time
interval and burst rate. Most carriers and switches use a time interval of
one second, which simplifies calculations.
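The relationship between CIR, the time interval, and the burst sizes can be made concrete. The sketch below uses the conventional frame relay quantities, committed burst (Bc = CIR x Tc) and excess burst (Be); the function names and the classification labels are illustrative assumptions.

```python
def committed_burst_bits(cir_bps, tc_s=1.0):
    """Bc: bits the network commits to carry in one measurement interval Tc."""
    return cir_bps * tc_s


def classify_interval(bits_sent, cir_bps, be_bits=0, tc_s=1.0):
    """Classify the traffic offered during one interval against the contract."""
    bc = committed_burst_bits(cir_bps, tc_s)
    if bits_sent <= bc:
        return "within Bc"          # committed traffic
    if bits_sent <= bc + be_bits:
        return "within Be"          # excess traffic, typically discard eligible
    return "over contract"          # beyond Bc + Be


if __name__ == "__main__":
    cir = 256_000
    print("Bc with Tc = 1 s:", committed_burst_bits(cir), "bits")
    print("Bc with Tc = 125 ms:", committed_burst_bits(cir, 0.125), "bits")
    for offered in (200_000, 300_000, 400_000):
        print(offered, classify_interval(offered, cir, be_bits=64_000))
```

With a one-second interval the committed burst equals the CIR numerically, which is the simplification mentioned above; a shorter interval permits proportionally smaller bursts at the same average rate.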
There are at least seven major engineering best practices that should be
addressed in a frame relay network. The first issue is segmentation, which
deals with the problem of large data frames. This is a function of both the
data packet size and the speed of the circuit. If the data packet is
sufficiently large and the circuit speed is slow enough, a single data
frame that gets in front of a voice packet can introduce enough jitter in
the voice flow to disrupt voice quality. Segmentation breaks larger frames
into smaller frames for transmission over facilities. A standards-based
implementation of segmentation is FRF.12 (Frame Relay Fragmentation
Implementation Agreement).
The second and third have to do with the frame relay contract itself, and are
known as shaping and policing. Shaping is a function of the customer
premises equipment, which limits the amount of traffic passed on to the
frame relay network. Shaping is done at the egress point of the customer
premises, and assures that traffic is sent out to the specifications of the
contract. In other words, if the contract is for a CIR of 256 kbps, then it assures