
Achieving Redundancy in

Comcast's IMS Network

Carl Klatsky
Comcast NE&TO Product Engineering

In late 2010, Comcast deployed an IMS-based network supporting
residential voice service to Comcast's digital voice subscribers
Network architecture and protocols utilize 3GPP IMS standards, following
the CableLabs PacketCable 2.0 profile with modifications
Highlights include
SIP-based voice endpoints
Highly redundant server architecture
Redundant blade implementation within each server
All necessary redundant physical and network connections
Redundant SIP proxies connecting to external telephony providers &
Two vendors supplying CSCFs, HSS, & TAS

Network Overview - Core

Currently 3 core sites serving Comcast's national voice footprint

Two primary sites, Site-1 & Site-2, configured to fail over to Site-3
One secondary site, Site-3, configured to fail over to Site-2
Site-1 & Site-2 have capacity for 2.5M subscribers each
Site-3 supports a nominal number of subscribers (N) for staging new core
element upgrades, but its main purpose is to serve as a failover site for
the primary sites
Total subscriber capacity expressed as 5M-N

Network Overview - Access

Enhanced Digital Voice Adapter (eDVA), embedded within the cable modem,
resides at the customer premises
eDVA is SIP based
Cable Modem Termination System (CMTS) serves as the termination point for
the link-layer DOCSIS protocol and the entry point onto Comcast's backbone
Redundant links between CMTS & backbone

IMS Network Element Overview

Call Session Control Function (CSCF): Collection of three SIP proxies
performing various functions of subscriber access control, subscriber
location, and SIP registration
Home Subscriber Server (HSS): Central database of live subscriber data
including subscriber identity & subscriber access credentials
Telephony Application Server (TAS): Provides traditional telephony services
(e.g. call forwarding upon no answer); follows IMS application server
Multimedia Resource Function Controller / Multimedia Resource Function
Processor (MRFC / MRFP): Responsible for announcement control & playout for IVRs
Media Gateway Control Function / Breakout Gateway Control Function
(MGCF / BGCF): Provides call signaling interface to external networks

IMS Core Site - Single Site Overview

[Figure: single-site diagram. Labels shown: Support Systems; Application Layer; Network Peers and Applications; Other SIP; Session Routing & User Data Layer; Access & Edge; Existing Network Infrastructure]

Network Overview

Terminating Call Redundancy

SIP Route Proxy (SRP) with multiple connections to the three IMS core sites
SRP performs call route lookup to determine which IMS core site will
terminate the inbound call
SRP can forward calls to an alternate IMS core site if its connectivity to the
destination IMS core site is down, with I-CSCFs acting in the standard IMS
role to interrogate the HSS to determine which IMS core site is serving the
subscriber
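
The SRP behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `CoreSite` type, the `pick_site` function, and the "first other reachable site" rule are hypothetical and are not taken from the Comcast design, which the slides do not detail.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CoreSite:
    name: str
    reachable: bool  # the SRP's current view of its connectivity to this site


def pick_site(serving_site: str, sites: Dict[str, CoreSite]) -> Optional[str]:
    """Return the name of the core site the SRP forwards an inbound call to."""
    primary = sites[serving_site]
    if primary.reachable:
        return primary.name
    # Connectivity to the destination site is down: forward to another
    # reachable site (assumed rule: first one found) and rely on that site's
    # I-CSCF to interrogate the HSS for the site serving the subscriber.
    for site in sites.values():
        if site.name != serving_site and site.reachable:
            return site.name
    return None  # no core site reachable from this SRP


sites = {
    "Site-1": CoreSite("Site-1", reachable=False),
    "Site-2": CoreSite("Site-2", reachable=True),
    "Site-3": CoreSite("Site-3", reachable=True),
}
print(pick_site("Site-1", sites))
```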

Terminating Call Redundancy (continued)

IMS core sites also have inter-element failover, used if the local instance
of a signaling element is unavailable
For example, if the S-CSCF determines that the local TAS is unavailable, it
has a redundant connection to a TAS in another IMS core site to continue
call termination

Originating Call Redundancy

eDVAs dynamically learn available IMS core sites at boot up using the DNS
SRV mechanism (_sip._udp.<domain_name>)
DNS SRV response returns a prioritized listing of P-CSCFs
eDVA establishes SIP registration with primary CSCFs, but has a cached list
of backup CSCFs
eDVAs, per Comcast specification, trigger failover on non-response to
certain SIP messages. Receipt of any message back, regardless of
provisional or final response, cancels failover for that transaction
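
As an illustration of how a prioritized P-CSCF list might be built from a DNS SRV answer, here is a minimal sketch. The record values and host names are made up, and the tie-break by descending weight is a simplification (RFC 2782 specifies a weighted random choice among records of equal priority); the actual eDVA logic is not described in the slides.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int  # lower value is tried first
    weight: int
    port: int
    target: str


def order_pcscfs(records: List[SrvRecord]) -> List[SrvRecord]:
    """Order SRV records so the primary P-CSCF comes first, backups after."""
    # Simplified ordering: priority ascending, then weight descending.
    return sorted(records, key=lambda r: (r.priority, -r.weight))


# Hypothetical answer for _sip._udp.<domain_name>
answer = [
    SrvRecord(20, 10, 5060, "pcscf.site3.example.net"),
    SrvRecord(10, 10, 5060, "pcscf.site1.example.net"),
]
ordered = order_pcscfs(answer)
primary, backups = ordered[0], ordered[1:]  # backups are cached for failover
print(primary.target, [b.target for b in backups])
```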


Originating Call Redundancy (continued)

Prime failover drivers: REGISTER & INVITE
eDVA configuration set with Timer F = 8s, Timer B = 4s
During initial registration at boot up, or re-registration during steady
state, if no response at all is received before Timer F expires at 8s, eDVA
will attempt SIP registration with the next CSCFs learned in the DNS SRV
response, in priority order
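
The registration failover rule above can be sketched as a loop over the prioritized CSCF list. This is a hedged sketch, not the vendor implementation: `send_register` is a hypothetical transport callback that returns a SIP status code, or `None` when nothing at all arrives before the timeout.

```python
from typing import Callable, List, Optional, Tuple

TIMER_F_SECONDS = 8.0  # eDVA Timer F value given on the slide


def register_with_failover(
    cscfs: List[str],
    send_register: Callable[[str, float], Optional[int]],
) -> Optional[Tuple[str, int]]:
    """Try each CSCF in priority order; move on only on total silence."""
    for target in cscfs:
        status = send_register(target, TIMER_F_SECONDS)
        if status is not None:
            # Any response, provisional or final, even an error, cancels
            # failover for this transaction, so the eDVA stays on this CSCF.
            return target, status
    return None  # every CSCF in the list was silent


# Simulated network: the primary never answers, the backup returns 200 OK.
replies = {"primary-cscf": None, "backup-cscf": 200}
print(register_with_failover(list(replies), lambda t, timeout: replies[t]))
```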


Originating Call Redundancy (continued)

During call setup, if eDVA does not receive a response to INVITE before
Timer B expires at 4s, eDVA will cancel that call attempt and return fast
busy to the subscriber while simultaneously establishing SIP registration
with one of the backup CSCFs
User experience on fast busy is to hang up and try the call again
SIP registration has already been established by the time the user begins
to attempt the call again, and the call completes as normal
Failover also originally supported on non-response to SUBSCRIBE
SUBSCRIBE used in Comcast network to subscribe to the message-summary
event package and registration event package
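
The INVITE case differs from REGISTER in that the subscriber hears fast busy while the eDVA re-registers in the background. A small sketch of that decision, with hypothetical names (`InviteOutcome`, `on_invite_result`) and the assumption that the first cached backup is chosen:

```python
from typing import List, NamedTuple, Optional

TIMER_B_SECONDS = 4.0  # eDVA Timer B value given on the slide


class InviteOutcome(NamedTuple):
    tone: Optional[str]           # tone played to the subscriber, if any
    register_with: Optional[str]  # backup CSCF to register with, if any


def on_invite_result(response_after: Optional[float], backups: List[str]) -> InviteOutcome:
    """response_after: seconds until the first response, None if none ever came."""
    if response_after is not None and response_after < TIMER_B_SECONDS:
        return InviteOutcome(tone=None, register_with=None)  # call proceeds
    # Timer B expired with no response: cancel the attempt, play fast busy,
    # and start registration with a backup CSCF (assumed: first in the list).
    next_cscf = backups[0] if backups else None
    return InviteOutcome(tone="fast-busy", register_with=next_cscf)


print(on_invite_result(None, ["backup-cscf"]))
```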


Operational Challenge #1
In an early stage of the deployment with ~10K active subscribers, a fiber
cut due to maintenance activity inadvertently blocked network access to
one of the core sites
Almost all subscribers failed over, but about 500 did not complete
A bug was detected in the failover routine: when a failover attempt and a
re-SUBSCRIBE attempt happened to overlap, the CSeq number in the REGISTER
would be incorrect, resulting in an error leading to SIP registration being
Worked with eDVA vendor to resolve CSeq issue


Operational Challenge #2
Original failover design called for failover on non-response to as many SIP
messages as could be supported, and eDVA vendors provided failover
support for non-response to SUBSCRIBE
Due to problems with the Presence AS that handled message-summary event
package subscriptions, the AS was unavailable at various times
IMS core network element Timer F = 32s by default but eDVA Timer F = 4s
eDVA transmitted SUBSCRIBE and IMS core network forwarded
SUBSCRIBE on to the Presence AS


Operational Challenge #2 (continued)

The eDVA Timer F would expire before the IMS Timer F expired, triggering
eDVA failover due to non-response to SUBSCRIBE.
Had the IMS core network Timer F fired before the eDVA's Timer F, a 408
Timeout would have been sent to the eDVA, canceling the failover trigger
eDVA failed over to secondary IMS site, but Presence AS still not available
through secondary IMS site, resulting in failover flapping
Worked with eDVA vendors to de-couple non-response to SUBSCRIBE
from triggering failover
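
The timer mismatch can be made concrete with a small sketch. It assumes a simplified model that ignores network latency: the first thing the eDVA can hear is either the application server's real answer or the core's 408 when the core Timer F fires. The function name and parameters are illustrative only.

```python
from typing import Optional


def subscribe_triggers_failover(
    edva_timer_f: float,
    core_timer_f: float,
    as_response_after: Optional[float],
) -> bool:
    """True if the eDVA sees no response to SUBSCRIBE before its own Timer F.

    as_response_after: seconds until the AS answers, or None if it never does.
    """
    if as_response_after is None:
        first_response = core_timer_f  # only the core's 408 ever comes back
    else:
        first_response = min(as_response_after, core_timer_f)
    return first_response >= edva_timer_f


# Values from the slides: eDVA Timer F = 4s, core Timer F = 32s, AS down.
print(subscribe_triggers_failover(4.0, 32.0, None))  # True: failover fires
# If the core timer were shorter than the eDVA timer, the 408 would arrive first.
print(subscribe_triggers_failover(4.0, 2.0, None))   # False: failover canceled
```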


Operational Challenge #3
During a 24-hour large-scale production failover test, an issue was uncovered with
regard to message-summary event package subscription behavior
Normal operating behavior results in the eDVA sending a SUBSCRIBE through the IMS
core to the Presence Server for the message-summary event package.
Subscription is established with a 12-hour expiration
Upon geo-redundant failover detection, device successfully establishes SIP
registration through secondary CSCFs
Following successful registration failover, initial subscription to message-summary
event package is successfully established. This left a stale subscription active on
the Presence Server for each device that failed over


Operational Challenge #3 (continued)

While on secondary CSCFs, each device should attempt to obtain registration
back at its primary CSCF, and if unavailable, re-register through its secondary CSCF
One eDVA type also attempted an initial subscription through the secondary CSCF
upon each re-registration. This resulted in a storm & build-up of stale
subscriptions, overloading the Presence Server
Working with eDVA vendor to unSUBSCRIBE following the failover to the
secondary CSCFs
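
To see why the build-up overloads the Presence Server, the following sketch counts the subscriptions a single device holds while parked on its secondary CSCF. The 12-hour expiration comes from the slides; the re-registration interval is not stated there, so the 30-minute value used in the example is purely an assumption.

```python
SUBSCRIPTION_EXPIRY_S = 12 * 3600  # 12-hour expiration from the slides


def active_subscriptions(reregistration_interval_s: int, elapsed_s: int,
                         unsubscribe_old: bool) -> int:
    """Subscriptions one device holds on the Presence Server after elapsed_s."""
    if unsubscribe_old:
        return 1  # each new subscription replaces the previous one
    # A new initial subscription is created at t = 0, interval, 2*interval, ...
    # and each one lingers until its own 12-hour expiration.
    created_at = range(0, elapsed_s + 1, reregistration_interval_s)
    return sum(1 for t in created_at if t + SUBSCRIPTION_EXPIRY_S > elapsed_s)


ASSUMED_INTERVAL_S = 30 * 60  # hypothetical re-registration interval
print(active_subscriptions(ASSUMED_INTERVAL_S, 6 * 3600, unsubscribe_old=False))
print(active_subscriptions(ASSUMED_INTERVAL_S, 6 * 3600, unsubscribe_old=True))
```

Under these assumed numbers a single device holds 13 subscriptions after six hours instead of 1, and that factor is multiplied across every device that failed over.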


With appropriate design and software support, large-scale redundancy
can be achieved, resulting in highly available service to the end subscriber
Careful consideration should be taken on overdoing redundancy. Only
truly essential services should receive failover support
Comcast is also reviewing additional global traffic management & traffic
engineering within its IMS network, as a means to further distribute traffic
across the sites


My Contact Info
Carl Klatsky
Product Engineering
Comcast Cable
One Comcast Center
1701 John F. Kennedy Blvd.
Mailstop: 39.210C
Philadelphia, PA 19103
215 286 8256