
Achieving Redundancy in Comcast's IMS Network

Carl Klatsky
Comcast NE&TO Product Engineering
6/26/2012

Background
In late 2010, Comcast deployed an IMS-based network supporting
residential voice service to Comcast's digital voice subscribers
Network architecture and protocols utilize 3GPP IMS standards, following
the CableLabs PacketCable 2.0 profile with modifications
Highlights include
SIP-based voice endpoints
Highly redundant server architecture
Redundant blade implementation within each server
All necessary redundant physical and network connections
Redundant SIP proxies connecting to external telephony providers &
PSTN
Two vendors supplying CSCFs, HSS, & TAS

Network Overview - Core

Currently 3 core sites serving Comcast's national voice footprint


Two primary sites, Site-1 & Site-2, configured to fail over to Site-3
One secondary site, Site-3, configured to fail over to Site-2
Site-1 & Site-2 have 2.5M subscriber capacity each
Site-3 supports a nominal number of subscribers (N) for staging new core
element upgrades, but its main purpose is to serve as a failover site for
the primary sites
Total subscriber capacity expressed as 5M-N

Network Overview - Access


enhanced Digital Voice Adapter (eDVA), embedded within the cable modem,
resides at the customer premises
eDVA is SIP-based
Cable Modem Termination System (CMTS) serves as termination point for
link layer DOCSIS protocol and entry point onto Comcast's backbone
network
Redundant links between CMTS & backbone

IMS Network Element Overview


Call Session Control Function (CSCF): collection of three SIP proxies
performing various functions of subscriber access control, subscriber
location, and SIP registration
Home Subscriber Server (HSS): central database of live subscriber data,
including subscriber identity & subscriber access credentials
Telephony Application Server (TAS): provides traditional telephony services
(e.g. call forwarding upon no answer); follows IMS application server
model
Media Resource Function Controller / Media Resource Function Processor
(MRFC / MRFP): responsible for announcement control & playout for IVRs
Media Gateway Control Function / Breakout Gateway Control Function
(MGCF / BGCF): provides call signaling interface to external networks

IMS Core Site - Single Site Overview


[Figure: single IMS core site, drawn as layers]
Layers: Support Systems; Application Layer; Network Peers and Applications; Session Routing & User Data Layer; Access & Edge; Existing Network Infrastructure
Elements shown: OSS, CCF, Neptune, PSTN network, Presence Server, TAS, VM, CAS, Other SIP Peer Networks, TNS, HSS, MGCF, MRFC/MRFP, S-CSCF, I-CSCF, P-CSCF, CMTS, eDVA, PCMM, BGCF/ESCF, STP, CMS, SRP, ENUM, SBC

Network Overview

Terminating Call Redundancy


SIP Route Proxy (SRP) with multiple connections to the three IMS core
sites
SRP performs call route lookup to determine which IMS core site will
terminate the inbound call
SRP can forward calls to an alternate IMS core site if its connectivity to the
destination IMS core site is down, with I-CSCFs acting in the standard IMS
role to interrogate the HSS to determine which IMS core site is serving the
subscriber
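The SRP behavior described above can be sketched roughly as follows. This is a minimal illustration only: the function name, the site labels, and the `link_up` map are hypothetical, and the real SRP route lookup is not described in these slides.

```python
from typing import Dict, Optional


def pick_core_site(serving_site: str, link_up: Dict[str, bool]) -> Optional[str]:
    """Choose the IMS core site an inbound call is forwarded to.

    serving_site: site returned by the (hypothetical) route lookup.
    link_up: assumed map of site name -> whether the SRP can reach it.
    """
    # Normal case: the destination site is reachable.
    if link_up.get(serving_site, False):
        return serving_site
    # Destination unreachable: forward to any reachable alternate site.
    # Its I-CSCF is then expected to query the HSS for the serving site.
    for site in sorted(link_up):
        if site != serving_site and link_up[site]:
            return site
    # No core site reachable at all.
    return None
```

For example, with Site-1 unreachable, `pick_core_site("Site-1", {"Site-1": False, "Site-2": True, "Site-3": True})` selects Site-2 under this sketch's alphabetical tie-break, which is an arbitrary choice made here for determinism.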

Terminating Call Redundancy (continued)


IMS core sites also have inter-element failover if a local signaling
element is unavailable
For example, if the S-CSCF determines that the local TAS is unavailable, it
has a redundant connection to a TAS in another IMS core site to continue
call termination

Originating Call Redundancy


eDVAs dynamically learn available IMS core sites at boot-up using the DNS
SRV mechanism (_sip._udp.<domain_name>)
DNS SRV response returns prioritized listing of P-CSCFs
eDVA establishes SIP registration with primary CSCFs, but has cached list
of backup CSCFs
eDVAs, per Comcast specification, trigger failover on non-response to
certain SIP messages. Receipt of any message back, regardless of
provisional or final response, cancels failover for that transaction
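To make the prioritized P-CSCF list concrete, here is a small sketch of ordering already-resolved SRV records. The record type, hostnames, and values are hypothetical. Per RFC 2782, a lower priority value is tried first; the weighted random selection among equal-priority records is simplified here to "higher weight first", which is an assumption of this sketch rather than eDVA behavior.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    """One DNS SRV answer (fields as defined in RFC 2782)."""
    priority: int
    weight: int
    port: int
    target: str


def order_pcscfs(records: List[SrvRecord]) -> List[str]:
    """Return P-CSCF addresses in the order a device would try them."""
    # Lower priority value first; ties broken by higher weight (simplification).
    ordered = sorted(records, key=lambda r: (r.priority, -r.weight))
    return [f"{r.target}:{r.port}" for r in ordered]


# Hypothetical answer for _sip._udp.<domain_name>; names are illustrative only.
example = [
    SrvRecord(20, 0, 5060, "pcscf.site3.example.net"),
    SrvRecord(10, 0, 5060, "pcscf.site1.example.net"),
]
cached_list = order_pcscfs(example)  # primary first, backup second
```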


Originating Call Redundancy (continued)


Prime failover drivers: REGISTER & INVITE
eDVA configuration set with Timer F = 8s, Timer B = 4s
During initial registration at boot-up, or re-registration during steady
state, if no response at all is received before Timer F expires at 8s, eDVA
will attempt SIP registration with the next CSCFs learned in the DNS SRV response,
in priority order
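The registration failover rule above might look like the following loop. It is a sketch under stated assumptions: `send_register` is a hypothetical callable standing in for the eDVA's SIP stack, returning the first response received or `None` if Timer F expires with no response at all.

```python
from typing import Callable, Iterable, Optional

TIMER_F_SECONDS = 8.0  # eDVA Timer F value given on the slide


def select_pcscf_for_register(
    pcscfs: Iterable[str],
    send_register: Callable[[str, float], Optional[str]],
    timer_f: float = TIMER_F_SECONDS,
) -> Optional[str]:
    """Walk the prioritized P-CSCF list until one of them answers.

    send_register(pcscf, timeout) is a hypothetical hook: it returns any
    response (provisional or final) or None when nothing arrives in time.
    """
    for pcscf in pcscfs:
        response = send_register(pcscf, timer_f)
        if response is not None:
            # Any message back cancels failover for this transaction,
            # so the device stays with this P-CSCF.
            return pcscf
        # Timer F expired with no response: try the next entry.
    return None
```

Note that the function only decides which P-CSCF the device stays with; handling of the response itself (for example a 401 challenge) is outside this sketch.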


Originating Call Redundancy (continued)


During call setup, if eDVA does not receive a response to INVITE before
Timer B expires at 4s, eDVA will cancel that call attempt and return fast
busy to the subscriber while simultaneously establishing SIP registration
with one of the backup CSCFs
User experience on fast busy is to hang up and try the call again
SIP registration has already been established by the time the user begins
to attempt the call again, and the call completes as normal
Failover also originally supported on non-response to SUBSCRIBE
SUBSCRIBE used in Comcast network to subscribe to the message-summary
event package and registration event package


Operational Challenge #1
In an early stage of the deployment with ~10K active subscribers, a fiber
cut due to maintenance activity inadvertently blocked network access to
one of the core sites
Almost all subscribers failed over, but about 500 did not complete
failover
Bug was detected in failover routine: when a failover attempt and a
reSUBSCRIBE attempt happened to overlap, the CSeq number in REGISTER
would be incorrect, resulting in an error that led to SIP registration being
terminated
Worked with eDVA vendor to resolve CSeq issue
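The slides do not describe the vendor's root cause or fix. As a general illustration of how this class of bug can be avoided, the sketch below keeps an independent, lock-protected CSeq counter per Call-ID, so that an overlapping SUBSCRIBE refresh cannot disturb the REGISTER sequence. The class name and Call-ID strings are hypothetical.

```python
import threading
from collections import defaultdict


class CSeqAllocator:
    """Hands out CSeq numbers, one independent sequence per Call-ID.

    Illustrative only; not the eDVA vendor's implementation.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Each Call-ID starts its own sequence at 1.
        self._next = defaultdict(lambda: 1)

    def next_cseq(self, call_id: str) -> int:
        # The lock keeps overlapping REGISTER and SUBSCRIBE paths
        # from reading or updating a counter at the same time.
        with self._lock:
            value = self._next[call_id]
            self._next[call_id] = value + 1
            return value
```

With this arrangement, a re-SUBSCRIBE issued in the middle of a failover REGISTER draws from its own counter, and successive REGISTER requests on the same Call-ID still increase by one.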


Operational Challenge #2
Original failover design called for failover on non-response to as many SIP
messages as can be supported, and eDVA vendors provided failover
support for non-response to SUBSCRIBE
Due to problems with the Presence AS that handled message-summary event
package subscriptions, the AS was unavailable at various times
IMS core network element Timer F = 32s by default, but eDVA Timer F = 4s
eDVA transmitted SUBSCRIBE and IMS core network forwarded
SUBSCRIBE on to Presence AS


Operational Challenge #2 (continued)


The eDVA Timer F would expire prior to the IMS Timer F expiring, triggering
eDVA failover due to non-response to SUBSCRIBE
Had the IMS core network Timer F fired before the eDVA's Timer F, a 408
Timeout would have been sent to the eDVA, which would have canceled the failover trigger
eDVA failed over to secondary IMS site, but Presence AS was still not available
through secondary IMS site, resulting in failover flapping
Worked with eDVA vendors to decouple non-response to SUBSCRIBE
from triggering failover
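The timer race described above can be reduced to a comparison of which event reaches the eDVA first. The sketch below is a simplified model: it ignores network delay and retransmissions, and the function name and return labels are made up for illustration.

```python
from typing import Optional


def subscribe_outcome(
    edva_timer_f: float,
    core_timer_f: float,
    as_reply_after: Optional[float] = None,
) -> str:
    """Classify what the eDVA sees after sending SUBSCRIBE.

    as_reply_after: seconds until the Presence AS answers, or None if it
    never answers (AS unavailable). All times are measured from the send.
    """
    # The AS answers before either timer fires: normal case.
    if as_reply_after is not None and as_reply_after < min(edva_timer_f, core_timer_f):
        return "response"
    # The core times out first and returns 408, which cancels failover.
    if core_timer_f < edva_timer_f:
        return "408"
    # The eDVA's own Timer F expires with nothing received.
    return "failover"


# Values from the slide: eDVA Timer F = 4s, core Timer F = 32s, AS down.
observed = subscribe_outcome(edva_timer_f=4.0, core_timer_f=32.0)
```

With the slide's values the eDVA timer always wins the race while the AS is down, which is why every affected device failed over.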


Operational Challenge #3
During a 24-hour large-scale production failover test, an issue was uncovered with
regard to message-summary event package subscription behavior
Normal operating behavior results in eDVA sending a SUBSCRIBE through the IMS
core to the Presence Server for the message-summary event package.
Subscription is established with a 12-hour expiration
Upon geo-redundant failover detection, device successfully establishes SIP
registration through secondary CSCFs
Following successful registration failover, an initial subscription to the message-summary
event package is successfully established. This left a stale subscription active on
the Presence Server for each device that failed over


Operational Challenge #3 (continued)


While on secondary CSCFs, each device should attempt to obtain registration
back at its primary CSCF, and if unavailable, re-register through its secondary CSCF
One eDVA type also attempted an initial subscription through the secondary CSCF
upon each re-registration. This resulted in a storm & build-up of stale
subscriptions, overloading the Presence Server
Working with eDVA vendor to unSUBSCRIBE following the failover to the
secondary CSCFs
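A rough model of the build-up: if a device creates a fresh subscription on every re-registration and never removes the old one, each subscription lingers until its 12-hour expiration. The sketch below counts how many a single device holds at a given time. The re-registration interval is not given in the slides, so the 1-hour value in the example is purely an assumption.

```python
def active_subscriptions(
    hours_elapsed: float,
    rereg_interval_h: float,
    sub_expiry_h: float = 12.0,   # expiration stated on the slide
    unsubscribe_old: bool = False,
) -> int:
    """Subscriptions one device holds on the Presence Server.

    Assumes one new initial subscription at t = 0 and at every
    re-registration thereafter (rereg_interval_h is hypothetical).
    """
    if unsubscribe_old:
        # Old subscription is removed before a new one is created.
        return 1
    count = 0
    registrations = int(hours_elapsed // rereg_interval_h) + 1
    for k in range(registrations):
        created_at = k * rereg_interval_h
        # Still alive if it has not yet reached its expiration.
        if hours_elapsed - created_at < sub_expiry_h:
            count += 1
    return count


# Assumed 1-hour re-registration interval over the 24-hour test window.
per_device = active_subscriptions(hours_elapsed=24.0, rereg_interval_h=1.0)
```

Under that assumed interval, each affected device would hold 12 live subscriptions instead of 1, and the load on the Presence Server scales with the number of devices that failed over.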


Conclusions
With appropriate design and software support, large-scale redundancy
can be achieved, resulting in highly available service to the end subscriber
Careful consideration should be given to avoid overdoing redundancy. Only
truly essential services should receive failover support
Comcast is also reviewing additional global traffic management & traffic
engineering within its IMS network, as a means to further distribute traffic
across the sites


My Contact Info
Carl Klatsky
Product Engineering
Comcast Cable
One Comcast Center
1701 John F. Kennedy Blvd.
Mailstop: 39.210C
Philadelphia, PA 19103
215 286 8256
carl_klatsky@cable.comcast.com
