Professional Documents
Culture Documents
. . . . . Authors
. . . . . . . . and
. . . . Contributions
....................................................... 3
. . . . . Authors
................................................................... 5
. . . . . Dedications
................................................................... 7
. . . . . Acknowledgments
................................................................... 9
. . . . . Book
. . . . . Writing
. . . . . . . .Methodology
...................................................... 11
. . . . . Who
. . . . . Should
. . . . . . .Read
. . . . . This
. . . . Book?
.............................................. 13
. . . . . General
. . . . . . . . Troubleshooting
................................................................ 15
. . . . . Basic
. . . . . .Information
. . . . . . . . . . . for
. . . Troubleshooting
............................................... 17
. . . . . Hardware
. . . . . . . . . .Architecture
.............................................................. 21
. . . . . ASR
. . . . .5000
.............................................................. 23
. . . . . ASR
. . . . .5500
.............................................................. 31
. . . . . Software
. . . . . . . . . Architecture
............................................................... 35
. . . . . ASR
. . . . .5000/ASR
. . . . . . . . . .5500
. . . . .Boot
. . . . .Sequence
.......................................... 37
. . . . . Context
................................................................... 45
. . . . . Software
. . . . . . . . . Managers
. . . . . . . . . .of. .ASR
. . . .5000/ASR
. . . . . . . . . . 5500
................................ 49
. . . . . Software/Hardware
. . . . . . . . . . . . . . . . . . . .Redundancy
............................................... 51
. . . . . System
. . . . . . . .Verification
................................................................ 57
. . . . . System
. . . . . . . .Verification
. . . . . . . . . . .and
. . . .Troubleshooting
. . . . . . . . . . . . . . . Overview
............................. 59
. . . . . Initial
. . . . . . Data
. . . . .Collection
........................................................ 61
. . . . . Hardware
................................................................... 73
. . . . . Local
. . . . . .Ethernet
. . . . . . . . and
. . . . IP
. . .Network
.............................................. 91
. . . . . Call
. . . . Processing
............................................................... 105
. . . . . Chassis
. . . . . . . .Fabric
........................................................... 115
. . . . . Collecting
. . . . . . . . . . a. .Show
. . . . . Support
. . . . . . . . Details
.......................................... 123
. . . . . Platform
. . . . . . . . .Troubleshooting
............................................................... 125
. . . . . CPU/Memory
. . . . . . . . . . . . . .issues
..................................................... 127
. . . . . Licensing
................................................................... 139
. . . . . HD
. . . .RAID
............................................................... 145
. . . . . Software
. . . . . . . . . Crashes
.......................................................... 155
. . . . . ARP
. . . . .Troubleshooting
.............................................................. 165
. . . . . Card
. . . . . and
. . . . Port
. . . . .Troubleshooting
..................................................... 167
. . . . . LAG
. . . . .Troubleshooting
.............................................................. 177
. . . . . Switch
. . . . . . . Fabric
. . . . . .&
. . NPU
. . . . .Troubleshooting
............................................... 189
. . . . . Session
. . . . . . . .Imbalances
........................................................... 195
. . . . . CDMA
........................................................................ 205
. . . . . CDMA
. . . . . . .Overview
............................................................ 207
. . . . . PDSN
. . . . . ./
. . FA
........................................................... 215
. . . . . HA
................................................................... 241
. . . . . UMTS
........................................................................ 251
. . . . . UMTS
. . . . . . .Overview
............................................................ 253
. . . . . Troubleshooting
. . . . . . . . . . . . . . . . SGSN
................................................... 263
. . . . . Troubleshooting
. . . . . . . . . . . . . . . . GGSN
................................................... 301
. . . . . LTE
........................................................................ 327
. . . . . LTE
................................................................... 329
. . . . . Troubleshooting
. . . . . . . . . . . . . . . . MME
................................................... 341
. . . . . Troubleshooting
. . . . . . . . . . . . . . . . SGW
................................................... 353
. . . . . Troubleshooting
. . . . . . . . . . . . . . . . PGW
................................................... 361
. . . . . Troubleshooting
. . . . . . . . . . . . . . . . GTP
. . . . .Path
. . . . Failure
.......................................... 369
. . . . . Handover
. . . . . . . . . .Troubleshooting
......................................................... 375
. . . . . Session
. . . . . . . .Troubleshooting
................................................................ 381
. . . . . Overview
................................................................... 383
. . . . . Protocol
. . . . . . . . .Analyzer
. . . . . . . . Utility
.................................................. 391
. . . . . Generic
. . . . . . . . session
. . . . . . . debugging
. . . . . . . . . . commands
.......................................... 393
. . . . . Session
. . . . . . . .logging
........................................................... 423
. . . . . Routing
. . . . . . . . Protocol
................................................................ 427
. . . . . Static
. . . . . . Routing
............................................................. 429
. . . . . OSPF
. . . . . .Routing
............................................................. 431
. . . . . BGP
. . . . .Routing
.............................................................. 441
. . . . . Interchassis
. . . . . . . . . . . .Session
. . . . . . . Recovery
..................................................... 451
. . . . . Inter
. . . . . Chassis
. . . . . . . .Session
. . . . . . . Recovery
............................................... 453
. . . . . ICSR
. . . . . Troubleshooting
.............................................................. 455
. . . . . Radius
........................................................................ 463
. . . . . Overview
................................................................... 465
. . . . . Troubleshooting
. . . . . . . . . . . . . . . . Radius
. . . . . . .Issues
............................................ 479
. . . . . Diameter
........................................................................ 505
. . . . . Diabase
................................................................... 507
. . . . . Policy
. . . . . . Control
. . . . . . . .-. Gx
.................................................... 531
. . . . . Online
. . . . . . .Charging
. . . . . . . . .-. Gy
.................................................. 551
. . . . . Authentication
. . . . . . . . . . . . . . .-.S6a
................................................... 569
. . . . . DNS
........................................................................ 579
. . . . . DNS
. . . . .Client
.............................................................. 581
. . . . . Proxy
. . . . . . DNS
............................................................. 591
. . . . . Active
. . . . . . .Charging
. . . . . . . . Service
......................................................... 595
. . . . . Active
. . . . . . .Charging
. . . . . . . . Service
.................................................... 597
. . . . . ACS
. . . . .ruledef
. . . . . . .Optimization
....................................................... 605
. . . . . FW
. . . ./. .NAT
............................................................. 609
. . . . . X-Header
. . . . . . . . . .Insertion
. . . . . . . . .and
. . . .Encryption
. . . . . . . . . . Example
.................................. 617
. . . . . Troubleshooting
. . . . . . . . . . . . . . . . ACS
................................................... 621
. . . . . GTPP
. . . . . . (CDR's)
.................................................................. 629
. . . . . GTPP/CDR
. . . . . . . . . . . Overview
........................................................ 631
. . . . . GTPP
. . . . . . troubleshooting
............................................................. 639
. . . . . Bulkstat
. . . . . . . . and
. . . . KPI
............................................................ 649
. . . . . Bulkstats
................................................................... 651
. . . . . Troubleshooting
. . . . . . . . . . . . . . . . Bulkstats
................................................... 655
. . . . . KPIs
. . . . .Overview
.............................................................. 661
Preface
Preface
Authors
This com
mand shows the con
trib
u
tors of this book:
Dedications
This com
mand shows the au
thors grat
i
tude for sup
port from their loved ones:
A big thank you to my wife Chika, my sons Yusei and Soshi. Thanks as well to the team for
giving me this great opportunity - Tomo
I would like to dedicate this effort to my beautiful wife Karen and daughters Paityn and
Katie. Thank you for your patience, support, and love - Mike
Thanks to my father, mother, wife, daughters and brothers that have supported me during all
these years. - Gui
To my wife, son, parents, brother & his family for their continued love, patience, support &
encouragement throughout the years. Thanks to Cisco for this opportunity – Manoj
Dedicated to my parents, my wife Soumya, beautiful sons Allen and Alden. Thanks as well to
Cisco for the opportunity - Solomon
Zahvalio bih se mojoj lijepoj supruzi Tihani, na ljubavi i stalnoj podršci. Zahvalio bih
svojoj obitelji I prijateljima na bezuvjetnoj podršci na svim mojim životnim putevima - Nebojša
Big thanks to my family and colleagues in the Cisco TAC in Brussels for supporting this
intense 1-week onsite book-writing session here in Boxborough – Steven
Dedicated to my Parents Gurdarshan Kaur and Gurdip Singh. But Big thanks to my Wife Mandeep,
daughter and son Gurneet and Lakshveer who supported me a lot. Also thanks to my extended Cisco
Family for the great opportunity - Balihar
Preface
For the original Grand Theft Auto which, many years ago, inspired my interest in networking
and prompted me to set up a LAN that allowed me to play against my friends when the University
network was too slow. I would also like to say ‘thank you’ to all the people in my life, both
Dedicated to my family who put up with my not being around for a whole week while creating
For my wife Lee and daughter Lauren and all my extended family, thanks for your support
throughout the years. Thanks as well to Cisco for the opportunity, it continues to be a fun
ride. – Jeff
To my wife, Suba who is always my constant support and a firm believer in my work, without
which all of this would have been tough to imagine. Certainly to my daughters Vennila and
Meghana, for their own special way of making my every day light and easy, hence make me believe
I can do anything after all! And of course to my parents who would always be proud of me no
“For my Mother and Grandmother for their love and support throughout my life.” - Dennis
“To my wife, Arshiya for her love, support and encouragement, my son Arfan for the sweet
Special thanks to my family and friends for their love and support. Big thanks to my
Acknowledgments
Spe
cial thanks to Cisco’s Shane Her man, Cisco Di rec
tor of Tech
ni
cal Sup
port and Engi
neer
ing
teams who sup ported the real
iza
tion of this book. We would like to thank you for your con
tin
u
-
ous in
nova
tion and the value you provide to the indus
try.
Spe
cial recog
ni
tion to Cisco’s Tech
nical Services leader
ship teams for be
liev
ing in this ini
tia
tive
and the support pro
vided since the in
cep tion of the idea.
In par
tic
u
lar we want to express grat
i
tude to the fol
low
ing in
di
vid
u
als for their in
flu
ence and
support both prior and dur
ing the book sprint.
Anand Ram
Carl Harding
Devrim Kucuk
Joe Jette
Fred Carpen ito
Joe Fal
cone
Kat
suji Deguchi
Ken Krzyzewski
Man soor Mo hamed Omer
Marty Mar tinez
Rick Harris
Richard Moore
Scott Page
Shane Kirby
Ketan Kulka rni
Peo
ple that should read this book are cus
tomers, part
ners and Cisco em ploy
ees who need to
op
er
ate and/or troubleshoot the Cisco ASR 5000 and ASR 5500 plat
forms.
It is ex
pected that the reader has prior knowl
edge, train
ing and ex
pe
ri
ence on these prod
ucts
and es
pecially, the related tech
nolo
gies. The book fo
cuses on trou bleshoot
ing and also cov
ers
the
o
ret
i
cal con cepts for a bet
ter un
der
stand
ing of spe
cific fea
tures.
For ad
di
tional as
sis
tance re
lated to con fig
u
ration and features, please check the Cisco ASR
5000 and ASR 5500 Con fig
u
ra
tion Guides. For as sis
tance with pro
to
col spe
cific is
sues, con
sult
the 3GPP or appro
pri
ate body for re
lated speci
fi
cations.
General Troubleshooting
General Troubleshooting
Overview
Many com plex prob
lems are con se
quences of changes in the net
work. It could be a con
fig
u
ra
-
tion change, topol
ogy change, traffic pat
tern change, sub
scriber be
havior changes, sig
nal
ing
storms or flap
ping of dif
fer
ent components. Any prob
lem can re
sult in some KPI degra da
tion
and show an anom
aly in sin
gle or var
i
ous com
po
nents or coun
ters.
Prior to troubleshooting, it is most im portant to get the broad est possi
ble un derstand ing of the
problem and to try to eval uate all possi
ble changes in the net work, plat form and de sign. The
chal
lenge here is that many pro duction systems are main tained by mul ti
ple teams, who may not
be aware of all the changes made by other teams. Com plex systems also re quire par tic
u
lar lev
-
els of mon i
tor
ing, and an in di
vid
ual trou bleshooter may not have ac cess to all mon i
tor
ing
tools. Even the most com plex prob lems can be re solved by sim ple ques tions, if asked at the
right time. Mas ter
ing the tech nol
ogy is a mat ter of building up experi
ence and a reper toire of
ques
tions which bring dif
fer
ent an
gles and di
men
sions to bear on the prob
lems.
Initial questions:
• If the problem is no longer seen, how long did the problem persist? What was done
that caused the problem to be resolved?
• What activity was going on in the network during the time the problem was seen? This
activity is not necessarily related to the ASR 5x00, but could be related to other
activities like routing changes, switch replacement, etc.
• Was there a change in network management prior to the incident?
• Has this problem occurred before? If so, what was the history of that occurrence?
• What else was observed?
• How many subscribers are affected? Is there any specific type of subscriber affected?
• How many subscribers are not affected? Is there anything specific shared by those who
are not affected?
• What specific KPI degradation was observed? Which values are expected and what are
some previous trends. Is there any other KPI which follows the same trends?
• Confirm if the problem is related to:
Single, some or all services
Specific APN, group of APNs or all APNs
Specific card/port, group or all cards/ports
Specific sessmgr/aaamgr or all sessmgr/aaamgrs
Single or multiple services
Single or multiple vlans/interfaces
Single, multiple or all NPUs. If NPU is not the problem, check the switch fabric
Type of calls, and if so, what is the duration of those calls or any other factor
specific to those calls?
A specific interface - how is the call flow affected and in which phase is it affected?
A specific peer group or all peers - is there anything specific about the peers
affected?
• Anything specific to the affected components or objects?
• Identify any events which occurred around same times when a previous example of this
issue happened. What is common to these events?
• When thinking about symptoms or conditions always confirm: Where was the problem
experienced? Where was it not experienced? What was not experienced? Where?
Where not? When? When not?
General Troubleshooting
• What percentage of devices or users are affected? What observations can be made
about these devices or users?
• Confirm whether this issue is expected behavior and if not, in what ways it is
unexpected. What differences are there between the affected and unaffected? What is
the key difference from non-expected behavior?
• What are the examples of working and non-working case scenarios?
• What was the duration of the issue? Is this a recurring or one-time event?
• History of the problem:
Issue trigger?
Configuration/Recent/Network changes?
Change management?
Flapping?
Timeline of events?
Logs?
Patterns?
Differences and delta?
Pre-conditions?
Micro and macro perspectives?
Confirmed and excluded conditions or triggers?
Trends?
• Relevant configuration details:
Working config
Non-working config
Problematic?
Specific details?
• Relevant counters, thresholds, limits, return codes, frequency, timeouts, delta
max/avg/min/deviation?
• Any specific ranges, oscillations, deviations or differences? Trends/Graphs?
General Troubleshooting
• Hardware
• Software version
• Configuration changes
• Are these Cisco devices or third party devices?
• Any other unknowns: related info, logs, traces, events or history?
• Diagrams or graphs
• Design documents
• Previous stability duration
• Previous versions
• Is this a new setup?
• Is this a production, lab or FOA (First Office Application) environment?
• Are other teams involved?
Reproduction:
Workarounds:
• Is it possible?
• What are the disadvantages of implementing the workaround? Any subscriber impact?
• Is it confirmed, tested and validated in the lab?
Hardware Architecture
Hardware Architecture
ASR 5000
Overview
The ASR 5000 is a ver sa
tile plat
form capa
ble of provid
ing mul
ti
ple ser
vices in the Mobil
ity
space, includ
ing SGSN/GGSN/MME/SGW/PGW/HA/FA/PDSN/HSGW amongst oth ers.
Mul
ti
ple ser
vices can be com bined in a sin
gle plat
form, for ex
ample; MME/SGSN ser vices in
one sin
gle physi
cal plat
form. The ASR 5000 uses StarOS as its op er
at
ing sys
tem which is a cus
-
tomized version of Linux that pro
vides a ro
bust and flex
i
ble en
vi
ron
ment.
In ad
di
tion to 2 x 're
dun
dant' and 2 x 'active' PSC cards, the mini
mum con fig
u
ra
tion of an ASR
5000 chassis also re
quires the fol
low
ing items to be provi
sioned; 2 x Sys
tem Management Cards
(SMC), 2 x Redundancy Crossbar Cards (RCC), 2 x Switch Proces sor I/Os (SPIO) and two copies
of whichever Sys tem Soft
ware has been se lected (one per SMC).
SPIO - Switch Process IO Card: It is the card that pro vides connectiv
ity for local and remote
man age
ment, CO alarm ing, and Build
ing In
te
grated Tim ing Supply (BITS) tim ing input. SPIOs
are in
stalled in chas
sis slots 24 and 25, be
hind SMCs. Dur ing nor
mal op er
a
tion, the SPIO in slot
24 works with the ac tive SMC in slot 8. The SPIO in slot 25 serves as a re
dun dant com po
nent. In
the event that the SMC in slot 8 fails, the redundant SMC in slot 9 be comes ac tive and works
with the SPIO in slot 24. If the SPIO in slot 24 should fail, the redundant SPIO in slot 25 takes
over.
Special
ized hardware engines sup
port paral
lel dis
trib
uted processing for com
pres
sion, clas
si
fi
-
ca
tion, traf
fic sched
ul
ing, for
ward
ing, packet fil
ter
ing, and sta
tis
tics.
The packet-process
ing cards use con
trol processors to per
form packet-process
ing op
era
tions,
and a ded
i
cated high-speed net
work processing unit (NPU). The NPU does the fol
low
ing:
• Manages both external line card ports and the internal connections to the data and
control fabrics
To take advan
tage of the distributed pro cess
ing ca pa
bil
i
ties of the sys tem, packet-pro cess
ing
cards can be added to the chas sis with
out their sup porting line cards, if de sired. This re
sults in
in
creased packet-han dling and transaction-pro cessing capabil
i
ties. Another advantage is a de-
crease in CPU uti
liza
tion when the sys tem per forms proces sor-in tensive tasks such as en cryp-
tion or data compression. Packet-pro cessing cards can be in stalled in chas sis slots 1 through 7
and 10 through 16. Each card can be ei ther Ac tive (avail
able to the sys tem for ses sion process-
ing) or re
dun
dant (a standby com ponent available in the event of a fail ure).
Packet Ser
vices Cards come in 2 ver
sions: PSC2 and PSC3.
The PSC2 uses a fast net work proces sor unit, fea
turing two quad-core x86 CPUs and 32 GB of
RAM. These proces sors run a sin gle copy of the op erat
ing system. The oper
at
ing sys
tem run-
ning on the PSC2 treats the two dual-core proces sors as a 4-way multi-proces sor. The PSC2
has a ded
i
cated security processor that provides the high est perfor
mance for crypto
graphic ac
-
cel
er
a
tion of next-gen era
tion IP Security (IPSec), Se cure Sock ets Layer (SSL) and wireless
LAN/WAN se curity appli
cations with the lat
est se
curity al
gorithms.
PSC2s should not be mixed with PSC3s. Due to the dif fer
ent processor speeds and mem ory
con
figu
ra
tions, the PSC2 can
not be com bined in a chas sis with other packet-process
ing card
types. The PSC2 can dynami
cally ad
just the line card connection mode to sup
port switching be
-
tween XGLCs and non-XGLCs with min i
mal service in
ter
ruption.
Eth
er
net Line Cards
Fast Eth
er
net Line Card (FLC2)
Gi
g a
bit Eth
er
net Line Card (GLC2)
Quad Gi
g a
bit Eth
er
net Line Card (QGLC)
10 Gi
g a
bit Eth
er
net Line Card (XGLC)
The sup ported re dundancy schemes for XGLC are L3, Equal Cost Multi Path (ECMP) and 1:1
side-by-side re dundancy. Side-by-side re dundancy al
lows two XGLC cards in stalled in neigh
-
boring slots to act as a re
dun
dant pair. Side-by-side pair slots are 17-18, 19-20, 21-22, 23-26, 27-
28, 29-30, and 31-32.
Side-by-side re
dun
dancy only works with XGLC cards. When con fig
ured for non-XGLC cards,
the cards are brought of
fline. If the XGLCs are not con
fig
ured for side-by-side re
dun
dancy,
they run in
de
pen
dently without redun
dancy.
Hardware Architecture
RCC - Re dun dant Cross bar Card: The RCC uses 5 Gbps se r
ial links to en
sure con nectiv
ity be
-
tween rear-mounted line cards and every non-SMC front loaded ap pli
ca
tion card slot in the
sys
tem. This cre ates a high avail
abil
ity ar
chi
tecture that mini
mizes data loss and en sures ses-
sion in
tegrity. If a packet-pro cess
ing card were to ex pe
ri
ence a fail
ure, IP traf
fic would be redi-
rected to and from the LC to the re dun dant packet-processing card in an other slot. Each RCC
connects up to 14 line cards and 14 packet-pro cess
ing cards for a total of 28 bidirec
tional links
or 56 se
r
ial 2.5 Gbps bidi rec
tional se
r
ial paths.
Traf
fic Path
ASR 5500
In ASR 5500 there are 4 different types of cards, each providing different functionality as
explained below:
Man age
ment I/O, Uni
ver
sal Manage
ment I/O Cards (MIO/UMIO): There are 2 Man agement
I/O cards (MIO) or Uni
ver
sal Man
agement I/O cards (UMIO) in slots 5 & 6. The UMIO card is
the same hard
ware as MIO, but needs an ad
di
tional li
cense.
Uni
ver
sal Data Process
ing Card, Data Processing Card (UDPC / DPC): Data Pro cess
ing Cards
(DPC) and/or Univer
sal Data Pro
cess
ing Cards (UDPC) can be placed in slots 1 to 4 and 7 to 10. A
UDPC card is the same hard ware as DPC, but needs an ad di
tional li
cense. DPC/UPDC cards
manage sub
scriber ses
sions and con
trol traf
fic.
• 96 GB of RAM
• NPU for session data flow offload
• Crypto offload engines located on a daughter card
Fabric and Storage Cards (FSC): There may be up to 6 FSC cards in
stalled in slots 13 to 18 which
are in front of the chas
sis.
• Two 2.5" serial attached SCSI (SAS), 200GB solid state drives (SSDs) with a 6 Gbps SAS
connection to each MIO/UMIO.
The ASR 5500 uses an array of solid state dri ves (SSDs) for short-term persis
tent stor
age. The
RAID 05 con fig
u
ra
tion has each pair of dri
ves on an FSC striped into a RAID 0 array; all the ar
-
rays are then grouped into a RAID 5 array. Each FSC pro vides the stor
age for one quarter of the
RAID 5 array. Data is striped across all four FSCs with each FSC pro vid
ing parity data for the
other three FSCs. The array is managed by the mas ter MIO/UMIO.
Sys
tem Sta
tus Cards (SSC): There are two SSC which are in ded
i
cated slots 11 and 12.
Overview
With dif
fer
ent cards per
forming dif
fer
ent func
tion
al
i
ties in ASR 5000/ASR 5500 chas
sis, it's im
-
por
tant to un
derstand the boot process of the chassis.
1 When power is first applied to the chassis or a reload is performed, the SMC cards in
slot 8 and 9 receive power. Once the system confirms cards are located in slots 8 and 9,
power is applied to the SPIO cards in slots 24 and 25.
2 POST (Power On Self Tests) are performed on each card to ensure the hardware is
operational.
3 If SMC 8 passes all POST tests, it will become the Active SMC card and the SMC card in
slot 9 will become Standby. If there is an issue with SMC 8, then SMC 9 will become the
Active SMC card.
4 The Active SMC will then begin loading the StarOS image designated in the boot stack
file, boot.sys, located on the SMC CompactFlash. The Standby SMC will load its image
from the Active SMC unless the Active SMC fails, and if so, the Standby SMC will then
load its image from its own boot.sys file and will then become the Active SMC.
5 After the Active SMC finishes loading, it will determine which slots are populated with
Application cards by providing power to the slots. If a card is present, power is left on
for that slot. Slots without a card are not powered on.
6 PSCs and Line Cards in the system will perform POST tests as they are powered up.
7 The PSC card will enter into Standby mode after successfully completing POST tests.
8 Line cards will remain in standby mode until the associated PSC cards are made Active.
If using half-height line cards, the Line Card in the upper slot will be made Active and
the Line Card in lower slot will be made standby.
9 The PSC will download its code from the SMC after entering Standby mode.
10 After successfully loading the software image, the system will load the configuration file
specified in the boot stack file. If no software configuration file is found, the system will
enter the Quick Setup Wizard.
Software Architecture
1 When power is applied to the chassis or after a reload of the chassis, the MIO/UMIOs in
slots 5 and 6 will receive power.
2 Power on Self Tests (POST) will be run on the MIO/UMIOs to validate the hardware is
operational.
Software Architecture
3 The MIO in slot 5 will become the Active MIO if all POSTs are successfully executed.
The MIO in slot 6 will be declared the Standby card. The MIO in slot 6 will become the
Active MIO if a problem is detected with the MIO in slot 5.
4 The Active MIO will then begin loading the StarOS image designated in the boot stack
file, boot.sys, located in the flash memory of the MIO. The Standby MIO will load its
image from the Active MIO unless the Active MIO fails, and the Standby MIO will then
load its image from its own boot.sys file and will then become the Active MIO.
5 After the Active MIO finishes loading, it will determine which slots are populated with
Application cards by providing power to the slots. If a card is present, power is left on
for that slot. Slots without a card are not left powered on.
6 DPC Cards in the system will perform POST tests as they are powered up.
7 The DPC card will enter into Standby mode after successfully completing POST tests.
8 The DPC will download its code from the Active MIO after entering Standby mode.
9 After successfully loading the software image, the system will load the configuration file
specified in the boot stack file. If no software configuration file is found, the system will
enter the Quick Setup Wizard.
Boot Sys
tem Pri
or
i
ties
The boot.
sys file is pop
u
lated via the the fol
low
ing con
fig
u
ra
tion com
mands:
configure
boot system priority <number> image <StarOS Image Name> config <Configuration file name>
end
configure
no boot system priority <number>
end
In order to ver
ify which boot sys
tem pri
or
ity was used to load the sys
tem, the fol
low
ing com
-
mand can be issued:
In order to ver
ify soft
ware is valid, the fol
low
ing com
mand can be is
sued:
De
fault boot.
sys file
De
fault boot.
sys for ASR 5500:
/flash/sys tem. bin
/flash/sys tem. cfg
/usb1/sys tem. bin
/usb1/sys tem. cfg
/flash/as r5500. bin
/flash/as r5500. cfg
/usb1/as r5500. bin
/usb1/as r5500. cfg
De
fault boot.
sys for ASR 5000:
/flash/sys tem.bin
/flash/sys tem.cfg
/pcm cia1/sys tem.
bin
/pcm cia1/sys tem.
cfg
/flash/st40. bin
/flash/sys tem.cfg
/pcm cia1/st40. bin
/pcm cia1/sys tem.
cfg
Software Architecture
Context
Overview
A con
text is a log
i
cal group ing or mapping of config
u
ration pa
ra
me ters that per
tains to var
i
ous
phys
i
cal ports, logi
cal IP in
ter
faces, and ser
vices. Each con text can be thought of as a vir tual
router in
stance. The sys tem sup ports the config
ura
tion of multi
ple contexts. Each con text is
config
ured and op
er
ates in
de
pen
dently of the oth
ers. Ser
vices can be con
fig
ured to allow data
to pass be
tween contexts.
Dur
ing troubleshoot ing, make sure to have the CLI in the cor
rect con
text. The com
mand "show
con
text" will list what is pre
sent in the con
fig
u
ra
tion. The com
mand "context [name]" will allow
movement be tween con texts.
The fol
low
ing com
mand will dis
play all the con
texts in use on the sys
tem:
What is rel
e
vant in this out
put is the Context ID. Each con
text is bound to and man
aged by a
vp
n
mgr process with an in stance ID that matches the Context ID:
Overview
This chap
ter talks about dif
fer
ent processes in ASR 5000/ASR 5500 and their func
tion
al
i
ties.
Software Processes
Tasks within the ASR 5000/ASR 5500 chas sis have a par
ent / child re
lation
ship that is re
ferred
to within a Controller and Manager framework. A con troller task will run on an SMC or MIO
de
pending on the hard ware ar
chi
tec
ture, and is re
sponsi
ble for cre
ating the man ager tasks.
Manager tasks will run on the PSC or DPC card depending on the hardware ar chi
tec
ture in use.
• vpnctrl
VPN Controller creates a vpnmgr for each configured context
Controls IP routing across and within the configured contexts of the chassis
• vpnmgr
One vpnmgr for each configured context
Implements Address Resolution Protocol (ARP)
Installs NPU flows
Maintains IP pool configs and usage for appropriate context(s).
Part of Session Recovery along with SessMgr and AAAMgr.
• sessctrl
Session Controller creates
sessmgr and aaamgr instances
Signaling managers for every service configured, for example:
gtpcmgr
sgtpmgr
imsimgr
egtpinmgr
Software Architecture
• sessmgr
Subscriber processing system that supports multiple subscriber types
Multiple Session Managers per CPU. SessMgr is distributed over multiple CPUs on
all active processor cards.
A single Session Manager can service sessions from multiple signaling Demux
managers and from multiple contexts.
An instance of the AAAMgr task is created and paired with each SessMgr task as
part of Session Recovery.
• aaamgr
AAAMgr tasks are paired with SessMgr tasks
All AAA protocol operations and functions performed for subscribers. Performs the
functions of a AAA client to AAA Servers.
Multiple AAA Managers per CPU
AAAMgr is distributed over multiple CPUs on all active processor cards
• diamproxy
1 diamproxy instance per Active DPC or PSC card
Responsible for the processing of Diameter messaging
Sessmgr uses a diamproxy on a different card where aaamgrs initiate requests for
S6b/STa/Rf
Sessmgr uses a diamproxy on the same card to initiate requests for Gx and Gy
• gtpcmgr
Created for the GGSN service.
Receives the PDP Context requests for GTP sessions from the SGSN.
Distributes to different sessmgr tasks for load balancing.
Maintains a list of current sessmgr tasks to aid in system recovery.
• imsimgr
Created for the SGSN RAN interface
Receives the attach and activate requests from UEs
Maintains a table of the relationship between PTMSI, IMSI and sessmgr.
• sgtpmgr
Created for the SGSN Gn interface
Receives PDP context from the SGSN service and multiplexes them onto the Gn
interface
Software Architecture
Manages Tunnels
GGSN lookups
Other SGSN lookups
• egtpinmgr
Created for the SGW and PGW S5 interface
• gtpumgr
Manages GTP-U flows for user sessions.
• npumgr
Manages the NPU subsystem
Executes on the SMC/MIO and DPC/PSC CPUs
Has local copy of NPU data
• cli
Task is created for each individual user logged into the chassis
Accepts input from the user
show commands
configuration additions or deletions
When CPU and/or Mem ory of tasks exceeds cer tain pre
de
fined lim
its, the sta
tus of the task
will change from 'good' to 'warn' or 'over'. This will be ac
com
panied by an SNMP trap and log to
no
tify the sys
tem ad
minis
tra
tor.
In such cases it might be im portant to find out why this happens, and to take cor
rec
tive ac
tion,
as it could indi
cate a more seri
ous issue. The Cisco TAC should be en gaged in such cases. For
ex
am mgr in over state could start re
ple, a sess ject
ing new subscriber ses
sions.
Software Architecture
Software/Hardware Redundancy
Overview
This chap
ter cov
ers both soft
ware and hard
ware re
dun
dancy op
tions in ASR 5000/ASR 5500.
The fol
low
ing tasks are rel
e
vant for the soft
ware/hard
ware re
dun
dancy:
Image - Ses
sion Re
cov
ery Ba
sics
At boot-up the system spawns new instances of “standby mode” session managers and AAA
managers for each active Control Processor (CP) being used. Additionally, other key system-
level software tasks, such as VPN manager, are started on physically separate Processor Cards
to ensure that a double software fault (e.g. session manager and VPN manager fails at same time
on the same card) cannot occur.
The Proces sor Card used to host the VPN man ager process is in ac
tive mode and is reserved by
the oper
ating sys
tem for this use when ses sion re
cov
ery is enabled. Other demux (sig nal
ing)
tasks will be running on this Proces
sor Card but there will be no Session man ager or AAA man -
ager in
stances on the card run ning the VPN man ager in
stances. The ad di
tional hardware re-
sources re quired for ses
sion recov
ery in
clude a standby SMC in ASR 5000, MIO for ASR
5500 and a standby Proces sor Card (PSC in ASR 5000 or DPC in ASR 5500).
In sum
mary:
• Calls in the ASR 5x00 are handled by work-horse proclets called sessmgrs.
• Each sessmgr is backed by a paired aaamgr. The sessmgr and aaamgr are arranged to be
started on 2 different processing cards (PSC/DPC) such that card failure scenarios can
be handled.
• Each call is check-pointed periodically from sessmgr to the corresponding aaamgr.
Software Architecture
• To speed up the session recovery, a standby session manager runs in every processing
card (PSC/DPC) and is quickly renamed to the lost sessmgr, after which a new standby
sessmgr is created on the processing card.
• Upon unexpected loss of a sessmgr process, the standby sessmgr process retrieves the
backed up information from aaamgr and re-builds its sessions.
• When aaamgr fails, the standby aaamgr queries the sessmgr to re-sync the relevant
subscriber info.
• When demux-mgr fails, it queries all sessmgrs and re-builds its data-base of call-
distribution information.
• Nothing is maintained by the sessctrl process. But when the sessctrl process
unexpectedly fails, it should ensure that all sessmgr processes are running with the
right configuration (as present in SCT process) and it should ensure this reconciliation.
Full session processing cards (PSC-2/PSC-3/DPC) recovery mode: Used when a PSC / DPC
hardware failure occurs, or when a card migration happens. In this mode, the standby card is
made active and the “standby-mode” session manager and AAA manager tasks on the newly
activated session processing card perform session recovery. Session/Call state information is
pulled from the peer AAA manager task; each AAA manager and session manager task is paired.
These pairs are started on a physically different PSC / DPC to ensure task recovery.
These are the basic steps taken when a new call ar
rives in the ASR 5000:
1 When a new call arrives, it first connects to the signaling service demux-mgr (step1
above), PGW, HA, GGSN for example. The demux-mgr then chooses a sessmgr to
handle the call and gives it to that session manager (in the picture above, a sessmgr in
card PSC-3).
2 Traffic from the call is sent from the UE to NPU and then on to SessMgr (step 2a above).
SessMgr will act on the traffic if required, and pass it back to NPU on the same PSC
which then sends the traffic through the midplane and out the configured physical
interface and into the network (step 2b above).
3 The response to the traffic arrives from the destination (step 3 above) and is sent back
to NPU and then SessMgr. SessMgr then inspects the traffic (if configured) and then
sends it back to NPU and ultimately out to the network and the UE (step 4 above).
System Verification
System Verification
Overview
This chap
ter out
lines the com
mands that can be ex
e
cuted to en
sure the ASR 5000/5500 chas
-
sis is in a healthy state or iden
tify and troubleshoot exist
ing is
sues. This pro
ce
dure ad
dresses
the need for a “generic” system veri
fi
ca
tion process of the ASR 5000/5500 chas sis.
• Any subsection in this chapter can be used individually to troubleshoot the area under
investigation.
• Counters can be cleared in order to help understand which values are currently
incrementing
• An administrator privilege via CLI is required.
• As a Customer running this procedure, the CLI output can be provided to Cisco in order
to review the health of the system.
Prequisites
The fol
low
ing is needed to perform the ac
tiv
i
ties in the ASR 5000/5500 Sys
tem Ver
i
fi
ca
-
tion Check and Trou
bleshoot
ing chap
ter:
1 ASR 5x00 Chassis with Console or IP connectivity into the chassis.
Overview
This sec
tion cov
ers com
mands and out
puts re
lated to boot
ing, sys
logs, alarms, crashes and li
-
cense.
Ex
pected Out
put:
Ex
pected Out
put:
Ver
ify that the value “local” is pre sent in the square brack
ets of the sys
tem prompt. If
“local” is not pre
sent, then redo step #2.
3. Ver
ify the time and date of the sys
tem #> show clock
System Verification
Ex
pected Out
put:
4. Ver
ify the StarOS ver
sion that is cur
rently being run by the chas
sis: #> show ver
sion
Ver
ify the sys
tem is run
ning the ex
pected soft
ware image ver
sion and build num
ber.
5. Ver
ify the sys
tem up
time of the chas
sis: #> show sys
tem up
time
Ex
pected Out
put:
6. Ver
ify the sys
tem for con
fig
u
ra
tion er
rors : #> show con
fig
u
ra
tion er
rors
System Verification
Ex
pected Out
put:
Ver
ify the sys
tem con
fig
u
ra
tion and fix re
ported con
fig
u
ra
tion mis
takes to avoid pos
si
ble prob
-
lems on the system.
7. Ver
ify the sys
tem’s boot con
fig
u
ra
tion: #> show boot
Ex
pected Out
put:
image /flash/asr5500-16.2.0.56515.bin \
config /flash/New-16.2-Sep2nd2014.cfg
The out put of this com mand can have up to 10 en tries and the spec i
fied files must be present in
the “/flash/” drive of the ASR 5x00 chas sis. The lowest pri
or
ity is booted first. If the config
u
-
ra
tion file spec i
fied is not valid (mis
spelled or not pre
sent in the /flash di rectory) and the chas-
sis is re
loaded, the sys tem will go into a con
tin
u
ous boot cycle.
The sys
tem will only move on to the next boot pri
or
ity if the StarOS BIN file is cor
rupted or not
found.
To de
ter
mine if the files are pre
sent, issue the fol
low
ing com
mand, must be logged in as an
admin with this com
mand:
8. Ver
ify which con
fig
u
ra
tion file the sys
tem booted with the last time the chas
sis was loaded:
#> show boot ini tial-con fig
System Verification
Ex
pected Out
put:
Ver
ify that the sys
tem was loaded with the proper boot con
fig
u
ra
tion file:
Im
por
tant:
• If the system configuration was recently saved, but the chassis was not reloaded, the
output of “show boot initial-config” and the lowest value of “show boot” may not match.
• If after a reboot, the “show boot” and “show boot initial-config” do not match, this will
need to be investigated further.
9. Dis
play the in
stalled Li
censes: #> show li
cense in
for
ma
tion
Ex
pected Out
put:
Ver
ify that the Li
cense Key has been in
stalled:
DHCP
[ ASR5K-00-CSXXDHCP ]
IPv4 Routing Protocols [ none ]
Enhanced Charging Bundle 2: [ ASR5K-00-CS01ECG2 ]
+ DIAMETER Closed-Loop Charging Interfac [ ASR5K-00-CSXXDCLI ]
+ Enhanced Charging Bundle 1 [ ASR5K-00-CS01ECG1 ]
IPSec [ ASR5K-00-CS01I-K9 / ASR5K-00-CS10I-K9 ]
Session Recovery [ ASR5K-00-PN01REC / ASR5K-00-HA01REC
ASR5K-00-00000000 / ASR5K-00-GN01REC
ASR5K-00-SN01REC / ASR5K-00-AN01REC
ASR5K-00-IS10PXY / ASR5K-00-IS01PXY
ASR5K-00-HWXXSREC / ASR5K-00-PW01REC
ASR5K-05-PHXXSREC / ASR5K-00-SY01R-K9
ASR5K-00-IG01REC / ASR5K-00-PC10SR
ASR5K-00-EG01SR / ASR5K-00-FY01SR
ASR5K-00-CS01LASR / ASR5K-00-FY01USR
ASR5K-00-EW01SR / ASR5K-00-SM01SR
ASR5K-00-S301SR ]
IPv6 [ N/A / N/A ]
Lawful Intercept [ ASR5K-00-CSXXLI ]
Inter-Chassis Session Recovery [ ASR5K-00-HA10GEOR / ASR5K-00-HA01GEOR
ASR5K-00-GN10ICSR / ASR5K-00-GN01ICSR
ASR5K-00-PW01ICSR / ASR5K-00-IGXXICSR
ASR5K-00-PC10GR / ASR5K-00-SW01ICSR
ASR5K-00-SG01ICSR ]
RADIUS AAA Server Groups [ ASR5K-00-CSXXAAA ]
Intelligent Traffic Control: [ ASR5K-00-CS01ITC ]
+ Dynamic Radius extensions (CoA and PoD) [ ASR5K-00-CSXXDYNR ]
+ Per-Subscriber Traffic Policing/Shapin [ ASR5K-00-CSXXTRPS ]
Enhanced Lawful Intercept [ ASR5K-00-CS01ELI / ASR5K-00-CS10ELI ]
Dynamic Policy Interface: [ ASR5K-00-CS01PIF ]
+ DIAMETER Closed-Loop Charging Interface [ ASR5K-00-CSXXDCLI ]
PGW [ ASR5K-00-PW10GTWY / ASR5K-00-PW01LIC ]
NAT/PAT with DPI [ ASR5K-00-CS01NAT ]
NAT Bypass [ ASR5K-02-CS01NATB ]
Local Policy Decision Engine [ ASR5K-00-PWXXDEC ]
System Verification
Session Limits:
Sessions Session Type
-------- -----------------------
10000 GGSN
11000 ECS
10000 PGW
If the out
put is not dis
played as expected, please con
tact Cisco to in
ves
ti
gate fur
ther or ref
er
-
ence the "Li
censing" sec
tion in "Plat
form Troubleshoot
ing".
10. Ver
ify the Con
texts that are con
fig
ured on the sys
tem: #> show con
text
Ex
pected Out
put:
Ver
ify that all ex
pected Con
texts are pre
sent and in an “Ac
tive” state.
11. Ver
ify all tasks on the chas
sis are in a “good” state: #> show task re
sources | grep -v
good
System Verification
Ex
pected Out
put:
12. Re
view all Events pre
sent on the chas
sis: #> show logs
Ex
pected Out
put:
The out
put of this com mand is listed from Newest to Old est. The pa
ra
me
ter “| more” can be
added to the end of the com
mand to dis play the out
put page by page.
When re
view
ing this out
put, it is im
por
tant to be fa
mil
iar with the sys
tem.
13. Re
view all re
cent SNMP traps pre
sent on the chas
sis: #> show snmp traps his
tory
verbose
Ex
pected Out
put:
The out
put of this command is listed from Oldest to Newest. The pa ra
me
ter “| more” can be
added to the end of the com
mand to dis play the out
put page by page.
When re
view
ing this out
put, it is im
por
tant to be fa
mil
iar with the sys
tem.
14. De
ter
mine if there are any ac
tive alarms on the sys
tem:#> show alarm out
stand
ing
verbose
System Verification
Ex
pected Out
put:
15. Re
view crashes that have oc
curred on the sys
tem: #> show crash list
To view an in
di
vid
ual crash: #> show crash num ber <Num
ber of crash from list
ing>
Ex
pected Out
put:
Total Crashes : 37
System Verification
The sys
tem should not have any re cent soft
ware or hardware crashes. Refer to the "Soft
ware
Crashes" sec
tion in the "Plat
form Trou
bleshooting" chap
ter for fur
ther in
for
ma
tion.
16. Ver
ify no un
planned card switchovers or mi
gra
tions have oc
curred on the chas
sis: #> show
rct stats
Ex
pected Out
put:
The sys
tem should not have any re cent unexplained card mi
gra
tions. Refer to the "Plat
form
Trou
bleshoot
ing" sec
tion for fur
ther in
for
mation what can be checked.
System Verification
Hardware
Overview
This sec
tion pre
sents the com
mands avail
able to check hard
ware health of ASR 5000 / ASR
5500
Es
ti
mated Time for com
ple
tion: 15 min
utes
1. Ver
ify the hard
ware in
ven tem: #> show hard
tory of the sys ware in
ven
tory
Ex
pected Out
put:
2. Ver
ify the sta
tus of cards in the sys
tem:
Ex
pected Out
put:
ASR 5000:
ASR 5500:
• Active: Indicates that the card is an active component and is providing services.
• Standby: Indicates that the card is a redundant component. Redundant components
will become active through manual configuration or automatically should a failure
occur.
• Offline: Indicates that the card is installed but is not ready to process subscriber data
sessions. This could be due to the fact that it is not completely installed (such as, the
card interlock switch is not locked). Refer to the Installation Guide for additional
information.
“SPOF” dis
plays whether or not the compo nent is a sin
gle point of fail
ure (SPOF) in the sys
tem. If
the compo
nent is a SPOF, then a "Yes"will ap
pear in this col
umn. If not, a "No"will be dis
played.
Es
ca
late to Cisco when any card is in an un
ex
pected state.
System Verification
3. Dis
play the out
put of each card with de
tailed in
for
ma
tion: #> show card infoor to per
-
form the com mand on one card: #> show card info <Card Num ber>
Ex
pected Out
put:
ASR 5000:
Re
view the card output in the ASR 5000: all cards con
fig
ured in the sys
tem should be in the ex-
pected state of Ac
tive or Standby. Es
ca
late any un ex
pected output to Cisco to in
ves
ti
gate fur
-
ther.
ASR 5500:
Re
view the card output in the ASR 5500. All cards con
fig
ured in the sys
tem should be in the ex-
pected state of Ac
tive or Standby. Es
ca
late any un ex
pected output to Cisco to in
ves
ti
gate fur
-
ther.
3. Dis
play the out
put of each card with de
tailed in
for
ma
tion: #> show card diagor to per
-
form the com mand on one card #> show card diag <Card Num ber>
Ex
pected Out
put:
ASR 5000:
Re
view the card out
put in the ASR 5000. Any un
ex
pected out
puts should be in
ves
ti
gated.
ASR 5500:
Last Reset Cause CPU 0: PRT Reset, ICH PLT Reset, ITP Reset
Last Reset Cause CPU 1: PRT Reset, ICH PLT Reset, ITP Reset
Current Environment:
Temp: CPU0 DDR-N0C0D0 : 35.00 C (limit 95.00 C)
Temp: CPU0 DDR-N0C1D0 : 35.00 C (limit 95.00 C)
Temp: CPU0 DDR-N0C2D0 : 35.00 C (limit 95.00 C)
Temp: CPU0 DDR-N1C0D0 : 33.00 C (limit 95.00 C)
Temp: CPU0 DDR-N1C1D0 : 33.00 C (limit 95.00 C)
Temp: CPU0 DDR-N1C2D0 : 33.00 C (limit 95.00 C)
Temp: CPU0 CPU-N0C0 : 32.00 C (limit 95.00 C)
Temp: CPU0 CPU-N0C1 : 27.00 C (limit 95.00 C)
Temp: CPU0 CPU-N0C2 : 24.00 C (limit 95.00 C)
Temp: CPU0 CPU-N0C3 : 29.00 C (limit 95.00 C)
Temp: CPU0 CPU-N0C4 : 34.00 C (limit 95.00 C)
Temp: CPU0 CPU-N0C5 : 29.00 C (limit 95.00 C)
Temp: CPU0 CPU-N1C0 : 35.00 C (limit 95.00 C)
Temp: CPU0 CPU-N1C1 : 29.00 C (limit 95.00 C)
Temp: CPU0 CPU-N1C2 : 29.00 C (limit 95.00 C)
Temp: CPU0 CPU-N1C3 : 28.00 C (limit 95.00 C)
Temp: CPU0 CPU-N1C4 : 35.00 C (limit 95.00 C)
Temp: CPU0 CPU-N1C5 : 31.00 C (limit 95.00 C)
Temp: CPU0 IOH : 55.50 C (limit 110.00 C)
Temp: NP4 #0 : 62.00 C (limit 115.00 C)
Temp: CPU1 DDR-N0C0D0 : 33.00 C (limit 95.00 C)
Temp: CPU1 DDR-N0C1D0 : 32.00 C (limit 95.00 C)
Temp: CPU1 DDR-N0C2D0 : 31.00 C (limit 95.00 C)
Temp: CPU1 DDR-N1C0D0 : 34.00 C (limit 95.00 C)
Temp: CPU1 DDR-N1C1D0 : 34.00 C (limit 95.00 C)
Temp: CPU1 DDR-N1C2D0 : 34.00 C (limit 95.00 C)
Temp: CPU1 CPU-N0C0 : 36.00 C (limit 95.00 C)
Temp: CPU1 CPU-N0C1 : 33.00 C (limit 95.00 C)
Temp: CPU1 CPU-N0C2 : 29.00 C (limit 95.00 C)
Temp: CPU1 CPU-N0C3 : 30.00 C (limit 95.00 C)
Temp: CPU1 CPU-N0C4 : 31.00 C (limit 95.00 C)
Temp: CPU1 CPU-N0C5 : 33.00 C (limit 95.00 C)
Temp: CPU1 CPU-N1C0 : 32.00 C (limit 95.00 C)
Temp: CPU1 CPU-N1C1 : 28.00 C (limit 95.00 C)
Temp: CPU1 CPU-N1C2 : 28.00 C (limit 95.00 C)
Temp: CPU1 CPU-N1C3 : 33.00 C (limit 95.00 C)
System Verification
Re
view the card out
put in the ASR 5500. Any un
ex
pected out
puts should be in
ves
ti
gated.
4. Ver
ify the sta
tus of all card pro
gram
ma
bles: #> show card hard
ware
Ex
pected Out
put:
ASR 5000:
ASR 5500:
Verify that all cards have "up to date" in the "Card Pro
gram bles : " field. Es
ma ca
late the
issue to Cisco if any other data is pre
sent.
5. Ver
ify the sta
tus of all fans: #> show fans
Ex
pected Out
put:
ASR 5000:
ASR 5500:
Ver
ify the state of the fans are nor
mal and the tem
per
a
ture is within normal oper
at
ing range.
The “verbose” pa ra
meter can be added to the com
mand for addi
tional de
tails.
6. Ver
ify the tem
per
a
ture of the chas
sis: #> show tem
per
a
ture
Ex
pected Out
put:
ASR 5000:
ASR 5500:
En
sure that all cards show a value of “Nor
mal”. The key
word “ver
bose” can be added to the
command for more de tailed in
for
ma
tion.
7. Ver
ify the HD-RAID sta
tus of the chas
sis: #> show hd raid
The key
word “ver
bose” can be added to the com
mand for more de
tailed in
for
ma
tion.
Ex
pected Out
put:
ASR 5000:
ASR 5500:
HD RAID:
State : Available
Degraded : No
UUID : 2010dbd9:fbad6ab5:d1dae3e0:2b96a066
Size : 1.2TB
Action : Idle
Card 14
State : In-sync card
Disk hd14a
State : In-sync component
Disk hd14b
State : In-sync component
..
Card 17
State : In-sync card
Disk hd17a
State : In-sync component
Disk hd17b
State : In-sync component
Overview
This sec
tion cov
ers Layer 2 and Layer 3 health check com
mands in ASR 5000/ASR 5500.
1. Ver
ify the sta
tus of the local Eth
er
net in
ter
faces: #> show port table
Ex
pected Out
put:
If the im
plementa
tion is using LACP (Link Ag
gre
ga
tion Con
trol Pro
to
col), the LAG Ports will
have the fol
low
ing possi
ble sta
tus:
If as
sis
tance is needed from Cisco, please open an SR.
2. This com
mand re
views de
tailed in
for
ma
tion on in
di
vid
ual ports: #> show port info
<card/slot>
Ex
pected Out
put:
Remote Fault : No
Laser Transmit Enabled : Yes
Configured Flow Control : Enabled
Interface MAC Address : 70-81-05-96-3F-E9
SRP Virtual MAC Address : None
Fixed MAC Address : 70-81-05-96-3F-C9
Link State : Up
Link Duplex : Full
Link Speed : 10 Gb
Flow Control : Enabled
Link Aggregation Group : 50 (global, master)
Link Aggregation LACP : Active, Short, Auto
LAG Toggle Link : No
LAG Redundancy Mode : Standard
LAG Hold Time : 10
Link Aggregation Master : 5/10
Link Aggregation State : Port distributing
Link Aggregation Actor : (8000,70-81-05-96-3F-00,001A,8000,050A)
Link Aggregation Peer : (007F,80-71-1F-4C-77-F0,002F,007F,0008)
Untagged:
Logical ifIndex : 84541441
Operational State : Up, Active
Tagged VLAN: VID 2011
Logical ifIndex : 84541442
VLAN Type : Standard
VLAN Priority : 0
Administrative State : Enabled
Operational State : Up, Active
Number of VLANs : 2
SFP Module : Present (10G Base SR)
With this com mand, it is important to make sure the port is in the expected state. The Link’s
Duplex and Speed set ting can also be veri
fied. Most in
stal
la
tions should be in a Full/Du
plex
set
ting. Dif
fer
ent set
tings of in
ter
est have been high
lighted in the output above.
Ver
ify the du
plex and speed set
tings match on both sides of the link.
3. Ver
ify the uti tion of the ports: #> show port uti
liza liza
tion table
System Verification
Ex
pected Out
put:
If there is:
• A significant imbalance
• Lower than normal utilization
• Traffic on ports that are not normally active
If the imple
men
tation is using LACP, the Tx and Rx val
ues should be very close in uti
liza
tion. If
they are not, this should be in ves
ti
gated fur
ther. Refer to the "LAG" section in the "Platform
Trou bleshoot
ing" chapter.
4. Ver
ify Layer 2 of the local Eth
er
net ports: #> show port datalink coun
ters
<card/slot>
System Verification
Ex
pected Out
put:
ASR 5500:
ASR 5000:
RX MAC ERR 0 |
RX JABBER COUNT 0 |
----------------------- -------------- + ----------------------- -------------
With this com mand, ver ify the Layer 2 coun ters for the in divid
ual Eth
er
net port connec
tiv
ity of
the ASR 5x00. All “bolded” coun ters should typ i
cally be “0” or very low. If these val ues are ac-
tively in
cre
menting they should be in ves ti
gated. Usu ally, if the counter is an "Rx" (Re
ceive) then
the issue is ex
ter
nal to the chassis. If it is "Tx" (Transmit), then the prob lem may be on the chas -
sis.
At
tempt the fol
low
ing to re
solve:
Ex
pected Out
put:
If the read
ings are not of the ex
pected val
ues, at
tempt the fol
low
ing:
6. Ver
ify:
• ARP/Neighbor tables are populated for each of the configured Contexts: #> context
<Each Context configured in the system>
• For configured IPv4 interfaces: #> show ip arp
• For configured IPv6 interfaces:#> show ipv6 neighbors
Ex
pected Out
put:
Ex
pected Out
put:
All config
ured in
ter
faces should be in their ex pected state. If an inter
face is not in the cor
rect
state, refer to the sec
tion above con tain
ing the "show port info" com mand. Refer to the "ARP"
or "Cards or Ports" section in the "Plat
form Trou bleshoot
ing" chap ter for fur
ther in
for
mation.
Loop
back in
ter
faces will show a state of “IN TIVE” if the chas
AC sis is in the SRP Standby state.
8. Ver
ify Layer 3 in
ter
face coun
ters: #> show port npu coun
ters <card/slot>
Ex
pected Out
put:
With this com mand, ver ify the Layer 3 coun ters for the in
di
vid
ual Eth er
net port of the ASR
5x00. All “bolded” coun ters should typ i
cally be “0”. If they are actively in
cre
ment ing they
should be inves ti
gated. Usu ally, if the counter is an "Rx" (Receive) then the issue is ex
ternal to
the chas
sis, if it is "Tx"(Transmit), then the prob lem may be on the chas sis.
System Verification
9. Ver
ify IP Rout
ing ta
bles are pre
sent for each of the con
fig
ured Con
texts: #> con
text <Each
Context con figured in the sys tem>
Ex
pected Out
put:
Destination Nexthop
Protocol Prec Cost Interface
*::/0 2001:4888:24:2010:223:25:: static 40 0 5/10-V6
*2001:f888:0:7:223:2a1::/128 2001:4888:24:2010:223:25:: static 1 0 5/10-V6
*2001:f888:0:7:223:2a1:0:1/128 2001:4888:24:2010:223:25:: static 1 0 5/10-V6
*2001:f888:0:1000::/128 :: connected 0 0 BGPV6
*2001:f888:24:2a10::/64 :: connected 0 0 5/10-V6
*2001:4f88:24:2a10:223:200::/128 :: connected 0 0 5/10-V6
*2600:1f06:21a0::/44 :: connected 0 0 chg961
*2600:1f06:21a0::/44 :: connected 0 0 chg961
*2600:1f06:81a0::/44 :: connected 0 0 ims961
*2600:1ff06:8a10::/44 :: connected 0 0 ims961
*2600:1f06:81a0::/44 :: connected 0 0 ims961
*2600:1f06:91a0::/44 :: connected 0 0 appl961
*2600:1f06:b1a0::/44 :: connected 0 0 intl961
*2600:1f06:b1a0::/44 :: connected 0 0 intl961
*2600:1f06:b1a0::/44 :: connected 0 0 intl961
*2600:1f06:f1a0::/44 :: connected 0 0 admin961
Call Processing
Overview
With many subscribers connecting to ASR 5000/ASR 5500, this sec
tion cov
ers how to ver
ify
ses
sion con
nec
tions and setup times.
1. Ver
ify the total num
ber of sub
scribers cur
rently at
tached to the chas
sis:#> show sub
‐
scribers sum
mary
Ex
pected Out
put:
pgw-gtps2a-ipv4-ipv6: 0
mme: 0
henbgw-ue: 0 henbgw-henb: 0
ipsg-rad-snoop: 0 ipsg-rad-server: 0
ha-mobile-ip: 997252 ggsn-pdp-type-ppp: 0
ggsn-pdp-type-ipv4: 0 lns-l2tp: 0
ggsn-pdp-type-ipv6: 0 ggsn-pdp-type-ipv4v6: 0
ggsn-mbms-ue-type-ipv4: 0pdif-simple-ipv4: 0
pdif-simple-ipv6: 0 pdif-mobile-ip: 0
wsg-simple-ipv4: 0 wsg-simple-ipv6: 0
pdg-simple-ipv4: 0 ttg-ipv4: 0
pdg-simple-ipv6: 0 ttg-ipv6: 0
femto-ip: 0
epdg-pmip-ipv6: 0 epdg-pmip-ipv4: 0
epdg-pmip-ipv4-ipv6: 0
epdg-gtp-ipv6: 0 epdg-gtp-ipv4: 0
epdg-gtp-ipv4-ipv6: 0
sgsn: 0 sgsn-pdp-type-ppp: 0
sgsn-pdp-type-ipv4: 0 sgsn-pdp-type-ipv6: 0
sgsn-pdp-type-ipv4-ipv6: 0 type not determined: 0
sgsn-subs-type-gn: 0 sgsn-subs-type-s4: 0
sgsn-pdp-type-gn: 0 sgsn-pdp-type-s4: 0
asngw-simple-ipv4: 0 asngw-simple-ipv6: 0
asngw-mobile-ip: 0
asngw-non-anchor: 0 asngw-auth-only: 0
phsgw-simple-ipv4: 0 phsgw-simple-ipv6: 0
phsgw-mobile-ip: 0
phsgw-non-anchor: 0
cdma 1x rtt sessions: 0 cdma evdo sessions: 0
cdma evdo rev-a sessions: 0 cdma 1x rtt active: 0
cdma evdo active: 0 cdma evdo rev-a active: 0
asnpc-idle-mode: 0 phspc-sleep-mode: 0
hnbgw: 0 hnbgw-iu: 0
bng-simple-ipv4: 0pcc: 0
in bytes dropped: 14698706480 out bytes dropped: 70438975999
in packet dropped: 98732774 out packet dropped: 64916512
in packet dropped zero mbr: 0 out packet dropped zero mbr: 0
in bytes dropped ovrchrgPtn: 0 out bytes dropped ovrchrgPtn: 0
in packet dropped ovrchrgPtn: 0 out packet dropped ovrchrgPtn: 0
System Verification
If the total number of subscribers is not the ex pected or normal value, refer to the "Session
Troubleshooting" chap
ter or spe
cific call con
trol pro
to
col chap
ters for fur
ther in
for
ma
tion.
2. Ver
ify the ses
sions in progress that are cur
rently ob
served in the sys
tem: #> show ses
sion
progress
Ex
pected Out
put:
If the values under "show session progress" are not the ex pected or nor
mal val
ues, refer to
the "Ses sion Trou
bleshoot
ing" chap
ter or spe
cific call con
trol pro
to
col chap
ters for fur
ther in
-
formation.
3. Ver
ify the dis
con
nect rea
sons that are cur
rently ob
served in the sys
tem: #> show ses
sion
discon nect-rea sons
Ex
pected Out
put:
Re
view the per cent
age col
umn of the “dis connect-rea sons”. En sure that the percent
ages
are near their nor
mal val
ues. Typ
i
cally, if an issue is oc
cur
ring within the net
work, a “dis
con‐
nect-rea son” counter may be higher or lower than nor mal.
To get a cur
rent readout of the dis
con
nect reasons, clear
ing the coun
ters can be done prior to
dis
play
ing the val
ues. Then display the val
ues over a 5 to 10 minute period to un
der
stand the
cur
rent dis
connect rea
sons in the sys
tem.
If the val
ues under "show ses sion dis con nect-rea sons" are not ex pected, refer to the
"Session Trou
bleshoot
ing" chap
ter or spe
cific call con
trol pro
to
col chap
ters for fur
ther in
for
-
ma tion.
4. Ver
ify the setup time taken by the sys
tem to at
tach a sub
scriber to the chas
sis: #> show
session se tuptime
Ex
pected Out
put:
If the val
ues under "show ses sion se tuptime" are not ex
pected, refer to the "Ses
sion Trou
-
bleshoot ing" chap
ter or spe
cific call con
trol pro
to
col chap
ters for fur
ther in
for
ma
tion.
5. Ver
ify the his
tor
i
cal ses
sion coun
ters of calls being at
tached to the sys
tem: #> show ses
‐
sion coun ters his torical all
Expected Out
put:
(A+R+D+F+H+R)
Intv Timestamp Arrived Rejected Connected Disconn Failed Handoffs Renewals CallOps
---- ------------------- -------- -------- --------- -------- ------- -------- -------- -------
With the output of “show ses sion coun ters his tor cal all”, it is im
i por
tant to take no
-
tice of any sud
den spikes in any of the columns. Such a spike will usu ally in
di
cate an issue has
taken place in the network. If there is a large in
crease or de
crease, fol
low local proce
dures to
troubleshoot the issue.
If the values under "show ses sion coun ters his tor cal all" are not ex
i pected, refer to
the "Ses sion Trou
bleshoot
ing" chap
ter or spe
cific call con
trol pro
to
col chap
ters for fur
ther in
-
formation.
6. Ver
ify that all Ses
sion Man
agers on the sys
tem are evenly dis
trib
uted: #> show task re
‐
sources fa cil ity sessmgr all
Expected Out
put:
..
Total 348 8750.83% 244.7G 44081 3.0m
All Ses
sion Man agers on the system should show a value in the “used” col umn that is fairly
equal. If the system shows a Ses sion Manager that has a value that is much higher or lower
than other Ses sion Man agers on the sys
tem, this should be inves
ti
gated im me di
ately. Try to
deter
mine if the af
fected Ses
sion Managers are all on the same card, or if it is af
fect
ing mul
ti
ple
cards.
If the ses
sions are not evenly dis
trib
uted, refer to the "Ses
sion Man
agers Im
bal
anced" sec
tion in
the "Plat
form Trou bleshoot
ing" chapter for fur
ther in
for
mation.
System Verification
Chassis Fabric
Overview
This sec
tion cov
ers a health check of Swtich Fab
ric on ASR 5000/ASR 5500.
Es
ti
mated Time for com
ple
tion: 15 min
utes
1. Ac
cess hid
den/test mode of the chas
sis CLI:
#> hid
den p <pass
word>
Or
Ex
pected Out
put:
Or
Ex
pected Out
put:
Ex
pected Out
put:
EGQ.PqpDiscardUnicastPacketCounter 5
Petra-B 4=2/2
EGQ.RqpDiscardPacketCounter 2
EGQ.EhpDiscardPacketCounter 2
EGQ.PqpDiscardUnicastPacketCounter 2
Petra-B 5=3/1
Petra-B 6=3/2
Petra-B 11=4/1
Petra-B 12=4/2
Petra-B 17=5/1
Petra-B 18=5/2
Petra-B 19=5/3
Petra-B 20=5/4
Petra-B 21=6/1
Petra-B 22=6/2
Petra-B 23=6/3
Petra-B 24=6/4
Petra-B 25=7/1
This com mand will show whether or not Egress Queue Dis cards are cur
rently oc
curring in the
chas
sis. In order to see if the values are in
cre
ment ing, the command must be run at least every
minute. If val
ues are increment ing, refer to the "Switch Fab ric and NPU" section in the "Plat
-
form Trou bleshooting" chapter.
Ex
pected Out
put:
En
sure the fol
low
ing val
ues are ob
served:
Ex
pected Out
put:
Ex
pected Out
put
lost_fails --------------------------------------------------------------------------+
bonus_fails -------------------------------------------------------------------+ |
length_fails ------------------------------------------------------------+ | |
data_fails --------------------------------------------------------+ | | |
last_valid_packet_number ---------------------------------+ | | | |
valid_rcvd_pkts ---------------------------------+ | | | | |
received_pkts --------------------------+ | | | | | |
sent_pkts ---------------------+ | | | | | | |
gen_pkts -------------+ | | | | | | | |
cpu/inst + | | | | | | | | |
sl | | | | | | | | | |
1 0/0 -> 26757021 26757021 26757021 26757021 26757021 -> 0 0 0 0
2 0/0 -> 26345617 26345617 26345617 26345617 26345617 -> 0 0 0 0
3 0/0 -> 26705123 26705123 26705123 26705123 26705123 -> 0 0 0 0
4 0/0 -> 26707048 26707048 26707048 26707048 26707048 -> 0 0 0 0
5 0/0 -> 26702253 26702253 26702253 26702253 26702253 -> 0 0 0 0
6 0/0 -> 26703086 26703086 26703086 26703086 26703086 -> 0 0 0 0
7 0/0 -> 26706768 26706768 26706768 26706768 26706768 -> 0 0 0 0
10 0/0 -> 26705221 26705221 26705221 26705221 26705221 -> 0 0 0 0
11 0/0 -> 26703240 26703240 26703240 26703240 26703240 -> 0 0 0 0
12 0/0 -> 26701973 26701973 26701973 26701973 26701973 -> 0 0 0 0
13 0/0 -> 26708119 26708119 26708119 26708119 26708119 -> 0 0 0 0
14 0/0 -> 5330377 5330377 5330377 5330377 5330377 -> 0 0 0 0
15 0/0 -> 26711346 26711346 26711346 26711346 26711346 -> 0 0 0 0
16 0/0 -> 26706719 26706719 26706719 26706719 26706719 -> 0 0 0 0
The right most column “lost fails” should be “0” or not in cre
menting. If the val
ues are in
-
cre
menting, please es
ca
late to Cisco to im
me
di
ately begin in
ves
ti
ga
tion.
System Verification
Overview
This sec
tion cov
ers how to col
lect Tech Sup
port in
for
ma
tion for trou
bleshoot
ing.
Es
ti
mated Time for com
ple
tion: 5-25 min
utes, de
pends on the con
fig
u
ra
tion of the chas
sis
1. This com
mand will pro
vide a snap
shot of the sys
tem:
This ver
sion of the com
mand will send the out
put to the User’s screen:
This ver
sion of the com
mand will cre
ate a com
pressed file and save it to the “/flash/” file sys
-
tem on the chassis:
Ex
pected Out
put:
Send
ing to the screen:
Send
ing the SSD to the “/flash” file sys
tem of the chas
sis:
This out
put is help
ful when es
calat
ing is
sues to Cisco, and should be taken before any main
te
-
nance occurs on the system, after the main te
nance is com pleted, and a few times dur
ing any
event that may need to be in
vesti
gated by Cisco.
Platform Troubleshooting
Platform Troubleshooting
CPU/Memory issues
Overview
This sec
tion dis
cusses dif
fer
ent types of re
sources that can be mon
i
tored on ASR 5000/ASR
5500.
Description
In the ASR 5000/ASR 5500 chas
sis, dif
fer
ent types of re
sources are mon
i
tored.
• CPU
• Memory
• Files
• Sessions
CPU/Memory
CPU and mem
ory usage are mon
i
tored by the sys
tem in mul
ti
ple ways:
• CPU/Memory per card - This is the usage of physical cpu and memory on each card.
• CPU/Memory per process - This is the usage of each process.
• CPU/Memory per system/kernel - This is similar to the first, but from the Linux kernel
perspective.
Files
This is a file de
scrip
tor opened by a process. The num
ber of file de
scrip
tors is mon
i
tored by the
sys
tem. There are also mul ti
ple ways the sys
tem moni
tors for files; process level and Linux
level.
Sessions
Ses
sion counts are ap
plic
a
ble to sess
mgrs, aaamgrs, and some demux tasks. A new ses sion
would be re
jected once the max-limit, de
fined by ei
ther the in
di
vid
ual task limit or the sys
tem
li
cense limit, is reached.
Platform Troubleshooting
Below is sam
ple out
put from an ASR 5500 chas
sis.
Below is a sam
ple out
put from ASR 5500 chas
sis.
In the out
put above CPU / Mem ory usage per process can be seen. Note that each process has
an al
lo
cated value and a cur rently used value. In
side StarOS there is a re
source management
subsys
tem (resctrl/resmgr) that assigns a set of re
source lim
its for each process in the sys
-
tem. These limits can’t be changed by the user.
If a process ex ceeds the limit on CPU, Mem ory, or Files, the sta
tus will be changed from "good"
to "warn" or "over" de pend ing on how much the process ex ceeds the limit. Please note, see ing a
task in "warn" or "over" does not nec essar
ily mean there is a prob lem. The lim its are just for
mon i
tor
ing, and the process will con tinue to work even after vi o
lating the limit. If a process
stays at or near 100% CPU usage, or the mem ory usage keeps grow ing, the task(s) should be in-
vesti
gated to see if there is an issue.
Below is a sam
ple out
put from an ASR 5500 chas
sis.
Network cpeth1 : 0.075 kpps rx, 0.093 mbps rx, 0.043 kpps tx, 0.037 mbps tx
[ High res sample : 0.081 kpps rx, 0.096 mbps rx, 0.051 kpps tx, 0.044 mbps tx ]
Network mcdma0 : 0.000 kpps rx, 0.000 mbps rx, 0.000 kpps tx, 0.000 mbps tx
Network mcdma1 : 0.000 kpps rx, 0.000 mbps rx, 0.000 kpps tx, 0.000 mbps tx
Network mcdma2 : 0.000 kpps rx, 0.000 mbps rx, 0.000 kpps tx, 0.000 mbps tx
Network mcdma3 : 0.000 kpps rx, 0.000 mbps rx, 0.000 kpps tx, 0.000 mbps tx
Network mcdmaN : 0.000 kpps rx, 0.000 mbps rx, 0.000 kpps tx, 0.000 mbps tx
Network Total : 0.166 kpps rx, 0.195 mbps rx, 0.108 kpps tx, 0.480 mbps tx
File Usage : 1056 open files, 9899609 available
Memory Usage : 3180M 3.2% used (1549M 3.2% node-0, 1630M 3.3% node-1)
Memory Details:
In the out
put, as high
lighted, is the cpu / mem
ory usage at the Linux ker
nel level. For cpu, the
fol
low
ing can be checked:
irq Interrupts
Card 5, CPU 0:
Status : Active, Kernel Running, Tasks Running
Load Average : 531.06, 529.32, 524.18 (531.06 max)
Total Memory : 98304M
Kernel Uptime : 23D 8H 23M
Last Reading:
CPU Usage All : 6.1% user, 3.8% sys, 80.6% io, 1.8% irq, 7.7% idle
Platform Troubleshooting
Core 0 : 7.0% user, 6.9% sys, 83.5% io, 2.6% irq, 0.0% idle
Core 6 : 6.4% user, 5.4% sys, 84.7% io, 3.5% irq, 0.0% idle
Core 1 : 4.1% user, 3.3% sys, 91.4% io, 1.2% irq, 0.0% idle
Core 7 : 4.2% user, 4.3% sys, 90.6% io, 0.9% irq, 0.0% idle
Core 2 : 4.9% user, 2.8% sys, 90.8% io, 1.5% irq, 0.0% idle
Core 8 : 6.0% user, 1.9% sys, 91.1% io, 1.0% irq, 0.0% idle
Core 3 : 3.3% user, 3.7% sys, 91.3% io, 1.7% irq, 0.0% idle
Core 9 : 3.0% user, 1.8% sys, 94.4% io, 0.8% irq, 0.0% idle
Core 4 : 4.2% user, 2.1% sys, 0.0% io, 1.0% irq, 92.7% idle
Core 10 : 4.7% user, 2.8% sys, 90.1% io, 2.4% irq, 0.0% idle
Core 5 : 8.8% user, 5.8% sys, 83.3% io, 2.1% irq, 0.0% idle
Core 11 : 16.6% user, 5.0% sys, 75.8% io, 2.6% irq, 0.0% idle
An im
por
tant com
mand for live trou
bleshoot
ing pro
vides graphs per val
ues re
ported in the
command:
CPU
• show profile - This is the background cpu profiler. This command allows checking
which functions consume the most CPU time. This command will require cli test-
command password.
Below is a sam
ple out
put of show pro file. The fa
cil
ity, in
stance, card, or more can be spec
i
fied
with this com
mand. For depth option, 4 is the rec
om mended value.
Internal trap notification 1220 (CPUOverClear) facility cli instance 5010294 card 5 cpu 0
allocated 600 used 272
Internal trap notification 1216 (CPUWarnClear) facility cli instance 5010294 card 5 cpu 0
allocated 600 used 272
Internal trap notification 1215 (CPUWarn) facility cli instance 5010317 card 5 cpu 0 allocated
600 used 595
Internal trap notification 1216 (CPUWarnClear) facility cli instance 5010317 card 5 cpu 0
allocated 600 used 220
Memory
For high mem ory is
sues, please see the following CLI com mands. For the last 2 com mands in
the list, please cap
ture multi
ple it
er
a
tions over a con
sis
tent time in
ter
val, every 15 min
utes for 4
it
er
a
tions for example.
• show task resources - Check if any process goes warn/over state
• show task resource max - Check max usage rather than current usage
• show snmp trap history - Check if there is any MemoryWarn/Over event
• show logs - Check if there is any warning/error reported by resmgr
• show messenger proclet facility <name> instance <x> heap - Check heap usage of the
process
Platform Troubleshooting
• show messenger proclet facility <name> instance <x> system heap - Check system heap
information for containing process
For ex
am
ple, if sess
mgr 2 and mmemgr 3 are in warn or over state, the fol
low
ing would be re
-
quired:
Internal trap notification 1221 (MemoryOver) facility sessmgr instance 16 card 1 cpu 0
allocated 204800 used 220392
Internal trap notification 1222 (MemoryOverClear) facility sessmgr instance 16 card 1 cpu 0
allocated 1249280 used 219608
Internal trap notification 1217 (MemoryWarn) facility npudrv instance 401 card 5 cpu 0
allocated 112640 used 119588
Internal trap notification 1218 (MemoryWarnClear) facility cli instance 5011763 card 5 cpu 0
allocated 56320 used 46856
Message Bounces
Bounces occur when the sys tem is busy or tasks are not available (not started or recov
er
-
ing), and mes sages between sub sys
tems do not reach the in tended re cip
i
ent. These
bounces can be used to help de ter
mine where mes sages got lost (one sub
sys
tem or the other).
Mes sage code will in
di
cate whether the re
cip
i
ent was not pre
sent or whether the recip
i
ent was
busy (timeout).
Bounces typ i
cally happen due to soft ware is sues on tasks, but they are also in
di
ca
tors which
can be used to iso late a problem to a sin gle card. It is good to check if bounces frequently
occur for spe cific card-desti
nations or task in
stances which could indi
cate a prob
lem for a spe
-
cific card or in
stance.
"show mes
sen
ger bounces" re
quires CLI test-com
mands pass
word
Platform Troubleshooting
Memory leak
If a process contin
ues to al
lo
cate mem ory, but doesn't re
lease it to the task, it will con
tinue to
con sume more and more mem ory resources. This is clas
si
fied as a mem ory leak. In this sit
u
a
-
tion, the mem ory consump tion will keep in creas
ing, and sooner or later the process will out-
grow not just its allo
cated limit, but also the ac tual avail
able mem ory on the card. To trou -
bleshoot mem ory leak is
sues, fol
lowing CLI outputs are re
quired.
• show messenger proclet facility <name> instance <x> heap
• show messenger proclet facility <name> instance <x> system heap
Fragmentation
If a task does not ap
pear to be leaking mem ory, but is over its speci
fied sys
tem lim
its, as seen in
"show task re sources" out put, it could be that the task has frag mented mem ory allo
ca
tion.
Mem ory fragmenta
tion is usually a result of repet i
tive al
lo
ca
tion/free ing of small mem ory
blocks. This creates many free chunks with out any of them big enough to be re-used. To trou -
bleshoot mem ory fragmenta
tion is
sues, fol
low
ing CLI out puts are required.
• show messenger proclet facility <name> instance <x> heap
• show messenger proclet facility <name> instance <x> system heap
Console Logs
If SSD is collected, please inspect the debug con sole card <> cpu <> tail 10000 only com -
mand out put for the rel
evant card(s) and ob serve if there are any recent kernel messages. If
present, these may pro vide an indi
ca
tion of re
source issues. Please note that in older ver
sions
of StarOS, the begin
ning of the lines in the console logs are Unix time
stamps.
Fol
low
ing is an ex
am
ple of how to trans
late unix
time into UTC with date com
mand on any
Linux server:
CLI com
mands below re
quire the CLI test-com
mands pass
word:
• show task resources table - Check which task is utilizing high CPU
• Collect the output of the following commands in hidden mode: "show profile facility
<task-facility> active instance <instance> depth 1" where depth could be 1,4 and/or 8
• Collect the core dump for any of the affected tasks with high CPU using "task core
facility <task-facility> instance <instance-value>"
• Collect SSD when some process/facility is facing high CPU state.
Platform Troubleshooting
Licensing
Overview
StarOS requires a proper li
cense to be installed for op
era
tion. Li
cense keys de
fine ca
pac
ity lim
-
its (num
ber of al
lowed sub scriber ses
sions) and available fea
tures on the sys
tem. Univer
sal type
cards, such as UDPC / UMIO on ASR 5500 re quire a card li
cense as well.
Fea
tures
Enabled Features:
Feature Applicable Part Numbers
-------------------------------------------------- -----------------------------
IPv4 Routing Protocols [ none ]
Enhanced Charging Bundle 2: [ 600-00-7574 ]
+ DIAMETER Closed-Loop Charging Interface [ None ]
+ Enhanced Charging Bundle 1 [ 600-00-7526 ]
Session Recovery [ 600-00-7513 / 600-00-7546
600-00-7552 / 600-00-7554
600-00-7566 / 600-00-7594
600-00-9100 / 600-00-9101
600-00-7638 / 600-00-7640
600-00-7634 / 600-00-7595
600-00-7669 / None
600-00-7897 / None
600-20-0145 / None
None / None
None ]
Platform Troubleshooting
Ses
sions
Session Limits:
Sessions Session Type
-------- -----------------------
50000 ECS
50000 PGW
Cards
Log Collection
• show license info - To check what license is installed.
Platform Troubleshooting
Example Scenarios
Prob
lem:
If the fol
low
ing li
cense is in
stalled, then new calls still can be es
tab
lished.
Al
ways On Li
cens
ing [ 600-20-0111 ]
Sat Feb 21 23:13:32 2015 Internal trap notification 65 (LicenseExceeded) license exceeded for
Enhanced Charging Service service; current sessions 1010 exceeds license 1000
Sat Feb 21 23:13:32 2015 Internal trap notification 65 (LicenseExceeded) license exceeded for
PGW service; current sessions 1010 exceeds license 1000
If the "Al
ways on" li
cense is not in
stalled, new calls would be re
jected, and the logs and snmp
traps below will be seen:
Sat Feb 21 23:21:52 2015 Internal trap notification 65 (LicenseExceeded) license exceeded for
Platform Troubleshooting
Enhanced Charging Service service; current sessions 1010 exceeds license 1000
Sat Feb 21 23:21:52 2015 Internal trap notification 65 (LicenseExceeded) license exceeded for
PGW service; current sessions 1010 exceeds license 1000
License issue
Prob
lem De
scrip
tion:
The li
cense key needs to match a stored value in hard
ware.
If the li
censes do not match, then the fol
low
ing mes
sage will be seen with the "show li
cense
info" output:
ASR 5000
ASR 5500
Log
ging
HD RAID
Overview
ASR 5000/ASR 5500 has hard dri ves in cer
tain cards which are used to store dif
fer
ent in
for
ma-
tion. In this chap
ter, hard disk ar
chi
tecture in both plat
forms, and how to troubleshoot raid is
-
sues are cov ered.
In the ASR 5000 the RAID HD (Hard Disk) is used to store data, spe
cially CDR records.
The HD Raid is hosted on the SMC cards. Each SMC has two disks of 147 GB each.
The ASR 5000 RAID disks are called hd-lo cal1 (ac
tive SMC) and hd-re
mote-1 (standby SMC). The
disk par
ti
tion has a total size of 146 GB using RAID1 (1+1) disks.
In the ASR 5500 the disks are called hd13, hd14, hd15 etc. The total ca
pac
ity of the RAID de
-
pends on the number of FSC cards.
HDCTRL task. The hard disk controller task is called HDCTRL. It is responsible for disk
management, RAID setup, file system management and Network File System (NFS) support. It
also provides disk monitoring, file synchronization, basic free space management and simple
cleanup support.
Troubleshooting HD RAID
The fol
low
ing are gen
eral trou
bleshoot
ing com
mands that are use
ful for trou
bleshoot
ing of HD
RAID is
sues:
Platform Troubleshooting
Example Sceanrios
Prob
lem De
scrip
tion:
Total un
cor
rected er
rors in
cre
ment
ing con
tin
u
ously with the com
mand "show hd smart ...":
Logs col
lected:
Res
o
lu
tion:
If the er
rors are in
cre
ment
ing con
tin
u
ously, the hard
ware must be re
placed; con
tact Cisco for
assis
tance.
Prob
lem De
scrip
tion:
When chas
sis de
tects RAID is de
graded, it will trig
ger the fol
low
ing snmp trap:
Analy
sis:
Res
o
lu
tion:
• Confirm the card that's faulty is in standby mode. In the above example, it is card 8.
Make card 9 active and card 8 as standby using the command 'card switch from 8 to
9'. Make sure that "hd-remote1" is showing "Valid image".
• Open a second connection to the chassis and active logging for hdctrl to
monitor formatting/overwriting progress using the following commands
# log
ging fil
ter ac
tive fa
cil
ity hd
c
trl level info
# log
ging ac
tive
• Below CLI command requires CLI test-commands password configured in the chassis.
• Overwrite standby SMC’s hard drive (remote1) using:
# hd raid over
write re
mote1
Platform Troubleshooting
• If command fails then try to reset the remote1 using the command:
# hd raid re
set-phy re
mote1
# hd raid over
write re
mote1
The log
ging in the sec
ond con
nec
tion will show the progress as well.
# no log
ging ac
tive
Analy
sis:
Over
write hard drive and re-synch RAID dur
ing a main
te
nance win
dow.
Res
o
lu
tion:
• Check HD RAID
• Open second connection to the chassis and active logging for hdctrl to
monitor overwrite progress:
# log
ging fil
ter ac
tive fa
cil
ity hd
c
trl level info
# log
ging ac
tive
• Below CLI command requires CLI test-commands password configured in the chassis.
• Overwrite hard drive with the issue (e.g. hd14, hd15, hd16, hd17)
# hd raid over
write < hd14, hd15, hd16, hd17>
• Check overwrite progress with "show hd raid verbose". The logging will show overwrite
progress as well.
• Disable logging in second connection:
# no log
ging ac
tive
Fol
low
ing logs ob
served in the sys
logs for FSC cards.
Analy
sis:
In the out
put above the hard drive firmware on the disk, hd16a should be up
graded dur
ing the
mainte
nance window (MW).
Platform Troubleshooting
Res
o
lu
tion:
• Remove FSC from RAID. This is needed before a firmware upgrade. To preserve RAID
data, make sure RAID is not degraded before issuing this command:
# hd raid re
move hd16 -force
Note that "-force" is needed to disable RAID0 on the two disks for fur
ther steps, and the com-
mand should be ex ecuted even if a pre
vi
ous "hd raid re
move hd16" had been exe
cuted (without
"-force").
# debug hd
c
trl up
grade-firmware hd16a
# debug hd
c
trl up
grade-firmware hd16b
Platform Troubleshooting
# hd raid in
sert hd16
Software Crashes
Overview
As with any soft ware, there are times when sit
u
ations arise in pro
cess
ing where the underly
ing
logic fails and the soft ware task must restart it
self to get back to proper work ing order. This
sec
tion de tails how to check for such crash events in the sys tem, and the in
for
ma
tion to col
lect
so is
sues re
sult
ing in crashes can be ad
dressed by Cisco.
Troubleshooting Commands
Use the com
mands below to get more in
for
ma
tion about crashes on the sys
tem:
• show crash list
• show crash number [number].
When re
port
ing this crash to Cisco TAC sup
port, the fol
low
ing info needs to be col
lected:
Fri Dec 26 08:32:20 2014 Internal trap notification 73 (ManagerFailure) facility sessmgr
instance 188 card 7 cpu 0
Fri Dec 26 08:32:20 2014 Internal trap notification 150 (TaskFailed) facility sessmgr instance
188 on card 7 cpu 0
Fri Dec 26 08:32:23 2014 Internal trap notification 1099 (ManagerRestart) facility sessmgr
instance 139 card 4 cpu 1
Fri Dec 26 08:32:23 2014 Internal trap notification 151 (TaskRestart) facility sessmgr
instance 139 on card 4 cpu 1
Please mon
i
tor logs for the fol
low
ing mes
sage after a crash [show logs]:
The fol
low put of the show crash list com
ing is the out mand:
sim
i
lar for mini core: mc-crash-<card>-<cpu>-<unix
time>-core
The unix time is the unix-epoch in hex. This timestamp can be com pared, along with the card
num ber and time from the show-crash list
ing, to iden
tify which full core maps to which crash.
If 2 crashes happen on the same card/cpu around the same time frame, only the first crash will
gen er
ate a core. The sys
tem can
not gen
er
ate more than one core file at a time.
For example in the core file 'crash-07-00-5343ad73-core', when the core is gen
er
ated can be
iden
ti
fied by using the fol
lowing sim
ple Linux shell script.
http://
www.
epochcon
verter.
com/
also can be used to con
vert unix time to GMT.
export TZ=UTC;
date -d @`printf %d 0x5343ad73`;
Tue Apr 8 08:04:03 UTC 2014;
Platform Troubleshooting
Perl ex
am
ple:
Other be
hav
iors within "show crash list" to be aware of:
• Similar crashes will be consolidated into one record. The record does display the
number of times this crash type has occurred.
• Sometimes only a core file is generated while there is no record in show crash list. This
could be a crash of non-StarOS processes such as telnet, or stack overflow.
The fol
low
ing Linux com
mand shows the dis
tri
b
u
tion of the crashes per cards:
In this example there are 21 crashes on card 15, 19 crashes on card 6 and 11 crashes on card
13. In cases where mul ti
ple ac
tive cards are on the chas
sis, and crashes are re
lated only to spe
-
cific cards, trou bleshooting would re quire more de tailed analysis of the info re lated to
Platform Troubleshooting
Sim
i
larly, the fol
low
ing will pro
vide dis
tri
b
u
tion in
for
ma
tion of crashes per in
stance(s) of the
process:
Pro
vide mini-core and SSD (show sup
port de
tails)
Ver
ify if core is cre
ated and trun
cated (con
fig change if trun
cated)
Ex
am
ples of suc
cess
ful core-trans
fer [show logs]:
internal system critical-info syslog] Crash handler file transfer ended (type=2 size=0
child_ct=0 core_ct=0 pid=21685 status=1 elapsed=13s)
Ex
am
ples failed core-trans
fer [show logs]:
$ mv crash-07-00-5343ad73-core crash-07-00-5343ad73-core.gz
$ gunzip crash-07-00-5343ad73-core
$ echo $?
0
If gun
zip doesn't re
port any error, and out
put of the echo $? is 0, then the core trans
fer was
success
ful:
$ mv crash-16-00-51e7259a-core crash-16-00-51e7259a-core.gz
$ gunzip crash-16-00-51e7259a-core.gz
gunzip: crash-16-00-51e7259a-core.gz: unexpected end of file
$ echo $?
1
If gun
zip re
ports an error such as "un
expected end of file", and output of the echo $? is 1, then
core transfer was not success
ful and the core file is trun
cated. If "failed core trans
fer" sys
log
Platform Troubleshooting
mes
sages are ob
served, or core file is trun
cated, then please in
crease tem
po
rary crash max
i
-
mum size with con
fig
u
ra
tion com
mand:
NOTE: It is advised to keep this value de fault, and if changed, to change it only tem porar
ily if
the "failed core trans
fer" sys
log message is seen, in order to capture the re
quired core file dur-
ing the next occurrence. Once the core file is provided to Cisco Support, and it is con
firmed to
be good, please return the size value back to de fault.
Session Recovery
This topic is also han
dled in the SW ar
chi
tec
ture chap
ter where it's ap
proached from an ar
chi
-
tec
tural point of view.
If en
abled, to check the sta
tus of ses
sion re
cov
ery:
Ses
sion Re cov ery is based around three man agers within the chas sis; SessMgr, AAAMgr, and
VPN
Mgr. The Sess Mgr and AAAMgr back each other up by in stance num ber. For ex am ple, Sess
-
Mgr instance 100 is paired with AAAMgr in stance 100. If one were to crash, the other could re -
build the session info. VPNMgr comes into play with IP al lo
ca
tions and reserva tions for the ses-
sions. The VP N Mgr instance ID is equal to the context ID where the IP pools are con fig
ured for
the chassis. If there are mul ti
ple con
texts with IP pools, there could be mul ti
ple VPN Mgrs in
-
volved in a recovery event. As part of the require
ments for Ses sion Re
covery, all three man agers
must live on dif fer
ent cards in the chas sis. Sess
Mgr and AAAMgr will live on ac tive processing
cards whereas VP NMgr will live on the demux card.
Ses
sion Recov
ery requires a mini
mum of one Standby proces sor card (rec
om men dation is two
processor cards to cover double fail
ure sce
nar
ios) as well as a ded
i
cated Demux card. In 5500
the Demux can be ei ther on MIO or DPC. In 5000, Demux must be on a PSC. Given the hard -
ware require
ments, for there to be a true hardware-re lated re
source issue which would cause
problems with ses
sion recov
ery, there would need to be two proces sor card fail
ures.
Platform Troubleshooting
Sys
tem could go into a state where it's "Not ready" to per form session re
covery. Single crash
events are typ i
cally han dled by the sys tem and do not re sult in any major loss of sub scribers,
data, or billing info. In rare cases where there are cas cading (one right after another) crashes of
one of the man agers, it is possi
ble that the re
covery of sessions may not hap pen cor rectly. For
ex
am ple, 2 sess mgr crashes on the same card and same CPU at the same time. Ad di
tionally,
multi
ple fail
ures across mul ti
ple man agers could also cause is sues with Session Recov ery. In
any of these crash cases, Cisco tech ni
cal sup
port should be con tacted.
In order to ana
lyze how many ses sions are re
cov
ered, and the im
pact of the sess
mgr crash, an
-
a
lyze the fol
lowing traps from the command show snmp trap his tory ver
bose out
put:
Tue Apr 07 15:02:49 2015 Internal trap notification 183 (SessMgrRecoveryComplete) Slot Number
3 Cpu Number 0 fetched from aaa mgr 1849 prior to audit 1849 passed audit 1830 calls recovered
1830 all call lines 1830 time elapsed ms 3280.
Tue Apr 07 15:02:49 2015 Internal trap notification 183 (SessMgrRecoveryComplete) Slot Number
3 Cpu Number 0 fetched from aaa mgr 1848 prior to audit 1848 passed audit 1836 calls recovered
1837 all call lines 1837 time elapsed ms 3277.
Tue Apr 07 15:02:49 2015 Internal trap notification 183 (SessMgrRecoveryComplete) Slot Number
3 Cpu Number 0 fetched from aaa mgr 1858 prior to audit 1858 passed audit 1851 calls recovered
1851 all call lines 1851 time elapsed ms 3345.
Tue Apr 07 15:02:49 2015 Internal trap notification 183 (SessMgrRecoveryComplete) Slot Number
3 Cpu Number 0 fetched from aaa mgr 1844 prior to audit 1844 passed audit 1836 calls recovered
1837 all call lines 1837 time elapsed ms 3315.
The follow
ing simple Linux script will help with analy sis/troubleshoot ing. It will pro
vide the
dif
fer
ence between total calls and re
cov
ered calls so it is possi
ble to eval
uate the real im
pact of
the sessmgr crash.
ARP Troubleshooting
Overview
The fol
low
ing chap
ter cov
ers the basic Ad
dress Res
o
lu
tion Pro
to
col (ARP) trou
bleshoot
ing.
The fol
low
ing com
mands are help
ful in iden
ti
fy
ing ARP en
tries and clear
ing them. As every con
-
text is a vir
tual router, arp en
tries are for that par
tic
u
lar con
text:
• show ip arp
• clear ip arp
Sam
ple Out
put:
[local]SPGW-1#
[local]SPGW-1# context spgw
[spgw]SPGW-1#
[spgw]SPGW-1#
[spgw]SPGW-1# show ip arp
Flags codes:
I - Incomplete, R - Reachable, M - Permanent, S - Stale,
D - Delay, P - Probe, F - Failed
1 Logging can be enabled to identify low level information about ARP. This should only be
done upon recommendation from Cisco Support as increasing the logging too high may
risk stress on the system and impact subscribers:
• logging filter active facility ip-arp level unusual
• logging filter active facility vpn level unusual
• logging filter active facility npumgr level unusual
• logging active
Once re
quired log
ging out
put is col quired to do 'no log
lected, it is re ging ac
tive'.
Platform Troubleshooting
Overview
Major com
mands and multi
ple troubleshoot ing sce
nar
ios re
lated to cards and ports in the ASR
5000/ASR 5000 are cov
ered in this chapter.
Troubleshooting commands
For troubleshoot
ing ASR 5000 / ASR 5500 hard
ware prob
lems, fol
low
ing com
mands/logs
are used
• show system uptime - Check if the system recently reloaded
• show card table all - Shows cards operational states
• show card diag <slot> - Verify "Current Failure", Last Failure reason, environmental
variables
• show cpu table - Monitor CPU utilization, see if it exceeds 70% on any of the cards
• show cpu info - Monitor CPU usage, file usage, memory usage
• show card memory - Monitor memory errors
• show cpu errors - Monitor CPU errors
• show rct stat - Found in 'show support details' prior to 16.0. Monitor if any card
migration, switchovers and shutdowns happened recently
• show port table all - Monitor status of the interfaces
• show port transceiver <slot/port> - applicable to ASR 5500 only. Monitor RxPower /
TxPower
• show port utilization table - Monitor port utilization, observe anomalies
• show port utilization graph - Monitor port utilization, observe anomalies
• show port datalink counters <slot/port> - Monitor datalink counters, look for errors
• show port npu counters <slot/port> - Monitor for errors
• show logs - Monitor recent logs, search for anomalies
• show snmp trap history verbose - Identifies recent SNMP traps
• show alarm outstanding - Monitor if any alarms are still outstanding
Platform Troubleshooting
Port Status
When BAD, ERR, OVF, or Disc coun ters in "show port datalink coun ters" are ex
cessively in
-
creas ing in a short pe
riod of time, then the fiber-op
tic cable should be in
ves
ti
gated as a poten
-
tial layer 1 issue. For ex ample, re
plac
ing the fiber cable, cleaning connec
tors and check ing
local/re mote ports.
When trou
bleshoot
ing NPU/port-re
lated is
sues ex
e
cute the "show port npu coun
ters". If any
of the follow
ing counters is non-zero and exces
sively in
creas
ing , then there might be a con
fig
-
u
ration issue or the port is re
ceiv
ing bad frames/pack ets.
Example Scenarios
Prob
lem de
scrip
tion:
Analy
sis:
Col
lect 'show sup
port de
tails'. In the SSD out
put 'show card diag ' shows the last rea
son for fail
ure.
An
a
lyze 'show logs' of SSD for card 1 boot process
The above logs show that the card failed to boot due to a mem ory prob
lem. After try
ing to boot
the card sev
eral times, due to mul
ti
ple fail
ures, the card is shut down by the sys
tem.
Every card has console logs. When a card boots, all the kernel log in
formation will be in 'debug
con
sole card <card no> cpu <0/1> tail 4000' within the SSD. As the logs above are re lated to
card 1, 'debug con
sole card 1 cpu 0' has the fol
low
ing in
for
ma
tion con firm
ing the mem ory prob -
lem.
1430285212.787 card 1-cpu0: Ophir 82571 Ethernet controller 0x10608086 (Serdes) on 2/0/0
1430285212.787 card 1-cpu0: WARNING: Memory size 24576 MB for cpu0 not matching with value
32768 MB in IDEEPROM
Platform Troubleshooting
The num ber 1430285212.787 in the above log is date and time in epoch for
mat. Converting that
re
sults in Wed, 29 Apr 2015 05:26:52 GMT. From ver sion 17.0, epoch for
mat in the line is re
-
placed with date and time format.
Res
o
lu
tion:
Prob
lem De
scrip
tion:
The io
er
r_cnt for a hd is in
creas
ing in the logs
Analy
sis
[local]ASR5500# show hd smart hd17a Device: STEC Z16IZF2D-200UCT Version: E125 Serial
number: STM00000000 Device type: disk Transport protocol: SAS Local Time is: Fri Aug 10
16:30:18 2012 UTC Device supports SMART and is Enabled Temperature Warning Enabled SMART Health
Status: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0xb [asc=5d, ascq=b] Current Drive
Temperature: 46 C Drive Trip Temperature: 75 C
measure of the number of failures. These can also be seen in the logs)
Res
o
lu
tion
Prob
lem De
scrip
tion
Firmware needs to be up graded. When ASR 5000 / ASR 5500 chas sis is upgraded to a newer
StarOS version, all cards in the chas
sis should upgrade to the firmware ver sion in that StarOS
ver
sion. Sometimes, a few cards may not get up graded to the correct firmware ver sion. In that
sce
nario, a manual upgrade of firmware is needed for those cards.
The out
pout below shows that firmware is out of date by using the com
mand 'show card hard
-
ware'.
Description : DPC_CRYPTO_DC
Cisco Part Number : 73-14558-01 B0
UDI Serial Number : SAD00000000
Analy
sis:
Res
o
lu
tion:
• Put the card in standby mode using the command ‘card migrate from <slot no.> to <slot
no.>
• Use the command ‘card upgrade <card no.>’ to upgrade the firmware. Card upgrade
takes few minutes and after upgrade the card will reboot.
The up
grade process can be ob
served in 'show logs'
success
2015-May-06+09:14:30.188 [csp 7715 info] [5/0/18288 <cspctrl:0> pctrl_helpers.c:14470]
[software internal system critical-info syslog] upgrade of DPC_BCF_FPGA in slot 9 started
2015-May-06+09:14:30.186 [csp 7715 info] [5/0/18288 <cspctrl:0> pctrl_helpers.c:14387]
[software internal system critical-info syslog] upgrade of DPC_CAF_FPGA in slot 9 started
2015-May-06+09:14:28.751 [csp 7043 info] [5/0/18288 <cspctrl:0> spctrl_events.c:14169]
[hardware internal system critical-info syslog] Upgrade issued for the Data Processing Card
with serial number SAD00000000 in slot 9. Will upgrade 0x4180800000000000000000. Please
wait...
2015-May-06+09:13:48.385 [csp 7314 warning] [5/0/18288 <cspctrl:0> spctrl_events.c:2545]
[software internal system critical-info syslog] The DPC-CFE programmable on the Data Processing
Card in slot 9 is an out of date version
2015-May-06+09:13:48.385 [csp 7314 warning] [5/0/18288 <cspctrl:0> spctrl_events.c:2545]
[software internal system critical-info syslog] The DPC-CFE programmable on the Data Processing
Card in slot 9 is an out of date version
2015-May-06+09:13:42.908 [csp 7009 info] [5/0/18288 <cspctrl:0> pctrl_helpers.c:3078]
[hardware internal system critical-info syslog] Migrating all tasks on the Data Processing Card
in slot 9 to the one in slot 2.
LAG Troubleshooting
Overview
This section cov
ers LAG im
ple
men
ta
tion and trou
bleshoot
ing, along with some ex
am
ple
issue sce
nar
ios.
LAG Description
Two or more phys i
cal ports can be log i
cally combined in order to in crease the band width capa-
bil
ity and to create more re silient connections. A Link Ag gregation Group (LAG) works by ex -
chang ing control packets via Link Ag gregation Control Proto
col (LACP) over con fig
ured phys i
-
cal ports with peers to reach agree ment on an ag gre ga
tion of links as de fined in IEEE 802.3ad.
The LAG sends and re ceives the con trol packets directly on phys i
cal ports. Link ag gre
gation
(also called trunking or bond ing) provides higher total band width, auto-ne goti
a
tion, and re
cov -
ery, by com bin
ing paral
lel net
work links be tween de vices as a single link.
In the stan
dard LAG con
fig
u
ra
tion, the first port con
fig
ured is se
lected as Mas
ter Port.
In ASR 5000, ports are not paired (un less SSR is used, but that’s mu tu
ally ex
clu
sive with
LAG). There is only one master port and it has no fixed backup. Be
cause there is only one mas-
ter port, the in
ter
face bind
ing will al
ways match the mas ter port as dis
played in "show port
table".
LAG Troubleshooting
The fol
lowing are use
ful com
mands for trou
bleshoot
ing and health check of the LAG on ASR
5x00 chassis.
• show port table all
+ LAG distributing
~ LAG agreed
* LAG negotiated
! Timeout
Fri Nov 23 01:48:37 2012 Internal trap notification 1204 (LAGGroupDown) card:19, port:1,
partner:(007F,B0-C6-9A-C4-79-F0,0016)
Fri Nov 23 01:48:37 2012 Internal trap notification 1205 (LAGGroupUp) card:19, port:1,
partner:(007F,B0-C6-9A-C4-5F-F0,0016)
2012-Nov-23+01:48:37.321 [lagmgr 179050 warning] [1/0/4059 <lagmgr:0> lagmgr_state.c:1268]
Platform Troubleshooting
[software internal system critical-infosyslog] LAG group 50 (global) with master port 19/1 has
changed partner from (007F,B0-C6-9A-C4-79-F0,0016) on 20/1, 26/1, 28/1 to (007F,B0-C6-9A-C4-5F-
F0,0016) on 19/1, 23/1, 27/1
Fol
lowing is an ex ample of normal oper
a
tion for a LAG with redundant config
u
ra
tion. In the
output below port 19/1 is mas ter port. Ports 19/1, 23/1, 27/1 and 29/1 are in LAG distrib
ut
ing
state (carry traf
fic). Ports 20/1, 26/1, 28/1 and 30/1 are in agreed state.
Fol
low ing is an ex
ample of a system being un able to establish LAG over port 17/1. In the out put
below Port 17/1 is mas ter port for the LAG group but it is NOT in dis trib
ut
ing state (overall that
means that LACP mes sages are tim ing out and chas sis can not es tab
lish LACP over port 17/1).
Ports 18/1 and 20/1 are in dis trib
uting state and carry traf fic. Port 19/1 is in agreed state. In
this case the issue with port 17/1 being un able to es
tablish LAG should be in ves
ti
gated. Fur ther
in
vesti
ga
tion would include check ing the LAG setup in the de vice that this port is connected to
as well as sniffer traces. The com mand "show lag mgr events" com mand is also use ful to trace
LACP events.
24/3 Mgmt RS232 Serial Console Enabled Down Down Active 25/3 L2 Link
24/4 Mgmt BITS T1/E1 Timing Disabled Down Down Active 25/4 L2 Link
25/1 Mgmt 1000 Ethernet Dual Media Enabled Down Up Standby 24/1 L2 Link
25/2 Mgmt 1000 Ethernet Dual Media Disabled Down Down Standby 24/2 L2 Link
25/3 Mgmt RS232 Serial Console Enabled Down Down Standby 24/3 L2 Link
25/4 Mgmt BITS T1/E1 Timing Disabled Down Down Standby 24/4 L2 Link
26/1 Srvc 10G Ethernet Enabled Up Up Active None LA~ 19/1
27/1 Srvc 10G Ethernet Enabled Up Up Active None LA+ 19/1
28/1 Srvc 10G Ethernet Enabled Up Up Active None LA~ 19/1
29/1 Srvc 10G Ethernet Enabled Up Up Active None LA+ 19/1
30/1 Srvc 10G Ethernet Enabled Up Up Active None LA~ 19/1
37/1 Srvc 1000 Ethernet Enabled - Up - 21/1 L2 Link
Untagged Enabled Down - Standby - -
Tagged VLAN 6 Enabled Down - Standby - -
38/1 Srvc 1000 Ethernet Enabled - Up - 22/1 L2 Link
Untagged Enabled Down - Standby - -
Tagged VLAN 7 Enabled Down - Standby - -
Below is an ex
ample of a LACP time out. In the output below, the chas sis is un
able to es
tablish
LACP via port 23/1 due to time outs. To iden tify the time
outs run the fol low
ing CLI com -
mands: 'show port utiliza
tion table', 'show port datalink coun ters'. Also check LACP sta tus in the
de
vice to which this port is connected to.
Example Scenarios
Multiple ports bounces in LAGs followed by LAGs lockup and unable to process any traffic due to
flow control disabled.
Prob
lem ob
served:
Logs col
lec
tion:
Analy
sis:
Moni
toring the 'show npu stats debug slot <>' out put showed that rx-lag-cntl-pkt-cnt counter
was con stant and not in cre
ment ing. This in
dicates that the LCs got stuck/clogged (prob a
bly
due to flow con trol not working cor rectly, this is yet to be root caused) lead
ing to the LC to PSC
NPU com muni
cation get
ting lost.
With a bro
ken LC to NPU path, LACP pack ets from the LC could not reach cor
re
spond
ing NPU
and hence do not get processed. This re
sults in LAG down/bounce.
Workaround:
Re
set
ting in
di
vid
ual LC in LAG group nor
mal
ized the LAGs for short time.
Res
o
lu
tion:
En
able the flow con
trol on next hop switch to avoid traf
fic storm and LC clog
ging.
Con
sider ad
just
ing LACP time
out val
ues on next hop switch for a faster con
ver
gence.
LACP period short <<<< To let 5k send 1 LACP PDU per sec ond, in
stead of 1 per 30 sec
onds.
flow-control bidi
rec
tional <<<<< to slow down the flow rate from Router to ASR 5x00 has sent
flow con
trol beacon to Router when ASR 5x00 is throt
tled.
Platform Troubleshooting
by default
Local
Te0/1/1/1 1s ascdA--- 0x8000,0x0005 0x0001 0x8000,40-55-39-60-51-f4
Overview
This sec
tion pro
vides basic health checks for mon
i
tor
ing Switch Fab
ric and NPU re
lated is
sues.
Switch Fabric
The switch fab ric pro
vides back
plane con
nectiv
ity among all the cards in ASR 5500 chas -
sis. Both con
trol plane and data plane con
nec
tiv
ity within the chas
sis are through the switch
fabric.
Ad
di
tion
ally, to check health of the switch fab ric on ASR 5500, please in
spect in SSD for the
command "show fab ric support de
tails" and check all sub-com
mands: "show serdes all-serdes
summary" and con firm if the Last Change is GOOD for all FAP (Fabric Ac
cess Proces
sor - can
be DPCs or MIOs):
Platform Troubleshooting
Fabric Status:
Status OK(+)------------------------+
Topology fault(T)------------------+|
|||||||||||| |||||||
Source Dev SL FL vvvvvvv vvvvvvvvvvvv vvvvvvv Rate Topology CRC Errs Remote Dev SL FL
Last Change
------- --- -- -- ------- ------------ ------- ------------ -------- -------- ------- --- -- -- ------
----------------
Also, check "show npu stats sft" com fails" count is in
mand in SSD to see if "lost_ creas
ing or
not, as that in
di
cates a loss of in
ter
nal fab
ric mon
i
tor
ing pack
ets.
lost_fails --------------------------------------------------------------------------+
bonus_fails -------------------------------------------------------------------+ |
length_fails ------------------------------------------------------------+ | |
data_fails --------------------------------------------------------+ | | |
last_valid_packet_number ---------------------------------+ | | | |
valid_rcvd_pkts ---------------------------------+ | | | | |
received_pkts --------------------------+ | | | | | |
sent_pkts ---------------------+ | | | | | | |
gen_pkts -------------+ | | | | | | | |
cpu/inst + | | | | | | | | |
sl | | | | | | | | | |
2 0/0 -> 12 12 12 12 12 -> 0 0 0 0
4 0/0 -> 17042263 17042263 17042263 17042263 17042263 -> 0 0 0 0
16 0/0 -> 89488 89488 89418 89418 89488 -> 0 0 0 70
NPU
The NPU (Net work Proces sor Unit) is re quired to forward a packet from a port to the CPU /
process in side the chas sis, and vice versa. The NPU sub system contains a flow data
base. A flow
is a set of packets with sim i
lar char
acter
is
tics, for ex
am
ple: all HTTP up
link packets from a sub
-
scriber to a spe cific server on the in ter
net.
---------npu--------
npu now 5min 15min
-------- ------ ------ ------
01/0/1 0% 0% 0%
01/1/1 0% 0% 0%
02/0/1 0% 0% 0%
02/1/1 0% 0% 0%
Platform Troubleshooting
03/0/1 0% 0% 0%
03/1/1 0% 0% 0%
04/0/1 0% 0% 0%
04/1/1 0% 0% 0%
05/0/1 0% 0% 0%
05/0/2 0% 0% 0%
05/0/3 0% 0% 0%
05/0/4 0% 0% 0%
06/0/1 0% 0% 0%
06/0/2 0% 0% 0%
06/0/3 0% 0% 0%
06/0/4 0% 0% 0%
07/0/1 0% 0% 0%
07/1/1 0% 0% 0%
08/0/1 0% 0% 0%
08/1/1 0% 0% 0%
09/0/1 0% 0% 0%
09/1/1 0% 0% 0%
10/0/1 0% 0% 0%
10/1/1 0% 0% 0%
TX : Csix Tx ( 0.00%)
M1a : Frame Parser ( 0.00%)
M2a : Flow Lkup ( 0.00%)
M3 : IP Fwd ( 0.00%)
M4a : Frame Modify ( 0.00%)
M5 : IP Fragmnt ( 0.00%)
REAS : IP Reassemb ( 0.00%)
ACL : ACL/Exceptn ( 0.00%)
If re
porting a sus
pected NPU issue to Cisco Sup lect SSD, "show sup
port, please col port de
tails
to file /flash/[name]. tar.
gz com
press".
Platform Troubleshooting
Session Imbalances
Overview
Within StarOS system there are dif fer
ent ways in which ses
sions may be come unbal
anced. As-
suming normal op
er
a
tion, calls within the sys
tem should be equally dis
trib
uted among the vari
-
ous Active Ses
sion Manager (Sess Mgr) tasks. There are is
sues that could come up within the
sys
tem where a noted im bal
ance in ses
sion dis
tri
b
u
tion may be a symp tom of a prob
lem ei
ther
within StarOS or within the net
work. Some ex amples of ses
sion im
bal
ance would be:
1 Some SessMgr instances have more or fewer calls than others.
3 Some call types are notably off from their expected session counts.
Troubleshooting
This sec
tion will cover de
bug
ging the three sce
nar
ios listed above.
1) Some Sess
Mgr in
stances have more or fewer calls than oth
ers.
Iden
tify the task in
stance(s) of SessMgr which has the un ex
pected session count. Also con firm
that the CPU and Mem ory uti
liza
tion for this task are within the de
fined limits. If the task is
over on CPU or Mem ory, please refer to the sec tion on CPU/Mem ory issues in the soft ware
trou
bleshoot ing chap
ter.
cpu facility inst used allc used alloc used allc used allc S status
----------------------- --------- ------------- --------- ------------- ------
1/0 sessmgr 5008 0.2% 50% 82.41M 230.0M 32 500 -- -- S good
1/0 sessmgr 20 12% 100% 444.3M 900.0M 43 500 4068 12000 I good
1/0 sessmgr 27 11% 100% 446.0M 900.0M 39 500 4056 12000 I good
1/0 sessmgr 59 11% 100% 445.6M 900.0M 40 500 244 12000 I good
1/0 sessmgr 72 13% 100% 447.5M 900.0M 42 500 4088 12000 I good
1/0 sessmgr 118 13% 100% 446.4M 900.0M 43 500 4071 12000 I good
Platform Troubleshooting
1/0 sessmgr 121 11% 100% 446.0M 900.0M 40 500 4076 12000 I good
1/0 sessmgr 164 12% 100% 443.8M 900.0M 42 500 4067 12000 I good
1/0 sessmgr 169 12% 100% 444.3M 900.0M 43 500 4077 12000 I good
1/0 sessmgr 191 11% 100% 447.8M 900.0M 43 500 4083 12000 I good
1/0 sessmgr 220 12% 100% 444.4M 900.0M 42 500 4070 12000 I
• Check for recent calls. Compare the SessMgr in question to ones with normal load. In
this example, SessMgr instance 59 (as seen above) is the one with low session count.
Total
Total Subscribers: 48
Run this same com mand against other SessMgr instances. This will show the num
ber of total
ses
sions that have been con
nected to this in
stance for 60 sec
onds or less.
• Look for logs that report call setup issues for particular SessMgr instances
• If nothing seems out of the ordinary with SessMgr in question .i.e. new calls are
successfully attaching, existing calls are passing traffic, and the manager itself appears
Platform Troubleshooting
in working order, then this issue could have been caused by a transient event (internal
or external), that has self-corrected. If this is the case, check historical data such as
bulkstats, SNMP traps or logs, or outstanding alarms to see if a start time for the
imbalance can be identified.
• Monitor to make sure the session counts continue to equalize (catch up) with other
working SessMgrs over time.
If re
quired to check spe
cific type of calls the fol
low
ing method can be used:
To fil
ter In-progress calls types, the fol
low
ing op
tions can be used with grep in above "show
ses
sion subsys
tem data-info verbose"
CSCF-CALL-AR
RIVED, CSCF-REG
IS
TER
ING, CSCF-REG
IS
TERED, LCP-NEG, LCP-UP, AU
THEN
TI
CAT
ING, BCMCS
SER
VICE AU
THEN
TI
CAT
ING, AU
THEN
TI
CATED, PDG AU
THO
RIZ
ING, PDG AU
THO
RIZED, IMS AU
THO
RIZ
ING, IMS
AU
THO
RIZED, MBMS UE AU
THO
RIZ
ING, MBMS BEARER AU
THO
RIZ
ING, DHCP PEND
ING, L2TP-LAC CON
NECT
ING,
MBMS BEARER CON
NECT
ING, CSCF-CALL-CON
NECT
ING, IPCP-UP, NON-AN
CHOR CON
NECTED, AUTH-ONLY CON‐
NECTED, SIM
PLE-IPv4 CON
NECTED, SIM
PLE-IPv6 CON
NECTED, SIM
PLE-IPv4+IPv6 CON
NECTED, MO
BILE-IPv4
CON
NECTED, MO
BILE-IPv6 CON
NECTED, GTP CON
NECT
ING, GTP CON
NECTED, PROXY-MO
BILE-IP CON
NECT
ING,
PROXY-MO
BILE-IP CON
NECTED, EPDG RE-AU
THO
RIZ
ING, HA-IPSEC CON
NECTED, L2TP-LAC CON
NECTED, HNBGW
CON
NECTED, PDP-TYPE-PPP CON
NECTED, IPSG CON
NECTED, BCMCS CON
NECTED, PCC CON
NECTED, MBMS UE CON‐
NECTED, MBMS BEARER CON
NECTED, PAG
ING CON
NECTED, PDN-TYPE-IPv4 CON
NECTED, PDN-TYPE-IPv6 CON‐
NECTED, PDN-TYPE-IPv4+IPv6 CON
NECTED, CSCF-CALL-CON
NECTED, MME AT
TACHED, HEN
BGW CON
NECTED
Platform Troubleshooting
show session subsystem facility sessmgr debug-info verbose | grep "Current aaa acct archived"
For more info please see the com mand "show ses sion sub sys
tem facil
ity sess mgr all debug-
info" which tracks a ton of information on a sess
mgr instance basis. Any prob lem with a sess-
mgr will most likely be found an a
lyz
ing the out
put of this command, es pe
cially if mul
ti
ple snap
-
shots are taken. Getting to the root cause though may not nec es
sar
ily be as straight
forward.
2) The dis
tri
b
u
tion of ses
sions on a par
tic
u
lar proces
sor card is dif
fer
ent than oth
ers.
In case where all Ses sion Managers on a card are off from the rest of the chas sis, it is more
likely that the behavior being seen on the Sess Mgr in
stances are sympto
matic of an other issue
that affects an en
tire card. The ini scribed in "(1) Some Sess
tial steps to go through are de Mgr in
-
stances have more or fewer calls than oth ers".
Check disconnect reasons. There are many valid rea sons why calls will dis con
nect from the
sys
tem. How ever, in case of an issue on an en
tire proces
sor card, to iden
tify a par
tic
u
lar fail
ure
rea
son this ouput will fur
ther pinpoint where to look in the sys
tem. For a system that has been
up for some time, it might be best to clear the dis
con
nect rea
sons after ini
tial check to see what
is in
cre
ment
ing.
CLI:
Check diam e
ter routing if using diameter proxy. In many con fig
u
rations utiliz
ing di
ame
ter
servers across var i
ous in
ter
faces (ie: s6b, Gx, Gy, Rf, etc.) is done with Di am
e
ter proxy. This
means that each proces sor card main tains its own TCP con nection to dif fer
ent servers in the
net
work. It is pos
si
ble there could be an issue re lat
ing back to one partic
ular card.
CLI:
Look at in
di
vid
ual end points. NOTE: The leading num ber in the "Local Hostname" row in
di
cates
which proces sor card the connec
tion is com
ing from, "0007" in the output below equates to the
proces
sor card in slot 7.
To check if ses
sions are pass
ing traf
fic:
LAPI Devices: 0
pdsn-simple-ipv4: 0 pdsn-simple-ipv6: 0
pdsn-mobile-ip: 0 ha-mobile-ipv6: 0
hsgw-ipv6: 0 hsgw-ipv4: 0
hsgw-ipv4-ipv6: 0 pgw-pmip-ipv6: 104399
pgw-pmip-ipv4: 0 pgw-pmip-ipv4-ipv6: 76648
pgw-gtp-ipv6: 253637 pgw-gtp-ipv4: 0
pgw-gtp-ipv4-ipv6: 194146 sgw-gtp-ipv6: 0
sgw-gtp-ipv4: 0 sgw-gtp-ipv4-ipv6: 0
sgw-pmip-ipv6: 0 sgw-pmip-ipv4: 0
sgw-pmip-ipv4-ipv6: 0 pgw-gtps2b-ipv4: 0
Trou bleshoot ing a sce nario where cer tain call types are failing should start with com mands
men tioned ear lier in this sec tion. In partic
ular "show session disconnect-rea sons" and "show
sub sum mary smgr-in stance <Sess Mgr instance>". Look ing for other services that are either in
a bad state or down, as with the di ame
ter peer ing commands above, would also be a good place
to start. If a time stamp can be found re lat
ing to the start of the call loss, his
tor
i
cal data from the
chassis, or in off-chas sis storage (bulkstats, sys log servers, SNMP trap servers, etc) should be
consulted to see if there were any chas sis or net work events cor re
spond ing to start of the call
degra dation. "show ses sion sub sys
tem fa cil
ity sessmgr" re
ports call type re lated infor
mation.
Logs and data on the chas sis can also be parsed for clues:
~~~ Historic Session counters. Look for big moves in any of the captures. This may help
pinpoint when an issue started.
CDMA
CDMA
CDMA Overview
Overview
This chap ter covers components of a CDMA net work, dif
fer
ent in
ter
faces in the network and
basic call flows in
volved in an UE regis
tra
tion. This chap
ter also in
cludes vari
ous ASR 5000/ASR
500 platform com mands to ver ify function
al
ity of CDMA and avail able troubleshoot
ing com
-
mands.
General Architecture
The ASR 5000/ASR 5500 pro vides wire
less car
ri
ers with a flex
i
ble so
lu
tion that func
tions as a
Packet Data Serv
ing Node (PDSN) in CDMA 2000 wire less data networks.
When support
ing Simple IP data ap
pli
ca
tions, the sys
tem is con
fig
ured to per
form the role of a
Packet Data Serv
ing Node (PDSN) within the car rier's 3G CDMA2000 data net work. The PDSN
ter
minates the mo bile sub
scriber’s Point-to-Point Protocol (PPP) session and then routes data
to and from the Packet Data Net work (PDN) on be half of the subscriber. The PDN can consist of
Wireless Ap
pli
ca
tion Proto
col (WAP) servers or it can be the Inter
net.
When sup porting Mo bile IP and/or Proxy Mo bile IP data ap pli
ca
tions, the sys
tem can be con-
fig
ured to perform the role of the PDSN/For eign Agent (FA) and/or the Home Agent (HA)
within the carrier's 3G CD MA2000 data net work. When func tioning as an HA, the sys
tem can
ei
ther be lo
cated within the car rier’s 3G net
work or in an ex ter
nal enter
prise or ISP net
work.
The PDSN/FAs func tion is to ter minate the mo bile sub scriber’s PPP ses sion, and then
route data to and from the ap propri
ate HA on behalf of the subscriber.
Interface Descriptions
This sec
tion de
scribes the pri
mary in
ter
faces used in a CD
MA2000 wire
less data net
work de
-
ploy
ment.
R-P In
ter
face
Pi In
ter
faces
The Pi inter
face pro
vides connectiv
ity be
tween the HA and its cor
re
spond
ing FA. The Pi in
ter
-
face is used to es
tab
lish a Mo
bile IP tunnel be
tween the PDSN/FA and HA.
PDN In
ter
faces
PDN in
ter
face pro
vides con
nec
tiv
ity be
tween the PDSN and/or HA to packet data net
works
such as the In
ter
net or a cor
po
rate in
tranet.
CDMA
AAA In
ter
faces
Eth
er
net inter
faces carry AAA messages to and from Ra dius account
ing and au then
ti
ca
tion
servers. User-based Radius mes
sag
ing is trans
ported using the Eth
er
net line cards.
The fol
low
ing fig
ure and the text that fol lows pro
vides a high-level view of the steps re
quired to
make a Simple IP call that is ini
ti
ated by the MN to an end host:
CDMA
2 The PCF and PDSN establish the R-P interface for the session.
4 Upon successful LCP negotiation, the MN sends a PPP Authentication Request message
to the PDSN.
5 The PDSN sends an Access Request message to the Radius AAA server.
6 The Radius AAA server successfully authenticates the subscriber and returns an Access
Accept message to the PDSN. The Accept message may contain various attributes to be
assigned to the MN.
8 The MN and the PDSN negotiate the Internet Protocol Control Protocol (IPCP) that
results in the MN receiving an IP address.
9 The PDSN forwards a Radius Accounting Start message to the AAA server, fully
establishing the session allowing the MN to send/receive data to/from the PDN.
10 The PDSN returns a Mobile IP Registration Reply to the MN, establishing the session
allowing the MN to send/receive data to/from the PDN.
11 Upon completion of the session, the MN sends an LCP Terminate Request message to
the PDSN to end the PPP session.
12 The BSC closes the radio link while the PCF closes the R-P session between it and the
PDSN. All PDSN resources used to facilitate the session are reclaimed (IP address,
memory, etc.).
13 The PDSN sends accounting stop record to the AAA server, ending the session.
CDMA
2 The PCF and PDSN establish the R-P interface for the session.
4 The PDSN and MN negotiate the Internet Protocol Control Protocol (IPCP).
CDMA
7 The PDSN/FA sends an Access Request message to the visitor AAA server.
8 The visitor AAA server proxies the request to the appropriate home AAA server.
9 The home AAA server sends an Access Accept message to the visitor AAA server.
11 Upon receipt of the response, the PDSN/FA forwards a Mobile IP Registration Request
to the appropriate HA.
12 The HA sends an Access Request message to the home AAA server to authenticate the
MN/subscriber.
13 The home AAA server returns an Access Accept message to the HA.
14 Upon receiving response from home AAA, the HA sends a reply to the PDSN/FA
establishing a forward tunnel. Note that the reply includes a Home Address (an IP
address) for the MN.
15 The PDSN/FA sends an Accounting Start message to the visitor AAA server. The visitor
AAA server proxies messages to the home AAA server as needed.
16 The PDSN returns a Mobile IP Registration Reply to the MN, establishing the session
allowing the MN to send/receive data to/from the PDN.
21 The MN and PDSN/FA negotiate the termination of LCP effectively ending the PPP
session.
24 The PDSN/FA sends an Accounting Stop message to the visitor AAA server.
25 The visitor AAA server proxies the accounting data to the home AAA server.
PDSN / FA
Overview
This sec
tion cov
ers soft
ware im
ple
men
ta
tion and basic trou
bleshoot
ing for PDSN/FA.
Troubleshooting
Trou bleshooting PDSN/FA re quires an understand
ing of the fol
lowing pro
to
cols and how they
fit into the call flow for SIP and MIP. This chapter will pro
vide trou
bleshooting as
sis
tance on the
following pro
tocols as they apply to both SIP and MIP.
• A10/A11 - link between the PDSN and Packet Control Function (PCF)
• PPP - Link between the PDSN and the subscriber
• Radius authentication and accounting - Link between the PDSN and the Radius server
• Mobile IP - Upper layer protocol residing on PPP and connecting between subscriber
and PDSN, and then further via Pi interface between the FA and HA (the MIP tunnel)
PDSN Service
The PDSN ser vice al
lows for commu ni
cat
ing be tween the PDSN and the PCFs using A11 and
PPP. The var i
ous links be
tween PCFs and the PDSN are spec i
fied using an SPI con fig line for
each link, which in cludes the IP address or range of IP ad dresses of the PCF(s) along with a
unique SPI index value and a pass word (se cret). Other config
urables include IP source vi ola
tion
thresholds, the type of ppp au
thenti
ca
tion, a pointer to the FA desti
na
tion context, and the bind
ad
dress.
fact both types could be handled by one service - The provider in this case im
ple
mented sep
a
-
rate ser
vices. Just the more sig
nif
i
cant pa
ra
meters are shown:
SPI(s):
Remote Addr: 10.204.3.160/27 Description:
Hash Algorithm: MD5 SPI Num: 257
Replay Protection: Timestamp Timestamp Tolerance: 6 (secs)
...
pdsn-service PDSNsvc1
spi remote-address 10.0.0.0/8 spi-number 256 secret testtesttesttest timestamp-tolerance 6
show rp sta
tis
tics [peer-ad
dress <peer ad
dress>] [pcf-sum
mary]
To de
ter
mine the num ber of and suc
cess rate of A11 mes ing, run "show rp sta
sag tis
tics [peer-
ad
dress]". The in
for
mation that can be an
a
lyzed is ex
ten
sive, and here are a few snippets that
may be the most inter
est
ing:
Session Releases:
De-registered: 625228420 Lifetime Expiry: 480874
PPP Layer Command: 3902253655 PCF-Monitor Fail: 0
GRE Key Mismatch: 2760239 Purged via Audit: 0
Other Reasons: 15249053
A10 Stats:
...
Registration Request/Reply:
Total RRQ/Renew/Dereg RX: 1865609874 Total Accept: 1771652785
Total Denied: 74691983 Total Discard: 19265106
Init RRQ RX: 334769449 Init RRQ Accept: 1605714446
Init RRQ Denied: 72939567 Init RRQ Discard: 10136926
Renew RRQ RX: 25059017 Renew RRQ Accept: 2591155576
Renew Actv Start Accept: 1620013447 Renew Actv Stop Accept: 3172517390
Renew RRQ Denied: 1553147 Renew RRQ Discard: 1495762
Dereg RRQ RX: 1505783217 Dereg RRQ Accept: 3360918218
Dereg Active Stop Accept: 2431998799
Dereg RRQ Denied: 199269 Dereg RRQ Discard: 7634227
Reply Send Error: 0 Airlink Seq Num Invalid: 3432185
Intra PDSN Active Handoff Intra PDSN Dormant Handoff
RRQ Accepted: 99469754 RRQ Accepted: 1093610388
Inter PDSN Handoff
Total RRQ Accepted: 2976123951
Init RRQ Accepted: 2940945806 Renew RRQ Accepted: 35178145
Rev-A RRQ RX: 2907024794 Rev-A RRQ Accept: 2902311549
Rev-A RRQ Denied: 1077569 Rev-A RRQ Discard: 3635676
A number of issues can be narrowed down to a specific PCF peer, in which case the [peer-
address <peer address>] qualifier may turn out to be extremely valuable.
show rp [sum
mary | peer-ad
dress <peer>]
A11-re
lated counters/sta
tis
tics sim
i
lar to "show rp stat" dis
cussed above are available with
"show rp sum mary" or on a peer ad
dress basis, though this com
mand is not often used for trou
-
bleshooting.
RP Summary:
430167 RP Sessions In Progress
Registration Request/Reply:
Renew RRQ Accepted: 2578267 Discarded: 0
Intra PDSN Active H/O RRQ Accept: 16682 Intra PDSN Dormant H/O RRQ Accept: 1353494
Inter PDSN Handoff RRQ Accepted: 321301
Reply Send Error: 0
Registration Update/Ack:
Initial Update Transmitted: 1370184 Update Retransmitted: 18122
Denied: 0 Not Acknowledged: 21748
Reg Ack Received: 1366559 Reg Ack Discarded: 5
Update Send Error: 0
Session Update/Ack:
Initial Update Transmitted: 389872 Update Retransmitted: 25
CDMA
Data:
GRE Packets Received: 2339908401 GRE Bytes Received: 2318464329
GRE Packets Sent: 3561128876 GRE Bytes Sent: 2379694759
GRE Packets Sent in SDB Form: 0 GRE Bytes Sent in SDB Form: 0
GRE Packets Received with Segmentation Indication: 0
GRE Packets Sent with Segmentation Indication: 0
Total Successful Reassembly: 0
Total packets processed without proper reassembly: 0
show ses
sion coun
ters pcf-sum
mary [call-types]
PCF IP Addr Sessions Active Dormant SIP MIP PMIP L2TP-LAC BCMCS MISC
----------- -------- ------ ------- --- --- ---- -------- ----- ----
10.204.221.65 577 45 532 0 577 0 0 0 0
10.204.221.66 522 41 481 0 522 0 0 0 0
CDMA
show sub
scriber pcf <pcf ad
dress>
| (z) - ggsn-pdp-type-ipv4v6
| (R) - sgw-gtp-ipv4 (O) - sgw-gtp-ipv6 (Q) - sgw-gtp-ipv4-ipv6
| (W) - pgw-gtp-ipv4 (Y) - pgw-gtp-ipv6 (Z) - pgw-gtp-ipv4-ipv6
| (@) - saegw-gtp-ipv4 (#) - saegw-gtp-ipv6 ($) - saegw-gtp-ipv4-ipv6
| (&) - cgw-gtp-ipv4 (^) - cgw-gtp-ipv6 (*) - cgw-gtp-ipv4-ipv6
| (p) - sgsn-pdp-type-ppp (s) - sgsn (4) - sgsn-pdp-type-ip
| (6) - sgsn-pdp-type-ipv6 (2) - sgsn-pdp-type-ipv4-ipv6
| (L) - pdif-simple-ip (K) - pdif-mobile-ip (o) - femto-ip
| (F) - standalone-fa (J) - asngw-non-anchor
| (e) - ggsn-mbms-ue (i) - asnpc (U) - pdg-ipsec-ipv4
| (E) - ha-mobile-ipv6 (T) - pdg-ssl (v) - pdg-ipsec-ipv6
| (f) - hnbgw-hnb (g) - hnbgw-iu (x) - s1-mme
| (a) - phsgw-simple-ip (b) - phsgw-mobile-ip (y) - asngw-auth-only
| (j) - phsgw-non-anchor (c) - phspc (k) - PCC
| (X) - HSGW (n) - ePDG (t) - henbgw-ue
| (m) - henbgw-henb (q) - wsg-simple-ip (r) - samog-pmip
| (D) - bng-simple-ip (l) - pgw-pmip (u) - Unknown
| (+) - samog-eogre
|
|+----Access (X) - CDMA 1xRTT (E) - GPRS GERAN (I) - IP
|| Tech: (D) - CDMA EV-DO (U) - WCDMA UTRAN (W) - Wireless LAN
|| (A) - CDMA EV-DO REVA (G) - GPRS Other (M) - WiMax
|| (C) - CDMA Other (N) - GAN (O) - Femto IPSec
|| (P) - PDIF (S) - HSPA (L) - eHRPD
|| (T) - eUTRAN (B) - PPPoE (F) - FEMTO UTRAN
|| (H) - PHS (Q) - WSG (.) - Other/Unknown
||
||+---Call (C) - Connected (c) - Connecting
||| State: (d) - Disconnecting (u) - Unknown
||| (r) - CSCF-Registering (R) - CSCF-Registered
||| (U) - CSCF-Unregistered
|||
|||+--Access (A) - Attached (N) - Not Attached
|||| CSCF (.) - Not Applicable
|||| Status:
||||
||||+-Link (A) - Online/Active (D) - Dormant/Idle
||||| Status:
|||||
CDMA
• Because the initial call setup messages indicate the RAN technology, options for EVDO
Rev 0, Rev A, and 1X exist for distinguishing such calls
• Since some issues encountered are specific to a PCF (or range of PCFs), the monitor by
PCF IP option may prove to be very valuable
a) By MSID/IMSI
b) By Username
c) By Callid
d) By IP Address
f) Next-Call
k) Next-1xRTT Call
CDMA
m) Next-CLOSEDRP Call - refers to a Nortel proprietary PDSN service that uses L2TP
protocol
n) Next-EVDO-Rev0 Call
o) Next-EVDO-RevA Call
z) By PCF IP Address
13) Next-OpenRP Call
show rp coun
ters (cal
lid | msid | user
name) <call iden
ti
fier>
RP-related counters can be re trieved with "show rp coun ters (cal
lid | msid | user
name) <call
iden
ti
fier>" and may be useful in very specific trou
bleshoot ing sce
nar ios where counts of mes
-
sag
ing for a sub
scriber over a period of time may be use ful.
Registration Update/Ack:
Initial Update Transmitted: 2 Update Retransmitted: 10
Denied: 0 Not Acknowledged: 12
Reg Ack Received: 0 Reg Ack Discarded: 0
Update Send Error: 0
...
Session Update/Ack:
Initial Update Transmitted: 1 Update Retransmitted: 0
Denied: 0 Not Acknowledged: 0
Sess Update Ack Received: 1 Sess Update Ack Discarded: 0
Update Send Error: 0
CDMA
GRE Receive:
Total Packets Received: 3995 Protocol Type Error: 0
Total Bytes Received: 346893 GRE Key Absent: 0
GRE Checksum Error: 0
Invalid Packet Length: 0
GRE Send:
Total Packets Sent: 2439
Total Bytes Sent: 395120
Total Packets Sent in SDB:0
Total Bytes Sent in SDB: 0
IP Header compression:
Forward: ROHC not negotiated
Reverse: ROHC not negotiated
...
SPI: 257
Prev System Id: 0 Current System Id: 0
Prev Network Id: 0 Current Network Id: 0
Prev Packet Zone Id: 0 Current Packet Zone Id: 0
BSID: 007700050003 GRE Segmentation : Disabled
Registration Request/Reply:
Renew RRQ Accepted: 719 Discarded: 0
Intra PDSN Active H/O RRQ Accept: 0 Intra PDSN Dormant H/O RRQ Accept: 2
Inter PDSN Handoff RRQ Accepted: 1
Reply Send Error: 0
Registration Update/Ack:
Initial Update Transmitted: 2 Update Retransmitted: 10
Denied: 0 Not Acknowledged: 12
Reg Ack Received: 0 Reg Ack Discarded: 0
Update Send Error: 0
Session Update/Ack:
Initial Update Transmitted: 1 Update Retransmitted: 0
Denied: 0 Not Acknowledged: 0
Sess Update Ack Received: 1 Sess Update Ack Discarded: 0
Update Send Error: 0
CDMA
GRE Receive:
Total Packets Received: 3998 Protocol Type Error: 0
Total Bytes Received: 347135 GRE Key Absent: 0
GRE Checksum Error: 0
Invalid Packet Length: 0
GRE Send:
Total Packets Sent: 2440
Total Bytes Sent: 395271
...
PPP Settings
Note: Unlike the way the PDSN ser
vice con
trols the behav
ior for A11, PPP be
hav
ior is controlled
by set
tings in the con
text where the PDSN service is lo
cated, and there is no specific "ser
vice"
CDMA
ppp max-retransmissions 5
ppp dormant send-lcp-terminate
ppp negotiate default-value-options
ppp lcp-start-delay 50
sub
scriber pro
file
• The AAA group that itself describes connectivity to the Radius server(s)
• Primary and secondary DNS servers to be assigned to the subscribers
• Idle and absolute timeouts
• Destination context name
CDMA
The sub scriber profile that will be used is based on the do main of the user name of the sub -
scriber which is presented ei ther thru a PPP CHAP response (SIP) or a MIP Reg is
tra
tion Request
(MIP). For this to hap pen, do main statements are config
ured to pair each pos sible domain with
a named sub scriber pro file. Note that up to the point in time that the user name has not yet
been received in the call flow (MIP or SIP), the default sub
scriber profile with its settings will be
used.
Radius Settings
Ra
dius pro tocol is used to au thenti
cate sub
scribers during the call flow, and to re port data usage
dur
ing a sub scriber's ses sion, or on discon
nect. Settings are con fig
ured in a aaa group sec tion
and include timers, re tries, and the Radius server IPs them selves, along with server pri or
i
ties (or
round robin). Sim i
lar to the above with PDSN SPIs, the en crypted pass words can be viewed with
the showse crets op tion. Please see the Ra dius Chap ter for de
tails on trou bleshoot ing Ra dius,
which will apply to MIP, SIP, or any other pro to
col. The biggest area where is sues are seen is with
reacha
bil
ity to the Ra dius servers, which is reported by "show ra dius coun ters all".
FA Service
The FA service al
lows for passing through com mu nica
tions from the PDSN, to the Pi inter
face
connect
ing the FA and HA ser vices (MIP tunnel). It is NOT used with SIP but with MIP only. The
ser
vice can be lo
cated in the same con text as PDSN, but it is often located in a sep
a
rate (egress
- to
wards the net work/inter
net) context. As with PDSN and PCFs, there are SPI set tings that
CDMA
se
cure the FA-HA con
nec
tion. Other con
fig
urables in
clude the MIP life
time, Reg
is
tra
tion mes
-
sage time
out (be
tween FA and HA) and the bind ad
dress.
The FA func
tion
al
ity is im
ple
mented in famgr fa
cil
ity.
show ser
vice all
show fa-ser
vice all
HA Failover: Disabled
Retrans Timeout: 2 (secs) Retries
Proba
bly the most im portant com mand for the FA service, this tracks all MIP proto
col activ
ity
be
tween the FA and HA ser vices. The peer-ad dress op
tion is fantas
tic for trou
bleshooting is-
sues with par tic
u
lar HA peers, while the fa-ser vice op
tion will narrow down to a par tic
ular FA
ser
vice if there are multi
ple FA services con
fig
ured.
Denied by FA:
Unspecified error: 0 Reg Timeout: 11650
Admin Prohibited: 1051975 No Resources: 3255
MN Auth Failure: 110665414 HA Auth Failure: 4
Lifetime too long: 0 Poorly formed Request: 58490
Poorly formed Reply: 0 MN Too Distant: 0
Invalid COA: 0 Missing NAI: 13
Missing Home Agent: 6438 Missing Home Addr: 0
Unknown Challenge: 3559526 Missing Challenge: 4
Stale Challenge: 0
Encap Unavailable: 0 Rev Tunnel Unavailable: 0
Rev Tunnel Mandatory: 19961 HA Network Unreachable: 0
Delivery Style Unavailable: 0 HA Host Unreachable: 0
HA Port Unreachable: 117 HA Unreachable: 29
Unknown CVSE Rcvd: 0 MIP Key Request: 997189
AAA Authenticator: 260330 Public Key Invalid: 0
Discarded by FA:
Invalid Extn: 0 Invalid UDP Checksum: 1
CDMA
Denied by HA:
FA Auth Failure: 4 Poorly formed Request: 0
Mismatched ID: 121798 Simul Bindings Exceeded:0
Unknown HA: 139821 Rev Tunnel Unavailable: 0
MN Auth Failure: 25538792 No Resources: 64114
Admin Prohibited: 16440820 Rev Tunnel Mandatory: 0
Encap Unavailable: 0 Unspecified Reason: 2459357
Unknown CVSE Rcvd: 0
Registration Revocation:
Sent: 171819969 Retries Sent: 24865
Ack Rcvd: 171073569 Not Acknowledged: 6815
Rcvd: 296327182 Ack Sent: 296276746
Failover:
Number of Failovers (to Alternate HA): 0
Number of Registration Successful after Failover: 0
Number of Registration Failures after Failover: 0
HA Monitoring:
Total Inactivity Timeouts: 0
Total Monitor RRQ Sent: 0
Initial: 0 Retransmit: 0
Monitor RRP Received: 0
Invalid Packets:
Discarded: 23012
Lists all HA peers and the total num ber of sessions ever con
nected and the cur
rent number of
ses
sions con nected. Must spec ify an FA ser vice as part of the command - Can not run this
generi
cally for all FA ser
vices (if multi
ple exist).
General troubleshooting
show ses
sion dis
con
nect-rea
sons
show sub
scriber [sum
mary] [HA <HA ad
dress>]
Use this command to quickly see all the sub scribers by call type. Peer HA ad
dress op
tion al
lows
trou
bleshooting for a spe cific HA ad dress which can be very valu able. Re
move the sum mary
key
word to list all the sub scribers, and add the full keyword to get "show sub full". Shown
below are snippets of call types for CDMA.
HA
Overview
The Home Agent (HA) is the Mo bile IP ses
sion anchor for CDMA 3G calls. The HA pro vides the
lo
ca
tion in the provider net work where the ses sion at
taches. This one at
tach point is what al
-
lows the mo bil
ity of ses
sions to traverse multi
ple PCFs, PDSNs, or FAs. The HA also al lows
roam
ing be
tween an
other provider's FAs to its HA.
A typ i
cal connection on an HA is fairly straight for
ward. The Mo bile IP Session Request will
come in to the HA from an FA. The HA will au then
ti
cate the ses
sions, usually via Ra
dius, and if
suc
cess ful, send a Session-Ac
cept back to the FA. Once the ses sion is at
tached, other ser vices
may come into play, ACS for ex ample, but these are handled in other parts of this doc ument.
Simi
larly, for is
sues with ICSR, please see the ded
i
cated ICSR section.
The HA func
tion
al
ity is im
ple
mented in hamgr fa
cil
ity.
Troubleshooting of HA
Fol
low
ing are use
ful com
mands in trou
bleshoot
ing is
sues on HA side.
General
# show ha-service all - Check ser
vice status, it should be "Started"
# show lns-service all - Check ser
vice status, it should be "Started"
# show subscribers sum mary ha-service <NAME> - Check ac tive HA ses
sions and any dropped
coun
ters
# show sub scribers sum mary lns-ser vice <NAME> - Check ac tive LNS sessions and any
dropped counters
Pi interface
# show mipha - Dis
plays the call in
for
ma
tion for all mo
bile IP HA calls and sta
tis
tics
# show ra
dius counters all
# show ra
dius counters sum mary
# show ra
dius accounting servers
# show ra
dius authenti
cation servers
# show ra
dius client status
Gx and Gy in
ter
faces. Please refer to Di
am
e
ter chap
ter for de
tails on trou
bleshoot
ing:
# show di
am
e
ter peers full - Check Di
am
e
ter peers con
fig and sta
tus
IP Pool
# show ip pool - Dis
plays ip ad
dress usage (Con
text Spe
cific).
NTP
# show ntp sta
tus - Ver
ify NTP sta
tus
CDMA
# show ac
tive-charg
ing edr-udr-file sta
tis
tics / show cdr sta
tis
tics
Trou
bleshoot
ing a sub
scriber con
nect
ing to HA:
• monitor subscriber - Collect and analyze a subscriber HA call with monitor subscriber
from the beginning of a call till an issue with verbosity 3, option 19.
• show subscribers full username - Provides information on a subscriber session
information.
• show active-charging sessions full username - Provides information on a subscriber
ACS session information.
Example Scenarios
Prob
lem De
scrip
tion:
Analy
sis:
If redirect does not seem to be work ing, firstly DNS addresses from roam ing part
ners should be
con firmed. Note that if a roam ing partner is also using Cisco PDSN/FA, then the DNS ad dresses
can be found in the de fault subscriber tem plate (“dns (pri
mary | sec
ondary) <address>”) of the
source con text, and NOT in any named sub scriber template, because the username (NAI) is
CDMA
If pack
ets are only coming from the FA with out re
sponse, then confirm that the source ad
dress
is what is ex
pected, and if not, then a trace from the FA side is needed to de
termine why.
# show dns-proxy in tercept-list <list name> rule<rule> sta tics For EACH RULE, this adds
tis
counters for the num
ber of re
quests sent to the DNS redi
rect server(s), the num
ber of re
sponses re
-
ceived from the server(s), and num ber of time
outs.
Note when a redi rect is applied, by default, the source ad dress of the DNS packet is also
changed, from the sub scriber’s address to the address spec
i
fied in the con
fig
u
ra
tion in the re
-
spec
tive des
ti
na
tion context, with command:
ip dns-proxy source-ad
dress <in
ter
face source ad
dress for redi
rected DNS pack
ets>
Res
o
lu
tion:
Mul
ti
ple sce
nar
ios are dis
cussed above. The so
lu
tion re
lies on a par
tic
u
lar sce
nario.
Analy
sis:
The fol
low
ing traces were col
lected on the PDNS/FA and HA side.
In trace below the PDSN re ceived a call of a subscriber from the group and sent Reg is
tra
tion
Request to the HA. The HA re ceived the Reg is
tra
tion Re
quest, assigned sta
tic IP address, and
sent Reg is
tra
tion Reply back to the PDSN. How ever, PDSN did not re ceive the reply, and the call
fails to SIP. Mon i
tor subscriber traces showed that pack ets from HA did not reach out PDSN. A
ping test from des ti
na
tion context of PDSN to the HA was fail ing.
PDSN Trace
HA trace
Res
o
lu
tion:
UMTS Overview
Overview
UMTS ar
chi
tec
ture and major call flows are cov
ered in this sec
tion.
GSM is a dig i
tal cellu
lar technology that is used world wide and is one of the world’s lead ing
stan
dards in dig i
tal wireless com mu ni
ca
tions. GPRS is a 2.5G mo bile com muni
ca
tions technol
-
ogy that enables mo bile wireless ser
vice providers to offer their mo bile subscribers packet data
ser
vices over GSM net works. Com mon ap pli
ca
tions of GPRS include the fol lowing: In
ter
net ac
-
cess, in
tranet/cor porate ac
cess, instant mes saging, and mul
ti
media mes sag
ing.
GPRS is stan
dard
ized by the Third Gen
er
a
tion Part
ner
ship Pro
gram (3GPP).
• GGSN
• SGSN
UMTS
Gateway GPRS sup port node (GGSN) is the mo bility anchor point within the Mo bile Packet
Core network, a gateway that provides mobile cell phone users ac cess to a packet data network
(PDN)/ In ter
net or spec
i
fied pri
vate IP net
work (In tranet) or corporate net
works. It also main-
tains the neces
sary rout
ing in
for
mation needed to route the sub scriber traf
fic up
link/down link
to
wards the Packet Data Net work, and the SGSN re spectively.
SGSN – Serv ing GPRS Support Node, SGSN per forms the fol
lowing func
tions: Mobil
ity Manage
-
ment, Sub scriber Data Man agement, Ses sion Man
agement, Pay load handling, Charging, Se
cu
-
rity (Au
thenti
ca
tion,Ci
pher
ing,In
tegrity) and con
nec
tions to a radio net
work.
Image - Ar
chi
tec
ture Overview
This section illustrates some of the GPRS mobility management (GMM) and session
management (SM) procedures that SGSN implements as part of the call handling process. All
SGSN call flows are compliant with those defined by 3GPP TS 23.060.
The fol
low
ing out
lines the setup pro
ce
dure for a UE that is mak
ing an ini
tial at
tach.
UMTS
1) The MS/UE sends an Attach Request message to the SGSN. Included in the message is
information, such as:
2) Au
then
ti
ca
tion is manda
tory if no MM con
text ex
ists for the MS/UE:
• The SGSN gets a random value (RAND) from the HLR to use as a challenge to the
MS/UE.
• The SGSN sends a Authentication Request message to the UE containing the random
RAND.
• The MS/UE contains a SIM that contains a secret key (Ki) shared between it and the
HLR called an Individual Subscriber Key.
• The UE uses an algorithm to process the RAND and Ki to get the session key (Kc) and
the signed response (SRES).
• The MS/UE sends a Authentication Response to the SGSN containing the SRES.
• The SGSN sends an Update Location message, to the HLR, containing the SGSN
number, SGSN address, and IMSI.
• The HLR sends an Insert Subscriber Data message to the “new” SGSN. It contains
subscriber information such as IMSI and GPRS subscription data.
• The “New” SGSN validates the MS/UE in new routing area:
If in
valid: The SGSN re
jects the At
tach Re
quest with the ap
pro
pri
ate cause code.
• The HLR sends a Update Location Ack to the SGSN after it successfully clears the old
MM context and creates new one
The fol
lowing fig
ure pro
vides a high-level view of the PDP Con text Ac
ti
va
tion proce
dure per
-
formed by the SGSN to es tab
lish PDP contexts for the MS with a BSS-Gb in ter
face con
nec
tion
or a UE with a UTRAN-Iu in ter
face con
nection.
1) The MS/UE sends a PDP Activation Request message to the SGSN containing an Access Point
Name (APN).
2) The SGSN sends a DNS query to resolve the APN pro vided by the MS/UE to a GGSN ad
dress.
The DNS server pro
vides a re
sponse contain
ing the IP address of a GGSN.
3) The SGSN sends a Create PDP Con text Request message to the GGSN con
tain
ing the in
for
-
mation needed to au
then
ti
cate the subscriber and es
tab
lish a PDP con
text.
4) If re
quired, the GGSN per
forms au
then
ti
ca
tion of the sub
scriber.
A GTP-U tun
nel is now es
tab
lished and the MS/UE can send and re
ceive data.
SGSN Overview
The ASR 5000 pro vides a highly flexi
ble and effi
cient Serv
ing GPRS Sup port Node (SGSN) ser -
vice to the wire
less car
ri
ers. Functioning as an SGSN, the sys tem read
ily handles wireless data
ser
vices within 2.5G Gen eral Packet Radio Ser vice (GPRS) and 3G Universal Mobile Telecom mu-
ni
ca
tions Sys
tem (UMTS) data net works.
• Communicate with home location registers (HLR) via a Gr interface and mobile visitor
location registers (VLRs) via a Gs interface to register a subscriber’s user equipment
(UE), or to authenticate, retrieve or update subscriber profile information.
• Support Gd interface to provide short message service (SMS) and other textbased
network services for attached subscribers.
• Activate and manage IPv4, IPv6 type packet data protocol (PDP) contexts for a
subscriber session.
• Setup and manage the data plane between the RAN and the GGSN providing high-speed
data transfer.
• Provide mobility management, location management, and session management for the
duration of a call to ensure smooth handover.
• Provide various types of charging data records (CDRs) to attached accounting/billing
storage mechanisms such as our SMC-based hard drive or a GTPP Storage Server (GSS)
or a charging gateway function (CGF).
• Provide CALEA support for lawful intercepts.
UMTS
The IM
SIMg task Started by the Ses
sion Con
troller, per
forms the fol
low
ing func
tions:
• Selects SessMgr, when not done by LinkMgr or SGTPCMgr, for calls sessions based on
IMSI/P-TMSI.
• Load-balances across SessMgrs to select one for assigning subscriber sessions to.
• Maintains records for all subscribers on the system.
• Maintains mapping between the IMSI/P-TMSI and SessMgrs.
Impor
tant: For ASR 5x00s with session recov
ery en
abled, this Demux man ager is usually es
tab
-
lished on one of the CPUs on the first ac
tive Demux PSC. The IM SIMgr will not start on a PSC in
which Sess M
grs are al
ready started.
SGT
PCMgr Task will be spawned as soon the SGTP Ser
vice is cre
ated.
UMTS
Cre
ated by the Ses
sion Controller for each VPN con text in which an SGSN ser
vice is con
fig
-
ured, the SGTPC Manager task performs the fol
low
ing functions:
• Terminates Gn/Gp and GTP-U interfaces from peer GGSNs and SGSNs for SGSN
Services.
• Terminates GTP-U interfaces from RNCs for IuPS Services.
• Controls standard ports for GTP-C and GTP-U.
• Processes and distributes GTP-traffic received from peers on these ports.
• Performs all node level procedures associated with Gn/Gp interface.
Cre
ated by the Session Controller when the first SS7RD (rout
ing do
main) is ac
ti
vated, the
LinkMgr per
forms the fol
low
ing functions:
GGSN Overview
Gate
way GPRS Sup
port Node (GGSN) is a gate
way from a cel
lu
lar net
work to an IP net
work that
al
lows mo
bile user equip
ment (UE) to ac
cess the pub
lic data net
work (PDN) or spec
i
fied pri
vate
IP net
works.
There is a dedi
cated link between the UE and the GGSN. This link is created through the Packet
Data Protocol (PDP) Con text ac
ti
va
tion process. PDP Con text is used to al
lo
cate the PDP ad
-
dress. There are three dif fer
ent types of PDP Contexts: IPv4, IPv6 and Point-to-Point Pro
to
col
(PPP).
• Allow mobile network operator to control authentication and authorization for each
subscriber, and providing accounting management for the user sessions (online and
offline)
• Performing shallow and deep packet inspection for user traffic and perform action base
on traffic type (charging categories, firewalling, filtering, traffic shaping, blacklisting,
etc.)
• PDNs are associated with Access Point Names (APNs) configured on the system. Each
APN consists of a set of parameters that dictate how subscriber authentication and IP
address assignment is to be handled for that APN.
• Maintains a list of the GTPU-services available within the context and performs load-
balancing (of only Error-Ind) for them.
• Supports GTPU Echo handling.
• Provides Path Failure detection on no response for GTPU echo.
• Receives Error-Ind and demuxes it to a particular Session Manager.
UMTS
Troubleshooting SGSN
Overview
This sec
tion fo
cuses on CLI com
mands in ASR 5000/ASR 5500 used to trou
bleshoot dif
fer
ent
in
ter
faces on SGSN along with mul
ti
ple ex
am
ple issue sce
nar
ios.
Troubleshooting Commands
SGSN func
tion
al
ity in ASR 5000/ASR 5500 is con ured per the GPRS-ser
fig vice en
tity for 2G-
SGSN and by SGSN-Ser vice in case of 3G-SGSN.
CLI:
Troubleshooting Gb Interface
CLI:
• show gprsns status nsvc-status-all nse all
• show gprsns status nsvc-status-consolidated nse <>
• show network-service-entity ip-config
• show gmm-sm statistics verbose
• show sgtpc statistics verbose
• show map statistics
• show bssgp statistics verbose
• show llc statistics verbose
• show gprsns statistics sns-msg-stats
• show session disconnect-reasons verbose
The fol
low
ing CLIs pro
vide in
for
ma
tion on the sta
tus of Gb links be
tween SGSN and BSCs.
• show gprsns status nsvc-status-all nse all. Check the status (should be "ENABLED")
GbManager : 1
-------------------------------------------------------------------------------------------------------------------
| | |--------------------------------------------------------------------------------------------------
| 10 | 0 | ---------------------------------------------------------------------------| ENABLED |
|-----------------------------------------------------------------------------------------------------------------|
| 10 | 1 | ---------------------------------------------------------------------------| ENABLED |
|-----------------------------------------------------------------------------------------------------------------|
| 10 | 2 | ---------------------------------------------------------------------------| ENABLED |
|-----------------------------------------------------------------------------------------------------------------|
| 10 | 3 | ---------------------------------------------------------------------------| ENABLED |
|-----------------------------------------------------------------------------------------------------------------|
| 11 | 0 | ---------------------------------------------------------------------------| ENABLED |
|-----------------------------------------------------------------------------------------------------------------|
| 11 | 1 | ---------------------------------------------------------------------------| ENABLED |
|-----------------------------------------------------------------------------------------------------------------|
------------------------------------------------------------------------
| BVCI | MCC | MNC | LAC | RAC | CI | STATUS | BSIZE | LRATE |
------------------------------------------------------------------------
| 131| XXX| YY| XXXXX| 101| 11271| UNBLKD | 65535| 4000|
| 132| XXX| YY| XXXXX| 101| 11272| UNBLKD | 65535| 4000|
| 133| XXX| YY| XXXXX| 101| 11273| UNBLKD | 65535| 4000|
| 141| XXX| YY| XXXXX| 100| 38721| UNBLKD | 65535| 800|
| 142| XXX| YY| XXXXX| 100| 38722| UNBLKD | 65535| 800|
UMTS
• Following CLIs provide more information on the SGSN statistics. Watch for significant
sections of the command output below:
Attach Reject:
Total-Attach-Reject: 0
3G-Network Failure: 0 2G-Network Failure:
GPRS-Attach Network Failure Cause:
Comb-Attach Reject Causes:
Attach Failure:
Total-Attach-Failure: 0
3G-Attach-Failure: 0 2G-Attach-Failure: 0
Routing Area Update Reject:
Total-RAU-Reject: 0
Routing Area Update Failure:
Total-RAU-Failure: 0
Service Reject:
Total-Serv-Rej: 0
Gmm Status Message:
Total-Gmm-Status-Sent: 0
Inter-System Attach Statistics:
3G-Attach-Rej: 0 2G-Attach-Rej: 0
3G-Comb-Attach-Rej: 0 2G-Comb-Attach-Rej: 0
Authentication And Ciphering Reject:
Total-Auth-Cipher-Rej: 0
Authentication And Ciphering Failure:
Total-Auth-Cipher-Failure: 0
Ranap Procedures:
Security Mode Reject: 0
Relocation Failure: 0 Relocation Prep Failure: 52626
(look out for Relocation Failure Causes :)
UMTS
Troubleshooting Gr Interface
MAP is the appli
ca
tion-layer proto
col used to ac
cess the Home Lo
cation Reg is
ter (HLR), Visi
tor
Lo
ca
tion Regis
ter (VLR), Mobile Switch
ing Center (MSC), Equip
ment Identity Regis
ter (EIR), Au-
then
ti
ca
tion Cen ter (AUC), Short Mes sage Service Cen
ter (SMSC) and Serv ing GPRS Sup port
Node (SGSN).
Gr func
tion
al
ity in ASR 5000/ASR 5500 is con ured in the MAP-ser
fig vice en
tity.
UMTS
The gener
ated dis
play in
cludes MAP ser vice fea
tures, MAP op
er
a
tional con
fig
u
ra
tion, and some
re
lated HLR and EIR con fig
u
ra
tion in
for
mation
• Check the SCTP and M3UA status for Gr interface, SCTP path status should be in
Active state
[local]asr5000# show ss7-routing-domain 4 sctp asp all status peer-server all peer-server-
process all
ss7 routing domain : 4
*Peer Server Id : 1 Peer Server Process Id: 1
-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
ss7-routing-domain : 4
-----------------------------------------------------------------------------------------------
----------------
| Priority |
--------------------------------------------------------------------------------------------------
-------------
| 0 |
| 0 |
--------------------------------------------------------------------------------------------------
-------------
• Check the SS7 Signalling Connection Control Part (SCCP) network configuration and
subsystem status should be "Subsystem user in service" state.
UMTS
Sccp Network 4 :
------------------------------------------------------------------------------------------
------
| dpc | status | ssn | subsystem
status |
------------------------------------------------------------------------------------------
------
| 0.4.2 ( 34) | Signalling Point Accessible | 6 | Subsystem user in
service |
service |
------------------------------------------------------------------------------------------
------
IuPS func
tion
al
ity in ASR 5000/ASR 5500 is re
al
ized in the IuPS-ser
vice en
tity.
• show subscribers data-rate sgsn-only rnc id <> mcc <> mnc <>
• show gmm-sm statistics iups-service iups_H rnc mcc <> mnc <> rnc-id <> verbose
• show iups-service name iups_H rnc id <>
UMTS
• Check for the SCTP and M3UA status for Iu-PS interface the SCTP path status should
be in Active state.
UMTS
[local]asr5000# show ss7-routing-domain 3 sctp asp all status peer-server all peer-server-
process all
ss7 routing domain : 3
*Peer Server Id : 1 Peer Server Process Id: 1
--------------------------------------------------------------------------------------
-----
| PS ID (State) | PSP Instance | Routing Context | Registration Mode |
State |
UMTS
--------------------------------------------------------------------------------------
-----
ACTIVE |
ACTIVE |
ACTIVE |
--------------------------------------------------------------------------------------
-----
• Check for the destination-point-code's routing table, the route status should be in
Available state
ss7-routing-domain : 3
------------------------------------------------------------------------------------------
---------------------
| Destination-Point-Code | Peer-Server-Id | LinkSet-Id |
Status | Priority |
------------------------------------------------------------------------------------------
---------------------
| 0.3.2 ( 26) | 1 | - |
Available | 0 |
| 0.3.3 ( 27) | 2 | - |
Available | 0 |
| 0.3.4 ( 28) | 3 | - |
Available | 0 |
------------------------------------------------------------------------------------------
---------------------
• Check the SS7 Signalling Connection Control Part (SCCP) network configuration and
subsystem status should be "Subsystem user in service" state.
Sccp Network 3 :
------------------------------------------------------------------------------------------
------
| dpc | status | ssn | subsystem
status |
------------------------------------------------------------------------------------------
------
| 0.3.2 ( 26) | Signalling Point Accessible | 142 | Subsystem user in
service |
service |
service |
------------------------------------------------------------------------------------------
------
Example Scenarios
Prob
lem De
scrip
tion:
SGSN is re
ject
ing the Ac
ti
vate PDP re
quest with the cause code # 26: in
suf
fi
cient re
sources
Ex
pected Be
hav
iour:
Data col
lec
tion:
Analy
sis:
2. RNC re
sponds with RAB-As
sign
ment Re
sponse with cause: re
lease-due-to-utran-gen
er
ated-
reason.
......
Cause
| ..0. .... | Ext bit : 0
| ...0 00.. | Choice index : 0
Radio Network
| .... ..00 | | 1110 .... | release-due-to-utran-generated-reason (15) (0x0f)
Res
o
lu
tion:
SGSN re
ject
ing the AC
TI
VA
TION with SM Cause : (33) Re
quested ser
vice op
tion not sub
scribed
Ex
pected Be
hav
ior:
Data col
lec
tion:
Analy
sis:
Ver
ify the link be
tween SGSN and GGSN (Gn-inter
face)
Ver
ify no packet drops in SGSN.
Check the net work la
tency using ping test.
Ver
ify no port flaps or port drops in SGSN.
Collect "moni
tor subscriber imsi" trace or ex
ter
nal packet cap
ture (be
tween SGSN and GGSN )
for the subscriber. In
spect this data to see if the SGSN is send ing ACTVATE PDP reject (SM
Cause : (33) Re
quested ser vice op
tion not subscribed).
UMTS
Image - Re
quested Ser
vice Op
tion Not Sub
scribed
Trace:
Workaround :
If UE is re
quest
ing the in
cor
rect APN, the APN alias
ing fea
ture will remap the ses
sion to a de
-
fault APN which will cir
cumvent this issue.
Res
o
lu
tion:
Cor
rect the APN name in the UE or the Op er
ator can rec
tify the sub
scriber con
fig
u
ra
tion using
Over-the-Air pro
vi
sion
ing sys
tem to cor
rect the home sub scriber APN in the UE.
SGSN re
ject
ing the AC
TI
VA
TION with SM Cause ( SM Cause : (27) Miss
ing or un
known APN)
Ex
pected Be
hav
ior:
Data col
lec
tion:
Analy
sis:
Ver
ify the DNS con
fig
u
ra
tion.
Ver
ify the SGSN Gn link is work
ing.
Col
lect the "moni
tor sub
scriber" trace or ex
ter
nal packet cap
ture. From the trace see that
SGSN is re
ceiv
ing GTP_Unknown_APN from GGSN in Cre ate-PDP-Con text re
sponse..
UMTS
Image - Miss
ing or Un
known APN
Statistics:
Total-Actv-Request: 1
Total-Actv-Reject: 1
3G-Actv-Reject: 1 2G-Actv-Reject: 0
Trace:
Res
o
lu
tion:
1.
The most com mon rea
son for these fail
ures is wrong DNS en
tries for APN to GGSN IP ad
-
dress map
ping .
SGSN Rejecting Attach with GMM (MS not derived from network (#9))
Prob
lem Ob
served:
SGSN re
ject
ing the PTMSI AT
TACH with cause code (MS not de
rived from net
work (#9))
Ex
pected be
hav
ior:
Data col
lec
tion:
Analy
sis:
Res
o
lu
tion:
3244 <<<
Denied TX: 338536 Denied RX: 1818
Accepted TX: 1005322 Accepted RX: 1426
Initial Ident-Rsp TX: 1005322 Initial Ident-Rsp RX: 1426
Analy
sis found that the DNS re
sponse for the RAI is com ing in with the wrong SGSN IP ad
dress.
Be
cause of this the SGSN is not re
ceiv
ing the iden
ti
fi
ca
tion response.
Correct
ing the DNS en
tries re
sulted in SGSN receiv
ing the proper iden
ti
fi
ca
tion re
sponse for
for
eign PTMSI and At
tach were getting suc
cess
ful
SGSN re
ject
ing the At
tach with Au
th
Fail
ure in GMM Cause : Auth_
Ci
pher
ing_Re
ject.
Ex
pected be
hav
ior:
Data col
lec
tion:
Analy
sis:
In col
lected "mon i
tor subscriber imsi" trace for the sub
scriber it is clear that SGSN is send
ing
At
tach reject(Au
thenti
ca
tion_ci
pher
ing_re ject).
Statistics:
Trace:
Res
o
lu
tion:
SGSN is be
hav
ing as ex
pected since HLR keys and USIM in MS is not match
ing,
HLR/AUC needs to up date the key DB for the user. Issue must be HLR or UE may be at fault for
not send
ing proper SRES to SGSN.
SGSN rejecting the ATTACH with GMM Cause : (8) GPRS services not
allowed)
Prob
lem De
scrip
tion:
Ex
pected be
hav
ior:
Data col
lec
tion:
• show gmm-
sm statistics verbose
• show snmp traps history verbose
• Monitor subscriber imsi [value]
Trace:
Analy
sis:
SGSN rejecting the ATTACH with GMM cause : Network Failure (17)
Prob
lem Ob
served:
SGSN re
ject
ing the AT
TACH with GMM cause code : Net
work Fail
ure (17)
UMTS
Ex
pected be
hav
ior:
Data col
lec
tion:
• show gmm-
sm statistics verbose
• show map statistics
• show snmp traps history verbose
• monitor subscriber imsi [value]
• show port datalink counters
Analy
sis:
Ver
ify the link be
tween SGSN and STP/HLR is work
ing fine.
Ver
ify there are no packet drops to
wards STP/HLR (ping test).
Ver
ify no port flaps or port drops in SGSN.
Collect "moni
tor subscriber imsi" trace or exter
nal packet cap
ture (be
tween SGSN and
HLR/STP) for the sub scriber. An
a
lyze the data to find if the SGSN is sending At
tach re
-
ject(Net
work Fail
ure).
Image - Net
work Fail
ure
Trace
>>>>>>> INBOUND
===>GPRS Mobility/Session Management Message (33 Bytes)
Protocol Discriminator : GMM message
0000 .... : Skip Indicator : (0)
Message Type: 0x1 (1)
Message : Attach Request
MS Network Capability
<<<<OUTBOUND From sessmgr:1 n_ccpu_sm_log.c:1399 (Callid 00004e31) 15:12:30:308
Eventid:87114(0)
===> GSM Mobile Application (MAP)
Component : Invoke(1)
Component Length : Indefinite length format (0x80)
MAP Send Authentication Info Request
IMSI
Value : 231456789012345
Number Of Requested Vectors
Value : 1
Requesting Node Type
Value : sgsn
<<<<OUTBOUND From sessmgr:1 n_ccpu_sm_log.c:1174 (Callid 00004e31) 15:12:45:303
===>GPRS Mobility/Session Management Message
Protocol Discriminator : GMM message
0000 .... : Skip Indicator : (0)
.... 1000 : Protocol Discriminator : (8)
Message Type: 0x4 (4)
Message : Attach Reject
UMTS
Res
o
lu
tion:
SGSN is be
having as expected. HLR didn’t send any re sponse to the SGSN's SAI Re
quest
hence SGSN is re
ject
ing the call with Net
work fail
ure #17.
Routing Area Update Reject with cause code MS not derived from network
(#9)
Prob
lem Ob
served:
SGSN re
ject
ing the rout
ing area up
date with cause code MS not de
rived from net
work (#9))
Ex
pected be
hav
ior:
Data col
lec
tion:
Analy
sis:
In "mon
i
tor sub
scriber IMSI" trace the SGSN is send
ing rout
ing area re
ject (MS not de
rived
from net
work).
As a re
sult SGSN is send
ing MS ID that is not dervied from the Net work to MS after retry timer
ex
piry "gtpc max-retrans
mis
sions". In this case gtpc re
trans
mis
sion-time
out is 5sec.
Image - call
flow for MS not de
rived from net
work
RAC:F8
ROUTING AREA IDENTITY (RAI) ENDS.
PTMSI: 0xC0F85029
PTMSI Signature: 0x03BB61
MS Validated: 0
Tunnel ID Control I: 0x00000001
GSN Address I: 0xD1E21F64 (172.18.31.100)
===>GPRS Mobility/Session Management Message (4 Bytes)
Protocol Discriminator : GMM message
0000 .... : Skip Indicator : (0)
.... 1000 : Protocol Discriminator : (8)
Message Type: 0xb (11)
Message : Routing Area Update Reject
GMM Cause : (9) MS identity cannot be derived by the network
Force To Standby : (0) Force to standby not indicated
Spare half octet: (0)
3G-MsId not derived by Nw: 973586 2G-MsId not derived by Nw: 83884662
3G-Implicitly Detached: 0 2G-Implicitly Detached: 607896
3G-PLMN not allowed: 0 2G-PLMN not allowed: 0
3G-Location Area not allowed: 0 2G-Location Area not allowed: 0
Trou
bleshoot
ing Steps:
1 Check the Configuration under Call-Control profile. SGSN selects the call control
profile from SGSN Global config on the basis of RAI in RAU.
sgsn-global
imsi-range mcc XXX mnc YYY plmnid XXXYYY operator-policy home_oper1
operator-policy name home_oper1
UMTS
Res
o
lu
tion:
1 If the DNS responds with incorrect IP address, correct the IP address in the DNS.
Routing area update reject with cause code Network Failure #17
Prob
lem Ob
served:
SGSN re
ject
ing the Rout
ing area up
date with cause code Net
work Fail
ure #17
Ex
pected be
hav
ior:
Data col
lec
tion:
Analy
sis:
2 SGSN identifies the IMSI from the GTP SGSN Context response.
3 SGSN checks the IMSI range in the sgsn-global config to apply the appropriate Call-
control profile.
4 If the IMSI range is missing in the sgsn-global config then SGSN sends the RAU reject to
UE with cause code Network failure #17“
Image:
MAP_P
M
M_RE
SULT_UN
KNOWN_ER
ROR NET
WORK FAIL
URE (17)
MAP_P
M
M_RE
SULT_
TIME
OUT NET
WORK FAIL
URE (17)
MAP_P
M
M_RE
SULT_
DA
TA_MISS
ING_FROM_HLR NET
WORK FAIL
URE (17)
MAP_P
M
M_RE
SULT_UN
KNOWN_ER
ROR NET
WORK FAIL
URE (17)
MAP_P
M
M_RE
SULT_
DA
TA_MISS
ING_FROM_HLR NET
WORK FAIL
URE (17)
Trou
ble shoot
ing Steps:
1 Check the IMSI range configuration under sgsn-global.
sgsn-global
imsi-range mcc XXX mnc YYY plmnid XXXYYY operator-policy home_oper1
Res
o
lu
tion:
1 If the IMSI range is missing, then add the IMSI range under the SGSN-Global config.
Troubleshooting GGSN
Overview
This sec
tion fo
cuses on CLI com
mands in ASR 5000/ASR 5500 used for trou
bleshoot
ing dif
fer
-
ent in
ter
faces on GGSN and UMTS along with multi
ple issue sce
nar
ios.
Troubleshooting Commands
• show gtpc statistics verbose
De
scrip
tion:
Red - If coun
ters marked with red start to in
crease and there is an issue on the sys
tem
then please contact Cisco TAC.
Blue - If coun
ters marked with blue start to in
crease, there is an issue in commu
ni
ca
tion
be
tween the Client (GGSN) and AAA servers (Ra dius server, On line charg
ing
server/OCS, PCRF)
Or
ange - If coun
ters marked with Orange start to in
crease, there is an issue re
gard
ing IP
ad
dress as
sign
ment from the IP pool that is con
fig
ured on GGSN.
De
scrip
tion:
De
scrip
tion:
De
scrip
tion:
Example Scenarios
Prob
lem De
scrip
tion:
Mo
bile op
er
a
tor re
ports that sub
scriber can
not es
tab
lish PDP ses
sion.
UMTS
Data col
lec
tion:
#show ses
sion dis
con
nect-rea
sons
# mon
i
tor sub
scriber
Image - call
flow for user au
then
ti
ca
tion failed
Analy
sis:
1 Run mul
ti
ple it
er
a mand show gtpc sta
tions of the com tis
tics ver
bose. Com
pare counter
val
ues to see which ones in
creased in sec
tion "Cre
ate PDP Con
text De
nied". In this
case, the counter "User Auth Failed" in
creased. The counter "Au
then
ti
ca
tion Failed" also
in
creased in the out
put of "Cre
ate PDP De
nied - Auth Fail
ure Rea
sons:"
Authentication Failed: 1
AAA Auth Req Failed: 0
APN selection-mode mismatch: 0
Non-existent virtual APN: 0
Reject Foreign Subscriber: 0
IMS Authorization Fail: 0
Total Disconnects: 1
3 Collect "monitor subscriber next-call" data, until a session experiencing the disconnect
is caught. If information like MSISDN, IMSI or APN are available, the "mon sub" can be
narrowed by using them as a filter. In this scenario, the Radius server rejected this
subscriber.
----------------------------------------------------------------------
(Switching Trace) - New Incoming Call:
----------------------------------------------------------------------
MSID/IMSI : 450061234554321 Callid : 00004e21
UMTS
Code: 1 (Access-Request)
Id: 1
Length: 323
Authenticator: D5 F2 57 46 BA D6 73 2D 7C 07 25 20 E6 4C 35 10
Calling-Station-Id = 64211234567
User-Name = testuser
NAS-IP-Address = 192.168.1.1
Service-Type = Framed
Framed-Protocol = GPRS_PDP_Context
NAS-Port-Type = Wireless_Other
SN1-GTP-Version = GTP_VERSION_1
3GPP-IMSI = 450061234554321
3GPP-NSAPI = 5
3GPP-Selection-Mode = 0
3GPP-Charging-Id = 1
UMTS
3GPP-Negotiated-QoS-Profile = 99-22720D739640488607FFFF
3GPP-Chrg-Char = 0800
Called-Station-ID = ecs-apn
3GPP-SGSN-Address = 192.168.2.138
3GPP-SGSN-Mcc-Mnc = 123456
3GPP-GGSN-Address = 192.168.2.1
3GPP-GGSN-Mcc-Mnc = 123456
3GPP-Negotiated-DSCP = 0A
3GPP-User-Location-Info = 02 21 63 54 FF FE FF FF
3GPP-PDP-Type = ipv4
SN1-Service-Type = GGSN
NAS-Port = 4097
User-Password = 93 D8 5D 76 D3 24 EE E4 96 6C 61 44 B8 FD 29 C1
3GPP-CG-Address = 192.168.7.128
Tuesday May 05 2015
INBOUND>>>>> 14:00:57:526 Eventid:23900(6)
RADIUS AUTHENTICATION Rx PDU, from 192.168.1.128:1812 to 192.168.1.1:47218 (32) PDU-
dict=starent-vsa1
Code: 3 (Access-Reject)
Id: 1
Length: 32
Authenticator: 1F 3C 06 A0 C2 C7 B6 D2 47 11 AE D2 1F 79 F7 CC
Service-Type = Framed
Framed-Protocol = PPP
Tuesday May 05 2015
<<<<OUTBOUND 14:00:57:529 Eventid:47001(3)
GTPC Tx PDU, from 192.168.2.1:2123 to 192.168.2.138:2123 (14)
TEID: 0x00000400, Message type: GTP_CREATE_PDP_CONTEXT_RES_MSG (0x11)
Sequence Number:: 0x7FFF (32767)
INFORMATION ELEMENTS FOLLOW:
Res
o
lu
tion :
Mobile op
er
a
tor re
ports that sub
scribers for a spe
cific APN can not es
tab
lish PDP con
text con
-
nec
tions.
Data Col
lec
tion:
Trou
bleshoot
ing steps:
Ver
ify the con
fig
u
ra
tion for the APN.
Ver
ify all the nec
ces
sary in
ter
faces as
so
ci
ated for the APN are work
ing.
Ver
ify if any in
ter
faces or ports are down using SNMP traps, or "show alarm out
stand
ing" out
-
puts.
Ver
ify any packet drops ob
served in the net
work or in the GGSN using NPU com
mands.
Refer HW trou bleshoot
ing sec
tion
Ver
ify la
tency in the net
work be
tween GGSN and other con
nected net
work el
e
ments.
Analysis:
1 GGSN is receving the CPC request from the SGSN and GGSN is sending CPC reponse
with GTP_Authentication_failed. The end of the monitor subscriber trace shows the
call is failed due to " Disconnect Reason: ims-authorization-failed".
----------------------------------------------------------------------
Incoming Call:
----------------------------------------------------------------------
UMTS
2 From the packet trace / monitor subscriber it was found that the Gx CCR-I was not
leaving the GGSN. However, the Diameter interface status and peer
Diameter status looked good .
3 When Gx IMS auth service was removed, the calls started working.
4 While verifying the configuration for the APN it was found that "IP access group
<SERVICE>" was wrong, hence it was not redirecting the Gx CCR trigger towards PCRF.
This caused the call to fail internally.
Res
o
lu
tion:
Logs Col
lec
tion:
• Collect syslog
Trou
bleshoot
ing steps:
Ver
ify the con
fig
u
ra
tion for the APN.
Ver
ify all the nec
es
sary in
ter
faces as
so
ci
ated for the APN are work
ing
Ver
ify if any in
ter
faces or ports are down using SNMP traps, or "show alarm out
stand
ing" out
-
put.
Ver
ify any packet drops ob
served in the net
work or in the GGSN using NPU com
mands.
Refer HW trou bleshoot
ing sec
tion
Ver
ify la
tency in the net
work be
tween GGSN and other con
nected net
work el
e
ments.
UMTS
Image - Dy
nam
ic_PDP_AD
DR_Oc
cu
pied
Analysis:
1 The GGSN is receiving the CPC request from the SGSN, and GGSN is sending the CPC
response with GTP_ALL_DYNAMIC_PDP_ADDR_OCCUPIED. The end of the "monitor
subscriber" output shows the call has failed due to "Disconnect Reason: Gtp-all-
dynamic-pdp-addr-occupied".
----------------------------------------------------------------------
Incoming Call:
----------------------------------------------------------------------
MSID/IMSI : 123456000000002 Callid : 00006385
IMEI : n/a MSISDN : 9876543210
Username : 9876543210 SessionType : ggsn-pdp-type-ipv4
Status : Active Service Name: ggsn
Src Context : spgw
----------------------------------------------------------------------
path-failure 54 21.42857
Gtp-all-dynamic-pdp-addr-occupied 188 74.60317
ims-authorization-failed 2 0.79365
Gtp-context-replacement 6 2.38095
Res
o
lu
tion:
In
creased the size of the IP Pool in the IP POOL group. This re
solved the issue
timer 120
Sub
scriber can
not es
tab
lish PDP ses
sion.
Logs Col
lec
tion:
Col
lect mul
ti
ple SSD
Col
lect ex
ter
nal PCAP
Col
lect sam
ple "mon
i
tor sub
scriber" at ver
bosity 2 show
ing the error be
hav
ior
Col
lect sys
log
Trou
bleshoot
ing steps:
Ver
ify the con
fig
u
ra
tion for the APN.
Ver
ify all the nec
es
sary in
ter
faces as
so
ci
ated for the APN are work
ing.
Ver
ify if any in
ter
faces or ports are down using SNMP traps or show out
stand
ing alarm out
puts.
Ver
ify any packet drops observed in the net mands. [Refer
work or in the GGSN using NPU com
HW trou bleshoot
ing sec
tion.]
Ver
ify la
tency in the net
work be
tween GGSN and Other con
nected net
work el
e
ments.
Sta
tist
sics:
Analy
sis:
1 The GGSN is receiving the CPC request from the SGSN and GGSN is not sending CPC to
SGSN. Within the GGSN system it was found that an internal task, sessmgr, was not
started, hence calls were not handled in GGSN.
UMTS
Res
o
lu
tion:
Please use the command with caution! Many TEST commands are
processor-intensive and can cause serious system problems if used too
frequently. Prior to doing "task kill" it is always recommended to perform
"task core” on same instance in order to be able to complete root cause
analysis.
LTE
LTE
LTE
Overview
This chap
ter cov
ers LTE ar
chi
tec
ture, call flows and trou
bleshoot
ing com
mands.
LTE/SAE architecture
The below di a
gram dis
plays the LTE/SAE ar chi
tec
ture, with all the in
volved el
e
ments and in-
ter
faces. It also cov
ers the in
ter
faces to
wards UMTS (2g/3g) net works. This di
a
gram is ref
er
-
enced later in this chapter.
Image: UE at
tach and ac
ti
va
tion
LTE
1 The UE initiates the Attach procedure by the transmission of an Attach Request (IMSI or
old GUTI, last visited TAI (if available), UE Network Capability, PDN Address Allocation,
Protocol Configuration Options, Attach Type) message together with an indication of
the Selected Network to the EnodeB. IMSI is included if the UE does not have a valid
GUTI available. If the UE has a valid GUTI, it is included.
2 The EnodeB derives the MME from the GUTI and from the indicated Selected Network.
If that MME is not associated with the EnodeB, the EnodeB selects an MME using an
“MME selection function”. The EnodeB forwards the Attach Request message to the
new MME contained in a S1-MME control message (Initial UE message) together with
the Selected Network and an indication of the E-UTRAN Area identity, a globally unique
E-UTRAN ID of the cell from where it received the message to the new MME.
3 If the UE is unknown in the MME, the MME sends an Identity Request to the UE to
request the IMSI.
6 The MME sends an Update Location Request (MME Identity, IMSI, ME Identity) to the
HSS.
7 The HSS acknowledges the Update Location message by sending an Update Location
Ack to the MME. This message also contains the Insert Subscriber Data (IMSI,
Subscription Data) Request. The Subscription Data contains the list of all APNs that the
UE is permitted to access, an indication about which of those APNs is the Default APN,
and the 'EPS subscribed QoS profile' for each permitted APN. If the Update Location is
rejected by the HSS, the MME rejects the Attach Request from the UE with an
appropriate cause.
8 The MME selects an S-GW using “Serving GW selection function” and allocates an EPS
Bearer Identity for the Default Bearer associated with the UE. If the PDN subscription
context contains no P-GW address the MME selects a P-GW as described in clause
“PDN GW selection function”. Then it sends a Create Default Bearer Request (IMSI,
MME Context ID, APN, RAT type, Default Bearer QoS, PDN Address Allocation, AMBR,
LTE
9 The S-GW creates a new entry in its EPS Bearer table and sends a Create Default Bearer
Request (IMSI, APN, S-GW Address for the user plane, S-GW TEID of the user plane, S-
GW TEID of the control plane, RAT type, Default Bearer QoS, PDN Address Allocation,
AMBR, EPS Bearer Identity, Protocol Configuration Options, ME Identity, User Location
Information) message to the P-GW.
10 If dynamic PCC is deployed, the P-GW interacts with the PCRF to get the default PCC
rules for the UE. The IMSI, UE IP address, User Location Information, RAT type, AMBR
are provided to the PCRF by the P-GW if received by the previous message.
11 The P-GW returns a Create Default Bearer Response (P-GW Address for the user plane,
P-GW TEID of the user plane, PGW TEID of the control plane, PDN Address
Information, EPS Bearer Identity, Protocol Configuration Options) message to the S-
GW. PDN Address Information is included if the P-GW allocated a PDN address Based
on PDN Address Allocation received in the Create Default Bearer Request. PDN Address
Information contains an IPv4 address for IPv4 and/or an IPv6 prefix and an Interface
Identifier for IPv6. The P-GW takes into account the UE IP version capability indicated
in the PDN Address Allocation and the policies of operator when the P-GW allocates the
PDN Address Information. Whether the IP address is negotiated by the UE after
completion of the Attach procedure, this is indicated in the Create Default Bearer
Response.
12 The Downlink (DL) Data can start flowing towards S-GW. The S-GW buffers the data.
13 The S-GW returns a Create Default Bearer Response (PDN Address Information, S-GW
address for User Plane, S-GW TEID for User Plane, S-GW Context ID, EPS Bearer
Identity, Protocol Configuration Options) message to the new MME. PDN Address
Information is included if it was provided by the P-GW.
14 The new MME sends an Attach Accept (APN, GUTI, PDN Address Information, TAI List,
EPS Bearer Identity, Session Management Configuration IE, Protocol Configuration
Options) message to the EnodeB.
15 The EnodeB sends Radio Bearer Establishment Request including the EPS Radio Bearer
Identity to the UE. The Attach Accept message is also sent along to the UE.
16 The UE sends the Radio Bearer Establishment Response to the EnodeB. In this message,
the Attach Complete message (EPS Bearer Identity) is included.
17 The EnodeB forwards the Attach Complete (EPS Bearer Identity) message to the MME.
LTE
18 The Attach is complete and UE sends data over the default bearer. At this time the UE
can send uplink packets towards the EnodeB which are then tunnelled to the S-GW and
P-GW.
19 The MME sends an Update Bearer Request (EnodeB address, EnodeB TEID) message to
the S-GW.
20 The S-GW acknowledges by sending Update Bearer Response (EPS Bearer Identity)
message to the MME.
22 After the MME receives Update Bearer Response (EPS Bearer Identity) message, if an
EPS bearer is established and the subscription data indicates that the user is allowed to
perform handover to non-3GPP accesses, and if the MME selected a P-GW that is
different from the P-GW address which was indicated by the HSS in the PDN
subscription context, the MME sends an Update Location Request including the APN
and P-GW address to the HSS for mobility with non-3GPP accesses
23 The HSS stores the APN and P-GW address pair and sends an Update Location
Response to the MME
The fol
low
ing fig
ure and the text that follows de
scribe the mes sage flow for a user-ini ti
ated
sub
scriber de-regis
tra
tion pro
ce
dure. For more infor
mation, please refer to the of
fi
cial Cisco
ASR 5000 MME/SGW/PGW Ad min
is
tra
tion Guides.
LTE
Image: UE de
tach
1 The UE sends NAS message Detach Request (GUTI, Switch Off) to the MME. Switch Off
indicates whether detach is due to a switch off situation or not.
2 The active EPS Bearers in the S-GW regarding this particular UE are deactivated by the
MME sending a Delete Bearer Request (TEID) message to the S-GW.
3 The S-GW sends a Delete Bearer Request (TEID) message to the P-GW
5 The P-GW may interact with the PCRF to indicate to the PCRF that EPS Bearer is
released if PCRF is applied in the network.
7 If Switch Off indicates that the detach is not due to a switch off situation, the MME
sends a Detach Accept message to the UE.
8 The MME releases the S1-MME signalling connection for the UE by sending an S1
Release command to the EnodeB with Cause = Detach.
MME Overview
MMEmgr: The MME Mgr tasks will be started when a MME ser vice config
u
ra
tion is de tected.
There will be multi
ple instances of this task for load sharing. All MME Mgrs will have all the Ac -
tive MME Ser vices con fig
ured and will be iden ti
cal in config
u
ration and capabil
i
ties. This task
will run the SCTP pro tocol stack. It will also handle SGSAP pro to
col to
wards MSC/VLR and
some of the S1AP func tion al
ity to
wards EnodeB.
Di
amproxy: MME has con nec
tion to the HSS which is run over di
ame
ter. Di
amproxy process is
the process re
spon
si
ble for main
tain
ing the Di
am
e
ter connec
tion to the HSS.
SGW Overview
The Serv
ing Gate
way routes and for
wards data pack
ets from the UE and acts as the mo
bil
ity
an
chor dur
ing in
ter-En
odeB han
dovers. The MME se
lects the best SGW for a par
tic
u
lar UE.
The fol
low
ing StarOS soft
ware processes are im
por
tant on SGW:
egt
pin
mgr: Han
dles EGTP con
trol plane in
bound over S11 (from MME) in
ter
face.
egt
peg
mgr: Han
dles EGTP con
trol plane out
bound over S5/S8 in
ter
face
sess
mgr: Sub
scriber spe
cific pro
cess
ing.
gt
pumgr: Han
dles GTP-U data plane traf
fic over S1U and S5/S8 in
ter
face
LTE
PGW Overview
The PGW is not only a node in the net work path of a call setup, it is the endpoint of the call and
it has the in
tel
li
gence built in to do deep packet inspec
tion (ECS ), pol icy con
trol (Gx), charg ing
control (Gy), and billing (Rf), and possi
bly QoS. In ad
dition, it requires three services for call
control (EGTP, PGW) and user plane (GTU). The one easy part is that since it is the end point of
the call, there is no egress call control to be con
cerned about, as there would be for SGW and
MME which will have both ingress and egress.
The fol
low
ing StarOS soft
ware processes are im
por
tant on PGW:
egt
pin
mgr: Han dles EGTP con trol plane in
bound over the S5/S8 in
ter
face and acts as demux to
pass call re
quests to sess
m
grs
sess
mgr: Sub
scriber spe
cific pro
cess
ing.
di
amproxy: PGW has con nections to var
i
ous servers (Gx/Gy) over the Di
ame
ter proto
col. Di
-
amproxy process is the process re
sponsi
ble for main
tain
ing the Di
am
e
ter con
nection to these
servers
aaamgr: Per
forms all AAA pro
to
col op
er
a
tions and func
tions for sub
scribers
vp
n
mgr: Per forms IP ad
dress pool and sub
scriber IP ad
dress man
age
ment, and also con
text
spe
cific op
er
a
tions
MME Com
mands
moni
tor protocol
moni
tor subscriber
log
ging fil
ter ac
tive fa
cil
ity mme-app level info
SGW Com
mands
PGW Com
mands
Troubleshooting MME
Overview
This sec
tion fo
cuses on MME trou
bleshoot
ing tech
niques along with ex
am
ples in ASR
5000/ASR 5500.
Basic Troubleshooting
NOTE: Al
ways cap
ture mul
ti
ple "show sup
port de
tail" in
stances for analy
sis as re
quired.
• show mme-service all Check the status (should be "STARTED")
• show mme-service db statistics Provides MME database statistics for all instances of
DB
, check "DB record limit reached".
his com
T mand is extremely use
ful for var
i
ous sta
tis
tics around the MME ser
vice. In
clud
ing
SCTP, S1AP, net
work pro
ce
dures, handovers, etc
LTE
This is the com mand to audit the im simgr process. It contains var i
ous sta
tis
tics about how
many new re quests were coming in for var
i
ous proce
dures, and if a sessmgr was found to han -
dle this re
quest, as well as MME Over load Protec
tion and a few ta bles that show dis tri
b
u
tion
per sessmgr.
SCTP Sta
tis
tics and S1AP Sta
tis
tics, trans
mit
ted, re
ceived, re-trans
mit
ted data per MMEmgr in
-
stance.
MMEMgr : Instance
Peerid : 16900006
Global EnodeB ID : 17000:208:01
Assoc UpTime : 00h03m33s
EnodeB Name :
EnodeB Type : Macro
MME Service Name : mme
MME Service Address(s) : 172.16.1.1
172.16.2.1
MME Service Port : 36412
EnodeB Port : 36412
EnodeB IP Address : 172.16.3.1
Crypto-map Name(s) : n/a
Paging DRX : 0(v32)
Supported TAI(s) : 600:20001
LTE
Troubleshooting S6a
This is the in
ter
face used by the MME to com muni
cate with the Home Sub scriber Server (HSS).
The HSS is re spon
si
ble for trans
fer of sub
scrip
tion and au then
ti
ca
tion data for authenti
cat
-
ing/au tho
riz
ing user ac
cess and UE con text au
thenti
ca
tion. The MME com mu ni
cates with the
HSSs on the PLMN using Di ame
ter pro
to
col.
• show hss-peer-service service all - check the status (should be "STARTED")
• show diameter peers full all - Run command in corresponding context. Check that
Diameter sessions are in "OPEN" state. Please refer to the Diameter chapter for details.
LTE
Troubleshooting SGs
The SGs in ter
face con
nects the data
bases in the VLR and the MME to sup
port cir
cuit switch
fall
back sce
narios.
• show sgs-service all Check the status (should be "STARTED")
MMEMGR : Instance 1
MME Reset : No
Service ID : 6
Peer ID : 17171000
VLR Name : AB01.MNC01.MCC001.3GPPNETWORK.ORG
SGS Service Name : sgs
VLR Offload : No
SGS Service Address : 172.17.1.25 172.20.70.26
SGS Service Port : 29118
VLR IP Address : 172.18.1.161 172.18.1.169
VLR Port : 29118
Assoc State : UP
Assoc Uptime : 0021d20h20m
Assoc State Up Count : 1
LTE
• monitor subscriber Collect and analyze a subscriber MME call with monitor subscriber
from the beginning of a call till an issue with verbosity 3, options S and Y.
• show subscribers mme-only full Provides information on a MME subscriber session .
• show subscribers full imsi Provides subscriber session information.
• monitor protocol Select option 70 and verbosity 3 to collect DNS queries during the
monitor subscriber test call. Note: Running monitor protocol may impact system
performance when many subscribers are connected.
• various debugs can be activated to find issues on MME side. Activating unusual level
debugs is not posing a risk to the stability of the node. Sometimes however its
necessary to use debug levels of logging; increasing the debug level has to be done with
caution.
log
ging fil
ter ac
tive fa
cil
ity mme-app level un usual
log
ging fil
ter ac
tive fa
cil
ity nas level un
usual
log
ging fil
ter ac
tive fa
cil
ity s1ap level un
usual
log
ging fil
ter ac
tive fa
cil
ity egtpc level un
usual
log
ging active
Explor
ing all the var
i
ous op
tions on how to set this up would be out
side the scope of this trou
-
bleshooting guide, but a few tips on how to trou
bleshoot this.
• First a manual DNS request can be issued from the MME on the correct context:
ex
am
ple: TAC-based DNS query is
sued from MME:
lbD7.tac-hb00.tac.epc.mnc001.mcc002.3gppnetwork.org
ex
am
ple: APN-based DNS query is
sued from MME:
lte.rtc.apn.epc.mnc001.mcc002.3gppnetwork.org
• The following logs can be enabled on MME to find which SGW/PGW are actually
selected by MME:
# log
ging fil
ter ac
tive fa
cil
ity mme-app level info.
# log
ging active
LTE
# no log
ging ac
tive
Example Scenarios
CSFB Failure
Prob
lem De
scrip
tion:
It was noted that CSFB doesn't work on phones con nected to 4G in
ter
mit
tently. The caller gets
a busy tone, and the called party doesn't re
ceive the call.
CSFB is trig
gered, MME com muni
cates with EnodeB and MSC to get sub
scriber on 2g/3g to re
-
ceive the phone call via tra
di
tional CS net
work.
Logs col
lec
tion:
Analy
sis:
Below is analy
sis of the mon
i
tor sub
scriber with notes and snip pet of the trace. MME is be
hav
-
ing as ex
pected. Please note that the CSFB speci
fi
ca
tion can be found in 3GPP TS 23.272.
In
bound pag
ing from SGS
Out
bound CS ser
vice no
ti
fi
ca
tion to UE
Out
bound ser
vice re
quest to SGs
In
bound CS ser
vice no
ti
fi
ca
tion from UE
UE con
text mod
i
fi
ca
tion
Con
text Re
lease
MME re
leases bear
ers to SGW
Pag
ing from 4G net
work be
cause there is down
link data
11 sec
onds later there is an at
tach re
quest from UE
Res
o
lu
tion:
No is
sues were found on MME side. The call was not es tab
lished and a new At tach was re-
ceived, pos
si
bly in
di
cat
ing that the UE was not under any cov
erage. More in
ves
ti
ga
tion needs to
be done on MSC/VLR side of net work.
S1AP Failures
Prob
lem De
scrip
tion:
Logs col
lec
tion:
Mon Dec 08 03:53:25 2014 Internal trap notification 1167 (MMES1AssocFail) MME S1 Association
failed; vpn s1-mme service mme-service EnodeB 123:45:12345
Mon Dec 08 03:53:46 2014 Internal trap notification 1168 (MMES1AssocEstab) MME S1 Association
established; vpn s1-mme service mme-service EnodeB 123:45:12345
Analy
sis:
This prob
lem re
quires:
1) Does it af
fect 1 or more EnodeB's? if so, an ex
ter
nal cap
ture might be re
quired to eval
u
ate
what hap pens on lower layer (IP/SCTP layer/S1AP layer) towards this En
odeB.
2) If it af
fects more than 1 En odeB, does it af
fect En odeB's con nected to the same set of
MMEMGR processes? (This can be checked in sys log). If this is the case, it could in
di
cate some
kind of overload con
di
tion on a spe
cific MMEMGR processs.
3) If it af
fects all MMEM GRs, does it also af
fect the EnodeB's as
so
ci
a
tion to other MME's (if
there are any)? If this is the only MME af fected, it could be some over load con
di
tion on this
specific MME trig gered by some event, or it could be a trans port issue on the S1AP in ter
-
face/net work.
LTE
In case (3), it might be worth to check if there was a specific event that could have caused a
very high load on all MMEMGR processes. Ex amples of such events in
clude:
Res
o
lu
tion:
Sev
eral dif
fer
ent causes can trig
ger such traps, so a broad in
ves
ti
ga
tion with the steps above is
re
quired.
LTE
Troubleshooting SGW
Overview
This sec
tion fo
cuses on SGW trou
bleshoot
ing tech
niques in ASR 5000/ASR 5500.
Basic Troubleshooting
The folow
ing sec
tion cov
ers the trou
bleshoot
ing com
mands for the SGW.
Subscribers Total:
Active: 253 Setup: 254
Released: 1
Inactivity Timeout: 0
PDNs Total:
Active: 253 Setup: 254
Released: 1 Rejected: 4482
LIPA: 0 Paused Charging: 0
PDNs By PDN-Type:
IPv4 PDNs:
Active: 253 Setup: 254
Released: 1 Rejected: 4482
IPv6 PDNs:
Active: 0 Setup: 0
Released: 0
Rejected: 0
IPv4v6 PDNs:
Active: 0 Setup: 0
LTE
Released: 0
Rejected: 0
Troubleshooting S11
The S11 inter
face is the in
ter
face between MME and SGW. It's only used for con trol plane traf
-
fic, and the EGTP (GTPv2) pro to
col is used to com
mu
ni
cate over this in
ter
face.
• show egtp-service name <egtp_ingress_service_name> : basic settings for the EGTP
service between MME and SGW
• show egtpc statistics path-failure-reasons : will provide counters and reasons for
failures. Related SNMP traps: EGTPCPathFail
Echo Request:
Total TX: 420 Total RX: 0
Initial TX: 420 Initial RX: 0
Retrans TX: 0
Echo Response:
Total TX: 0 Total RX: 420
Context : EPC
Interface Type : sgw-egress
Status : STARTED
Restart Counter : 9
Message Validation Mode : Standard
GTPU-Context : EPC
GTPC Retransmission Timeout : 5
GTPC Maximum Request Retransmissions : 4
GTPC IP QOS DSCP value : 10
GTPC Echo : Enabled
GTPC Echo Mode : Default
GTPC Echo Retransmission Timeout : 5
GTPC Echo Interval : 60
GTP-C Bind IPv4 Address : 10.1.9.1
GTP-C Bind IPv6 Address : Not configured
State: Started
Trou
bleshoot
ing a sub
scriber SGW call
• monitor subscriber Collect and analyze a subscriber SGW call with monitor subscriber
from the beginning of a call until an issue with verbosity 2, options S and Y.
• show subscribers sgw-only full imsi <imsi> Provides information on a MME subscriber
session information.
• show subscribers full imsi <imsi> Provides information on a subscriber session
information.
LTE
Troubleshooting PGW
Overview
This sec
tion's focus is PGW trou
bleshoot
ing tech
niques along with ex
am
ples in ASR 5000/ASR
5500.
Username: 123456789012598@cisco.com
Subscriber Type : Home
Status : Online/Active
State : Connected
Connect Time : Thu May 7 16:06:29 2015
Bearer QoS:
LTE
QCI: 8
ARP: 0x06c
PCI: 1 (Disabled)
PL : 11
PVI: 0 (Enabled)
MBR Uplink(bps): 0 MBR Downlink(bps): 0
GBR Uplink(bps): 0 GBR Downlink(bps): 0
P-CSCF address :
Addresses received from s6b:
Primary IPv6 : n/a
Secondary IPv6: n/a
Tertiary IPv6 : n/a
Downlink APN AMBR: 150000 Kbps Uplink APN AMBR: 60000 Kbps
Mediation context: None Mediation no early PDUs: Disabled
Mediation No Interims: Disabled Mediation Delay PBA: Disabled
input pkts: 0 output pkts: 0
input bytes: 0 output bytes: 0
input bytes dropped: 0 output bytes dropped: 0
input pkts dropped: 0 output pkts dropped: 0
output pkts dropped lorc: 0
input pkts dropped due to lorc : 0 output pkts dropped due to lorc : 0
input bytes dropped due to lorc : 0
in packet dropped suspended state: 0 out packet dropped suspended state: 0
in bytes dropped suspended state: 0 out bytes dropped suspended state: 0
in packet dropped overcharge protection: 0 out packet dropped overcharge protection: 0
in bytes dropped overcharge protection: 0 out bytes dropped overcharge protection: 0
in packet dropped sgw restoration state: 0 out packet dropped sgw restoration state: 0
in bytes dropped sgw restoration state: 0 out bytes dropped sgw restoration state: 0
pk rate from user(bps): 0 pk rate to user(bps): 0
ave rate from user(bps): 0 ave rate to user(bps): 0
sust rate from user(bps): 0 sust rate to user(bps): 0
pk rate from user(pps): 0 pk rate to user(pps): 0
ave rate from user(pps): 0 ave rate to user(pps): 0
sust rate from user(pps): 0 sust rate to user(pps): 0
• show dns-client cache client <DNS client name> query-type AAAA - ensure that all P-
CSCF FQDNs that have been returned from diameter authentication requests have a
corresponding AAAA address or else the call will fail
LTE
Troubleshooting S6b
This is the in
ter
face used by the PGW to com mu ni
cate with the Di
ame
ter au
thenti
ca
tion server
for purposes of user authenti
ca
tion, in
clud
ing IP pool group, session dura
tion and idle time-
outs.
• show diameter peers full all - Run command in corresponding context. Check that
Diameter sessions are in "OPEN" state. Please refer to the Diameter chapter for details.
• show diameter aaa-statistics [all | group server]
LTE
Overview
This chap
ter pro
vides an overview of basic trou
bleshoot
ing com
mands for GTP Path Fail
ure.
Basic Troubleshooting
In order to trou
bleshoot any kind of GTP path is
sues, the fol
low
ing com
mands will be use
ful.
The CLI commands below with debug-info options require CLI test-
commands password configured in the chassis.
Please use the command with caution! Many TEST commands are
processor-intensive and can cause serious system problems if used too
frequently.
Sta
tis
tics counter by in
ter
face:
• show egtpc statistics interface mme shows EGTPC statistics on S11 interface on MME
side.
• show egtpc statistics interface sgw-ingress shows EGTPC statistics on S11 interface on
SGW side.
• show egtpc statistics interface sgw-egress shows EGTPC statistics on S5/S8 interfaces
on SGW side.
• show egtpc statistics interface pgw-ingress show EGTPC statistics on S5/S8 interfaces
on PGW side.
Context : spgw
Interface Type : pgw-ingress
Status : STARTED
Restart Counter : 42
Message Validation Mode : Standard
GTPU-Context : spgw
GTPC Retransmission Timeout : 5
GTPC Maximum Request Retransmissions : 3
GTPC IP QOS DSCP value : 10
GTPC Echo : Enabled
GTPC Echo Mode : Default
GTPC Echo Retransmission Timeout : 5
GTPC Echo Interval : 60
GTP-C Bind IPv4 Address : 192.168.5.52
GTP-C Bind IPv6 Address : Not configured
GTPC path failure detection policy
Echo Timeout : Enabled
Total Peers: 4
"egtpc test echo" is used to check a spe cific peer to see if it is reach
able or not and must be run
in the con
text where the ser vice is de
fined.
• Using ping command is not a valid test, though if it is successful, one knows that there
is some level of reachability.
• The peer restart counter (Recovery) is displayed.
• Running this command will increment the Tx echo request / Rx Echo response
counters in "show egtpc statistics"
[EGTP Context]PGW> egtpc test echo gtp-version 2 src-address <src IP> peer-address <peer IP>
EGTPC test echo
---------------
Peer: <peer IP> Tx/Rx: 1/1 RTT(ms): 83 (COMPLETE) Recovery: 14
(0x0E)
Example Scenarios
Prob
lem De
scrip
tion:
Trou
bleshoot
ing steps and analy
sis:
Gen
eral trou
bleshoot
ing steps are listed below.
Below cou
ple ex
am
ples with SNMP traps.
In
ter
nal trap no
ti
fi
cation 1112 (EGTPCPath
Fail) con text Gn, ser
vice GGSN, in
ter
face type ggsn,
self ad
dress 10.10.10.
10, peer ad dress 20.20.
20.
20, peer old restart counter 191, peer new
restart counter 59, peer ses sion count 21, fail
ure reason echo-rsp-restart-counter-change
In
ter
nal trap no
ti
fi
ca
tion 1112 (EGTPCPathFail) con
text mme_ctx, ser vice mme_svc_egtp, in -
ter
face type mme, self ad dress 10.
10.
10.
10, peer address 20.
20.
20.
20, peer old restart counter
3, peer new restart counter 20, peer ses sion count 1994, failure reason restart-counter-
change
LTE
The fol
lowing are path fail ported from "show egtpc stat path-fail
ure counts that are re ure-rea
-
sons" on a PGW.
It is impor
tant to un der
stand how EGTPC/GTPC and to a lesser de gree EGTPU/GTPU (as
user-plane failure de tec
tion is not al
ways con
fig
ured) path fail
ure de
tec
tion is im
plemented to
mon i
tor the connection between EGTP nodes. A PCAP will likely be needed to con firm that the
mes sag
ing is tak
ing place as expected.
LTE
Handover Troubleshooting
Overview
This sec
tion pro
vides overview of basic trou
bleshoot
ing com
mands for Han
dover is
sues in LTE.
Basic Troubleshooting
The fol
low
ing CLIs are use
ful in iden
ti
fy
ing is
sues dur
ing han
dover sce
nar
ios:
• monitor subscriber - Set verbosity to 3 and select options Y and S
• packet capture
• monitor protocol - Set verbosity to 3 and option 70 for DNS
• show support details
• dns-client query client-name <dns name>
• show dns-client cache
• show mme-service statistics -Provides an overview of handover statistics, see below
snippet:
Handover Statistics:
Intra MME Handover:
X2-based handover:
Attempted: 4867039 Success: 4866487
Failures: 552
S1-based handover:
Attempted: 31977 Success: 7955
Failures: 24022
EUTRAN<-> EUTRAN using S10 Interface:
Outbound relocation using TAU procedure:
Attempted: 0 Success: 0
Failures: 0
Outbound relocation using S1 HO procedure:
Attempted: 0 Success: 0
Failures: 0
Inbound relocation using TAU procedure:
Attempted: 448 Success: 0
LTE
Failures: 448
Inbound relocation using S1 HO procedure:
Attempted: 0 Success: 0
Failures: 0
EUTRAN<->UTRAN(Iu mode) SRNS Relocations using Gn/Gp Interface:
Outbound relocation:
Attempted: 0 Success: 0
Failures: 0
Inbound relocation:
Attempted: 0 Success: 0
Failures: 0
EUTRAN<->GERAN(A/Gb Mode) PS Handovers using Gn/Gp Interface:
Outbound relocation:
Attempted: 0 Success: 0
Failures: 0
Inbound relocation:
Attempted: 0 Success: 0
Failures: 0
EUTRAN<->UTRAN/GERAN(Iu or A/Gb mode) Cell Reselections using Gn/Gp Interface:
Outbound relocation using RAU procedure:
Attempted: 1687207 Success: 1593639
Failures: 93568
Inbound relocation using TAU procedure:
Attempted: 1469349 Success: 1414495
Failures: 54854
EUTRAN<->UTRAN(Iu mode) Inter-RAT Handovers using S3 Interface:
Outbound relocation:
Attempted: 0 Success: 0
Failures: 0
Inbound relocation:
Attempted: 0 Success: 0
Failures: 0
EUTRAN<->GERAN(A/Gb Mode) Inter-RAT Handovers using S3 Interface:
Outbound relocation:
Attempted: 0 Success: 0
Failures: 0
Inbound relocation:
Attempted: 0 Success: 0
Failures: 0
LTE
Example Scenarios:
Prob
lem De
scrip
tion:
Logs col
lec
tion:
Analy
sis:
Ac
cording to moni
tor subscriber and exter
nal packet capture, UE is coming back to 4G ser
vice
and sending TAU re quest (com bined TA/LA up dat
ing with IMSI attach). The MME does not
have in
for
mation about the sub scriber, and it uses in
for
mation from the "old GUTI" in the TAU
re
quest to lookup the call.
Snip
pet of the PCAP file
NAS-PDU
EMM cause
mme-service MME
nri length 5 plmn-id mcc 001 mnc 001
Ac
cord
ing to 3GPP TS 23.003 V9.9.0 (2011-12) the fol
low
ing map
pings are being used.
E?UTRAN <MME Code> maps to GERAN/UTRAN <RAC> and is also copied into the 8 most sig
-
nif
i
cant bits of the NRI field within the P?TMSI;
DNS re
quests sent by the MME are the fol
low
ings.
Below are ex ples of the DNS test queries – DNS replied with "DNS record does not exist".
am
LTE
Snip
pet of test com
mands:
Res
o
lu
tion:
Overview
This chap
ters cov
ers the ses
sion trou
bleshoot
ing com
mands used in ASR 5000/ASR 5500.
Here is an ex
am
ple of the com
mand as it would be en
tered into the sys
tem using the IMSI pa
ra
-
me
ter:
notation
username - Name of specific user within current context. Must be followed by user
name
Session Troubleshooting
For mon i
tor sub
scriber, here is an ex
am
ple with some of the more com
mon op
tions (high
-
lighted in blue):
87 - PPPOE (ON )
88 - RTP(IMS) (OFF) 89 - RTCP(IMS) (OFF)
91 - NPDB(IMS) (OFF)
92 - SABP (ON )
94 - SLS (ON )
96 - SBc-AP (ON )
Sam
ple mon
i
tor sub
scriber trace:
----------------------------------------------------------------------
Incoming Call:
----------------------------------------------------------------------
MSID/IMSI : 50604000000002 Callid : 017dc661
IMEI : n/a MSISDN : 9876543210
Username : 9876543210 SessionType : ggsn-pdp-type-ipv4
Status : Active Service Name: local
Src Context : Gn
----------------------------------------------------------------------
Version number: 1
Protocol type: 1 (GTP C/U)
Extended header flag: Not present
Sequence number flag: Present
NPDU number flag: Not present
Message Type: 0x11 (GTP_CREATE_PDP_CONTEXT_RES_MSG)
Message Length: 0x0058 (88)
Tunnel ID: 0x00000400
Sequence Number: 0x7FFF (32767)
GTP HEADER ENDS.
INFORMATION ELEMENTS FOLLOW:
Cause: 0x80 (GTP_REQUEST_ACCEPTED)
Reorder Required: 0x0 (Not present)
Recovery: 0x8D (141)
Tunnel ID Data I: 0x80000006
Tunnel ID Control I: 0x80000006
Charging ID: 0x01AAAAAA
END USER ADDRESS FOLLOWS:
PDP Type Organisation: IETF
PDP Type Number: IPv4
IPv4 Address: 172.1.1.1
END USER ADDRESS ENDS.
PROTOCOL CONFIG. OPTIONS FOLLOW:
IE Length: 0x12 (18)
Configuration Protocol: (0) PPP
Extension Bit: (1)
Overview
The util
ity dis
plays in
for
mation for all ses
sions that are currently being processed. This can
cause a large amount of data to be displayed, depending on the number of pro
to
cols moni
tored,
and the num ber of ses
sions in progress. Log ging should be enabled on the termi
nal client to
record all of the in
for
mation that is gen
er
ated. A moni
tor pro
to
col on a busy in
ter
face (for ex
-
ample, S1AP) should be planned dur ing a main
tenance window.
Troubleshooting Command
Below are the in
struc
tions to start a mon
i
tor pro
to
col ses
sion:
1 Run from the Exec mode by entering the "monitor protocol" command. An output listing
all the currently available protocols, each with an assigned number, is displayed.
4 Press B to begin the protocol monitor, and the following warning will be displayed:
WARNING!!! You have selected options that can DISRUPT USER SERVICE Existing CALLS MAY BE
DROPPED and/or new CALLS MAY FAIL!!! (Under heavy call load, some debugging output may not be
displayed) Proceed? - Select (Y)es or (N)o
C - Control Events (ON ) D - Data Events (ON ) E - EventID Info (ON ) H - Disply
ethernet (ON ) I - Inbound Events (ON ) O - Outbound Event (ON ) S - Sender Info (OFF) T
- Timestamps (ON ) X - PDU Hexdump (OFF) A - PDU Hex/Ascii (OFF) +/- Verbosity Level
( 1) L - Limit Context (OFF) M - Match Newcalls (ON ) R - RADIUS Dict (no-override) G -
GTPP Dict (no-override) Y - Multi-Call Trace ((OFF)) (Q)uit, <ESC> Prev Menu,
<SPACE> Pause, <ENTER> Re-Display Options
Session Troubleshooting
8 Press the Enter key to refresh the screen and begin monitoring. To quit the protocol
monitor and return to the prompt, press q.
Session Troubleshooting
Overview
This chap
ter pro
vides basic trou
bleshoot
ing com
mands for sub
scriber re
lated is
sues in ASR
5000/ASR 5500.
Troubleshooting Commands
The fol
lowing com mands can be used to debug ses sions. Most of these com
mands sup
port
an op
tion to limit the out
put to one sub
scriber (imsi/msid/user
name/ip/...).
• To look at a specific session, use the "show subscriber imsi <imsi>. Other options can be
used such as username and ip-address:
• In order to view the sessions for a specific access type, this can be specified as in the
following examples: Use the "show subscriber pgw-only" command to view a list of
subscriber sessions on the PGW. Use the "show subscriber ggsn-only" to view a list of
subscriber sessions on the GGSN. Other access types can be specified as required.
pmip-pdn-type-ipv4 : 1677
pmip-pdn-type-ipv6 : 55391
pmip-pdn-type-ipv4-ipv6 : 57441
gtp-pdn-type-ipv4 : 38069
gtp-pdn-type-ipv6 : 618710
gtp-pdn-type-ipv4-ipv6 : 594264
ip-type-static : 34980 ip-type-local-pool : 655423
ip-type-unknown : 1042
• To show session info for a specific access type, use "show subscriber pgw-only imsi
<imsi>":
• To show detailed output of the session, use "show subscriber full imsi <imsi>.
• To show detailed output of the session for a certain access type, use "show subscriber
pgw-only full imsi <imsi>". Different access types can be specified here, but the
difference with the pgw-only keyword versus the other call types is that the output for
the "full" version of pgw-only is different than the standard "show sub full" command,
whereas for the other call types, the output is the same as the standard "show sub full".
Session Troubleshooting
Username: 123456792@web
Subscriber Type : Home
Status : Online/Active
State : Connected
Connect Time : Tue May 5 14:38:05 2015
Auto Delete : No
Bearer QoS:
QCI: 6
ARP: 0x04
PCI: 0 (Enabled)
PL : 1
PVI: 0 (Enabled)
MBR Uplink(bps): 0 MBR Downlink(bps): 0
GBR Uplink(bps): 0 GBR Downlink(bps): 0
Downlink APN AMBR: 20000 bps Uplink APN AMBR: 10000 bps
Mediation context: None Mediation no early PDUs: Disabled
Mediation No Interims: Disabled Mediation Delay PBA: Disabled
input pkts: 0 output pkts: 0
input bytes: 0 output bytes: 0
input bytes dropped: 0 output bytes dropped: 0
input pkts dropped: 0 output pkts dropped: 0
input pkts dropped due to lorc : 0 output pkts dropped due to lorc :
0
input bytes dropped due to lorc : 0
in packet dropped suspended state: 0 out packet dropped suspended state:
0
in bytes dropped suspended state: 0 out bytes dropped suspended state: 0
in packet dropped overcharge protection: 0 out packet dropped overcharge protection:
0
in bytes dropped overcharge protection: 0 out bytes dropped overcharge protection:
0
in packet dropped sgw restoration state: 0 out packet dropped sgw restoration state:
0
in bytes dropped sgw restoration state: 0 out bytes dropped sgw restoration state:
0
pk rate from user(bps): 0 pk rate to user(bps): 0
ave rate from user(bps): 0 ave rate to user(bps): 0
sust rate from user(bps): 0 sust rate to user(bps): 0
Session Troubleshooting
• To show the session data rate, use "show subscriber data-rate". This command shows
the data rate from and to the user in bytes and packets per second. As with all
"show"sub commands, further qualifiers can be used to narrow to a specific IP pool,
sessmgr instance #, etc.
Session Troubleshooting
• To show the session debug info, use "show subscriber debug-info". This command
provides additional engineering level info:
NEMO Detail:
Peer bond: NO Peer Callid: 00000000
Mode: N/A Multi-VRF Enabled: NO
NEMO Detail:
Peer bond: NO Peer Callid: 00000000
Mode: N/A Multi-VRF Enabled: NO
NEMO Detail:
Peer bond: NO Peer Callid: 00000000
Mode: N/A Multi-VRF Enabled: NO
• To show the active charging sessions on the chassis, use "show active-charging session
summary":
• To see the active charging session options for a specific session, use "show active-
charging sessions full imsi <imsi>":
NAS-ID: n/a
Access-NAS-ID(FA): n/a
3GPP2-BSID: n/a
Access-Correlation-ID(FA): n/a
3GPP2-Correlation-ID: n/a
MEID: n/a
Carrier-ID: 123456 ESN: n/a
PCO:
Value/Interface: n/a
Uplink Bytes: 0 Downlink Bytes: 0
Uplink Packets: 0 Downlink Packets: 0
Injected Uplink Packets: 0 Injected Downlink Packets: 0
Buffered Uplink Packets: 0 Buffered Downlink Packets: 0
Buffered Uplink Bytes: 0 Buffered Downlink Bytes: 0
Uplink Packets in Buffer: 0 Downlink Packets in Buffer: 0
Buff Over-limit Uplink Pkts: 0 Buff Over-limit Downlink Pkts: 0
Dropped Uplink Packets: 0 Dropped Downlink Packets: 0
Uplink Out of Order Packets: 0 Downlink Out of Order Packets: 0
Dyn FUI Redirected Flows: 0 Dyn FUI Discarded Pkts: 0
ITC Terminated Flows: 0 ITC Redirected Flows: 0
ITC Dropped Packets: 0 ITC ToS Remarked Packets: 0
ITC Dropped Upl Pkts: 0 ITC Dropped Dnl Pkts: 0
Flow action Terminated Flows: 0
PP Flow action Terminated Flows: 0
PP Dropped Packets: 0
CC Dropped Uplink Packets: 0 CC Dropped Uplink Bytes: 0
CC Dropped Downlink Packets: 0 CC Dropped Downlink Bytes: 0
NRUPC Req Made: 0 NRUPC Req Success: 0
NRUPC Req Failed: 0 NRUPC Req Time Out: 0
Bearer Bandwidth Limiting: Enabled
Uplink MBR (bps): 0 Downlink MBR (bps): 0
Uplink GBR (bps): 0 Downlink GBR (bps): 0
Uplink Burst (bytes): 0 Downlink Burst (bytes): 0
Dropped Uplink Pkts: 0 Dropped Downlink Pkts: 0
Creation Time: Tuesday May 05 18:38:05 GMT 2015
Last Pkt Time:
Duration: 00h:07m:19s
Rule Base name: lab_test
URL-Redir First-Request-Only: n/a
Session Troubleshooting
• To see the status of the sessions on the chassis, use "show session progress". This
command will show the total sessions and break out the various access types:
• To see the history of sessions for the last 48 hours, use "show session counters
historical all". This command will show sessions that arrived within a 15 minute interval,
including calls rejected and failed. This can be a good historical record showing where a
problem occurred and the related impact. The command applies to all calls on the
chassis and so if there is more than one call type, it will not distinguish between them,
and so one should use other stats that are protocol specific in those scenarios.
• To see the various disconnect reasons for the sessions, use "show session disconnect-
reasons":
• To see the setup time for the sessions, use "show session setuptime". This command
can help determine if there are delays in the setup time for the sessions, which could be
caused by problems both internal and external to the chassis:
• To see the details for the various subsystems for the sessions, use "show session
subsystem [facility <sessmgr>] [instance <instance> | all]". This command provides
engineering level statistics for various tasks including session manager (sessmgr), aaa
manager (aaamgr), etc. and is very powerful.
Session logging
Overview
An overview of differ
ent logging op
tions re
lated to all soft
ware processes and sub
scriber traces
are cov
ered in this chapter.
Logging Description
There are four types of log
ging that can be con
fig
ured and viewed on the sys
tem, and a few of
them are set from config mode.
config
logging filter runtime facility <facility name> level <level> [critical-info |no-critical-
info]
end
Re
place <fa ity name>with the rel
cil e
vant one from the avail
able list.
wimax-r6 - WiMAX R6
[local]asr5000(config)#
Logging Mon i
tor: can be en abled to record all activ
i
ties as
soci
ated with a particu
lar ses
sion in
-
cluding the same out put as cap tured with mon i
tor sub scriber, plus all of the chas sis in
ter
nal
process
ing, which can be very ex ten
sive. This option is helpful in trou
bleshoot ing in
ter
mittent
is
sues pertaining to a sub scriber and is useful when there is a need to keep trace run ning for a
longer time un at
tended. The op tions avail
able are msid / user name or ip ad dress. It will not
capture an ex ist
ing call if enabled while the call is con nected - the call must re connect to be
captured. This log ging is enabled / disabled via con fig mode, e.g.
To en
able:
config
logging monitor msid <Mobile Station Identifier number>
end
To re
move:
config
no logging monitor msid <Mobile Station Identifier number>
end
Trace log
ging: can be used to quickly iso late is
sues that may arise for a spe
cific session. Traces
can be taken for a specific call iden
ti
fi
ca
tion (cal
lid) number, IP ad
dress, mo
bile station identi
fi
-
ca
tion (MSID) num ber, or user name. Trace log ging can only be en abled successfully for cur -
rently con
nected sub scriber ses sions (which is why it has the op tion to mon i
tor by call id,
whereas log
ging mon i
tor does not).
Active log
ging: can be used for trou bleshoot ing ac
tive sessions. This can be con fig
ured on a per
adminis
tra
tive user basis (i.e. each adminis
trative user can have their own in di
vidual active log-
ging sessions). Ac
tive logs con fig
ured by an ad minis
tra
tive user in one CLI in stance can not be
viewed by an ad min is
tra
tive user in a dif
ferent CLI instance. Each ac tive log can be con fig
ured
with fil
ter and dis
play prop er
ties that are independent of those con fig
ured glob ally for the sys -
tem in the run time log. Ac tive logs are dis
played in real time, on the man age
ment user’s ter mi-
nal dis
play, as they are gen er
ated. The log ging con fig
ured is tem po
rary while the CLI ses sion is
ac
tive, and is lost if the session times out or the CLI win dow is closed. e.g. type "log ging ac ‐
tive" to start active logging and "no log ging ac tive" to stop it.
Ac
tive log
ging can be ex
e
cuted with the fol
low
ing com
mand:
Here is an ex
am
ple of the com
mand and out
put:
crit
i
cal: logs only those events in
di
cating a se
ri
ous error has oc curred and is caus ing the sys
-
tem or a system com ponent to cease function
ing. This is the high
est sever
ity level.
un
usual: logs events that are very un
usual and may need to be in
ves
ti
gated. This level also logs
events with a higher sever
ity level.
Static Routing
Overview
This sec
tion covers sta
tic rout
ing on ASR 5000/5500 as well as trou
bleshoot
ing com
mands
and an ex
ample scenario.
Configuration
ASR 5000/5500 sup ports sta
tic routes sim
i
lar to con
ven
tional routers. The sta
tic routes con
-
fig
u
ra
tion is per con
text.
ip route [net
work/net
mask] [nex
thop-ad
dress] [in
ter
face name]
• show ip route
• show ip static-route
• show ip interface
Below is a sam
ple out
put of "show ip sta
tic-route".
Case study
What hap
pens if there are 2 sta
tic route en
tries for same route?
For ex
am
ple, in the fol
low
ing sta
tic route en
tries, which route will be used?
The rout
ing table will look like this. Both routes are se
lected as the best route.
OSPF Routing
Overview
This sec
tion cov
ers OSPF rout
ing on ASR 5000 / 5500 with mul
ti
ple trou
bleshoot
ing com
-
mands and two scenar
ios.
OSPF Version 2
OSPF is a link-state rout
ing pro
to
col that employs an in
te
rior gateway proto
col (IGP) to route
IP pack
ets using the shortest path first, based solely on the desti
na
tion IP ad
dress in the IP
packet header. OSPF-routed IP pack
ets are not en
cap
su
lated in any ad
di
tional pro
to
col head
ers
as they tran
sit the net
work.
OSPF allows sets of networks to be grouped to gether. Such a grouping is called an area. The
topol
ogy of this area is hid
den from the rest of the AS, which enables a sig
nif
i
cant re
duc
tion in
rout
ing traf
fic. Also, rout
ing within the area is deter
mined only by the area‘s own topol ogy,
lend
ing the area pro
tec
tion from bad rout
ing data. An area is a gen
er
al
iza
tion of an IP sub
net
-
Routing Protocol
ted net
work. OSPF en
ables the flex
i
ble con
fig
u
ra
tion of IP sub
nets so that each route dis
trib
-
uted by OSPF has a des ti
na
tion and mask. Two dif fer
ent sub nets of the same IP net work num -
ber may have dif fer
ent masks (i.e. vari able-length sub-net ting). A packet is routed to the best
(longest or most spe cific) match. Ex ter
nally de
rived rout
ing data (for ex ample, routes learned
from an ex te
rior protocol such as BGP) can be tagged by the ad vertis
ing router, enabling the
passing of ad
di
tional information between routers on the bound ary of the AS. OSPF uses a link-
state al
go
rithm to build and cal culate the short
est path to all known des ti
na
tions.
Im
ple
ment
ing basic OSPFv2 Con
fig
u
ra
tion
OSPFv3
OSPFv3 is very sim i
lar to OSPF ver sion 2. However, OSPFv3 ex pands on OSPF ver sion 2 to pro-
vide sup
port for IPv6 rout ing prefixes and the larger size of IPv6 ad
dresses. OSPFv3 dy nami
cally
learns and advertises (redis
trib
utes) IPv6 routes within an OSPFv3 rout ing domain. Enabling
OSPFv3 on an in terface will cause a rout ing process and its asso
ci
ated con
fig
u
ra
tion to be cre-
ated.
Redistrib
ut
ing routes into OSPFv3 (op tional config
ura
tion) means any routes from an other pro
-
tocol that meet a spec i
fied cri
te
rion, such as route type, met ric, or rule within a route-map, are
redis
tributed using the OSPFv3 pro to
col to all OSPF areas. In ASR 5500, the de signer has the
flex
i
bil
ity to config
ure and invoke the func tional
ity of OSPFv2, OSPFv3, sta tic routes – for IPv4
and IPv6, de pend ing on the specific network design needs.
OSPF re lated commands can be run only from the con text where OSPF is con fig
ured. Given
below are some CLIs that can be used for trou
bleshoot
ing on an ASR 5x00 plat
form followed by
sample output from a lab setup.
• show ip ospf
• show ip ospf neighbor details
• show ip ospf database verbose
• show ip ospf routes
• ping <ospf neighbor ip> src <interface / loopback address>
For cap
tur
ing ac
tive logs
# show ip ospf
LS age: 1052
Options: 0x2 (*|-|-|-|-|-|E|-)
Flags: 0x2 : ASBR
LS Type: Router Link States (1)
Link State ID: 192.168.7.2
Advertising Router: 192.168.7.2
LS Seq Number: 8000100e
Checksum: 0xd694
Length: 36
Number of Links: 1
LS age: 1482
Options: 0x2 (*|-|-|-|-|-|E|-)
LS Type: AS External Link States (5)
Link State ID: 192.168.7.2 (External Network Number)
Advertising Router: 192.168.7.2
LS Seq Number: 80001001
Checksum: 0x1faa
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 20
Forward Address: 0.0.0.0
External Route Tag: 0
C 192.168.10.0/24 [10]
via 192.168.10.249, gi
Example Scenarios
Prob
lem De
scrip
tion:
OSPF neigh
bor does not move to Full sta
tus, but runs into a loop Init -> ExS
tart.
Analy
sis:
DBsmL
DBsmL
DBsmL
If the neigh
bor was in "Full" sta
tus be
fore, the fol
low
ing snmp trap will be seen.
Res
o
lu
tion:
OSPF Neigh
bor
ship Down
Prob
lem De
scrip
tion:
OSPF neigh
bor
ship went down for OSPF neigh
bors and will not re
cover.
Analy
sis:
Wed Jun 04 21:57:13 2014 Internal trap notification 1001 (OSPFNeighborDown) vpn ip_pool_ctx interface
ip_pool_ctx_if3 ipaddr 172.25.68.42 ospf neighbor 172.16.255.67 transitioned from state full to down
Wed Jun 04 21:57:13 2014 Internal trap notification 1001 (OSPFNeighborDown) vpn ip_pool_ctx interface
ip_pool_ctx_if4 ipaddr 172.25.68.46 ospf neighbor 172.16.255.67 transitioned from state full to down
Neighbor ID Pri State Dead Time Address Interface RXmtL RqstL DBsmL
bonus_fails -------------------------------------------------------------------+ |
length_fails ------------------------------------------------------------+ | |
data_fails --------------------------------------------------------+ | | |
last_valid_packet_number ---------------------------------+ | | | |
valid_rcvd_pkts ---------------------------------+ | | | | |
received_pkts --------------------------+ | | | | | |
sent_pkts ---------------------+ | | | | | | |
gen_pkts -------------+ | | | | | | | |
cpu/inst + | | | | | | | | |
sl | | | | | | | | | |
1 0/0 -> 6076 6076 204 204 6022 -> 0 0 0 5818
2 0/0 -> 526736 526736 526736 526736 526736 -> 0 0 0 0
4 0/0 -> 2114170681 2114170681 2114170681 2114170681 2114170681 -> 0 0 0
0
11 0/0 -> 2114102704 2114102704 2114102704 2114102704 2114102704 -> 0 0 0
0
14 0/0 -> 537803 537803 537803 537803 537803 -> 0 0 0 0
16 0/0 -> 209685 209685 209685 209685 209685 -> 0 0 0 0
Routing Protocol
lost_fails --------------------------------------------------------------------------+
bonus_fails -------------------------------------------------------------------+ |
length_fails ------------------------------------------------------------+ | |
data_fails --------------------------------------------------------+ | | |
last_valid_packet_number ---------------------------------+ | | | |
valid_rcvd_pkts ---------------------------------+ | | | | |
received_pkts --------------------------+ | | | | | |
sent_pkts ---------------------+ | | | | | | |
gen_pkts -------------+ | | | | | | | |
cpu/inst + | | | | | | | | |
sl | | | | | | | | | |
1 0/0 -> 12950 12950 453 453 12949 -> 0 0 0 12496
2 0/0 -> 533610 533610 533610 533610 533610 -> 0 0 0 0
4 0/0 -> 2114177548 2114177548 2114177548 2114177548 2114177548 -> 0 0 0
0
11 0/0 -> 2114109578 2114109578 2114109578 2114109578 2114109578 -> 0 0 0
0
14 0/0 -> 544677 544677 544677 544677 544677 -> 0 0 0 0
16 0/0 -> 216552 216552 216552 216552 216552 -> 0 0 0 0
Res
o
lu
tion:
The Demux card was mi grated and the SFT counter kept in crement ing. This pointed to a tran
-
sient issue on the card. OSPF was reestab
lished to Full state after the card migra
tion.
Routing Protocol
BGP Routing
Overview
This sec
tion cov
ers BGP rout
ing on ASR 5000/ASR 5500 with mul
ti
ple trou
bleshoot
ing com
-
mands and two scenar
ios.
BGP Configuration
ASR 5000/ASR 5500 sup ports Border Gateway Pro
to
col (BGP) which is an inter-AS routing
pro
to
col. BGP also may be used as a moni
tor
ing mech
a
nism for In
ter-chas
sis Ses
sion Re
covery
(ICSR). BGP-4 may trig
ger an active-to-standby switchover to keep subscriber ser
vices from
being in
ter
rupted.
Image: Topol
ogy
Routing Protocol
config
context contextname
The fol
lowing commands should only be done upon rec ommenda
tion from Cisco Support as
in
creas
ing the log
ging too high may risk stress on the sys
tem and impact sub
scribers.
Example Scenarios
Prob
lem De
scrip
tion:
Analy
sis:
This is ex
pected be
hav
ior by de
sign.
After the Demux card switchover event, the BGP process is restarted on the new Demux card.
Routing Protocol
Res
o
lu
tion:
BGP is miss
ing path from MPLS VPN (vrf). ASR 5000 was see ing only one route from one CE.
The ex
pected behav
ior was two paths to the af
fected net
works:
Logs col
lec
tion:
Analy
sis:
Res
o
lu
tion:
ip vrf test2
rd 65392:700
route-tar
get ex
port 65392:700
route-tar
get im
port 65392:700
Routing Protocol
The so
lu
tion was to change the rd in one of the routers:
ip vrf test2
rd 65392:706 <<<<<< dif
fer
ent rd
route-tar
get ex
port 65392:700
route-tar
get im
port 65392:700
Chang ing the rd made the route unique and load bal
anc
ing can now be be achieved.
After this change the out
put was seen as below (two paths to the same net
work):
Local
Origin Incomplete, metric 50, localpref 100, weight 0, valid, internal, best
In this case, the symptom was seen in the ASR 5k but the ori gin of the issue was on the up-
stream routers. Be cause there were 2 PEs ad ver
tis
ing the same net
works, they are re
quired to
have dif
fer
ent rd (route-dis
tin
guisher) val
ues to make the routes unique for MP-BGP.
Interchassis Session
Recovery
Interchassis Session Recovery
Overview
This sec
tion talks about how re
dun
dancy works in the ASR 5x00 platform using ICSR, pro
vides
trou
bleshooting commands, and then goes over com
mon is
sues re
lated to ICSR.
ICSR Architecture
Inter Chassis Ses
sion Re covery (ICSR) provides a solu
tion using a pair of re
dundant nodes (i.e.
HA and PGWs) where one node is Ac tive, handling subscribers, and the other is Standby, ready
to handle those sub scribers in case the Ac tive chassis has an issue and needs to switch over.
This archi
tec
ture is ben e
fi
cial not only for outage sce nar
ios but also for up
grades where the
subscribers can main tain their connections while both nodes are up graded, one after the other,
and switch ing over in be
tween.
Al
though not re quired, both nodes are typ i
cally ge
o
graph i
cally sep
a
rated from each other. They
would be con fig
ured simi
larly with the same ser vices, IP pools, contexts, and hard
ware, and
even some of the same loop back addresses used by the ser vices run
ning on the chassis. Each
chas
sis will have its own specific phys
i
cal ad
dresses.
Be
sides the abil ity of being able to man u
ally switch ses sions over to the Standby, the Ac tive
chassis is also able to moni
tor it
self for BGP, Di
am e
ter, and Radius con nectiv
ity in var
i
ous con-
texts. If the spec i
fied proto
cols being mon i
tored lose con nectiv
ity, then an un planned
switchover will occur.
port network in the event of a switchover. To do this, BGP is used, and in par tic
u
lar, BGP han -
dles route mod i
fiers. The Ac tive chassis will al
ways be the one that has the lower-num bered
BGP route mod i
fier, while the Standby chas sis will have a route mod i
fier value (usu ally one)
greater than the ac tive. This will re
sult in more pre-pends added to BGP packet ad ver tisements
com ing from the nodes, and as a re sult, net
work traf fic will be pointed to the chas sis send ing
updates that have fewer pre-pends. BGP route ad vertisements from the Standby are highly re -
stricted and most of the loop backs are in 'down' state. This fur ther re
duces the like li
hood of in -
tro
ducing rout
ing issues into the trans port network.
Useful commands:
ICSR Troubleshooting
Overview
This chap
ter talks about known SRP is
sues and how to trou
bleshoot them.
9 Verify that both chassis are configured with the correct Peer Remote Address
10 If all of the above has been verified, and the SRP Connection remains down, please open
a case with Cisco
5 Display the configuration that SRP uses to generate the SRP Checksum by issuing the
command "show configuration srp"
• Save the displayed data to a text file.
6 Now log into the normally SRP Standby chassis and execute steps 2-5
7 Compare the checksum from both the SRP Active and SRP Standby chassis
• If the checksums are the same and the "Invalid Checksum" is still displayed in
the "show srp info" command, please open a case with Cisco for further
troubleshooting
8 Com
pare the two text files con
tain
ing the "show con
fig
u
ra
tion srp" data line- by-line to
en
sure there are no dif
fer
ences in the con
fig
u
ra
tion. Using a diff util
ity can save time
and avoid miss
ing dif
fer
ences.
If no dif
fer
ences are found in the text files, please open a case with Cisco for fur
ther
trou
bleshoot
ing
9 If differences were found in the configuration, modify the configuration so that both
chassis match.
2 Log into the chassis that was formerly the SRP Active chassis
3 Issue the "show srp info" command and verify the chassis state is "Standby" and the
"Connection State" is "Connected"
5 Review system logs and SNMP Traps that occurred during the period leading up to the
SRP switchover
• show logs
• show snmp trap history verbose
SRPStandy trap indicates when chassis went standby
Some traps to look for include BGPPeerSessionDown,
DiameterIPv6PeerDown, DiameterPeerDown, SRPAAAAuthSrvUnreachable
leading up to the switchover
6 If the SRP switchover cannot be explained with the data collected, please open a case
with Cisco for help in identifying the trigger event.
Warning:
Wait 15 minutes after any crash, task restart, card migration on the same
chassis pair.
Wait 20 minutes after any previous SRP switchover on the same chassis
pair.
Failure to wait the required time could have session and billing impact.
1 Perform the Health Check steps detailed in the ASR 5x00 Health Check chapter
2 Log into the Active and Standby chassis to verify their "Chassis State:"
• context local
• show srp info
Chassis State: Active (From)
Chassis State: Standby (To)
• If the SRP Link has been removed, these SRP commands will give invalid results.
3 Perform these SRP switchover pre-checks on both the Active and Standby chassis.
• show session recovery status verbose
Verify all are in the "Good" state
• show card table
Verify the status of all cards (Active/Standby)
• show task resources | grep -v good
Should only show the total
• show crash list
Verify there was no new crash within the last 15 minutes
5 Verify all sessmgrs are in the standby-connected state on the Standby chassis
• show srp checkpoint statistics | grep Sessmgrs
Verify the "Number of Sessmgrs:" is equal to "Sessmgrs in Standby-
Connected state:"
If these values ARE NOT the same, do not proceed with an SRP
switchover, and open a case with Cisco for further troubleshooting
6 Validate the Active chassis is ready for the SRP switchover
• srp validate-configuration
• srp validate-switchover
Wait at least 15 seconds
• show srp info | grep "Last Validate Switchover Status:"
The "Last Validate Switchover Status:" should display "Remote Chassis -
Ready for Switchover"
Please open a case with Cisco for further troubleshooting if "Remote
Chassis - Ready for Switchover" is not displayed after waiting at
least 10 minutes
7 The system is now ready for a SRP switchover
• srp initiate-switchover
• Answer "yes" when the system asks "Are you sure?"
• After a successful SRP switchover the SRP Active chassis will now become the
SRP Standby chassis
7 Perform the Health Check steps detailed in the ASR 5x00 Health Check chapter.
Radius
Radius
Overview
Ra
dius au
then ti
cation and ac
counting are typi
cally com po nents on call flows for CDMA, UMTS,
and LTE. Some times when trou bleshoot ing call fail
ures, it is not im
mediately ob
vi
ous that Ra
-
dius au
thenti
cation is the root cause. In this sec tion, vari
ous commands used to de ter
mine
where and if there are Ra dius is
sues will be in
vesti
gated.
For ex am ple, whether it be SIP (Sim ple IP) or MIP (Mo bile IP), a call starts with A11 pro to
col
mes sage ex change followed by some PPP ne goti
a
tion. For SIP, as soon as the user name is re-
ceived dur ing PPP ne goti
a
tion, Radius au thenti
ca
tion can take place, fol lowed by fur ther PPP
negoti
a
tion where com pression is negoti
ated along with the DNS and mo bile IP address being
given out. For MIP (Mo bile IP), un
like for SIP (Sim ple IP), the username is not re ceived during
PPP ne go
ti
a
tion. Rather, the DNS in for
ma tion is given out, and header com pression nego
ti
a
tion
takes place. At this point PPP is fin ished and MIP (Mo bile IP) proto
col takes over, with MIP (Mo -
bile IP) router ad ver
tise
ment and a MIP (Mo bile IP) RRQ being re ceived with the user name, at
which time Ra dius au
then ti
ca
tion takes place.
Fi
nally Ra
dius ac
count
ing takes place. Ra
dius ac
count
ing also takes place at call tear
down.
It is rec
om
mended to study some ex
am
ple traces of MIP (Mo
bile IP) and SIP (Sim
ple IP) or
what
ever call-con
trol pro
to
col is being used, to un
der
stand the full call flow and to see how au
-
Radius
then
ti
ca
tion and account
ing play a role. Grab work using mon
bing a trace from the live net i
tor
sub
scriber is an easy way to get started.
ra
dius (ac
count
ing) time
out 3
ra
dius (ac
count
ing) max-re
tries 5
This means that a re quest can be sent a total of max-re tries + 1 times until it gives up on the
par
ticu
lar Ra
dius server being tried. At this point, it will try the same sequence to the next Ra -
dius server in order. If each of the servers have been tried the max-re tries + 1 times with
out re
-
sponse, then the call will be re jected, as
suming there is no other rea son for fail
ure up to that
point.
ra
dius (ac
count
ing) max-trans
mis
sions 256
For ex
ample if this is set = 1, then even if there is a sec
ondary server, it will never be at
tempted
be
cause only one at tempt for a spe cific sub
scriber setup will ever even be attempted.
The "show ra dius coun ters all" command is very valu able in tracking success and failures on a
server basis, and it is very im portant to under
stand the mean ing of the vari
ous coun ters that
com pose this com mand, as it may not be ob vi
ous. In the following output, note "Ac cess-Re -
quest Sent" = 1, while "Ac cess-Re quest Retried" = 3. So, any given new re quest to a par tic
ular
Radius server is only counted once, and all the re tries are counted sep a
rately. In this case, that
is a total of 3 + 1 = 4 ac
cess re
quests sent.
Radius max-retries 3
Radius server 192.168.50.200 encrypted key 01abd002c82b4a2c port 1812 priority 1
Radius server 192.168.50.250 encrypted key 01abd002c82b4a2c port 1812 priority 2
Access-Accept Received: 0
Access-Reject Received: 0
Access-Reject Received with DMU Attributes: 0
Access-Request Timeouts: 1
Access-Request Current Consecutive Failures in a mgr: 1
Access-Request Response Bad Authenticator Received: 0
Access-Request Response Malformed Received: 0
Access-Request Response Malformed Attribute Received: 0
Access-Request Response Unknown Type Received: 0
Access-Request Response Dropped: 0
Access-Request Response Last Round Trip Time: 0.0 ms
Note also that timeouts are NOT counted as failures, the re
sult being that the num
ber of Ac-
cess-Accepts re
ceived and Access-Re
jects re
ceived will not add up to Ac cess-Re
quest Sent if
there are any time
outs.
When
ever run
ning Ra
dius com
mands, be sure to be in the con
text where the aaa group that is
being trou
bleshot is lo
cated.
show ra
dius coun
ters { {all | server <server IP>} [in
stance <aaamgr #>] | sum
mary}
show ra
dius {ac
count
ing | au
then
ti tion} servers de
ca tail
Radius
• in
stance # op
tion (CLI test-comm
nds pass
word re
quired) is valu
able when trou
-
bleshoot
ing a par
tic
u
lar aaamgr in
stance
• show sup
port de
tails does not con
tain “show ra
dius [[ac
count
ing] [au
then
ti
ca
-
tion]] servers de
tail”
• “show ra
dius servers de
tail” only saves the last ten Ra
dius state changes – so run the
command mul
ti
ple times to get a com
plete his
tory of trou
bleshoot
ing over an ex
tended
pe
riod.
• clear
ing the Ra
dius coun
ters is a good idea if mon
i
tor
ing the coun
ters over a time pe
-
riod.
• The sum
mary ver
sion of "show ra
dius coun
ters" groups to
gether all the coun
ters of a
spe
cific type (ac
count
ing, au
then
ti
ca
tion, probe, etc.).
• Try
ing to look for exact cor
re
la
tions of var
i
ous coun
ters, even over a mon
i
tored pe
riod,
can be tricky. Check for gen
eral pat
terns, as get
ting too spe
cific may not be fea
si
ble.
The orig i
nal approach involves keep ing track of the num ber of failures that have oc curred in a
row for a par tic
u
lar aaamgr process. An aaamgr process is re sponsible for all Ra
dius mes sage
processing and ex changes with a Ra dius server, and many aaamgr processes will exist on all ac -
tive proces sor cards (DPC or PSC) on a chas sis. Each one paired with sess mgr processes (which
are main processes re sponsi
ble for call con trol). "show task re sources fa cil
ity aaamgr all" com -
mand can be used to view all the aaamgr processes. A par tic
u
lar aaamgr process will there fore
be pro cess
ing Ra dius messages for many calls, not just a sin gle call, and this algo
rithm in volves
tracking how many times in a row a par ticular aaamgr process has failed to get a re sponse to
the same re quest it has had to re send - an "Ac cess-Request Time out" as de scribed in ear lier
section. The re spec
tive counter "Ac cess-Re quest Cur rent Con secu
tive Failures in a mgr" is in -
cremented when this oc curs, and the "show ra dius [[account ing] [au thenti
ca
tion]] servers
detail" com mand will in di
cate the time stamps of the Ra dius state change from Ac tive to Not
Respond ing (but no SNMP trap or logs will be gen er
ated). For exam ple:
Radius
Event History:
2008-Nov-28+23:18:36 Active
2008-Nov-28+23:18:57 Not Responding
2008-Nov-28+23:19:12 Active
2008-Nov-28+23:19:30 Not Responding
2008-Nov-28+23:19:36 Active
2008-Nov-28+23:20:57 Not Responding
2008-Nov-28+23:21:12 Active
2008-Nov-28+23:22:31 Not Responding
2008-Nov-28+23:22:36 Active
2008-Nov-28+23:23:30 Not Responding
Radius
ra
dius (ac
count
ing) de
tect-dead-server con
sec
u
tive-fail
ures 4
ra
dius (ac
count
ing) dead
time 10
Note: Only ONE trap will be trig gered regardless of how many aaam grs get into this state at
once, in order to avoid overload
ing the num ber of traps trig
gered. The in stance # of the log
ver
sion of the trap (as shown above) will be the man age
ment aaamgr in stance which re sides ei
-
ther on the SMC or the MIO and does not ac tu
ally process subscriber authenti
ca
tion/account-
ing traf
fic.
Event History:
2008-Nov-28+21:59:12 Down
2008-Nov-28+22:28:29 Active
2008-Nov-28+22:28:57 Not Responding
2008-Nov-28+22:32:12 Down
2008-Nov-28+23:01:57 Active
2008-Nov-28+23:02:12 Not Responding
2008-Nov-28+23:05:12 Down
2008-Nov-28+23:19:29 Active
2008-Nov-28+23:19:57 Not Responding
2008-Nov-28+23:22:12 Down
The use case for this would be a sce nario where a burst in Radius traf
fic causes the consec
utive
fail
ures to trig
ger, but marking a server down im medi
ately as a re
sult is not de
sired. Rather, the
server will only be marked down after a spe cific pe
riod of time passes where no re sponses are
re
ceived, effec
tively rep
re
senting true server un-reacha
bil
ity.
Here is an ex
am
ple au
then
ti
ca
tion re
quest/re
sponse for the above set
tings:
Service-Type = Framed
Framed-Protocol = PPP
NAS-IP-Address = 192.168.50.151
Acct-Session-Id = 00000000
NAS-Port-Type = HRPD
3GPP2-MIP-HA-Address = 255.255.255.255
3GPP2-Correlation-Id = 00000000
NAS-Port = 4294967295
Called-Station-ID = 00
Code: 2 (Access-Accept)
Id: 16
Length: 34
Authenticator: 21 99 F4 4C F8 5D F8 28 99 C6 B8 D9 F9 9F 42 70
User-Password = testpassword
The same SNMP traps are used to sig nify the un
reach
able/down and reach
able/up Ra
dius
states as with the orig
i
nal ap
proach:
Fri Jun 27 01:00:02 2008 Internal trap notification 216 (ThreshAAAAuthFail) threshold 1000
measured value 2370
Fri Jun 27 01:10:02 2008 Internal trap notification 217 (ThreshClearAAAAuthFail) threshold 500
measured value 25
Overview
This chapter dis
cusses troubleshoot
ing tech
niques for Ra
dius-re
lated is
sues to do with ac
-
count
ing and authen
ti
ca
tion.
Radius Overload
If the Ra
dius server is getting overloaded, de crease the load by de creas ing the value (de
fault
256) config
ured for “ra
dius (accounting) max-out standing”, which sets a limit on the num ber of
outstand
ing (unan swered) requests for any given aaamgr process. If the limit is reached, logs
may in cate this: “Failed to as
di sign mes sage id for ra
dius authenti
ca
tion server x.x.x.x:1812”.
There is no way to specifi
cally rate-limit au
thenti
ca
tion/ac count
ing mes sages.
Radius Archiving
A fea
ture for account ing that may be turned on is archiv ing, enabled with the com mand “radius
ac
counting archive”. This al lows for accounting pack
ets that have failed to be re sponded to, to
be re-queued and sent at a later time when the aaamgr processes have free cy cles and are not
busy processing au thenti
ca
tion requests. The idea is that time li
ness of ac
count ing packets is
not as important as au thenti
ca
tion, though this may not be true in all sit u
a
tions. Note though
there is no way to flush out ex ist
ing archived messages. The "show ra dius" counter “Ac count -
ing-Request Pend ing” will show the num ber of archived account ing mes
sages, which could in -
crease over time if the ac count ing server gets bogged down. On the other hand, the “Ac cess-
Request Pend ing” counter for Ac cess-Re
quests will decrement when it has given up try ing to
authen
ti
cate for a given call, which will al
ways be a limited period of time (vs. ac
counting which
could theo
reti
cally be tried for
ever).
ra
dius test
This com mand (as shown below) sends a basic au thenti
ca
tion re
quest or accounting start and
stop requests and waits for a re
sponse. For authen
ti
ca
tion, use any user name and pass word, in
which case a reject re
sponse will be re
ceived, con
firming that Radius is work
ing as de
signed, or
use a known work ing user
name/pass word, in which case an ac cept response should be re -
ceived.
Here is an ex
am
ple out
put from mon
i
tor pro
to
col and run
ning the au
then
ti
ca
tion ver
sion of the
com
mand on lab chas sis:
[source]ASR5000# radius test authentication server 192.168.50.200 port 1812 test test
Code: 2 (Access-Accept)
Id: 5
Length: 34
Authenticator: D7 94 1F 18 CA FE B4 27 17 75 5C 99 9F A8 61 78
User-Password = testpassword
Fail
ure sce
nario ex
am
ple:
As another ex
am
ple, here is a fail
ure due to the wrong se
cret con
fig
ured on the Ra
dius server
(or ASR 5000/ASR 5500):
[source]ASR5000# radius test authentication server 192.168.50.200 port 1812 test test
Response Failure: Bad Authenticator from RADIUS sever, probably mismatched key
Round-trip time for response was 6.3 ms
Radius
Here is an ex
am
ple out
put from run
ning the ac
count
ing ver
sion of the com
mand. A pass
word is
not needed.
Code: 4 (Accounting-Request)
Id: 9
Length: 62
Authenticator: 29 DB F1 0B EC CE 68 DB C7 4D 60 E4 7F A2 D0 3A
User-Name = test
NAS-IP-Address = 192.168.50.151
Acct-Status-Type = Stop
Acct-Session-Id = 00000000
NAS-Identifier = source
Acct-Session-Time = 0
Code: 5 (Accounting-Response)
Id: 9
Length: 20
Authenticator: D8 3D EF 67 EA 75 E0 31 A5 31 7F E8 7E 69 73 DC
In all of the above exam ples, the Radius test mes sage was sent using the man agement aaamgr
lo
cated on the SMC/MIO card. There is a more spe cific ver
sion of the command that can be
run in tech sup port mode that al lows spec i
fy
ing a spe
cific aaamgr in
stance that is sus
pected of
having an issue. For exam ple, in the fol
lowing example no response was re
ceived for in
stance
92 but was re ceived for in
stance 93:
[source]ASR5000# radius test instance 92 authentication server 192.168.1.1 port 1645 test test
[source]CSE2# radius test instance 93 authentication server 192.168.1.1 port 1645 test test
Authentication Successful
User level: unknown(2)
User Privs: CLI
User-Password = testpassword
Accounting Successful
Code: 4 (Accounting-Request)
Id: 6
Length: 74
Authenticator: 90 87 2F 57 14 5E 08 29 74 6A 16 0C 82 C3 CF 01
User-Name = test
NAS-IP-Address = 192.168.50.151
Acct-Status-Type = Start
Acct-Session-Id = 00000000
NAS-Identifier = source
Service-Type = Authenticate_Only
Event-Timestamp = 1235852318
NAS-Port = 4573839
Code: 5 (Accounting-Response)
Id: 6
Length: 20
Authenticator: BC 75 DF 03 E1 BB CE 10 7B 70 85 2B 17 32 B0 8B
Code: 4 (Accounting-Request)
Id: 7
Length: 80
Authenticator: 76 D8 55 DC F4 16 25 D6 6C B1 5A F0 50 60 DF E1
User-Name = test
NAS-IP-Address = 192.168.50.151
Radius
Acct-Status-Type = Stop
Acct-Session-Id = 00000000
NAS-Identifier = source
Service-Type = Authenticate_Only
Event-Timestamp = 1235852318
Acct-Session-Time = 0
NAS-Port = 4573839
Code: 5 (Accounting-Response)
Id: 7
Length: 20
Authenticator: 47 F2 35 A5 86 07 12 EB 48 BC 49 C4 39 79 25 43
Prob
lem De
scrip
tion:
AAAAccSvrUn
reach
able for a sin
gle aaamgr
Log Col
lec
tion:
- various fields will show much higher values for affected aaamgrs (see examples below)
• [source] radius test instance X accounting all test
- any number of servers could fail depending on the root cause
• [source] show radius info group all instance X
- provides the UDP port number (and ip address which one would already know) used
by the aaamgr
• show npu flow ssd slot X verbose
- check to see that all aaamgrs have the proper # of flows assigned for each card
• show npu flow record min-flowid <aaamgr min flow id> max-flowid <aaamgr max flow
id> slot X verbose
- ensure the details of suspected aaamgr flows are correct for each card
• packet capture between NAS IP address / aaamgr instance udp port # AND aaa
accounting server with port #1646 (standard) taken at whatever capture points in the
network are necessary to pin down where packets may be dropping. This point is
critical!
Analy
sis:
The fol
low
ing mes
sages are seen in the logs:
Mon Sep 08 20:41:07 2014 Internal trap notification 52 (CLISessStart) user vendgrp privilege
level Operator ttyname /dev/pts/1
Mon Sep 08 20:41:35 2014 Internal trap notification 53 (CLISessEnd) user vendgrp privilege
level Operator ttyname /dev/pts/1
Mon Sep 08 20:55:49 2014 Internal trap notification 52 (CLISessStart) user vendgrp privilege
level Operator ttyname /dev/pts/1
Mon Sep 08 20:56:17 2014 Internal trap notification 53 (CLISessEnd) user vendgrp privilege
level Operator ttyname /dev/pts/1
Mon Sep 08 21:05:11 2014 Internal trap notification 1221 (MemoryOver) facility npumgr instance
7 card 7 cpu 1 allocated 284967 used 379040
Mon Sep 08 21:05:21 2014 Internal trap notification 1222 (MemoryOverClear) facility npumgr
instance 7 card 7 cpu 1 allocated 284967 used 263428
Mon Sep 08 21:10:43 2014 Internal trap notification 52 (CLISessStart) user vendgrp privilege
level Operator ttyname /dev/pts/1
Mon Sep 08 21:11:11 2014 Internal trap notification 53 (CLISessEnd) user vendgrp privilege
level Operator ttyname /dev/pts/1
address 192.168.86.10
Mon Sep 08 21:21:55 2014 Internal trap notification 43 (AAAAccSvrReachable) server 1 ip
address 192.168.86.10
REF
ER
ENCE SEC
TION
Mon Sep 08 21:05:11 2014 Internal trap notification 1221 (MemoryOver) facility npumgr instance
7 card 7 cpu 1 allocated 284967 used 379040
Mon Sep 08 21:05:21 2014 Internal trap notification 1222 (MemoryOverClear) facility npumgr
instance 7 card 7 cpu 1 allocated 284967 used 263428
Mon Sep 08 21:21:53 2014 Internal trap notification 42 (AAAAccSvrUnreachable) server 1 ip
address 192.168.86.10
Mon Sep 08 21:21:55 2014 Internal trap notification 43 (AAAAccSvrReachable) server 1 ip
address 192.168.86.10
Wed Sep 10 08:36:54 2014 Internal trap notification 42 (AAAAccSvrUnreachable) server 1 ip
address 192.168.86.10
Wed Sep 10 08:37:00 2014 Internal trap notification 43 (AAAAccSvrReachable) server 1 ip
address 192.168.86.10
AAAMgr: Instance 36
39947440 Total aaa requests 17985 Current aaa requests
24614090 Total aaa auth requests 0 Current aaa auth requests
0 Total aaa auth probes 0 Current aaa auth probes
0 Total aaa aggregation requests
0 Current aaa aggregation requests
0 Total aaa auth keepalive 0 Current aaa auth keepalive
15171628 Total aaa acct requests 17985 Current aaa acct requests
0 Total aaa acct keepalive 0 Current aaa acct keepalive
20689536 Total aaa auth success 1322489 Total aaa auth failure
Radius
86719 Total aaa auth purged 1016 Total aaa auth cancelled
0 Total auth keepalive success 0 Total auth keepalive failure
0 Total auth keepalive purged
0 Total aaa aggregation success requests
0 Total aaa aggregation failure requests
0 Total aaa aggregation purged requests
15237 Total aaa auth DMU challenged
17985/70600 aaa request (used/max)
14 Total diameter auth responses dropped
6960270 Total Diameter auth requests 0 Current Diameter auth requests
23995 Total Diameter auth requests retried
52 Total Diameter auth requests dropped
9306676 Total radius auth requests 0 Current radius auth requests
0 Total radius auth requests retried
988 Total radius auth responses dropped
13 Total local auth requests 0 Current local auth requests
8500275 Total pseudo auth requests 0 Current pseudo auth requests
8578 Total null-username auth requests (rejected)
0 Total aggregation responses dropped
15073834 Total aaa acct completed 79763 Total aaa acct purged <<< If issue started
15171628 Total radius acct requests 17985 Current radius acct requests
46 Total radius acct cancelled
AAAMgr: Instance 37
39571859 Total aaa requests 0 Current aaa requests
24368622 Total aaa auth requests 0 Current aaa auth requests
0 Total aaa auth probes 0 Current aaa auth probes
0 Total aaa aggregation requests
0 Current aaa aggregation requests
Radius
Ra
dius ac
count
ing tests fail for this aaamgr for just 192.
168.
86.
10
The NPU flows are clean for aaamgr 36 (flow shown only for one of the slots, it is the same for
all the slots as ex
pected). Note for "show ra dius info", only the aaa servers for the de fault group
are shown with this ver sion of the com mand (see more de tailed ver
sion below), while in this
case the issue is with ONE of the two servers in group aaa-3g. com. A sin
gle NPU flow han dles
all of the connections to all servers from all aaa groups using the same NAS IP ad dress in a par-
tic
ular con
text for a particu
lar aaamgr, and since the issue seems to only af fect a specific aaa
server, and just for ac count ing, then this would un likely be an NPU flow issue any way since
other aaa servers ,in cluding au thenti
ca
tion servers, use the same flow and are not hav ing any
is
sues.
radius deadtime 5
radius accounting deadtime 3
radius detect-dead-server consecutive-failures 4 response-timeout 10
radius accounting detect-dead-server consecutive-failures 4 response-timeout 10
radius max-outstanding 3
radius accounting max-outstanding 3
radius max-retries 0
radius accounting max-retries 3
Radius
radius max-transmissions 1
radius accounting max-transmissions 3
radius accounting timeout 10
radius attribute nas-ip-address address x.x.x.x
radius attribute nas-identifier xxxxxxx#10; no radius authenticate null-username
no radius accounting archive
radius dictionary custom9
radius accounting rp trigger-policy custom
radius server 192.168.86.13 encrypted key
+A0xd66i5ta8rye21r7jg4p5goox3b7ezj2rxyi4z0dinlhx0s5r3s port 1645 priority 1
radius server 192.168.86.13 encrypted key
+A3670dxvh3axm51mi1qi7358jze0n2bc8t7vpu2g0053fci51ay55 port 1645 priority 2
radius accounting server 192.168.86.13 encrypted key
+A2ga9jz7lnkocq0ess8qw10boi50mv8smg45uzdr34zcucy2 port 1646 priority 1
radius accounting server 192.168.87.13 encrypted key
+A1uk43oafg180h2z7v0s79zvdjq0sb3zxbl3hhj81p1fgiw53939n port 1646 priority 2
#exit
radius deadtime 5
radius accounting deadtime 3
radius detect-dead-server consecutive-failures 4 response-timeout 10
radius accounting detect-dead-server consecutive-failures 4 response-timeout 10
radius max-outstanding 3
radius accounting max-outstanding 3
radius max-retries 0
radius accounting max-retries 3
radius max-transmissions 1
radius accounting max-transmissions 3
radius accounting timeout 10
radius attribute nas-identifier xxxxx
no radius authenticate null-username
radius dictionary custom9
radius accounting rp trigger-policy custom
radius server 192.168.86.10 encrypted key
+A3l9q6fr6vuxp71bb7r6q502xlo0a0xnk7feg6a63j0pkgtthurub port 1645 priority 1
radius server 192.168.86.10 encrypted key
+A0l5vd78oxmn542dw0w8uzdhdm72169tkh6ingxp0uprt1sm04sao port 1645 priority 2
Radius
Authentication servers:
---------------------------------------------
Primary authentication server address 192.168.86.10, port 1645
state Not Responding
priority 1
requests outstanding 0
max requests outstanding 3
consecutive failures 0
Secondary authentication server address 192.168.87.10, port 1645
state Active
priority 2
requests outstanding 0
max requests outstanding 3
consecutive failures 0
Radius
Accounting servers:
---------------------------------------------
Primary accounting server address 192.168.86.10, port 1646
state Active
priority 1
requests outstanding 0
max requests outstanding 3
consecutive failures 0
Secondary accounting server address 192.168.87.10, port 1646
state Active
priority 2
requests outstanding 0
max requests outstanding 3
consecutive failures 0
This ex
am
ple out
put was taken from a dif
fer
ent ticket where the issue was on aaamgr 114:
show ra
dius info ra
dius group all in
stance X
Context source:
---------------------------------------------
Accounting servers:
---------------------------------------------
Primary accounting server address 192.168.86.14, port 1646
state Active
priority 1
requests outstanding 0
max requests outstanding 3
consecutive failures 0
Secondary accounting server address 192.168.77.14, port 1646
state Active
priority 2
requests outstanding 0
max requests outstanding 3
consecutive failures 0
consecutive failures 0
Secondary authentication server address 192.168.77.10, port 1645
state Active
priority 2
requests outstanding 0
max requests outstanding 3
consecutive failures 0
Accounting servers:
---------------------------------------------
Primary accounting server address 192.168.86.10, port 1646
state Down
priority 1
requests outstanding 3
max requests outstanding 3
consecutive failures 7
dead time expires in 146 seconds
Secondary accounting server address 192.168.77.10, port 1646
state Active
priority 2
requests outstanding 0
max requests outstanding 3
consecutive failures 0
Authentication servers:
---------------------------------------------
Primary authentication server address 192.168.86.13, port 1645
state Active
priority 1
requests outstanding 0
max requests outstanding 3
consecutive failures 0
Secondary authentication server address 192.168.77.13, port 1645
state Not Responding
priority 2
requests outstanding 0
max requests outstanding 3
consecutive failures 0
Accounting servers:
---------------------------------------------
Primary accounting server address 192.168.86.13, port 1646
state Active
priority 1
requests outstanding 0
max requests outstanding 3
consecutive failures 0
Secondary accounting server address 192.168.77.13, port 1646
state Active
priority 2
requests outstanding 0
max requests outstanding 3
consecutive failures 0
context source
aaa group default
radius attribute nas-ip-address address 10.210.21.234
Radius
Context source:
---------------------------------------------
AAAMGR instance 114: cb-list-en: 1 AAA Group: aaatest.com
Accounting servers:
---------------------------------------------
Primary accounting server address 192.168.86.10, port 1646
state Down
priority 1
requests outstanding 3
max requests outstanding 3
consecutive failures 4
In a cou
ple of hours, only 48 re
sponses have been re
ceived, yet over 100,000 time
outs have oc
-
curred (and that doesn’t in
clude re-trans
mits).
source]5000> show radius counters server 192.168.86.10 instance 114 | grep -E "Accounting-
Timeouts"
[source]5000> show radius counters server 192.168.86.10 instance 114 | grep Accounting
[source]5000> show radius counters server 192.168.86.10 instance 114 | grep Accounting
Wednesday October 22 20:38:57 UTC 2014
Per-Context RADIUS Accounting Counters
Accounting Response
Server-specific Accounting Counters
Accounting server address 192.168.86.10, port 1646:
Accounting-Response Received: 14299891 <== NO successes in 12 minutes
Res
o
lu
tion:
Packet captures were taken be tween a Load Bal ancer and fire wall that were located between
the ASR 5000 and the Ra dius server, and these show that the fire wall was drop ping packets.
This turned out to be a bug in the firewall and so the prob lem would just start oc cur
ing on the
PDSN with out any ap
parent trig
ger (as men tioned ear
lier). Restarting the fail
ing aaamgr task or
doing a PSC migra
tion for the PSC hous ing the aaamgr would not re solve the issue, be
cause the
source UDP port used for Ra dius requests for that partic
u lar aaamgr would not change in the
process of doing so.
Diabase
Overview
This sec
tion cov
ers basic Di
ameter pro
to
col, Di
a
base rout
ing, and gen
eral trou
bleshoot
ing ap
-
proaches for mul
ti
ple sce
nar
ios.
The Di
ameter base pro
to
col was designed to provide an Au
then
ti
ca
tion, Au
tho riza
tion and Ac
-
count
ing (AAA) framework for ap
pli
cations such as net
work ac
cess or IP mobil
ity.
Di
am
e
ter Base pro
to
col han
dles the Di
am
e
ter ses
sion es
tab
lish
ment and con
trol.
A Di
ame
ter session be
tween client and server is estab
lished over TCP (or SCTP for se
cure in
-
ter
change if re
quired) and by ex
changing ca
pa
bil
i
ties and host in
for
ma
tion.
Ca
pa
bil
i
ties ex
change in
for
ma
tion is used be
tween the two end
points to allow or deny ses
sion
connections and to verify both systems use the same appli
ca
tion on top of the Di
am
e
ter proto
-
col ses
sion. Com mon Di am e
ter base messages are ex
changed be tween peers (only peer re
lated,
not subscriber re
lated):
Sam
ple Di
a
base call
flow
Diameter
Mes
sages de
scrip
tion:
Diabase Commands
• show diameter statistics
• show diameter peers full all
• show diameter message-queue counters outbound
• show diameter message-queue counters inbound
• show diameter route status
• show diameter route status full
• show diameter route table wide
• show session disconnect-reasons
Diameter
Example Scenarios
Prob
lem De
scrip
tion:
Cus
tomer no
ticed Di
am
e
ter
Peer
Down in the snmp trap logs.
Logs collection:
1. Col
lect trace from the wire (PCAP, Wire
shark) on the par
tic
u
lar Di
am
e
ter in
ter
face.
Analy
sis:
1 Col
lect "mon
i
tor pro
to
col" / PCAP trace to see com
plete com
mu
ni
ca
tion. In this ex
am
-
ple one CER can be seen where GGSN in
forms OCS what stan
dards it sup
ports. In CEA
it is no
ticed that the OCS (server) in this sce
nario doesn't allow the GGSN (client) to es
-
tab
lish a Di
am
e
ter con
nec
tion.
Hop2Hop-ID: 0x0000020c
End2End-ID: 0x8e4b7ae0
AVP Information:
[M] Origin-Host: 0002-sessmgr.LAB-XT2-2
[M] Origin-Realm: lab.realm
[M] Host-IP-Address: IPv4 192.168.1.2
[M] Vendor-Id: 8164
Product-Name: SSI
[M] Origin-State-Id: 1430845668
[M] Supported-Vendor-Id: 10415
[M] Auth-Application-Id: 4
[M] Inband-Security-Id: NO_INBAND_SECURITY (0)
Firmware-Revision: 52519
Connection statistics:
Connection attempts: 124
Connection failures: 124
Connection reads: 0
Connection starts: 124
Connection disconnects: 0
Connection closes: 124
Connection DHOST requests: 124
Connection DHOST removes: 124
Connection Timeouts: 0
Tc Expire Connection Attempts: 124
Diameter
Res
o
lu
tion :
Mo
bile op
era
tor no
ticed a prob
lem in commu
ni
ca
tion be
tween GGSN and OCS: It seems that
mandatory 3GPP AVPs are missing in CCR mes
sages.
Logs col
lec
tion:
1 Col
lect trace from the wire (PCAP, Wire
shark trace) on Gy in
ter
face.
Analy
sis:
1 It can be seen that com
mu
ni
ca
tion was es
tab
lished (State: OPEN [TCP]) be
tween Client
(GGSN) and server (OCS) but Sup
ported Ven
dor IDs are NONE. This means cur
rently
the OCS server is only using RFC 4006-sup
ported AVPs in this com
mu
ni
ca
tion.
-------------------------------------------------------------------------------
Context: Gi Endpoint: DCCA-OCS
-------------------------------------------------------------------------------
Peer Hostname: minid-ocs
Local Hostname: 0001-sessmgr.LAB-XT2-2
Diameter
2 If com
mu
ni
ca
tion is reset be
tween GGSN (Client) and OCS (Server), the CER/CEA ex
-
change can be seen in Mon
i
tor pro col. AVP Ven
to dor-Id con
tains the IANA "SMI Net
-
work Man
age
ment Pri
vate En
ter
prise Codes"value as
signed to the ven
dor of the Di
am
e
-
ter ap
pli
ca
tion. In com
bi tion with the Sup
na ported-Ven
dor-Id AVP this MAY be used in
order to know which ven
dor spe
cific at
trib
utes may be sent to the peer.
AVP Sup
ported-Ven
dor-Id: Is used in the CER and CEA mes
sages in order to in
form the
peer that the sender sup
ports (a sub
set of) the ven
dor-spe
cific AVPs de
fined by the
ven
dor iden
ti
fied in this AVP.
Res
o
lu
tion :
[Ingress con
text] show di
am
e
ter peers full [end
point <end
point>] | grep –E "Peer Host
-
name|State|CPU"
De
ter
mine which Di
am
e
ter in
ter
faces are af
fected -- One, Some, or All? S6b, S6a, Gx, Gy, Rf, etc.
Impor
tantly, IF for every Di
am
eter in
ter
face there is a di
amproxy instance that has one peer up,
then there should be no sub scriber im pact. Con
versely if there is a Di
ame
ter in
ter
face with a
di
amproxy in stance that has all its peers down, then there will be subscriber impact for sess
m
-
grs that use that diamproxy in
stance.
Diabase Timers
Fol
low
ing is ad
di
tional clar
i
fi
ca
tion for timers that can help with trou
bleshoot
ing Di
a
base com
-
muni
ca
tion is
sues.
use-proxy
origin host test.cisco.in address 10.259.65.49
watchdog-timeout 30
device-watchdog-request max-retries 1
vsa-support negotiated-vendor-ids
max-outstanding 256
response-timeout 60
connection timeout 30
cea-timeout 30
load-balancing-algorithm highest-weight
destination-host-avp session-binding
dpa-timeout 30
no reconnect-timeout
connection retry-timeout 30
route-failure threshold 16
route-failure deadtime 60
route-failure recovery-threshold percent 90
dynamic-route expiry-timeout 86400
dscp be
peer peer1.cisco.in realm cisco.in address x.x.x.x port 3868 send-dpr-before-disconnect
disconnect-cause 1
peer peer2.cisco.in realm cisco.in address y.y.y.y port 3868 send-dpr-before-disconnect
disconnect-cause 1
default dynamic-peer-failure-retry-count
no associate sctp-parameters-template
Con
nec
tion time
out (TCP)
-------------------------------------------------------------------------------
Context: aaa Endpoint: dia-cisco
-------------------------------------------------------------------------------
Peer Hostname: peer1.cisco.in
Diameter
-------------------------------------------------------------------------------
Con
nec
tion retry-time
out (TCP)
Example:
If connec
tion timer is set up for 5 sec
onds, and retry timer for 60 seconds, then client will send
first TCP-SYN and wait for a server re sponse for 5 seconds. If the server doesn’t re
spond, peer
state will be “IDLE”, but another request will be sent 60 sec
onds after first re
quest (not after 5
sec).
The ef
fect of this can be seen in the SNMP trap his
tory, often where a con
nec
tion goes down
and comes up 10 sec onds later.
Note that if a connec tion goes down in OPEN state and is not a re sult of admin dis con nect
(CLI-based) or not a result of remote DPR, then the ASR 5000/ASR 5500 will try to es tablish the
connection imme
diately to pre vent unwanted ses
sion loss due to con nection fail
ure. So, one
will likely find many ex amples in the network of con nections that quickly re-es tab
lish after
going down. If this retry fails, then the sub
se
quent retry will fol
low the retry-timer logic:
Ca
pa
bil
ties Ex
change Time
out
cea-timeout 5
Re
sponse-time
out (Di
a
base)
This is a ses
sion timer at the Di
a
base level. The con
fig
ured value is max
i
mum time the client
waits for ANY (CCA) answer from the PEER.
The client puts all messages in a queue that are sent to wards the Peer but the Peer didn’t re-
spond / answer. A dif
fer
ent (sec
ondary) server will be tried per the con
fig
u
ra
tion if no re
sponse
from the pri
mary.
Diameter
• If Client doesn’t receive CCA from peer, but DWA arrives in inside response-timeout
time interval, in this case response-timeout is refreshed/reset.
• If Client doesn’t receive DWA from Peer, but CCA arrives inside response-timeout time
interval, in this case response-timeout is refreshed/restarted.
• If Client doesn’t receive CCA or DWA inside response-timeout time interval, Peers is
marked as unreachable.
Watch
dog-time
out (Tw) (Di
a
base)
This is a ses
sion timer at the Di
a
base level. The config
ured in
ter
val for the pe
ri
odic send ing of
Device-Watch dog-Re quest from the Client towards the Peer to check Peer avail abil
ity. Every
time the Client re ceives Device-Watchdog-An swer it refreshes/restarts re sponse-time out
counter. This is used mostly with ser
vices that don’t have ac
tive connections.
In order to deter
mine when a Diam e
ter connection needs to be torn down due to no re sponses
from the peer, a watch dog (keepalive) re
quest is sent. If after 30secs of no in
bound pack ets, and
another 30secs passes (total 1 minute), the Di
am e
ter con nection is torn down (re
tries = 1 means
just one try, not two).
watchdog-timeout 30
device-watchdog-request max-retries 1
Back
pres
sure
max-outstanding 5
Diameter
For a given peer, where the buffer is full, back pressure occurs where new po ten
tial mes
sages
that need to be cre ated do not get cre ated. As long as that pri mary peer is up, the sec ondary
peer will NOT be tried for the buffered mes sages, AND back pressure will occur for new po ten-
tial mes
sages. But the queue will con stantly be cleared of mes sages that do get a reply, or in the
case of timeout, get added to the sec ondary peer queue. So, while the queue is con tin
u
ally
being cleared, it has the po
ten
tial of being filled faster than it is being emp tied, hence backpres-
sure.
Diabase routing
Di
am
e
ter server con
nec
tiv
ity
Di
ameter servers are connected to the gate
way in two ways. The fol
low
ing pic
to
r
ial rep
re
sen
-
ta
tion gives the ab
stract and de
tailed view about this.
In
di
rectly con
nected Di
am
e
ter server
Static routes
S dra1@
cisco.
com dra1 To reach the Di
am
e
ter en
tity dra1, Gate
way should send the
mes
sage to dra1
S *@
cisco.
com dra1 To reach any Di
am
e
ter en
tity in realm “cisco.
com”, Gate
way
can send the mes
sage to dra1.
The second route entry is called realm based route as the host is wild carded here. This is also
called wild carded route.
Con
fig:
S DS1@ cisco.
com dra1 To reach the Di
am
e
ter en
tity DS1, Gate
way should send the
mes
sage to dra1
Dy
namic routes:
Route en
tries that are dy
nam i
cally learnt from the response messages from the peer are called
dy
namic routes. Con sider the fol
lowing Di
ame
ter server con
nec
tion.
Diameter
Here the gateway has no knowl edge about the Di ame
ter server (DS1) that is sit
ting be
hind the
DRA (dra1). Di
am
e
ter rout
ing table after load
ing the con
fig would be:
S dra1@ cisco.
com dra1 To reach the Di
am
e
ter en
tity dra1, Gate
way should send the
mes
sage to dra1
S *@ cisco.
com dra1 To reach any Di
am
e
ter en
tity in realm “star
ent
net
works.
com”,
Gate
way can send the mes
sage to dra1.
• The second realm based route entry would be selected for routing this session’s request
to DS1. Here a Path-cache entry for the second route entry would be created.
• Gateway sends the intial request to dra1.
• dra1 forwards the request to DS1. DS1 replies back to dra1.
• Now dra1 forwards the reply from DS1 to Gateway. This message will have the Origin-
host as DS1. Based on this gateway adds route for the Diameter entity DS1. This route is
called a Dynamic route. Diameter routing table after this step would be:
D DS1@
cisco.
com dra1 To reach the Di
am
e
ter en
tity DS1, Gate
way should send the
mes
sage to dra1
P *@
cisco.
com dra1 To reach any Di
am
e
ter en
tity in realm “cisco.
com”, Gate
way
can send the mes
sage to dra1.
S dra1@
cisco.
com dra1 To reach the Di
am
e
ter en
tity dra1, Gate
way should send the
mes
sage to dra1
S *@
cisco.
com dra1 To reach any Di
am
e
ter en
tity in realm “cisco.
com”, Gate
way
can send the mes
sage to dra1.
Al
ter
nate op
tion to dy
namic routes is avail
able via the fol
low
ing CLI under Di
am
e
ter end
point.
Diameter
Route-en
try host <host
name> realm <realm name> peer <peer name> weight <w1>
Di
am
e
ter rout
ing table after load
ing the con
fig would be:
S DS1@ cisco.
com dra1 To reach the Di
am
e
ter en
tity DS1, Gate
way should send the
mes
sage to dra1
S dra1@ cisco.
com dra1 To reach the Di
am
e
ter en
tity dra1, Gate
way should send the
mes
sage to dra1
S *@ cisco.
com dra1 To reach any Di
am
e
ter en
tity in realm “star
ent
net
works.
com”,
Gate
way can send the mes
sage to dra1.
• The first route entry would be selected for routing this session’s request to DS1.
• Gateway sends the initial request to dra1.
• dra1 forwards the request to DS1. DS1 replies back to dra1.
• Now dra1 forwards the reply from DS1 to Gateway. This message will have the Origin-
host as DS1. Diameter routing table after this step would be
P DS1@ cisco.
com dra1 To reach the Di
am
e
ter en
tity DS1, Gate
way should send the
mes
sage to dra1
S DS1@ cisco.
com dra1 To reach the Di
am
e
ter en
tity DS1, Gate
way should send the
mes
sage to dra1
S dra1@ cisco.
com dra1 To reach the Di
am
e
ter en
tity dra1, Gate
way should send the
mes
sage to dra1
S *@ cisco.
com dra1 To reach any Di
am
e
ter en
tity in realm “cisco.
com”, Gate
way
can send the mes
sage to dra1.
Diameter
Data collection
There may be circum stances where packet cap tures need to be done be tween the ASR
5000/ASR 5500 origin host address and the Diame
ter peers to confirm where delays or packet
drops are oc
cur
ring. This infor
mation, along with logs from the Di ame
ter server and ASR
5000/ASR 5500 coun ters, could help paint the pic
ture of what is going on.
Log
ging
There may be cir cum stances also where logging for Di
am
e
ter fa
cil
i
ties need to be turned up
be
yond the de fault error level. This should only be done upon rec om men da
tion from Cisco
Support as in
creasing the log
ging too high may risk stress on the sys tem and im pact sub
-
scribers. Run
time logging is done in config mode, while ac
tive CLI ses sion log
ging is done in
exec mode:
logging fil
ter <run time | ac
tive> fa
cil
ity <dia
base | di
ame
ter | di
ameter-acct | di
am
eter-auth | di
-
amproxy | di am eter-dns | credit-con trol | dhost | ims-au
tho rizatn | rf-di
am
eter> level <warn ing
| unusual | info | trace | debug>
For Run
time log
ging, when run
ning long enough for the pe tion: show logs
riod in ques
Mon
i
tor Sub
scriber
"Mon i
tor Sub
scriber" will, by de fault, capture all Di
am e
ter pack
ets for a session. It can all be
turned off com pletely with option 36 - "EC Di am e
ter". To turn off any of the in
di
vidual Di
ame
-
ter in
ter
faces, choose op tion 75 to display a sub-menu from which the var i
ous in
ter
faces can be
turned on and off:
1 - Di
a
base (OFF)
2 - DI
AMETER Gy (OFF)
3 - DI
AMETER Gx/Ty/Gxx (OFF)
4 - DI
AMETER Gq/Rx/Tx (OFF)
5 - DI
AMETER Cx (OFF)
6 - DIAM
ETER Sh (OFF)
7 - DI
AME
TER Rf (OFF)
8 - DI
AMETER EAP/STa/S6a/S6d/S6b/S13/SWm (OFF)
9 - DIAM
ETER HDD (OFF)
Diameter
Turn
ing up the ver
bosity can be help
ful in cer
tain sit
u
a
tions, but Ver
bosity 2 should suf
fice
most of the time.
Mon
i
tor Pro
to
col
"Mon i
tor Pro
to
col" using op
tion 36 will cap ture all Di
ame
ter pack ets, or use option 75 and the
submenu that follows to choose the spe cific Di
am e
ter in
ter
face being trou bleshot. Nonethe -
less, this must be used with care in a live net work as it will eas
ily miss pack ets under load and
cause undo strain.
Log
ging Mon
i
tor
Bulk
stats
The System schema con tains data for many areas besides Di
am e
ter. It is bro
ken down into sep -
a
rate schemas in the con fig
ura
tion and Di
am
e
ter-related vari
ables can be found in more than
one schema. Search for “diam” in the Sys tem Schema Stats chap ter of the doc u
men ta
tion for
vari
ables re
lated to Di
ameter, and then search for those variables in the con fig to see which
schema they are listed under.
Packet Cap
tures
Packet captures may be needed for trou bleshooting the TCP/IP trans port packets to setup and
tear
down Di am e
ter connections as this is not avail
able in any mon i
tor
ing or CLI com mands.
Some coun ters in “show di ame
ter stat proxy [end point <end point>] debug-info” will dis
play
valu
able in
for
ma tion, but they still don’t in
di
cate what is on the wire or what may be caus ing
the be
havior observed. Packet cap tures can also be helpful for an
a
lyz
ing when un expected, un-
known, or cor rupted AVPs are re ceived.
pack
ets cannot be cap
tured, certain con
trol pack
ets can, such as Di
am
e
ter. This ca
pa
bil
ity
should only be used when di
rectly re
quested by Cisco Sup
port.
When an a
lyz
ing PCAPs that con tain pack ets from mul ti
ple Diame
ter con
nections, it is possi
ble
to fil
ter on just one con nec
tion where all the pack ets will use the same TCP/IP layer vir tual
stream index. This is a func tion used by Wire shark to rep re
sent a unique TCP/IP con nec tion
between a pair of IP ad dresses and TCP/IP port num bers. It is dis
played in square brack ets im -
me di
ately below the TCP source and des ti
na
tion port num bers in the packet de tails window.
Right-click a packet that is part of the stream to be an a
lyzed and choose the popup menu op -
tion “Follow TCP Stream”. The ap propri
ate fil
ter for that stream will be dis played in the fil ter
window and ap plied to the trace. (Fil
ters can be man ually cre
ated based on IP ad dresses and
port num bers to accomplish the same goal, but this is faster).
Be
cause Di
am
e
ter pro
to
col is TCP/IP based, also check for TCP-re
lated er
rors.
Di
amproxy fea
ture
Di
amproxy processes han dle a given PSC/DPC's sess
mgrs Di
am
e
ter-based re
quests, and their
in
stance IDs as well as mem ory and cpu usage can be viewed with "show task re
sources". Di
-
amproxy tasks reside on each active PSC/DPC.
• diamproxy “proxies” all sessmgr Diameter requests – far fewer connections (and port
usage) than if direction connections from all sessmgrs
• sessmgr uses diamproxy on a different card where aaamgrs make Diameter
authentication-related requests
• sessmgr uses diamproxy on same card to make requests for Gx and Gy
For ex
am
ple:
sess
mgr (PSC 3)
S6b & Rf on di
amproxy-8 (PSC 7)
Gx & Gy on diamproxy-12 (PSC 3)
Diameter
Ses
sion Dis
con
nect Rea
sons
Di
am
e
ter ses
sion IDs
The Diam e
ter session ID which is a part of all Di ame
ter pro
tocol messages contains long se-
quences of dig its that en
capsu
late the time the sub scriber connected, the sess
mgr in stance ID,
etc. This can be valu able when try ing to iden tify which subscriber a packet is as
so
ciated with,
as well as trou
bleshoot ing other diffi
cult is
sues. Here is an ex
am ple of Di
am
e
ter session id bro
-
ken down:
Ses
sion-Id:0004-di
amproxy.
test.
com;570906393;11823987;52377bbc-7303
• The first field: 0004-diamproxy.test.com is the name of the Diameter endpoint and local
Diameter proxy task being used to communicate with that endpoint.
• The second field: 570906393 is a decimal notation of the hexadecimal callid. Convert
570906393 to hex and get 0x22075719
• The third field: 11823987 is an internal session number
• The fourth field: 52377bbc is the Epoch time that the session was created.
• The fifth field: 7303 is the sessmgr instance, in the first two (or 3) digits are the sessmgr
instance id. The 73 has to be converted from hex, which in this case means the sessmgr
instance is 115 decimal. The last two digits (03) in this case can be ignored for the
purposes of this material.
Diameter
Policy Control - Gx
Overview
This sec
tion cov
ers Diam
e
ter Pol
icy Con trol over Gx in ter
face and basic trou
bleshooting com-
mands with mul ti
ple ex
am
ples. Pol
icy Con trol is the process whereby the PCRF in di
cates to the
PCEF/GGSN/PGW how to con trol the IP-CAN (con nec
tiv
ity ac
cess net
work) bearer/session.
Classifications
Pol
icy-Con
trol can be clas
si
fied and de
fined as below:
• Bearer Binding : Association between a service data flow and the IP-CAN bearer
transporting that service data flow.
• Gating Control : Blocking or allowing of packets, belonging to a service data flow, to
pass through to the desired endpoint.
• Event Reporting : Notification of, and reaction to, application events to trigger new
behavior in the user plane as well as the reporting of events related to the resources in
the GW (PCEF),
• QOS Control : The authorization and enforcement of the maximum QoS that is
authorized for a service data flow or an IP CAN bearer or APN, IP-CAN bearer
establishment for IP-CANs that support network initiated procedures for IP CAN bearer
establishment.
For ses
sion es
tab
lish
ment and pol
icy con
trol over Gx, the fol
low
ing mes
sages are used:
Ini
ti
ated from PCEF to PCRF (PULL):
Result code: defining if PCRF accepts session or terminates it. Valid codes and
expected causes are reused from 3GPP 29.212, RFC 4006 and RFC 3588.
QoS change: and THP/ARP setting change for session
Event trigger provisioning : setting monitoring points triggers for which PCRF
wants to be notified from PCEF
PCC rule and/or group of PPC rules install/remove commands:
activating/deactivating pre-defined dynamic rules on PCEF, with option to set
predefined future time of automatic activation and deactivation.
Volume monitoring and reporting: Setting up thresholds for monitoring keys,
which enables PCEF to count and report counted volume per Service Data Flow (or
set of SDFs) or for entire session.
• CCR-U : PCEF sends update for session, based on one of the triggers that PCRF set
during session initialization. Update message is sent together with trigger that caused
this update message, and with attributes that are related to the change, for example
new RAT TYPE, new QOS, Usage-Report due to Volume monitoring threshold exceeded,
access network gateway (e.g. SGW/SGSN) change or Revalidation timer.
• CCA-U : PCRF answer to CCR-U request. In this message PCRF has full control to
deny/modify policies on PCEF for subscriber session, by setting up result-code and
provisioning any of the rules and applicable modifications with same procedure as in
INIT message (Changing QoS, Setting new event-triggers, installing new rules, setting
up or discarding volume monitoring).
• CCR-T : PCEF sends TERMINATE request when subscriber session is closed, due to any
reason (UE PDP disconnection, or administrative disconnect by other entity). In
termination request message, PCEF reports termination-reason, and also any
outstanding information that was valid at session termination stage (Like reporting
used volume or active volume-monitoring threshold).
• CCA-T : PCRF answer to CCR-T request which just acknowledges that the request has
been processed. Failure to send a CCA-T can cause the PCEF to retransmit the CCR-T
to a secondary PCRF.
Ini
ti
ated from PCRF to PCEF (PUSH):
• RAR : PCRF can initiate change by itself using Re-Authorization-Request. Unlike Gy and
Diameter Credit Control, this message does not trigger a CCR-U, but instead this is
used when PCRF needs to modify session policy control during active session. With
RAR, PCRF can push new policy attributes like an INIT response and UPDATE response
messages, and also PCRF can request explicit volume usage-report.
Diameter
• RAA : PCEF answer to RAR. Used to respond if PCEF accepted changes in RAR, and to
provide current volume usage-report if PCRF requested it.
Ses
sion Es
tab
lish
ment & Dis
con
nec
tion call-flow
Gx session failure-handling
Fail
ure Han
dling Pro
ce
dure for Gx can be in
voked under the fol
low
ing sce
nar
ios:
1 DI
A
BASE ERROR:
This happens in the path of the request, mainly due to socket-based errors like link
failure which happens when the server goes down.
3 RESPONSE TIMEOUT:
Default value "response-timeout 60" inside diameter endpoint configuration
The fol
low
ing table ex
plains the StarOS fail
ure-han
dling logic for Gx:
Event CCR Failure Han- Failure Han- Secondary server configured Secondary server not confi- Secondary Server Not Respon-
Message dling Setting dling Config and active gured or its down ding
Type
Session Initial Continue failure-han- Retry happens to the secondary 1. No further messaging with Same behavior as in the case of
Establish- dling cc- server continues with session PCRF and no diameter session is server down but this is detect-
ment request-type establishment. established. ed after the application timeout
initial-re- 2. The imsa session is deleted value of 10 seconds or the
quest but the call is not brought down. endpoint level response timeout
<failure_ value whichever happens first
Re- handling_ac- Retry happens to the secondary The call is brought down by Same behavior as in the case of
try-and-Ter- tion> server and continues with ses- initiating a delete PDP with server down but this is detect-
minate sion establishment. teardown indicator which brings ed after the application timeout
down all the sessions. value of 10 seconds or the
endpoint level response timeout
value whichever happens first
Terminate The call is brought down by NA NA
initiating a delete PDP with
teardown indicator which brings
down all the sessions.
Session Update Continue failure-han- Retry happens to the secondary 1. No further messaging with Same behavior as in the case of
Modifica- dling cc- server and continues with ses- PCRF and no diameter session is server down but this is detect-
tion request-type sion modification. established. ed after the application timeout
update-re- 2. The imsa session is not de- value of 10 seconds or the
quest leted and the call is not brought endpoint level response timeout
<failure_ down. value whichever happens first
Re- handling_ac- Retry happens to the secondary The call is brought down by Same behavior as in the case of
try-and-Ter- tion> server and continues with ses- initiating a delete PDP with server down but this is detect-
minate sion modification. teardown indicator which brings ed after the application timeout
down all the sessions. value of 10 seconds or the
endpoint level response timeout
value whichever happens first
Terminate The call is brought down by NA NA
initiating a delete PDP with
teardown indicator which brings
down all the sessions.
Event CCR Failure Han- Failure Han- Secondary server configured Secondary server not confi- Secondary Server Not Respon-
Message dling Setting dling Config and active gured or its down ding
Type
Session Terminate Continue failure-han- Retry happens to the secondary 1. No further messaging with Same behavior as in the case of
Termina- dling cc- server continues with session PCRF and no diameter session is server down but this is detect-
tion request-type termination. established. ed after the application timeout
termi- 2. The imsa session is deleted value of 10 seconds or the
nate-request but the call is not brought down. endpoint level response timeout
<failure_ value whichever happens first
Re- handling_ac- Retry happens to the secondary The call is brought down by Same behavior as in the case of
try-and-Ter- tion> server continues with session initiating a delete PDP with server down but this is detect-
minate termination. teardown indicator which brings ed after the application timeout
down all the sessions. value of 10 seconds or the
endpoint level response timeout
value whichever happens first
Terminate The call is brought down by NA NA
initiating a delete PDP with
teardown indicator which brings
down all the sessions.
Note: The default failure handling config is «failure-handling cc-request-type any-request retry-and-terminate» .
Note: The above failure-handling config is applicable for all result code from 3xxx-4xxx. All Permanent result-codes (5xxx) are treated as actual failures
from the PCRF and the session will be torn down
Diameter
Example Scenarios
Prob
lem Ob
served:
Mobile op
era
tor noticed that after ac
ti
vat
ing a new use case on the PCRF that some sub
scribers
can't es
tab
lish a data call.
Logs col
lec
tion:
Analy
sis:
1 In "mon
i
tor sub
scriber" the PCRF wants to in
stall a new rule
base. After re
ceiv
ing this
CCA-I, the GGSN dis
con
nects the ses
sion with cause "No re
sources".
----------------------------------------------------------------------
Incoming Call:
----------------------------------------------------------------------
MSID/IMSI : 450061234554321 Callid : 00004e22
IMEI : n/a MSISDN : 64211234567
Username : 64211234567 SessionType : ggsn-pdp-type-ipv4
Status : Active Service Name: GGSN-SVC
Src Context : Gn
----------------------------------------------------------------------
[M] Subscription-Id:
[M] Subscription-Id-Type: END_USER_E164 (0)
[M] Subscription-Id-Data: 64211234567
[M] Subscription-Id:
[M] Subscription-Id-Type: END_USER_IMSI (1)
[M] Subscription-Id-Data: 450061234554321
[V] [M] Supported-Features:
[M] Vendor-Id: 10415
[V] Feature-List-ID: 1
[V] Feature-List: 1
[V] [M] Bearer-Identifier: 0x00000004
[V] [M] Bearer-Operation: ESTABLISHMENT (1)
[M] Framed-IP-Address: IPv4 172.16.0.8
[V] [M] IP-CAN-Type: 3GPP-GPRS (0)
[V] [M] QoS-Information:
[V] [M] QoS-Class-Identifier: QCI_8 (8)
[V] [M] Max-Requested-Bandwidth-UL: 64000
[V] [M] Max-Requested-Bandwidth-DL: 128000
[V] [M] Bearer-Identifier: 0x00000004
[V] [M] Allocation-Retention-Priority:
[V] [M] Priority-Level: 1
[V] [M] Pre-emption-Capability: PRE-EMPTION_CAPABILITY_DISABLED (1)
[V] [M] Pre-emption-Vulnerability: PRE-EMPTION_VULNERABILITY_ENABLED (0)
[V] [M] QoS-Upgrade: QoS_UPGRADE_SUPPORTED (1)
[V] [M] 3GPP-SGSN-MCC-MNC: 123456
[V] [M] 3GPP-SGSN-Address: IPv4 192.168.2.138
[V] [M] RAI: 123456FFFEFF
[V] [M] 3GPP-User-Location-Info: Location Type: RAI (2) Location Info (RAI): 123-456-
0xFFFE-0xFFFF
[M] Called-Station-Id: ecs-apn
[V] [M] Bearer-Usage: GENERAL (0)
[V] [M] Online: DISABLE_ONLINE (0)
[V] [M] Offline: DISABLE_OFFLINE (0)
[V] [M] Access-Network-Charging-Address: IPv4 192.168.2.1
[V] [M] Access-Network-Charging-Identifier-Gx:
[V] [M] Access-Network-Charging-Identifier-Value: 0x00000002
[M] Subscription-Id:
[M] Subscription-Id-Type: END_USER_IMSI (1)
[M] Subscription-Id-Data: 450061234554321
[M] Framed-IP-Address: IPv4 172.16.0.8
[M] Termination-Cause: DIAMETER_ADMINISTRATIVE (4)
[M] Called-Station-Id: ecs-apn
[V] [M] Access-Network-Charging-Address: IPv4 192.168.2.1
After re
pro
duc sage can be seen: "Gx No re
tion of the issue, this log mes sources: No such
group:, RULE
BASE_1".
3 The rule
base seems to be prop
erly con
fig
ured:
4 PCRF in
di
cates the rule
base to use via the Charg
ing-Rule-Base-Name AVP (CRBN).
How
ever, the 3GPP model of a rule
base is just a set of charg
ing rules whereas an ECS
rule
base also de
fines a range of other pa
ra
me
ters such as analy
sers, P2P in
spec
tion,
some charg
ing pa
ra
me
ters and var
i
ous other as
pects of a ses
sion. As a re
sult, StarOS
of
fers two ways to in
ter
pret the CRBN AVP re
ceived from PCRF con
trolled via CLI:
Res
o
lu
tion :
Con
fig
ure pol
icy-con
trol so Rule
base can be changed by PCRF:
Rules re
ceived from PCRF are not in
stalled in GW
Logs col
lec
tion:
Analy
sis:
In a mon i
tor subscriber trace or packet capture CCR-I is being sent from GGSN to PCRF, PCRF
is re
ply
ing back with pre-de fined Charg ing ru
lename in CCR-A. GGSN is sending CCR-U with
"PCC-Rule-Sta tus: IN
ACTIVE (1)". PCRF is sending CCR-U success, GGSN is re
moving the ses
-
sion from Ac tive Charging Ser
vice.
Trace:
----------------------------------------------------------------------
Incoming Call:
----------------------------------------------------------------------
MSID/IMSI : 123456000000002 Callid : 00004e8b
IMEI : n/a MSISDN : 9876543210
Username : 9876543210 SessionType : ggsn-pdp-type-ipv4
Status : Active Service Name: ggsn
Src Context : spgw
----------------------------------------------------------------------
Version number: 1
Protocol type: 1 (GTP C/U)
Extended header flag: Not present
Sequence number flag: Present
NPDU number flag: Not present
Message Type: 0x10 (GTP_CREATE_PDP_CONTEXT_REQ_MSG)
Message Length: 0x0095 (149)
Tunnel ID: 0x00000000
Sequence Number: 0x7FFF (32767)
GTP HEADER ENDS.
INFORMATION ELEMENTS FOLLOW:
IMSI: 123456000000002
Recovery: 0xFA (250)
Selection Mode: 0x1 (MS provided APN, subscription not verified (Sent by MS))
Tunnel ID Data I: 0x00000400
Tunnel ID Control I: 0x00000400
NSAPI: 0x05 (5)
End2End-ID: 0x006999e0
AVP Information:
[M] Session-Id: pcrf;20107;27393;554a6bb5-102
[M] Auth-Application-Id: 16777238
[M] Origin-Host: pcrf
[M] Origin-Realm: cisco.com
[M] Result-Code: DIAMETER_SUCCESS (2001)
[M] CC-Request-Type: UPDATE_REQUEST (2)
[M] CC-Request-Number: 1
[M] Route-Record: box
Res
o
lu
tion :
Online Charging - Gy
Overview
This sec
tion cov
ers Diame
ter Online Charg
ing over Gy in
ter
face and basic trou
bleshoot
ing
commands with multi
ple ex
amples.
The Gy in
terface is the online charg
ing inter
face between the Gateway and the Online Charg
ing
Sys
tem (OCS). The Di ame
ter pro
tocol is used over this inter
face to provide on
line charg
ing
func
tion
al
ity. This is also known as the Di ameter Credit Con
trol Ap
pli
ca
tion (DCCA).
Gen
eral Call Flow 1
Failure Handling
The ses
sion failover set
ting de
ter
mines whether the ses sion needs to be tried and bound to an-
other OCS in case of the pri mary OCS fail ure. The setting can be done in the credit-con trol
con
fig
u
ra
tion under active-charging ser
vice config
u
ra
tion as below:
Al
ter
na
tively, the OCS can send the set ting in the Session-Failover AVP in the CCA-I. What ever is
sent from the OCS takes prece dence over what is con fig
ured in the ASR 5000/ASR 5500. If Ses -
sion-Failover is not en
abled through in terac
tion with OCS, even if there is sec ondary OCS config
-
ured and fail
ure-handling con
fig
ured, it will not take ef
fect and the ses
sion will be ter
mi
nated.
The fail
ure-han
dling set
ting de
ter
mines what needs to be done for the session in case of fail
ure
of the OCS. Fail
ure-han dling set
tings can come from local con fig
u
ra
tion, or from the OCS
through the Credit-Con trol-Fail
ure-Handling AVP. The AVP can have any of the 3 values listed
below.
• CONTINUE - Retry to the secondary OCS and in case that too is not reach
able or not re-
sponding, then Con
tinue the on
go
ing ses
sion, with
out any fur
ther OCS in
ter
ac
tion. [Ses
sion
goes Offline]
• TER
MI
NATE - Ter
mi
nate the ses
sion with
out retry
ing the sec
ondary.
The fail
ure-handling set
ting is more com pre
hensive in ASR 5000/ASR 5500, where there are
dif
fer
ent op
tions to set dif
fer
ent fail
ure-handling be
havior for dif
fer
ent mes
sages (Ini
tial/Up
-
date/Ter mi
nate).
Diameter
Fail
ure Han
dling Call Flow
1 If the config is CONTINUE & RETRY-AND-TERMINATE. In the Case of OCS1 down,
PGW/GGSN retries the secondary server.
Sce
nario 1
2 If the config is CONTINUE. In the case of both OCS down. PGW/GGSN continues the
session offline.
3 If the con
fig is RETRY-AND-TER
MI
NATE. In the Case of both OCS down. PGW/GGSN
ter
mi
nate the ses
sion after retry
ing.
Diameter
Sce
nario 3
Cap
ture muli
ti
ple it
er
a
tions so the delta be
tween them can be seen.
The fol
low
ing are de
scrip
tions for val
ues of "Credit-Con
trol:" in the out
put:
• Offline Charging - Sessions that were initially online but later moved into Offline
charging due to failure handling or some Offline indication from OCS server.
• On - Session is online enabled and charging and currently not waiting for CCA-I, CCA-U
or CCA-T.
• Off - The session is non-online enabled.
CC Session Stats:
Total Current Sessions: 0
Total ECS Adds: 2334 Total CC Starts: 11
Total Session Updates: 0 Total Terminated: 2334
Session Switchovers: 0
CC Message Stats:
Total Messages Received: 24 Total Messages Sent: 24
Total CC Requests: 24 Total CC Answers: 24
CCR-Initial: 11 CCA-Initial: 11
CCA-Initial Accept: 11 CCA-Initial Reject: 0
CCA-Initial Timeouts: 0
CCR-Update: 2 CCA-Update: 2
CCA-Update Timeouts: 0
CCR-Final: 11 CCA-Final: 11
CCA-Final Timeouts: 0
CCR-Event: 0 CCA-Event: 0
CCA-Event Timeouts: 0 CCR-Event Retry: 0
ASR: 0 ASA: 0
RAR: 0 RAA: 0
CCA Dropped: 0
Backpressure Stats:
CCR-I Messages : 0 CCR-U Messages : 0
CCR-T Messages : 0 CCR-E Messages : 0
Assumed-Positive Sessions:
Current: 0 Cumulative: 0
Unrouted: 0
Directed: 0
Buffer Errors: 0
Peer Never Up Errors: 0
Window Errors: 0
Unsupported Application Errors: 0
Read statistics:
Read Bytes: 0
Read Messages Total: 0
Requests Read: 0
Requests Timed Out: 0
Answers Read: 108
Answers Timed Out: 0
Read Application Messages: 108
Unexpected Answers Read: 0
Command/Flag Errors: 0
Diameter Protocol Errors: 0
Errors: 0
Length Padding Errors: 0
H2H Errors: 0
Length Too Long: 0
Command Unknown: 0
Length Sanity Errors: 0
Length-v-SCTP EOR Errors: 0
SCTP Missing EOR Errors: 0
Route statistics:
Adds: 0
Expires: 0
Hits: 0
Misses: 0
Indirects: 0
Installs: 0
Latency statistics:
Last Round Trip Time (ms): 50
Average Round Trip Time (ms): 53
Sta
tus of the Di
am
e
ter end
point
Peers Summary:
Peers in OPEN state: 1
Peers in CLOSED state: 0
Peers in intermediate state: 0
Total peers matching specified criteria: 1
Cur
rent route en
tries
Example Scenarios
Prob
lem De
scrip
tion:
Logs Col
lec
tion:
• Collect syslog
• Collect bulkstats/KPI for Diameter
• Monitor subscriber imsi <imsi> (verbosity 3)
• Monitor protocol (verbosity 3) -- Consult Cisco support before running this command
in a production network.
• Collect external packet capture
• Show active-charging credit-control statistics group <credit-control group name>
• Show active-charging sessions full <imsi>
• Show session disconnect-reason
• Show support detail
Analy
sis:
• Based on the KPIs and show command output, it was noticed that OCS session failover
occurred, and the diameter protocol Error (3xxx) started increasing.
show active-charging credit-control statistics group “PGW-GY”
• Both OCS were resonding with error 3004 (Diameter too Busy )
• Logs show both OCSs send “DIAMETER_TOO_BUSY” (3004) error to PGW.
Diameter
• Because both OCSs sent 3004 result code, the Failure Handling (FH) condition
is triggered, resulting in an increase in offline sessions.
• After expiry of FH timer (a timer which controls when to disconnect sessions which are
offline due to FH), the PGW terminated the subscriber session which subsequently tried
to reattach. This, in turn, led to an increase in the “Admin-disconnect” reason and “FH
Termination” event.
• The large number of subscriber reattaches had a degrading effect which resulted in the
problems.
Res
o
lu
tion:
1 The problem was resolved once the OCS team rolled back changes.
2 Define the DCCA high and low thresholds for the errors.
Logs col
lec
tion:
Analy
sis:
Res
o
lu
tion:
In
crease the quota size to avoid fre
quent trans
ac
tion, or con
fig
ure the sys
tem to buffer
packets while quota re
quests to OCS server are pending.
Diameter
Authentication - S6a
Overview
This sec
tion cov
ers Di
ame
ter Au
then
ti
ca
tion over S6a in
ter
face and basic trou
bleshoot
ing com
-
mands with an example.
The fol
low
ing mes
sages are ini
ti
ated by MME to HSS.
• Authentication-Information-Request(AIR)
• Update-Location-Request(ULR)
• Notify-Request(NR)
• Purge-Ue-Request(PUR)
Au
then
ti
ca
tion-In
for
ma
tion Re
quest/Re
sponse
The Authenti
ca
tion Infor
mation Re
trieval Pro
ce
dure shall be used by the MME to re
quest Au
-
then
ti
ca
tion In
formation from the HSS.
AVP struc
ture used by MME to ask for EPS vec
tors
Next, it in
puts the val
ues of {SQN, SN ID, CK, IK} in the key de
riva
tion func
tion to de
rive
KASME.
Up
date-Lo
ca
tion Re
quest/An
swer
Call flow
swer mes sage, so that the MME can create an EPS ses
sion and a de
fault EPS bearer for the sub
-
scriber. The subscrip
tion in
for
ma cluded in the Up
tion in date Loca
tion Answer mes sage is as
fol
lows:
No
tify Re
quest/Re
sponse
Call flow
Purge-UE Re
quest/Re
sponse
The MME shall make use of this procedure to set the "UE Purged in the MME" flag in the HSS
when the subscription profile is deleted from the MME database due to MME interaction or
after long UE inactivity.
When MME sends PUR to HSS, HSS will mark that sub scriber as Purged from MME. In re -
sponse, HSS shall send an in
di
ca
tion to freeze the M-TMSI (Tem po
rary Iden
tity) to MME for
some time so that same tempo
rary iden
tity can be used when sub
scriber be
came ac tive.
CLIs:
• show diameter peers full all
Diameter
Log
ging:
• Check for the Diameter Peer Status and the state should be Open.
Session Stats:
Total Current Sessions: 0
Sessions Failovers: 0 Total Starts: 0
Total Session Updates: 0 Total Terminated: 0
Message Stats:
UL Request: 0 UL Answer: 0
ULR Retries: 0 ULA Timeouts: 0
ULA Dropped: 0
PU Request: 0 PU Answer: 0
PUR Retries: 0 PUA Timeouts: 0
PUA Dropped: 0
AI Request: 0 AI Answer: 0
AIR Retries: 0 AIA Timeouts: 0
AIA Dropped: 0
CL Request: 0 CL Answer: 0
Diameter
Example Scenarios:
Prob
lem De
scrip
tion:
Ex
peri
encing au
then
ti
ca
tion is
sues with roaming sub
scribers in the net
work. S6a messages are
not going out of the box for some op er
a
tors and the fol
low
ing er
rors are ob
served with log
ging
moni
tor for the sub
scriber:
Diameter
Re
quired in
for
ma
tion for trou
bleshoot
ing
Analy
sis:
The cur
rent con
fig
u
ra
tion in
di
cates that sta
tic route entry is miss
ing for the di
am
e
ter end
point.
Ac
cording to the MME ad minis
tra
tion guide, (section: Config
ur
ing Dynamic Des
ti
na
tion Realm
Construc
tion for Foreign Subscribers - page 89 Re lease 16 ), con
fig
ur
ing Dy
namic Des ti
na
tion
Realm for Foreign Sub scribers re
quires a static route entry to be added in the diam e
ter end-
point con
fig
u
ration for each for
eign realm.
Res
o
lu
tion:
context signal
diameter endpoint hss
route-entry realm epc.mnc001.mcc001.3gppnetwork.org peer dsr1.dea.test.net
route-entry realm epc.mnc0o1.mcc001.3gppnetwork.org peer dsr2.dea.test.net
DNS
DNS
DNS Client
Overview
The in
fra
struc
ture DNS-client is used by the PGW / HSGW / MME / SGSN to re solve DNS
names for SGSN/MME/GGSN/PGW, P-CSCF or other servers likes ac count
ing. In this chap
ter
the commands used to debug and troubleshoot this fea
ture are cov
ered.
Troubleshooting Commands
show con
text
• This command will show the context configured on the chassis. Make note of the
context which contains the dns-client configuration. This will be the context where the
pgw-service is configured.
• This command will show the status of the vpn manager (vpnmgr) tasks, which controls
the dns-client.
Make sure the status of the task is in a good state.
• Note: The task instance maps to the context ID seen above, so in this example, vpnmgr
instance 2 is used for context PGWin, which is where the dns-client is configured. If
there is a problem on the chassis related to the dns-client, a restart of the vpnmgr task
may help clear the issue.
cpu facility inst used allc used alloc used allc used allc S status
----------------------- --------- ------------- --------- ------------- ------
5/0 vpnmgr 1 0.2% 30% 29.29M 48.10M 44 2000 -- -- - good
show ses
sion re
cov
ery sta
tus ver
bose
• This command will show where the Demux card is located. The vpnmgr task runs on the
Demux card, so make note of the location and confirm there are no problems with the
card. Below are two examples showing the Demux card on ASR 5500 MIO in slot 5 and
ASR 5000 PSC in slot 1.
• This command will show the statistics for the dns-client. In this example, a dns-client
named "PGW-DNS" and the config is located in context PGWin.
• Make note of any failures and re-run the command to see if the failures are
incrementing.
• If failures are seen, confirm which DNS query type has the problem.
• Check to see if the problem is related to the primary or secondary name server.
• Check for the type of failure:
Query Timeout
Domain Not Found
Connection Refused
Other Failures
DNS
• A PCAP trace may be required to determine where the problem is between the chassis
and DNS servers.
• DNS NAPTR queries are used to obtain long lists of Name Authority Pointer records
which are essentially FQDNs representing various nodes in the network to which an
HSGW or MME desire to connect calls to. DNS AAAA queries will then ensue to obtain
the actual IPv6 addresses of the FQDNs the system desires to connect to.
NAPTR 0 0 0
PTR 0 0 0
Total Resolver Queries: 50
Successful Queries: 50
Query Timeouts: 0
Domain Not Found: 0
Connection Refused: 0
Other Failures: 0
• This command can be used to clear the dns-client statistics. This can be helpful to get a
fresh look at the statistics during a problem. This command needs to be run from the
context where dns-client is configured.
• This command will show the dns-client cache output. This can be helpful to find
specific query names, query type, TTL and DNS answer. This command needs to be run
from the context where dns-client is configured.
DNS
• This command can be used to clear the dns-client cache. This can be helpful to perform
if problems are seen related to certain entries in the cache or if additional sites/nodes
have been added recently. Normally the TTL may be so low (i.e. under 60-120 seconds)
that the entries will be automatically updated in the cache on a regular interval. This
command needs to be run from the context where dns-client is configured.
• This command can be used to perform a dns query to a specific query name using a
specific query type. Use the "show dns-client cache client <dns-client name>"
command to find a query name. Other query-type can be entered, such as "A" and
"NAPTR". This command needs to be run from the context where dns-client is
configured.
show con
fig con
text <pg
w_
con
tex
t_
name>
• This command can be used to see the configuration related to the dns-client.
show con
fig con
text <pg
w_
con
tex
t_
name> ver
bose
• This command can be used to see any default options configured for the dns-client.
Sam
ple query for se
lect
ing SGW and PGW se
lec
tion using APN:
hb00.tac.epc.mncxxx.mccyyy.3gppnetwork.org
Query Name: tac-lb28.tac-hb00.tac.epc.mncxxx.mccyyy.3gppnetwork.org
Query Type: NAPTR TTL: 60 seconds
Answer:
Order: 1 Preference: 1
Flags: a Service: x-3gpp-sgw:x-s5-gtp
Regular Expression:
Replacement: sgw5.sgw.3gppnetwork.org
[mme]MME2#
[mme]MME2# dns-client query client-name dns1 query-type NAPTR query-name
cisco.com.apn.epc.mncxxx.mccyyy.3gppnetwork.org
Query Name: cisco.com.apn.epc.mncxxx.mccyyy.3gppnetwork.org
Query Type: NAPTR TTL: 60 seconds
Answer:
Order: 1 Preference: 1
Flags: a Service: x-3gpp-pgw:x-s5-gtp
Regular Expression:
Replacement: pgw5.pgw.3gppnetwork.org
DNS
Dis
play
ing the DNS cache out
put for ver
i
fi
ca
tion:
SGSN Queries
DNS query for RAI for find
ing SGSN se
lec
tion and GGSN se
lec
tion using APN query.
Proxy DNS
Overview
This chap
ter talks about proxy-dns fea
ture in
clud
ing con
fig
u
ra
tion and trou
bleshoot
ing com
-
mands.
Troubleshooting command
The fol
low
ing com
mands can be used to trou
bleshoot the dns-proxy in
ter
cept list:
Command Reference
show dns-proxy intercept-list <name> statistics To look at stats for a specific intercept list
show dns-proxy intercept-list <name> rule <rule> To look at stats for a specific intercept list and rule
statistics
show session subsystem data-info verbose | grep Look for high amount of drops
proxy-dns
show sub full username <username> | grep -i dns DNS related info for a specific session
proxy-dns in
ter
cept-list
• Use this command to define a name for a list of rules pertaining to the IP addresses
associated with the foreign network’s DNS. Up to 128 rules of any type can be
configured per rules list. Upon entering the command, the system switches to the HA
Proxy DNS Configuration Mode where the lists can be defined. Up to 64 separate rules
lists can be configured in a single AAA context. This command and the commands in the
HA Proxy DNS Configuration Mode provide a solution to the Mobile IP problem that
occurs when a MIP subscriber, with a legacy MN or MN that does not support IS-835D,
receives a DNS server address from a foreign network that is unreachable from the
home network.
DNS
By con
fig
ur
ing the Proxy DNS fea
ture on the Home Agent, the for eign DNS ad dress is in
ter
-
cepted and re
placed with a home DNS ad
dress while the call is being handled by the home net -
work.
DNS
Con
fig
u
ra
tion
The fol
low
ing com ates a proxy DNS rules list named list1 and places the CLI in the HA
mand cre
Proxy DNS Con fig
u
ra
tion Mode. A redi
rect and pass-thru rule has been config
ured.
configure
context <ingress_context>
proxy-dns intercept-list list1
redirect 0.0.0.0/0 primary-dns 1.1.1.1 secondary-dns 2.2.2.2
pass-thru 5.5.0.0/16
exit
end
configure
context <egress_context>
ip dns-proxy source-address 192.168.1.1
end
Active Charging Service
Active Charging Service
Overview
The Active Charg
ing Service (ACS), also called ECS (Enhanced Charg ing Ser
vice) is an in-line
ser
vice that pro
vides flex
i
ble, dif
fer
en
ti
ated, and de
tailed billing to sub
scribers with Layer 3
through Layer 7 packet in
spec tion.
This chap
ter will cover the basic prin ci
ples of ACS: the archi
tec
ture, the main build
ing blocks,
ad
vanced functional
i
ties, and a sec
tion with trou
bleshooting scenar
ios.
Architecture
Con
tent Ser
vice Steer
ing: Redi
rects in
com
ing traf
fic to the ECS sub
sys
tem.
Pro
tocol An
a
lyzer: Per
forms inspec
tion of in
coming pack
ets. The Proto
col Ana
lyzer is the soft
-
ware stack respon
si
ble for an
a
lyz
ing the in
di
vid
ual pro
to
col fields and states during packet in-
spec
tion.
The Pro
to
col An
a
lyzer per
forms two types of packet in
spec
tion:
• Shallow Packet Inspection: Inspection of the layer 3 (IP header) and layer 4 (for
example, UDP or TCP header) information.
• Deep Packet Inspection: Inspection of layer 7 and 7+ information.
Image - Pro
to
col An
a
lyzer Soft
ware Stack
Rule Definitions (ruledefs): Specifies the packets to inspect or the charging actions to apply to
packets based on content.
Active Charging Service
Routing Ruledefs: Routing ruledefs are used to route pack ets to con
tent an
a
lyz
ers. Routing
ruledefs de
ter
mine which con tent ana
lyzer to route the packet to when the pro to
col fields
and/or proto
col-states in the ruledef expres
sion are true.
Ruledef priori
ties con
trol the flow of the pack ets through the an a
lyz
ers and con trol the order in
which the charg ing actions are ap plied. The ruledef with the low est pri
ority num ber in
vokes
first. For routing ruledefs, it is important that lower level analyzers (such as the TCP an a
lyzer)
be in voked prior to the re lated ana
lyzers in the next level (such as HTTP an a
lyzer) as the next
level of ana
lyzers may re quire access to resources or infor
ma tion from the lower level. Pri ori
-
ties are also im por
tant for charg ing ruledefs, as the ac
tion defined in the first matched charg ing
rule apply to the packet and ECS sub sys
tem disregards the rest of the charg ing rules.
1 The packet is redirected to ECS based on the ACLs in the subscriber’s APN and packets
enter ECS through the Protocol Analyzer Stack.
4 When the protocol fields and states found during the shallow inspection match those
defined in a routing ruledef, the packet is routed to the appropriate layer 7or 7+
analyzer for deep-packet inspection.
5 After the packet has been inspected and analyzed by the Protocol Analyzer Stack:
Active Charging Service
• The packet resumes normal flow and through the rest of the ECS subsystem.
• The output of that analysis flows into the Charging Engine, where an action can
be applied. Applied actions include redirection, charge value, and billing record
emission.
1 In the Classification Engine, the output from the deep-packet inspection is compared to
the charging ruledefs. The priority configured in each charging ruledef specifies the
order in which the ruledefs are compared against the packet inspection output (Lower
number = Higher priority).
2 When a field or state from the output of the deep-packet inspection matches a field or
state defined in a charging ruledef, the ruledef action is applied to the packet. Actions
can include redirection, charge value, or billing record emission. It is also possible that a
match does not occur and no action will be applied to the packet at all.
Active Charging Service
ACS commands
Checking for proper func
tion
al
ity of ACS is typi
cally found in two sepa
rate sec
tions of the CLI,
within the ses
sion sub
sys
tem and witin a sep a
rate active-charging sub
sys
tem.
• show subscriber full [user, imsi, msid, callid] [identifier]
Descrip
tion: This com mand will show key in for
mation about which el
e
ments in the sys
-
tem con fig this ses
sion is bound to. These include the ACL, the rulebase, the FW/NAT
pol
icy (defined below), the NAT realm, pool, and IP ad dress, the NAT port chunks al
lo
-
cated to the session, and the amount of traf
fic sent to the ACL.
Overview
Ruledef is a method ol
ogy to in
spect pack
ets or apply charg
ing ac
tions in ASR 5000/ASR 5500.
This chapter cov
ers opti
miz
ing tech
niques of ruledef.
• In a rulebase, keep all HTTP-based rules at higher priority than non-HTTP based rules
(like RTSP/RTP etc) as HTTP traffic is significant and should get evaluated quickly.
• Keep optimized rules at a higher priority than non-optimized rules so that they get
evaluated earlier.
• Remove / Re-evaluate rules which show no hits: even though rule is not hit, it still
consumes CPU for evaluation.
• Find out the frequently hit rules. Keep these rules at higher priority than others.
Ways to op
ti
mize ruledef con
fig
u
ra
tion
• Every ECS config line counts: keep ECS config minimal, remove un-used rules, routes,
analyzers.
• Use debug CLI command for acsmgr to check which ruledefs are optimized and need
optimization.
• Check debug profiler to see if "acsmgr_rule_compare_string" is using cpu cycles. If
yes, then the rulebase needs to be optimized. For more info on this command please
refer to the SW troubleshooting section.
Rule Lines
==============================================================================================
==================
Ruledef Name 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 3 3 3 Status
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
8 9 0 1 2
==============================================================================================
==================
dns-pkts O O - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - O
ip-pkts U - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - U
www-url-google O - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - O
Active Charging Service
This com
mand will help to iden
tify if the rule
base is the cause of high CPU uti
liza
tion:
FW / NAT
Overview
This chapter provides an overview of the Net work Ad dress Trans
la
tion (NAT) in-line ser
vice
which is part of the Ac
tive Charg
ing Ser
vice (ACS) fea
ture-set.
The fol
low
ing top
ics are cov
ered in this chap
ter:
• NAT Overview
• NAT Mappings: 1:1 and Many-to-One
• NAT Application Level Gateway
• NAT Call Flow
• Gathering NAT Statistics
NAT Basics
NAT maps non-routable RFC 6568 de fined pri
vate IP ad
dresses to routable pub lic IP ad
dresses.
These IPs are part of IP-pool con
figu
ra
tions which reside in the con fig
u
ra
tion of the ASR
5000/ASR 5500. There will be one or more pools in the private range(s) and sep a
rate pool(s) in
the pub
lic ad
dress space.
In typi
cal de
ploy
ments, NAT is enabled to provide for the con ser
va
tion of public IP ad
dresses
re
quired to com mu
ni
cate with ex
ter
nal net works. It also provides some se cu
rity as the IP ad
-
dress scheme for the in ter
nal net
work is masked from ex ter
nal hosts. Each out go
ing and in
-
com ing packet goes through the transla
tion process.
• GGSN
• HA
• PDSN
• P-GW
Active Charging Service
Note:
• NAT works only on flows originating internally. Bi-directional NAT is not supported.
• NAT is supported only for TCP, UDP, and ICMP flows. For other flows NAT is bypassed.
• To get NATed, the private IP addresses assigned to subscribers must be from the
following ranges:
Class A 10.0.0.0 – 10.255.255.255
Class B 172.16.0.0 – 172.31.255.255
Class C 192.168.0.0 – 192.168.255.255
100.64.0.0/10 as per RFC 6598.
NAT is con
fig
ured within ACS. For an overview of ACS, please see the ACS Overview sec
tion of
this doc
u
ment. For NAT-specific con
fig, please see below.
To con
fig
ure NAT, the first thing needed is a pub
lic ip-pool and a pri
vate ip-pool:
access-ruledef Private_Pool1
ip src-address = 10.64.0.0/16
#exit
fw-and-nat policy NAT_Policy_01
access-rule priority 10 access-ruledef Private_Pool1 permit nat-realm NAT_01
#exit
rulebase RB1
~~~
fw-and-nat default-policy NAT_Policy_01
#exit
More in
for
ma
tion can be found in the NAT Ad
min
is
tra
tion Guide.
NAT Mappings
NAT works by inspect
ing both in
com
ing and outgo
ing IP data
grams and, as needed, mod i
fy
ing
the source IP ad
dress and port num
ber in the IP header to re
flect the con
fig
ured NAT ad
dress
mapping.
NAT sup
ports the fol
low
ing map
pings:
Troubleshooting
The fol
low
ing section lists use
ful NAT re
lated CLI com
mands that can be used to ver
ify or trou
-
bleshoot NAT related issues.
ip address: 10.64.1.100
(Public_01)
[1344 - 1375]
0012345678912345@nai.epc.mnc345.mcc012.3gppnetwork.org
---
Table 1. Gath
er
ing NAT Sta
tis
tics
Statistics of all NAT IP pools in a NAT IP pool group show active-charging nat statistics nat-realm
<pool_group_name>
Summary statistics of all NAT IP pools in a NAT IP show active-charging nat statistics nat-realm
pool group <pool_group_name> summary
Statistics for a specific ACS/Session Manager show active-charging nat statistics instance
instance instance_number
Information on NAT bind records generated for port show active-charging rulebase statistics name
chunk allocation and release. <rulebase_name>
Information for subscriber flows with NAT disabled. show active-charging flows nat not-required
Information for subscriber flows with NAT enabled. show active-charging flows nat required
Information for subscriber flows with NAT enabled, show active-charging flows nat required nat-ip
and using specific NAT IP address. <nat_ip_address>
Information for subscriber flows with NAT enabled, show active-charging flows nat required nat-ip
and using specific NAT IP address and NAT port <nat_ip_address> nat-port <nat_port>
number.
SIP ALG Advanced session statistics. show active-charging analyzer statistics name sip
Information for all the active flow-mappings based show active-charging flow-mappings all
on the applied filters.
Information for the number of NATed and Bypass show active-charging subsystem all
NATed packets.
Information for all current subscribers who have show subscribers full all
either active or dormant sessions. Check IP address
associated with subscriber.
Information for subscribers with NAT processing not show subscribers nat not-required
required.
Information for subscribers with NAT processing show subscribers nat required nat-ip
enabled and using the specified NAT IP address. <nat_ip_address>
Information for subscribers to find out how long (in show subscribers nat required usage-time [ < | > |
seconds) the subscriber has been using NAT-IP. greater-than | less-than ] value
Call drop reason due to invalid NAT configuration. show session disconnect-reasons
Overview
X-Header In ser
tion and En cryp tion fea
tures, col
lec
tively known as Header En rich
ment, en-
able ap
pending of headers to HTTP/WSP GET and POST re quest pack
ets for use by end ap
pli
-
ca
tions, such as mobile ad
vertisement in
ser
tion (MSISDN, IMSI, IP address, user-customiz
able,
and so on). This is a li
censed fea
ture "Header En
rich
ment [AS
R5K-00-CS01H
DRE]" and is sup
-
ported only on the GGSN, IPSG and P-GW.
Troubleshooting
To check if http-x-header-an a
lyzer rule is prop
erly hit, and cor
rect val
ues for an
a
lyzed traf
fic
are shown, the fol
low
ing CLIs can be used:
Sam
ple con
fig
u
ra
tion for x-header in
ser
tion by ECS
xheader-format header_format_tac
insert X-header-field-name variable bearer radius-calling-station-id
Active Charging Service
ruledef new_ruledef
http url contains www.test.com
wsp url contains wap.test.com
charging-action CA_xheader
xheader-insert xheader-format header_format_tac
This com mands al lows the user to see that XHeader in
for
ma
tion has been added to en
sure the
ruledef is get
ting hit prop
erly:
First-request Redirected: 0
XHeader Information:
XHeader Bytes Injected: 385
XHeader Pkts Injected: 11
XHeader Bytes Removed: 0
XHeader Pkts Removed: 0
IP Frags consumed by XHeader: 0
This com
mand is used to show the over
all pack
ets based on the IP pro
to
col:
This com
mand is used to show the over
all pack
ets based on the TCP pro
to
col:
This com
mand is used to show the over
all pack
ets based on the HTTP pro
to
col:
Troubleshooting ACS
Overview
This sec
tion pro
vides some ex
am
ples of trou
bleshoot
ing is
sues aris
ing within ACS.
Example Scenarios
Prob
lem de
scrip
tion
Customer re
ports an issue where http traf fic for cer
tain sub
scribers is not de
tected. The tar
get
area to check in this case is the ruledef / rulebase config
u
ra
tion.
Analy
sis
1 In the mon
i
tor sub
scriber out
put below, the traf
fic being re
ceived is on tcp port 80
which is HTTP.
----------------------------------------------------------------------
Incoming Call:
----------------------------------------------------------------------
Active Charging Service
2 The con
fig shows the rout
ing ruledef for http, but no charg
ing ruledef for clas
si
fy
ing
http traf
fic:
config
active-charging service srv1
ruledef http-ports
tcp either-port = 80
rule-application routing
#exit
ruledef ip-pkts
ip any-match = TRUE
#exit
ruledef tcp-pkts
tcp any-match = TRUE
#exit
ruledef udp-pkts
udp any-match = TRUE
#exit
rulebase base1
default retransmissions-counted
billing-records egcdr
action priority 980 ruledef tcp-pkts charging-action standard
action priority 990 ruledef ip-pkts charging-action standard
route priority 80 ruledef http-ports analyzer http
flow control-handshaking charge-to-application all-packets
egcdr threshold interval 60
egcdr threshold volume total 100000
no transport-layer-checksum verify-during-packet-inspection
#exit
3 In the ac
tive charg
ing ses
sion out
put, the traf
fic is clas
si
fied as TCP in
stead of HTTP.
Res
o
lu
tion:
config
active-charging service srv1
!
ruledef http
tcp either-port = 80
#exit
!
rulebase base1
# show active-charg
ing ses
sions full imsi <>
# moni
tor sub
uscriber imsi <> with option 34, 19, A, S & ver
bosity 3.
Analy
sis
1 In the mon
i
tor sub
scriber out
put below, the traf
fic being re
ceived is tcp port 80 which
is HTTP and that sub
scriber is going to google.
com.
----------------------------------------------------------------------
Incoming Call:
----------------------------------------------------------------------
MSID/IMSI : 50604000000002 Callid : 017dc662
IMEI : n/a MSISDN : 9876543210
Username : 9876543210 SessionType : ggsn-pdp-type-ipv4
Status : Active Service Name: local
Src Context : Gn
----------------------------------------------------------------------
:
Wednesday May 06 2015
INBOUND>>>>> From sessmgr:6 sessmgr_ipv4.c:16113 (Callid 017dc662) 16:27:01:842
Eventid:77001(9)
CSS Uplink Input PDU from ACS- slot:3 cpu:34 inst:8738
172.1.1.1.1184 > 3.1.1.1.80: P [tcp sum ok] 1:338(337) ack 1 win 64240 (DF) (ttl 128, id 1235,
len 377)
0x0000 4500 0179 04d3 4000 8006 43a8 ac01 0101 E..y..@...C.....
0x0010 0301 0101 04a0 0050 fad2 d78c 266a 061d .......P....&j..
0x0020 5018 faf0 ccc0 0000 4745 5420 2f6d 2f69 P.......GET./m/i
0x0030 6d61 6765 732f 6c6f 676f 5f73 6d61 6c6c mages/logo_small
0x0040 2e67 6966 2048 5454 502f 312e 310d 0a41 .gif.HTTP/1.1..A
0x0050 6363 6570 743a 202a 2f2a 0d0a 5265 6665 ccept:.*/*..Refe
0x0060 7265 723a 2068 7474 703a 2f2f 6e65 7773 rer:.http://news
0x0070 2e67 6f6f 676c 652e 636f 6d2f 6d0d 0a41 .google.com/m..A
0x0080 6363 6570 742d 4c61 6e67 7561 6765 3a20 ccept-Language:.
2 The con
fig shows that the ruledef being matched for this traf
fic to Google is HTTP.
config
active-charging service srv1
ruledef http
tcp either-port = 80
#exit
ruledef http-ports
tcp either-port = 80
Active Charging Service
rule-application routing
#exit
ruledef ip-pkts
ip any-match = TRUE
#exit
ruledef tcp-pkts
tcp any-match = TRUE
#exit
ruledef udp-pkts
udp any-match = TRUE
#exit
ruledef www-url-google
rulebase base1
default retransmissions-counted
billing-records egcdr
action priority 50 ruledef http charging-action standard
action priority 90 ruledef www-url-google charging-action test
action priority 100 ruledef tcp-pkts charging-action standard
action priority 200 ruledef ip-pkts charging-action standard
route priority 80 ruledef http-ports analyzer http
flow control-handshaking charge-to-application all-packets
egcdr threshold interval 60
egcdr threshold volume total 100000
no transport-layer-checksum verify-during-packet-inspection
#exit
3 In the ac
tive charg
ing ses
sion out
put, the traf
fic is clas
si
fied as HTTP, but not match
ing
to the "Google" ruledef.
SessMgr Instance: 6
Client-IP: 172.1.1.1
NAS-IP: 0.0.0.0
Access-NAS-IP(FA):
NAS-PORT: 0 NSAPI: 5
Acct-Session-ID: 0A01010A01AAAAAB
:
Duration: 00h:00m:34s
Active Charging Service name: srv1
Rule Base name: base1
:
Charging Updates: n/a
Res
o
lu
tion:
rulebase base1
default retransmissions-counted
billing-records egcdr
action priority 90 ruledef www-url-google charging-action test
action priority 95 ruledef http charging-action standard
action priority 100 ruledef tcp-pkts charging-action standard
action priority 200 ruledef ip-pkts charging-action standard
route priority 80 ruledef http-ports analyzer http
flow control-handshaking charge-to-application all-packets
egcdr threshold interval 60
egcdr threshold volume total 100000
no transport-layer-checksum verify-during-packet-inspection
#exit
GTPP (CDR's)
GTPP (CDR's)
GTPP/CDR Overview
Overview
CDRs are Call Detail Records which are used for of fline/post paid billing purposes. They are
sent from the charg
ing data func
tion (CDF) to the charg ing gate
way func tion (CGF) over the Ga
in
ter
face.
CDR #1
------
recordType SGSNPDPRECORD
servedIMSI xxxxxxxxxx
servedIMEI xxxxxxxxxx
ggsnAddressUsed 1.1.1.1
chargingID 932035394
accessPointNameNI internet
accessPointNameOI mnc001.mcc002.gprs
pdpType IPV4
servedPDPAddress 10.0.0.1
listOfTrafficVolumes
dataVolumeGPRSUplink 0
dataVolumeGPRSDownlink 0
changeCondition QoS Change
GTPP (CDR's)
changeTime 140715090831
dataVolumeGPRSUplink 0
dataVolumeGPRSDownlink 0
changeCondition Record Closure
changeTime 140715090901
recordOpeningTime 140715090831
duration 30
causeForRecClosing Intra SGSN Inter System Change
sgsnAddress 2.2.2.2
msNetworkCapability e5 e0
rNCUnsentDownlinkVolume 2761
routingAreaCode 11
locationAreaCode 1f1f
cellIdentifier 3001
systmeType IuUTRAN
recordSequenceNumber 226
nodeID 1000GGSN01
Management Extensions follow
Ticket Layout Version 14
subCharacteristics 8
localSequenceNumber 26333225
apnSelectionMode MS or Network Provided and Subscription Verified
servedMSISDN xxxxxxxxxxxx
chargingCharacteristics Normal
ing/billing func
tions of each prod
uct. Some prod
ucts can use other billing mech
a
nisms, or they
can com
bine sev
eral mech
a
nism.
SGSN SCDR
PDSN Radius/Mediation
PMIP-PGW Rf
HA Radius/Mediation
Image: Soft
ware Ar
chi
tec
ture
GTPP (CDR's)
CDR Generation/transmission:
The sess mgr gen erates the CDRs and sends it to aaamgr. The aaamgr sends the CDRs to
aaaproxy and fi nally aaaproxy sends the CDRs to the CGF or HDD, based on the con fig
u
ra
tion.
Once the re quest is sent from aaamgr to aaaproxy, the re quest will be stored in a buffer which
will be cleared once the aaaproxy sends ac knowledgement men tioning that CDRs are received
success
fully. For recov
ery pur
poses, the CDRs are kept at sess mgr/aaamgr and in aaaproxy.
The CDRs will be cleared only after getting a suc
cess
ful re
sponse from server/hard disk.
Role of aaamgr:
• One holds the GTPP request outstanding with AAAproxy (AAAmgr has not yet received
response from AAAproxy).
• The other buffer is to form the next GTPP request (based of max-cdr or max-pdu or
wait-time configuration, AAAmgr will pack as many CDRs as possible in one GTPP
request).
If the two buffers are full, AAAmgr will archive the re
quest. The archive buffer is com
mon for
Radius and GTPP.
Role of AAAProxy
Con
fig
u
ra
tion Guide
lines:
Nor
mally config
u
ra
tion is out
side the scope of this guide, but there are some specifics to GTPP
con
fig
u
ra
tion for which it would be good to sum ma rize here:
If it is not con
fig
ured, the con
text where ggsn-ser
vice is con
fig
ured, will be the ac
count
-
ing con
text.
Sam
ple con
fig:
apn cisco.com
gtpp group group1 accounting-context Ga
• SGSN Accounting
How ac
count ing mode is se lected:
Under call-control-pro
file con count
fig, 'ac ing mode gtpp' CLI will de
cide the ac
count
-
ing mode. De
fault value is 'account ing mode none'.
GTPP (CDR's)
Sam
ple con
fig:
call-control-profile umts
accounting context ctx gtpp group sgsn
• SGW Accounting
Sam
ple con
fig:
call-control-profile lte
accounting context ctx gtpp group sgw
GTPP (CDR's)
GTPP troubleshooting
Overview
This sec
tion talks about CLI com mands and log outputs that can be use for trou
bleshoot
ing
GTPP. Later in this chap
ter, trou
bleshoot
ing GTPP archiv
ing is also cov
ered.
Log Collections
Show Commands
• show gtpp counters all
Outstanding MCDRs: 0
Possibly Duplicate Outstanding MCDRs: 0
Archived MCDRs: 0
MCDRs buffered with AAAPROXY: 0
MCDRs buffered with AAAMGR: 0
Outstanding SCDRs: 0
Possibly Duplicate Outstanding SCDRs: 0
Archived SCDRs: 0
SCDRs buffered with AAAPROXY: 0
SCDRs buffered with AAAMGR: 0
This com
mand suma
rizes the var
i
ous CGF servers in a spe
cific con
text.
This com
mand shows sta
tis
tics about GTPP for a spe
cific gtpp group. In
ter
est
ing lines
are marked in bold.
0: 0 41..60: 0
1: 214 61..80: 0
Dis
plays sta
tis
tics and coun age-server. This is the hard disk if hard
ters for the local stor
disk support has been en abled with the gtpp stor age-server mode com mand in the
GTPP Group Con fig
u
ra
tion Mode
Dis
plays the sta ing Data Record (CDR) stored on hard disk while stream
tus of Charg ing
mode is en
abled.
AAAMgr: Instance 1
0 Total aaa acct archived 0 Current aaa acct archived
AAAMgr: Instance 2
0 Total aaa acct archived 0 Current aaa acct archived
AAAMgr: Instance 3
0 Total aaa acct archived 0 Current aaa acct archived
AAAMgr: Instance 4
0 Total aaa acct archived 0 Current aaa acct archived
AAAMgr: Instance 5
0 Total aaa acct archived 0 Current aaa acct archived
...
• dir /hd-raid
• Logging filter. This command is like a debug which can increase the granularity (by
tuning the 'level' option). In general, using level unusual is not harmful to the system.
Anything with deeper granularity has to be used with more caution.
log
ging fil
ter ac
tive fa
cil
ity gtpp level un
usual
log
ging fil
ter ac
tive fa
cil
ity aaaproxy level un
usual
log
ging active
• External PCAP may be required if GTPP is streaming to a remote CGF. This could be
required to identify performance related issues (slow CGF server, network issues), or
transactional issues (errors/intermittent issues).
• If external capture is not possible, the 'monitor protocol' option 27 can be used, but this
is typically not recommended as it it is a protocol which generates a high load.
GTPP Archiving
ASR 5000/ASR 5500 may archive CDRs for many rea sons (unable to transmit files due to IP
connec
tiv
ity is
sues, remote server is un able to receive CDRs, vari
ous miscon
fig
ura
tions, etc.).
An aaaproxy restart re solves the issue in many cases even if it is a CGF issue. For ex am ple if
a CGF is un able to ac cept a par ticu
lar type of mes sage (e.g. cancel re
quest) then after
the aaaproxy restarts, the mes sage is no longer being sent. Since restart of aaaproxy ad -
dresses the issue, it gives a false pos
i
tive as ASR 5000/ASR 5500 being the cause. Using an ex -
ter
nal PCAP to cap ture traf
fic would help iden tify the cause, which in this case would be the
CGF.
Example Scenarios
Prob
lem De
scrip
tion:
• Too many CDRs archived in ASR 5000 / ASR 5500.
Analy
sis:
• The "show gtpp counters" shows the type and counters for CDRs. The counters
show archived CDRs too.
GTPP (CDR's)
In the ex
ample below, num ber of archived GCDRs is 144015. Mul ti
ple out
puts of the “show
gtpp counters” will show if the number of archived CDRs is in
creas
ing.
The out
put below shows an on
go
ing SCDRs archiv
ing while GCDRs archive is sta
ble.
• Checking syslogs for 'gtpp 52056' warning can be used to identify the context and GTPP
group where archiving of CDRs is happening.
Below out
put shows that archiv
ing is re
ported for con
text GTPP and gtpp group de
fault.
[software internal security system critical-info syslog] [gtpp-group default] GTPP request with
Ver
i
fi
ca
tion Steps:
• Check the RTD (round trip delay) whether Charging gateway is acknowledging the
CDRs. The "show gtpp statistics verbose" shows the RTD for CGF.
• Check the transport network to determine if it has capacity to handle the traffic by the
gateway. Delay or packet drop in the network will cause CDRs to be archived in the
gateway. If the packets are dropped (resulting in re-transmission of packets from ASR
5000/ASR 5500, which slows down the CDR transmission rate), this will result in
GTPP (CDR's)
archived CDRs. This can be fixed by increasing the Transport link capacity or adding
QoS in the network.
Additional Information:
--------------------------------------------------------------------------------------
Record Type | Apn Name | Accounting Context | Group Name | Timestamp
--------------------------------------------------------------------------------------
EGCDR |wifitest |ggsn |default |Tuesday August 26
10:18:21
EGCDR |wifitest |ggsn |default |Tuesday August 26
10:23:21
EGCDR |wifitest |ggsn |default |Tuesday August 26
10:28:21
EGCDR |wifitest |ggsn |default |Tuesday August 26
10:33:22
Bulkstat and KPI
Bulkstat and KPI
Bulkstats
Overview
Bulk
stat sta
tis
tics are counters that are pe
ri
odi
cally gener
ated when they are prop erly config
-
ured. The bulkstats are typ
i
cally trans
ferred to an ex ter
nal col
lec
tion server where these coun -
ters may be fur ther processed (via a pre-de fined math e
mati
cal for
mula) or sim
ply dis
played.
The bulk
stats can also be lo
cally stored in the chas
sis as de
sired for later pro
cess
ing.
Configuration
The bulk sta
tis
tics proclet runs on the SMC card for ASR 5000 and in MIO/UMIO card for ASR
5500. At the start of a polling in
ter
val, the pro
clet will send queries to re
trieve all of the data.
Typ
i
cally, there are two steps needed for bulkstat con
fig
u
ra
tion: the config
ura
tion of the com
-
muni
ca
tion with the col
lec
tion server, and the con
fig
u
ra
tion of the bulk sta
tis
tic schemas.
The config
u
ra
tion of the com munica
tion with the collection server in
volves the de
f
i
n
i
tion of
the time in
ter
val, IP ad
dress of the server, files and other options.
After con
fig
ur
ing sup
port for bulk sta
tis
tics on the sys
tem, ver
ify the set
tings prior to sav
ing
them.
Fol
low the in
struc
tions in this sec
tion to ver
ify bulk sta
tis
tic set
tings. This com
mand needs to
be ex
e
cuted at the root prompt of the Exec mode.
The fol
low
ing is an ex
am
ple com
mand out
put:
Bulkstat and KPI
View
ing Col
lected Bulk Sta
tis
tics Data
The sys
tem provides a mech a
nism for viewing data that has been col
lected, but which has not
been trans
ferred. This data is re
ferred to as “Records await
ing trans
mission”.
View pend
ing bulk sta
tis
tics data per schema by en
ter
ing the fol
low
ing CLI:
File 1
Remote File Format: %date%%time%
File Header: "Format 4.5.3.0"
File Footer: ""
Bulkstats Receivers:
Primary: 192.168.13.196 using FTP with username root
File Statistics:
Records awaiting transmission: 1800
Bytes awaiting transmission: 163687
Total records collected: 1800
Bulkstat and KPI
Bulk Sta
tis
tics Event Log Mes
sages
The fol
lowing table dis
plays the most com
mon event in
for
ma
tion. The range of the start events
is from 31000 to 31999.
Table 1. Log
ging Events Per
tain
ing to Bulk Sta
tis
tics
Troubleshooting Bulkstats
Overview
This sec
tion cov ers how to col
lect Bulk Sta
tis
tics and pro
vides ex
am
ples of trou
bleshoot
ing
Bulk
stats is
sues.
These com
mands are is
sued from the Exec mode.
To man
u
ally ini
ti
ate the gath
er
ing of bulk sta
tis
tics out
side of the con
fig
ured sam
pling in
ter
val,
enter the fol
low
ing com
mand:
To man u
ally ini
ti
ate the trans
fer
ring of bulk sta
tis
tics prior to reach
ing the max
i
mum con
fig
-
ured stor
age limit, enter the fol
lowing command:
Clear
ing Bulk Sta
tis
tics Coun
ters and In
for
ma
tion
This in
cludes any "com
pleted" files that have not been suc
cess
fully trans
ferred.
In
cor
rect sta
tis
tics shown when a sess
mgr crashes
In
cor
rect sta
tis
tics are gen
er
ated/dis
played.
Ini
tially the Ses
sion Re
cov
ery mecha
nism was not de
signed to re
cover the sta
tis
tics in
formation
in case of a sess mgr crash, but in newer soft
ware re
leases some of the ser vice statis
tics are
been re covered.
Example Scenarios
Prob
lem De
scrip
tion :
If bulk
stat files are not being trans ferred from the ASR 5000/ASR 5500 to the col
lec
tion server
Follow these steps to iden tify the issue:
# show bulkstats
En
able the fol
low
ing logs:
# log
ging fil
ter ac
tive fa
cil
ity stats level debug
# log
ging fil
ter ac
tive fa
cil
ity sys
tem level debug
# log
ging ac
tive
Note: The above commands in production are likely to cause high CPU and
memory usage. It's recommended to use these commands during a
maintenance window.
And ver
ify if there are er
rors; dis
able the logs with:
One pos
si
ble error is de
scribed below:
Trob
leshoot
ing steps:
• Perform a PCAP capture externally to the ASR 5000/ASR 5500 and verify the reason
why the FTP connection is terminated or not established.
• Verify that the copy command works for any file (For local FTP server). This verifies
that the FTP process has no issue if it passes.
Bulkstat and KPI
• If test 1 passed, check the copy command to transfer a bulkstat file (For local FTP
server). This should verify if the bulkstat process is working if it passes.
• If WEM server is available in the network, verify that the copy command works for any
file to WEM Server. This verifies that the FTP is working fine for WEM.
• Check the disk space on FTP servers.
• If above test passed, try to transfer the bulkstat file manually, but this time using the
bulkstat gather and force commands.
• If above test passed, check if the automatic transfer works for bulkstats or not.
If still un
able to iden
tify the issue, open a Ser
vice Re
quest with Cisco and pro
vide the log
-
gings re quested in step 2, the PCAP file and two SSDs (one be fore the PCAP and one after
the PCAP).
Bulk
stat CLI shows the fol
low
ing fail
ure.
Also bulk
stat process dis
appears from name service (Nor
mally there should be bulk
stat process
in the out
put) and no
tice there was no out
put in the above command.
[local]asr5500#
Bulkstat and KPI
Since bulk
stat is miss
ing from name space, "show pro file" can
not be taken for bulk
stat process.
While tak
ing a core of bulkstat, see below stacktrace.
So
lu
tion:
How
ever, if the net
work or server side issue still ex
ists, then the issue would hap
pen again.
Bulkstat and KPI
KPIs Overview
Key Perfor
mance In di
cators (KPIs) pro
vide a gauge of the state of the ASR 5000/ASR 5500 and
re
lated ser
vices. KPIs can be defined with a single counter (for ex
ample: the num ber of at
tached
subscribers) or de
fined with a math e
mati
cal for
mula that takes into con sid
er
a
tion several fac
-
tors that are re
lated to the ASR 5000/ASR 5500 be hav
ior di
rectly or some external fac
tosr that
are not under the con
trol of the ASR 5000/ASR 5500 (ex: at
tach suc
cess-ra
tio KPI).
KPIs for the ASR 5000/ASR 5500 are cal lated based on bulk
cu stats counter vari
ables from
the ASR 5000/ASR 5500 plat
form.
Image: Method
ol
ogy
Troubleshooting KPIs
Bulkstat and KPI
Troubleshooting KPIs
There are a large num ber of KPIs and they vary from op er
a
tor to op
er
a
tor; it is there
fore not
pos
si
ble to cover all pos
si
ble KPI sce
nar
ios in this book.
To trou
bleshoot a KPI that has de
graded, fol
low this gen
eral method
ol
ogy:
1 Once the degraded KPI has beein identified:
• Verify the formula currently in use
• Collect an SSD and store it.
2 Generate a report (with MUR or other collection server) for the previous 60 days with
all the counters that are used in the formula.
• Verify which counter(s) is mis-behaving.
4 Verify if the behavior of the logs can be linked to an error (check logs, snmp traps,
crashes) seen in the SSDs.
5 If the counters are not seen in the SSD, capture outputs of "show" commands that are
related to the issue in question.
8 If the cause of the degraded KPI is still not clear, open a Service Request with all the
data collected:
• SSDs
• Logs
• Report with KPI counters
• External PCAP
• Monitor subscriber and/or monitor protocol
• A visual representation (print screen) of the KPI in question.