
Optimizing Network Operations Through Automated Troubleshooting

Sponsored by

Nov 21, 2019


Today’s Speakers

James Crawshaw, Senior Analyst, Heavy Reading
Benedict Enweani, Director of Applications and Analytics

Agenda

• Introduction
• Resolving wireline outages
• Contextualizing faults and planned changes
• The importance of telemetry and topology
• In-field results
• Future vision
• Audience Q&A

Introduction

• New technologies and deployment paradigms make rapid troubleshooting of network and service performance a priority.
• Service may be impacted while the monitoring system shows everything green.
• Operators learn of problems from their customers before the monitoring system is aware of them.
• Service impairments lead to customer churn and revenue loss.
• Operators need a unified, real-time view of inventory and topology to aid root cause analysis (RCA) and reduce MTTR.

Sudden distributed degradations can be hard to pinpoint and solve
Widespread degradations put operations under considerable pressure

• Manual analysis: diagnosing outages takes weeks and incurs high operational costs
• Many systems: overwhelmed by an excess of information
• No automatic correlation
• Requires SME: expert knowledge needed
• Limited knowledge: cross-linking of heterogeneous networks

A new generation of automated assurance based on network topology offers a solution

© 2019 EXFO Inc. 20190064 5


EXFO’s Automated Common Cause Analysis product was inspired by a Telenor-EXFO DevOps program
“The goal is to link network performance with network
topology to increase the efficiency of Telenor’s SOC”

“EXFO understands our ambition of empowering societies in the digital era by delivering exceptional connectivity and quality of experience to our customers. Throughout our long-standing co-development engagement, EXFO has proven to be a trusted advisor aligned with our beliefs and strategic partnership goals.”
Georg Svendsen, Chief Technology Officer at Telenor Denmark

Automated Common Cause Analysis (ACCA)



Operational efficiency: impaired
• Mounting technical debt
• Increasing complexity
• Drowning in data
• Fighting fires

Only 15% of CSPs have a fully integrated view of network and service inventory.

ACCA uses semantic graph technology to link topology and telemetry

Diagram: ACCA architecture
• Inputs: OSS, BSS, orchestration, real-time events, performance monitoring, semi-structured data, unstructured data
• Core: modeler, semantic inference engine (fills in the gaps), graph database holding the inventory and topology model
• Outputs: unified view of networks and services, real-time topology data, root/common cause analysis, service impact analysis, change impact analysis, E2E path computation
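The slide does not say how the graph database or the semantic inference engine work internally. As a minimal sketch, assuming topology is stored as subject-predicate-object triples and that the only inference rule is transitivity of a hypothetical `depends_on` predicate, "filling in the gaps" could look like this. All node names are invented for illustration; this is not EXFO's implementation.

```python
from collections import defaultdict

# Hypothetical topology triples (subject, predicate, object).
# Node names and the "depends_on" predicate are illustrative assumptions.
TRIPLES = [
    ("vpn-acme", "depends_on", "pe1:ge-0/0/1"),
    ("pe1:ge-0/0/1", "depends_on", "link-17"),
    ("link-17", "depends_on", "fiber-A"),
    ("vpn-beta", "depends_on", "pe2:ge-0/0/3"),
    ("pe2:ge-0/0/3", "depends_on", "link-22"),
    ("link-22", "depends_on", "fiber-A"),
]


def infer_transitive(triples, predicate="depends_on"):
    """Return the input triples plus every (a, predicate, c) implied by
    transitivity: (a, p, b) and (b, p, c) imply (a, p, c)."""
    edges = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            edges[s].add(o)
    inferred = set(triples)
    for start in list(edges):
        # Depth-first walk collecting every node reachable from `start`.
        stack, seen = list(edges[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(edges.get(node, ()))
        for node in seen:
            inferred.add((start, predicate, node))
    return inferred


closure = infer_transitive(TRIPLES)
# Both VPNs now carry an explicit (inferred) dependency on fiber-A.
print(sorted(s for s, p, o in closure if p == "depends_on" and o == "fiber-A"))
```

Materializing the closure up front trades storage for fast lookups: once inferred, a service's full dependency set can be read directly instead of being traversed at query time.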



Many CSPs can only see the symptoms of degradations



Outages: diagnosed by manual investigation
Very hard to know there are correlations across service silos



Dynamic topology: the invisible key that seamlessly links all your network and customer data

Diagram: dynamic topology linking customers, services and contracts with weather events, fiber cuts, emergency upgrades, trouble tickets and planned work
• 60%-90% increase in provisioning success with automated change planning and cross-domain path computation
• Faster problem diagnosis with automated assurance: days/hours → minutes
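Cross-domain path computation can be framed as a shortest-path search over a topology spanning several domains, with elements under planned work excluded. The slide does not say which algorithm is used; this minimal sketch swaps in Dijkstra's algorithm over a hypothetical undirected topology whose node-name prefixes stand for access, metro and core domains.

```python
import heapq

# Hypothetical multi-domain topology as (node_a, node_b, cost).
# Name prefixes stand for access ("acc"), metro ("met") and core ("cor").
LINKS = [
    ("acc-1", "met-1", 1), ("acc-1", "met-2", 3),
    ("met-1", "cor-1", 1), ("met-2", "cor-2", 1),
    ("cor-1", "cor-2", 1), ("cor-2", "met-3", 1),
    ("cor-1", "met-3", 3), ("met-3", "acc-9", 1),
]


def shortest_path(links, src, dst, avoid=frozenset()):
    """Dijkstra over an undirected graph, never entering nodes in `avoid`
    (for example, equipment scheduled for planned work).
    Returns (cost, path), or (None, []) if dst is unreachable."""
    adj = {}
    for a, b, w in links:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    heap, done = [(0, src, [src])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in done:
            continue  # stale heap entry; a cheaper path was already settled
        done.add(node)
        for nxt, w in adj.get(node, ()):
            if nxt not in done and nxt not in avoid:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None, []


print(shortest_path(LINKS, "acc-1", "acc-9"))
print(shortest_path(LINKS, "acc-1", "acc-9", avoid={"cor-2"}))
```

Passing the resources touched by a planned change as `avoid` is one simple way to tie change planning to path computation; real provisioning would also weigh capacity and policy constraints that this sketch ignores.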



Accuracy of the topology helps
For example, at TM, a mobile inventory application shows that Ontology achieved well over 90% accuracy.

Topology links faults and trouble tickets (TTs) with services



Topology links planned changes with services
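Linking a planned change (or a fault or trouble ticket) to services amounts to walking the dependency topology backwards from the affected resource. A minimal sketch, assuming a hypothetical adjacency map from each element to the resources it relies on and, purely for illustration, a `svc-` naming prefix that marks services:

```python
from collections import defaultdict, deque

# Hypothetical dependency map: element -> resources it relies on.
DEPENDS_ON = {
    "svc-voice-north": ["pe1"],
    "svc-vpn-acme": ["pe1", "pe2"],
    "svc-internet-south": ["pe3"],
    "pe1": ["fiber-A"],
    "pe2": ["fiber-B"],
    "pe3": ["fiber-B"],
}


def impacted_services(changed_resource, depends_on, service_prefix="svc-"):
    """Breadth-first walk over reversed dependency edges from a resource
    that is faulty or scheduled for planned work; return every service
    that transitively relies on it (services identified by prefix)."""
    dependents = defaultdict(set)
    for node, resources in depends_on.items():
        for r in resources:
            dependents[r].add(node)
    seen, queue = set(), deque([changed_resource])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(n for n in seen if n.startswith(service_prefix))


print(impacted_services("fiber-B", DEPENDS_ON))
```

The same traversal serves change impact analysis before work starts and service impact analysis once a fault is confirmed; only the starting resource differs.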



A dependency topology gives insight
Diagram: a symptom set traced through the dependency topology to a potential cause set
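One simple way to derive a potential cause set from a symptom set, assuming each service's full dependency set is already known (for example from a transitive closure of the topology), is to count how many symptomatic services each resource would explain. The data and the `potential_causes` function are hypothetical; the slides do not describe ACCA's actual ranking method.

```python
from collections import Counter

# Hypothetical: every resource each service depends on, end to end.
SERVICE_PATHS = {
    "svc-1": {"pe1", "link-17", "fiber-A"},
    "svc-2": {"pe2", "link-22", "fiber-A"},
    "svc-3": {"pe2", "link-22", "fiber-A"},
    "svc-4": {"pe3", "link-30", "fiber-C"},
}


def potential_causes(symptom_set, service_paths):
    """Return (resource, symptoms_explained) for every resource on the path
    of at least one symptomatic service, most explanatory first."""
    counts = Counter()
    for svc in symptom_set:
        counts.update(service_paths.get(svc, set()))
    # Descending coverage, then resource name for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(potential_causes({"svc-1", "svc-2", "svc-3"}, SERVICE_PATHS))
```

A resource shared by every symptomatic service (here `fiber-A`) rises to the top, which is the intuition behind common cause analysis; topology alone, however, cannot separate candidates with equal coverage.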



Telemetry + topology gives the result
Diagram: symptom set, KPIs at multiple points in the topology, and the potential cause set
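Adding telemetry lets the cause set be pruned to candidates whose own KPIs look abnormal. The slides do not specify the anomaly test; this sketch swaps in a plain z-score threshold against a baseline history, with invented resources, KPI types and values.

```python
from statistics import mean, stdev

# Hypothetical inputs: topology-ranked candidates as (resource, symptoms
# explained) and per-resource KPIs as (baseline history, latest sample).
CANDIDATES = [("fiber-A", 3), ("link-22", 2), ("pe2", 2), ("link-17", 1), ("pe1", 1)]
KPIS = {
    "fiber-A": ([-3.0, -3.1, -2.9, -3.0, -3.1], -20.0),  # optical power, dBm
    "link-22": ([40, 42, 41, 39, 40], 5),                # utilization, %
    "pe2": ([20, 22, 21, 19, 20], 21),                   # CPU load, %
}


def kpi_zscore(history, latest):
    """Standard score of the latest sample against its baseline history."""
    sd = stdev(history)
    return 0.0 if sd == 0 else (latest - mean(history)) / sd


def corroborate(candidates, kpis, threshold=3.0):
    """Keep candidates whose KPI deviates by at least `threshold` standard
    deviations; rank by symptoms explained, then by deviation size."""
    kept = []
    for resource, explained in candidates:
        if resource not in kpis:
            continue  # no telemetry, so this candidate cannot be corroborated
        z = kpi_zscore(*kpis[resource])
        if abs(z) >= threshold:
            kept.append((resource, explained, z))
    return sorted(kept, key=lambda t: (-t[1], -abs(t[2])))


print([r for r, _, _ in corroborate(CANDIDATES, KPIS)])  # ['fiber-A', 'link-22']
```

A fixed 3-sigma threshold is used only to keep the sketch short; it ignores seasonality such as daily traffic cycles.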



With ACCA, SOCs can radically improve operational automation:

• Deploy automated, end-to-end troubleshooting
• Identify the causes of outages without human intervention
• Cut troubleshooting from days to minutes through automation

Workflow:
• Existing systems collect real-time (RT) telemetry.
• EXFO RT analysis triggers common cause analysis (CCA).
• The EXFO Ontology plus FM/PM/config data create a ranked cause set.
• (Optional) EXFO validates the root cause with active testing or device interrogation.
• An API enriches the FM/PM system, and a case is logged for DevOps teams to support historical improvement.
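The trigger step (RT telemetry triggers CCA) can be illustrated with a sliding-window check: when several distinct interfaces show a traffic drop within a short window, common cause analysis is started. The 50% drop ratio, 60-second window, three-interface minimum and the `Sample` record are all assumptions for this sketch, not ACCA parameters.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical telemetry reading for an interface."""
    interface: str
    timestamp: float      # seconds
    traffic_mbps: float
    baseline_mbps: float  # assumed expected traffic for this time of day


def detect_drops(samples, drop_ratio=0.5):
    """Return sorted (timestamp, interface) pairs for samples whose traffic
    fell below drop_ratio * baseline."""
    return sorted(
        (s.timestamp, s.interface)
        for s in samples
        if s.baseline_mbps > 0 and s.traffic_mbps < drop_ratio * s.baseline_mbps
    )


def should_trigger_cca(drops, window_s=60.0, min_interfaces=3):
    """Return the interfaces in the first sliding window of `window_s`
    seconds containing drops on at least `min_interfaces` distinct
    interfaces, or [] if no window qualifies."""
    start = 0
    for end in range(len(drops)):
        # Shrink the window from the left until it spans <= window_s.
        while drops[end][0] - drops[start][0] > window_s:
            start += 1
        interfaces = {iface for _, iface in drops[start:end + 1]}
        if len(interfaces) >= min_interfaces:
            return sorted(interfaces)
    return []


SAMPLES = [
    Sample("pe1:ge-0/0/1", 100.0, 12.0, 400.0),
    Sample("pe2:ge-0/0/3", 115.0, 3.0, 250.0),
    Sample("pe4:xe-1/0/0", 130.0, 0.0, 900.0),
    Sample("pe5:ge-0/0/2", 500.0, 180.0, 200.0),  # normal traffic
]
print(should_trigger_cca(detect_drops(SAMPLES)))
```

Requiring several interfaces to fail together is what distinguishes a likely common cause from isolated noise; the interfaces returned would seed the symptom set for the ranking step.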



Simultaneous anomaly on multiple PE interfaces
A service interruption was seen at multiple, geographically diverse sites, affecting hundreds of enterprise users.
Diagram callout: traffic down on multiple interfaces



ACCA automatically created a case



ACCA identified the fibers causing the outage



Map visualization also gave the SOC insight (simulated)



ACCA tracked the growing case to resolution



ACCA shows the resulting reroute & redesign

Typical timeline: ACCA typically sees 2-3 outages and many more planned changes per day

• ACCA is now generally available (GA) and helping to solve many types of issues 24/7.
• ACCA determines the root cause and calculates the service impact.
• Average MTTD (mean time to detect) reduced to 5 minutes from 3.5 hours.
• MTTR potentially reduced by more than 50%.

Timeline events (x-axis: number of days, marked 1, 6, 9, 12, 21 and 23):
• 6 symptoms: router damaged by lightning strike identified
• 42 symptoms: faulty card on a router identified
• 12 symptoms: third-party fiber causing anomalies identified
• 192 symptoms: misconfigured router identified
• 4 symptoms: ACCA provides oversight and surveillance of an outage caused by planned work on a fiber
• 8 symptoms: ACCA provides oversight and surveillance of an outage caused by planned work on a fiber
• 480 symptoms: intermittent router failure detected despite no alarms being raised
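As a quick check on the headline MTTD figure, taking the two averages exactly as reported on the slide:

```latex
\begin{aligned}
\text{MTTD}_{\text{before}} &= 3.5\,\text{h} = 210\,\text{min}, \qquad \text{MTTD}_{\text{after}} = 5\,\text{min} \\
\text{speed-up} &= \frac{210}{5} = 42 \\
\text{reduction} &= 1 - \frac{5}{210} = \frac{41}{42} \approx 97.6\%
\end{aligned}
```

This concerns detection time only; the separate MTTR estimate (more than 50%) is not derived from these numbers.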



Future vision

Extended machine learning
More powerful ML to extend heuristic classification

Additional verification
We are adding a verification capability that uses active tests to fully confirm ACCA results.

Input additional KPIs to detect more failure modes
Latency and service quality measured from active tests look very promising.

Diagram: SA, OSS, XL NMS, BSS and CRM
AUDIENCE Q&A

James Crawshaw, Senior Analyst, Heavy Reading
Benedict Enweani, Director of Applications and Analytics

Thank you!
