DSATA

Learning, Indexing, and Diagnosing Network Faults
Prepared by Rawa Abdulla Aziz
Supervised by D. Aso
Complex Networks
Network as a graph
  Vertices represent network entities
  Edges represent interactions between network entities
Examples of global phenomena
  Fault cascading in communication networks
  Information spread (e.g., via emails) in social networks
  Infection propagation in protein interaction networks
Key challenge is to detect and understand emerging global phenomena
Network Monitoring Data

Networks generate massive monitoring data
Monitored data consists of local (in both space and time) observations on the network
Monitored data is incomplete and sometimes even erroneous (e.g., out of order with respect to both time and causality)
Examples
Ping failure, interface down, high CPU utilization, etc. in communication networks
Email threads (time stamp, tokenized subject, MIME type, etc.) between members in an organizational hierarchy
Pathological symptoms in biological networks such as protein interaction networks (PINs)

Key observation: monitoring data gathered from network entities are correlated through the network topology
Network Patterns
Network patterns attempt to efficiently capture spatial (topological) and temporal correlations in monitored data
Key challenges
Understand the semantics of network patterns
Identify domain-specific network patterns (e.g., fault diagnosis and prediction in IT systems, information spread and access control on social networks, disease propagation in protein networks, etc.)
How to learn and represent network patterns?
How to scalably match network patterns against an online stream of network events?
Network Patterns
Notation
Event data: <nodeId, type, timestamp, monitorId>
Network pattern: <event types, spatial pattern, temporal pattern>
Example: INTERFACE DOWN <LINK DOWN, NEIGHBOR, TIME WINDOW>
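The notation above can be sketched as two small record types. This is only an illustration: the field names follow the tuples in the notation, while the choice of a time-window length in seconds for the temporal pattern, the helper `within_window`, and the sample values are assumptions, not part of the original slides.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Event:
    """Event data: <nodeId, type, timestamp, monitorId>."""
    node_id: str
    type: str
    timestamp: float  # assumed to be in seconds
    monitor_id: str


@dataclass(frozen=True)
class NetworkPattern:
    """Network pattern: <event types, spatial pattern, temporal pattern>."""
    event_types: FrozenSet[str]
    spatial_pattern: str     # name of a topological relationship, e.g. "NEIGHBOR"
    temporal_pattern: float  # assumption: a time-window length in seconds


def within_window(a: Event, b: Event, pattern: NetworkPattern) -> bool:
    """Hypothetical helper: do two events fall inside the pattern's time window?"""
    return abs(a.timestamp - b.timestamp) <= pattern.temporal_pattern


# Illustrative values only.
link_down_pattern = NetworkPattern(frozenset({"LINK_DOWN"}), "NEIGHBOR", 30.0)
e1 = Event("r1", "LINK_DOWN", 100.0, "m1")
e2 = Event("r2", "LINK_DOWN", 112.5, "m1")
e3 = Event("r3", "LINK_DOWN", 200.0, "m2")
```

Frozen dataclasses keep events hashable, which is convenient if they are later used as keys in an index.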
Temporal Pattern
E.g.: frequent item sets
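As a rough illustration of frequent item sets as a temporal pattern, the sketch below counts which event types co-occur in the same time window. It assumes events have already been grouped into windows, uses brute-force enumeration rather than a tuned algorithm such as Apriori, and the function name `frequent_itemsets` and the sample event types are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(windows, min_support, max_size=2):
    """Count event-type sets that co-occur in at least min_support windows."""
    counts = Counter()
    for window in windows:
        items = sorted(set(window))  # de-duplicate and fix an order
        for size in range(1, max_size + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Each inner list holds the event types seen in one time window (made-up data).
windows = [
    ["LINK_DOWN", "INTERFACE_DOWN"],
    ["LINK_DOWN", "INTERFACE_DOWN", "HIGH_CPU"],
    ["PING_FAILURE"],
    ["LINK_DOWN", "INTERFACE_DOWN"],
]
result = frequent_itemsets(windows, min_support=3)
```

Here the pair (INTERFACE_DOWN, LINK_DOWN) appears in three windows and so survives the support threshold, while HIGH_CPU and PING_FAILURE do not.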
Spatial Pattern: Composition/Closures of one or more topological relationships

Communication networks: upstream, downstream, neighbor, tunnel
Social networks: manages, friends, team members
Biological networks: catalyst, suppressor
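To make "composition/closures of topological relationships" concrete, here is a minimal sketch that represents a relationship as a dictionary from a node to the set of related nodes. The representation, the function names, and the toy topology are assumptions made for illustration.

```python
def compose(r, s):
    """Relational composition: x is related to z if x r y and y s z for some y."""
    out = {}
    for x, ys in r.items():
        targets = set()
        for y in ys:
            targets |= s.get(y, set())
        out[x] = targets
    return out


def transitive_closure(r):
    """Repeatedly add indirectly related nodes until nothing changes."""
    closure = {x: set(ys) for x, ys in r.items()}
    changed = True
    while changed:
        changed = False
        for x in closure:
            reachable = set()
            for y in closure[x]:
                reachable |= closure.get(y, set())
            if not reachable <= closure[x]:
                closure[x] |= reachable
                changed = True
    return closure


# Toy topology: core -> agg1 -> {edge1, edge2}
downstream = {
    "core": {"agg1"},
    "agg1": {"edge1", "edge2"},
    "edge1": set(),
    "edge2": set(),
}
two_hops = compose(downstream, downstream)       # exactly two hops downstream
all_downstream = transitive_closure(downstream)  # any number of hops downstream
```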
Fault Diagnosis and Prediction in Communication Networks

Challenge: improve the scalability of fault diagnosis
Limitations of current solutions:
  Complexity grows as the square of the network size
  Correlation rules are pair-wise: expensive to support complex fault diagnosis (e.g., predicting soft failures, router failure from VRF tunnel events, etc.)
  Lack predictive capability
Approach:
  Fault signatures encode temporal patterns (frequent item sets) and topological patterns (upstream, downstream, neighbors, VPN tunnels, etc.)
  Topologically index streaming monitoring data to facilitate scalable single-pass event correlation and fault diagnosis
  Results in linear complexity and increased scalability
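The contrast between pair-wise correlation and topologically indexed, single-pass correlation can be sketched as follows. This is a simplified illustration, not the system described in the slides: events are reduced to (node, timestamp) pairs, the only relationship is a symmetric neighbor map, and both function names are hypothetical. The sort in the indexed version only stands in for a stream that already arrives in time order.

```python
def correlate_pairwise(events, neighbors, window):
    """Baseline: compare every pair of events (quadratic in the number of events)."""
    pairs = set()
    for i, (node_a, t_a) in enumerate(events):
        for node_b, t_b in events[i + 1:]:
            if node_b in neighbors.get(node_a, ()) and abs(t_a - t_b) <= window:
                pairs.add(frozenset((node_a, node_b)))
    return pairs


def correlate_indexed(events, neighbors, window):
    """Single pass: each event is checked only against its topological neighbors."""
    last_seen = {}  # node -> timestamp of its most recent event
    pairs = set()
    for node, t in sorted(events, key=lambda e: e[1]):
        for other in neighbors.get(node, ()):
            if other in last_seen and t - last_seen[other] <= window:
                pairs.add(frozenset((node, other)))
        last_seen[node] = t
    return pairs


# Toy data: a chain r1 - r2 - r3 with a symmetric neighbor relationship.
neighbors = {"r1": {"r2"}, "r2": {"r1", "r3"}, "r3": {"r2"}}
events = [("r1", 0.0), ("r2", 2.0), ("r3", 50.0)]
pairwise_result = correlate_pairwise(events, neighbors, window=5.0)
indexed_result = correlate_indexed(events, neighbors, window=5.0)
```

In the indexed version the work per event depends on the node's degree rather than on the total number of events, which is the intuition behind the linear-complexity claim.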

[Figure labels: Monitoring Data (Omnibus); Correlation Engine (ITNM RCA); Pair-wise correlation rules; Fault Signatures (Network Patterns); Fault diagnosis]
Step 1: Learning Network Faults

Learn fault signatures from historical network event data
  Fault synopsis: Fault Type → Network Pattern
  Fault signature: Network Pattern → <Fault Type>
Fault diagnosis: <Spatial Pattern to Localize Faulty Node, Network Topology> → Faulty Node
Fault prediction: use incrementally matchable network patterns
Use indexable network patterns

Topological relationships are invertible: neighbor^-1 = neighbor, downstream^-1 = upstream
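Invertibility can be checked on a small example. The sketch below assumes a relationship is stored as a dictionary from a node to the set of related nodes; the function name `invert` and the toy topology are illustrative. Precomputing the inverse costs extra memory but avoids scanning the whole relationship at query time, which is one form of the space-time trade-off in computing R(x) and R^-1(x).

```python
from collections import defaultdict


def invert(relation):
    """Build R^-1 from R: y maps to every x such that x R y."""
    inverse = defaultdict(set)
    for x, ys in relation.items():
        for y in ys:
            inverse[y].add(x)
    return dict(inverse)


# downstream^-1 = upstream
downstream = {"core": {"agg1"}, "agg1": {"edge1", "edge2"}}
upstream = invert(downstream)

# neighbor^-1 = neighbor (a symmetric relationship is its own inverse)
neighbor = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
neighbor_inverse = invert(neighbor)
```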
Step 2: Online Matching

Fault localization using topological indices and hierarchical evidence aggregation
Topology indexing algorithms + space-time trade-off in computing R(x) and R^-1(x)
R ∈ {upstream, downstream, neighbor, tunnel, ...}
Scalable hierarchical evidence aggregation for efficient fault diagnosis
[Figure: hierarchical evidence aggregation tree; labels: f1, f2, f3, ..., fn-1, fn; bf; n1, n2]
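One way to picture hierarchical evidence aggregation is as a tree-shaped merge of vote counters. The sketch below is an interpretation, not the method from the slides: it assumes each evidence is a tuple <f, v, Rv> in which Rv is the set of candidate faulty nodes related to v, and it assumes `bf` is a branching factor that controls how many partial results are merged at each tree node.

```python
from collections import Counter


def leaf_votes(evidence):
    """One evidence <f, v, Rv> votes once for every candidate node in Rv."""
    fault, _node, candidates = evidence
    return Counter((fault, c) for c in candidates)


def aggregate(evidences, bf=2):
    """Merge vote counters level by level, bf partial results at a time."""
    if bf < 2:
        raise ValueError("bf must be at least 2")
    level = [leaf_votes(e) for e in evidences]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), bf):
            total = Counter()
            for partial in level[i:i + bf]:
                total.update(partial)
            merged.append(total)
        level = merged
    return level[0] if level else Counter()


# Made-up evidences: three nodes observe LINK_DOWN and all point at r2.
evidences = [
    ("LINK_DOWN", "r1", {"r2"}),
    ("LINK_DOWN", "r3", {"r2", "r4"}),
    ("LINK_DOWN", "r5", {"r2"}),
]
votes = aggregate(evidences, bf=2)
top_candidate = votes.most_common(1)[0]
```

Because each level of the tree can be merged independently, partial results could in principle be computed in parallel or close to where the evidences originate.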
Details
[Figure: pipeline diagram with inputs Event Stream and Network Topology]
OFFLINE LEARNING
  Preparation of training data
  Extract temporal patterns
  Extract topological patterns
  Fault Signatures
ONLINE MATCHING
  Match temporal patterns
  Fault Signatures
  Evidences: <f, v, Rv>
  Indexed network topology
  Scalable Evidence Aggregation
  Fault Diagnosis and Prediction
Fault Management
There are two primary ways to perform fault management: passive and active. Passive fault management collects alarms from devices (normally via SNMP) when something happens on them. Its weakness is that a device that fails completely cannot send an alarm, so the failure goes unnoticed. Active fault management addresses this issue by actively monitoring devices with tools such as ping to determine whether each device is active and responding.
Fault management includes any tools or procedures for diagnosing, testing, or repairing the network when a failure occurs.
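The difference between the two modes can be sketched with a small class. Everything here is hypothetical: the class name `FaultManager`, its methods, and the stand-in probe function are illustrations only. In practice the alarm would arrive as something like an SNMP trap and the probe would be a real reachability check such as ping.

```python
from typing import Callable, Iterable, List


class FaultManager:
    def __init__(self) -> None:
        self.faults: List[str] = []

    def on_alarm(self, device: str, message: str) -> None:
        """Passive: record an alarm that a device pushed to us."""
        self.faults.append(f"{device}: {message}")

    def poll(self, devices: Iterable[str], probe: Callable[[str], bool]) -> None:
        """Active: probe each device and record those that do not respond."""
        for device in devices:
            if not probe(device):
                self.faults.append(f"{device}: no response to probe")


# Stand-in probe: pretend that only r3 is unreachable.
def fake_probe(device: str) -> bool:
    return device != "r3"


manager = FaultManager()
manager.on_alarm("r1", "interface down")       # r1 is alive and reports a problem
manager.poll(["r1", "r2", "r3"], fake_probe)   # r3 is dead and could never report
```

The dead device r3 shows up only through polling, which is the gap in passive monitoring that active monitoring is meant to close.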
Summary
Network patterns encode spatial-temporal properties of various networks
Ability to scalably mine and match network patterns is key for understanding global network phenomena
Case study on fault diagnosis and prediction in communication networks
  Complexity of the solution has to be linear in network size
  Topologically indexed databases were a key tool for addressing scalability
Explore more complex network patterns for information, social, and biological networks, which exhibit stronger coupling relationships
  A failed router does not cause its neighboring router to fail
  A corrupt information node can corrupt its neighbor (e.g., summary node)
  A diseased enzyme can catalyze/inhibit its neighbors
