
Alarm Correlation

Correlating multiple network alarms improves telecommunications network surveillance and fault management.

Gabriel Jakobson and Mark D. Weissman

Modern telecommunication networks may produce thousands of alarms per day, making the task of real-time network surveillance and fault management difficult. Due to the large volume of alarms, network operators frequently overlook or misinterpret them. To reduce the number of alarms displayed on operators' terminals, current network management systems apply alarm filtering procedures or, in the case of bursts of alarms, send them directly to a printer or database.

In this article, we will consider a relatively new process of real-time network management, alarm correlation. Alarm correlation is a conceptual interpretation of multiple alarms such that a new meaning is assigned to these alarms. It is a generic process that underlies different network management tasks such as context-dependent alarm filtering, alarm generalization, network fault diagnosis, generation of corrective actions, proactive maintenance, and network behavior trend analysis.

The goal of this article is twofold: first, to introduce an alarm correlation model and second, to describe the intelligent management platform for alarm correlation tasks (IMPACT), which implements the proposed model. Our approach to alarm correlation is based on the principles of model-based reasoning (MBR) [1]. As in MBR, we will define two basic components of the overall alarm correlation model: the structural component, which describes the network elements (NEs) and their connectivity and containment relations; and the behavioral component, which describes alarms and correlation.

The prototype of the IMPACT system has been developed at GTE Laboratories. It provides an intelligent environment for developing alarm correlation applications, and for real-time alarm monitoring. IMPACT has been used at GTE business units to build two network alarm correlation applications: AMES, for a land-based telecommunication network; and CORAL, for a cellular network.

Alarm correlation, as a subject of research and system development, has been discussed in several works. The aspects of time and space correlation of network events in the network troubleshooting domain were discussed in [2], where a knowledge-based approach was developed that described NEs and network events as knowledge-base entities. The conceptual approach to alarm correlation was discussed in [3]. A structural-phrase grammar-based approach to describe network connectivity and alarm correlation conditions was introduced in [4]. An alarm correlation model was proposed in [5], where alarms caused by a single common fault were considered. Interpretation and correlation of events has been analyzed in other areas, such as electric power systems [6], nuclear-power-plant alarm management [7], and patient-care monitoring.

In the network management area, several vendors have incorporated expert systems into their platforms to support alarm correlation capabilities. NMS/Core™ from Teknekron Communications Systems [8] includes programs to perform alarm filtering and correlation functions. The Sinergia system from CSELT, Italy [9], first uses expert system rules to recognize alarm correlation patterns and instantiate network fault hypotheses, and then applies heuristic search to determine the best solution among the hypotheses. ALLINK™ Operations Coordinator from NYNEX [10] uses an expert system to filter network alarms.

The rest of the article is organized as follows. The following section describes the basic notions associated with alarm correlation, and the section after that discusses the conceptual framework of alarm correlation. Next, we describe the structural component of the alarm correlation model, and then the behavioral component. An overview of the IMPACT system is given, and conclusions and future work are discussed.

GABRIEL JAKOBSON is a principal member of technical staff at GTE Laboratories.
NetAlert™ is a trademark of GTE Telecommunication Services.
ALLINK™ is a trademark of NYNEX Corporation.
ART-IM™ is a trademark of Inference Corporation.
NMS/Core™ is a trademark of Teknekron Communications Systems.

Basic Notions of the Alarm Correlation Domain

In this section, we will give a short informal review of basic notions that we will use to explain the alarm correlation domain and its applications.
Figure 1. Facility disconnect.

Faults and Alarms
A fault is a disorder occurring in the hardware or software of the managed network. Faults happen within the managed network or its components, while alarms are external manifestations of faults. Alarms defined by vendors and generated by network equipment are observable by network operators. We are considering only alarms mediated by alarm messages. Similar alarm messages with different time stamps are separate alarms. Faults can be causally related, thus forming an acyclic fault propagation graph, or independent (causally unrelated). External observation of alarms may instill an impression that one alarm causes another. However, the causality is not between alarms, but rather between faults.
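The separation between faults (which are causally related) and alarms (which are only manifestations) can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not part of the article's model: the fault names, alarm names, and the functions `root_faults` and `explain` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fault propagation graph: each key is a fault and each value
# lists the faults it causes. The graph is assumed to be acyclic.
FAULT_CAUSES = {
    "trunk-disconnected": ["call-setup-failing"],
    "call-setup-failing": [],
    "port-card-bad": [],
}

# Hypothetical mapping from an observable alarm to the fault it manifests.
ALARM_TO_FAULT = {
    "CALL-PROC-FAIL": "call-setup-failing",
    "CGA-RED": "port-card-bad",
}


def root_faults(fault, causes=FAULT_CAUSES):
    """Return the faults with no known cause from which `fault` is reachable."""
    parents = defaultdict(set)
    for cause, effects in causes.items():
        for effect in effects:
            parents[effect].add(cause)
    roots, stack, seen = set(), [fault], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if parents[current]:
            stack.extend(parents[current])  # keep walking toward the causes
        else:
            roots.add(current)              # nothing causes this fault
    return roots


def explain(alarm):
    """Map the alarm to its fault, then follow causality between faults only."""
    return root_faults(ALARM_TO_FAULT[alarm])
```

Here `explain("CALL-PROC-FAIL")` walks from the alarm to its fault and then up the fault graph; at no point is one alarm treated as the cause of another.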
Alarm Correlation
Alarm correlation is a conceptual interpretation of multiple alarms such that new meanings are assigned to these alarms. It is a generic process that underlies different network management tasks:
Compression: the reduction of multiple occurrences of an alarm into a single alarm.
Count: the substitution of a specified number of occurrences of alarms with a new alarm.
Suppression: inhibiting a low-priority alarm in the presence of a higher-priority alarm.
Boolean: substitution of a set of alarms satisfying a Boolean pattern with a new alarm.
Generalization: reference to an alarm by its superclass.
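The five operations listed above can be sketched as small Python functions over a list of alarms. This is a minimal illustration, not IMPACT's implementation: the `Alarm` record, the priority convention, the `SUPERCLASS` table, and every function name are assumptions, and the Boolean case is reduced to a plain conjunction of alarm names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    name: str          # e.g. "CGA-RED"
    source: str        # network element that raised the alarm
    priority: int = 0  # larger means higher priority (assumed convention)


def compress(alarms):
    """Compression: keep one alarm per (name, source), in arrival order."""
    seen, result = set(), []
    for alarm in alarms:
        key = (alarm.name, alarm.source)
        if key not in seen:
            seen.add(key)
            result.append(alarm)
    return result


def count(alarms, name, threshold, new_name):
    """Count: replace `threshold` or more occurrences of `name` by one new alarm."""
    matching = [a for a in alarms if a.name == name]
    if len(matching) < threshold:
        return list(alarms)
    rest = [a for a in alarms if a.name != name]
    return rest + [Alarm(new_name, matching[0].source, matching[0].priority)]


def suppress(alarms):
    """Suppression: drop alarms outranked by another alarm from the same source."""
    top = {}
    for a in alarms:
        top[a.source] = max(top.get(a.source, a.priority), a.priority)
    return [a for a in alarms if a.priority == top[a.source]]


def boolean_and(alarms, required, new_alarm):
    """Boolean (conjunction only): if all `required` names occur, substitute."""
    names = {a.name for a in alarms}
    if not set(required) <= names:
        return list(alarms)
    return [a for a in alarms if a.name not in required] + [new_alarm]


# Hypothetical class table used for generalization.
SUPERCLASS = {"CGA-RED": "CGA", "CGA-YELLOW": "CGA", "CGA-BLUE": "CGA"}


def generalize(alarm):
    """Generalization: refer to the alarm by its superclass, if it has one."""
    return Alarm(SUPERCLASS.get(alarm.name, alarm.name), alarm.source, alarm.priority)
```

Each function returns a new list and leaves its input untouched, so the operations can be chained in whatever order an application needs.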
Alarm correlation may be used for network fault isolation and diagnosis, selecting corrective actions, proactive maintenance, and trend analysis.

To illustrate the use of alarm correlation, we will give an example based on actual events that happened on a private telecommunication network. Because of an administrative error at a primary network control center, a circuit disconnect order was incorrectly sent to a common carrier, but soon after withdrawn. An additional error by the common carrier led to the disconnect order being carried out despite the cancellation. This meant that a live circuit was disconnected, causing a catastrophic failure on a major DS3 link between city A and city B (Fig. 1). A normal facility disconnect, when performed by network operations personnel, invokes automatic loopback conditions on digital cross-connect systems (DCSs) at both ends of the circuit. Since this is normal DCS behavior, the loopback conditions are not reported. The packet and voice switches having logical trunks over the disconnected circuit sent large volumes of call processing failure messages to the primary network control center. The operators puzzled for an hour before they realized what had happened. The task at hand was to correlate the call-processing alarms from the switches with the absence of alarms from the DCSs, and recognize that the trunk was actually disconnected. This was complicated by the incorrect record in the database showing that the circuit was live.

Subjects for correlation could be any events affecting the network. These may be environmental-state parameters, the network management context, or events invoked by the user or external systems. Correlations are defined over a time interval or window. When a situation is recognized and a correlation asserted, it remains active until it expires or is externally cleared. Correlations may be subsumed by higher-level correlations.

The alarm correlation model introduced in this article distinguishes between correlations and correlation rules [11]. A correlation is a statement about events happening on the network; for example, Bad-Card-Correlation states that some port contains a faulty port card. A correlation rule defines the conditions under which correlations are asserted. For example, if there is a Red carrier group alarm (CGA) from one DCS, and a Yellow-CGA from another, and these DCSs are connected, then Bad-Card-Correlation will be asserted. The conditional part of the rule may contain a complex Boolean pattern recognizing alarms, NEs, and correlations, as well as structural, temporal and other relations.

Figure 2. (a) Correlation of causally dependent alarms; (b) and (c) correlation of causally independent alarms.

Fault Diagnosis
One of the major applications of alarm correlation is network fault diagnosis. Not all faults exhibit alarms. These faults can be recognized indirectly by correlating available alarms. Figure 2a illustrates this, showing that correlation c1 detects the fault f1, and correlation c2 detects the fault f2. Correlating c1 and c2 into the correlation c0 allows diagnosis of the fault f0. Correlation between alarms due to a common fault is a transitive, reflexive, and symmetric relation (i.e., an equivalence relation, as noted in [5]). If a single alarm is a manifestation of multiple faults, this relation may not hold. For example, if alarm a (Fig. 2b) is caused by fault f1 or fault f2, but not both (an exclusive OR condition), then correlations c1 and c2 are formed with a common component alarm, and consequently the correlation relation is not transitive. If alarm a (Fig. 2c) is caused by both faults f1 and f2 (an AND condition), correct diagnosis remains ambiguous. This may indicate a common primary fault, or independent faults causing f1 and f2. In order to disambiguate these two cases, additional information is required.

IEEE Network, November 1993, p. 53

Alarm Generalization
Alarm generalization is potentially very useful for network management. It allows one to deviate from a microscopic perspective of network events and view situations from a higher level. There are two ways alarm generalization may be performed.

The first is subsumption of lower-level alarm classes by a higher-level class. A CGA type "Red" (CGA-Red) may be generalized to alarm class CGA by disregarding the value of the "type" parameter. This generalization process may utilize alarm class/subclass hierarchies, which may be built along arbitrary coordinates. An example of an alarm message class hierarchy is discussed later.

The second is interpretation of simultaneous events or events happening within a predefined time interval as a qualitatively new complex situation. The events may be causally related or independent. During this interpretation process no faults are determined, but a more abstract specification of events is constructed.

The Conceptual Framework of Alarm Correlation

In this section the overall conceptual framework of our approach to alarm correlation is discussed. As mentioned earlier, we follow the principles of MBR, originally used for the modeling of intelligent systems. The conceptual framework of alarm correlation contains the structural and behavioral components (Fig. 3).

The structural component is the description of the managed network. It contains two major parts, the network configuration model and the network-element class hierarchy. The network configuration model describes the NEs (managed objects) and the connectivity and containment relations between them. The network-element class hierarchy describes the NE types and the class/subclass relationships between the types. Each NE in the network configuration model is an instance of a terminal NE class from the network-element class hierarchy.

Figure 4. DCS class ROCKWELL-DEXCS and instance LOS-ANGELES-DEXCS.

Figure 5. Message class CARRIER-GROUP-ALARM and a sample message class hierarchy.

The behavioral component describes the dynamics of alarm correlation. It contains three major components: the message class hierarchy, the correlation class hierarchy, and correlation rules.
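As a rough illustration of the structural component, the sketch below models an NE class hierarchy with ordinary Python classes and builds a tiny configuration model from instances linked by connectivity and containment relations. The class names, the `connect` and `place` helpers, the port names, and the second DCS instance are assumptions made for the example; only ROCKWELL-DEXCS and LOS-ANGELES-DEXCS come from the article.

```python
class NetworkElement:
    """Root of an illustrative NE class hierarchy (all names are assumptions)."""

    def __init__(self, name):
        self.name = name
        self.connected = set()  # connectivity relation (symmetric)
        self.contains = []      # containment relation: elements inside this one
        self.within = None      # inverse of containment


class PhysicalPort(NetworkElement):
    pass


class DigitalCrossConnect(NetworkElement):
    pass


class RockwellDexcs(DigitalCrossConnect):
    """Terminal class: only terminal classes are instantiated."""


def connect(a, b):
    """Record that two NEs are connected."""
    a.connected.add(b)
    b.connected.add(a)


def place(container, element):
    """Record that `element` is contained within `container`."""
    container.contains.append(element)
    element.within = container


def build_example():
    """Build a two-node configuration model with a few ports on one DCS."""
    la = RockwellDexcs("LOS-ANGELES-DEXCS")
    sac = RockwellDexcs("SACRAMENTO-DEXCS")  # hypothetical second instance
    connect(la, sac)
    for number in range(1, 5):
        place(la, PhysicalPort(f"PORT-{number:03d}"))
    return la, sac
```

Because every instance belongs to a terminal class, a test such as `isinstance(la, DigitalCrossConnect)` answers class-level questions about an element without listing its concrete type.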
The message class hierarchy describes the messages generated by NEs. The message class hierarchy is used to control the alarm message-parsing process. This process is described in more detail in [12]. The correlation classes and correlation rules will be described later.

The NE classes, message classes, correlation classes, and correlation rules are organized into hierarchies. These hierarchies are related by "producer/consumer" dependencies. NEs are "producers" of alarm messages, messages "produce" correlations, and rules are "consumers" of all the above. The "producer/consumer" dependencies are used by IMPACT during the application development process. These dependencies, along with other domain-oriented constraints, are used to support correctness, completeness, and consistency of the knowledge base, and to guide the user through the application development process. For example, the "producer/consumer" dependency restricts the user from deleting an NE class from the knowledge base while message classes still refer to it.

The Structural Component
Network Element Class Hierarchy
NE classes describe network equipment types, such as switches, digital cross-connects and multiplexers. NE classes are organized into a hierarchy using class/subclass relations. The root of the hierarchy is a GENERIC-NE-CLASS, which contains the most general information common to all NEs. The next level of the hierarchy describes the basic NE classes, such as trunk-class, transmission-interface-class, switch-class, building-class, and others. Each of these classes refers to its own subhierarchy; for example, the trunk-class refers to the logical-trunk-class and physical-trunk-class, and the physical-trunk-class to the super-link-class, T1-trunk-class, and T3-trunk-class. Each subclass inherits parameters, values, attributes, and constraints from its superclasses. IMPACT permits multiple inheritance; that is, a class might have more than one superclass.

Network Class Editor, in Fig. 4, describes ROCKWELL-DEXCS, which is a subclass of the generic digital cross-connect class DEXCS-CLASS. Message Class refers to BASIC-DEXCS-MESSAGE, which is the root node of the associated message class hierarchy. The Connected Filter specifies that ROCKWELL-DEXCS may only be connected to a digital cross-connect or a switch. Within Filter is used to specify that ROCKWELL-DEXCS can be placed within a building or a network operations center, while Contains Filter specifies that only physical and logical ports may be contained within.

The NE class hierarchy is an abstraction of physical NEs. The terminal nodes describe particular NE types produced by manufacturers. Specific digital cross-connect products, such as AT&T's DACS II or Rockwell's RDX-370, are terminal nodes of the superclass digital-cross-connect-class. The NE class hierarchy is specific to an application. It may be modified by adding, deleting, or editing existing classes. The upper levels of the hierarchy are general and are therefore reusable across applications.

Network Configuration Model
The network configuration model is constructed from the instances of individual NEs. NE instances describe the actual physical or logical components of the managed network. The instances are specified by instantiating terminal NE classes and connecting them according to the network configuration. This process may be performed by the network operating staff using the IMPACT Network Element Editor. Constraints defined in the class specification will be enforced. The user cannot make connections that violate the physical behavior of the connected elements, or leave required values unspecified. Network Element Editor in Fig. 4 describes LOS-ANGELES-DEXCS, which is an instance of ROCKWELL-DEXCS. It is installed at a Los Angeles network operations center, connected to a DCS in Sacramento, and contains four physical ports.

The Behavioral Component
Message Class Hierarchy
All alarm messages produced by a specific NE are organized into a message class hierarchy using the class/subclass relation. Introduction of message classes simplifies the decision-making process of network management. Let us suppose
action X should be taken when one of the digital cross-connect alarms appears: CGA-Red, CGA-Blue, or CGA-Yellow. This situation could be presented by the following rule:

IF CGA-Red
OR CGA-Yellow
OR CGA-Blue
THEN Action X

The introduction of Carrier-Group-Alarm as a superclass of CGA-Red, CGA-Yellow, and CGA-Blue allows us to write a simpler rule:

IF Carrier-Group-Alarm
THEN Action X

A partial message class hierarchy, which corresponds to the alarm messages of a DCS, is shown in the Graph Editor Window in Fig. 5. Each message class in the hierarchy contains a message-parsing pattern and a translation schema, common to a subset of all messages that belong to this class. A trace from the root node to some class node n in the hierarchy determines a sequence of patterns to be recognized by the parsing algorithm to detect whether incoming messages belong to the message class determined by the node n. The translation schema in the message class determines how vendor codes for this NE can be normalized to a common form, or made more readable to the network operator.

The Message Class Editor in Fig. 5 describes the message class CARRIER-GROUP-ALARM. The superclass of CARRIER-GROUP-ALARM is DS1-MESSAGE, and it has four subclasses: DEXCS-CGA-AIS, DEXCS-CGA-BLUE, DEXCS-CGA-RED, and DEXCS-CGA-YELLOW. A fragment of the input alarm message text is stored in the slot TEXT and matched against the Pattern String. After successfully matching the pattern, the value of the first expression is assigned to the slot DC, and the value of the second expression is assigned to the slot FAILURE. These slots may be used by subclasses for further pattern constraints.

Correlation Class Hierarchy
A correlation class is a generalized description of the state of the network based on interpretation of network events. The conditions under which the correlations are asserted are described in the correlation rules. Each assertion creates an instance of a correlation class.

Figure 6. BAD-CARD-CORRELATION and BAD-CARD-CORRELATION-RULE-1.

A correlation class contains components, a message template, and parameters (slots). The components may be NEs, alarm messages, or other correlations. Correlation components are used to pass information from a correlation rule to the asserted correlation. Parameters provide information about a correlation to higher-level correlations, of which it may be a component. Correlation BAD-CARD-CORRELATION, described in Fig. 6, contains two components, a DCS, DEXCS-CLASS, and a physical port, PHYSICAL-PORT-CLASS. During assertion, a correlation rule assigns values to the CLLI (a universal code, which identifies the location of the equipment) and PORT-NUMBER slots. These values are used by the message template and asserted into the DEXCS-ID and PORT-NUMBER slots. Variable names are identified by a leading question mark.
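To make the parts of a correlation class concrete, here is a minimal Python sketch of an asserted correlation with components, slots, and a message template whose variables carry a leading question mark. The template wording, the `Correlation` layout, and the helper `make_bad_card_correlation` are assumptions for illustration; the article does not give the actual template text or schema syntax.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Correlation:
    """One asserted instance of a correlation class (illustrative layout)."""
    class_name: str
    template: str                                    # message template with ?variables
    components: list = field(default_factory=list)   # NEs, messages, or correlations
    slots: dict = field(default_factory=dict)        # parameters for higher levels

    def message(self):
        """Replace each ?NAME in the template with the value of slot NAME."""
        return re.sub(
            r"\?([A-Z][A-Z0-9-]*)",
            lambda match: str(self.slots[match.group(1)]),
            self.template,
        )


def make_bad_card_correlation(dexcs, port, clli, port_number):
    """What a rule's action part might do when it asserts the correlation."""
    return Correlation(
        class_name="BAD-CARD-CORRELATION",
        template="Bad port card in ?DEXCS-ID, port ?PORT-NUMBER",  # hypothetical text
        components=[dexcs, port],
        slots={"DEXCS-ID": clli, "PORT-NUMBER": port_number},
    )
```

A higher-level correlation could list this instance among its own components and read its slots, which is how information would move up the correlation hierarchy in this sketch.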
Figure 7. IMPACT architecture.

Correlation Rules
Correlation rules recognize events and assert or clear correlations. Different correlation rules may assert or clear the same type of correlation. The conditional part of a rule is a Boolean pattern built upon primary terms and relations. The primary terms are messages, NEs, correlations, and tests. The following relations are used: COUNT, CONTAINS, WITHIN, CONNECTED, arithmetic relations, and temporal relations. COUNT counts similar events and compares the count with a predefined threshold. The counted events may be primary alarms, correlations, or complex Boolean expressions. CONTAINS and WITHIN refer to structural containment, while CONNECTED denotes NE connectivity.
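The relations named above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than IMPACT's rule language: COUNT is modeled as a threshold over a sliding time window with non-decreasing timestamps, connectivity as a set of undirected links, and containment as a child-to-parent map that forms a tree. All names and data layouts are hypothetical.

```python
from collections import deque


class CountCondition:
    """COUNT: true once `threshold` matching events fall inside the window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self.times = deque()

    def observe(self, timestamp):
        """Record one matching event; timestamps must not decrease."""
        self.times.append(timestamp)
        # Discard events that have slid out of the window.
        while timestamp - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) >= self.threshold


def connected(a, b, links):
    """CONNECTED: NE connectivity over a set of undirected (a, b) links."""
    return (a, b) in links or (b, a) in links


def within(inner, outer, parent):
    """WITHIN: follow child-to-parent containment links upward."""
    while inner in parent:
        inner = parent[inner]
        if inner == outer:
            return True
    return False


def contains(outer, inner, parent):
    """CONTAINS: the inverse of WITHIN."""
    return within(inner, outer, parent)
```

A rule condition would then be a Boolean combination of such tests, for example a COUNT condition on Red CGAs together with a CONNECTED test between the two reporting elements.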
The action part of the rule contains executable commands, such as the assertion and clearing of correlations.

Time is an important correlation criterion. Correlations are determined on a fixed-length time interval. The correlation time interval may be absolute or relative. In the latter case, the time interval is considered to be a dynamic window in which alarm correlation is performed continuously.

A simplified version of BAD-CARD-CORRELATION-RULE-1 is given in Fig. 6. This correlation rule states: if physical ports ?near-port and ?far-port belong to two DCSs, respectively, ?near-DEXCS and ?far-DEXCS, and these ports are connected by a T1 trunk, and Yellow Carrier Group Alarm ?yellow-msg is reported from ?far-port, and Red Carrier Group Alarm ?red-msg is reported from ?near-port, then assert BAD-CARD-CORRELATION. After matching the rule conditions, ?near-DEXCS and ?far-DEXCS are bound to particular NEs. These NEs are provided as components to BAD-CARD-CORRELATION.

IMPACT System Description
Architecture
There are several requirements that underlie IMPACT design and implementation:
Real-time performance.
Dedication to network management tasks.
Effective representation of network and correlation knowledge.
User-oriented application development environment.
High-level graphic user interface (GUI) idiosyncratic to network management.

In the current implementation IMPACT works together with NetAlert, a real-time network management system from GTE Telecommunication Services [13] that performs primary data access, collection, and preprocessing functions, such as demarcation of the beginning of each message, providing message date, time, and location stamps, and performing basic event-filtering functions. NetAlert also supports International Organization for Standardization (ISO)-style event reporting and logging.

IMPACT's environment could be divided into two major parts: the application development environment and application run-time environment (Fig. 7). The application development environment supports knowledge acquisition, editing, browsing and display tools so that the network operations staff can create and maintain the network knowledge base in an efficient and safe manner. The application run-time environment provides IMPACT's functionality to parse incoming messages, perform alarm correlation procedures, generate system actions, and provide interfaces for the network operations staff. All this functionality is supported by the network knowledge base, which contains the structural network configuration and dynamic alarm correlation models.

Application Run-Time Environment
The application run-time environment monitors the network events in real time, correlates alarms, and responds to operator commands. In addition to those functions, it provides information on network status, explanations, and help. The application run-time environment consists of four major modules: the GUI, command/message processor, action processor, and alarm correlation engine (Fig. 8).

Figure 8. Application run-time environment.

The command/message processor takes incoming alarm messages, analyzes them, and turns them into objects. It also processes the commands coming from the user. The alarm correlation engine is a rule-based system, which reasons about the messages and generates correlations. The action
processor performs the functions determined by the correlation rules, such as displaying correlation messages, performing diagnostic procedures, storing data in a database, or executing external procedures.

The command/message processor implements a novel approach to message processing based on message class hierarchies. The essence of this method is to have a universal message-parsing procedure, which can be tuned to parse messages from different classes of NEs using associated message class hierarchies.

The GUI of the application run-time environment provides the network operator with several windows in which to perform the tasks of network surveillance and fault management (Fig. 9). The map window displays the managed network, and two bad card correlation icons. The references to the corresponding messages and correlations may be seen in the message/correlation display window. The BAD-CARD faults happened on ports #005 and #007 in the Los Angeles DEXCS LSANCAASF. The message window displays the full text of the CGA-Red alarm message selected from the message/correlation display window.

Figure 9. Map window, message/correlation display window, and message window.

The operator can clear correlations or ask for explanations by clicking the active correlation icon on the screen. Clicking the correlation icon opens the correlation display window, which contains a hypertextual description of the correlation. The component alarm messages, NEs, and subcorrelations are highlighted as hot areas of the hypertext. A mouse gesture on these hot areas invokes an editor describing the corresponding object. The operator can select any visible object on the screen and instantly examine it.

The application run-time environment uses the network knowledge base created by the application development environment. The network knowledge base contains correlation classes, correlation rules, NE classes, NEs, and message classes. The network knowledge base also stores network configuration models, graphical objects for network visualization, correlation icons, and procedural scripts to be executed by the action processor.

Application Development Environment
The application development environment provides powerful tools for building the network knowledge base. The core of the environment consists of eight editors, which are grouped into three sets of tools: network configuration tools, alarm correlation tools, and network graphics tools (Fig. 10). There are several features that make these editors specific to the task of building the network knowledge base.

[Figure 10: network configuration tools, alarm correlation tools, network graphics tools.]

First, the design principles of the editors are based on the general alarm correlation framework discussed earlier. The producer/consumer relationships of the framework are enforced by the editors. Second, tight integration between the editors allows simultaneous editing of conceptually related knowledge structures. Wherever a class or object is presented, either as text or iconically, a menu of common functions associated with that class or object is available. These menus offer choices, such as to display that entity or access information about its relationship to others in the alarm correlation framework. Third, the editors apply telecommunication-domain knowledge by validating the correctness and completeness of entered data. If a physical port
may only be connected to a T1 trunk, then only such trunks are offered to the user. Finally, all editors have a common look and feel, and express the idiosyncrasies of the network management domain.

Network configuration tools contain two editors, the network class editor and the network element editor (Fig. 4). Alarm correlation tools contain the correlation class editor, rule editor (Fig. 6), and message class editor (Fig. 5).

The network graphics tools consist of two editors, the map editor and the graph editor. The map editor shows a graphical image of the network corresponding to the network object representation. The graph editor displays object/class hierarchies of the network knowledge base (Fig. 5).

IMPACT Implementation
The IMPACT implementation (see Fig. 11) is based on the ART-IM expert system shell [14]. IMPACT uses the ART-IM forward-rule-chaining algorithm as a natural match for the event-driven processing of alarm correlation, and the RETE algorithm for fast pattern matching. Objects such as message classes, NE classes, NE instances and correlations are programmed as ART-IM schemas. A significant part of the system is programmed in C. The GUI and network graphics are developed in Tcl/Tk [15], a toolkit for building windowing applications.

Figure 11. IMPACT implementation.

Real-time performance is central to network surveillance and fault management. Real-time network management is a "soft" real-time task, where normal delays of 1 to 2 s and a maximum of 10 to 15 s are acceptable for most networks. The current implementation of the system on a SUN Sparc 10 workstation parses and correlates 12 to 15 alarms/s.

Conclusions and Future Work

Our goal was to create an alarm correlation model and corresponding software support system that allow efficient specification of alarm correlation by the domain experts themselves. We stressed the end-user orientation of IMPACT. We wanted to lower the barrier between the network management application development process and the end user of the application, the network management personnel. IMPACT is a step towards this goal.

The proposed alarm correlation model was used for three purposes: intelligent alarm filtering, alarm generalization, and fault diagnosis. There are other applications not discussed in this article, such as fault prediction and preventive maintenance. Interesting new applications of alarm correlation could be defined for managing logical (virtual) networks overlaid on physical networks, or correlating network service faults to physical faults.

Future enhancements of IMPACT will include on-screen graphical editing of the network. This capability will allow construction of the network configuration using graphical objects and generation of the corresponding data structures. The described event correlation model was a strictly deterministic model. It is possible to introduce event-likelihood measurements and operations over the likelihood functions so that multiple inexact (fuzzy) correlations could be ordered according to a particular context.

Acknowledgments
Several people took part in IMPACT's development during different stages of the project. Ming Tan developed the initial version of the map editor, which was later enhanced by Alan Lemmon. Alan also developed the graph editor and the cellular network alarm correlation application. Robert Weihmayer helped us to understand the telecommunication domain, and developed the initial network configuration knowledge base. Fred Atwater tested the system and helped to discover many system bugs. Finally, we would like to thank Shri Goyal for constant encouragement and suggestions on organization and content of the article.

References
[1] R. Davis, H. Shrobe, and W. Hamscher, "Diagnosis Based on Description of Structure and Function," Proc. 1982 Nat'l. Conf. Artificial Intelligence, Pittsburgh, Pa., pp. 137-142, 1982.
[2] R. Mathonet, H. Van Cotthem, and L. Vanryckeghem, "DANTES: An Expert System for Real-Time Network Troubleshooting," Proc. 10th IJCAI, Milan, Italy, pp. 527-530, Aug. 1987.
[3] O. Aloni et al., "Performance Analysis of an Alarm Filtering Expert System," World Cong. Exp. Sys., vol. 4, pp. 2346-2354, 1991.
[4] A. Bouloutas, S. Calo, and A. Finkel, "Alarm Correlation and Fault Management in Communication Networks," IBM Res. Rep. no. 17967, May 1992.
[5] J. Jordaan and M. Paterok, "Event Correlation in Heterogeneous Networks Using the OSI Management Framework," Proc. 3rd Int'l. Symp. Integrated Network Mgmt., San Francisco, Calif., pp. 683-696, 1993.
[6] M. Pfau-Wagenbauer and W. Nejdl, "Integrating Model-Based and Heuristic Features in a Real-Time Expert System," IEEE Expert, Intelligent Sys. and Their Applications, vol. 8, no. 4, pp. 12-18, 1993.
[7] J. Rellano et al., "GENESIS: An Expert System Shell for the Development of Symptom Pattern Recognition Expert Systems," World Cong. Exp. Sys., vol. 3, pp. 1541-1549, 1991.
[8] W. H. Caplinger, "Object-Oriented Technology in Practical Network Management Systems," Wescon '91.
[9] S. Brugnoni et al., "An Expert System for Real-Time Fault Diagnosis of the Italian Telecommunication Network," Proc. 3rd Int'l. Symp. Integrated Network Mgmt., San Francisco, Calif., pp. 617-628, 1993.
[10] T. Cikosky and J. Whitehill, "Integrated Network Management Systems: Understanding the Basics," Telecomm., vol. 6, no. 6, 1993.
[11] G. Jakobson, R. Weihmayer, and M. Weissman, "A Domain-Oriented Expert System Shell for Telecommunication Network Alarm Correlation," Proc. 2nd IEEE Network Mgmt. and Control Wksp., Tarrytown, N.Y., Sept. 21-23, 1993.
[12] G. Jakobson and M. Weissman, "A New Approach to Message Processing in Distributed TMN," Proc. 4th IFIP/IEEE Int'l. Wksp. on Dist. Sys., Long Branch, N.J., Oct. 5-6, 1993.
[13] "NetAlert™ Real-Time Analysis System," GTE Telecommunication Services, 1993.
[14] "ART-IM Programming Language Reference," Inference Corporation, 1991.
[15] J. Ousterhout, "Tcl: An Embeddable Command Language," Proc. Winter USENIX Conf., pp. 133-146, Jan. 1990.

Biographies
GABRIEL JAKOBSON [M '82] received an M.S. in electrical engineering from the Tallinn Polytechnic Institute, Estonia and a Ph.D. in computer science from the Estonian Academy of Sciences in 1964 and 1971, respectively. He is a principal member of technical staff at GTE Laboratories, Waltham, Massachusetts, where he has been project leader of several expert systems and intelligent database systems development projects. His current research interests include intelligent network management support systems.

MARK D. WEISSMAN received a B.S. in chemical engineering and a B.A. in computer science from the State University of New York at Buffalo in 1983 and 1984, respectively. He is a senior member of technical staff at GTE Laboratories, Waltham, Massachusetts, where he has been a major contributor to the development of several expert systems for network management applications.

