Fault Management Solution


Implementation Guide
10.2
Confidentiality, Copyright Notice & Disclaimer
Due to a policy of continuous product development and refinement, TEOCO Corporation or a
TEOCO affiliate company (“TEOCO”) reserves the right to alter the specifications,
representation, descriptions and all other matters outlined in this publication without prior
notice. No part of this document, taken as a whole or separately, shall be deemed to be part
of any contract for a product or commitment of any kind. Furthermore, this document is
provided “As Is” and without any warranty.
This document is the property of TEOCO, which owns the sole and full rights including
copyright. TEOCO retains the sole property rights to all information contained in this
document, and without the written consent of TEOCO given by contract or otherwise in
writing, the document must not be copied, reprinted or reproduced in any manner or form, nor
transmitted in any form or by any means: electronic, mechanical, magnetic or otherwise,
either wholly or in part.
The information herein is designated highly confidential and is subject to all restrictions in any
law regarding such matters and the relevant confidentiality and non-disclosure clauses or
agreements issued with TEOCO prior to or after the disclosure. All the information in this
document is to be safeguarded and all steps must be taken to prevent it from being disclosed
to any person or entity other than the direct entity that received it directly from TEOCO.
TEOCO and Helix are trademarks of TEOCO.
All other company, brand or product names are trademarks or service marks of their
respective holders.
This is a legal notice and may not be removed or altered in any way.
COPYRIGHT © 2019 TEOCO Corporation or a TEOCO affiliate company.
All rights reserved.

Your feedback is important to us: The TEOCO Documentation team takes many measures
in order to ensure that our work is of the highest quality.
If you found errors or feel that information is missing, please send your Documentation-
related feedback to Documentation@teoco.com
Thank you,
The TEOCO Documentation team

Table of Contents
Introduction
What Is the Fault Management Solution?
Fault Management Solution Architecture
    Mediation
    Base Configuration
Understanding Fault Management Implementation
    What Is a Library?
    Naming Libraries
    Fault Library Basic Templates
    Active Alarm Population
Library Architecture Considerations
    Implementing Library Logic
    Handling Complex Alarm Messages
    Generic Message Handling
    Is There a Need for the Validation, Splitter, or Event Distributor Component?
    Designing a Light Threshold Architecture
    What Alarm Type Conditions are Required?
    Quality Considerations
    Performance Considerations
    Maintenance Considerations
    Project Implementation vs. Core Implementation
    Using the Managing Table Method
FM Library Limitations
FM Library Implementation Workflow
    Creating the Mediation Library
    Base Configuration Population
    Configure Communication Admin Access Driver
    Supporting Alarm Synchronization
    Creating Network Commands
    Unit Testing
    QA Testing
    Packaging and Delivery
Troubleshooting
    The Alarm Configuration Information Contains “UNDEFINED” or “-1”
    The Alarms Show an Incorrect “Time up”
    Alarms do not Arrive
    Alarms Entering the Threshold Component do not Arrive
    Alarms are not Cleared Automatically
    The Explanation View is not Available for GD_Internal Alarms
    No Data Appears in the New Explanation View
    The New Explanation Utility Shows More Than the Raw Data
    Information Cannot Be Found (in Explanation Window)

Fault Management Solution Implementation Guide

Introduction
The goal of this guide is to provide information for implementing a Fault Management (FM)
solution, including both conceptual and practical guidelines. The document guides the user
through the implementation process by presenting the functionality of the FM solution,
followed by the FM solution architecture. After acquiring the relevant background information,
the user is presented with the main concepts of Fault Management implementation, the
considerations to take into account when designing the solution, and finally the
implementation workflow itself.
Note that the document references many implementation and reference guides that
describe the actual implementation steps in detail. This document is therefore a
comprehensive summary of the Fault Management solution implementation process and
is not intended to replace the current set of implementation guides. Instead, it summarizes
information and considerations relevant to Fault Management libraries without duplicating the
detailed implementation steps that appear in the individual implementation guides.


What Is the Fault Management Solution?


Helix’s Fault Management (FM) solution provides users with the ability to receive, view, track,
and analyze faults from any source throughout the telecommunications network, or from
alarm-generating applications.
FM, acting as a basic layer for the network manager, receives alarms in standard format from
agents throughout the network. It also receives alarms and messages from Network Elements
in their proprietary formats, and converts them into the standard format. All alarms and
messages received are stored in a historical database.
The Fault Solution has two main functions:
 Alarm Collection—see the Alarm Collection chapter in the Fault Solution
Administration Guide
 Alarm Management—see the Alarm Management chapter in the Fault Solution
Administration Guide


Fault Management Solution Architecture


The Fault Management solution architecture is divided into three main components:
 Mediation
 FM—see the System Description chapter in the Fault Solution Administration Guide
 Base Configuration

Mediation
Helix’s Mediation performs three main functions:
 Communicating with the network to receive data
 Sending commands to the network
 Processing and enriching the raw data
The following diagram and description illustrate the lines of communication:


Communication Layer
Helix’s Mediation supports a mixture of protocols and data formats arriving from the
converged network. Data can be exported from the Mediation in any format, according to the
specific need. The Mediation handles a variety of protocols available on the market, with a
dedicated connectivity method for each vendor or protocol (SNMP plug-in, Telnet plug-in,
Corba plug-in, and so on).
The following is a partial list of supported protocols:

FTP/SFTP    JMS      MTP
TCP/IP      POP3     AFT
X.25        TFTP     HTTP/S
SNMP        SSH      3GPP
RS232       CORBA    SOAP
Telnet      Q3
TL1         FTAM

The Communication Admin (also known as the Generic Driver, or GD) manages the
communication layer of the Mediation platform. Its main task is to communicate with each
network element using that element's own protocol. The Communication Admin consists of two
sub-layers:
 A generic sub-layer that handles general management activities that are relevant to
all protocol types, such as “login” and “keep-alive”.
 A protocol-specific sub-layer, which implements the different protocols using thin
protocol-specific drivers (plug-ins).
The plug-ins are used for:
 Identifying hardware and software
 Accessing and connecting/disconnecting
 Saving parameters per: protocol, access, NE/subnet
 Load balancing
The functionality of the Communication Admin is as follows:
 Only plug-ins communicate directly with the NEs.
 Only the Communication Admin can communicate with the plug-ins.
 The Mediation products can communicate with the NEs only through the
Communication Admin.
 The Helix products can communicate with the NEs only through the Mediation
products.
By placing the communication in a single architecture layer, Mediation simplifies the
administration and management of communication. Commonly used plug-ins include SNMP,
TL1, and Stream (Telnet).
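
The access rules above form a strict chain of layers. The following Python sketch only illustrates that layering. The class names, method names, NE name, and command text are all hypothetical and are not part of any Helix API.

```python
class Plugin:
    """Hypothetical protocol-specific driver: the only layer that touches an NE."""

    def __init__(self, protocol):
        self.protocol = protocol

    def send(self, ne_name, payload):
        # A real plug-in would speak SNMP, TL1, Telnet, and so on.
        return f"{self.protocol}->{ne_name}: {payload}"


class CommunicationAdmin:
    """Hypothetical generic layer: the only caller of the plug-ins."""

    def __init__(self):
        self._plugins = {}       # protocol name -> Plugin
        self._ne_protocol = {}   # NE name -> protocol name

    def register_plugin(self, plugin):
        self._plugins[plugin.protocol] = plugin

    def register_ne(self, ne_name, protocol):
        self._ne_protocol[ne_name] = protocol

    def send(self, ne_name, payload):
        # Select the plug-in that matches the NE's own protocol.
        plugin = self._plugins[self._ne_protocol[ne_name]]
        return plugin.send(ne_name, payload)


class Mediation:
    """Hypothetical Mediation product: reaches NEs only via the Communication Admin."""

    def __init__(self, comm_admin):
        self._comm_admin = comm_admin

    def send_command(self, ne_name, command):
        return self._comm_admin.send(ne_name, command)


admin = CommunicationAdmin()
admin.register_plugin(Plugin("SNMP"))
admin.register_plugin(Plugin("TL1"))
admin.register_ne("NE-1", "TL1")

mediation = Mediation(admin)
result = mediation.send_command("NE-1", "example-command")
print(result)
```

In this sketch the Mediation object holds no reference to any plug-in, which mirrors the rule that the Mediation products reach the NEs only through the Communication Admin.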


Sending Commands to the Network


Mediation can interact with multi-vendor, multi-technology networks, executing both NE native
commands and complex scripts. The NCI (Network Commands Interface) module supports
network commands using a rich and interactive language. NCI can send scripts of commands
to multiple destinations and applications. Native commands and scripts of commands can be
either scheduled or sent manually to the network elements.
The NCI GUI enables authorized users to create, store and send commands and command
scripts to any network element. Users can send commands and scripts immediately or
schedule them for a future transmission.
The advanced scripting capabilities serve as a platform for other applications that
communicate and interact with the network. This can be useful for activities such as
performance measurements collection, alarms reception, automatic fault correction, and
provisioning (Activation).
For more information about the NCI module, refer to the Using NCILibEditor chapter in the
TEOCO Studio User Guide.

Processing Raw Data


The FM product is based on libraries that parse and format the data, so it can be presented in
real-time to the NOC operator. FM libraries need to present logs as formatted alarms,
immediately, as they occur.
The FM library is also in charge of presenting the raw data in the Explanation utility, which
enables the end user to analyze the raw data behind the presented alarm.
TEOCO Studio gives network managers centralized control over the entire process of
collecting and processing fault alarms.
The integrated GUI-based Software Development Kit (SDK) for library management enables
service operators to design, implement, and manage interactions in a flexible and automated
manner.
TEOCO Studio supports all known network element types and technologies, regardless of
vendor, protocol and domain. It enables service providers to add new network elements and
rules independently. It enables you to create library definitions for taking raw data,
manipulating it and forwarding it. The manipulation includes parsing, converting, and
enrichment. In addition, TEOCO Studio defines and manages thresholds for raising or
dropping and updating alarms, according to predefined rules.
For more information, refer to the TEOCO Studio User Guide.
The Engine Clips, a process that is not part of the TEOCO Studio SDK, is responsible for
holding the “if else” threshold rules in its internal memory, which it populates by reading the
database tables.

Note: The Engine Clips is a C application framework process that does not exist in
version 8.5. The "if else" threshold rules in version 8.5 are in the FaM Threshold component
memory.


Once an event enters the Engine Clips, the process evaluates it and decides whether to
raise an alarm. The Engine Clips then sends packets to the FM Server.

The need for Engine Clips as part of the library was removed in version 8.0. Engine Clips
capabilities migrated to a new FaM Threshold or to the FM server. However, backward
compatibility to support Clips as part of existing libraries is still available.

Base Configuration
The Base Configuration provides effective tools for defining, updating, and showing alarming
entities such as network elements, facilities, links, services, regions, and sites.
The Base Configuration enables you to deal with dynamically changing networks in an easy
and flexible manner for assurance purposes. It provides a thorough network understanding,
allowing for a precise and transparent view of the geographical areas covered by the network,
its topology, relationships, and components.
For more information, see the Base Configuration User Guide.


Understanding Fault Management Implementation


To implement a Fault Management solution, implementers must define Helix libraries that
include the required definitions both for communicating with the network elements and
supplying information for the various Fault Management applications, that is, appropriate
northbound components.

What Is a Library?
A library is a set of definitions that determines how data coming from Network Elements
(NE) and Element Managers (EMs) will be interpreted and enriched before being passed on
to OSS applications.
Each library contains the definitions and implementations that characterize the data collection
and manipulation processes for a specific vendor, technology, connectivity method, NE
version, and data type.

Naming Libraries
When creating a new library, you must provide the library name, which is then inherited by all
the library components (such as Parser, Transform, and Threshold).
The library name should describe the content of the library. This name should not be
changed when reusing the base library.
Each library should have two names:
 Netlib file name, which is the name of the packaging zip file.
 TEOCO Studio file name, which applies to all the implementation files (all the library
definition files, such as gml, trs, and RC).

The Netlib File


The Netlib file is a zip file created when you export the library files. It includes all the library
definitions files, as packed by TEOCO Studio, such as Parser, Transform, Validation,
DBLoader, and information definitions.
The Netlib file name is the displayed name of the library and therefore should be as detailed
as possible. Its main purpose is to supply all required information to the user without having to
actually open the library in TEOCO Studio.
When you export the library, the file name is automatically used as the default name for the
Netlib file.

The Netlib file name should include the following information:
 Vendor name
 Group/EQP name and version:
o Group in SNMP domain (such as SAA or MPLS).
o EQP name and version at non-SNMP libraries (such as MTX15, 5ESS16, M2000,
and MGW).
 Library Type ("F" for Fault or "P" for PM).
 Protocol ("S" for SNMP, "Q" for Q3, "C" for Corba, "T" for Telnet, and so on).
 Library level ("B" for Basic, "S" for Standard, "P" for Premium).
 Library version.
 Custom (optional).
 Project version (optional).
For example, Cisco_Core_SFP SNMP Fault Premium1.0, or Standard_ATM_SFP
Fault_SNMP_ Premium2.3

Notes:

 The prefix of the Netlib file name should include the name of the implementation files
(for example, Cisco_Core_SFP).
 The version number should include (if relevant) two parts, a and b:
o “a” represents a major change, such as new functionality or a new
implementation method.
o “b” represents a bug fix.
 It is your responsibility to update the versions manually.

If project changes are required, the library should be saved with a new name, including the
string “custom”, and should have the project changes’ versioning. For example,
Cisco_Core_SFP SNMP Fault Premium1.0 Custom 1.1.

TEOCO Studio File Name


The TEOCO Studio files have shorter names that are composed of the basic information
required to identify the library. These names will not change during the library delivery,
upgrade, bug fixing, or change request process.
The structure of the name should be as follows:
 Vendor.
 Underscore sign "_".
 Group/EQP name and version:
o Group in SNMP domain (such as SAA or MPLS).
o EQP name and version (only if needed and not for NE upgrades) for non-SNMP (such
as MTX or MTX17).
 Underscore sign "_".
 Protocol ("S" for SNMP, "Q" for Q3, "C" for Corba, "T" for Telnet, and so on).
 Library Type ("F" for Fault, "P" for Performance, "C" for CDR, and so on).
 Library level ("B" for Basic, "S" for Standard, "P" for Premium).


For example:
Cisco_Core_SFP
Standard_ATM_SFP
The length of this name is unlimited in versions that use the new Explanation mechanism. In
older versions, which do not have the new Explanation solution, the name is limited to 16
characters.
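
The naming structure above can be sketched in a few lines of Python. This is only an illustration: the function names and dictionaries are hypothetical helpers, and the one-letter codes are the ones listed in this section. The sketch assumes the name is composed as vendor, underscore, Group/EQP name, underscore, and then the protocol, library type, and library level letters, which is consistent with the example Cisco_Core_SFP.

```python
# One-letter codes taken from the lists in this section.
PROTOCOL_CODES = {"SNMP": "S", "Q3": "Q", "Corba": "C", "Telnet": "T"}
TYPE_CODES = {"Fault": "F", "Performance": "P", "CDR": "C"}
LEVEL_CODES = {"Basic": "B", "Standard": "S", "Premium": "P"}


def studio_file_name(vendor, group_or_eqp, protocol, library_type, level):
    """Compose <vendor>_<group or EQP>_<protocol><type><level> (hypothetical helper)."""
    suffix = PROTOCOL_CODES[protocol] + TYPE_CODES[library_type] + LEVEL_CODES[level]
    return f"{vendor}_{group_or_eqp}_{suffix}"


def fits_legacy_limit(name, limit=16):
    """True if the name respects the 16-character limit of older versions."""
    return len(name) <= limit


name = studio_file_name("Cisco", "Core", "SNMP", "Fault", "Premium")
print(name)                     # Cisco_Core_SFP
print(fits_legacy_limit(name))  # True
```

A check such as fits_legacy_limit can be run before delivery when the target environment does not include the new Explanation mechanism.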

Backward Compatibility
For Mediation libraries that were developed in an environment that does not include the new
Explanation mechanism, there is a limitation of up to 16 characters for the library name.

Note: The new Explanation utility is available starting with DVX2 version
DVX2_REL2.2.4.0_N2_REL3.5.2.0.

The following guidelines can be used when constructing the library name:
 Vendor: 3 characters.
 Underscore sign "_": 1 character.
 Group in SNMP domain: SAA, MPLS, and so on.
 EQP name and version (if needed) for non-SNMP: 4-5 characters (such as
MTX17).
 Underscore sign "_": 1 character.
 Library Type: 1 character ("F" for Fault, "P" for Performance, "C" for CDR, and so on).
 Protocol: 1 character ("S" for SNMP, "Q" for Q3, "C" for Corba, "T" for Telnet, and so on).
 Library level: 1 character ("B" for Basic, "S" for Standard, "P" for Premium).
 Library version: 1 digit.
 Custom: 1 character ("C" for a custom library, only after the project has changed it).
 Project version: 1 digit (the project's modification versioning).


Fault Library Basic Templates


The following diagram shows the Fault Library basic template for version 4.3 and up. The
various components will be discussed later in the document.

The following diagram shows the Fault Library template from 4.2 and earlier.

Helix processes alarms in real time. Because the information arrives in streams, FM libraries
always use a DvxSub component that subscribes to the raw data source.
FM libraries never use File Reader components, which collect data from a predefined
directory. Therefore, post-scripting (external enrichment of raw data) via the Communication
Admin is not possible in Fault Solution libraries.

Notes:

 In Fault libraries, the CharReplacer component connects between the Transform and
the Validation component.
 The Event History component should be connected.
 The Transform should be connected directly to the Threshold component.

Active Alarm Population


When creating a library, one of the first tasks is to map the raw data to the active alarm fields
that appear in the Active Alarm window. For this you need to know what library level the
customer purchased, as this determines what fields should be mapped. See the Active Alarm
Attributes appendix in the Fault Solution Administration Guide.


Library Architecture Considerations


The following sections discuss the various issues that should be taken into consideration
when designing a Fault Solution library.

Implementing Library Logic


When considering how to implement a solution that involves distinct message formats, there
are two major questions that you should ask:
 How different is the content of one alarm from another?
 How many alarm types are there?
The answers to these questions will help you determine which of the following approaches
to follow:
 Use a single Parser and Transform and implement the logic in the Parser, for
example, using Switch Packs.
 Use a single Parser and Transform and implement the logic in the Transform using
Lookup tables.
 Use multiple Parsers and Transforms. This is known as Sequential Parsing.
The following table summarizes these considerations:

                            Few Alarm Types                           Many Alarm Types
Similar Alarm Types         Switch Pack                               Transform Lookup Tables
Very Different Alarm Types  Parser logic, for example, Switch Packs   Sequential Parsing

This table takes the following into consideration:
 Performance: which methodology requires the least memory.
 Maintenance: which methodology is the easiest to maintain.
 Upgrades: which methodology is the easiest to upgrade or to modify through a Change Request.
 Return on Investment (ROI): which methodology is the fastest to develop.
For more information, refer to the Parser and Transform Implementation Guides.
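
The decision table in this section maps two yes/no answers to one of the approaches. The following Python sketch restates that mapping as a function. The function name and parameter names are hypothetical and serve only to make the table easy to read as logic.

```python
def choose_approach(very_different, many_types):
    """Return the approach suggested by the decision table in this section.

    very_different: True if the alarm types differ greatly in content.
    many_types:     True if there are many alarm types.
    """
    if many_types:
        # Many alarm types: lookups when they are similar, otherwise Sequential Parsing.
        return "Sequential Parsing" if very_different else "Transform Lookup Tables"
    # Few alarm types: keep the logic in the Parser in both cases.
    if very_different:
        return "Parser logic, for example, Switch Packs"
    return "Switch Pack"


print(choose_approach(very_different=False, many_types=True))
```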


Parser Logic
When you need to parse a relatively small number of alarm types, regardless of whether the
types are similar or different, we recommend performing the logic in the Parser, for example,
by creating “If then” statements using Switch Packs.
For more information, refer to the Sequential Parsing section of the Parser Implementation
Guide.

Note: FM parsers should emphasize a clear parse frame including a clear header and tail that
define the alarm message.

Lookups
If all alarms have a “simple” structure, that is, the same type of information and the same
number of arguments in each alarm, but you have many alarms, you should create one
parsing pattern for all alarms, thus creating a thin Parser. In this case, you can enrich the
alarms using information supplied by the vendor, by creating Lookup tables in the
Transformer/Parser.
For more information, refer to the Parser Implementation Guide (relevant for versions 6.1 and
up) and the Transform Implementation Guide.

Note: Consider the reload time of FM lookups carefully. Reload lookups at times when alarm
traffic is presumed to be lowest (for example, at night). Static lookups, such as the lookup
mentioned above, should never be reloaded.
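
The thin Parser and lookup approach can be pictured with the following Python sketch. It is an illustration only, not TEOCO Studio configuration: the raw message format, the alarm codes, the descriptions, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical vendor-supplied lookup. It is static, so it is built once
# at start-up and never reloaded.
ALARM_TEXT_LOOKUP = {
    "1001": "Link down",
    "1002": "Power supply failure",
}

# One parsing pattern for every alarm (assumed raw format: "<code> <NE> <severity>").
ALARM_PATTERN = re.compile(r"^(?P<code>\d+) (?P<ne>\S+) (?P<severity>\w+)$")


def parse(raw_line):
    """Thin parser: extract the fields without any per-alarm logic."""
    match = ALARM_PATTERN.match(raw_line)
    return match.groupdict() if match else None


def enrich(record):
    """Transform step: add the description found in the lookup table."""
    enriched = dict(record)
    enriched["description"] = ALARM_TEXT_LOOKUP.get(record["code"], "UNKNOWN")
    return enriched


alarm = enrich(parse("1001 NE-7 MAJOR"))
print(alarm["description"])
```

Supporting a new alarm in this design means adding a row to the lookup rather than changing the parsing pattern.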


Sequential Parsing
To design a library that can process a large quantity of distinct messages you should use the
integrated approach, otherwise known as Sequential Parsing.
The integrated approach has the following characteristics:
1. The first Parser (the Framer Parser) frames all the messages and retrieves
identification information that is used by a Splitter component that routes each
message to its respective and dedicated (per concept such as technology or domain)
device chain.
2. The library is then divided into two Connect scripts: the FRAMER Connect, which
includes the file-source, Framer Parser and Splitter, and the MODULES Connect
which includes all modular device chains.
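
The two-stage flow above can be sketched as follows. This Python example is a simplified illustration under stated assumptions: messages are assumed to be separated by blank lines, the first line of each message is assumed to carry the identification value, and the domain names and function names are hypothetical.

```python
def frame(stream_text):
    """Framer Parser: split the stream into messages and extract the routing key."""
    framed = []
    for block in stream_text.strip().split("\n\n"):
        header, _, body = block.partition("\n")
        framed.append({"domain": header.strip(), "body": body})
    return framed


def parse_radio(body):
    """Dedicated device chain for the (hypothetical) RADIO domain."""
    return {"chain": "radio", "text": body}


def parse_core(body):
    """Dedicated device chain for the (hypothetical) CORE domain."""
    return {"chain": "core", "text": body}


# Splitter routing table: identification value -> dedicated device chain.
DEVICE_CHAINS = {"RADIO": parse_radio, "CORE": parse_core}


def split_and_process(stream_text):
    """Route each framed message to its own chain and collect the results."""
    results = []
    for message in frame(stream_text):
        chain = DEVICE_CHAINS.get(message["domain"])
        if chain is not None:  # messages without a chain are dropped in this sketch
            results.append(chain(message["body"]))
    return results


stream = "RADIO\ncell 12 down\n\nCORE\nlink 3 degraded\n"
results = split_and_process(stream)
print(results)
```

Each device chain sees only the messages of its own domain, so a chain can be added or changed without touching the framing step.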

Handling Complex Alarm Messages


Occasionally, an FM library needs to support messages that do more than raise or seize a
single alarm. For example:
 Message A raises alarm A and seizes alarm B.
 Message A seizes all alarms that came from its switch.
 Message A should not show alarm A in the Active Alarm window, only in the History
tables.
The following sections describe implementation methods for each scenario.


Message A Raises Alarm A and Seizes Alarm B


There are two options for handling this situation:
 Create an additional Logic ID attribute and an additional Condition attribute in the
Transform’s Record Out (for example, ALR_LOGIC_ID2, ALR_CONDITION2).
The original attribute (ALR_LOGIC_ID) receives the Logic ID of the alarm that
should be raised and passes this information to the Threshold, which uses an
“UP” condition in the ALR_CONDITION field.
Use the ALR_LOGIC_ID2 and ALR_CONDITION2 attributes to create the Logic ID of the
alarm to be seized and to set its condition to “DOWN”. Then, create an
additional Threshold that uses ALR_LOGIC_ID2 for its Logic ID and
ALR_CONDITION2 for its condition.
 Loop over the message with the Parser twice. The Parser should hold a loop in which
it parses the message, and then goes back and parses it again. Make sure that
the Parser holds a Var that counts the number of iterations.
In the Transform, assign a Master loop type to the Loop. Then define that iteration
number 1 is for the Up alarm and iteration number 2 is for the Down alarm. Use a
user function (where one of the inputs is the iteration number) to construct two different
Logic IDs and conditions.
For more information on Logic IDs and conditions, refer to the Transform and Threshold
Implementation Guides.
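
The first option can be sketched in Python as follows. This is a conceptual illustration, not Threshold configuration: the attribute names come from the text above, while the Threshold class, the message fields, and the way Logic IDs are concatenated are assumptions made for the example.

```python
def transform(message):
    """Build one Record Out that raises one alarm and seizes another."""
    ne = message["ne"]
    return {
        "ALR_LOGIC_ID": f"{ne}_{message['raise']}",
        "ALR_CONDITION": "UP",
        "ALR_LOGIC_ID2": f"{ne}_{message['seize']}",
        "ALR_CONDITION2": "DOWN",
    }


class Threshold:
    """Hypothetical up/down threshold bound to one Logic ID field and one condition field."""

    def __init__(self, logic_id_field, condition_field):
        self.logic_id_field = logic_id_field
        self.condition_field = condition_field

    def process(self, record, active_alarms):
        logic_id = record[self.logic_id_field]
        condition = record[self.condition_field]
        if condition == "UP":
            active_alarms.add(logic_id)
        elif condition == "DOWN":
            active_alarms.discard(logic_id)


active = {"SW1_ALARM_B"}  # alarm B is already active
record = transform({"ne": "SW1", "raise": "ALARM_A", "seize": "ALARM_B"})

# The same record passes through both Threshold instances.
thresholds = (
    Threshold("ALR_LOGIC_ID", "ALR_CONDITION"),
    Threshold("ALR_LOGIC_ID2", "ALR_CONDITION2"),
)
for threshold in thresholds:
    threshold.process(record, active)

print(active)
```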

Message A Removes All Alarms That Came from Its Switch


The FM server recognizes the percent sign (%) as a wildcard. Therefore, if message A
should remove all alarms that arrive from switch A, ensure that the library’s Logic ID is
concatenated in the following manner:
<Vendor name><Technology_[Ne]_....................
Then create the following Logic ID for message A:
<Vendor name><Technology_[Ne]_.%

Note: This method depends on accurate composition and planning of the Logic ID structure.
For the pattern (<Vendor name><Technology_[Ne]_.%) to seize all the alarms of a network
element, the NE value must appear first, before the alarm-specific part of the Logic ID.
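
The effect of the wildcard can be illustrated with a short Python sketch. The guide does not describe how the FM server matches the pattern internally, so the sketch assumes that % matches any run of characters, as in an SQL LIKE expression. The vendor, technology, NE, and alarm names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a pattern in which % matches any run of characters into a regex."""
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$")


def seize_matching(active_alarms, pattern):
    """Return the alarms that remain after removing every Logic ID that matches."""
    regex = like_to_regex(pattern)
    return {logic_id for logic_id in active_alarms if not regex.match(logic_id)}


active = {
    "ACME_GSM_SW01_.LINK_DOWN",
    "ACME_GSM_SW01_.FAN_FAIL",
    "ACME_GSM_SW02_.LINK_DOWN",
}
remaining = seize_matching(active, "ACME_GSM_SW01_.%")
print(remaining)
```

Because the NE value comes before the alarm-specific part, one pattern removes every alarm of SW01 and leaves the alarms of SW02 untouched.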


Generic Message Handling


Generic messages are “catch all” messages that are used to capture alarms that are not
specifically handled, but nevertheless meet criteria that require them to appear in the
Active Alarm window. Generic alarms should only be handled if specified in the Functional
Specification (a requirements document written by Telecom Engineers; for details, see
Design and Specification). When handling generic alarms, consider which criteria will raise
the alarm (for example, priority greater than 8).

Notes:

 In many cases, Generic alarms belong to a “General Alarm” class.
 Different projects treat Generic alarms in different ways. Therefore, we recommend
that Generic alarms get unique handling (their own Parser case, Transform Record
Out, and FaM Threshold).
 Generic alarms usually have to be seized manually; therefore, we
recommend that their (dedicated) Threshold contain a time-out.
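
The following Python sketch illustrates the two ideas above: a raising criterion for generic alarms and a time-out on their dedicated Threshold. The criterion (priority greater than 8) is the example given in this section. The time-out value, the message fields, and the function names are hypothetical.

```python
GENERIC_PRIORITY_THRESHOLD = 8   # example criterion from the text: priority greater than 8
GENERIC_TIMEOUT_SECONDS = 3600   # hypothetical time-out value


def should_raise_generic(message):
    """Raise a generic alarm only when the message meets the agreed criterion."""
    return message["priority"] > GENERIC_PRIORITY_THRESHOLD


def expire_generic(active, now):
    """Drop generic alarms whose time-out has elapsed and return the survivors."""
    return {
        logic_id: raised_at
        for logic_id, raised_at in active.items()
        if now - raised_at < GENERIC_TIMEOUT_SECONDS
    }


messages = (
    {"id": "GEN_1", "priority": 9, "time": 0},
    {"id": "GEN_2", "priority": 5, "time": 10},
    {"id": "GEN_3", "priority": 10, "time": 3000},
)

active = {}  # Logic ID -> time at which the alarm was raised
for msg in messages:
    if should_raise_generic(msg):
        active[msg["id"]] = msg["time"]

after_timeout = expire_generic(active, now=4000)
print(after_timeout)
```

Without the time-out, GEN_1 would stay in the active list until an operator seized it manually.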

Is There a Need for the Validation, Splitter, or Event Distributor Component?
You should determine whether there are any irrelevant messages that need to be filtered out
once they exit the Transform. If there are, consider whether the filtering can be performed
from the Parser or Transform, or whether the Validation, Splitter or Event Distributor
component is essential.
In general, these modules create overhead. However, there are certain cases in which one of
them should be used:
 When the logic is too complex to place in the Transformer.
 When a temporary filter is required.
 When the output requires routing.
For more information about the components, see the TEOCO Studio Components chapter in
the TEOCO Studio User Guide.

Designing a Light Threshold Architecture


An efficient library is one in which the condition logic is contained in the Transform, and not in
the Threshold. This is due to the fact that the Transform has more advanced logic processing
capabilities (for example, user functions and context usage).
The FaM Threshold component should be used for simple alarm up and alarm down decision
making, based on a single alarm field. The Threshold will determine if the alarm is a count
alarm, up-down alarm, or time-out, and when the alarm should be raised.

The goal should be to create a single Threshold instance. In rare cases only, where special
conditioning is required, there may be a need to create multiple Threshold instances. The
special conditioning can include time-out, complex alarming (for example messages that both
raise one alarm and drop another one, or alarms that seize more than one alarm), counts,
and so on. But wherever possible, the goal should be to create one Threshold per library.
There are also fake thresholds, which receive events that do not raise or clear alarms but are
used to pass other events to the engine_clips (in previous versions). Examples include
thresholds for synchronization, update, or acknowledge.

What Alarm Type Conditions are Required?


Developers should be familiar with the Threshold so that they know how best to map the
requirements of the Functional Specification telecom document onto the capabilities of the
Threshold, with the aid of the Transform.
The Threshold is capable of performing the following alarm conditioning:
 Up with auto down
 Up with manual down
 Up at specific times
 Value units
 Static count units
 Dynamic count units
 Advanced count units
For more information, refer to the Threshold Implementation Guide.
The Threshold cannot perform the following:
 Up and down based on Scorpio functions or lookups (the recommendation is to
create this logic in the Transform).
 Raise an alarm based on a specific alarm that was not received (this can be done
using the Correlator ES).
 Raise alarms on communication health
 Raise an alarm based on timeouts/disconnections (the GD/Corba_internal library with
appropriate access/driver timeout/disconnections configuration performs this activity).

Quality Considerations
It is important to remember that a NOC operator needs to read and act upon the descriptions
that appear within the alarms. Ensure that the descriptions that you create are clear,
complete, and grammatically correct.


Performance Considerations
FM libraries must work in real-time, and therefore it is crucial that the libraries work as
efficiently as possible. It is important to reduce the following resource-consuming activities:
 Writing to log files (bad files and log files). Even minor errors should be fixed, because
the writing itself consumes system resources, which may cause alarms to be displayed
with a delay.
 Too-frequent reloading of lookups. Static tables, new_config_db tables, and flat files
should be loaded into the instance’s memory only when the connect starts up.
As a rule, Transform functions that interact with the DB service should not be used.
Sometimes a T-command/direct lookup function is needed in the Transform to retrieve
real-time data (if the table is dynamic). For more T-command information, see the
Scorpio Implementation Guide. For more direct lookup information, see Defining Direct
LookUps in the TEOCO Studio User Guide.

Note: If possible, we recommend not using T-command/direct lookup functions.

FM libraries should not use enrichment based on the active alarm table (new_alarm).

Maintenance Considerations
It is critical to develop libraries that can be upgraded, reused, and debugged easily. For this
purpose it is important to do the following:
 Populate the Information editor appropriately (including all relevant attachments, such
as the Functional Specification, vendor documentation, and design review).
 Use the correct naming conventions while developing all GUI modules.
 Use comments within the code.
 Implement Parser/Transform components as generically as possible and avoid
using constant/hardcoded values.


Project Implementation vs. Core Implementation


When planning the implementation, it is important to divide it into two levels:
 Library Implementation—the generic implementation relevant for the library itself,
including all generic definitions that can be suitable for all projects and will be reused
by others.

Note: If it is not possible to create generic definitions for the Transform (usually in old
Helix versions), we recommend using generic/product functions as much as possible
(and adding a comment in the function indicating the libraries in which it is used).

 Project Customization Implementation—project-specific modifications, including
the customer’s specific requirements. These requirements usually suit a single
customer and will not be used by other customers.
This division is crucial to support library reuse and upgrades.
To make the difference between the two clear, the alarm fields in the libraries are divided
into four levels, according to their content and complexity. During implementation, the division
between the core library and the project implementation is significant in the following stages:
 Telecom designs—see Design and Specification.
 Transform enrichments—see Creating Transformation Rules.

Using the Managing Table Method


About Using a Managing Table
The Managing Table implementation is a convenient and effective method for handling FM
libraries that support more than 100 alarms that hold the same (or almost the same) raw data
format (for example SNMP).
With this method, the Parser parses the data in a generic manner, and the Transform then uses
a Lookup on a file/table (a static lookup that never reloads) for specific alarm handling. The
table/file holds one line for each alarm that has to be handled. The key of the LookUp on this
table is the alarm identifier (the alarm ID or, in the case of SNMP, the OID/Trap name).
The Managing Table method helps achieve the following:
 Keeping the Transform run time (*.trs) files small
 Avoiding splitting Mediation adaptors into more than a single run time connect file
 Minimizing the time for maintenance and change requests by creating small and
simple Transform GUI files (*.trl files).

19
Fault Management Solution Implementation Guide

Creating the Managing Table


The table holds columns that correspond to the Active alarm window layout (such as Logic
ID and module name). The implementation engineer populates the table, based on the
information in the Functional Specification.
The available content types are:
 Constant values (for example, EQP Name is always “XXXX”)
 Dynamic values (for example, EQP Name == <Trap object #3>)
 A combination of constant and dynamic values (for example, Logic ID ==
“AAAA”_<Trap object #4>_<Trap object #6>)
To resolve this content, assign a wrapping Transform function to the Lookup on the Managing
Table. The function should receive as its inputs not only the alarm identifier but also all the
other alarm attributes that may be needed for enrichments (all trap objects). The function also
needs to recognize the scenarios (constant, dynamic, or combined) and convert the table’s
column information into actions (such as concatenations or lookups).
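
The wrapping function described above can be sketched in Python as follows. This is only an illustration of the idea, not the Transform's own function syntax: the table contents, the `{objN}` placeholder notation for <Trap object #N>, and the names `MANAGING_TABLE` and `resolve_alarm_fields` are all assumptions made for the example.

```python
# Hypothetical managing table: one row per alarm, keyed by the alarm identifier.
# "{objN}" stands for <Trap object #N>; this placeholder syntax is an assumption
# made for the sketch and is not the product's table format.
MANAGING_TABLE = {
    "linkDown": {"eqp_name": "{obj3}", "logic_id": "AAAA_{obj4}_{obj6}"},
    "fanFailure": {"eqp_name": "XXXX", "logic_id": "AAAA_FAN_{obj1}"},
}


def resolve_alarm_fields(alarm_id, trap_objects):
    """Look up the alarm's row, then turn each column template into a value."""
    row = MANAGING_TABLE.get(alarm_id)
    if row is None:
        return None  # the alarm is not listed in the managing table
    # Expose the trap objects under 1-based names: obj1, obj2, ...
    values = {f"obj{i}": obj for i, obj in enumerate(trap_objects, start=1)}
    # Constant columns pass through unchanged; dynamic and combined columns
    # are filled in from the trap objects.
    return {column: template.format(**values) for column, template in row.items()}
```

Because the per-alarm details live in the table rather than in code, supporting a new alarm in this sketch means adding a row, not changing the function.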

Managing Tables in SNMP Libraries


When dealing with SNMP data in libraries that hold more than 100 traps, we recommend
using a Free loop (a loop that must hold a Switch pack). In this case, the Switch pack
includes one Nop pack for each expected object.
In this way, object #1 is parsed into Nop #1, object #2 into Nop #2, and so on. Therefore, the
loop must hold a counter: iteration #1 populates Nop #1, iteration #2 populates Nop #2,
and so on.
At the Transform level, after declaring the loop as Free, use a wrapping function over the
lookup that queries the managing table. The wrapping function discussed above holds
inputs for at least all N feasible objects (#1, #2, and so on).
When a trap arrives with only M objects (M < N), the remaining N-M input ports are populated
with a “dummy” constant.
With this approach, the Logic ID for trap A may appear in the managing table as
“X”_“TRAP_A”_<Object 1>_<Object 2>, and so on.
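
The padding of missing objects can be illustrated with a short Python sketch. The value of N, the dummy constant, and the function names are assumptions chosen for the example; the real mechanism is the Free loop and the wrapping function's input ports.

```python
N_INPUTS = 6     # assumed number of input ports on the wrapping function (N)
DUMMY = "dummy"  # assumed placeholder constant for missing objects


def pad_trap_objects(objects, n=N_INPUTS, dummy=DUMMY):
    """Return exactly n inputs: the M received objects followed by N-M dummies."""
    if len(objects) > n:
        raise ValueError(f"trap carries {len(objects)} objects, expected at most {n}")
    return list(objects) + [dummy] * (n - len(objects))


def logic_id_for_trap_a(objects):
    """Build the Logic ID pattern "X"_"TRAP_A"_<Object 1>_<Object 2>."""
    inputs = pad_trap_objects(objects)
    return "_".join(["X", "TRAP_A", inputs[0], inputs[1]])
```

For example, a trap that carries two objects still yields six inputs, four of which hold the dummy constant.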


FM Library Limitations
This section describes alarming issues that cannot be handled by the FM library:
 The FM library cannot play the role of a Correlation engine:
o It cannot raise an alarm based on N other different alarms.
o It cannot raise an alarm based on N different messages, arriving at N different
time frames.
o It cannot raise an alarm based on a scenario involving other networks/alarms.

 Because the FM library needs to present information in real time, User functions
cannot communicate directly with the DB service, as this type of communication
affects performance.

Note: The Correlator ES can use direct access to the database but a lot of resources
would be required to do so.


FM Library Implementation Workflow


To implement a Fault Solution, perform the following procedures:
 Creating the Mediation Library
 Base Configuration Population
 Configure Communication Admin Access Driver
 Supporting Alarm Synchronization
 Creating Network Commands
 Unit Testing
 QA Testing
 Packaging and Delivery

Creating the Mediation Library


To create a Mediation library, there are three implementation options:
 Using EasySNMP for FM Wizard.
 Reusing an existing library.
 Manually creating a new library.

Using EasySNMP for FM Wizard


Almost all SNMP FM libraries should be implemented using the EasySNMP for FM Wizard,
provided that a MIB file is available. Development and deployment of SNMP libraries is
impossible without a MIB file.

Note: If the library has to support a large number of traps (more than 100), a Transform
module can be built manually using a Managing Table. This should be decided case by
case.

The EasySNMP for FM Wizard creates both the complete library and the Telecom
documentation that holds the alarm mapping.
The end user of the application is the Telecom Engineer, who is responsible for designing the
library (which MIBs/traps to collect in the library, how to map them to active alarm fields, and
so on), while the implementation developer is in charge of:
 Running simulations and debugging.
 Implementing additional changes in the library. Project additions (usually made in the
project layer) may include scenarios that the Wizard does not support, such as
a message that raises alarm A and seizes alarm B, and custom library additions.
 Analyzing the components. For example, check whether the Validation component
is essential and, if it is not, omit it.
Before using the EasySNMP for FM wizard, you need to compile the MIB files using the MIB
Import utility in TEOCO Studio.
For more information, refer to the EasySNMP for FM User Guide.


Manual Implementation
Manually implementing an FM Library involves the following steps, which are described in
detail below:
 Design and Specification
 Technical Library Design
 Defining the TEOCO Studio Modules

Design and Specification


An FM Library is created based upon a Telecom Functional Specification that distinguishes
between notifications that should raise an alarm and notifications that should not. The
Functional Specification must populate the active alarm fields for each alarm, where the
mandatory fields are a unique identifier, a priority, and a user-friendly textual attribute.
The Functional Specification may also distinguish between generic alarms and specific
alarms. See Generic Message Handling for more details.
In FM libraries, the design includes the active alarm mapping. In addition, the TE is
responsible for deciding the library level and the customer-specific changes within the specific
design. Therefore, the Telecom design includes both the product/library level and the project
level. The product level should be designed as described in the standard template and covers
the library level (basic, standard, or premium), while the project layer covers the upper library
level, called the custom level. The project level should be marked differently in the design.
Refer to Active Alarm Population for more details.
Another issue that must be covered by the Telecom Functional Specification is alarm classes.
The TE should define to which alarm classes the alarms should be assigned, where the
default class is the SuperAlarmClass.
We recommend retrieving raw data that holds all the needed alarms and using it for
development. The only valid format of raw data is the one recorded directly from the element,
because the raw data that is actually received can differ from the raw data as it appears in
the documentation.
As this recording is usually done on UNIX machines, the process of bringing these files to a
PC is crucial. We recommend transferring them by FTP in binary mode, so that the
non-printable characters remain intact and visible.


Technical Library Design


Implementers should take the following issues into account when designing the library:
 Understanding the Header and Tail of the Raw Data
 Creating Synthetic Alarms
 Understanding the Standard Alarm Fields
 Understanding the Alarm Unique Identifier
 Handling Generic Alarms
 Handling Specific Alarms
 Handling Sync Alarms

Understanding the Header and Tail of the Raw Data


Identifying the header and tail is a critical step, especially for displaying data in the
Explanation utility. The raw data that is presented to the customer will not be accurate, if the
header and tail are not correctly defined. This is due to the fact that the header and tail define
the alarms’ borders, and thus enable the Explanation utility to display only the relevant raw
data.
The header and tail should be identified in the raw data using the Parser. The Parser’s debug
window enables you to view non-printable characters. Find the characters that match the
header and tail defined in the Functional Specification document and Vendor documentation.
Once you have located the header and tail, they can be used to create relevant parts of the
parsing tree.
For Corba protocols, which present the raw data in XML format, the Corba driver is required to
receive information in ASCII format.

Note: The tail of one message should never be the header of the next message. If it is, and
the information arrives with a delay, the first message is held back until the next
message’s header is received.
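
To make the role of the header and tail concrete, the following Python sketch splits a raw text buffer into complete messages. It is a simplified illustration and not the Parser's implementation; the function name and the `<H>`/`<T>` markers in the usage example are invented.

```python
def split_messages(buffer, header, tail):
    """Return (complete_messages, remainder) for a raw text buffer.

    A message runs from its header to its own tail, so it can be released
    as soon as the tail arrives, without waiting for the next message.
    """
    messages = []
    pos = 0
    while True:
        start = buffer.find(header, pos)
        if start == -1:
            return messages, buffer[pos:]  # no further header in the buffer
        end = buffer.find(tail, start + len(header))
        if end == -1:
            return messages, buffer[start:]  # incomplete message: wait for its tail
        end += len(tail)
        messages.append(buffer[start:end])
        pos = end
```

For example, `split_messages("noise<H>one<T><H>par", "<H>", "<T>")` returns the single complete message `"<H>one<T>"` and keeps `"<H>par"` as the remainder until more data arrives.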

Creating Synthetic Alarms


Once you have located the header and tail, and you have a fixed alarm format (for
example, SNMP or TL1), you can create synthetic alarms, or modify existing/machine-created
alarms (for example, EasySNMP-created raw data), in order to test the full scope of alarms
or to test particular scenarios for which real raw data is not available. You must use the
Vendor’s documentation as a reference when creating the synthetic alarms.

For example, in the following recorded raw data sample, you can change the
sysConfigChangeTime parameter, to test the alarm with a different time, or change the
sysConfigChangeInfo parameter to display different text.
***SNMP-START***
IP = 172.30.91.1 IP-SNMPv1 = 172.30.91.1 Port = 162
Community = public
SysUpTime = 0,0:0:0:0 RecvTime = 12/16/2003 15:11:09:554
PDU-Type = TRAP VarNum = 2
Id = 1.3.6.1.4.1.9.5.0.9 Name = sysConfigChangeTrap
Desc =
sysConfigChangeTime(TIMETICKS)(1.3.6.1.4.1.9.5.1.1.28) =
4,5:0:36:36
sysConfigChangeInfo(OCTET_STRING)(1.3.6.1.4.1.9.5.1.1.34) =
Indicates which NVRAM block is changed by whom.
***SNMP-END***

Note: If the protocol is not TL1 or SNMP, it is still possible to create synthetic alarms but the
Telecom Engineer should play a major role in this task.
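
Editing a recorded sample by hand works for a few alarms; for larger test sets, the same change can be scripted. The Python sketch below assumes the layout of the sample above, where each value sits on the line after its `name(TYPE)(OID) =` line. The helper name `set_varbind_value` and the shortened sample are illustrative only.

```python
SAMPLE = """\
***SNMP-START***
Id = 1.3.6.1.4.1.9.5.0.9 Name = sysConfigChangeTrap
sysConfigChangeTime(TIMETICKS)(1.3.6.1.4.1.9.5.1.1.28) =
4,5:0:36:36
sysConfigChangeInfo(OCTET_STRING)(1.3.6.1.4.1.9.5.1.1.34) =
Indicates which NVRAM block is changed by whom.
***SNMP-END***"""


def set_varbind_value(raw, name, new_value):
    """Replace the value line that follows the 'name(TYPE)(OID) =' line."""
    lines = raw.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.startswith(name + "(") and line.rstrip().endswith("="):
            lines[i + 1] = new_value
            return "\n".join(lines)
    raise KeyError(f"parameter {name!r} not found in the raw data")


# A synthetic alarm with a different change time; header and tail are untouched.
synthetic = set_varbind_value(SAMPLE, "sysConfigChangeTime", "7,1:2:3:4")
```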

Understanding the Standard Alarm Fields


Use the FM Library Active Alarm Template document, created by the Telecom Expert, to
understand how to map the alarm's raw data to the active alarm fields that are displayed in
the Active Alarm window. For more information, see Active Alarm Population.

Understanding the Alarm Unique Identifier
The unique identifier defined in the Functional Specification is mapped to the Logic ID field in
the active alarm fields. It is used to differentiate between alarms, and also when deciding
whether to raise, update, or seize an alarm. You must ensure the following:
 The string is unique.
 The string has no more than 120 characters (in practice 119 characters, because the
last character is a constant value, an underscore). If the string is longer than 119
characters, use hash methodologies and algorithms (via Transform user functions) to
shorten it without losing uniqueness.
 The string does not contain non-printable characters, double spaces, and so on.
TEOCO’s standard is that the unique identifier starts with the name of the vendor, followed by
the name of the technology, the alarm entities (for example, the EM, NE, port), and then a
combination of strings that will allow the alarm to be uniquely identified (for example, alarm
name). Use an underscore (_) as the delimiter when concatenating the different components
of the unique identifier.

Note: If your unique identifier requires more than 119 characters, use the product-based hash
functions to shorten the identifier.
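
The rules above can be illustrated with a short Python sketch. It is not the product's hash function: SHA-1, the cleaning rules, and the name `build_logic_id` are assumptions made for the example. A hash suffix keeps identifiers distinct in practice, although a collision is theoretically possible.

```python
import hashlib
import re

MAX_LEN = 119  # limit stated above (the 120th character is a constant underscore)


def build_logic_id(*parts):
    """Join the parts with '_' and shorten with a hash if the result is too long."""
    cleaned = []
    for part in parts:
        part = re.sub(r"[^\x20-\x7e]", "", part)  # drop non-printable characters
        part = re.sub(r" {2,}", " ", part).strip()  # collapse double spaces
        cleaned.append(part)
    logic_id = "_".join(cleaned)
    if len(logic_id) <= MAX_LEN:
        return logic_id
    digest = hashlib.sha1(logic_id.encode("utf-8")).hexdigest()  # 40 hex characters
    keep = MAX_LEN - len(digest) - 1  # room left for a readable prefix
    return f"{logic_id[:keep]}_{digest}"
```

A call such as `build_logic_id("Cisco", "IP", "NE1", "linkDown")` follows the vendor, technology, entity, alarm name order and returns `Cisco_IP_NE1_linkDown`; the vendor and alarm names here are examples only.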

Handling Generic Alarms


Generic alarms must be handled in the Parser, Transform and Threshold modules. We
recommend defining a separate branch in the parsing tree followed by a separate Record Out
in the Transform. The aim is to create a single Record Class for all generic alarms, where
each attribute’s input is handled by the Context mechanism, enabling multiple inputs per
attribute. The RC Out’s Context should be different for Generic alarms and Specific alarms.
Creating a separate Record Out creates a separate Threshold instance, which can be
assigned to a separate alarm class (that is, not the SuperAlarmClass but a separate class
such as GenericClass). This will enable certain user groups to be assigned to this class only,
or to not view this class at all. In addition, this design enables the Alarm Up criteria to be
managed from the Threshold.

Note: By exposing generic alarms, customers can be made aware of alarms that they did not
initially request to see, which may in turn lead to customers requesting to add generic alarms
to the specific alarm list.

For more details about handling generic alarms, refer to the Parser, Transform, and
Threshold Implementation Guides.

Handling Specific Alarms
In theory, it is possible to create a parsing tree branch for each alarm specified in the
Functional Specification. However, in practice it is important to create as few branches as
possible by combining alarms which are similar in terms of their raw data format, and then
using functions and lookups to differentiate between them. For example, you may have three
distinct alarms where the only difference between their formats is a different character that
appears in the same place in each alarm. These can be handled by the same branch of the
parsing tree.
The aim is to create a single Record Class for all specific alarms, where each attribute’s input
is handled by the Context mechanism, enabling multiple inputs per attribute. The RC Out’s
context should be different for Generic alarms and Specific alarms.
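
As an illustration of handling several similar alarms in one branch, the following Python sketch uses a single pattern plus a lookup to separate three alarms that differ in one character. The message layout, the alarm names, and the function name are invented for the example and do not represent Parser syntax.

```python
import re

# One pattern covers three alarms that differ only in a single code character.
ALARM_PATTERN = re.compile(r"^ALM (?P<code>[ABC]) (?P<entity>\S+)$")

# Lookup that differentiates between the alarms sharing the branch.
ALARM_NAMES = {"A": "PowerFailure", "B": "FanFailure", "C": "DoorOpen"}


def parse_specific_alarm(line):
    """Return the alarm name and entity, or None if the line is not one of these alarms."""
    match = ALARM_PATTERN.match(line)
    if match is None:
        return None
    return {"alarm_name": ALARM_NAMES[match["code"]], "entity": match["entity"]}
```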

Handling Sync Alarms


When support for Sync Alarms is required, refer to Supporting Alarm Synchronization.

Defining the TEOCO Studio Modules


Create new libraries by selecting the Fault Library template. Once selected, open each
module and define the appropriate rules and definitions. The following sections describe each
module that is available in Fault libraries:
 Creating the Information File
 Creating Parsing Rules
 Creating Static Lookups on External Files and Database Tables
 Creating Transformation Rules
 Char Replacer Component
 (Optional) Creating Validation Rules
 Message History
 Event History
 Creating Threshold Rules and Alarm Mapping

Creating the Information File


Ensure that all parts of the information module are completed according to the instructions in
the TEOCO Studio Implementation Guide. In particular, refer to the instructions for MTTI
packaging within SNMP libraries.

Creating Parsing Rules
The Parser’s input port is connected to the DvxSubscriber, as it is the only N2 (C++
application framework) component that can receive raw data. The Parser is the component in
charge of parsing the raw data stream: it parses the meaningful data and transfers it to the
Transform. In addition, this is where three types of distinction should be made:
 Between alarms and messages that are not alarms.
 Between specific alarms and the generic alarms.
 Between one specific alarm and another.
The Parser has a unique role in FM libraries, as it sets the infrastructure for the presentation
of an alarm’s raw data in the Explanation utility. The scope of the alarm, as presented in one
parsing iteration, is what determines the scope of the raw data an end user in the NOC will
see when clicking “Explanation”. Therefore, it is important to include all the raw data relevant
to an alarm in its parse, even if for instance the Functional Specification only requires strings
that appear from the messages’ 2nd line to be parsed (it is possible to not parse the 1st line).
Use the Parser client to create parsing rules for the Parser component. For specific details on
using the Parser client, refer to the Parser Implementation Guide and to the Parser User
Guide.

Creating Static Lookups on External Files and Database Tables


Static tables are required when more than three conversion options need to be used in the
Parser (from version 6.1) or Transform, or when information is needed from the Base
Configuration and cannot be known in advance. For example, the Priority_Convert
table is used for converting the raw data priority into the standard Helix priority.

Notes:

 The Priority_Convert table is used by all non-EasySNMP libraries in the solution. We
recommend that conversion tables be shared by as many libraries as possible, rather
than replicating the same table for each library.
 We recommend that common tables have as few views as possible, that the views
serve as many libraries as possible, and that each library have its own unique lookup.

Static lookups can be performed in two ways, per the project’s requirements:
 Using static data presented in flat files, for example a .csv file. This file should be
saved in the $EXTERNAL_FILE directory. Projects using distributed mediation should
always use this method.
 Using static tables in the new_config_db database.
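
The following Python sketch shows the general idea of a static lookup that is loaded once and then served from memory. The class name, the column names, and the priority values are hypothetical; the real Priority_Convert table and the lookup configuration are defined in TEOCO Studio.

```python
import csv
import io


class StaticLookup:
    """Loads a key/value table once and serves all lookups from memory."""

    def __init__(self, source, key_column, value_column):
        reader = csv.DictReader(source)
        self._table = {row[key_column]: row[value_column] for row in reader}

    def get(self, key, default=None):
        return self._table.get(key, default)


# Hypothetical file content; in a project this would be a flat file on disk.
PRIORITY_CSV = "raw_priority,standard_priority\ncritical,1\nmajor,2\nminor,3\n"

# Loaded a single time, at start-up.
priority_convert = StaticLookup(io.StringIO(PRIORITY_CSV), "raw_priority", "standard_priority")
```

Each incoming alarm then costs only an in-memory dictionary access, for example `priority_convert.get("major")`, with no file or database I/O on the alarm path.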

Creating Transformation Rules
The Transform is responsible for two tasks:
 Transforming data from one data structure to one or more other data structures.
 Processing incoming data in a variety of ways (enrichment, conversion,
normalization, and so on) using transformation rules.
The Transform’s input port is connected to the Parser. In the Transform, each alarm is treated
differently, with the goal being to assign to each a unique and user-friendly format, one that
will enable the end user to take fast and effective actions.
Use the Transform to map fields from the input Record Class to fields in the output Record
Class by a ‘link’. Additional processing logic may be applied to the data being translated by
linking an input field to a ‘function’, and linking the function to the output record-class. It is
possible to link several functions to apply several different transformations to the data arriving
from a single input field, and targeted to a single output field. Functions can perform data
manipulations, comparisons, and lookups. Lookup functions involve extracting information
about the alarm entity from the Base Configuration, based on a unique Alarm ID, and they
rely on the Base Configuration being populated according to the Functional Specification. If
the Base Configuration is not populated according to the Functional Specification, the alarm
will not contain configuration information and therefore the user will not be able to identify
where the alarm occurred!
For standard and premium libraries, the Transform’s Base Configuration lookup function for
receiving the Alarm ID is based on a concatenation of attributes that were extracted by the
Parser. The Base Configuration lookup function itself is predefined. Refer to the Transform
Implementation Guide for details.

Custom Level Implementation


There are two alternatives for differentiating between the product and project:
 Using a Project Transform
 Using a Transform after Transform method

Project Layer Transform


This option is used for changes required in a specific library, where parser information is used
for the change, or when libraries are created using wizards and the project layer is used to
solve issues that the wizard could not resolve.
Adding a project implementation layer for the Transform creates a new trl file, which is
opened with the library definitions from the previous trl file. All definitions are grayed out,
meaning they are inherited and cannot be modified directly.
You have the option to override these definitions without affecting the original trl. You can
add definitions using the parser (GR.In) information.
The Project Transform should be named after the library, using PL as the extension (for
example, Cisco_Core_SFP_PL).
Once the project transform is defined, perform all modifications on the project layer only and
not on the basic layer.
For more information, refer to the TEOCO Studio Implementation Guide.


Transform after Transform


We recommend that you use this method when generic cross-environment issues need to be
embedded throughout all the FM libraries (for example, adapting the new active alarm
population to the project’s existing one, or translating FM attributes into non-English
languages such as Spanish). This option does not support Parser information (unless it is
forwarded in advance).
This option includes two transforms: the first one is the library transform which contains
generic definitions relevant for the library, while the second transform represents the project
transform and includes the project definitions. The second transform will be placed after the
library transform, enabling all project libraries to forward the data to the same project
transform.
This is a commonly used solution with EasySNMP libraries.
For more information, refer to the TEOCO Studio Implementation Guide.
Additional resources to help you implement Transform rules are:
 Transform Implementation Guide
 Transform User Guide

Char Replacer Component


If characters need to be replaced or removed, use the Char Replacer component, which by
default is placed after the Transform and before the Threshold in the Mediation chain.
If you are working on an existing library, ensure you add this component if it does not already
exist.
The following characters should be filtered, due to the N_alarm_handler’s inability to handle
them:
 Pipe—‘|’
If required, you can filter additional characters.
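
The effect of this filtering step can be sketched in Python as follows. The record layout and the names `FILTERED_CHARS` and `replace_chars` are assumptions for illustration; in a library, this is done by configuring the Char Replacer component rather than by writing code.

```python
# The pipe is removed; extend the mapping if a project requires more characters.
FILTERED_CHARS = {"|": ""}


def replace_chars(record, mapping=FILTERED_CHARS):
    """Return a copy of the record with the filtered characters removed from text fields."""
    table = str.maketrans(mapping)
    return {
        field: value.translate(table) if isinstance(value, str) else value
        for field, value in record.items()
    }
```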
(Optional) Creating Validation Rules
The Validation component is an optional component that is used to verify that all the
(collected and) distributed data records are valid. Data records that are found to be invalid are
filtered to a False port.
This component may also act as a router as it has two output ports. For example, you can
connect to two different Thresholds.

Note: We do not always recommend using the Validation module to validate data. As an
alternative to the Validation component, you can place the validation logic in the Transform
module. For more information on when you should use the Validation component, refer to the
Validation Implementation Guide.

For more information on using the Validation client, refer to the Validation User Guide.

Message History
Use this component only for projects that include the old Explanation utility.

Note: The new Explanation utility is available from DVX2 version
DVX2_REL2.2.4.0_N2_REL3.5.2.0.

Event History
This component should only be used if explicitly requested by the project.

Creating Threshold Rules and Alarm Mapping


The Threshold module is responsible for raising and clearing alarms to the Active Alarm
window, and populating them with data, based on the evaluation of incoming records. The
Threshold distributes the messages based on unique identifiers (the Logic ID) and criteria
(alarm conditions). Both attributes are inherited from the Transform.
The Threshold is subscribed to a specific engine_clips process, and the Threshold rules are
loaded into the engine_clips memory. Every event that is passed to the Threshold is evaluated
by the engine_clips according to the engine rules. The engine_clips decides whether the event
should be passed forward.

Note: This explanation is relevant up to version 8. In version 8.5 and above, the functions of
the engine_clips were moved to the FaM Threshold component.
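
As a rough mental model of an up-down decision keyed on the Logic ID and the criteria, consider the following Python sketch. It is a toy illustration and not the Threshold or engine_clips logic: the class name, the "up"/"down" criteria values, and the returned action names are assumptions.

```python
class UpDownThreshold:
    """Tracks active alarms by Logic ID and decides what each event does."""

    def __init__(self):
        self.active = {}  # Logic ID -> latest alarm fields

    def evaluate(self, logic_id, criteria, fields):
        if criteria == "up":
            # A second "up" for the same Logic ID updates the existing alarm.
            action = "update" if logic_id in self.active else "raise"
            self.active[logic_id] = fields
            return action
        if criteria == "down" and logic_id in self.active:
            del self.active[logic_id]
            return "clear"
        return "ignore"  # for example, a "down" with no matching active alarm
```

The sketch shows why the Logic ID must be unique and stable: it is the only key that ties a later "down" or repeated "up" event to the alarm that is already active.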

In situations where multiple classes are required, these classes must be defined in
advance (by the Integrator) in the TEOCO Admin. In addition, there will be multiple
thresholds in the solution, created either by splitting one threshold into two or by
duplicating a threshold’s information. Note that the new FM Admin application (10.3) enables
users to create rules that assign classes to alarms. However, each rule adds load to the
FM Server, and we therefore recommend defining this logic in the Threshold. Customers
whose organizational structure changes constantly can still do this via the FM Administration
rules mechanism.
For more information on the Threshold client, refer to the Threshold User Guide.
For more information on how the Threshold works and considerations that should be taken
into account when using the Threshold, refer to the Threshold Implementation Guide.


Base Configuration Population


The Base Configuration (BC) holds the network configuration data needed for Fault Solutions.
Every Network Element with which Helix communicates needs to be defined in the Base
Configuration. The information held in the Base Configuration also provides the repository
with which alarms can be enriched. The Base Configuration needs to be populated prior to
defining accesses in the Communication Admin.
The Base Configuration can be populated using several different methods:
 NetImport—can be used to import CSV data files. It provides automatic reconciliation
and therefore no human intervention is required. Refer to the NetImport User Guide
and NetImport Implementation Guide for more information about using this tool.
 AutoDiscovery—uses ICMP and SNMP to discover the network configuration. The
information it discovers is written into CSV files which can be imported into the Base
Configuration. Refer to the AutoDiscovery User Guide and AutoDiscovery
Implementation Guide for more details.
 Manual Population—using the Base Configuration client for small implementations
such as POCs (Proof Of Concept). Refer to the Base Configuration User Guide for
more information.

Library Level
The population of the Base Configuration is according to the library level.
Basic libraries need at least the Network Element (NE) to be defined for the Communication
Admin (GD) to connect to the network.
For standard libraries, you need to populate two types of information:
 The geographical information about each network element or element manager.
 Enrichment to the network element level, that is, even if the Communication Admin
communicates with a single Element Manager, you still need to populate the Base
Configuration with information about all the NEs in the network, assuming that
enrichments on this level are relevant.
For Premium level libraries, Base Configuration should be populated to the lowest hierarchy
level (for example, port, interface, and sector).

Alarm ID (AID)
The Alarm ID (known as the Physical ID prior to Gold 4.3) is the index key that is used to
retrieve Base Configuration enrichments for a specific instance that arrives in the raw data.
Therefore, the value of this field in the Base Configuration must be identical to the information
available in the notification. The Alarm ID defined in the Base Configuration should follow
the guidelines specified in the Telecom Functional Spec, where the Alarm ID
should be based on the Entity level in the Base Configuration. Refer to the NetImport
Implementation Guide for more details.

Important: The definition of the Alarm ID must also be implemented in the Transform and
must comply with the Base Configuration for the alarms to present configuration information,
such as EQP name, From Site, To site, area, district, and so on. If the function used in the
Transform fails to extract information from the Base Configuration, the alarm will be displayed
without configuration information.
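
The dependency between the Alarm ID built in the Transform and the key stored in the Base Configuration can be shown with a small Python sketch. The table content, the field names, and the underscore concatenation are assumptions for illustration only; the actual lookup function is predefined in the product.

```python
# Hypothetical Base Configuration content, keyed by Alarm ID.
BASE_CONFIGURATION = {
    "NE1_PORT3": {"eqp_name": "Router-1", "area": "North"},
}


def build_alarm_id(*parsed_attributes):
    """Concatenate the attributes extracted by the Parser into an Alarm ID."""
    return "_".join(parsed_attributes)


def enrich(alarm, alarm_id):
    """Add configuration fields when the Alarm ID matches; otherwise add nothing."""
    config = BASE_CONFIGURATION.get(alarm_id, {})
    return {**alarm, **config}
```

In this sketch, `build_alarm_id("NE1", "PORT3")` matches the stored key and the alarm gains its configuration fields, while `build_alarm_id("ne1", "PORT3")` differs only in case and the alarm is returned without any configuration information.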

Customized library levels enable populating the Base Configuration with additional
information about the Network Element, which can be displayed in specific fields that are
enabled for project customization (for example, Additional Info N). For example, a customer
can display an additional name for a Network Element, using the same Alarm ID key.

Notes:

 The Alarm ID on the Network Element level is called NEID.


 Some information will be populated at the NE level while other information will be
populated at the AID level.

Configure Communication Admin Access Driver


The Communication Admin is a single management layer that handles Helix’s connectivity to
all Network Elements, using various protocols.
The following list defines guidelines for using Communication Admin to define connectivity
rules for connecting to network elements:
 To display raw data via the Explanation mechanism, you need to increase the
number of days in the Log Data to RDR parameter when defining the Access.
 The Min connections parameter should be kept at 0, because the connection
should only be established when the library is up and
subscribing to the desired access. The exception to this rule is for Command
Accesses which are heavily used (for example, every 3 minutes or less). In this case,
it is advisable to change the Min connections to 1 or more, if sending commands in
parallel is required. The maximum connection parameter should, in most scenarios,
keep the value 1.
 In accesses which are of type “Listening” (for example, SNMP trap listener and GD
listener) you need to ensure that every NE that sends events/traps to the Access is
assigned to the Access (in the Associated Network Element List), and NE_ID should
be defined as the IP address of the NE.
 For protocols such as Syslog, in which the messages are received as a stream of
information without a clearly defined header and tail for each message, it is possible
to use the CommandPostfix parameter within the Connection Parameters section
to define a specific string that will appear at the end of each message.
 As a general rule, each Generic Driver instance should handle up to 10 Accesses
only.
For more information, refer to the Communication Admin User Guide, Communication Admin
Administration Guide and Communication Admin Implementation Guide.
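The effect of an end-of-message string on a stream without message boundaries can be sketched in Python as follows. This is only an illustration of the framing idea behind the CommandPostfix parameter, not the Communication Admin implementation. The split_messages helper and the "<END>" postfix are hypothetical.

```python
def split_messages(buffer, postfix):
    """Split a stream buffer into complete messages and the unterminated remainder."""
    parts = buffer.split(postfix)
    # Every part followed by the postfix is complete; the last part is still arriving.
    return parts[:-1], parts[-1]


stream = "link down on port 3<END>link up on port 3<END>fan fail"
messages, remainder = split_messages(stream, "<END>")
```

The remainder is kept and prepended to the next chunk of data, so a message that is split across two reads is still delivered whole.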


Supporting Alarm Synchronization


Synchronization updates the alarm status shown on the Helix Active alarms screen so that it
matches the actual status of an alarming instance [such as a Network Element (NE) or Element
Management System (EMS)]. For example, disconnections between TEOCO’s Helix
system and the NE/EMS may occur, in which case Helix may not hold an accurate picture of
the alarms. During this period, the NE/EMS may continue to produce logs and notifications,
but these will not arrive at Helix, which will continue to reflect the alarms’ status prior to the
disconnection. In addition, synchronization may also occur on a periodic basis.
Once a connection is re-established between Helix and the NE/EMS, Helix must be
synchronized with the updated existing alarms within the NE/EMS.
Some NEs support a functionality that restores the ‘real’ alarm status. In some instances,
Helix users can poll for this information by manually sending NEs the relevant sync command
via NCI or Communication Admin. In others, the information is sent spontaneously
(with no active intervention).
TEOCO refers to these alarms as Sync alarms, and they are handled differently from other
alarms. Helix recognizes the arriving alarms as synchronization alarms, and updates the
existing alarms with the Sync alarms to supply the user with an updated picture of active
alarms (there is no repeated count update). The synchronization process is managed using
identification keys, based on the prefix of the Logic ID (which usually holds a constant value
with the Library name and the NE name upon which the Sync is run).
For example, vendor_eqp_[NE] (SIEMENS_RC_GSM_NORD). In this instance, the
synchronization process is managed on the NE level, for NEs connected to the library.
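The role of the Logic ID prefix can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Helix algorithm: active alarms are a dictionary of Logic ID to repeat count, alarms under the prefix that are missing from the Sync list are dropped, and new Sync alarms start with a count of 1. The apply_sync helper and the Logic IDs are hypothetical.

```python
def apply_sync(active, synced_ids, prefix):
    """Return the active-alarm picture after a sync scoped to one Logic ID prefix."""
    result = {}
    for logic_id, count in active.items():
        # Alarms outside the prefix belong to other NEs and are left untouched.
        # Alarms inside the prefix survive only if the NE still reports them.
        if not logic_id.startswith(prefix) or logic_id in synced_ids:
            result[logic_id] = count  # the repeat count is not updated
    for logic_id in synced_ids:
        if logic_id.startswith(prefix):
            result.setdefault(logic_id, 1)  # alarm raised during the disconnection
    return result


active = {
    "SIEMENS_RC_GSM_NORD_LINK1": 4,
    "SIEMENS_RC_GSM_NORD_FAN": 2,
    "SIEMENS_RC_GSM_SUD_LINK1": 1,
}
synced = {"SIEMENS_RC_GSM_NORD_LINK1", "SIEMENS_RC_GSM_NORD_PWR"}
after_sync = apply_sync(active, synced, "SIEMENS_RC_GSM_NORD_")
```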
The first method for receiving Sync alarms is in a spontaneous manner. The second method
for receiving an updated picture of the NEs’ status is by sending commands to the NEs
according to a predefined script in NCI, a predefined rule in FaultPro (if purchased), or a
Synchronization command set in the Communication Admin.
Both methods require synchronization between what Helix has and what the NE has sent. In
addition, both require that the NE supports the synchronization mechanism. For this
synchronization to occur, Helix receives a unique raw data message holding a list of all the
alarms that are currently active from the NE’s point of view. These messages must carry a
unique identifier that enables Helix to handle them differently from regular messages, as well as
markers that signify the start and the end of the list (see the following
example).

StartSync
$Alarm 1 Abcd
$Alarm 2 Abcd
EndSync

The synchronization process can be triggered by either Helix or the EMS.
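A minimal Python sketch of how a Parser-like step might pick out the list between the two markers is shown here. The marker strings come from the example above. The parse_sync_block helper is hypothetical and is not the Helix Parser.

```python
def parse_sync_block(lines, start="StartSync", end="EndSync"):
    """Collect the lines between the start and end markers of a sync list."""
    alarms, in_sync = [], False
    for line in lines:
        text = line.strip()
        if text == start:
            alarms, in_sync = [], True  # a new list begins
        elif text == end:
            if in_sync:
                return alarms  # the list is complete
        elif in_sync:
            alarms.append(text)
    return None  # no complete sync list was seen


raw = ["StartSync", "$Alarm 1 Abcd", "$Alarm 2 Abcd", "EndSync"]
sync_alarms = parse_sync_block(raw)
```

Returning None for a list without an end marker keeps a truncated message from being treated as the full set of active alarms.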

The library components commonly used in the synchronization processes are:
 Parser—parses the synchronization start/end notification that came from the
EMS and passes an indication to Transform using context.
 Transform—based on the indication from Parser, creates relevant records that
are passed to Thresholds.
 Thresholds—contains relevant Threshold rule sets. According to the event
passed from Transform and the Threshold sets of rules, the Engine Clips
starts/ends the synchronization process.

Note: This explanation is relevant until version 8.0. From version 8.0, the functions of
the engine clips were passed to the FaM Threshold component.

For more information about implementing alarm synchronization, refer to the TEOCO Studio
Implementation Guide.

Creating Network Commands

Creating Commands for FaultPro


FaultPro is an optional add-on to FM that is used to execute commands and communicate
with devices. The commands can be activated via two kinds of rules (defined in FM Admin):
 Automation rules automatically send commands based on specific events and/or
value(s). For example, if the alarm has been acknowledged, a command can be sent
to the customer’s monitoring equipment informing them of the alarm’s acknowledgement
status.
 Association rules provide a list of commands based on the specific alarm context.
The operator can then select the correct command.
The commands themselves are developed by the implementer in NCI, according to the
project’s requirements.
For NCI commands to be available for FaultPro users, the commands must be marked in the
NCI as being available for external applications, with the Fault Solution context.
In addition, each type of rule has a dedicated connect script on the FM
Server ($FAM_SERVICES_DIR or $FAM_SERVICES_IMP_DIR), which must be running for
the rules to function. The connect scripts are as follows:
 fam_autonoc_alarmpropagation.connect
 fam_autonoc_correctiveaction.connect
 fam_autonoc_parent.connect
 fam_autonoc_semicorrectiveaction.connect
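A simple check that the listed connect scripts are up could look like the following Python sketch. The script names are taken from the list above. How the names of the running connect scripts are collected is environment-specific and is left out, and the missing_connect_scripts helper is hypothetical.

```python
REQUIRED_CONNECT_SCRIPTS = (
    "fam_autonoc_alarmpropagation.connect",
    "fam_autonoc_correctiveaction.connect",
    "fam_autonoc_parent.connect",
    "fam_autonoc_semicorrectiveaction.connect",
)


def missing_connect_scripts(running, required=REQUIRED_CONNECT_SCRIPTS):
    """Return the required connect scripts that are not in the running list."""
    running = set(running)
    return [name for name in required if name not in running]


# Hypothetical snapshot of what is currently running on the FM Server.
running_now = [
    "fam_autonoc_parent.connect",
    "fam_autonoc_correctiveaction.connect",
]
missing = missing_connect_scripts(running_now)
```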


Creating Commands for Maintenance Integrators


NCI is also used by administrators and integrators for daily maintenance purposes to perform
external procedures on network elements and internal Helix performance checks either by
using NCI scripts or by using NCI’s scheduler to run scripts on the Helix Server.
The following list shows examples of common NCI maintenance functions:
 Ping—checks the availability of the server/NE.
 Check database—checks the availability of the database and its usage.
 Check server—checks the CPU and Memory Usage.
 Shrink database—cleans the old data in history tables and device tables.
 Clean FS—removes old and large files (core files, tar, gs, zip, or Z).
 Backup file system—creates a tar archive of the file system under $BASE_DIR.
 Backup database—dumps the database by using an export command or dump
database command.
 Send mail—sends mail about errors found on the servers.
 Get file—gets files from one directory to another.
 Check alarm arrival—checks in the New alarm table the last arrival time of alarms.
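As an illustration of the kind of logic behind a Clean FS function, the following Python sketch lists files that are both old and large and that match a given suffix. It is not an NCI script. The find_cleanup_candidates helper, its thresholds, and the suffixes passed to it are assumptions, and the sketch only reports candidates without deleting anything.

```python
import os
import time


def find_cleanup_candidates(root, suffixes, min_age_days, min_size_bytes, now=None):
    """List files under root that match a suffix and are both old and large."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # files modified before this are "old"
    candidates = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(directory, name)
            info = os.stat(path)
            if (name.endswith(tuple(suffixes))
                    and info.st_mtime < cutoff
                    and info.st_size >= min_size_bytes):
                candidates.append(path)
    return sorted(candidates)
```

A caller would review the returned list before removing anything, for example find_cleanup_candidates("/some/dir", (".tar", ".zip", ".Z"), 30, 100 * 1024 * 1024), where the directory and limits are placeholders.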

Validation
To check that the commands you created will function properly, you should run them in NCI
and ensure you receive the correct results.


Unit Testing

Library Activation
Unless your development environment enables you to connect to a live/lab Network Element,
ensure that you subscribe to a Communication Admin of type FTP to test the Explanation
utility.
For more information, refer to the Conductor User Guide and the Communication Admin User
Guide.

Quality Tests
To perform Quality Tests, ideally you should have the raw data that covers all possible
scenarios. If this is not possible, you should consult with the Telecom Expert to see whether it
is possible to create synthetic data based on actual Raw Data and Vendor Documentation.
These tests should ensure the following:
 All required alarms are raised.
 All alarms that should be auto-seized are auto-seized.
 All alarms that should time out have timed out.
 Count alarms were raised correctly.
 Every unique event processed by the library is handled correctly (synchronization,
update, purge, acknowledgement, and so on).
 Unique alarming mechanisms are working correctly. For example, if a message should
raise one alarm and clear another, ensure that this happens. Or, for wildcard
operations, ensure that the entire batch that you expect to be seized is actually seized.
 The alarm fields are correctly populated, and in particular you should check the
following fields:
o Logic ID—where an empty Logic ID is an implementation error.
o Description—should not be empty and should be presented correctly, for
example, correct English spelling, grammar, punctuation.
o Priority—should not be empty and should be within the defined scope (1-9)
and according to the TEOCO standard (unless specified otherwise).
o Configurable fields—note that if you are testing using a Base Configuration that
is not compliant with the raw data, you will receive a default value in these fields
rather than the correct Base Configuration values. Therefore, it is crucial to
populate the Base Configuration with data that complies with the raw data in
order to test these fields.
 When testing a library that includes a project layer, ensure that the information that the
project layer embeds into the original Transform appears. For example, if the project
layer adds a prefix to the description, ensure that the prefix is visible.
To test that the Explanation is functioning properly, use a Communication Admin based on an
FTP protocol and not the Conductor simulation utility. This requires placing a raw data file in
the required directory on the server. This is because the Conductor does not create a delivery
data record which is essential for the Explanation utility.


Performance Tests
Unlike PM, alarm information should be presented to the user in real time, so
performance is a critical issue.
Note that writing to log files and BAD files has a detrimental effect on performance. Therefore,
even if the messages written to the logs are not severe, the issue that raised the log message
should be handled.
Lookup reloading should be handled during off-peak hours, and we also recommend planning
the reloads so that they are done sporadically.
In addition, Lookups and Static Tables should be loaded only once when the connect script is
raised for the first time.

QA Testing
To test the libraries, the alarms are simulated based on raw data (where available) and the
resulting data that is stored in the database is compared with the requirements in the
Functional Specification.

Packaging and Delivery


Refer to the TEOCO Studio Implementation Guide for details about packaging and delivering
libraries.

Notes:

 Information modules that include MTTI require an external script to activate the MTTI
in the target environment.
 For libraries that were developed using the EasySNMP for FM Wizard, where the target
environment does not support the Wizard, the Wizard module must be removed and
detached from the library and packaged in the metadata folder.


Troubleshooting
While implementing a fault solution, you may experience one of the following problems:
 The Alarm Configuration Information Contains “UNDEFINED” or “-1”
 The Alarms Show an Incorrect “Time up”
 Alarms Do Not Arrive
 Alarms Entering the Threshold Component Do Not Arrive
 Alarms are not Cleared Automatically
 The Explanation View is not Available for GD_Internal Alarms
 No Data Appears in the New Explanation View
 The New Explanation Utility Shows More than the Raw Data
 Information Cannot Be Found (in Explanation Window)

The Alarm Configuration Information Contains “UNDEFINED” or “-1”
The alarm configuration information may contain “UNDEFINED” for strings or “-1” for integers.
These are the TEOCO standard defaults for functions that query Base Configuration.
 Ensure that the library refers to an existing Base Configuration element.
 If you are using delivery data:
o If this occurs in development, ensure that the simulation was run using a file
transfer protocol.
o If this occurs in production, ensure that the Communication Admin is correctly
associated with Base Configuration.

Note: We recommend that all user functions/Lookups whose output port is connected to an
attribute in the record class out (and then mapped in the Threshold) always use (for
debugging purposes) “UNDEFINED” as the default for strings and -1 as the default for
integers.
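Using these two defaults consistently makes failed lookups easy to find mechanically. The following Python sketch scans one alarm record for fields that still hold a default. The dictionary representation of the alarm and the fields_with_defaults helper are hypothetical.

```python
def fields_with_defaults(alarm, string_default="UNDEFINED", integer_default=-1):
    """Return the names of fields whose value is still a lookup default."""
    return sorted(
        name
        for name, value in alarm.items()
        if value == string_default or value == integer_default
    )


alarm = {"eqp_name": "UNDEFINED", "area": "North", "district_id": -1, "priority": 3}
suspect_fields = fields_with_defaults(alarm)
```

Each name in the result points at a function or Lookup whose Base Configuration query did not return a value for this alarm.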

The Alarms Show an Incorrect “Time up”


If the alarm shows an incorrect “time up”, do the following:
 If the event time should not be used, check whether the Transform’s RC OUT holds a
DATE_N attribute which is connected to nothing, and if so delete the attribute and
make sure the Threshold’s “use event time” is unmarked.
 If the raw data time should be used, check whether the library holds a DATE_N
attribute, which holds links, and if so, ensure that the Threshold’s “use event time” is
marked.


Alarms do not Arrive


Use the Conductor to detect queues and ensure that information is not getting stuck in one of
the components.
Refer to the Parser, Transform and Threshold Implementation Guides’ troubleshooting
sections and to the Conductor User Guide.

Alarms Entering the Threshold Component do not Arrive


If you are working in a development environment on a machine where other
environments also exist: 

 Ensure that the Engine Clips and the FM Server in other environments are not getting
the feed.
If they are getting the feed, reassign a new port to the Engine Clips/
(config_db..PROCESSES.PORT).

To test the Engine Clips:


1. Run the relevant engine clips in debug mode (for example, engine_clips -id <id> -med2 -p).

Notes: This action, and any other debugging of an N1 (C application framework)
process, can be taken only under the following conditions:

o The library being debugged uses its own engine clips (otherwise other
libraries may “suffer” from this action too).

o The process has been blocked in config_db..PROCESSES.BLOCK.

o Its parent has been refreshed (so it does not control the blocked process): kill -14
<Father's name> <UNIX instance>

2. Make sure that the clips show no errors.


3. Make sure the Threshold is included in the clips.
4. Run a simulation, and check that it arrived in the engine clips:

o With an appropriate + or –.
o With all the anticipated fields.

Alarms are not Cleared Automatically


If alarms are not cleared automatically, do the following:
 Ensure that the clearing alarm holds a “DOWN” condition in the library.
 Ensure that the clearing alarm holds the same Logic ID as the existing (up) alarm.
 Ensure that if the library supports grouping, a "DOWN" event was created for each
member.
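Why an exact Logic ID match matters for clearing can be shown with a small Python model. This is not the Threshold implementation. The apply_events helper, the "UP" condition name, and the Logic IDs are hypothetical. Only the "DOWN" condition comes from the checks above.

```python
def apply_events(events):
    """Track active alarms; a DOWN event clears only the alarm with the same Logic ID."""
    active = set()
    for condition, logic_id in events:
        if condition == "UP":
            active.add(logic_id)
        elif condition == "DOWN":
            active.discard(logic_id)  # no effect if the Logic ID does not match
    return active


events = [
    ("UP", "NE1_LINK_1"),
    ("UP", "NE1_LINK_2"),
    ("DOWN", "NE1_LINK_1"),   # same Logic ID: the alarm is cleared
    ("DOWN", "NE1_Link_2"),   # different case: the alarm stays active
]
still_active = apply_events(events)
```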


The Explanation View is not Available for GD_Internal Alarms


GD_INT libraries subscribe in a way (using an internal protocol) that does not support the
Explanation. For consistency reasons, do not include the Message History and Event History
components in the library connect script.

No Data Appears in the New Explanation View


If no data appears in the new Explanation:
 Check that the Explanation service is working.
a. Check that the connect dvx2_explanation_service_for_father.connect is running.
b. Run the following:
conqt
gs es ExplanationService
You should get a successful line.
The Explanation string consists of the Access number, RDR file number, the
offset in the file (of the data), and the length of the data. Try to open the RDR file
in the Communication Admin and see if Communication Admin saved the history
data at all.
c. To see the data, go to $GD_RDR_DIR/<Access number>/bin/<File name> and see
if the data exists there.
d. If it does not exist, check in the Communication Admin definitions to see
whether the access saved the RDR at all.
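When checking by hand whether the data exists, the four parts of the Explanation string map naturally onto a file read. The following Python sketch assumes the RDR file can be read as plain bytes and that the access number, file name, offset, and length have already been extracted (the encoding of the Explanation string itself is not described here). The read_rdr_slice helper is hypothetical. The directory layout follows step c above.

```python
import os


def read_rdr_slice(rdr_dir, access_number, file_name, offset, length):
    """Read length bytes at offset from <rdr_dir>/<access number>/bin/<file name>."""
    path = os.path.join(rdr_dir, str(access_number), "bin", file_name)
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)
```

On a server, rdr_dir would typically be taken from the GD_RDR_DIR environment variable, for example os.environ["GD_RDR_DIR"]. An empty result suggests that the access did not save the data at that position.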

The New Explanation Utility Shows More than the Raw Data
If the New Explanation Utility shows more than the raw data:
 Check the parsing rules and see if they catch more than what is needed for the
message.

Information Cannot Be Found (in Explanation Window)


The Explanation feature works with "Delivery_Data" for the delivery data record class for FM
libraries. If you use a different delivery data record class, the Explanation feature is not
supported. Ensure the name of the delivery data record class is "Delivery_Data".
