Cloud Flexi Zone Controller, Rel. LTE17A, Operating Documentation

Cloud Flexi Zone Controller Alarms


DN09253614
Issue 01
Approval Date 2017-10-24
The information in this document applies solely to the hardware/software product (“Product”) specified
herein, and only as specified herein.

This document is intended for use by Nokia customers (“You”) only, and it may not be used except for the purposes defined in the
agreement between You and Nokia (“Agreement”) under which this document is distributed. No part of this document may be used,
copied, reproduced, modified or transmitted in any form
or means without the prior written permission of Nokia. If you have not entered into an Agreement applicable
to the Product, or if that Agreement has expired or has been terminated, You may not use this document in
any manner and You are obliged to return it to Nokia and destroy or delete any copies thereof.

The document has been prepared to be used by professional and properly trained personnel, and You
assume full responsibility when using it. Nokia welcome Your comments as part of the process of continuous development and
improvement of the documentation.

This document and its contents are provided as a convenience to You. Any information or statements
concerning the suitability, capacity, fitness for purpose or performance of the Product are given solely on
an “as is” and “as available” basis in this document, and Nokia reserves the right to change any such
information and statements without notice. Nokia has made all reasonable efforts to ensure that the content
of this document is adequate and free of material errors and omissions, and Nokia will correct errors that You identify in this
document. But, Nokia's total liability for any errors in the document is strictly limited to the
correction of such error(s). Nokia does not warrant that the use of the software in the Product will be uninterrupted or error-free.

NO WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
ANY WARRANTY OF AVAILABILITY, ACCURACY, RELIABILITY, TITLE, NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, IS MADE IN RELATION TO THE
CONTENT OF THIS DOCUMENT. IN NO EVENT WILL NOKIA BE LIABLE FOR ANY DAMAGES, INCLUDING BUT NOT
LIMITED TO SPECIAL, DIRECT, INDIRECT, INCIDENTAL OR CONSEQUENTIAL OR ANY LOSSES, SUCH AS BUT NOT LIMITED
TO LOSS OF PROFIT, REVENUE, BUSINESS INTERRUPTION, BUSINESS OPPORTUNITY OR DATA THAT MAY ARISE FROM
THE USE OF THIS DOCUMENT OR THE INFORMATION IN IT, EVEN IN THE CASE OF ERRORS IN OR OMISSIONS FROM
THIS DOCUMENT OR ITS CONTENT.

This document is Nokia proprietary and confidential information, which may not be distributed or disclosed to any third parties
without the prior written consent of Nokia.

Nokia is a registered trademark of Nokia Corporation. Other product names mentioned in this document
may be trademarks of their respective owners, and they are mentioned for identification purposes only.

Copyright © 2016 Nokia. All rights reserved.

Important Notice on Product Safety


This product may present safety risks due to laser, electricity, heat, and other sources of
danger.

Only trained and qualified personnel may install, operate, maintain or otherwise handle this
product and only after having carefully read the safety information applicable to this product.

The safety information is provided in the Safety Information section in the “Legal, Safety and
Environmental Information” part of this document or documentation set.

Nokia is continually striving to reduce the adverse environmental effects of its
products and services. We would like to encourage you as our customers and users to join us in working
towards a cleaner, safer environment. Please recycle product packaging and follow the recommendations
for power use and proper disposal of our products and their components.

If you should have questions regarding our Environmental Policy or any of the environmental services we
offer, please contact us at Nokia for any additional information.
1. Introduction
The purpose of this document is to assist users in finding alarms, their meanings, effects, and instructions on how to avoid
them. The alarms are listed by numbers in ascending order.

2. How to read this Excel report


The Excel report provides full information on alarms. It shows the full set of alarm attributes, including the change
information.

2.1 Alarm List


This section shows the full alarm information including the following items:
- alarm number
- alarm name
- meaning of the alarm
- effect of the alarm
- instructions
- clearing information
- change information

2.2 Field descriptions


Field descriptions are provided in the second row of the Alarm List section. Use the fold (-) and unfold (+) buttons on the left panel
to respectively hide and show the field descriptions.

2.3 Change information


Change information is part of the Alarm List section and is available in the following ways:

1. An indication of whether the alarm is new, changed, or removed. It is provided in the following columns:

- Changes between issues ... which shows the changes since the previous issue of the document.
- Changes between releases ... which shows the changes since the latest issue of the document in the previous product
release.

If the cell is empty, the alarm is neither changed, new, nor removed.
Note that removed alarms are listed at the very bottom and are marked with red font.

Filters are enabled for convenient browsing of the change categories.

2. Detailed change information showing the current and previous values of parameter attributes. That information is provided
in the following columns:

- <alarm attribute> in issue ... which shows the value in the previous issue of the document.
- <alarm attribute> in release ... which shows the value in the latest issue of the document in the previous product release.

The columns are grouped into attribute-specific sections for a structured and convenient view. Use the unfold (+) and fold (-)
buttons in the top bar to browse the attribute-specific change details. Use the unfold all (2) and fold all (1) buttons on the left hand
side of the top bar to respectively show and hide the change details for the whole report.

Note that for all alarms except new and removed ones, the field values for the previous issue and the previous release are always
provided to show the full change history. Additionally, the changed attributes of alarms are highlighted in grey.
The highlights indicate whether the change is between issues or between releases.
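The change categories can also be tallied outside the spreadsheet. The following is a minimal Python sketch, assuming the Alarm List rows have been exported as whitespace-separated text lines of the form "<changes between issues> <changes between releases> <alarm number>" (for example "Changed Changed 70005"). The function name group_by_release_change and the export format are illustrative assumptions and are not part of the report itself.

```python
from collections import defaultdict

# Change categories used in the "Changes between ..." columns of the report.
CATEGORIES = {"Changed", "New", "Removed"}


def group_by_release_change(lines):
    """Group alarm numbers by their 'Changes between releases' category.

    Assumes each data row looks like:
        <issue change> <release change> <alarm number>
    Rows with any other shape (blank lines, headers) are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        issue_change, release_change, number = parts
        if issue_change in CATEGORIES and release_change in CATEGORIES:
            groups[release_change].append(int(number))
    return dict(groups)


# Sample rows copied from the Alarm List change columns.
sample = [
    "Changed Changed 70005",
    "",
    "New New 70446",
    "Removed Removed 70001",
    "Changed Changed 70006",
]
print(group_by_release_change(sample))
```

Alarms with an empty change cell (unchanged alarms) would appear in such an export as a bare alarm number; this sketch skips them rather than guessing a category.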
Note:
See the How to Read This Report tab
for instructions on the usage of Alarm List

Changes between issues Changes between releases Alarm Number


01A and 01 LTE16A and LTE17A

Changed Changed 70005

Changed Changed 70006

Changed Changed 70011

Changed Changed 70012

Changed Changed 70025

Changed Changed 70030

Changed Changed 70074

Changed Changed 70115

Changed Changed 70157

Changed Changed 70158


Changed Changed 70159

Changed Changed 70160

Changed Changed 70161

Changed Changed 70164

Changed Changed 70166

Changed Changed 70168

Changed Changed 70186

Changed Changed 70188

Changed Changed 70189

Changed Changed 70194

Changed Changed 70197

Changed Changed 70242


Changed Changed 70243

Changed Changed 70244

Changed Changed 70245

Changed Changed 70246

Changed Changed 70247

Changed Changed 70249

Changed Changed 70250

Changed Changed 70251

Changed Changed 70256

Changed Changed 70265

Changed Changed 70267

Changed Changed 70268


Changed Changed 70269

Changed Changed 70271

Changed Changed 70273

Changed Changed 70280

Changed Changed 70310

Changed Changed 70330

Changed Changed 70331

Changed Changed 70332

Changed Changed 70348

Changed Changed 70350

Changed Changed 70351

Changed Changed 70352


Changed Changed 70357

Changed Changed 70358

Changed Changed 70360

Changed Changed 70362

Changed Changed 70368

Changed Changed 70377

Changed Changed 70378

Changed Changed 70379

Changed Changed 70382

Changed Changed 70439

Changed Changed 70440

New New 70446


New New 70447

New New 70448

New New 70456

New New 70457

New New 70458

New New 70459

New New 70460

New New 70461

New New 70462

New New 70463

New New 70467

New New 70469


New New 70471

New New 70472

New New 70473

New New 70474

Changed Changed 71061

New New 71300

New New 71301

New New 71302

New New 71303

New New 71304

New New 71305

New New 71306


New New 71307

New New 71308

New New 71309

New New 71310

Removed Removed 70001

Removed Removed 70003

Removed Removed 70004

Removed Removed 70007

Removed Removed 70008

Removed Removed 70009

Removed Removed 70064

Removed Removed 70175


Removed Removed 70187

Removed Removed 70254

Removed Removed 70255

Removed Removed 70274

Removed Removed 70275

Removed Removed 70276

Removed Removed 70277

Removed Removed 70278

Removed Removed 70279

Removed Removed 70283

Removed Removed 70285

Removed Removed 70286


Removed Removed 70287

Removed Removed 70291

Removed Removed 70294

Removed Removed 70295

Removed Removed 70296

Removed Removed 70297

Removed Removed 70299

Removed Removed 70301

Removed Removed 70302

Removed Removed 70303

Removed Removed 70304

Removed Removed 70305


Removed Removed 70307

Removed Removed 70313

Removed Removed 70322

Removed Removed 70324

Removed Removed 70325

Removed Removed 70326

Removed Removed 70327

Removed Removed 70328

Removed Removed 70329

Removed Removed 70347

Removed Removed 70359

Removed Removed 70361


Removed Removed 70363

Removed Removed 70364

Removed Removed 70365

Removed Removed 70366

Removed Removed 70367

Removed Removed 70370

Removed Removed 70371

Removed Removed 70372

Removed Removed 70373

Removed Removed 70383

Removed Removed 70385

Removed Removed 70386


Removed Removed 70387

Removed Removed 70388

Removed Removed 70389

Removed Removed 70390

Removed Removed 70391

Removed Removed 70394

Removed Removed 70395

Removed Removed 70396

Removed Removed 70397

Removed Removed 70398

Removed Removed 70399

Removed Removed 70400


Removed Removed 70401

Removed Removed 70402

Removed Removed 70403

Removed Removed 70404

Removed Removed 70405

Removed Removed 70406

Removed Removed 70407

Removed Removed 70408

Removed Removed 70409

Removed Removed 70410

Removed Removed 70411

Removed Removed 70412


Removed Removed 70413

Removed Removed 70414

Removed Removed 70415

Removed Removed 70416

Removed Removed 70417

Removed Removed 70418

Removed Removed 70419

Removed Removed 70420

Removed Removed 70421

Removed Removed 70422

Removed Removed 70423

Removed Removed 70424


Removed Removed 70425

Removed Removed 70426

Removed Removed 70427

Removed Removed 70428

Removed Removed 70429

Removed Removed 70430

Removed Removed 70431

Removed Removed 70432

Removed Removed 70433

Removed Removed 70434

Removed Removed 70435

Removed Removed 70437


Removed Removed 70449

Removed Removed 70450

Removed Removed 70451

Removed Removed 70453

Removed Removed 71094

Removed Removed 71095


Alarm Name | Probable Cause

INCORRECT ALARM DATA | Invalid parameter
ACTIVE ALARM OVERFLOW | Resource at or nearing capacity
NODE NOT RESPONDING | Equipment malfunction
SERVICE LEVEL DEGRADED BELOW THRESHOLD | Equipment malfunction
POSSIBLE SECURITY THREAT IN NETWORK ELEMENT | Threshold crossed
DISK DATABASE IS GETTING FULL | Resource at or Nearing Capacity
MAXIMUM THRESHOLD HAS BEEN CROSSED | Threshold Crossed
LICENSE EXPIRATION WARNING LIMIT IS REACHED | Threshold crossed
CPU USAGE OVER LIMIT | Threshold crossed
FILE SYSTEM USAGE OVER LIMIT | Threshold crossed
MANAGED OBJECT FAILED | Software program abnormally terminated
MEMORY USAGE OVER LIMIT | Threshold crossed
OPERATING SYSTEM MONITORING FAILURE | System call unsuccessful
ETHERNET LINK FAILURE | Link failure
MANAGED OBJECT LOCKED | Software program abnormally terminated
CLUSTER STARTED (RESTARTED) | Software environment problem
CLUSTER OPERATION INITIATED BY OPERATOR | Congestion
MANAGED OBJECT SHUTDOWN BY OPERATOR | Congestion
MANAGED OBJECT UNLOCKED BY OPERATOR | Congestion
RECOVERY GROUP SWITCHOVER | Software program abnormally terminated
MINIMUM THRESHOLD HAS BEEN REACHED | Threshold Crossed
ALARM LOG FILE INACCESSIBLE | File error
ALARM PROCESSOR CONFIGURATION IS OUT OF ORDER | Configuration or customising error
CORRUPTED ALARM DATA | Corrupt data
ILLEGAL USAGE OF EXTERNAL ALARM TIME | Software Program Error
ALARM SYSTEM HEARTBEAT | Timeout expired
ALARM SYSTEM HEARTBEATING SWITCHED OFF | Configuration or Customizing Error
CRITICAL CLUSTER SERVICES WITHOUT STANDBY | Equipment Malfunction
NO OPERATIONAL RECOVERY UNIT FOR SERVICE INSTANCE | Software Program Abnormally Terminated
UNRECOMMENDED CONFIGURATION FORCED BY OPERATOR | Equipment Malfunction
RESOURCE ALLOCATION OR DE-ALLOCATION FAILURE | Software Program Abnormally Terminated
RECOVERY ACTIONS BANNED FOR MANAGED OBJECT | Software Error
INVALID EXTERNAL USER | Configuration or Customizing Error
EXTERNAL LDAP FAILURE | Underlying resource unavailable
INVALID ACTIVE SESSIONS | Database inconsistency
APPLICATION CONFIGURATION IS OUT OF ORDER | Configuration or Customizing Error
PROCESS START-UP FAILED DUE TO AN UNAVAILABLE SERVICE | Software program abnormally terminated
UNKNOWN SPECIFIC PROBLEM | Configuration or Customizing Error
LICENSE MANAGER FAILED TO OBTAIN TARGET ID | Configuration Or Customization Error
DATABASE SYNCHRONIZATION FAILURE | Communication Subsystem Failure
MAX CONNECTIONS TO DATABASE REACHED | Threshold Crossed
UNABLE TO WRITE TO DISK | Storage Capacity Problem
BIDIRECTIONAL FORWARDING DETECTION SESSION DOWN | Transmission Error
DETECTED CLUSTER INTERNAL MESSAGING WITH UNKNOWN ORIGIN | Leak Detection
LICENSE STATE OFF FOR ACTIVE FEATURE | Threshold crossed
USER-SPECIFIED CONFIGURATION FAILED DURING POSTCONFIG | Configuration or Customizing Error
CERTIFICATE CANNOT BE MADE | Underlying resource unavailable
TLS CONNECTION CANNOT BE MADE BY RUIM | Underlying resource unavailable
UNSAVED CONFIGURATION IN USE | Timeout expired
ALARM SYSTEM RESTORED | Reinitialized
NODE LOCAL FORWARDING TABLE ERROR | Out of memory
SYSTEM CLOCK OUT-OF-SYNC WITH NTP SERVER | Clock Synchronization Problem
CONNECTIVITY LOST TO FORWARDER | Loss of Frame
TEST LICENSES ENABLED | Configuration Or Customization Error
CERTIFICATE ABOUT TO EXPIRE OR EXPIRED | Threshold crossed
AUTOMATIC KEY UPDATE CANNOT PROCEED | SOFTWARE PROGRAM ERROR
AUTOMATIC KEY UPDATE COMPLETED SUCCESSFULLY | SOFTWARE PROGRAM ERROR
RUNNING WITH INSUFFICIENT MEMORY | THRESHOLD CROSSED
RUNNING WITH INSUFFICIENT AMOUNT OF CPUS | THRESHOLD CROSSED
STORAGE SERVICE SUPERVISION FAILURE | Underlaying Resources Unavailable
LOCAL STORAGE AGENT SUPERVISION FAILURE | Underlying Resource Unavailable
TWO WAY ACTIVE MEASUREMENT PACKET LOSS RATE EXCEED THRESHOLD | Transmission Error
TWO WAY ACTIVE MEASUREMENT ROUND TRIP TIME EXCEED THRESHOLD | Transmission Error
SYSTEM INSTALL DATA MISMATCH | INVALID PARAMETER
DUPLICATE IPV6 ADDRESS DETECTED | Configuration or Customization Error
VNF STARTUP CONFIGURATION VERIFICATION FAILED | Corrupt data
KEY ACCESS FAILED FOR THE AUTOMATED REMOTE ACCESS ACCOUNT | Timeout Expired
MEMORY DATABASE IS GETTING FULL | Resource at or Nearing Capacity
FILE SYSTEM UNAVAILABLE | Underlying Resource Unavailable
PTP MASTER UNREACHABLE | Link failure
ALARM OBSERVATION FILE DATA LOSS DETECTED | Underlying Resources Unavailable
PTP MASTER OUT OF SERVICE | OUT_OF_SERVICE
CERTIFICATE REVOCATION LIST UPDATE FAILURE | Software Program Error
CERTIFICATE REVOKED | Software Program Error
INVALID IP CONFIGURATION | Configuration or Customizing Error
CLUSTER PEER IN STORAGE POOL NOT AVAILABLE | Underlaying Resources Unavailable
GLUSTER BRICK IN STORAGE POOL CORRUPTED | Underlaying Resources Unavailable
SPLIT-BRAIN IN STORAGE POOL DETECTED | Underlaying Resources Unavailable
FABRIC PATH FAILURE | Link failure
UVM NOT AVAILABLE / UPGRADE FILESERVER NOT RESPONDING | Equipment malfunction
CLS UNREACHABLE | Equipment malfunction
UNSPECIFIED NETWORK ELEMENT TYPE | configuration or customizing error
FAILED TO RESERVE CLIENT ID FROM CLS | transmission error
DYNAMIC CONFIGURATION ACTIVATION FAILED | Configuration Or Customisation Error
USER/GROUP ACCOUNT FILE SYNC ERROR | Transmit Failure
LOGGING NODE NOT AVAILABLE | Underlying Resources Unavailable
CONFIGURATION OF SNMP MEDIATOR IS OUT OF ORDER | Corrupt data
NO REPLY TO SNMP REQUEST | Corrupt data
UNKNOWN SNMP TRAP | Corrupt data
AUTHENTICATION FAILURE IN ETHERNET DEVICE | Protection Path Failure
SWITCH RESTARTED | Equipment malfunction
SWITCH LINK DOWN | Link failure
BACKUP ERROR | Application Subsystem Failure
SWITCH AND SERVICE UNIT: FABRIC BROADCAST STORM | System resources overload
MANUAL NODE ISOLATION VERIFICATION NEEDED | Equipment malfunction
DRBD HARDWARE FAILURE | Equipment Malfunction
DRBD SYNCHRONIZATION FAILURE | System resources overload
SWITCH CONFIGURATION LOAD FAILED | Underlying Resource Unavailable
SWITCH CPU TEMPERATURE EXCEEDED | High temperature
SWITCH CPU UTILIZATION EXCEEDED | Threshold Crossed
SWITCH IMAGE CHECK FAILED | File Error
SWITCH MEMORY UTILIZATION EXCEEDED | Out of Memory
SWITCH PORT ERROR | Threshold crossed
FIELD-REPLACEABLE UNIT UNAVAILABLE | REPLACEABLE UNIT MISSING
BUS ERROR | Equipment Malfunction
CPU MALFUNCTION | Processor Problem
CURRENT OUT OF LIMIT | Power supply failure
BOOTING FAILURE | Equipment Malfunction
SYSTEM FIRMWARE ERROR | Equipment Malfunction
POWER UNIT FAILURE | Power Problem
PLATFORM SECURITY VIOLATION | Invalid parameter
TEMPERATURE OUT OF LIMIT | Temperature Unacceptable
MEMORY ERROR | Equipment Malfunction
BATTERY FAILURE | Battery breakdown
FAN SPEED OUT OF LIMIT | Cooling Fan failure
CLUSTER MANAGEMENT NODE DISK OUT OF SYNC | Equipment malfunction
SHELF MANAGER UNAVAILABLE | Equipment Malfunction
FIELD-REPLACEABLE UNIT TYPE MISMATCH | Equipment Malfunction
VOLTAGE OUT OF LIMIT | Power supply failure
SIGNALING GATEWAY/SIGTRAN CONFIGURATION ERROR | Software Error
SCCP USER OUT OF SERVICE | Unavailable
MESSAGE TRANSFER PART 3 POINT CODE CONGESTED | SS7 Protocol Failure
INVALID MESSAGE RECEIVED BY MESSAGE TRANSFER PART 3 | Invalid MSU received
MESSAGE TRANSFER PART 3 LINK AND ROUTE UNAVAILABLE | SS7 Protocol Failure
MESSAGE TRANSFER PART 3 POINTCODE INACCESSIBLE | SS7 Protocol Failure
SWITCH CONFIGURATION OUT OF SYNC | Connection establishment error
DIGITAL SIGNAL PROCESSOR FAILURE | Underlying resource unavailable
DIGITAL SIGNAL PROCESSOR OUT OF SYNCH THRESHOLD EXCEEDED | Software Error
HARD DISK DRIVE FAILURE | Disk problem
PACKET TIMING UNIT ETHERNET LINK FAILURE | Link failure
BLADE SELF TEST STATUS FAIL OR PENDING | Equipment Malfunction
INTERNET CONTROL MESSAGE PROTOCOL MONITORING SESSION DOWN | Transmission Error
DRBD DEVICES FORCIBLY STARTED UP | Lost redundancy
AUTORETURN FROM FAULTY DELIVERY TO WORKING DELIVERY DONE | Software Error
CLUSTERTRACEMANAGER CPU USAGE OVER LIMIT | Threshold Crossed
MEMORY ERRORS | Equipment Malfunction
HARDWARE ENTITY MISSING | EQUIPMENT MALFUNCTION
DISK SPACE IS GETTING FULL | Threshold crossed
DISK SPACE IS ALMOST FULL | Threshold crossed
LOW SYSTEM MEMORY | Out of Memory
SIGNALING RESOURCE EXHAUSTING | Resource at or Nearing Capacity
SIGNALING RESOURCE UTILIZATION EXCEEDED | THRESHOLD CROSSED
SIGNALING DISTURBANCES | SS7 protocol failure
SCTP ASSOCIATION CONGESTION | CONGESTION
REDUCED SIGNALING CONNECTION ESTABLISHMENT SUCCESS RATE | Call Establishment Error
SIGNALING CONFIGURATION MISMATCH IDENTIFIED | Configuration or customization error
THERMAL THROTTLING ACTIVATED | High temperature
SCTP PATH FAILURE | call establishment error
APPLICATION SERVER FAILURE | call establishment error
SIGNALING SERVICE INTERNAL FAILURE | Software Error
SIGNALING ACTIVATION FAILURE | Configuration or Customizing Error
SERVICE INITIALIZATION FAILURE | SOFTWARE PROGRAM ERROR
SCTP ASSOCIATION FAILURE | call establishment error

CUSTOMIZABLE EXTERNAL HW ALARM_1 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_2 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_3 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_4 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_5 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_6 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_7 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_8 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_9 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_10 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_11 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_12 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_13 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_14 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_15 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_16 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_17 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_18 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_19 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_20 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_21 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_22 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_23 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_24 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_25 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_26 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_27 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_28 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_29 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_30 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_31 | EXTERNAL POINT FAILURE
CUSTOMIZABLE EXTERNAL HW ALARM_32 | EXTERNAL POINT FAILURE
SIGNALING OBJECT STATUS FLUCTUATING | Transmission error
D-CHANNEL FAILURE | CONNECTION ESTABLISHMENT ERROR
HW-EVENT RATE OVER LIMIT | Equipment Malfunction
ESW Upgrade Failed | Software Error
AHUB CONFIGURATION OUT OF SYNC | Configuration or Customizing Error
PACKET TIMING UNIT SYNCHRONIZATION FAILURE | CLOCK SYNCHRONISATION PROBLEM
PACKET TIMING UNIT BOOT INITIATED | SOFTWARE ENVIRONMENT PROBLEM
PACKET TIMING UNIT SUPERVISION ALERT | SOFTWARE PROGRAM ERROR
PACKET TIMING UNIT SYNC CONFIGURATION ERROR | SOFTWARE PROGRAM ERROR
FIBRE CHANNEL SWITCH STATUS CHANGE | Equipment Malfunction
FIBRE CHANNEL SWITCH PORT STATUS CHANGE | Equipment Malfunction
Event Type Default Severity

Major

Major

Major

Minor

Minor

Quality of service Warning

Warning

Minor

Major

Major
Processing error Major

Quality of service Major

Major

Minor

Warning

Major

Warning

Warning

Warning

Major

Warning

Critical
Minor

Major

Major

Warning

Major

Equipment Minor

Minor

Equipment Minor

Major

Processing error Major

Processing error Warning

Processing error Major


Processing error Critical

Minor

Major

Warning

Critical

Communications Warning

Quality of service Warning

Major

Communications Minor

Critical

Warning

Minor
Processing error Minor

Processing error Minor

Warning

Minor

Processing error Major

Equipment Critical

Communications Major

Major

Minor

Processing error Minor

Processing error Warning

Quality of service Major


Quality of service Major

Processing error Major

Processing error Major

Communications Minor

Communications Minor

Processing error Major

communicationsAlarm Minor

Processing error Critical

Processing error Major

Quality of service Minor

Processing error Major

Communications Major
Processing error Major

Quality of service Major

Processing error Critical

Processing error Critical

Major

Processing error Major

Processing error Major

Processing error Major

Equipment Major

Equipment Major

Equipment Major

Processing error Critical


Communications Critical

Processing error Major

Equipment Major

Processing error Major

Processing error Minor

Processing error Minor

Processing error Minor

Equipment Warning

Equipment Minor

Equipment Major

Processing error Minor

Quality of service Warning


Equipment Critical

Equipment Major

Quality of service Major

Processing error Major

Environmental Critical

Quality of service Major

processingErrorAlarm Major

Processing error Major

Quality of service Major

Equipment Major

Equipment Critical

Equipment Critical
Equipment Critical

Equipment Critical

Equipment Critical

Equipment Critical

Processing error Major

Environmental Major

Equipment Critical

Equipment Major

Environmental Critical

Equipment Critical

Equipment Critical

Equipment Critical
Equipment Critical

Processing error Critical

Communications Major

Communications Major

Communications Major

Communications Major

Communications Major

Communications Major

Processing error Critical

Processing error Critical

Equipment Critical

Equipment Major
Equipment Minor

Communications Minor

Equipment Major

Processing error Warning

Quality of service Warning

Equipment Warning

Equipment Warning

Quality of service Warning

Quality of service Major

Processing error Major

Quality of service Warning

Quality of service Critical


Communications Warning

qualityOfServiceAlarm Warning

Communications Warning

processingErrorAlarm Minor

Environmental Warning

Communications Minor

Communications Critical

Processing error Critical

Processing error Major

Processing error Major

Communications Major

Environmental Major
Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major
Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major
Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Environmental Major

Communications Critical

Communications Major

Equipment Critical

Processing error Major

Processing error Major


Equipment Major

Processing error Minor

Equipment Major

Processing error Major

Equipment Minor

Equipment Minor
Meaning Effect

The alarm system has been requested to raise or


clear an alarm with incorrect alarm data. One or
more
arguments provided with the request might have
an invalid value or meaning: null or they might be
The number of active alarms has reached its
empty, too long,
maximum in the Network Element.
out of specified range, containing non-printable
While the number of active alarms
characters, or have an incorrect format. The
in the Network Element has reached maximum,
alarm number (Specific
further requests for raising new alarms are
A failing node
Problem) mightstays
also un-reachable
be unknown. for a
declined (that means, the alarms
configured
An incorrect amount
formatofintime. It is possible
this case means, for that the
are lost). Clearing of the alarms is possible.
node
example, is unable
that ato restart, value was entered
character
or
where is stuck. Important
a numeric valueservices/functions
was expected. A that are
special
provided
case of anbyincorrect
an active-standby
format is recovery
when the group (")
quotes
The number of active recovery units within the The alarm is informative by nature and indicates
may be taken over
surrounding by other
load sharing the groupvalue
has dropped below the that the load sharing group cannot maintain the
operational
of an info nodes.
field are Servicesfrom
missing mayanbealarm down if the
predefined threshold. This can happen acceptable service level.
standby
notification nodes areinalso
record the down.
syslog.
because of:
1) The alarm whichactions
management is requested
(number to be raised
of recovery
The
or alarm with
cleared is raised whendata
incorrect there is is
nota further
reason to Someone may be trying to intrude on a network
units or nodes hasbeen locked), 2) a series of
suspect
processed, thatbutsomeone is trying is to put
intrude on a element.
node failures orthe information
3) continuous failure as
to
network
additional info in this alarm.
element.
This condition emerges if there are too many subsequent failed login attempts.
restart the recovery unit(s) within the load sharing group.
If the alarm number is unknown, then the actual fault for which the alarm has been raised is also unknown.

The disk storage area which is reserved for the disk database is filling up. When the disk is almost full, an immediate action is required from the user. The maximum size of database (DB) is limited by the available file-system space in the working directory (the actual location is defined by the used storage deployment, for example /mnt/db/<dbname>). There are three fillRatioAlarm limits: warning, major, and critical. These limits can be configured through the configuration directory by the database application. The disk usage of working directory has reached one of these configured limits.

The disk database is still fully operational. If the database fills up completely, its services cannot be used anymore.

This alarm indicates that the maximum threshold value (based on the threshold rule defined for the measurement result) has been detected. The severity of this alarm depends upon the measurement(s) that reach(es) the defined threshold value.

The precise effect of this alarm cannot be determined since the nature of the alarm depends on the measurement(s) involved in the measurement result.

This is an informative alarm which indicates that the license expiration warning limit has been reached, or the license has expired. The license expiration warning limit (not configurable) of how early the impending expiry is reported is fixed to 30 days before the license expiration time. The severity is MAJOR if the license has expired. The severity is MINOR if the remaining days before the license expires is greater than zero.

If the license has expired and it was the only license for the feature in question, the application implementing the licensed feature will stop operating.

This alarm indicates that the Network element's CPU is overloaded because certain processes are consuming a lot of processing power. There is a risk that the node is unable to accomplish the tasks allocated to it. This depends on the processes that consume CPU time and to what extent they are blocking other processes from getting their runtime on the CPU. It also depends on whether there is a temporary or a permanent increase in the CPU utilization.

If a processor is constantly used at a very high utilization level, some functionality of the system might appear very slow. For example, the execution of some SCLI commands takes an unusually long time to finish.

The available disk space on a partition is smaller than the minimum requirement. The partition can be filled up by crashing programs (or by large log files if the rotation of logs is malfunctioning) resulting in large core files.

There is a risk that some data cannot be written on the disk.
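
The severity rule of the license expiry alarm described above (warning limit fixed at 30 days, MAJOR once the license has expired, MINOR while time remains) can be illustrated with a minimal sketch. The function name `license_alarm_severity` is hypothetical and not part of the product, and treating "exactly zero time remaining" as expired is an assumption, since the text does not state that boundary.

```python
from datetime import datetime, timedelta
from typing import Optional

# Fixed (not configurable) warning limit, as stated in the alarm description.
WARNING_LIMIT = timedelta(days=30)


def license_alarm_severity(expires_at: datetime, now: datetime) -> Optional[str]:
    """Return "MAJOR", "MINOR", or None (no alarm) for one license.

    Hypothetical helper for illustration only.
    """
    remaining = expires_at - now
    if remaining <= timedelta(0):
        # Assumption: zero remaining time counts as expired.
        return "MAJOR"
    if remaining <= WARNING_LIMIT:
        # Warning limit reached, but the license is still valid.
        return "MINOR"
    return None
```

A license that expires in 10 days would map to MINOR, and one that expired yesterday to MAJOR; anything more than 30 days away raises no alarm in this sketch.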

The alarm is raised when a named managed object (MO) has failed. The managed object is a logical entity. The following are identified as managed object types:
Node: The computing node, its system software or operating system has failed, or the node has been manually restarted.
Recovery unit (RU): A recovery unit contains one or more processes. A recovery unit failure is usually caused by a process failure.
Process: The process has crashed, terminated abnormally, or stopped responding.
Recovery group (RG): A recovery group consists of one or more recovery units. A recovery group failure alarm is raised for an active-standby configuration, when both redundant components (recovery units of the recovery group) have failed. This is always a serious situation, as it indicates a double failure (for example, two nodes have failed at the same time).

The effect of the situation depends on the managed object type:
Node: Any of the important services/functions that are provided with an active-standby or N+M recovery group may be taken over by other operational nodes. Services may be down if standby/spare nodes are also down.
Recovery unit (RU): If the recovery unit belongs to an active-standby or N+M recovery group, the service may be taken over by an operational standby/spare recovery unit.
Process: The service or function that the process provides is not available. A process failure can cause a recovery action at the recovery unit level, or the system may attempt to restart the failed process.
Recovery group (RG): The service provided by the recovery group is not available. Manual correction is required, as the automatic system repair actions have not solved the problem.
High availability services (HAS) will periodically attempt to solve the problem with corrective actions, such as switchovers or restarts. The alarm system also clears the obsolete alarms that may have been raised by the faulty managed object, or by its subordinate managed objects.

This alarm indicates that the total memory usage on the node (running processes and tmpfs file systems) is too high.

There is a risk that the node is unable to accomplish the tasks allocated to it because processes cannot reserve enough memory for their use. As a result, tasks allocated to those processes are not executed.

The fault detector in the operating system has failed to capture the statistics of the usage of a given resource.

The state of the named device cannot be discovered, which may indicate that there are some fundamental problems with it.

The node cannot receive or transmit data over the network.

The administrative state of the named managed object (MO), which can be a cluster, a node, or a recovery unit (RU), has changed to LOCKED as a result of a user action (graceful shutdown or lock operation). The named MO and its subordinate MOs have been stopped and will not be restarted before a corresponding unlock operation is performed by the user. The service provided by the MO is not available, unless the MO is a RU with some operational and UNLOCKED redundant resources.

When an MO is locked, the alarm system of the cluster clears the alarms raised by the MO and its subordinate MOs.

The whole cluster is starting or restarting. Starting or restarting of the whole cluster means (re)starting of all managed objects within the cluster. The (re)start may have been initiated by an operator or caused by fatal errors in some critical component(s). When the cluster is restarted, the alarm system clears all alarms that were raised by the cluster's managed objects before the restart.

This is an informative alarm which indicates that an operator has initiated a cluster operation on the specified managed object (MO). The MO can concern the whole cluster, a node, a recovery unit (RU), or a process. The platform high availability services (HAS) is now executing the operation. The operation can be:
- switchover
- restart

The operations have different effects:
Switchover - Applicable only to recovery groups (RG). The active RU instance of the RG is terminated and a standby instance on another node started or, in case of a hot active standby RG, activated. The service provided by the named RU is down until the switchover is complete.
Restart - For the cluster and nodes this means a physical restart (reboot) of node(s). For other MOs, the named MO is stopped and restarted. The services provided by the named MO are down during the restart.

This is an informative alarm which indicates that the specified managed object (MO) is being shut down. The MO can be the whole cluster, a node, or a recovery unit (RU). The named MO and all its unlocked sub-resources are now being terminated. The MO is being shut down by an operator. All services provided by the named MO are being terminated. Once the operation has been completed, the administrative state of the MO and all its sub-MOs will be changed to locked.
Note that a shutdown request may take a long time if the maximum duration for the operation has not been specified. The shutdown request can be forced to completion by issuing a lock command. In that case the platform high availability services (HAS) will terminate the services ungracefully.

This is an informative alarm which indicates that the specified managed object (MO), which can be the whole cluster, a node, or a recovery unit (RU), has been unlocked. The named MO and its unlocked sub-resources (if there are any) can now be activated. Notice that the MO (or its sub-MOs) can remain locked because of the dependency on higher-level MOs. That is, the unlock operation will not have an effect on the MO in question before the higher-level MOs are unlocked. For example, a RU in a node will remain locked if the node or the cluster MO is locked.

The MO has been set to the unlocked state. If all the higher level MOs are unlocked as well, the services provided by the MO are activated.

The platform high availability services (HAS) have initiated a switchover. This may be a recovery action for a recovery unit (RU) failure or may be a result of an administrative operation such as a lock or shutdown of the active RU or a switchover request from an explicit recovery group (RG). The new active recovery unit is now starting (a cold active standby RG) or being activated (a hot active standby RG).

The service provided by the RG is currently unavailable. The normal operation resumes once the switchover operation is successful.

This alarm indicates that a minimum threshold value, based on the threshold rule defined for the measurement result, has been detected. The seriousness of the alarm depends on the measurement(s) that reached the defined threshold value.

The precise effect of this alarm cannot be determined since the nature of the alarm depends on the measurement(s) involved in the measurement result.

Alarm processor cannot open or read the alarm log file.

Alarm notifications that are recorded in the alarm log file cannot reach the alarm system, and as a result the control of the alarm situation in the network element is lost.
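
The lock dependency described for the unlock alarm above (an MO stays effectively locked while any higher-level MO is locked) can be sketched as a walk up the MO hierarchy. This is an illustration only: the `ManagedObject` class and `is_effectively_locked` function are hypothetical names and do not describe the HAS implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagedObject:
    """Hypothetical model of an MO with its own administrative lock flag."""
    name: str
    locked: bool = False
    parent: Optional["ManagedObject"] = None


def is_effectively_locked(mo: ManagedObject) -> bool:
    """Return True if the MO itself or any higher-level MO is locked."""
    current: Optional[ManagedObject] = mo
    while current is not None:
        if current.locked:
            return True
        current = current.parent
    return False
```

For example, with a cluster MO as the parent of a node such as /CLA-0, and that node as the parent of an RU, unlocking only the RU leaves it effectively locked until both the node and the cluster are unlocked.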

The configuration of the alarm processor contains an invalid attribute value or an attribute is missing.

The system ignores the invalid value and uses a default value.

Corrupt data found in the alarm log file.

The corrupted record in the alarm log file is ignored, which means that it is possible that an alarm notification was lost or a more serious system error has occurred.

This alarm is raised when an application raises or clears an alarm that contains an internal managed object identification (MOId), and provides its own alarm time in external format { TIME=E********* }. This alarm is also raised if the application raised or cleared an alarm which originated outside of a Network Element, or which is issued by an external alarm system, but did not provide its original alarm time.

The original alarm is discarded.

This is an informative alarm which indicates that the alarm system is healthy and operational. When configured, it is raised periodically to indicate to external managers that the alarm system is alive. The alarm system raises and clears this alarm after the expiration of each heartbeat interval. The raise and clear events related to this alarm are visible in alarm history. It indicates that the alarm system is healthy.
NOTE: There is a delay before the raise/clear operation becomes visible. If the system is under heavy load, it might take even longer for the operation to be visible.

The alarm system heartbeating is switched off, which means that the alarm system does not raise or clear its heartbeat alarms.
Alarm 70247 is not raised if the value of the fsAlarm70247raise parameter is configured to FALSE in the alarm processor configuration of the Configuration Directory.

Alarm system heartbeating is the simplest and most efficient way for the operator to monitor if the alarm system itself is healthy. If the system is in a switched OFF state, the operator cannot detect if the alarm system becomes faulty. That is why it is strongly recommended to have the alarm system heartbeating always switched ON. Nonetheless, alarm system heartbeating can be switched OFF if an alternative heartbeating exists.

Services that are critical for the operation of the system are typically provided in the Cluster Administrator (CLA) nodes using an active-standby redundancy model. The standby node is currently not operational.
Note: The actual names of the CLA nodes can vary in different product applications. The term "CLA node" refers to a node that contains the centralized cluster management and O&M services that are critical in the system. A user logs in to a CLA node when starting a SCLI session. In this example, "CLA-0" and "CLA-1" are used for the CLA nodes.

The situation has no immediate impact on system operation. The system provides services normally as long as the remaining CLA node is operational. If the remaining CLA node fails, the critical services (cluster management functionality and directory service) become unavailable.

An N+M recovery group does not have an operational unlocked recovery unit for the service instance. An N+M recovery group has N active recovery units that provide service and M spare recovery units. All unlocked recovery units that have been assigned a service instance workload are active and providing service. The rest of the unlocked recovery units that have not been assigned a service instance workload are spare recovery units. At this moment, the system cannot assign the workload of the named service instance to any recovery unit, because there are no unlocked operational recovery units without service assignments. This can happen, for example, in the following situations:
- Multiple node reboots have been initiated.
- Multiple node failures have happened.
- One or more service instances have problems causing failures for the recovery units where they have been assigned.
- All spare recovery units are locked when a recovery unit failure happens.

The workload associated with the named service instance (part of the service provided by the recovery group) is currently down.

The functionalities for the system to operate are provided in the management nodes (for example: CLA-0 and CLA-1) using an active-standby redundancy model. The most important service is called the "Directory". If this alarm is raised, it means that an operator has locked the current standby FSDirectoryServer recovery unit.

The situation has no immediate impact on system operation. The system will still provide normal services as long as the currently operational management node(s) and their services remain functional. An operator can choose to set this configuration if, for example, one management node needs maintenance. If the FSDirectoryServer recovery unit is locked in one management node, failure of the node that provides the Directory service will cause the Directory service to be down either for a few minutes (until the node has successfully rebooted) or permanently (if the node fails to start). Directory service downtime often has immediate impact on the services provided by the network element.

Allocation or deallocation of resources to or from a computer node in the cluster has failed. Applications running in the cluster are often identified with resources that are allocated to the node before the application is started, and released from the node after the application has been terminated. Such resources can, for example, be TCP/IP addresses that are associated with the service provided by the software. In addition, the application can allocate and deallocate other resources (for example, start and stop 3rd party applications) in its control scripts.

An operation failure has been reported for the defined recovery unit, while it was starting or stopping. If the error occurred when an application was starting, application start-up is aborted. In case of a permanent fault, the service provided by the application is down. With a transient or node-specific fault, and provided that the application has a standby resource, the application may restart successfully on another node. If the fault occurred while the application was terminating, the node on which the error occurred restarts, to restore it to a known state. If the node has restarted successfully, or the application has a standby resource, the application likely has already restarted, and service is again available.

An operator has set the specified managed object to an inert mode. The managed object identifies a node. If the inert mode is set for the whole cluster, then this alarm is raised separately for each node. While the inert mode is on, high availability services (HAS) do not attempt to recover services from failures, for example, by restarting nodes or applications, or by performing switchovers within the specified managed objects.
Note that the inert mode should be used only by representatives of qualified suppliers when analyzing problems in the system.
The inert mode is switched on by issuing an SCLI command. For example:
set has inert on managed-object /CLA-0
The command above switches the inert mode on for the /CLA-0 node. Accordingly, the inert mode can be switched off by using the following SCLI command:
set has inert off managed-object /CLA-0

This alarm is raised when an operator switches the inert mode on for either a set of nodes or the cluster. The inert mode has the following effects on the behavior of the system in nodes for which the inert mode has been switched on:
- If there are no failures, the service provided by the network element is not affected.
- If failures occur, no recovery actions are performed, and the service may be affected. For example, if a process fails, it is not restarted by HAS.
- Process failures are still propagated to the recovery unit level, but the recovery unit level fault recovery does not take place. In practice, this means that the propagated process failure does not cause restarts of other recovery unit processes, and switchovers do not take place with active/standby recovery groups.
- HAS logs pending recovery actions to journal (journalctl command on the active management node) in the form "INFO Inert mode set for <managed object name>. Recovery

A Network element (NE) has detected, based on the NetAct Remote User Information Management (RUIM) and LDAP (Lightweight Directory Access Protocol) access control lists, that an external user account defined in the NetAct LDAP user database has no permissions for this NE.
According to the NE security architecture, the remote user accounts are replicated locally. However, the user account is not replicated if the external username violates any of the guidelines set for the external user account.

The user account cannot be used to log into the NE.
Note that in case of error type NO_MATCHING_GROUP_ERROR (listed in Application additional Information field), a user is still able to log into the NE.

A network element (NE) experiences problems with the connection to the NetAct external Lightweight Directory Access Protocol (LDAP) server. The NE contacts the external NetAct LDAP server to verify the password using PAP (Password Authentication Proxy) and to obtain authorization information of an external user.

Login of an external user is denied with an appropriate error code and all authorization data replications fail.

Currently, there are open sessions for the network element (NE) that operate according to outdated authorization profiles. This situation occurs when there are changes in the NetAct Lightweight Directory Access Protocol (LDAP) affecting those external Remote User Information Management (RUIM) user accounts (or permissions associated with those accounts) which were replicated into the NE local user database. The change can be one of the following:
- The user account has been removed in NetAct.
- The user account cannot be used to access the NE anymore.
- Permissions associated with this account have changed in NetAct.
Currently, there are active user sessions opened before the above-mentioned changes were detected in the NE. Within those already created user sessions, access control changes are not automatically taken into effect. Users logged in with affected user accounts still continue to operate with the old permission set.
Please note that only sessions maintained in /var/run/utmp are monitored. Currently, those are only SSH sessions (ftp sessions opened with vsftpd are also visible in /var/run/utmp, but ftp sessions are not possible with external user accounts according to the platform configuration). For other types of sessions no alarm is raised.
Note: The session list in /var/run/utmp is currently accessible with SCLI.

This alarm can indicate that some users operate within the NE with higher permissions than allowed by NetAct according to a changed user account authorization profile. There are four possible reasons for this:
1. A non-existent user is still logged into the NE (user account removed in NetAct).
2. A user with no permissions for the NE is logged in (user account has been detached from the NE according to RUIM Access Control Lists).
3. A user has higher permissions than defined in NetAct (permissions for the user account were lowered).
4. A user has lower permissions than defined in NetAct (permissions for the user account were raised).
Note that only cases 1-3 indicate a security risk.

The configuration of an application contains an invalid attribute value or an attribute is missing.

Depending on the severity of the alarm, an application's start-up or run-time session can:
- fail (CRITICAL severity)
- partially lose functionality (MAJOR severity) or
- ignore the invalid configuration and use the default or closest acceptable value/s (MINOR severity)

This alarm is raised when a process fails to start because of unavailability of some service. The process tries to access the service for a limited amount of time and limited number of attempts. If the service still remains unavailable, the alarm is raised.

A Process (identified by the APPID) might fail to come up because of unavailability of any of the following Service types:
Node: The computing node, its system software or operating system has failed and is not available.
Recovery unit (RU): A recovery unit which contains one or more processes is not available.
Process: A process (HAS or non-HAS aware) has failed and is not available.
Recovery group (RG): A recovery group which consists of one or more recovery units is not available.
System Resource: Any system resource provided by the system is unavailable. Example: Ports, IP addresses, Disk Space and so on.
The service or function that the process provides is not available.

This alarm is raised when an alarm notification is detected for a specific problem (alarm number) that is unknown to the alarm system (the corresponding alarm type is not defined in the reference data).

If the fsRaise70280insteadOf70005forUnknownSP parameter is set to true (default value) in the alarm processor configuration in Configuration Directory, then the original notification that has the unknown specific problem (alarm number) is ignored and alarm 70280 (UNKNOWN SPECIFIC PROBLEM) is raised in its place. If the fsRaise70280insteadOf70005forUnknownSP parameter is set to false, then the alarm system raises alarm 70005 (INCORRECT ALARM DATA).

This alarm indicates that the license management server failed to fetch the target ID of the network element (NE). With this, it can only accept emergency licenses and not target-locked licenses.

When, for a prolonged period of time, the license management server fails to fetch the target ID of the NE, it will go into a state where it only accepts licenses which are not bound to any specific NE (target). Emergency licenses, for example, are such licenses and are not locked against a target ID. Target-locked licenses cannot be installed anymore since the verification of the target ID present in them results in failure. Target-locked licenses already installed will also be automatically disabled until the target ID becomes available again. This failure will happen due to the following reasons:
1. The target ID (e.g. NE ID) has not been properly configured via CAM.
2. A coding error.
Note: The license management target ID of the NE may be different from the HOSTNAME and NE ID as used in other NMS and management operations.

The synchronization between active and standby database is failing. This can be due to a temporary overload situation caused by high application load, a network related problem, or due to an outage on one of the database nodes. There are three different levels of database synchronization loss:
- The "warning" severity is generated when the replication lags very slightly behind the active instance. A "warning" alarm is reached already after a 5-second delay to give an early indication when the problem appears.
- The severity of the alarm is "minor" for a short break. The "minor" alarm is raised if the standby instance is only a short period of time behind the active instance. In this OutOfSync situation, a forced switch-over to the standby causes only very recent changes to be lost.
- The "critical" alarm is raised if the failure continues for a longer period of time, to indicate that the standby does not have any valid database content. A switchover in such a situation would make the standby (if working) clean up any leftovers from the earlier local standby copy, and start with an empty database.
Note that a minor alarm might just be the initial indication of a bigger problem in the network element (NE) which can lead to raising the severity of the alarm to "critical".

If the alarm is raised, the standby database is not in synchronization with the active database. This might lead to a degraded redundancy of the system, which is also shown through the persistent states and resource states of the watchdog and resource controller. As a consequence of the alarm, the system tries to re-establish the database synchronization.
The following severity indicators are supported by this alarm:
"Warning": database synchronization weakens for more than 5 seconds, special observation is activated.
"Minor": fsdbInSyncLimit (value from Configuration Directory) reached, standby database is OutOfSync, synchronization repair mechanisms are activated.
"Critical": fsdbAsyncRepStandaloneLimit (value from Configuration Directory) reached, standby database unavailable, database recovery (from scratch) is activated.
In case of the "minor" alarm, a short break, the re-synchronization of the standby can be done on the available data, and the alarm can disappear very quickly. In this case only the Postgres Server is restarted without a copy of the DB from the active side.
In the "critical" case, it requires some longer time. The standby is restarted from scratch with the initial copy of the DB from the active side.

There is an unexpectedly high number of connections from the database applications. The maximum number of connections is limited by a system defined default value in the database configuration. Applications can connect to the DB as long as the number of connections has not reached the maximum. If the maximum is reached, this alarm is raised with severity "warning".
Additionally, a threshold value of remaining number of free available connections can be set. The threshold defines when this alarm should be raised. The alarm is issued if the specific threshold is crossed, and automatically cleared if the number of available free connections is again higher than the threshold value. The check for the alarm is done for both the single nodes and the cluster nodes.

The alarm is raised when a sufficient number of free spare connections is not available. If the number of spare connections is getting too low, it might cause problems. For example, in switchover situations where new connections are established by the newly activated application, while old connections are still being cleaned up. In other situations, when applications establish new connections, this might lead to unexpected failures in other parts of the network element.

The alarm indicates that the subsystem cannot write the files to the target directory. The most probable cause for this could be that the result directory is approaching saturation. Other reasons, though very unlikely, could be permission issues or storage node(s) failure.

The subsystem raising the alarm will no longer be able to write new files to the target directory. The effect will vary from one subsystem to another. For example, in case of performance management (PM), the server is unable to write the result file to the result directory, and this will end up in losing performance counter information that is collected during this period.

The alarm indicates that a Bidirectional Forwarding Detection (BFD) session between the NE and configured peers is switched from its UP to DOWN state.
This alarm is raised due to the following reasons:
1. The peer network element might be in the DOWN state.
2. The two-way connectivity between the local system and the remote system is not functional.
3. The BFD is not enabled in the peer network element.
The BFD limits the maximum number of alarms raised from a single node to 50. If more than 50 BFD sessions fail in one node simultaneously, BFD can only raise up to 50 alarms.
Raise and cancel alarm principles in such situation are:
1. If a new alarm needs to be raised, and the count of alarm in this node is less than 50, BFD raises the alarm.
2. If a new alarm needs to be raised, and the count of alarm in this node is 50, BFD does not raise the alarm but saves this alarm information into BFD buffer sorted by the timestamp.
3. If BFD cancels one alarm so that the total count of alarms per node drops under 50, it

The peer network element that is under BFD monitoring is unreachable, or in the DOWN state.

One of the cluster management functionality nodes (CMFN) has received cluster management messages of unknown origin.

This alarm is a sign of leakage in the network. It is raised when the cluster management messages are received simultaneously from more than one sender. This may be possible if the cluster management messages, that in turn utilize the multicast messages, are being leaked from another cluster. While the network leakage is a serious problem, the cluster should tolerate this. However, the restart of CMFN might fail during the leakage, and therefore the administrative tasks should be avoided.

This is an informative alarm which indicates that the license state for a particular feature is OFF but the admin state of the feature is still ON.

If the license state for the feature, which is identified by the feature code displayed in the "Identif appl. addl. info" field, is OFF, the application implementing the licensed feature will stop operating.

This is an informative alarm that could be generated for one of the following reasons:
1) The postconfiguration scripts provided by either the user or the subsystem have failed.
2) SCLI scripts could not be executed due to the possible unavailability of the SCLI daemon.
3) fsconfigure --save command failed during the postconfiguration operation. This command is executed to save the configuration changes provided by the user/subsystem.

Based on the reason for which the alarm is raised, you may notice the following effects:
1) Inconsistencies in Configuration Directory or in the state of the Recovery Groups due to the failure in one of the .sh or the .scli script.
2) If the SCLI daemon is not running, the .scli scripts provided by the user/subsystem will not be executed. As a result, the configuration changes expected as part of the .scli scripts will not take place.
3) The added configuration changes will persist only until the next reboot if "fsconfigure --save" command has failed.
In all cases, any configuration changes made
This alarm indicates that the subsystem could not The affected subsystem might be running without
create certificates in the /etc/certs/<CertMan certificates (plain-text) or may shutdown.
domain>/ directory.
Note: If the subsystem used certificates for
SSL/TLS connections, it could be either disabled,
This alarm indicates that the Remote User This alarm indicates that the RUIM TLS
or functioning in a plain-text mode.
Information Management (RUIM) Transport Layer connection was unsuccessful resulting in
Security (TLS) connection to the external NetAct replication and authentication failure for RUIM
LDAP server has failed. users.
This alarm is raised due to the following reasons:
1. The certificate is missing or rejected by the external NetAct LDAP server.
2. The external NetAct LDAP server is not supporting TLS connections, or the connections to it cannot be established for any other reason (see also the alarm 70268).
Note: When an TLS connection is attempted but the connection is not made because of a connection problem which is not specific to TLS (for example, wrong IP address), both the alarms 70268 and 70358 are raised.

Note: The user is still able to login by changing RUIM configuration to use "PLAIN_TEXT" mode.


This alarm indicates that configuration under cluster Configuration Directory Server (LDAP) has been modified but the modifications were not saved. In case of a cluster reboot, all the configuration changes since the configuration was last saved will be lost.
The alarm is raised after a predefined period of time has passed since the first configuration change. Note that any kind of change in the configuration data - even if changed immediately back to the original value - is considered as a change from the viewpoint of this alarm.

This is an informative alarm. It can be safely ignored if the configuration changes do not need to be saved.


This alarm indicates that the operator has performed a restore of the alarm system to a previous backup version.

Alarm system will clear all active alarms from backup version.


This alarm with warning severity indicates that the number of routes added in the local forwarding table of a node has exceeded the maximum number of routes that are supported in the node.
If more routes are added to this forwarding table, the alarm is raised again with major severity, indicating that one or more routes could not be added in the table of the node. This alarm is raised because the capacity of the forwarding table has been exceeded.

Exceeding the number of supported routes may not cause any immediate effects, but it can lead to degradation of the performance and stability of the system.
At some point, route addition may fail, triggering a more severe alarm which leads to an unknown state of the node, and inconsistent behavior of the system.


The local clock is not accurately synchronized with the Network Time Protocol (NTP) time sources. This alarm is raised because the time difference between local and NTP sources is longer than one second.

When this alarm is raised, the system time is not in sync with the NTP server and is noticeable by the operator. The system cannot be used for distributed applications that require correctly synchronized network time (such as IPSEC-VPN establishment). It makes system logs unreliable. Incorrect timestamps are done by the system leading to security vulnerabilities, wrong auditing, accounting, authentication and incorrect functioning of the network file system (NFS). An inaccurate system clock also affects the operator billing system.


The ntpd application running on the system is not able to sync time with the forwarder/peer ntpd because of any one of the following reason:
1. Lost connection with the forwarder.
2. The ntpd application is rejecting the forwarder response because the forwarder is considered not suitable for synchronization at that moment. Such forwarders would be declared as unreachable.

Lost connectivity might lead to the system clock being out of sync and system local time could be incorrect.


This is an informative alarm which indicates that the network element (NE) is set into a state where it only accepts test licenses used inside NOKIA laboratories, but not commercial licenses.

If the test-license state is set to 'enabled' in the configuration directory, then NOKIA internal test licenses can be installed into the NE. Commercial licenses cannot be installed anymore. The license-management server will be automatically restarted when the test-license state is changed using the unsupported vendor mode SCLI command "set license test-license state <state>" so that it takes the changed state into use.


This alarm indicates that an X.509 digital certificate configured in the network element is about to expire or has expired.
The certificate can either be a certification authority (CA) certificate or an end entity (EE) certificate. The "DTE" (Days To Expire) value present in the "Appl. addl. info" of the alarm indicates the number of days remaining before the certificate expires. The severity of the alarm increases as the number of days remaining for the certificate to expire decreases.
The severity of the alarm depends on the days remaining to expire.
Days remaining to expire in the range of 8 - 30 days : MINOR
Days remaining to expire in the range of 4 - 7 days : MAJOR
Days remaining to expire is less than 4 days : CRITICAL

If a CA certificate (trust anchor or intermediate CA certificate) expires, client services cannot establish SSL/TLS connections with external server applications since they cannot authenticate received server certificate. If an EE certificate or intermediate CA certificate used by server services expires, peer client applications might not establish SSL/TLS connections with these server applications in the network element (NE). If the domain of the certificate is not "default", it means the expiry of the certificate will impact only the specific service owning this domain. For example, if domain is "ruim", then the expiry will impact only RUIM service. If the domain of the certificate is "default", then the expiry could impact one or more services using certificate from "default" domain. These are services which do not have own domain, but depend on "default" domain for certificate(s).
Note that none of the client services will be impacted due to certificate expiry if the NE wide certificate validation policy is set to ignore expiration. Verify the current policy by executing the SCLI command "show security ...".


This alarm indicates that the automatic key update request (KUR) operation has failed. The automatic KUR operation is performed by certificate-management framework only if the network element (NE) configuration supports it.
This operation is intended to obtain the updated (renewed) X.509 digitally signed end entity (EE) certificate from the Certificate Management Protocol (CMP) server, install it into corresponding NE service domain and save the current NE configuration as the start-up configuration (snapshot).
In case of generic system related failures, the "Managed object" of this alarm contains "fsFragmentId=CertMan", and the "Application Additional Information fields" contains the reason for failure of automatic KUR operation along with the list of NE service domains for which it failed.
In case of domain specific related failures, the "Managed object" of this alarm contains "fsFragmentId=CertManEECert, fsCertManDomain=<domain>", and the <domain> "Application Additional Information fields"

The EE certificate(s) of the applicable NE service domain(s) were not updated (renewed) automatically. The corresponding NE services might not be able to establish SSL/TLS communication after the expiry of their certificates, if they are not updated. The alarm "70382 - CERTIFICATE ABOUT TO EXPIRE OR EXPIRED" for EE certificate(s) will still remain active in the NE for the corresponding service domain(s).
If the domain of the certificate is "default", then the EE certificate expiration could impact one or more applications/services using certificate(s) from "default" domain. These are services which depend on "default" domain for certificate(s).
Note that none of the services will be impacted due to certificate expiration if the NE wide (global) certificate validation policy is set to ignore expiration. Verify the current policy by executing the SCLI command "show security cert validation-policy".


This is an informative alarm which indicates that the automatic key update request (KUR) operation has been completed successfully. The automatic KUR operation is performed by the certificate-management framework, only if the network element (NE) configuration supports it.
As part of this operation, the updated (renewed) X.509 digitally signed end entity (EE) certificates have been obtained from the Certificate Management Protocol (CMP) server successfully. Subsequently, those have been successfully installed into the corresponding NE service domain(s).
The alarm also indicates that the current NE configuration has been saved as the start-up configuration (snapshot) successfully.
The "Application Additional Information fields" of

The EE certificate(s) of the respective NE service domains have been updated (renewed). The updated EE certificates are taken into use, that is the corresponding NE services start to use the updated certificates for SSL/TLS communication.
The current NE configuration has been saved as the start-up configuration (snapshot) thereby, also saving the unsaved (unintended) configuration changes (if any).
As part of the automatic KUR operation, the root certification authority (CA) certificate or the trust anchor is never retrieved for an NE service domain. However, intermediate CA certificates might be retrieved, but their installation will be ignored if they are identical to the already installed one(s). The updated certificate(s) will obey the NE global validation policy.


The alarm indicates that machine/VM/node is running with insufficient amount of memory compared to required amount defined by application running on it. If this is physical machine then the amount of physical RAM installed is insufficient. If this virtual machine then it was created with insufficient virtual RAM attached to it.

The node which raised the alarm might not reach full capacity.
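
The severity rule stated for alarm 70382 (CERTIFICATE ABOUT TO EXPIRE OR EXPIRED) can be sketched as a small lookup on the "DTE" value. This is only an illustration of the three ranges quoted in the alarm description; the function name is hypothetical, and returning no severity above 30 days is an assumption, since the text does not say what happens outside the listed ranges.

```python
from typing import Optional


def expiry_alarm_severity(days_to_expire: int) -> Optional[str]:
    """Map the DTE (Days To Expire) value to the severity ranges listed for alarm 70382.

    Hypothetical helper: only the three ranges come from the alarm description.
    """
    if days_to_expire < 4:
        # "less than 4 days" also covers an already expired certificate
        return "CRITICAL"
    if days_to_expire <= 7:
        # 4 - 7 days
        return "MAJOR"
    if days_to_expire <= 30:
        # 8 - 30 days
        return "MINOR"
    # Assumption: above 30 days no severity applies
    return None


if __name__ == "__main__":
    for dte in (45, 30, 8, 7, 4, 3, 0):
        print(dte, expiry_alarm_severity(dte))
```

The boundaries are inclusive on both ends of each quoted range, so a certificate with exactly 7 days left is MAJOR and one with exactly 8 days left is MINOR.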
The alarm indicates that a machine/VM/node is running with insufficient amount of CPUs required by the application running on it. If this is a physical machine then the amount of physical CPUs/cores installed is insufficient. If this is a virtual machine then it was created with an insufficient amount of virtual CPUs attached to it.

The node which raised the alarm might not reach full capacity.


The supervision in the operating system has detected failure situation of the supervised storage service object.
If an alarm is raised, only when the status recovered, the alarm will be cancelled.

Storage related service or process in fault status might cause abnormal behavior of storage resource usage. This local storage agent can help to trace such issue quickly.


The supervision in a storage manager located in the management node has detected a failure of the supervised local storage agent object. The alarm raising/clearing from different management nodes to the local storage agent are independent.

The supervised local storage agent object might not work properly and the storage service objects supervised by local storage agent might not be supervised properly.


This alarm indicates that the Packet Loss Rate (PLR) value exceeds the threshold defined for a Two Way Active Measurement Protocol (TWAMP) Sender session.
This alarm is raised due to the following reasons:
1. The peer network element might be in the DOWN state.
2. The network connection between the local system and the remote system is unstable.
3. The TWAMP session is not enabled in the peer network element.
This alarm is raised per TWAMP Sender session. The actual PLR value is calculated by the end of per-measurement period, usually 15 minutes, on every TWAMP Sender. If the actual PLR value is higher than the PLR threshold, an alarm is raised.
The alarm is cancelled automatically in two situations:
1. The TWAMP Sender session is deleted or stopped.
2. The actual PLR value is lower than PLR threshold in two consecutive measurement periods.
Modification of the PLR threshold is effective from the next actual PLR calculation point.

The network connection quality that is monitored by a TWAMP session is lower than expected. A higher packet loss rate can be expected.


This alarm indicates that the average Round Trip Time (RTT) has exceeded threshold defined for a certain Two Way Active Measurement Protocol (TWAMP) Sender session.
Parameter packet-frequency, which is used for creating TWAMP Sender session, is the determinant factor for calculating a maximum RTT (max-RTT) value:
max-RTT = 1000/packet-frequency.
RTT value has the following constraints:
1. A message with the actual RTT value exceeding max-RTT is treated as lost one.
2. It is meaningless to set an RTT threshold larger than max-RTT.
This alarm is raised due to the following reason:
The network connection between the local system and the remote system is unstable.
This alarm is raised for an individual TWAMP Sender session. Actual average RTT value is calculated by the end of per measurement period, usually 1 minute, on every TWAMP Sender. If actual average RTT value is higher than RTT threshold and lower than max-RTT, the alarm is raised.
The alarm is cancelled automatically in two situations:
1. The TWAMP Sender session is deleted or stopped.
2. The actual RTT value is lower than the RTT threshold in five consecutive measurement periods.
Modification to RTT threshold will take effect at the next actual RTT calculation point.

The network connection quality that is monitored by a TWAMP session is lower than expected. Higher average round trip time can be expected.


This alarm is raised when a system has observed an issue in yaml file during system installation. The alarm is raised when a mismatch is detected between interface names defined in ip-info and interface-info sections for a specific node.

Some interfaces are missing or they are unnecessary and the network topology is not deployed as the user expected.


This alarm indicates that an IPv6 address is not unique within a network interface, meaning that there are multiple devices defined with same IPv6 address in this network.
Duplicate Address Detection (DAD, presented in IETF RFC 2461) is a mechanism used by a node to determine if an address it wishes to use is already in use by another node.
This alarm is raised when system has detected duplication for an IPv6 address at one network interface based on the Neighbor Advertisement Messages. Once address duplication from network is detected, the system sends a Neighbor Solicitation Message periodically toward the network link area to detect the address duplication state. Once duplication is no longer detected over one period (usually 10 seconds), then the alarm is cancelled. The use of Neighbor Solicitation messages for Duplicate Address Detection is specified in IETF RFC 2462.
Alarm will be cancelled automatically in the following situations:
1. The duplicated address is deleted from the local interface.
2. Duplication is not detected in the network anymore.

The IPv6 address over this interface is not unique within the network and this IPv6 address can not work normally over this interface.


Startup snapshot of VNF configuration data in persistent storage has been corrupted.

Configuration Management validate VNF configuration data files at startup phase to ensure that underlying system has not corrupted configuration files. If corruption has been detected, the VNF startup will be halted and this alarm is raised.


The alarm is raised when the access key for the automated remote access account could not be installed in the system commissioning or when the access key is not valid, that is, authentication for the remote access account fails.
The reason why the key was not installed can be:
a) The network connection to the host environment key repository was not working during the commissioning time.
b) The needed access key was not defined correctly into the host environment.
c) The host service providing the access key was not responding to the guest requests.
The reason why the key is not valid can be:
a) The key has been modified in the host environment but not in the guest.
b) The key has been modified in the guest environment but not in the host.
c) The needed access key was not defined correctly into the host environment.

When the access key for the automated remote access account is not installed correctly into the system, or is invalid, the account cannot be used for connecting to the guest. That means that entities running in the host environment and needing SSH connection to the guest by using the automated remote account will not be able to have access to the guest. An example of such an entity is the Nokia Upgrade Tool used for upgrading the guest software.


The memory reserved for the database (which is only for DB memory use that has configuration limit from the total data memory) is getting full. If the memory is almost full, an immediate action is required from the user. There are three fillRatioAlarm limits that can be configured via the configuration directory by the database application: minor, major, and critical. The corresponding severity alarm is raised when the memory usage of the database has reached one of these configured limits.

The memory database is still fully operational. If the database fills up completely, its services cannot be used anymore.


The file system of the volume has become unavailable, and the system could not recover from the fault automatically. The system fsck tool execution has failed and manual actions are required in order to bring the volume back to use.
Some possible reasons that might lead to this situation:
1. Host volume is unavailable.
2. Host disk is removed.
3. Guest volume is unavailable due to wrong use of some commands by the user, such as dd command.

Data cannot be written and read on the volume anymore.
The impact on the system depends on the impacted volume. Severe disturbances to system operation can be expected.


The Precision Time Protocol (PTP) is a protocol used to synchronize clocks throughout a computer network. It was defined in IEEE 1588-2002 (PTP Version 1)/IEEE 1588-2008 (PTP Version 2) standard, officially entitled "IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems".
This alarm is used to indicate that the communication is lost between PTP slave and PTP master.

This alarm is raised due to the following reasons:

1. The PTP master device is in the DOWN state.

When this alarm is reported the PTP service continuously tries to establish communication with master and it will not affect other services if the master recovers in certain time (about 1h). If master is not recovered automatically, the application functions relying on precise time may be impacted as system internal clock is not in synchronization with external reference clock.
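
The raise and cancel rules of the TWAMP RTT alarm described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names are hypothetical, and the units (packet-frequency in packets per second, RTT values in milliseconds) are an assumption suggested by the constant 1000 in the max-RTT formula.

```python
from typing import Sequence


def max_rtt(packet_frequency: float) -> float:
    """max-RTT = 1000 / packet-frequency (units assumed: ms and packets per second)."""
    return 1000.0 / packet_frequency


def rtt_alarm_raised(avg_rtt: float, rtt_threshold: float, packet_frequency: float) -> bool:
    """Raise when the average RTT is higher than the threshold and lower than max-RTT."""
    return rtt_threshold < avg_rtt < max_rtt(packet_frequency)


def alarm_cancelled(recent_values: Sequence[float], threshold: float, periods: int) -> bool:
    """Cancel when the last `periods` consecutive measurements are all below the threshold.

    The text gives five consecutive periods for RTT and two for PLR.
    """
    last = list(recent_values)[-periods:]
    return len(last) == periods and all(value < threshold for value in last)


if __name__ == "__main__":
    print(max_rtt(10))                                    # max-RTT for packet-frequency 10
    print(rtt_alarm_raised(80.0, 50.0, 10))               # above threshold, below max-RTT
    print(alarm_cancelled([60, 40, 40, 40, 40, 40], 50, periods=5))
```

Because a message whose RTT exceeds max-RTT is treated as lost, an RTT threshold above `max_rtt(packet_frequency)` can never be crossed, which is why the text calls such a threshold meaningless.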
This alarm indicates that a file containing alarm observation information is missing some amount of entries that should be present there.

The active alarm status of the system is potentially not up-to-date. Some alarms might be active unnecessarily and some alarm observations might not have reached alarm system properly and thus not activated.
The Precision Time Protocol (PTP) is a protocol used to synchronize clocks throughout a computer network. It was defined in IEEE 1588-2002 (PTP Version 1)/IEEE 1588-2008 (PTP Version 2) standard, officially entitled "IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems".
This alarm is used to indicate that the sync reference source validation of PTP master.
This alarm is raised when PTP master clock status is detected DISABLED or NOK.
The PTP master clock status shall be determined based on following criteria:
1) A reply for a Unicast request for Announce, Sync, Delay_resp messages was received with a granted rate, equal to requested message one.
2) There is at least one Announce message received per 3 expected Announce messages.
3) The Clock Class attribute announced by the PTP master is lower than the Clock Class attribute of the PTP slave and accepted as per the required accepted clock classes. Only clock classes 6, 7, 13, 14, 52, 58, 135, 140, 145, 187, 193, 248 are accepted.
OK: The PTP master clock status is OK, when all the criteria from 1) to 3) are fulfilled.
DISABLED: The PTP master clock status is DISABLED, when any of the criteria 1), OR 2) are not fulfilled, AND criterion 3) is fulfilled.
NOK: The PTP master clock status is NOK, when the criterion 3) is not fulfilled.
The master would be declared as out of service when its clock status is detected DISABLED or NOK.
When the PTP service starts up in the slave clock, the PTP slave requests Announce, Sync, Delay_Resp messages from all configured PTP masters via periodical negotiation request in the Unicast mode. After announce message received from the PTP master, the PTP slave determines the PTP master clock status based on the above criteria. If the PTP master clock status is DISABLED or NOK, this alarm is raised.
Cancellation of the alarm is triggered automatically in the below situation:
1. The PTP master clock status is OK.

When this alarm is reported the PTP service continuously tries to detect the master clock status. The PTP service does not affect other services if the master clock recovers in the specified period of time (about one hour).
If the master clock is not recovered automatically, the application functions relying on the precise time might be impacted as the system internal clock is not in synchronization with the external reference clock.


This alarm indicates that the Certificate Revocation List (CRL) update has failed for one of the Certificate Authorities (CA). The CRL is a time-stamped list identifying revoked certificates that are signed by a CA or a CRL issuer and made freely available in a public repository. The CA maintains and publishes the CRL periodically, which contains the serial numbers of certificates that have been issued by the CA and have been revoked along with the date and time the revocation went into effect.
The CRL is downloaded by the Network element (NE) from the CA's distribution point. The CRL distribution point can be either manually configured or extracted from installed certificate extensions. The download for a CRL file is triggered based on configured periodicity, planned next_update of the CRL file by the issuer, manually triggered through the management interface or even during IR, KUR or explicitly after installation of the certificate. If the download fails for next_update then operator tries to manually download, as even during periodic update failure the CRL file is valid.
The CRL is verified for authentication and stored in the file system.
The CRL file is later used by NE specific services or the Certificate Management Framework (CertMan) to verify the revocation of a certificate issued by the respective CA and to take necessary actions. The main intent of the CRL file is to validate the certificate against revocation during the secure communication (IPSec and TLS) authentication phase.
Note: If the CRL update fails, services or the CertMan framework continue to use the existing CRL (if exists) to validate the certificates for revocation during secure communication.

Effect of the alarm varies based on the existence of a previously fetched CRL.

a) Previously fetched CRL exists in the file system

Services using the secure communication protocols (TLS, IPSEC and so on) and doing certificate revocation verification check will continue to use the existing CRL which might not have the latest information about the revoked certificates, this means, services might either continue to communicate with an insecure peer, which might have got revoked recently in this update cycle.
Services would continue to break the connection, if the certificate is present in existing and available CRL, as the certificate once revoked is not removed from the CRL file, unless the certificate expiry happens.
The CertMan Framework continues to use available CRL in the file system, to validate the certificates, If the certificate is a part of the existing CRL, CertMan Framework continues to report an error and the revocation alarm for already installed certificates. If a certificate is not a part of the existing CRL then no errors or alarms are reported.

b) The CRL file does not exist in the file system

Services using the secure communication protocols (TLS, IPSEC and so on) and doing certificate revocation check during peer authentication will not be able to do so. If services are using stricter policy for secure communication it will not establish that policy without the CRL file being present. The CertMan framework is not able to validate installed certificates for revocation, due to unavailability of CRL files.


This alarm indicates that an X.509 digital certificate configured in the network element is revoked. The certificate can either be an Intermediate certification authority (CA) certificate or an end entity (EE) certificate.
A certificate is revoked if, for example, it is discovered that its certificate authority (CA) had improperly issued a certificate, or if a private-key is thought to have been compromised. Certificates might also be revoked for failure of the identified entity to adhere to policy requirements, such as publication of false documents, mis-representation of software behavior, or violation of any other policy specified by the CA operator or its customer.
The most common reason for revocation is that the user is no longer in sole possession of the private key.
Note: Generally, trust anchor certificate (Root CA) is not revoked, because it is self-signed and therefore there is no trusted mechanism by which to verify a CRL. If the trust anchor certificate is compromised, it has to be manually removed from the trusted store and entire PKI has to be re-deployed.

If a CA certificate is revoked, client services cannot establish SSL/TLS connections with external server applications since they cannot authenticate the received server certificate. If an EE certificate or an intermediate CA certificate used by server services is revoked, peer client applications might not establish SSL/TLS connections with these server applications in the network element (NE).
If the domain of the certificate is not "default", and the certificate is revoked then it impacts only the specific service owning this domain. For example, if domain is "ruim", and then the certificate is revoked, it will impact only the RUIM service. If the domain of the certificate is "default", then the certificate is revoked. It could impact one or more services using certificate from the "default" domain. These are services which do not have own domain, but depend on "default" domain for certificates.


The system contains a daemon process that is responsible for creating the Internet Protocol (IP) related configuration - e.g. the Network interfaces and addresses - based on the data stored in the system lightweight directory access protocol (LDAP) directories.
Note that typically all or most of these configuration data are created during the system commissioning. The daemon reads the LDAP directories and creates a configuration typically during the start up phase of the system, e.g. during system reboot.
However, the configuration may also be deliberately changed in a live network element as part of maintenance operations, for example, during an upgrade of the system. In this case the alarm may be the result of a SW bug since for the alarm to be activated, the LDAP's failsafe (LDAP Validator) which verifies that only valid network configurations are written to the LDAP would have to be bypassed / malfunctioning. The daemon creating the configuration raises this alarm if it reads semantically illegal IP configuration data from the LDAP directories and cannot create such a configuration. Examples of this kind of data would be a non-existing interface.

The daemon process ignores the illegal data and attempts to continue the configuration using other valid data. The effect and severity of the encountered error condition depends on the actual unsuccessful configuration action. Most likely these kinds of errors can be experienced when configuring the external connectivity (that is, external interfaces and addresses) of the network element. Especially in these situations the network element would lack site connectivity and would not be able to fully provide the intended services.


One or more peers in the storage pool are inaccessible to the rest.

Storage system performance is degraded. The services which use the storage system are affected.
If all storage nodes become unavailable, the services which use the storage system fail.


A local disc volume on a storage node is corrupted.

Storage pool performance is degraded. Redundancy of the storage pool is lost.
If all local disc volumes of a storage volume are corrupted, the volume becomes inaccessible and data is lost.
This affects all services which use the storage pool.


The redundant volumes of a storage pool volume are out of sync.

This fault affects the integrity and availability of files. During split-brain, the affected files are inaccessible. Some data written before the detection of the split-brain is unavailable.
The split-brain can be a consequence of a storage node separation. 'CLUSTER PEER IN STORAGE POOL UNAVAILABLE' alarm is raised when this occurs.
This affects all services which use the affected files.


Link down for fabric slave network interface(s) for the nodes that the failure is reported. Each node may have one (non-resilient configuration) or two (resilient configuration) fabric slave interfaces.
A fabric path failure will be reported simultaneously by all nodes, that are fabric members in same cluster, and their communication paths are broken.

A fabric slave interface is used for communication with the rest of the nodes of the same cluster that are configured as fabric members. When connectivity among fabric slave interfaces on separate nodes is lost, then no data traffic exchange is possible over those paths. If a resilient fabric network (2x slave interfaces per node) is configured, then failure of both fabric paths results to an unusable fabric network for the affected node.


During in-service upgrade, an Upgrade Virtual machine (UVM) runs as a fileserver. If the UVM does not respond, it is considered unavailable and the alarm is raised since the upgrade procedure cannot continue.

The in-service upgrade procedure cannot continue if the alarm is raised.


This alarm indicates that Centralized License Server (CLS) is unreachable due to network problems or CLS problem

During the time period that CLS is unreachable no requests such as capacity or on off reservations can be served


This alarm indicates that local CLS client could not determine network element type

No license related requests can be served from local CLS client
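
The OK / DISABLED / NOK decision described for the PTP master clock status alarm can be sketched as a small function over the three criteria. This is an illustrative reading of the criteria text, not the actual PTP service logic: the function and parameter names are hypothetical, and criterion 3) is interpreted here as "the announced class is numerically lower than the slave's class and is one of the accepted classes".

```python
# Accepted clock classes as listed in the alarm description.
ACCEPTED_CLOCK_CLASSES = frozenset({6, 7, 13, 14, 52, 58, 135, 140, 145, 187, 193, 248})


def ptp_master_clock_status(
    unicast_rate_granted: bool,
    announces_in_last_three_expected: int,
    master_clock_class: int,
    slave_clock_class: int,
) -> str:
    """Return "OK", "DISABLED" or "NOK" following criteria 1) to 3)."""
    # 1) Unicast request answered with the requested rate granted
    criterion_1 = unicast_rate_granted
    # 2) At least one Announce message received per 3 expected
    criterion_2 = announces_in_last_three_expected >= 1
    # 3) Master class lower than the slave class and within the accepted set (interpretation)
    criterion_3 = (
        master_clock_class < slave_clock_class
        and master_clock_class in ACCEPTED_CLOCK_CLASSES
    )

    if not criterion_3:
        return "NOK"
    if criterion_1 and criterion_2:
        return "OK"
    return "DISABLED"


if __name__ == "__main__":
    print(ptp_master_clock_status(True, 1, 6, 248))
    print(ptp_master_clock_status(False, 1, 6, 248))
    print(ptp_master_clock_status(True, 1, 8, 248))
```

Under this reading the alarm is raised for any result other than "OK" and is cancelled once the status returns to "OK".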
This alarm indicates that the local CLS client failed to reserve the Client ID from the Centralized License Server (CLS).

No Feature reservation actions can be performed towards CLS.

The dynamic configuration activation has failed for at least one target node.
Probable causes:
1. A node joined the cluster but its node-type is unknown, and the dynamic configuration activation for this node has failed.
2. A node joined the cluster, the dynamic configuration activation started and, immediately after, the node has disjoined. As a result the configuration is not fully activated.

Failure of the dynamic configuration activation for at least one target node is the result of incorrect setting of the relevant parameters in the affected nodes.


This alarm indicates that initial copying of the user/group account files (passwd, group, shadow, gshadow) from /etc directory to the shared storage (/mnt/mstate/_global/etc) has failed, or that the synchronization of user/group account files (passwd, group, shadow, gshadow) between the shared storage (/mnt/mstate/_global/etc) and /etc failed, on the specific node reported.

The fault indicates that OS user/group account files (passwd, group, shadow, gshadow) cannot be synchronized to all nodes and cannot be stored to the shared storage. The fault by itself is not critical; it means that other nodes have not been updated with the last OS user administration activity. Particularly password change of OS user has not been synchronized with other nodes.


If the logging node does not respond, it is considered unavailable and the alarm is raised since the logging procedure cannot continue.
Probable cause: Underlying Resources Unavailable
Event type: Processing error
Default severity: Major

The central logging node which receives all logs from all nodes is not available. The services which use the logging node are affected.


The configuration of the simple network management protocol (SNMP) mediator contains values that are unacceptable.

The invalid part of the configuration is ignored. This causes partial loss of functionality. The SNMP traps may be lost.

Meaning: SNMP Mediator has sent an SNMP request to an SNMP agent but it has not received a response.
Example:
1. A filter condition has been added for the authentication failure (.1.3.6.1.6.3.1.1.5.5) trap. Thus, the following entry can be viewed by using the SCLI command:
> show config fsClusterId=ClusterRoot
fsFragmentId=SNMP fssnmpMediatorName=1
fssnmpAttributeType=V2traps
fssnmpV2TrapId=.1.3.6.1.6.3.1.1.5.5
fssnmpAttributeType=V2traps,
fssnmpMediatorName=1,
fsFragmentId=SNMP,
fsClusterId=ClusterRoot
The filter condition is defined by the attribute fssnmpFilterCondition. fssnmpFilterCondition may have, for example, the value (.1.3.6.1.2.1.1.1.0=*Linux*). See RFC 2254 for more information about the filter syntax.
2. SNMP Mediator receives an authenticationFailure trap that does not contain the value of variable .1.3.6.1.2.1.1.1.0.
3. SNMP Mediator queries the value of .1.3.6.1.2.1.1.1.0 from the SNMP agent, but does not receive a response.
Effect: SNMP Mediator is not able to handle the trap correctly, because it is not able to query or modify the variables in the SNMP agent.

Meaning: SNMP Mediator has received an SNMP trap that it is not aware of. The trap is unknown to the SNMP Mediator, if -
1) the IP address of the SNMP agent that sends the trap is missing from the SNMP Mediator's configuration, or
2) the OID (object identifier) of the trap is unknown to the SNMP Mediator.
Effect:
1) Unknown traps may contain information that could be useful.
2) Unnecessary traps waste network capacity.

Meaning: The alarm is triggered by the authenticationFailure SNMP trap. The SNMP agents running in the Ethernet switches (switch blades or modules) generate this trap, if they receive SNMP requests that are not properly authenticated.
SNMPv1 and SNMPv2c use community names to provide security, authentication, and access control. Community names qualify the following criterion:
1. Each community name has an associated access mode (either read-only or read-write).
2. Each community name has an associated IPv4 subnet/mask.
3. Each community name may be enabled or disabled.
The SNMP agent rejects and discards any SNMP request that does not contain a community name that matches one of the configured community names, or which does not contain an IPv4 source address that is allowed by the IPv4 address and IPv4 mask of the corresponding community.
More specifically, the following authentication failures are detected for SNMP requests:
1. Each getRequest, getNextRequest, or getBulkRequest that does not include one of the enabled, read-only or read-write community names.
2. Each setRequest that does not include one of the enabled, read-write community names.
3. Each SNMP message that is sent from an invalid IPv4 subnet.
Effect: The alarm may indicate that an SNMP tool which is used for sending SNMP requests to the SNMP agent, is configured incorrectly. The alarm may also indicate malicious activity, which means that an unauthorized user is trying to obtain information by sending SNMP requests.

Meaning: The alarm is triggered by a coldStart Simple Network Management Protocol (SNMP) trap sent by an Ethernet switch. The coldStart trap signifies that the sending protocol entity is reinitializing itself and that management agent's configuration or the protocol entity implementation may have been altered. The warmStart trap signifies that the sending protocol entity is reinitializing itself and that neither the management agent configuration nor the protocol entity implementation is altered.
Effect: The alarm indicates that the switch has restarted and is now reinitializing itself. This is not necessarily an error condition but might be caused by maintenance operations such as powering up a new switch or intentional restarting of a switch after a software or configuration update. In these cases, this alarm can be safely ignored.
However, a spontaneous restart should be considered as an indication of a serious problem even though the system with its redundant Ethernet infrastructure will tolerate it.

Meaning: A linkDown simple network management protocol (SNMP) trap triggers this alarm. It is an indication that an Ethernet switch port changes from up state to down state.
Effect: Once a port (or link) is in down state, it cannot transport any traffic. This is not necessarily an error condition, but this can follow from a maintenance operation such as replacing a cable between two switches, or closing a switch port via a management interface. A link may also go to down state if, for example, the host computer or switch at the other end of the link is shut down, restarted, or removed.
However, a spontaneous state change may indicate a serious failure, although the system will typically tolerate these failures to some extent because of redundant networking infrastructure.
Note that a linkDown SNMP trap is paired with a linkUp SNMP trap that will trigger a cancelling of the alarm. For example, when replacing a computer node, one would first see the raising of this alarm when the replaced unit is shut down and the automatic cancelling of this alarm when the new blade is taken into use.

Meaning: The backup of any of the following has failed because of a fatal error:
- Delivery
- Configuration snapshot
- State Volume
- Database
- Plugin
- File System
Effect: As an effect of this alarm during backup, a backup-iso is not created, or is rendered unusable.

Meaning: A broadcast storm control condition has started within the last 250 milliseconds.
Effect: The switch will drop broadcast frames for a fixed period of 250 milliseconds.
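
The community-name checks described for the authenticationFailure alarm can be sketched in a few lines. This is only an illustration of the three stated criteria (access mode, IPv4 subnet/mask, enabled or disabled); the `Community` class, the `is_authenticated` function, and the sample values in the usage note are hypothetical names chosen for this sketch and are not part of the product or of any SNMP library.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Community:
    """Hypothetical model of one configured community name."""
    name: str
    read_write: bool      # access mode: False = read-only, True = read-write
    subnet: str           # associated IPv4 subnet/mask, e.g. "10.0.0.0/24"
    enabled: bool = True  # a community name may be enabled or disabled


READ_OPS = {"getRequest", "getNextRequest", "getBulkRequest"}


def is_authenticated(communities, name, source_ip, operation):
    """Return True if a request passes the community checks listed above."""
    source = ipaddress.ip_address(source_ip)
    for community in communities:
        # The name must match an enabled community.
        if community.name != name or not community.enabled:
            continue
        # The source address must lie inside the community's subnet.
        if source not in ipaddress.ip_network(community.subnet):
            continue
        # Read operations accept read-only or read-write communities.
        if operation in READ_OPS:
            return True
        # setRequest needs a read-write community.
        if operation == "setRequest" and community.read_write:
            return True
    return False
```

For example, with a read-only community `public` on `10.0.0.0/24`, a `getRequest` from `10.0.0.5` passes, while a `setRequest` with the same community, or any request from outside the subnet, would count as an authentication failure in this sketch.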
Meaning: The platform high availability services (HAS) subsystem cannot reset a faulty node using HWM functionality with Intelligent Platform Management Interface (IPMI).
The operational state of the node is not known as the node still holds and updates the shared resources.
This is a severe platform error that may have the following results:
In ATCA Hardware:
- a double hardware fault
- an IPMI configuration error
- a network partitioning problem
- manual power-off of a complete chassis
In BCN Hardware:
- LMP not responding
As the HAS is unable to determine the state of the node, the active or standby recovery groups (RG) are delaying the switchover until the node is operational again or the node is manually set to isolation state.
Effect: The possible active/standby RGs, which have an active recovery unit (RU) instance running on the failed node, cannot recover from the situation by applying a switchover to another node. The services provided by these RGs are currently down.

Meaning: A physical disk partition of a Distributed Replicated Block Device (DRBD) is broken or is reporting errors.
DRBD is used to replicate data of an application partition between two nodes. The nodes form an active/standby redundancy pair where a standby node can take over in case the active node or application fails.
The identified DRBD partition or logical volume is currently unavailable or functioning poorly.
Effect: The service that the application provides is not impacted if the other node and the DRBD device are still functioning. In this case, the application is however no longer redundant, and recovery from possible forthcoming failures may take longer or may not be possible at all.
The service provided by the application is down if also the other node or partition is not functioning.

Meaning: A secondary Distributed Replicated Block Device (DRBD) does not synchronise or synchronises very slowly with the primary DRBD device.
DRBD is used to replicate data of an application partition between two nodes. The nodes form an active/standby redundancy pair where a standby node can take over in case the active node or application fails. When the two disk images are not identical (for example, following a node reboot) they are synchronised by copying the changed data from the primary DRBD to the secondary DRBD.
Currently synchronisation to the identified secondary DRBD partition or logical volume is not proceeding or proceeds extremely slowly.
Effect: If the node running the primary DRBD (and the application) is functioning, then there is no immediate impact to the service that the application provides. The identified DRBD partition or logical volume will, however, not be currently available as a backup resource. Any failure in the node that currently runs the application causes a long or permanent service interruption. The service is down if the node with the primary DRBD is not functioning.
This alarm is just for information if the severity is "Minor". However, actions are to be done if the severity is raised to "Major", as this indicates that the situation has lasted longer than an hour.
Notice also that a "Major" alarm situation is expected if some maintenance operation (for example, disk replacement procedure) requires a complete or a large disk re-synchronization. Depending on the disk size, a full disk re-synchronization can take several hours to complete.

Meaning: New configuration on Switch is not successfully applied when initiated by SCLI or DHCP lease time expiry.
Effect: Failure in applying new configuration on Switch. Switch may not function properly or Switch may be running with old configuration.

Meaning: The internal temperature of the CPU has passed the programmed threshold.
Effect: There is a severe temperature-related problem in the referred component, and the unit may behave unexpectedly.

Meaning: The central processing unit (CPU) utilization has passed the programmed threshold.
The alarm is raised due to below reasons:
1. High amount of traffic
2. Possible looping in switch in multi-chassis environment because of wrong cabling/wrong configuration.
Effect: Unit may become unstable.

Meaning: The image loaded via TFTP (Trivial File Transfer Protocol) has not passed the CRC (cyclic redundancy check) check and has been discarded.
The alarm is raised due to below reasons:
1. The loaded binary image is corrupted and can't be used.
2. The corruption may have happened during the transfer or the original image on the server was already corrupted.
Effect: The switch is accessible only with default configurations. The application-related configurations will not be applied.

Meaning: The system memory utilization has passed the programmed threshold.
The alarm is raised because of the following reasons:
1. High amount of traffic
2. Possible looping in switch in multi-chassis environment because of wrong cabling/wrong configuration
Effect: System is running out of memory which may cause the system to behave erratically.

Meaning: One of the physical ports in the switch has a problem, which may severely affect system performance.
The switch port error alarm is raised for the following reasons on a (physical) port of the switch:
portErrorsExceeded: the level of errors on the port has passed the programmed threshold. Compared (as a percentage) to the total amount of packets over a period of time.
portsBroadcastExceeded: the level of broadcast-limit has passed the programmed threshold.
portsCRCErrExceeded: the level of CRC (cyclic redundancy check) errors has passed the programmed threshold. Compared (as a percentage) to the total amount of packets over a period of time.
portsRuntsExceeded: the level of runts (=broken (too short) packets) has passed the programmed threshold. Compared (as a percentage) to the total amount of packets over a period of time.
Effect: It is an expected behavior of the application to raise this alarm when a switch blade or server is either plugged, unplugged, restarted or invalid switch login.

Meaning: The reported field replaceable unit (FRU) is not present in the system or is in inactive state. Typical FRUs include cards, power supply units, and chassis components.
The alarm may be raised in one of the following scenarios:
1. FRU missing from its expected position.
2. FRU is deactivated.
3. Shelf Manager switchover or restart.
4. Cluster restart.
5. Dynamic configuration scenarios like adding or replacing the blades.
6. Embedded software upgrade or downgrade.
7. Software upgrade or downgrade.
8. Management node restart or switchover.
9. FRU is in M7 state (Communication Lost).
Note: Alarm is raised with minor severity when the FRU is deactivated and in other scenarios the alarm is raised with major severity.
Effect: Refer to the scenarios under Meaning.

Meaning: A Bus error has taken place. The possible error types are:
Front Panel NMI (non-maskable interrupt request) / Diagnostic Interrupt
Bus Timeout
I/O (input/output) channel check NMI
Software NMI
PCI PERR (peripheral component interconnect parity error)
PCI SERR (peripheral component interconnect system error)
EISA (Enhanced Industry Standard Architecture bus) Fail Safe Timeout
Bus Correctable Error
Bus Uncorrectable Error
Fatal NMI
Effect: Unit may exhibit erratic behaviour or decreased functionality.

Meaning: A problem in CPU (central processing unit) functionality has been detected. Possible error types are:
IERR (internal error)
Thermal Trip
FRB1/BIST (fault resilient boot/built-in self-test) failure
FRB2/Hang in POST (power-on self-test) failure (believed to be due or related to a processor failure)
FRB3/Processor Startup/Initialization failure (CPU didn't start)
Configuration Error
SM BIOS (system management basic
Effect: Unit may be out of service.
Meaning: The unit's current has exceeded the programmed threshold.
The supported threshold levels are:
- lnr (Lower Non Recoverable)
- lc (Lower Critical)
- lnc (Lower Non Critical)
- unc (Upper Non Critical)
- uc (Upper Critical)
- unr (Upper Non Recoverable)
Based on the threshold level crossed, the alarm will be raised with a different severity as mapped below:
lnr, unr: The alarm will be raised with severity "Critical".
lc, uc: The alarm will be raised with severity "Major".
lnc, unc: The alarm will be raised with severity "Warning".
Note: For the below mentioned FRUs and their respective IPMC firmware version onwards, this alarm is not applicable and will not be raised:
AMPP2-A 2.0.0
AHUB4-A 03.00.001-009-001-003
ACPI5-A 2.1.31
Effect: The field-replaceable unit (FRU) may be damaged or may stop working. The risk of the FRU damage or misbehavior can be judged based on the severity of the alarm. Severity level warning signifies that the FRU is still working within its specifications despite crossing the detected threshold limit. The higher severity signifies that the operation is out of specifications.

Meaning: This alarm indicates a booting failure. The possible causes or error types for the failure are:
0. No bootable media.
1. N/A - This specific sensor offset value is not applicable for BS2AM-A unit.
2. Boot/configuration server not found.
3. Invalid boot sector.
4. Timeout waiting for user action.
5. Primary bank boot failed.
6. Secondary bank boot failed.
7. Network boot/configuration failed.
8. Boot retry limit exceeded.
Effect: Unit boot-up may be failing.

Meaning: A system firmware error (POST - power-on self-test error) has been detected. The possible causes are:
1. No system memory is physically installed in the system.
2. No usable system memory, because all installed memory has experienced an unrecoverable failure.
3. Unrecoverable hard-disk/ATAPI (Advanced Technology Attachment Packet Interface)/IDE (Integrated Drive Electronics) device failure
4. Unrecoverable system-board failure
5. Unrecoverable hard-disk controller failure
6. Removable boot media not found
7. Firmware (BIOS (basic input/output system)) ROM (read-only memory) corruption detected
8. CPU (central processing unit) voltage mismatch (processors that share the same supply have different voltage requirements)
9. CPU speed matching failure
10. System Firmware Hang
Effect: Unit boot-up may be failing or the unit may be failing in some other respect.

Meaning: The power unit reading has exceeded the programmed threshold.
The supported threshold levels are:
- lnr (Lower Non Recoverable)
- lc (Lower Critical)
- lnc (Lower Non Critical)
- unc (Upper Non Critical)
- uc (Upper Critical)
- unr (Upper Non Recoverable)
Based on the threshold level crossed, the alarm will be raised with a different severity as mapped below:
lnr, unr: The alarm will be raised with severity "Critical".
lc, uc: The alarm will be raised with severity "Major".
lnc, unc: The alarm will be raised with severity "Warning".
The possible causes are:
- power off/power down
- power cycle
- soft power control failure (the unit did not respond to a request to turn on)
- power unit failure detected (other)
Effect: The system is running on backup power supply unit, which is highly risky. In case both the primary and secondary power supplies go down, the system will come to a stop.

Meaning: One of the following is taking place:
a. Pre-boot Password Violation - user password
b. Pre-boot Password Violation attempt - setup password
c. Pre-boot Password Violation - network boot password
d. Out-of-band Access Password Violation
e. Other pre-boot Password Violation
Effect: Platform security may have been compromised.

Meaning: The unit's temperature has exceeded the programmed threshold.
The supported threshold levels are:
- lnr (Lower Non Recoverable)
- lc (Lower Critical)
- lnc (Lower Non Critical)
- unc (Upper Non Critical)
- uc (Upper Critical)
- unr (Upper Non Recoverable)
Based on the threshold level crossed, the alarm will be raised with a different severity as mapped below:
lnr, unr: The alarm will be raised with severity "Critical".
lc, uc: The alarm will be raised with severity "Major".
lnc, unc: The alarm will be raised with severity "Warning".
Inlet: This typically means the ambient temperature is out of the limit.
Outlet: This means the chassis cooling field-replaceable unit was not able to cool the chassis. The reason can be caused by broken fans, or some field-replaceable units (FRUs) consuming more power than expected.
Effect: The chassis which has the faulty cooling field-replaceable unit (FRU) has the risk of getting over heated. The risk of the FRU over heating or FRU malfunction can be judged by the severity of the alarm. Severity level warning signifies that the FRU is still working within its specifications despite crossing the detected threshold limit. The higher severity signifies that the operation is out of specifications.

Meaning: This alarm reports an error in the system memory, discovered through a sensor present in the system. The possible causes are:
a. Correctable ECC (error-correcting code) or other correctable memory error;
b. Uncorrectable ECC or other uncorrectable memory error;
c. Parity error;
d. Memory scrub failed (stuck bit);
e. Memory device disabled; and
f. Correctable ECC or other correctable memory error logging limit reached.
Effect: The FRU may have erratic behavior or decreased functionality.

Meaning: The battery reading has exceeded the programmed threshold.
The supported threshold levels are:
- lnr (Lower Non Recoverable)
- lc (Lower Critical)
- lnc (Lower Non Critical)
- unc (Upper Non Critical)
- uc (Upper Critical)
- unr (Upper Non Recoverable)
Based on the threshold level crossed, the alarm will be raised with a different severity as mapped below:
lnr, unr: The alarm will be raised with severity "Critical".
lc, uc: The alarm will be raised with severity "Major".
lnc, unc: The alarm will be raised with severity "Warning".
This alarm may indicate:
- battery low
- battery missing
- battery failed (other reason)
Effect: Unit may boot up with wrong configuration or date and time.

Meaning: The fan speed has exceeded the programmed threshold.
The supported threshold levels are:
- lnr (Lower Non Recoverable)
- lc (Lower Critical)
- lnc (Lower Non Critical)
- unc (Upper Non Critical)
- uc (Upper Critical)
- unr (Upper Non Recoverable)
Based on the threshold level crossed, the alarm will be raised with a different severity as mapped below:
lnr, unr: The alarm will be raised with severity "Critical".
lc, uc: The alarm will be raised with severity "Major".
lnc, unc: The alarm will be raised with severity "Warning".
Effect: Fan speed out of limit may indicate a mechanical or an electrical problem with the fan, which can affect the cooling performance. The risk associated can be judged based on the severity of the alarm.

Meaning: This alarm indicates that one of the management nodes has detected that the contents of the disk are out of sync with the other node of the same type.
Effect: This alarm is raised during the boot process, if the system notices that the disks of management nodes are not identical. The process automatically puts the booting management node into inert mode and powers it off. The user has to take steps manually to get the disks in sync again.

Meaning: The inserted FRU (field-replaceable unit) does not match with the expected unit based on the target hardware configuration. In case of Lynx build, the alarm is also raised if a unit is inserted in to a slot which is expected to be empty.
FRU is a hardware component that can be removed and replaced on-site. Typical field-replaceable units include cards, power supply units, and chassis components.
Effect: System functionality and performance may be degraded, or the system may not be working at all.

Effect: System functionality and performance may degrade, or the system may not work at all.
Meaning: This alarm is triggered when the system notices that the shelf manager is unavailable, shelf manager may be missing, or is not running in a healthy state. It is also possible that shelf manager is experiencing a connection problem.
If the system is not able to contact the shelf manager after several retries (currently programmed for 35 retries, 1 retry in 1 second), it queries the existence of the shelf manager by pinging the main IP address, and an alarm is raised. The system then tries to switchover from the main IP address to one of the secondary IP addresses. If this also fails, the alarm is raised again.
The health of backup shelf manager is also monitored by normal internet control message protocol (ICMP) pings. If it becomes unavailable,
Meaning: The unit's voltage has exceeded the programmed threshold value (high or low values).
The supported threshold levels are:
- lnr (Lower Non Recoverable)
- lc (Lower Critical)
- lnc (Lower Non Critical)
- unc (Upper Non Critical)
- uc (Upper Critical)
- unr (Upper Non Recoverable)
Based on the threshold level crossed, the alarm will be raised with a different severity as mapped below:
lnr, unr: The alarm will be raised with severity "Critical".
lc, uc: The alarm will be raised with severity "Major".
lnc, unc: The alarm will be raised with severity "Warning".
Effect: The field-replaceable unit (FRU) may be damaged or may stop working. The risk of the FRU damage or misbehavior can be judged based on the severity of the alarm. Severity level warning signifies that the FRU is still working within its specifications despite crossing the detected threshold limit. The higher severity signifies that the operation is out of specifications.

Meaning: This alarm indicates that there has been a configuration error in the Signaling Gateway or SIGTRAN.
Effect: The effect of the alarm depends on the data on the Application Additional Info field.
For example:
If the alarm event raised is AAI_IUA_SET_TRACE_FAILED, it indicates that the Trace Log level setting is failed for ISDN protocol.
For all other alarm events, SIGTRAN Network Manager (SNM) /SIGTRAN Layer Managers (SLMs) will not be able to provide any services. The alarm indicates a shutdown of the entity in question followed by a restart from the High Availability Services (HAS).

Meaning: This alarm indicates that a remote subsystem is out of service.
Effect: The affected subsystem that is referred by the subsystem number and point code in the alarm can no longer receive or send messages. This may lead to a call drop for that particular subsystem.

Meaning: This alarm indicates that either a signaling point (SP) has become congested or one or more of the links towards this point code is congested.
The signaling point can be identified by the point code mentioned in the field "Identifying application additional information".
Effect: If the remote point code is congested, it cannot process the traffic and calls may be dropped. Eventually, links toward the point code can also go down.

Meaning: This alarm indicates that the signaling gateway (SGW) received an invalid message from the network. There may also be configuration or interoperability issues at the peer node or the SGW node.
Effect: The message will not be delivered to the correct user due to the error in the message.
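
The threshold-to-severity mapping that the sensor alarms (current, power unit, temperature, battery, fan speed, voltage) share can be written as a small lookup table. The sketch below only restates the mapping given in the text; the names `SEVERITY_BY_LEVEL` and `severity_for` are hypothetical and chosen for illustration.

```python
# Mapping of crossed threshold level to alarm severity, as listed in the text.
SEVERITY_BY_LEVEL = {
    "lnr": "Critical",  # Lower Non Recoverable
    "unr": "Critical",  # Upper Non Recoverable
    "lc": "Major",      # Lower Critical
    "uc": "Major",      # Upper Critical
    "lnc": "Warning",   # Lower Non Critical
    "unc": "Warning",   # Upper Non Critical
}


def severity_for(level: str) -> str:
    """Return the alarm severity for a crossed threshold level.

    Raises KeyError for a level that is not one of the six supported ones.
    """
    return SEVERITY_BY_LEVEL[level.strip().lower()]
```

For instance, `severity_for("uc")` gives `"Major"` and `severity_for("lnr")` gives `"Critical"`.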
Meaning: This alarm indicates the unavailability of a Link or Route.
AAI_MTP3_LINK_DOWN alarm indicates that the status of a signaling link has changed to unavailable. The link is identified by the LinkId parameter given in the Identifying Application Additional Information (IAAI) fields of the alarm. The severity of AAI_MTP3_LINK_DOWN is Minor.
AAI_MTP3_ROUTE_DOWN alarm indicates that all the links used for the signaling route have become out-of-service. It can also indicate that Signaling Gateway (SGW) has received a Transfer Prohibited (TFP) message from a remote Point Code (PC), which is reachable through an adjacent Signaling Transfer Point (STP). The severity of AAI_MTP3_ROUTE_DOWN is Major.
Effect: Alarm AAI_MTP3_LINK_DOWN indicates that a particular link cannot handle any incoming or outgoing traffic.
Alarm AAI_MTP3_ROUTE_DOWN indicates that a Destination Point Code (DPC) cannot be reached for data transfer.

Meaning: This alarm indicates the unavailability of the self or remote signaling point code (PC) which can be identified in the Application additional information field.
MTP3_EVENT_PC_INACCESSIBLE: This indicates that a signaling point code configured for signaling gateway at Message Transfer Part 3 (MTP3) stack has become inaccessible. This happens when all the links from this self PC are down.
MTP3_EVENT_DPC_INACCESSIBLE: This indicates that the Signaling System7 (SS7) Destination PC (for example, RAN PC), which is identified by the point code in the alarm, has become inaccessible and the SS7 Destination PC can no longer handle the signaling traffic.
Effect: The affected signaling point code cannot handle the traffic which can result in call drops.

Meaning: This alarm indicates that the switch manager is unable to configure the switch.
Effect: Switch configurations are out of sync.

Meaning: A digital signal processing core is found to be faulty for the following possible reasons:
a) The core has crashed.
b) The connection to the core is lost.
c) An internal digital signal processor (DSP) non-fatal error occurred.
d) An internal DSP fatal error occurred.
e) The core did not start up within the specified time, after being unlocked.
f) A general application programming interface (API) error is returned.
Effect: The application image which is running in the core might be faulty or stuck. In practice, this means that the core/CPU might no longer be functioning.

Meaning: A digital signal processor (DSP) core is reported out-of-sync. This happens when the mirroring application detects that the core on the active unit is out-of-sync with the DSP core located on the stand-by unit. When the number of the DSP cores that are out-of-sync exceeds its limit, the configured threshold value for this alarm is raised.
Effect: This alarm is raised when the state of the active DSP core is not replicated to the stand-by DSP. In practice, this means that the fail-over is denied in the case of the number of failed DSP cores exceeding the configured threshold value.

Meaning: This alarm is raised when the operating system indicates failure in the disk interface. The failure may be caused by a physically broken hard drive, solid state disk, incorrect wiring, loose connector, problem with the disk controller, or device driver. In some cases, CPU power or temperature fluctuations, or overheating can cause disk access failures.
The problem can be transient or permanent. Also, the disk I/O overload situations can cause the operating system to indicate transient failure in disk access which may trigger this alarm.
Effect: The node cannot run properly without disk access as reliable data reading and writing is suddenly endangered. The given node might not be functional as the disk drive is a critical resource for the node. The faulty node should be locked as soon as possible.

Meaning: A DSP is found faulty in case any of its cores is found to be faulty.

Effect: When the external synchronization of the BCN is lost, the network performance is potentially degraded.
Below are the possible effects for different offset values of Ethernet link OEM sensor:
0 - No effect
1 - OAM traffic between Packet Timing Unit and management (CLA) node is down
2 - PTP traffic on the reported link is down
3 - SFP failover (redundancy) not possible
4 - PTP synchronization performance will be impacted
5 - PTP synchronization performance will be
Meaning: This alarm indicates that an Ethernet link to a Packet Timing Unit (PTU) of the Box Controller Node (BCN) has failed. The error can be caused by a hardware failure at either end of the reported link, an unplugged cable on the front panel of the PTU (or link peer), or if some program or a user has issued a command to shut down the Ethernet interface on the PTU (or link peer).
The different error types are described below:
0 - [Fabric Link Down]: Fabric Interface Link Down (reserved for future use)
1 - [Base Link Down]: OAM Base Interface Link
Meaning: Blade self test of the hardware unit has failed or is pending.
Effect: Raised alarm depicts the incorrect functionality of the hardware unit.

Meaning: This alarm indicates that an Internet Control Message Protocol (ICMP) monitoring session was switched from its UP to DOWN state.
This alarm is raised because two-way connectivity between the local node and remote node is not functional.
Effect: The peer network element that is under ICMP monitoring is unreachable. If the alarm is not cleared automatically, it might require operator intervention to bring up the network element. The application(s) dependent on the ICMP monitored link might be affected.

Meaning: The alarm indicates that in order to start the cluster, one or more replicated distributed replicated block device(DRBD) disk devices has to be forced active. The alarm is raised when a cluster manager node which previously had a working peer node starts up, and the peer cluster manager node does not come up within the defined maximum wait time.
The reason for the peer cluster manager node failure can, for example, be a disk or some other HW failure, an explicit operator actions, or a software bug.
Effect: Some data may have been lost from the DRBD partitions. During normal cluster start up, both cluster manager nodes come up, and the DRBD drivers can reliably determine which cluster manager node has the most recent updates. DRBD devices are then synchronized from the most recent version ensuring that e.g. no log records are lost.
In the situation where this alarm is raised, the DRBD device(s) are forced to start using the current data copy. This does not present a real problem if the failed cluster manager node went down about the same time as the remaining operational cluster manager node.
If, however, the failed peer cluster manager node had earlier been running alone, it is possible that disk updates have been lost. For example, parts of logs may be missing, the known list of active alarms may be different, and application storage may have lost transactions.

Meaning: During upgrade, activated build rebooted the system several times. And due to the autoreturn feature, the system is booted back to the old build.
Effect: The newly activated delivery is not able to boot properly. Hence, autoreturn to the old working delivery has been automatically done.

Meaning: This alarm indicates that the ClusterTraceManager's CPU usage has crossed the configured threshold limit (WARNING or MINOR) for a pre-configured period of time.
Effect: Exceeding the threshold limit indicates that tracing is causing a significant load on the system.

Meaning: This alarm indicates that a field-replaceable unit (FRU), or hardware module associated with the FRU is missing, or the hardware module associated with the FRU is not responding.
Effect: The present system configuration does not correspond to the deployment configuration. It may lack a critical resource, functionality, or redundancy, and may operate in degraded mode.

Meaning: Physical disk space of the cluster is getting full. There might be no space available to install new software (SW) deliveries.
Effect: SW delivery installation might fail.

Meaning: Physical disk space of the cluster is almost full. There will be no space available to install new software (SW) deliveries.
Effect: SW delivery installation will fail.

Effect: The kernel has the capability to repair the correctable errors in ECC-enabled memory, and it will seamlessly do so.
If the rate of correctable errors exceeds the predefined threshold, then this alarm will alert the user about detected abnormal error rate.
In case of an uncorrectable error, the kernel cannot repair it and such errors will affect the program running on the node. As a recovery attempt, the affected node will be restarted automatically. If the memory error was due to a transient cause, this may solve the problem.
Meaning: This alarm reports an error in the system memory, discovered through a software mechanism. There are two classes of errors reported:
1. Correctable error rate over limit
2. Uncorrectable error detected
Class 1 is an early warning, which usually indicates a more serious condition developing. It is raised with severity "warning", "minor", or "major" depending on the number of correctable errors occurred during the measurement period.
Class 2 indicates that an error already caused malfunction of the affected node, and is raised with severity "major".
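
The two error classes of the software-detected memory alarm can be illustrated with a short classification sketch. The text does not give the correctable-error counts at which the severity changes, so the `limits` values below are placeholders invented for this example, and `classify_memory_error` is a hypothetical name.

```python
def classify_memory_error(correctable_count, uncorrectable_detected,
                          limits=(10, 100, 1000)):
    """Return (error_class, severity), or None if no alarm would be raised.

    limits = (warning, minor, major) correctable-error counts per
    measurement period. These numbers are illustrative assumptions only;
    the real thresholds are not stated in the text.
    """
    # Class 2: an uncorrectable error is always raised with severity "major".
    if uncorrectable_detected:
        return (2, "major")

    # Class 1: severity depends on the number of correctable errors.
    warning_limit, minor_limit, major_limit = limits
    if correctable_count >= major_limit:
        return (1, "major")
    if correctable_count >= minor_limit:
        return (1, "minor")
    if correctable_count >= warning_limit:
        return (1, "warning")
    return None
```

With the placeholder limits, 150 correctable errors in one measurement period would be reported as class 1 with severity "minor", while any uncorrectable error is reported as class 2 with severity "major".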

Meaning: This alarm indicates that the total available system memory on the Local Management Processor (LMP) node is low.
Effect: This alarm is an indication that the memory on the unit is running low. If the memory runs out completely, the unit is restarted automatically as a recovery action.
This alarm indicates that an SCCP instance's utilization of one of the signaling resources is nearing the configured maximum capacity.

The situation where the signaling resources' utilization is nearing the configured maximum capacity may arise due to:
a. A software fault that erroneously fails to release the resources when they are no longer needed and thus leads to a leak of the resource.
b. An overload situation.
c. An insufficient dimensioning of the system either at the deployment phase or object administration phase.
The SCCP instance may utilize the signaling resource "CONNECTION_CONTROL_BLOCK" wherein the signaling connection control block is needed for successfully establishing the signaling

This is a warning alarm indicating that the signaling resource utilization by SCCP instance is nearing the maximum capacity. This is to prevent a situation where the resource utilization exceeds maximum capacity, as this will lead to failures in the establishment of signaling connections and thus to a drop in the KPI.

This alarm indicates that the utilization of one of the signaling resources by the SCCP instance has reached the configured maximum capacity and there has been an attempt for reserving more resources than what is allowed by the system dimensions.

The exhaustion of the signaling resources may arise due to:
a. A software fault that erroneously fails to release the resources when they are no longer needed and thus leads to a leak of the resource;
b. An overload situation or due to an insufficient

The alarm indicates that there is an attempt to consume more signaling resources than what is allowed or dimensioned according to the product deployment. This will typically cause failures in the establishment of signaling connections and might result in KPI drop.
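The relationship between the two SCCP resource alarms above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the class name, the return values, and the 80 percent warning level are hypothetical, since the source states only that the limits come from the product deployment or configuration.

```python
class SignalingResourcePool:
    """Hypothetical model of one signaling resource (for example,
    CONNECTION_CONTROL_BLOCK) with a configured maximum capacity."""

    def __init__(self, configured_max: int, warning_percent: int = 80):
        if configured_max <= 0:
            raise ValueError("configured_max must be positive")
        self.configured_max = configured_max
        # Assumed warning level; the real value is deployment specific.
        self.warning_percent = warning_percent
        self.in_use = 0

    def reserve(self) -> str:
        """Try to reserve one resource and report the resulting condition."""
        if self.in_use >= self.configured_max:
            # Attempt to reserve beyond the configured maximum:
            # corresponds to the "maximum capacity reached" alarm.
            return "exhausted"
        self.in_use += 1
        if self.in_use * 100 >= self.warning_percent * self.configured_max:
            # Utilization is nearing the maximum: warning alarm.
            return "nearing"
        return "ok"

    def release(self) -> None:
        """Release one resource; a missing release is what causes a leak."""
        if self.in_use > 0:
            self.in_use -= 1
```

With a pool of 10 resources and the assumed 80 percent level, the eighth reservation is the first to report "nearing", and an eleventh attempt reports "exhausted".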
This alarm indicates that there are disturbances in the SCCP stack. The following reasons may be the cause of the disturbances:

1. RLC_FAILURE:
The releasing of the signaling connection failed as no acknowledgment was received. When the Released (RLSD) message was sent in the connection section and if the Release Complete (RLC) was not received before the expiry of the T(int) timer, the connection would be released.

2. IAR_EXPIRED:
The releasing of the signaling connection due to Inactivity timer expiry. When the Connection Confirm (CC) message was received by the network node, T(iar) timer is started and if there are no SCCP messages exchanged in this connection section until the T(iar) expires, then the connection would be released.

3. INVALID_IT_MSG:
The releasing of the signaling connection due to invalid Inactivity (IT) messages received. Invalid Inactivity (IT) message could be any of the following:
a. Incoming Source local reference number (SLRN) in the received Inactivity (IT) message is not matching with the locally stored DLRN in the corresponding connection control block (CCB) at the SCCP stack.
b. Incoming Destination local reference number (DLRN) in the received Inactivity (IT) message is not matching with the locally stored SLRN in the corresponding connection control block (CCB) at the SCCP stack.
c. The received Inactivity (IT) message data is not matching with the data stored locally in the connection control block (CCB) at the SCCP stack, as the connection control block (CCB) data is either corrupted or inconsistent.
d. The proto_class in the received Inactivity (IT) message is either class-0 or class-1 and not matching the class-2.

4. ERR_MSG:
The releasing of the signaling connection due to protocol data unit error message (ERR) received from the peer network element.

5. ROUTE_FAILURE:
The releasing of the signaling connection due to a remote signaling point code that transitioned to "INACCESSIBLE" status.

6. UNEXPECTED_CONN_MSG:
The releasing of the signaling connection due to receipt of Connection Confirm (CC) message, Connection Refused (CREF) message, or Release Complete (RLC) message for a signaling connection which is in the established state. This is applicable for ANSI only.

The alarm indicates failure of the signaling protocol procedures and thus leads to abnormal releasing of signaling resources, for example, signaling connections.

This alarm indicates that the SCTP association is congested. A congestion is determined when the outbound messages towards SCTP failed with error code EAGAIN. Due to this, the outgoing messages on the specific SCTP association are dropped until the failure is recovered. However, the SCTP association will not change the connection status.

A congestion occurs when SCTP buffers are exhausted and unable to accept further outbound messages due to an overload situation.

This alarm indicates congestion on the SCTP association. Due to this, the outgoing signaling message is dropped, which means that the signaling traffic is disturbed.

This alarm indicates that there is a drop in the success rate of signaling connection establishment. The success rate of the signaling connections within a signaling instance is checked with a sampling interval of 30 seconds. The success rate includes the overall incoming and outgoing signaling connections (preconfigured to a minimum of 100 connections) in the network. This alarm will be reported if the success rate drops below the limit defined in the product deployment (preconfigured to 90%) or the limit defined in the SCCP configuration during commissioning of the system. The operator must consider that this alarm might also indicate paging response failures and IMSI detach scenarios which may not affect the KPIs, unless these failures are part of the monitored KPIs.

The alarm indicates failure of some or all signaling connection establishment attempts in the network and thus leads to a drop in the KPI.

The signaling configuration has been successfully validated and also activated, or added, to the signaling service. After activating the signaling object, the runtime configuration in the service has been modified dynamically due to the mis-configurations at the network, for example, the remote network element. This will result in inconsistency between the configuration database and the signaling services.

This leads to unintended activation of the signaling object and/or signaling traffic as per the configuration database. This also misleads the inquiry reports.

This alarm is raised whenever the power-throttling mechanism is activated on a node. Power-throttling is performed on the node when the temperature of either the processing resources, or the air inlet of the component, has exceeded the predefined threshold levels. Power-throttling reduces the maximum power level which can be drawn by the affected node. The method and the amount of throttling depend on the type of the component. In addition, power-throttling is performed (a fallback action to bring the processing resource to a safe reduced mode) when the thermal controller process exits possibly due to manual intervention.

In addition to this power-capping for the node, there are three levels of power throttling employed:

1. Throttled-Light: When the temperature of processing resource has exceeded the minor threshold, the percentage of throttling applied depends on the component type. An alarm with severity as "warning" is raised in this state.

2. Throttled-Mid: When the temperature of the processing resource has exceeded the major threshold, the percentage of throttling applied depends on the component type. An alarm with severity as "minor" is raised in this state.

3. Throttled-Deep: When the temperature of either the processing resource or the air inlet of the component has exceeded the critical threshold, the percentage of throttling applied depends on the component type. An alarm with severity as "major" is raised in this state.

Throttling reduces the performance of the target depending on the level applied, and hence, may also reduce system performance depending on the target role. Leaving the condition unattended for a long period of time may also affect the long term reliability of the system.

SCTP Multi-homed association is created to provide path resiliency for network failures. Generally, it is possible to create two paths to each SCTP association. If either of the paths fails, this alarm is set. If the primary path fails, then the traffic is automatically transferred to a secondary path so that the traffic is not affected. Furthermore, if the primary path recovers, then the traffic is automatically transferred back to the primary path from the secondary path. If the secondary path fails, then the traffic continues normally via the primary path and only the redundant connection is lost. SCTP association is used for M3UA or IUA protocol, as specified in the first additional information field ("Protocol").

This alarm can be set only for an association which is active at the SCTP layer, that is, the status of the association is other than "connection_down". This is because the path status is supervised inside an active association by SCTP. If the association takes the status as "connection_down" due to either a graceful termination or an abnormal termination, this alarm is cancelled. The alarm "SCTP ASSOCIATION FAILURE" is raised only when the association is terminated abnormally. When the SCTP association becomes active again later on, this alarm is set again if there is a path failure for a path with the new association.

The alarm indicates a path failure in the working SCTP multi-homed association. The system takes into use an alternative path.

All SCTP associations in the association set are unavailable. There are no connections between the local network element and the remote network element. Here, network element refers to "Application Server". There is something wrong with the data transmission connections of the associations of the association set, and/or the associations have been blocked. The network element automatically attempts to re-establish the associations.

There is total failure in the reachability of the remote network element resulting in signaling traffic downtime.

This alarm indicates that there has been an internal failure in the signaling components which may affect the provisioning of signaling in the signaling services, which may affect the signaling services' functionality. These errors are mostly generated because of abnormal functioning of the network element due to environmental defects. In case of such failures, the signaling services will try to recover automatically from failure and if the failure persists then recovery is done only after restarting the faulty components. In some cases, restarting the failed signaling services is needed explicitly. In some other cases, system might restart automatically the faulty signaling services.

This alarm is raised due to one of the following failures:

LM_CONNECTION_FAILURE:
A connection failure has occurred between the SGWNetMgr (Signaling services central part) and the stack layer management component. Prior to raising the alarm, the stack level layer manager will exhaust the attempts to reconnect to the SGWNetMgr. If the operator triggers a change in the signaling configuration when the alarm is active, it will create inconsistency between the configuration database and the signaling services, that is, the configuration will be added to the configuration database after a successful validation but it will not be reflected in the signaling services.

These are critical errors which will cause the signaling components to malfunction. They indicate shutdown or restart of the entity in question depending upon the failure described in the "Meaning of the alarm" section.

If the failure type is "CONFIG_DB_FAILURE", the signaling services will not startup successfully. And if the alarm exists when the signaling services have already started, then the configuration validation and activation will fail.

If the failure type is "LM_CONNECTION_FAILURE", the activation of the configuration on the signaling stack which raised the alarm will fail.

If the failure type is "SCCP_ACTIVE_STANDBY_SYNCUP_FAILURE", then the active SCCP connections are not getting synchronized between the active and standby SCCP compromising the high availability of those SCCP connections. If the SCCP connection switchover is triggered when this alarm is active, all the SCCP connections managed by the recovery group which issued the alarm will be dropped.

If the failure type is "DISTRIBUTED_STACK_SYNCUP_FAILURE", the recovery unit which loses the TCP connection will not participate in the distribution and redundancy. This will result in decreased

The signaling activation on the signaling stack failed. The configuration input was successfully validated to be activated on the stack. Failure during the activation is rare and it is mostly due to invalid configurations for which validation has erroneously succeeded, or due to runtime status of the system which made the activation of the configuration fail. This will result in inconsistency between the configuration database and the signaling services.

Failure during activation will result in inconsistency between the configuration database and the signaling stack.

The alarm indicates that one of the services in the NE has failed to start up and is not available for use. The "Managed object" field in the alarm specifies the service name that has failed.

The effect depends on the service that has failed.

1. netconsole: kernel crash logs of the affected node will not be available in the system master and local syslogs. The alarm for netconsole does not exist in:
a. single node deployments;
b. for the management node in a deployment that has only one management node.

The SCTP association has terminated abnormally, that is, not by normal abort or shutdown procedure of the SCTP. The SCTP has terminated the association due to the lack of SACK chunks or HEARTBEAT ACK chunks from the peer endpoint, so the reason for the termination is either in the underlying IP network or in the peer endpoint (for example, reset).

If this SCTP association is used for M3UA protocol, then the M3UA connection establishment has not succeeded due to either inconsistencies of the ASP state machine between the local and remote network elements, or incompatible M3UA configurations between the local and remote network elements.

In override mode, the state machine of the standby association is moved from "ASSOC_STATE_INACTIVE" to

If this association was the only available association in the association set, there is total failure in the reachability of the remote network element resulting in signaling traffic downtime.

On the other hand, if there are remaining available associations in the association set, the signaling traffic to the network element, reachable via the association set, is distributed to the remaining associations in the association set. In this case, the signal transmission capacity may be decreased.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.
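For the alarm above that reports a drop in the success rate of signaling connection establishment, the per-interval check can be sketched as follows. The 30-second interval, the minimum of 100 connections, and the 90% limit come from the alarm description; the function name is hypothetical, and treating the minimum as a sample size below which no alarm is reported is an assumption, not a statement of how the product evaluates it.

```python
SAMPLING_INTERVAL_S = 30      # sampling interval stated in the alarm text
MIN_CONNECTIONS = 100         # preconfigured minimum number of connections
SUCCESS_RATE_LIMIT = 90       # preconfigured limit, in percent


def success_rate_alarm(attempts: int, successes: int,
                       min_connections: int = MIN_CONNECTIONS,
                       limit_percent: int = SUCCESS_RATE_LIMIT) -> bool:
    """Return True if the alarm would be reported for one sampling interval.

    attempts and successes count incoming plus outgoing signaling
    connection establishments seen during the interval.
    """
    if attempts < 0 or successes < 0 or successes > attempts:
        raise ValueError("invalid counters")
    if attempts < min_connections:
        # Assumption: too few connections to judge the success rate.
        return False
    # Integer comparison avoids floating point rounding at the limit.
    return successes * 100 < limit_percent * attempts
```

For example, 170 successes out of 200 attempts (85%) would report the alarm under these assumptions, while exactly 90% would not, because the text says the rate must drop below the limit.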
This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.
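The phrase "assumes its assigned alarm state (defined by input polarity setting)" in the external alarm descriptions can be illustrated with a minimal sketch. The enum and function names are hypothetical, and the two polarity values are an assumption about how such a setting is typically modelled; the source does not list the actual setting values.

```python
from enum import Enum


class Polarity(Enum):
    """Hypothetical polarity settings for an external alarm input."""
    ACTIVE_HIGH = "active_high"   # alarm state is the high input level
    ACTIVE_LOW = "active_low"     # alarm state is the low input level


def input_in_alarm_state(level_high: bool, polarity: Polarity) -> bool:
    """Return True when the input level matches its assigned alarm state."""
    if polarity is Polarity.ACTIVE_HIGH:
        return level_high
    return not level_high
```

Under this model the same physical input level raises the alarm with one polarity setting and leaves it inactive with the other.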
This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.
This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This is an operator-specified customizable alarm, which reports events external to the host system. The alarm is raised when the external alarm input assumes its assigned alarm state (defined by input polarity setting). The description of the external events is given in separate operator-provided instructions.

The significance of the external event that raised this alarm is described in separate operator-provided instructions.

This alarm indicates that the signaling object "SigObjectID" as specified in the "Identifying application additional information" field has changed status a number of times based on the "ObjectFluctuationThreshold" parameter, and within the span based on the "CriticalAlarmTimer" parameter. The parameters "ObjectFluctuationThreshold" and "CriticalAlarmTimer" are defined as part of product deployment and appear in the "Application Additional Information" field of the alarm.

This alarm suppresses other alarm notifications related to the object status change, to avoid flooding the syslog with alarm notifications.

This indicates transmission break. Due to this, signaling traffic is disturbed and as a consequence there is a significant KPI drop or even a possibility of network outage.

This alarm indicates that the D-channel has terminated abnormally, that is, not "disabled" administratively. The failure may be due to a fault in the primary rate access terminal used by the D-channel, or a fault in the D-channel connections, or transmission failure within the SGW (that is, between LAPD and IUA), or non-operation of the remote end.

The signaling on the D-channel in question is broken.

This alarm, when raised with a "Critical" severity, indicates that a flood of hardware events has been detected, which could be caused by a faulty hardware component. The system takes automatic action by limiting the rate of hardware alarm notifications to a fixed value, set in deployment, and dropping the rest.

As long as the faulty hardware component has not been isolated, the system will drop the excess events, which may lead to inconsistent hardware alarms state.

This alarm is raised when either NE-wide eSW installation, or NE-wide eSW activation has failed. This failure is either full or partial. The failure can be one firmware component in a blade or multiple firmware components in a blade, or the failure may have occurred in multiple blades with multiple firmware components. User attention is required to investigate the issue, fix the problem, and retry the operation.

Either eSW installation or activation has failed and may cause the system to malfunction, or may endanger the reliability and failover in the later phase if the issue is not solved and fixed. The user is advised not to power off any blades or nodes at this stage.

The alarm indicates that running configuration of the switch is not in sync with the configuration files present on the management node.

The alarm is raised only if the difference is found.

In case there is no difference and the alarm is already active, it is cleared.

Failure in applying correct configuration on the switch may lead to improper functioning of the switch.
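The raise and clear rule of the switch configuration alarm above can be sketched as follows. This is only an illustration: the function names are hypothetical, and comparing the configurations line by line after trimming trailing whitespace and blank lines is an assumption, since the source does not say how the difference is determined.

```python
from typing import List, Tuple


def normalize(config_text: str) -> List[str]:
    """Drop blank lines and trailing whitespace before comparing (assumed)."""
    return [line.rstrip() for line in config_text.splitlines() if line.strip()]


def evaluate_config_sync(running_config: str, stored_config: str,
                         alarm_active: bool) -> Tuple[bool, str]:
    """Return (new alarm state, action), where action is one of
    "raise", "clear", or "none"."""
    in_sync = normalize(running_config) == normalize(stored_config)
    if not in_sync:
        # The alarm is raised only if a difference is found.
        return True, ("none" if alarm_active else "raise")
    if alarm_active:
        # No difference and the alarm is already active: clear it.
        return False, "clear"
    return False, "none"
```

A periodic check could call `evaluate_config_sync` with the switch's running configuration and the file stored on the management node, then act on the returned action.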
This alarm indicates that the Packet Timing Unit has lost its input reference.

The different error types are described as follows:
0 - [ToP Lock Fail] - PTPv2 HW Engine DPLL in Holdover
1 - [JAT Lock Fail] - SyncE Jitter Attenuator DPLL in Holdover
2 - [ToP RefInput Err] - PTPv2 HW Engine Reference Input Lost
3 - [JAT RefInput Err] - SyncE Jitter Attenuator Reference Input Lost
4 - [GNSS Lock Fail] - GNSS module in Holdover
5 - [GNSS RefInput Err] - GNSS SMA Antenna Open-Circuit (NC)
6 - [ToP SyncInput Err] - PTPv2 HW Engine 1pps Input Lost
7 - [ToD Input Err] - Time-of-Day Input Lost
8 - [Clock Converter Lock Fail] - 1Hz/10MHz clock converter DPLL in Holdover
9 - [Clock Converter RefInput Err] - Clock Converter Reference Input Lost

To understand the meaning of the current alarm, check the error type printed out in the IAAI field.

This alarm reports synchronization lost in the current system. This condition or state might bring negative impact on the system which needs precise signal for synchronization if no other suitable reference is available. Within a short time period of signal loss, the clock synchronization system will start using local oscillator if no other suitable reference is available. After the synchronization loss, the quality of the clock signal depends on the quality of the local oscillator.

The node is starting or restarting. The synchronization is not available until the node completes initialization. This alarm indicates that user can expect the service restoration after some time. The restart may have been initiated by an operator / OS failure / transient failure of the critical hardware component, a recovery action of critical application failure and so on.

The different error types are described as follows:
0 - Initiated by power up
1 - Initiated by hard reset
2 - Initiated by warm reset
3 - N/A
4 - N/A
5 - OS initiated hard reset
6 - N/A
7 - System Restart

Please note that N/A means that there is no probability of sensor logging that offset value.

During the start-up, the node is not able to provide services. Restoring the services after the node is up takes additional time, depending on the role of the node.

This alarm indicates that internal supervision in the Packet Timing Unit (PTU) of the Box Controller Node (BCN) detected an error. The error can be caused by various processes running on the BS2AM-A unit, such as the GNSS process, HWM process, SYNC process, or NTP process, and also by low system memory and high system load.

The different error types are described as follows:
0 - [GNSS process failure]: GNSS daemon is not working properly.
1 - [HWM process failure]: Hardware Management Unit daemon is not working properly.
2 - [SYNC process failure]: Synchronization daemon is not working properly.
3 - [NTP process failure]: Network Timing Protocol daemon is not working properly.
4 - [System Memory low]: Packet Timing Unit system memory is low, 10% or less is now available (Total 2GB on BS2AM-A).
5 - [System Load high]: Packet Timing Unit system load is high, CPU resource utilization is over 95%.

To understand the meaning of the current alarm in detail, check the error type printed out in the IAAI field.

When the supervision sensor is asserted, which indicates various process failures, the performance of Packet Timing Unit gets affected in many ways.

Following are the possible effects for different offset values of Supervision OEM sensor:
0 - If the Packet Timing Unit is operating as a Grand Master Clock or any other mode requiring time of day input, the time/phase synchronization capability will be impacted.
1 - Hardware Management functions related to Unit alarm reporting are not available from the Packet Timing Unit side.
2 - IEEE1588v2 or SyncE synchronization functions are not available. Synchronization traffic is down.
3 - NTP protocol is not available.
4 - Some processes may run out of memory and do not work properly.
5 - This error is not likely to be reported in Slave mode, but for Grandmaster clock and Boundary clock, high system loads could be observed depending on the amount/number of slaves that have been configured. Additionally, the packet rate selected will influence the load.

This alarm indicates that internal supervision in the Packet Timing Unit (PTU) of the Box Controller Node (BCN) detected an error. The error can be caused by a problem in loading startup and default sync config files.

The different error types are described below:
6 - [SYNC startup config loading error]: Packet Timing Unit synchronization process could not load the startup configuration file.
7 - [SYNC default loading error]: Packet Timing Unit synchronization process could not load the default settings.

When the supervision sensor is asserted, which indicates various process failures, the performance of Packet Timing Unit will be affected in many possible ways.

Following are the possible effects for different offset values of Supervision OEM sensor:
[6-7] - The synchronization process will not work properly and the IEEE1588v2/SyncE configuration will be invalid. Traffic is down.

The operational status of the connectivity unit (Fibre Channel Switch) has changed. There can be several reasons for this alarm, such as changes in the configuration causing the online unit to be offline or a malfunctioning switch. Unit Status and Unit State values will give the clear reason for the alarm.

Usually there are at least two fibre channel switch modules equipped in the system and the fibre channel connection is still fully operational if the redundant link via the redundant switch is up. However, the lost connection should be re-established as soon as possible to ensure fault-tolerant operability. If the redundant connection is not re-established and the only remaining link goes down, the devices attached to fibre channel become inaccessible.

The protocol status for some of the switch ports has changed. There can be several reasons for this alarm, such as changes in the configuration that make the port unable to process the default protocol, or manually/automatically isolating the port from loop or fabric. Port Status and Port State values will give the clear reason for raising this alarm.

Usually there are at least two fibre channel switch modules equipped and the fibre channel connection is still fully operational if the redundant link via the redundant switch is up. However, the redundant connection should be re-established as soon as possible to ensure fault tolerant operability. If the redundant connection is not re-established and the only remaining link goes down, the devices attached to fibre channel become inaccessible.
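The error type printed in the IAAI field of the input-reference-loss alarm can be translated back into its description with a simple lookup. The following is a minimal sketch only: the table contents are copied from the error type list above, while the names PTU_REFERENCE_ERROR_TYPES and describe_ptu_error are hypothetical and not part of the product.

```python
# Error types of the "Packet Timing Unit has lost its input reference" alarm,
# as listed in this document. The names below are illustrative assumptions.
PTU_REFERENCE_ERROR_TYPES = {
    0: ("ToP Lock Fail", "PTPv2 HW Engine DPLL in Holdover"),
    1: ("JAT Lock Fail", "SyncE Jitter Attenuator DPLL in Holdover"),
    2: ("ToP RefInput Err", "PTPv2 HW Engine Reference Input Lost"),
    3: ("JAT RefInput Err", "SyncE Jitter Attenuator Reference Input Lost"),
    4: ("GNSS Lock Fail", "GNSS module in Holdover"),
    5: ("GNSS RefInput Err", "GNSS SMA Antenna Open-Circuit (NC)"),
    6: ("ToP SyncInput Err", "PTPv2 HW Engine 1pps Input Lost"),
    7: ("ToD Input Err", "Time-of-Day Input Lost"),
    8: ("Clock Converter Lock Fail", "1Hz/10MHz clock converter DPLL in Holdover"),
    9: ("Clock Converter RefInput Err", "Clock Converter Reference Input Lost"),
}


def describe_ptu_error(error_type: int) -> str:
    """Return a readable description for an error type taken from the IAAI field."""
    tag, text = PTU_REFERENCE_ERROR_TYPES.get(
        error_type, ("Unknown", "error type not listed")
    )
    return f"{error_type} - [{tag}] - {text}"
```

Unknown values are reported rather than rejected, so the helper stays usable if other releases define further error types.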
Identifying Additional Information Fields Additional Information Fields

1. Erroneous data:
Identifies the alarm data that was incorrect or that was completely missing. Only the name of the first field containing invalid data is mentioned here. Possible values are:
aFamily: Alarm Family given in the data is not reasonable.
SP: Specific Problem given in the data is not reasonable or is not known by the alarm system (raising the alarm 70280 instead of 70005 for unknown specific problem are switched off in Alarm System configuration).
MOId: Managed Object Id given in the data is not reasonable.
MONEId: Network Element Id (where faulty Managed Object is located in) given in the data is not reasonable.
applId: Application Id (of alarm application) given in the data is not reasonable.
appNEId: Network Element Id (where alarm application is located in) given in the data is not reasonable.
IAAI: Identifying Application Additional Info given in the data is not reasonable.
alarmTime: Alarm time given in the data is presented in a too-long format, or is in non-numerical format.
utcShift: Shift between UTC time and the local time given in the data is not reasonable.
PS: Perceived Severity given in the data is not reasonable.
AAI: Application Additional Info given in the data is not reasonable.
notificationId: Notification Id given in the data is not reasonable.
FC: Flow control given in the data is not reasonable.
ET: Event type given in the data is not reasonable.
EET: Extended event type given in the data is not reasonable.
OT: Object type given in the data is not reasonable.
length: The combined length of the string type fields (Managed Object Id, Application Id, Application Additional Info, Identifying Application Additional Info) given in the data exceeds the maximum allowed value. Note that in this case both Application Id and Managed Object Id in the given data are considered as invalid, as only the combined length is verified.
2. Original specific problem
Specific problem (the alarm number) of the invalid alarm can also contain the original invalid value, if this was the invalid field.
3. Original perceived severity:
Perceived severity of the invalid alarm can also contain the original invalid value, if this was the invalid field.
6. Original Application Additional Info.

1. Any further information if available.

1. The maximum number of alarms configured for this Network Element, which is an integer value.
2. Managed Object ID
Distinguished name of the managed object that is the cause of the alarm. From the first alarm rejected due to overflow.
3. Specific Problem (alarm number)
Further qualifies the Probable Cause. From the first alarm rejected due to overflow.
4. Perceived Severity
The severity of the alarm, from the first alarm rejected due to overflow. Possible values are:
1 Indeterminate
2 Critical
3 Major
4 Minor
5 Warning
5. Application ID
Distinguished name of the application that is raising the alarm. From the first alarm rejected due to overflow.

1. Recovery group, full name of the recovery group, /ISupervisionServer
2. System

1. Max size: the maximum size of the database in kB.
2. Fill ratio: the fill ratio of the database.
3. Disk-space usage of working directory (for example /mnt/db/<dbname> is xx%)

The attribute id for which the alarm is raised, is a combination of the omes id and attribute number; for example, m2002c0001 is the attribute id in which m2002 is the omes id and c0001 is the attribute number.
1. Maximum threshold exceeded
2. Maximum threshold no longer exceeded
3. Monitoring stopped

1. Unique ID of the license
2. Number of days for the license to expire along with the absolute end time of the license file ("Expired", if the license has already expired)

1. CPU INFO:
Possible values are:
a) CPU Index
The value "CPU Index" displays the name of an individual CPU core when an alarm is raised by monitoring the load of individual CPU cores.
For Example: CPU 1
b) All CPUs
The value "All CPUs" is displayed if the alarm is raised by monitoring the average load on all cores of the CPU.
2. The upper threshold value has been surpassed and thus the alarm has been raised. For overall CPU load, the top 10 CPU users are also listed. The format of this field is the following:
PID %CPU Process Name/COMMAND (CPU load > <upper threshold value percentage>, interval: <sampling interval in seconds>, sampling: <number of samples to average>)
<PID1> <load1> <Process name of PID1>
...
: <PID10><load2> <Process name of PID10>
For example:
PID %CPU MONAME/COMMAND (CPU load > 90%, interval: 5s, sampling: 16)
1012 2.0 glusterfs
1061 1.0 glusterfs

1. Mountpoint
2. Mountpoint Information
This field shows the upper threshold percentage crossed by mountpoint. When surpassed, the alarm will be raised.
The format of the field is "Filesystem <device name> (<type of filesystem>) mounted on <mount point> uses > <upper threshold value percentage> of the space" e.g. "Filesystem rootfs (rootfs) mounted on / uses > 90.0% of the space".
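The Mountpoint Information field described above follows a fixed textual pattern, so it can be split into its parts with a regular expression. This is a minimal sketch, assuming the field always matches the quoted format exactly; the names MOUNT_INFO and parse_mount_info are hypothetical.

```python
import re

# Pattern derived from the documented format:
# "Filesystem <device name> (<type of filesystem>) mounted on <mount point>
#  uses > <upper threshold value percentage> of the space"
MOUNT_INFO = re.compile(
    r"Filesystem (?P<device>\S+) \((?P<fstype>[^)]+)\) mounted on (?P<mount>\S+) "
    r"uses > (?P<threshold>\d+(?:\.\d+)?)% of the space"
)


def parse_mount_info(text):
    """Return the parts of a Mountpoint Information string, or None if it does not match."""
    match = MOUNT_INFO.search(text)
    if match is None:
        return None
    info = match.groupdict()
    info["threshold"] = float(info["threshold"])  # percentage as a number
    return info
```

Applied to the example from this document, "Filesystem rootfs (rootfs) mounted on / uses > 90.0% of the space", the helper yields the device rootfs, the filesystem type rootfs, the mount point / and the threshold 90.0.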
1. Identifies the managed object type:
"Node"
"Recovery unit"
"Process"
"Recovery group".
2. Explains the string of the fault type (if that information is available), or just the string "failure". For example:
"Process has stopped responding to heartbeats."
"Node connection heartbeat failure."
"Recovery group failure."
3. For hot active standby and cold one plus M type of recovery groups that support controlled switchover, the role of the RU or a process is shown. Role is shown with strings "with role ACTIVE" or "with role HOTSTANDBY".

1. Reason for alarm
Possible values:
"Memory critical"
2. Information on memory utilization
This field shows that the upper threshold value has been surpassed, thus, resulting in raising the alarm. The top 5 memory users are also listed. The alarm is cleared automatically when memory utilization remains for a period of five seconds below the lower threshold value. If it oscillates around the thresholds during this time, it may or may not remain raised depending on the sampling algorithm used.
The format of the field is
"PID MemUsage(kB) MOName/command (Memory utilization > <upper threshold value percentage>)
<PID1> <PID1 memory usage> <Process name of PID1>
<PID2> <PID2 memory usage> <Process name of PID2>
...
<PID5> <PID5 memory usage> <Process name of PID5>".
If a process is not located in an RU, the process name is displayed replacing the MO name of the process. For example:
"
PID MEMUSAGE(KB) MONAME/COMMAND (Memory utilization > 82.0%)
4365 521452 python
1012 148512 glusterfs
828 42296 glusterfs
1281 38820 logf
1061 26600 glusterfs".

1. Failed subsystem
2. Failed resource
Where the values are as:
CPU: Index of the processor
FILESYSTEM: Name of the mount point
Ethernet interface

1. Identifies the MO type: a cluster, a node, or a RU.
2. For hot active standby and cold one plus M type of recovery groups that support controlled switchover, the role of the RU or a process is shown. Role is shown with strings "with role ACTIVE" or "with role HOTSTANDBY".

1. Identifies the MO type (the cluster, a node, a process, or an RU).
2. For hot active standby and cold one plus M type of recovery groups that support controlled switchover, the role of the RU or a process is shown. Role is shown with strings "with role ACTIVE" or "with role HOTSTANDBY".

1. Identifies the MO type (a cluster, a node, or an RU)
2. For hot active standby and cold one plus M type of recovery groups that support controlled switchover, the role of the RU or a process is shown. Role is shown with strings "with role ACTIVE" or "with role HOTSTANDBY".

1. Identifies the MO type (a cluster, a node, or a RU)

1. Identifies the managed object (MO) name of the new active RU.

The attribute ID for which the alarm is raised, is a combination of the omes ID and attribute number; for example, m2002c0001 is the attribute ID in which m2002 is the omes ID and c0001 is the attribute number.
Possible values are:
1. Minimum threshold exceeded
2. Minimum threshold no longer exceeded
3. Monitoring stopped

1. reason, possible values: 1 - file cannot be opened, 2 - permanent file read error
2. additional information about the problem (for example, text of the corresponding system exception)
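The attribute ID used by the threshold alarms above is a concatenation of the omes ID and the attribute number (m2002c0001 is m2002 plus c0001). The sketch below splits such an ID. It assumes that every ID has the shape "m<digits>c<digits>", which is generalized from the single example in this document; the names ATTRIBUTE_ID and split_attribute_id are hypothetical.

```python
import re

# Assumed shape: omes ID "m<digits>" followed by attribute number "c<digits>".
ATTRIBUTE_ID = re.compile(r"^(?P<omes_id>m\d+)(?P<attribute_number>c\d+)$")


def split_attribute_id(attribute_id):
    """Split an attribute ID such as 'm2002c0001' into (omes ID, attribute number)."""
    match = ATTRIBUTE_ID.match(attribute_id)
    if match is None:
        raise ValueError(f"unexpected attribute ID format: {attribute_id!r}")
    return match.group("omes_id"), match.group("attribute_number")
```

For the documented example, split_attribute_id("m2002c0001") returns the pair m2002 and c0001.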
Possible values:
1. Invalid attribute's value or empty string if the attribute or
its value is missing.
2. "LDAP server unavailable, using default configuration
parameters" if connection to LDAP has failed.
1. Invalid record (please note that the field can hold no more than ~390 symbols, so the original invalid record can be cut).
2. Error code, possible values:
1 - missing mandatory field;
2 - duplicated field;
3 - empty record;
4 - non-alarm data record.
3. Field name (for missing or duplicated field).

Data from the original alarm:
1. MOId
2. Specific problem
3. Identifying application additional info
(The application ID is present in the MOId field of the alarm)
4. Application additional info from the original alarm.

1. Heartbeat interval in seconds.

1. Heartbeat interval in seconds.

1. Name of the still operational CLA node.


For example: "Operational: /CLA-0"
2. Name of the unavailable CLA node.
For example: "Unavailable: /CLA-1"
1. The number of recovery units providing service: RUsInService=<n>
2. The number of faulty and non operational nodes: NodesFaulty/Down=<n>. Non-operational nodes turn faulty if the system does not manage to bring them up within some minutes. For example string "NodesFaulty/Down=1/3" means that three nodes are currently non operational and one is currently declared faulty.
3. The number of failed recovery units: FailedRUs=<n>
4. The number of locked RUs: LockedRUs=<n>

1. A string explaining that the situation was caused by a lock operation.

1. Name of the recovery group to which the recovery unit belongs. For example, "/Directory".
2. Situation when the failure happened: string "allocating" or "de-allocating"
3. Type of resource allocation: "IP(address)", "disk(mount point)" or "ctrlscript". For example, "IP(192.1.1.78)" or "disk(sysimg)".
4. Only present if argument 3 is "ctrlscript", and contains the name of the control script that reported the failure. For example, "RUControlDirectoryServer.sh"

1. Username
2. Error type
USER_NAME_DUPLICATE_ERROR - External username is the same as one of the already existing NE internal usernames. Note: This should not happen, if NetAct is following the agreed way of naming users.
USER_NAME_RESERVED_ERROR - External username is a reserved username in the NE.
USER_NAME_TOO_LONG_ERROR - External username is too long (supported usernames are up to 32 characters long).
USER_NAME_CONTAINS_INVALID_CHARS_ERROR - External username contains invalid

1. Cause of the alarm
This field provides brief information about the cause of the alarm. Possible values of the field are:
Dedicated_IP_Not_Configured_for_PAP.
PAP_unable_to_connect_to_remote_LDAP.
No_Value.
2. Problem type
Example: Problem_type:BIND_IP_NOT_CONFIGURED
The description of each problem type is described as follows:
LDAP_DOWN - Both the primary and secondary NetAct LDAP servers are down, unreachable, not responding within certain time, or replying with a return code indicating that the external LDAP server is busy.
INVALID_CREDENTIALS - The NE account
1. Username 2. Change type
User_removed_or_denied_access_to_NE
User_permissions_changed

1. Invalid configuration attribute
2. Configuration repository type (LDAP, FILE, JPROP - Java property, ENVVAR - environment variable)
3. Fault type (1 - attribute is missing, 2 - attribute value is invalid)
4. Wrong attribute value (N/A if the attribute is missing) - optionally
5. Use default or closest acceptable value - optionally

Cannot use service: X
Failed function: Y
Failure code: Z
1. Unknown specific problem in the original alarm
notification.

1. Time stamp of the synchronization loss.
2. Current value of fsdbInSyncLimit (default is 60 seconds).
3. Current value fsdbAsyncRepStandaloneLimit (default is 300 seconds).

1. Actual number of used connections to the DB->database
2. Maximal possible number of connections to the DB->database
3. Remaining number of connections to the DB->database
4. Limit configured in Configuration Directory fsdbConnectionsAlarmLimit (default is n = 10)
5. Limit configured in Configuration Directory fsdbConnectionsCheckFreq (default is 10 seconds)

Failure reason

1. VrfId
2. Source address of the session
3. Destination address of the session
4. Hop type - The possible values are:
a. M - For multihop
b. S - For singlehop
5. Reference ID - This is an optional parameter used in multihop solution.
6. Diagnostic code on why the session was previously DOWN. The possible values for the code are:
0 - No_Diagnostic
1 - Control_Detection_Time_Expired
2 - Echo_Function_Failed
3 - Neighbor_Signaled_Session_Down
4 - Forwarding_Plane_Reset
5 - Path_Down
6 - Concatenated_Path_Down
7 - Administratively_Down
8 - Reverse_Concatenated_Path_Down
7. The time when the session failure was observed.

1. Feature code
2. License state (possible values are "OFF" and "ON")
3. Feature Admin State (possible values are "OFF" and "ON")

1. Error Code. The possible values are:
SCLI_DAEMON_NOT_AVAILABLE - The SCLI daemon was not available at the time of execution of the user/subsystem configuration script.
FSCONFIGURE_SAVE_ERROR - The fsconfigure -- save command did not execute successfully hence the applied configuration was not saved.
USER_CONFIG_SCRIPT_ERROR - The script mentioned in the second field failed to execute.
2. Script name (optional)
This is an optional parameter which indicates the name of the script that failed.
Note that this parameter is only applicable in case of the USER_CONFIG_SCRIPT_ERROR code.
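The diagnostic codes 0 to 8 listed above for a session that was previously DOWN can be resolved with a small lookup table. The values coincide with the diagnostic codes that RFC 5880 defines for BFD, although this document does not name the protocol, so that link is an assumption. The names SESSION_DIAGNOSTIC_CODES and diagnostic_name are hypothetical.

```python
# Diagnostic codes copied from the alarm field description above.
SESSION_DIAGNOSTIC_CODES = {
    0: "No_Diagnostic",
    1: "Control_Detection_Time_Expired",
    2: "Echo_Function_Failed",
    3: "Neighbor_Signaled_Session_Down",
    4: "Forwarding_Plane_Reset",
    5: "Path_Down",
    6: "Concatenated_Path_Down",
    7: "Administratively_Down",
    8: "Reverse_Concatenated_Path_Down",
}


def diagnostic_name(code):
    """Return the documented name for a diagnostic code, or a marker for unlisted codes."""
    return SESSION_DIAGNOSTIC_CODES.get(int(code), f"Unlisted_Code_{int(code)}")
```

Accepting the code as text or as a number keeps the helper usable directly on a field value read from an alarm record.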
1. Error code
Example: CERT_NO_RW_PERMISSION
Possible values for error codes:
- CERT_NO_RW_PERMISSION
This error code indicates that the directory is in a read-only mode or the parent directory resides on a read-only file system.
- CERT_NO_RESOURCES
This error code indicates that there is no free disk space on the device for creating a file, or the resources available in the system are insufficient to perform a write operation.
- CERT_MAX_FILE_COUNT
This error code indicates that the maximum allowed number of files are currently open in the system.
- CERT_CREATION_FAILED
This error code indicates that a component of the path (prefix specified by the path) does not name an existing directory, the path is an empty string, or the path argument specifies the slave side of a pseudo-terminal device that is locked.

1. Error code:
Error code is the error code of LDAP server or RUIM providing detailed information for the problem type. Possible error codes and their explanation is given below. For each problem type, corresponding LDAP Server or RUIM error codes are given below in order to provide more detailed information.
Example: LDAP_CONNECT_ERROR
The following values are possible for the error codes:
LDAP_CONNECT_ERROR
LDAP_PROTOCOL_ERROR
LDAP_TIMELIMIT_EXCEEDED
LDAP_UNAVAILABLE_CRITICAL_EXTENSION
LDAP_CONFIDENTIALITY_REQUIRED
LDAP_UNWILLING_TO_PERFORM
PMG_RESULTS_NOT_GET_YET
RUIM_TIMEOUT

1. WARNING_FORWARDING_TABLE_LIMIT_EXCEEDED
2. ERROR_FORWARDING_TABLE_ADDITION_FAILED
WARNING_FORWARDING_TABLE_LIMIT_EXCEEDED: This message indicates that the number of routes added in the local forwarding table of a node has caused an excess of the maximum number of routes supported in the node.
ERROR_FORWARDING_TABLE_ADDITION_FAILED: This message indicates that one or more routes could not be added in the local forwarding table of a node, because the capacity of the forwarding table has been exceeded.

Route limit exceeded: The soft limit of the number of supported routes is exceeded and the user should proceed cautiously when adding more routes.
Route4 addition failed: The hard limit of the number of supported routes is reached and the addition of IPv4 route has failed.
Route6 addition failed: The hard limit of the number of supported routes is reached and the addition of IPv6 route has failed.

1. General Information
Possible values:
A) sysPeer not chosen
NTP is unable to select a server.
B) The sync time difference is beyond the allowed offset
The time difference between NTP server and NTP client is greater than the allowed offset.
C) ntpdc polling is not successful
Monitoring of NTP time sync has failed.
D) NTP is syncing
NTP is syncing to this server.

1. Reachable peers
2. Unreachable peers

Test-license state is enabled.
;SN:<serial-no> ISSID:<issuer-id>
1. SN: Serial number of the certificate
2. ISSID: Issuer ID of the certificate
Example:
"SN: A08038 ISSID: /C=IN/O=nokia/CN=FlexiRootCA" means the certificate has serial number as A08038 and the issuer of the certificate is "/C=IN/O=nokia/CN=FlexiRootCA".

C:<EE> D:<domain> DTE:<days> CET:<end-time>
C:<CA> D:<domain> DTE:<days> CET:<end-time> CT:<cert-type> CAID:<ca-id>
C:<EE> D:<domain> Expired
C:<CA> D:<domain> Expired CT:<cert-type> CAID:<ca-id>
1. C: Certificate (possible values are "EE" and "CA")
2. D: Domain
3. DTE: Days To Expire
4. CET: Certificate End Time
5. CT: CA Certificate Type (possible values are "new-with-new", "old-with-old", "old-with-new" and "new-with-old")
6. CAID: CA identifier of the CA certificate
Examples:
1. "C:EE D:default Expired" means the EE certificate present under the default domain has already expired.
2. "C:EE D:default DTE:5 CET:2013-03-15 09:19:24-10:00" means the EE certificate present under the default domain is going to expire in 5 days, and the certificate will not be usable after 09H 19M 24S of 2013-03-15.
3. "C:CA D:default Expired CT:new-with-new CAID:common" means the CA certificate of type new-with-new present under the default

1. Reason for failure. The possible reasons for failure are:
a. UNABLE_TO_FETCH_ROOTCA: The root certification authority (CA) certificate or trust anchor was missing for the corresponding NE service domain. The root CA certificate or trust anchor is mandatory for KUR operation initiation.
b. CMP_REQUEST_FAILED: The CMP parameters of "default" domain were either missing or improperly configured. The CMP parameters will always be fetched from the "default" domain while performing KUR operation for any of the NE service domains.
c. UNABLE_TO_INSTALL_EE_CANDKEY: The EE candidate private key required for performing KUR operation is not auto-generated successfully and is not configured/installed for the corresponding NE service domain.

1. Name of the current start-up snapshot.
2. Name of the previous start-up snapshot.
3. LDAP configuration changes apart from KUR changes saved. The possible values are, Yes/No.
Yes - Current start-up snapshot includes unsaved configurations before KUR.
No - Current start-up snapshot includes only KUR changes no other unsaved configurations before autoKUR.
4. Comma separated list of NE service domains for which automatic KUR update succeeded.
Example:
config-R_FPT_170.3.WR.64.r.1602220650.395744-INITIAL config-R_FPT_170.3.WR.64.r.1602220650.395867-INITIAL Yes swmgmt,ruim

1. Failed node name
2. Expected minimum RAM amount
3. Actual RAM amount
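The certificate expiry field described above is a sequence of KEY:value pairs (C, D, DTE, CET, CT, CAID) with an optional bare word "Expired". The sketch below turns such a string into a dictionary. It assumes that keys are always preceded by whitespace or the start of the string, as in the documented examples; the function name parse_cert_info and the added "expired" entry are hypothetical.

```python
import re

# Keys documented for the certificate field; longer keys come first so that
# "CAID", "CET" and "CT" are not mistaken for the one-letter key "C".
_KEY = re.compile(r"(?:^|\s)(CAID|CET|DTE|CT|C|D):")


def parse_cert_info(text):
    """Parse e.g. 'C:EE D:default DTE:5 CET:2013-03-15 09:19:24-10:00' into a dict."""
    matches = list(_KEY.finditer(text))
    fields = {}
    for index, match in enumerate(matches):
        # A value runs up to the next key, so the CET value may contain a space.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    # The bare word "Expired" follows the domain when the certificate has expired.
    domain = fields.get("D", "")
    fields["expired"] = domain.endswith(" Expired")
    if fields["expired"]:
        fields["D"] = domain[: -len(" Expired")]
    return fields
```

The end time is kept as text because only one timestamp example is given in this document, which is too little to commit to a date format.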
1. Failed node name 2. Expected CPU count
3. CPUs found

1. Failed service name or failed process name.

1. Failed target object service name.

1. TWAMP session ID
The session ID of a TWAMP session, the possible value range is from 1 to 4095.
2. Actual PLR
The PLR value is calculated by a TWAMP session.
3. Reference threshold PLR
Range: 0.00% to 100.00%
default: 100.00%

1. TWAMP session ID
The session ID of a TWAMP session, possible value range is from 1 to 4095.
2. Actual RTT
The RTT value calculated by the TWAMP session.
3. Reference threshold RTT
range: 0 to 1000 ms
default: 1000 ms

1. List of mismatching interfaces
This field contains one or more <parameter>:<interface_name> pairs for which mismatching data was identified. Items are separated with a comma.
Examples:
"ip-info:ext2" indicates missing interface-info counterpart for ext2 interface.
"interface-info:ext3" indicates missing ip-info counterpart for ext3 interface.
"ip-info:ext2,interface-info:ext3" indicates both issues listed above were observed.

1. IPv6 address and interface
2. remote MAC address
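The list of mismatching interfaces is a comma-separated sequence of <parameter>:<interface_name> pairs, so it can be split with plain string operations. A minimal sketch follows; the function name parse_mismatches is hypothetical.

```python
def parse_mismatches(field):
    """Split 'ip-info:ext2,interface-info:ext3' into (parameter, interface) pairs."""
    pairs = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty items, for example after a trailing comma
        parameter, _, interface = item.partition(":")
        pairs.append((parameter, interface))
    return pairs
```

According to the examples above, a pair with the parameter ip-info means that the interface-info counterpart is missing for that interface, and the reverse holds for interface-info.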

1. Reason for error, possible values are:
"SSH_key_fetch_failure": Failed to fetch the keys from openstack
2. Phase when error occurred, possible values are:
"Commissioning_phase": Failed to fetch the keys in commissioning phase
"Runtime_phase": Validation of keys failed while attempt to login

1. Service name: the service name which starts up a Redis instance.
2. Maximum size: the maximum database memory in kB.
3. Memory usage threshold: the memory ratio alarm limit of the database.
4. Actual memory usage of database as percentage.

1. Mountpoint target of the device
For example: /mnt/brick/export
2. Mount information from yaml file user-data section related to volume
For example:
storage-provider-export=/mnt/bricks/export;ext4;/dev/vdg
1. PTP master IP address 2. Domain number
1. Alarm observation file name
2. Expected file offset
3. File end offset

1.PTP master IP address 2. Master clock status


3. Grandmaster clock class

1. Issuer of the CRL
Example: "/DC=NSN"
2. Distribution point source type
Possible values are: "CERT-EXT" and "MANUAL"
3. Reason for CRL update failure
Possible values are:
"CERTMAN_CRL_DECODE_FAILED": The CRL being downloaded is corrupted and cannot be read due to which decoding failed.
"CERTMAN_CRL_DOWNLOAD_FAILED": The fetched CRL file from PKI operator's CRL distribution repository failed for an unknown/identifiable reason.
"CERTMAN_CRL_DP_HOST_RESOLUTION_FAILED": The DNS host resolution for FQDN based CRL distribution point failed for configured IP type.
"CERTMAN_CRL_DP_INVALID_URI": The network connection to PKI operator's CRL distribution repository failed because of an invalid URI being configured.
"CERTMAN_CRL_DP_IPV6_HOST_UNSUPPORTED": The network connection to PKI operator's CRL distribution repository failed as the configured IPv6 is not supported.
"CERTMAN_CRL_DP_IPV6_RESOLVED_HOST_UNSUPPORTED": The network connection to PKI operator's CRL distribution repository failed as host configured for IPv6 type is unsupported.
"CERTMAN_CRL_DP_LDAP_INCONSISTENT_CRL_CONTENT": The PKI operator's CRL distribution repository provided multiple CRLs.
"CERTMAN_CRL_DP_LDAP_OPEN_CONNECTION_FAILED": The network connection to download CRL from external LDAP server failed.
"CERTMAN_CRL_DP_LDAP_SEARCH_FAILED": The request to download required CRL failed as the configured URI is not valid.
"CERTMAN_CRL_DP_PARSE_URI_FAILURE": The network connection to the PKI operator's CRL distribution repository failed as parsing of URI failed.
"CERTMAN_CRL_DP_URI_MULTIPLE_ATTR": The network connection to PKI operator's CRL distribution repository failed as URI contains multiple attributes. For example - ldap://10.44.35.92:389/CN=Root%20CA,DC=NSN%20Ulm?10.44.35.92:389/CN=Root%20CA,DC=NSN%20Ulm?certificateRevocationList;binary
"CERTMAN_CRL_DP_URI_NO_ATTR": The network connection to PKI operator's CRL distribution repository failed as URI does not contain any attributes. For e.g. - ldap://10.44.35.92:389/CN=Root%20CA
"CERTMAN_CRL_DP_URI_NO_DN": The PKI

1. Serial number of the certificate
Example: "A08038"
2. Issuer ID of the certificate
Example: "/C=IN/O=nokia/CN=FlexiRootCA"
3. Name of the domain where the certificate resides
Example: "default"

node=NN
Example:
node=SN-1

brick=BB node=NN
Example:
brick=log node=SN-0

List of files with data loss.
E.g.
Files which need a recovery:
/mnt/log/SN-2/local/syslog
/mnt/log/SN-2/local/debug

node=< logging node name >
Example:
node=LN-0

1. Configuration entry
The name and value of the attribute that is out of order
under the fssnmpMediatorName=1, fsFragmentId=SNMP,
fsClusterId=ClusterRoot branch.
1. IP address of the SNMP agent that does not
respond.

1. IP address of the SNMP agent that had sent the trap 1. Version of the used SNMP, possible values
are:
SNMPv1
SNMPv2c
1. IP address
2. Object identifier of the received trap
The trap was generated because this IP address entity
had an incorrect community string.

1. IP address of the switch
2. Hardware platform.
Contains the string value identifying the hardware platform, as given in the environment variable $HW_PLATFORM in the running cluster.
3. Type of the original SNMP trap.
Contains the string value "coldStart" or "warmStart".

1. Identifies which port has changed the state to down in the Ethernet device.
2. IP address of the switch:
The IP address of the Ethernet switch that sent the simple network management protocol (SNMP) trap.
3. Type of the interface:
Depending on the switch type, this field may not be present.
4. Textual description of the interface:
Depending on the switch type, this field may not be present.
5. String value of the interface administrative state:
The possible values are "up", "down" or "unknown". Depending on the switch type, this field may not be present.
6. String value of the interface operational state:
The possible values are "up", "down" or "unknown". Depending on the switch type, this field may not be present.

The format of the backup log file name is:
NetworkElementName_<full|partial>_backup_YYYYMMDD_hhmm.log

1. IP address of the switch
2. fabricBroadcastControlGroupConditions
A 32-bit counter which denotes the number of broadcast storm control conditions that have been detected.
3. fabricBroadcastControlRxFrameDiscards
A 32-bit counter which denotes the amount of frames discarded due to broadcast storm control.
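The backup log file name format given above (NetworkElementName_<full|partial>_backup_YYYYMMDD_hhmm.log) can be checked and decomposed with a regular expression. This is a sketch under the assumption that the date and time parts are always numeric, as the placeholders suggest; the function name parse_backup_log_name and the network element name "CFZC-1" in the usage note are invented for illustration.

```python
import re
from datetime import datetime

# NetworkElementName_<full|partial>_backup_YYYYMMDD_hhmm.log
BACKUP_LOG = re.compile(
    r"^(?P<ne>.+)_(?P<kind>full|partial)_backup_(?P<date>\d{8})_(?P<time>\d{4})\.log$"
)


def parse_backup_log_name(name):
    """Return (network element name, backup kind, timestamp) for a backup log file name."""
    match = BACKUP_LOG.match(name)
    if match is None:
        raise ValueError(f"not a backup log file name: {name!r}")
    timestamp = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M")
    return match["ne"], match["kind"], timestamp
```

For a hypothetical file name such as "CFZC-1_full_backup_20171024_0930.log", the helper returns the element name CFZC-1, the kind full, and the timestamp 2017-10-24 09:30.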
1. Name of the application mount point.
2. Name of the broken partition or logical
volume

1. Name of the application mount point. This


identifies the application that uses the DRBD.
2. Name of the DRBD partition or logical
volume that is not synchronising.
1. Trivial File Transfer Protocol (TFTP) failure.
2. Switch issue.
3. Falling back on default fabric configuration.
4. Unable to get <filename> over TFTP.
5. Unable to put <filename> over TFTP.
6. Failed to spawn monitoring script on the Switch.
7. Failed to create monitoring script on the Switch.
8. Lock file not created by IMI even after timeout.
9. Lock file not deleted by IMI even after timeout.
10. Switch configuration taking more time than expected.
11. Switch configuration file unavailable.
12. Switch configuration validation failure.
13. Switch configuration application failure.

1. The internal temperature of the unit in degrees Celsius.

1. Switch IP address.
2. High limit in percent of normal CPU utilization.
3. The current level in percent of CPU utilization.

1. Image file name.
2. IP address of the TFTP server.

1. Switch IP address.
2. Hardware platform: Contains the string value identifying the hardware platform as given in the environment variable $HW_PLATFORM in the running cluster.
1. Switch IP address. 2. Type of the original trap. Contains the string
3. Contains the string value
value "portErrorsExceeded",
"memoryOverThreshold"
"portsBroadcastExceeded",
"portsCRCErrExceeded",
4. Current threshold limit.
"portsRuntsExceeded"
1. Position of the affected unit. For example: 2. Type of the affected unit.
"portsOverSizeExceeded" or "ATCA
/chassis-1/slot-4. 3.
5. HotSwap state of the
Memory utilization affected unit. Example:
level.
portsInvalidLogin".
HS=INACTIVE.
3. High limit in percent of exceeding port error.
1. Name of the affected unit.
2. Position of the affected unit.
3. Error type.

1. Name of the affected unit.


2. Position of the affected unit.
3. CPU error type.
1. Type of the affected FRU. 4. Sensor name.
2. Position of the affected FRU. 5. Sensor reading.
3. Sensor number. 6. Threshold breached.

1. Name of the affected unit.


2. Position of the affected unit.
3. Error type.

Note: The possible error types are listed in the meaning of the alarm.

1. Name of the affected unit.
2. Position of the affected unit.
3. Error type.

1. Type of the affected FRU. 4. Sensor name.


2. Position of the affected FRU. 5. Sensor reading.
3. Sensor number. 6. Threshold breached.

1. Name of the affected unit.


2. Position of the affected unit.
3. Violation type.

1. Type of the affected FRU. 4. Sensor name.


2. Position of the affected FRU. 5. Sensor reading.
3. Sensor number. 6. Threshold breached.

1. Type of the affected FRU


2. Position of the affected FRU
3. Error type

1. Name of the affected unit.


2. Position of the affected unit.
3. Error type.

1. Type of the affected FRU. 4. Sensor name.


2. Position of the affected FRU. 5. Sensor reading.
3. Sensor number. 6. Threshold breached.

1. The affected shelf manager. 2. Role of the shelf manager.

1. Type of the inserted unit.


2. Type of the target (intended) unit.
3. Position of the inserted unit, for example: chassis-0 /
physical_slot-3 / fru-3.
1. Type of the affected FRU. 4. Sensor name.
2. Position of the affected FRU. 5. Sensor reading.
3. Sensor number. 6. Threshold breached.

1. Possible failure cause:

AAI_SNM_INIT_CONFIG_NOK: This event is raised when the IP address of the SNM is unavailable during SNM startup. This can happen because no IP address is assigned to the SGWNetMgr (SNM) in the /etc/hosts file.

AAI_SNM_SERVER_IPPORT_NOK: This event is raised when the IP address and port configured for the SNM are unavailable during SNM startup. This could happen if another program is using the same IP address and port combination.

AAI_PM_ERR_RESPONSE: This event is raised when the PMHandler (the entity that reads configuration data from the Configuration Directory) sends a configuration-read error to the SNM.

AAI_SLM_STACKHNDLR_SERVER_IPPORT_NOK: This event is raised when the Stack Handler within an SLM is unable to create a server using the IP address and port configured in the Configuration Directory.

AAI_MTP3_SM_INIT_REDN_FAILED: This event is raised when the redundancy-related data initialization has failed.

AAI_M3UA_INIT_FAILED: This event is raised when initialization of the M3UA has failed.

AAI_M3UA_ADD_SGP_FAILED: This event is raised when the addition of the Signaling Gateway Process (SGP) has failed.

AAI_M3UA_SET_TRACE_FAILED: This event is raised when the enabling of the trace levels at the M3UA has failed.

AAI_M3UA_ADD_REM_AS_FAILED: This event is raised when the addition of the Remote Application Server (AS) has failed.

AAI_M3UA_ADD_REM_ASP_FAILED: This event is raised when the addition of the Remote Application Server Process (ASP) has failed.
<Remote ASP Id>
The range of "Remote ASP Id" is 1 to 200.

AAI_M3UA_CONFIG_REM_ASP_FAILED: This event is raised when modification of the remote application server process has failed.

AAI_M3UA_ADD_LOCAL_ASP_FAILED: This event is raised when the addition of the Local

AAI_SCCP_USER_OUTOFSERVICE:
SCCP: SAP=<value>,Subsystem=<value>,PointCode=<value>
The range of "SAP" is from 1 to 254.
The range of "Subsystem" is from 2 to 255.
The range of "PointCode" is from 1 to 16777215.

AAI_SCCP_USER_OUTOFSERVICE: This event is raised when the remote SCCP subsystem that is identified by the SAP, subsystem number and point code in the field "Identify application additional information" can no longer receive or send messages.

1. AAI_MTP3_PC_CONGESTED:
SapId=<value>, PointCode=<value>
The range of "SapId" is from 1 to 254.
The range of "PointCode" is from 1 to 16777215.
2. AAI_MTP3_DPC_CONGESTED:
SapId=<value>, PointCode=<value>
The range of "SapId" is from 1 to 254.
The range of "PointCode" is from 1 to 16777215.

The congestion level for the point code is also mentioned along with the event type. The range for "CongestionLevel" is from 0 to 3.
1. AAI_MTP3_EVENT_PC_CONGESTED,CongestionLevel=<value>: This event is raised when the MTP3 own point code is congested.
2. AAI_MTP3_EVENT_DPC_CONGESTED,CongestionLevel=<value>: This event is raised when a particular destination point code is congested.

Note: For some of the application additional information fields (AAI), there will not be a corresponding Identifying application additional information fields (IAAI) information. Only the events having IAAI information are listed below:
1. AAI_EMTP3_UNEXP_SLTA_RECV:
<Error_code>
The value of error code "977".
2. AAI_EMTP3_CONG_LEVEL_OUT_OF_RANGE:
<Error_code>
The value of error code "778".
3. AAI_EMTP3_INVALID_HEADING_CODE:
<Error_code>
The value of error code "799".
4. AAI_EMTP3_TFC_RECIEVED_WITHOUT_CONG_PRIORITY:
<Error_code>
The value of error code "1050".
5. AAI_EMTP3_TFC_NOT_SUPPORTED_IN_INTERNATIONAL:
<Error_code>
The value of error code "1051".
6. AAI_EMTP3_INVALID_HEAD_CODE_FOR_NM_MESG:
<Error_code>
The value of error code "1052".
7. AAI_EMTP3_INVALID_SIO:
<Error_code>
The value of error code "783".
8. AAI_EMTP3_SLTC_MSG_FOR_REM_DPC:
<Error_code>
The value of error code "972".

1. AAI_EMTP3_UNEXP_SLTA_RECV: This event is raised when the signaling link test message (SLTM) or signaling link test acknowledgement (SLTA) is enabled and an SLTA message is received before sending an SLTM.
2. AAI_EMTP3_CONG_LEVEL_OUT_OF_RANGE: This event is raised when the received message contains an invalid value for the congestion level.
3. AAI_EMTP3_INVALID_HEADING_CODE: This event is raised when the received message contains an invalid heading code.
4. AAI_EMTP3_TFC_RECIEVED_WITHOUT_CONG_PRIORITY: This event is raised when the transfer control message received from the peer node does not have the congestion priority.
5. AAI_EMTP3_TFC_NOT_SUPPORTED_IN_INTERNATIONAL: This event is raised when the transfer controlled message is received and SGW is configured with an International standard while the peer node is configured with a National standard.
6. AAI_EMTP3_INVALID_HEAD_CODE_FOR_NM_MESG: This event is raised when the received MTP3 Network Management Message contains an invalid heading code.
7. AAI_EMTP3_INVALID_ROUTE_ID: This event is raised when there is a discrepancy in the MTP3 routing tables.
8. AAI_EMTP3_INVALID_LINKSET_ID: This event is raised when the linkset ID in the received message is invalid.

1. Possible failure cause:
AAI_MTP3_LINK_DOWN: This event is raised when the MTP3 link has gone down and cannot handle any incoming or outgoing traffic.
LinkId=<value>
The range of "LinkId" is from 1 to 720.

1. Possible failure cause:
AAI_MTP3_EVENT_PC_INACCESSIBLE: This event is raised when a signaling point code configured for the own signaling gateway at Message Transfer Part 3 (MTP3) stack has become inaccessible.
SapId=<value>, PointCode=<value>
The range of "SapId" is from 1 to 254.
The range of "PointCode" is from 1 to 16777215.

AAI_MTP3_ROUTE_DOWN: This event is raised when all the links used for the signaling route have become out-of-service. It can also indicate that Signaling Gateway (SGW) has received a Transfer Prohibited (TFP) message from a remote Point Code (PC), which is reachable through an adjacent Signaling Transfer Point (STP).
RouteId=<value>
The range of "RouteId" is from 1 to 4096.

1. Broken Telnet connection on Switch type.

4. The field "failureType" can contain one of the following values:
CoreCrashed: The Digital Signal Processor core has crashed.
ConnectionLost: The connection between the LMP and the Digital Signal Processor core is lost.
InternalDSPError: Some internal error-related to the internal interfaces is detected.
InternalDSPFatalError: Some internal fatal error-related to the internal interfaces is detected.
StartupFailed: The core did not start up within the specified timeout after being unlocked.
FaultyCores: The CPU contains some faulty cores.
GeneralAPIError: Some of the used interfaces which are used to initiate an operation to the Digital Signal Processor core have failed.

AAI_MTP3_EVENT_DPC_INACCESSIBLE: This event is raised when the Signaling Destination PC (for example, RAN PC), which is identified by the point code in the alarm, has become inaccessible and the SS7 Destination PC can no longer handle the signaling traffic.
SapId=<value>, PointCode=<value>
The range of "SapId" is from 1 to 254.
The range of "PointCode" is from 1 to 16777215.

1. The configured out-of-sync threshold value.

2. Total number of cores in the blade.

1. Type of the affected unit.
2. Error type (= asserted sensor offset).
3. Position of the affected unit.
4. Sensor number.
5. Error Description. For example: Base Link Down

For example: Unit={BS2AM-A} ErrorType={01} Position=/chassis-1/AMC-2 Sensor={number=186}
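Several rows above give the IAAI as comma-separated Name=<value> pairs with a stated range for each value. The following is a minimal sketch of how such a string could be split and checked against those ranges. The helper names `parse_iaai` and `out_of_range` are hypothetical, the sample strings are illustrative, and it is an assumption that the values appear as plain decimal integers.

```python
# Value ranges as stated in the rows above.
IAAI_RANGES = {
    "SAP": (1, 254),
    "SapId": (1, 254),
    "Subsystem": (2, 255),
    "PointCode": (1, 16777215),
    "LinkId": (1, 720),
    "RouteId": (1, 4096),
    "CongestionLevel": (0, 3),
}


def parse_iaai(text):
    """Split 'Name=value, Name=value' into a dict of strings."""
    fields = {}
    for part in text.split(","):
        key, sep, value = part.strip().partition("=")
        if sep:  # ignore fragments that carry no '=' sign
            fields[key.strip()] = value.strip()
    return fields


def out_of_range(fields):
    """Return the names of known fields whose value is not a decimal
    integer inside the documented range."""
    bad = []
    for key, value in fields.items():
        limits = IAAI_RANGES.get(key)
        if limits is None:
            continue  # field without a documented range
        low, high = limits
        if not value.isdigit() or not low <= int(value) <= high:
            bad.append(key)
    return bad


if __name__ == "__main__":
    sample = parse_iaai("SapId=3, PointCode=1234")  # illustrative values
    print(sample, out_of_range(sample))
```

Fields that have no documented range are passed through unchecked, so the same helper can be reused for IAAI strings that carry additional names.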
NA

1. Blade self test description (bladeSelfTestDescn) of failed or pending tests.
2. Blade self test status (bladeSelfTestStatus): pass (1), fail (2), pending (3).
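The three bladeSelfTestStatus values listed above map naturally onto an enumeration. The following is a small sketch; the class name `BladeSelfTestStatus` and the helper `needs_attention` are hypothetical and only illustrate how a receiver of this field might decode it.

```python
from enum import IntEnum


class BladeSelfTestStatus(IntEnum):
    """Numeric values of bladeSelfTestStatus as listed above."""
    PASS = 1
    FAIL = 2
    PENDING = 3


def needs_attention(status_value):
    """True for the failed or pending states that the description field reports."""
    return BladeSelfTestStatus(status_value) is not BladeSelfTestStatus.PASS


if __name__ == "__main__":
    for raw in (1, 2, 3):
        print(raw, BladeSelfTestStatus(raw).name, needs_attention(raw))
```

Any value outside 1 to 3 raises ValueError, which makes an unexpected status visible instead of silently treating it as a pass.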
1. VrfId.
2. Owner of the session - The possible value is: BFDMgr - owner is BFD Manager.
3. Source address of the session.
4. Destination address of the session.
5. Hop type - The possible value is M - MultiHop.
6. Reference ID

Faulty delivery <delivery name>
Bootcount limit <bootcount_limit>

Autoreturn delivery <delivery_name>
Autoreturn reason <autoreturn_reason>

1. Class of the memory error occurred (correctable/uncorrectable)
2. Type and ID of the affected memory module (DIMM-ID/cache)
Example1:
Class= correctable
Affected memory= DIMM-2
Example2:
Class= uncorrectable
Affected memory= cache (L2D)

1. Error rate for correctable type
2. Affected memory location for uncorrectable and correctable type
Example1:
Error rate= 20/ 24h
Affected location= DIMM ID=2 , Channel ID= 3
Example2:
Affected location= core: 5 address: 477e5 syndrome0: 94 syndrome1: 0

1. Name of the affected unit
2. Position of the affected unit
3. Sensor name
4. Sensor Type

1. Volume group name
This field shows the threshold value which has been surpassed, and thus the alarm has been raised.
The format of the field is "Volume group <volume group name> > <threshold value> % full", e.g. "Volume group VG_CLA-0_FP5LYNX > 70% full".

1. Volume group name
This field shows the threshold value which has been surpassed, and thus the alarm has been raised.
The format of the field is "Volume group <volume group name> > <threshold value> % full", e.g. "Volume group VG_CLA-0_FP5LYNX > 85% full".

1. Name of the affected unit.
2. Position of the affected unit.
3. Sensor name.
For example: Unit={BCNMB-A} Position=/chassis-2/motherboard-1 Sensor={number=75,Name=System Notify}
4. This field shows that:
a. If the total available free memory (MemFree) is less than the lower threshold value, this will raise the alarm "MEMORY USAGE OVER LIMIT".
b. If the total available free memory (MemFree) is greater than the upper threshold value, this will raise the alarm "MEMORY USAGE WITHIN LIMIT".

1. ResourceType:
This defines the resource type which is utilized by the SCCP instance to the maximum capacity. The possible value is CONNECTION_CONTROL_BLOCK.
2. RUNameDetectingFailure:
This defines the recovery unit name which identifies that the SCCP instance is using resources to the maximum configured capacity.
For example, the IAAI will look like:
ResourceType=CONNECTION_CONTROL_BLOCK,RUNameDetectingFailure=<SCCP RU>
For eg.
ResourceType=CONNECTION_CONTROL_BLOCK,RUNameDetectingFailure=/CLA-1/FSSCCPSGUServer-1
Note: The "SCCP RU" name is deployment specific. For
3. Idle=<Value>:
This defines the number of signaling connections in idle state.
4. Starting=<Value>:
This defines the number of signaling connections in starting state, connections for which Connection Request (CR) message are sent but the Connection Confirmed (CC) message was not received.
5. Established=<Value>:
This defines the number of signaling connections in established state, connections for which Connection Confirmed (CC) message are received in response to the Connection Request (CR) message.
6. Closing=<Value>:

1. ResourceType:
This defines the resource type which exceeds the threshold. The possible values are:
a. CONNECTION_CONTROL_BLOCK
b. SLRN
2. RUNameDetectingFailure:
This defines the recovery unit name which identifies that there is a resource threshold exceeding condition.
3. SignalingPointcodeName
This is only applicable for the resource type "SLRN". This defines the signaling point code name for which the SLRN range has been exhausted.
For CONNECTION_CONTROL_BLOCK:
4. Idle=<Value>:
This defines the number of signaling connections in idle state.
5. Starting=<Value>:
This defines the number of signaling connections in starting state, connections for which Connection Request (CR) messages are sent but the Connection Confirmed (CC) message is not received.
6. Established=<Value>:
This defines the number of signaling
1. DisturbanceID:
This defines the disturbance type in the signaling stack as mentioned in the "Meaning of the alarm" section. The possible values are:
a. RLC_FAILURE
b. IAR_EXPIRED
c. INVALID_IT_MSG: The signaling connection is released due to an invalid Inactivity (IT) message. The possible reasons for invalid IT message could be due to SLRN not matching, DLRN not matching, inconsistent connection control block (CCB) data or protocol class not matching the connection section. The specific situation can be identified using the "Additional Information" field from the Application Additional Information field section.
d. ERR_MSG
e. ROUTE_FAILURE
f. UNEXPECTED_CONN_MSG
2. RUDetectingFailure:
This defines the recovery unit name which identified that there are signaling disturbances as mentioned in the "Meaning of the alarm".
3. SAP:
This represents the logical network where the signaling disturbances were observed.
4. OPC:
This represents the local signaling point code name to which the signaling disturbances were observed.
5. LocalSSN:
This represents the local signaling subsystem name to which the signaling disturbances were observed.
6. DPC:
This represents the remote signaling point code name to which the signaling disturbances were observed.
7. RemoteSSN:
This represents the remote signaling subsystem name to which the signaling disturbances were observed.
8. AdditionalInfo:
This specifies additional information which is included to assist in debugging.
For INVALID_IT_MSG case, the possible values are:
a. LRN_MISMATCH - This specifies that the incoming SLRN does not match with the locally stored DLRN or incoming DLRN does not match with the locally stored SLRN in the corresponding connection control block (CCB) at the SCCP stack.
b. INCONSISTENT_CCB_DATA: This specifies that the connection control block (CCB) data is either corrupted or inconsistent.
c. PROTO_CLASS_MISMATCH - This specifies that the protocol class does not match in the connection section.

1. Protocol
This defines the SCTP protocol payload identifier. The possible value is M3UA, which indicates the functionality specified in RFC4666.
2. AssociationID
This indicates the association identifier.

1. RUDetectingFailure:
This defines the recovery unit name which identified that there is a drop in the overall success rate of the signaling connection establishment.
2. ConnectionAttempts:
This defines the total number of signaling connection attempts made, including both the incoming connections and outgoing signaling connections, within the signaling instance.
3. ConnectionSuccess:
This defines the total number of signaling connections successfully established, including both the incoming and outgoing signaling connections, within the signaling instance.
4. ConnectionFailed:
This defines the total number of signaling connections failed or refused, including both the incoming and outgoing signaling connections, within the signaling instance.
5. ConnectionStarting:
This defines the total number of signaling connections in the starting state, including both the incoming and the outgoing signaling connections, within the signaling instance. The starting state of the signaling connection indicates that the connection has not yet been successfully established nor it has failed, but this means that the connection establishment phase is still in-progress.
6. MinConnectionAttempts:
This defines the minimum signaling connections that must be attempted within the signaling instance to raise the alarm if there is a reduced signaling connection establishment success rate as specified in the "Meaning of the alarm" section.

1. ObjectID:
This indicates the object identifier for which the configuration mismatch between the configuration database and the signaling service has been identified after successful activation of the signaling object. The possible values are:
ASSO - This indicates M3UA association object
ISASSO - This indicates ISDN association object
2. InstanceID:
This indicates the instance identifier of the object defined in the "ObjectID".
For Object - ISASSO, instance Identifier will be a combination of the Remote AS and Remote ASP.
3. ConfiguredParams:
This indicates the parameter and its associated values that are configured/administered at the local network element.
4. ActivatedParams:
This indicates the parameter and its associated values that are activated in the signaling object which is different from the configured values.

1. Node name
2. Component type (AMPP2-A, ACPI5-A, ADSP2-A)
3. Position
4. Power throttle applied: Throttle level and the percentage (0 to 100).
5. Name of the sensor, if the power throttling action is due to the sensor's temperature.
6. Temperature of the sensor (in degree Celsius), if the power throttling action is due to the sensor's temperature.
7. If power throttling action is due to thermal controller process exit, the information in items 5 and 6 will not be shown (since the cause for the throttling action is not the sensor temperature). Instead, the text "fallback, thermal controller exits possibly due to manual intervention" will be displayed. This means the additional information in items 5 and 6 are mutually exclusive with the additional information in item 7.

1. Protocol: This defines the SCTP protocol payload identifier. The possible values are:
M3UA: This indicates the functionality specified in RFC4666.
IUA: This indicates the functionality specified in RFC3057-BIS-01.
2. Association Identifier: This indicates the association identifier.
3. Destination Identifier: This indicates the unreachable destination IP address and thus resulting into path failure.

1. Protocol: This defines the SCTP protocol payload identifier. The possible values are:
M3UA: This indicates the functionality specified in RFC4666.
IUA: This indicates the functionality specified in RFC3057-BIS-01.
2. Remote Application Server Identifier: This indicates the Remote Application Server name.

1. FailureType: This defines the failure type as mentioned in the "Meaning of the alarm" section. The possible values are:
a. LM_CONNECTION_FAILURE
b. CONFIG_DB_FAILURE
c. SCCP_ACTIVE_STANDBY_SYNCUP_FAILURE
d. DISTRIBUTED_STACK_SYNCUP_FAILURE
e. LM_STACK_CONNECTION_FAILURE
f. INSUFFICIENT_RESOURCES
g. STACK_MESSAGE_QUEUE_CONGESTION
2. Failure detecting RU Name: This defines the recovery unit name which identified the failure specified in the Additional information field ("FailureType") section.
This field is optional and is used when the failure is one of the following:
a. "SCCP_ACTIVE_STANDBY_SYNCUP_FAILURE"
b. "DISTRIBUTED_STACK_SYNCUP_FAILURE"
3. Partner RU name: This is to indicate the partner RU name which has communication failure to the RU name specified in the Additional information field ("Failure detecting RU Name") section.
This field is optional and is used when the failure is one of the following:
a. "SCCP_ACTIVE_STANDBY_SYNCUP_FAILURE"
b. "DISTRIBUTED_STACK_SYNCUP_FAILURE"
The name of the RU is deployment dependent. For example, the IAAI will look like:
4. FaultRelatedInfo: This field will carry the effects of the error conditions for which the alarm is reported.
5. ErrorCode:
a. SLM_SNM_TCP_SEND_FAILURE: The alarm is raised with ErrorCode as SLM_SNM_TCP_SEND_FAILURE when the TCP connection failure happens between the SLM and SNM.
b. SNM_SLM_TCP_SEND_FAILURE: The alarm is raised with ErrorCode as SNM_SLM_TCP_SEND_FAILURE when the TCP connection failure happens between the SNM and SLM.
c. SLM_SNM_DISCONNECT: The alarm is raised with ErrorCode as SLM_SNM_DISCONNECT when the SLM detects a connection breakage with the SNM (Signaling Network Manager).
d. ACTIVATOR_SNM_DISCONNECT: The alarm is raised with ErrorCode as ACTIVATOR_SNM_DISCONNECT when the Activator detects a connection breakage with the SNM (Signaling Network Manager).
e. ACTIVATOR_SNM_TCP_SEND_FAILURE: The alarm is raised with ErrorCode as ACTIVATOR_SNM_TCP_SEND_FAILURE when the TCP connection failure happens

1. Object identifier: This indicates the object identifier which failed during activation. The possible values are:
SAP - This indicates service access point object.
ASSO - This indicates M3UA association object.
LCLAS - This indicates local AS object.
RMTAS - This indicates remote AS object.
GTRSLT - This indicates GT result object.
GTRULE - This indicates GT Rule object.
SCCPTP - This indicates SCCP Timer profile object.
CSPC - This indicates Concerned point code object.
CSSN - This indicates Concerned subsystem object.
SCCOPC - This indicates signaling own point code object.
SCCDPC - This indicates signaling destination point code object.
SSN - This indicates SCCP subsystem object.
MTPOPC - This indicates MTP own point code object.
MTPDPC - This indicates MTP destination point code object.
MTPLINK - This indicates signaling link object.
LNKSET - This indicates linkset object.
ROUTSET - This indicates route set object.
MTPTP - This indicates MTP timer profile object.
DCH - This indicates D-Channel object.
ISASSO - This indicates ISDN association object.
2. Instance Identifier: This indicates the instance identifier of the object defined in the "Object identifier" which failed
3. API identifier: This indicates the API which failed at the stack.
4. Error codes: This indicates the error code/reason as to why the stack failed the activation.

Alarm identifying generic information - the possible values are:
1. netconsole: hostname - for example: CLA-0

Reason for failure - the possible values are:
1. netconsole:
a. "The netconsole configuration file is not available";
b. "The netconsole peer server address has not been specified in the netconsole configuration file";
c. "Cannot resolve the MAC address of the specified netconsole peer server";
d. "The netconsole module is not available or there are invalid entries in the netconsole configuration file".

1. Protocol: This defines the SCTP protocol payload identifier. The possible values are:
M3UA: This indicates the functionality specified in RFC4666.
IUA: This indicates the functionality specified in RFC3057-BIS-01.
2. AssociationId: This indicates the association identifier.
3. State Machine: This indicates the machine state of the association inside the protocol specified in the first identifying additional information field ("Protocol"). The possible values are:
ASSOC_STATE_DOWN: This indicates that the association status is "connection_down".
ASSOC_STATE_INACTIVE: This indicates that the association status is "asp_state_inactive".

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name
1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name
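The sensor identity field above is described as "hw-component type/chassis id/port id/sensor name". The following is a minimal sketch of splitting such a value into its four parts. It assumes the parts are separated by "/" exactly as written in that description. The names `SensorIdentity` and `parse_sensor_identity` and the sample value are hypothetical and do not come from the product.

```python
from typing import NamedTuple


class SensorIdentity(NamedTuple):
    hw_component_type: str
    chassis_id: str
    port_id: str
    sensor_name: str


def parse_sensor_identity(text):
    """Split 'hw-component type/chassis id/port id/sensor name' into its parts.

    Only the first three '/' characters are treated as separators, so a
    sensor name that itself contains '/' is kept whole.
    """
    parts = text.split("/", 3)
    if len(parts) != 4:
        raise ValueError("expected four '/'-separated parts: %r" % (text,))
    return SensorIdentity(*(part.strip() for part in parts))


if __name__ == "__main__":
    # Hypothetical value, used only for illustration.
    print(parse_sensor_identity("BCNMB-A/1/3/Door Sensor"))
```

Limiting the split to three separators is a design choice made for this sketch; if the real field never contains "/" inside the sensor name, a plain split would serve equally well.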
1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name

1. Input id - the external alarm input raising the alarm.
2. Operator description - description of the Ext Alarm input as configured using the ExtAlarm SCLI parameter "input description".
3. Sensor identity - identity of the sensor raising the alarm.
hw-component type/chassis id/port id/sensor name
1. Input id - the external alarm input raising the alarm. 2. Operator description - description of the Ext
Alarm input as configured using the ExtAlarm
SCLI parameter "input description".

3. Sensor identity - identity of the sensor raising


1. Input id - the external alarm input raising the alarm. 2. Operator description - description of the Ext
the alarm.
Alarm input as configured using the ExtAlarm
hw-component type/chassis id/port id/sensor
SCLI parameter "input description".
name
3. Sensor identity - identity of the sensor raising
1. Input id - the external alarm input raising the alarm. 2. Operator description - description of the Ext
the alarm.
Alarm input as configured using the ExtAlarm
hw-component type/chassis id/port id/sensor
SCLI parameter "input description".
name
3. Sensor identity - identity of the sensor raising
1. Input id - the external alarm input raising the alarm. 2. Operator description - description of the Ext
the alarm.
Alarm input as configured using the ExtAlarm
hw-component type/chassis id/port id/sensor
SCLI parameter "input description".
name
3. Sensor identity - identity of the sensor raising
1. Input id - the external alarm input raising the alarm. 2. Operator description - description of the Ext
the alarm.
Alarm input as configured using the ExtAlarm
hw-component type/chassis id/port id/sensor
SCLI parameter "input description".
name
3. Sensor identity - identity of the sensor raising
1. Input id - the external alarm input raising the alarm. 2. Operator description - description of the Ext
the alarm.
Alarm input as configured using the ExtAlarm
hw-component type/chassis id/port id/sensor
SCLI parameter "input description".
name
3. Sensor identity - identity of the sensor raising
1. Input id - the external alarm input raising the alarm. 2. Operator description - description of the Ext
the alarm.
Alarm input as configured using the ExtAlarm
hw-component type/chassis id/port id/sensor
SCLI parameter "input description".
name
3. Sensor identity - identity of the sensor raising the alarm.
   hw-component type/chassis id/port id/sensor name

1. SigObjectID:
   This indicates the signaling object identifier which changed status frequently.
   The possible values are:
   MTPLINK: This indicates the signaling link's status.
   LINX_CONN: This indicates the LINX link's status between MTP3 and DSP MTP2 stacks fluctuates.
   DPC_CONG: This indicates the DPC congestion status fluctuates.
2. ObjectFluctuationThreshold:
   This indicates the threshold value used to determine the fluctuation in the signaling object
   status. On expiry of timer "CriticalAlarmTimer", this alarm will be reported if the fluctuation
   remains equal or above threshold value.
3. CriticalAlarmTimer:
   This indicates the timer value to determine if the object fluctuation threshold as defined in
   "ObjectFluctuationThreshold" has been reached or exceeded.

1. DCH_ID: This indicates D-channel identifier.
   Sample Output:
   DCH_ID:1
2. ErrorCode: This indicates the error code/reason as to why the D-channel failed.
   The possible values are:
   a. DCHANNEL_STARTUP_FAILED: The admin state of the D-channel is set to "enabled" but the
      d-channel is unable to move into "Available" functional state.
   b. DL_RELEASE_INDICATION: The D-channel functional state is moved to "Unavailable" because
      DL-RELEASE indication is received from the lower layer.
   c. DL_RELEASE_CONFIRM: The D-channel functional state is moved to "Unavailable" because
      DL-RELEASE confirmation is received from the lower layer.
   d. DSP_DOWN,DSP_ID=<dsp_id value>: The D-channel functional state is moved to "Unavailable"
      because the serving DSP core is unavailable.

   Sample Output:
   ErrorCode:DCHANNEL_STARTUP_FAILED

1. Type of the affected hardware component
2. Position of the affected hardware component
3. Total event rate/minute
4. Top component event rate/minute
5. Total event rate threshold

1. Reason for failure: The possible values are:
   a. Installation Failure
   b. Activation Failure

Switch configuration mismatch

FI configuration out of sync
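
The DCH_ID and ErrorCode fields shown in the sample output above can be read mechanically from the alarm text. The following is a minimal sketch, assuming each field arrives as a separate NAME:VALUE line exactly as in the sample output. The helper names dch_field and dch_error_meaning are hypothetical and are not part of the product.

```sh
#!/bin/sh
# Hypothetical helper: print the value of field $1 from NAME:VALUE lines on stdin.
# Assumes the "DCH_ID:1" / "ErrorCode:..." layout of the sample output.
dch_field() {
    awk -F: -v name="$1" '$1 == name { sub(/^[^:]*:/, ""); print; exit }'
}

# Hypothetical helper: map an ErrorCode value to a short form of its listed meaning.
dch_error_meaning() {
    case "$1" in
        DCHANNEL_STARTUP_FAILED) echo "admin state enabled, D-channel cannot reach Available" ;;
        DL_RELEASE_INDICATION)   echo "DL-RELEASE indication received from the lower layer" ;;
        DL_RELEASE_CONFIRM)      echo "DL-RELEASE confirmation received from the lower layer" ;;
        DSP_DOWN,DSP_ID=*)       echo "serving DSP core ${1#DSP_DOWN,DSP_ID=} is unavailable" ;;
        *)                       echo "unknown error code" ;;
    esac
}

sample='DCH_ID:1
ErrorCode:DCHANNEL_STARTUP_FAILED'
code=$(printf '%s\n' "$sample" | dch_field ErrorCode)
echo "D-channel $(printf '%s\n' "$sample" | dch_field DCH_ID): $(dch_error_meaning "$code")"
```

The DSP_DOWN case keeps the DSP identifier so that the affected core can be quoted in a problem report.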
1. Type of the affected unit.
2. Error type.
3. Position of the affected unit.
4. Sensor number.
   For example: Unit={BS2AM-A} ErrorType={01} Position=/chassis-1/AMC-2 Sensor={number=186}
5. Error description
   For example: ToP Lock Fail

1. Type of the affected unit.
2. Error type.
3. Position of the affected unit.
4. Sensor number.
   For example: Unit={BS2AM-A} ErrorType={02} Position=/chassis-1/AMC-2 Sensor={number=174}
5. Error Description
   For example: OS initiated hard reset

1. Type of the affected unit.
2. Error type (= asserted sensor offset).
3. Position of the affected unit.
4. Sensor number.
   For example: Unit={BS2AM-A} ErrorType={02} Position=/chassis-1/AMC-2 Sensor={number=186}
5. Error Description
   For example: HWM process failure

1. Type of the affected unit.
2. Error type (= asserted sensor offset).
3. Position of the affected unit.
4. Sensor number.
   For example: Unit={BS2AM-A} ErrorType={06} Position=/chassis-1/AMC-2 Sensor={number=186}
5. Error Description
   For example: SYNC default config loading error

1. Fibre channel switch module address
2. Switch Status
   The table lists the values of the Switch Status.
   For detailed information please see the Switch user guide.
   Value  Meaning
   1      Unknown
   2      Unused
   3      OK
   4      Warning
   5      Failed
3. Switch State
   The table lists the values of the Switch State.
   For detailed information please see the Switch user guide.
   Value  Meaning
   1      Unknown
   2      Online
   3      Offline

1. Fibre channel switch module address
2. Fibre channel port ID
1. Port Status
   The table lists the values of the Port Status.
   Value  Meaning
   1      Unknown
   2      Unused
   3      OK
   4      Warning
   5      Failure
   6      Notparticipating
   7      Initializing
   8      Bypass
2. Port State
   The table lists the values of the Port State.
   Value  Meaning
   1      Unknown
   2      Online
   3      Offline
   4      Bypassed
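
The numeric Port Status and Port State values in the tables above can be translated with a simple lookup. This is a minimal sketch; fc_port_status and fc_port_state are hypothetical helper names, and only the values listed in the tables are known here.

```sh
#!/bin/sh
# Hypothetical helper: translate a Port Status value using the table above.
fc_port_status() {
    case "$1" in
        1) echo "Unknown" ;;
        2) echo "Unused" ;;
        3) echo "OK" ;;
        4) echo "Warning" ;;
        5) echo "Failure" ;;
        6) echo "Notparticipating" ;;
        7) echo "Initializing" ;;
        8) echo "Bypass" ;;
        *) echo "undefined value: $1" ;;
    esac
}

# Hypothetical helper: translate a Port State value using the table above.
fc_port_state() {
    case "$1" in
        1) echo "Unknown" ;;
        2) echo "Online" ;;
        3) echo "Offline" ;;
        4) echo "Bypassed" ;;
        *) echo "undefined value: $1" ;;
    esac
}

echo "Port Status 4 = $(fc_port_status 4), Port State 3 = $(fc_port_state 3)"
```

Values outside the tables are reported as undefined rather than guessed; the Switch user guide remains the reference for their meaning.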
Instructions

Fill in a problem report, and then send it to your local customer support.

Check the active alarms that are overflowing in the Network Element with
an alarm management application and correct
them according to their instructions. If this is not possible or does not clear
the alarm raised, fill in a Problem Report and
send it to your local customer support.
Perform the following steps to verify the state of the node:
1. Log into the controller node.
2. Check the state of the failing node VM from the compute node. For example, for node IB7-0:
> nova list --name IB7-0
3. The output in the previous step shows that IB7-0 node is available. If the node is powered off,
then after 30 minutes the high availability services (HAS) of the system attempt to restart the
failed node by issuing a restart. You can issue a restart manually if time is an issue. Use the
following SCLI commands for powering on and off.
For example: > nova reboot <nodeid> where <nodeid> is the id of IB7-0 node
4. If even this operation fails to bring the node up, contact your local customer support.

The alarm does not require any particular corrective actions if it is preceded by either
deliberate management action(s) or node failures. In the latter case separate alarms indicating
the node failures would have been raised.
If, however, the reason is that the recovery units have failed and system is not able to restart
the software, the problem is in the applications forming the load sharing group. Basically, there
could be several reasons why an application cannot be restarted and therefore it is difficult to
give exact or detailed instructions on how to deal with the situation.
The first thing to be checked is the availability statuses of the failed recovery units. If RU's
availability status has the value failed, system has already attempted to restart it with no
success. Note that it is also possible to try and restart the recovery unit manually, by using the
following SCLI command:
set has restart managed-object /AS-0/TestApplServer
If manual restart also fails, the log writings should be detected: a typical cause for failing
restart might be a missing configuration file, incorrect configuration file content or even lack
of some critical resources, e.g. amount of available physical memory.
Clearing:
Do not clear the alarm. The system clears the alarm automatically when the number of operational
RUs goes above the defined threshold.

Security log data must be checked. Notably investigate the login attempts that were made just
before the alarm was raised.

To avoid filling up database (DB), perform database-specific actions.
Contact your local customer support immediately, and provide them with the information you
received from the alarm-notification fields. This information depends on how the DB application
uses the file-system.
PostgreSQL server does not store any other data or files apart from DB data files. If the
application stores some other data or files, the application documentation informs which files can
be removed to free the disk space.
If the file-system is only used for the database, the actions must still be defined to free the
disk space.
Typical example is when applications insert new data, but never delete it. In such case,
disk-space full alarm will be raised at a certain time, and the applications will define a disk
clean-up procedure (for example, which tables can be deleted to free up table space). In customer
documentation, there should be a hint such as, which database instances are used from which
application.
Additionally, each application must provide a description on what actions need to be done if a
certain alarm occurs. The application needs to state what the database is used for and what
Recovery Groups (RGs) are used.
An attribute entry in the configuration directory for the DB-database should describe the
instructions of the application. This information should be stored in a text field in the
configuration directory.
Please verify that no other data than the directories db_data, db_socket, .wdstat and lost+found
are created at the working directory (for example /mnt/db/<dbname>, depends on deployment). Other
data will reduce the available space for the database.

To get detailed information on the measurement(s) that caused this alarm, enter the following
command:
show stats t-job all
show stats t-job id <id-value> where id-value is the threshold job id of the measurement.
Note that this is a user configured threshold rule.

Check if a new license is required, and install the new license if needed.

Fill in a problem report and send it to your local customer support.

1. To check the status of the license, execute the following SCLI command:
show license details unique-id <unique ID>
where, <unique ID> is the eight-character unique ID of the license to be checked.
2. To delete the expired license from the network element (NE), execute one of the following SCLI
commands:
delete license unique-id <unique ID>
where, <unique ID> is the eight-character unique ID of the expired license.
delete license from expired
The execution of the above command deletes all the expired licenses.
3. To install a new license into the NE, execute the following SCLI command:
add license file <file>
where, <file> is the license file name along with its absolute path.

1. Check from the Identifying application additional information field if this alarm report is for
overall CPU overload, or one core overload.
2. Execute the "top" Linux command on the node that reported the alarm.
This command provides a repetitive update of the processor activity in real time. It gives a list
of the most CPU-intensive tasks of the system.
For overall CPU load, follow the steps below to check the CPU usage at the process level:
a. Check the Application additional information field which processes use more CPU load.
b. Press "P" to sort the processes as per CPU utilization.
For one core overload, press "1" to show the separate state of all cores.
3. Collect the syslog.
4. If the problem persists, contact your local customer support and provide the information
gathered in the previous steps.

Disclaimer: The instructions below use either unsupported SCLI commands, or commands from the
unsupported full bash shell. Please carefully read the disclaimer that is shown when either
entering the unsupported SCLI vendor mode or the full bash shell. Do not use the commands in any
other context. Please check from the product documentation or from your local customer support
for more information.
1. Log in instructions:
a. Log in to the NE.
b. Switch to the root account (root privilege required).
USER@NODENAME [NE]> set user username root
Password
2. Check dynamically the file system's fullness on the node. The alarm is raised when the upper
threshold value is reached or exceeded. Use the instructions below to get the threshold value for
the node:

The following diagnosis command must be invoked by the operator, in order
1. Log into the cluster, and check that the named managed object has been successfully restarted.
2. Verify also that the MO did not raise any new alarms that would explain the failure. You can
check the status of a managed object with the following SCLI command: "show has state". An
operational MO has the value ENABLED in the operational state attribute, and has no value in the
procedural status attribute.
For example, the state of the process NodeDNS in the recovery unit FSNodeDNSServer of the node
AS-5 can be seen as follows:
show has state managed-object /AS-5/FSNodeDNSServer/NodeDNS
OBJECT                         ADMINISTRATIVE  OPERATIONAL  USAGE   ROLE    PROCEDURAL  DYNAMIC
/AS-5/FSNodeDNSServer/NodeDNS  UNLOCKED        ENABLED      ACTIVE  ACTIVE  -           -
If the MO is not operational, perform the following steps:
1. With a node MO, you can wait for a node to restart. The system will raise another alarm (70011
NODE NOT RESPONDING) if the node does not come up within a given time. Verify that an alarm for
the situation has been raised with the following SCLI command:
show alarm active filter-by specific-problem 70011
2. Check the journal (journalctl command on the active management node) for error(s) that have
occurred, by searching for the MO's name and/or by looking at events that occurred before this
alarm was raised.
3. You can also initiate an immediate restart attempt of the failed MO using the following SCLI
command:
"set has restart managed-object"
For example:
set has restart managed-object /AS-5/FSNodeDNSServer
The restart operation is mostly useful after a problem has been corrected.
Verify the result from the journal and by checking the status of the MO using the following SCLI
command:
"show has state".
4. Alarm for a recovery group implies a multiple error situation (for example, multiple node
failures) or a persistent configuration or corruption problem.

Disclaimer: The instructions below use either unsupported SCLI commands, or commands from the
unsupported full bash shell. Please carefully read the disclaimer that is shown when either
entering the unsupported SCLI vendor mode or the full bash shell. Do not use the commands in any
other context. Please check from the product documentation or from your local customer support
for more information.
1. Login instructions:
a. Log into the NE.
b. Switch to the root account (root privilege required).
USER@NODENAME [NE] > set user username root
Password:
2. Check the current system memory usage on the node. The alarm is raised when the upper
threshold value is reached or exceeded.
The threshold value for the node can be read as below:
a. Get the node name where alarm is raised using the following command:
# fsclish -c "show alarm active filter-by specific-problem 70160" | grep "Managed object" | awk -F"=" '{print "node name = " $2}' | cut -d "," -f1
Sample output: "node name = CLA-0"
b. Log into the node:
# ssh root@CLA-0
c. Read the threshold value from the configuration file:
# cat /etc/opt/nokia/osmon-template.conf | grep -v ^# | grep "MEM LIMITS"
Sample output: MEM LIMITS 75.0 82.0
If the output of the above command is empty, then the default thresholds of 80% lower threshold
(clearing) and 90% upper threshold (raising) are used to decide whether to raise or clear the
alarm.
d. Exit from node:
# exit
e. Get system memory usage on the node:
# ssh root@CLA-0 "cat /proc/meminfo" > /mnt/export/mem.txt
# MemTotal=$(cat /mnt/export/mem.txt |grep "MemTotal" | awk -F ' ' '{print $2}')
# MemFree=$(cat /mnt/export/mem.txt |grep "MemFree" | awk -F ' ' '{print $2}')
# Cached=$(cat /mnt/export/mem.txt |grep "^Cached" | awk -F ' ' '{print $2}')
# TotalFree=$(expr $MemFree + $Cached)
# PercentUsage=$(expr $TotalFree \* 100 / $MemTotal)
# echo "Memory usage = $PercentUsage%"
# echo "Total available System memory: $TotalFree" > /mnt/export/output.txt
# echo "Memory usage = $PercentUsage%" >> /mnt/export/output.txt
3. Collect the top 5 memory users at process level (as listed in the alarm) on the NE using the
following commands:
# echo " Top 5 memory consuming process (check Appl. addl. info field for details): " >> /mnt/export/output.txt
# fsclish -c "show alarm active filter-by specific-problem 70160" >> /mnt/export/output.txt
4. The memory is often consumed by files located in the tmpfs filesystem. This information is not
part of the report collected above. The instructions below can be used to collect such
information:
a. Check the memory used by the tmpfs file systems on the node found on step (2.a). Check the
tmpfs mount points (ignore entries for the none and /dev filesystems):
# echo "tmpfs mount points: " >> /mnt/export/output.txt
# ssh root@CLA-0 "mount -t tmpfs" >> /mnt/export/output.txt
b. Check the space used for the tmpfs file systems:
# echo "Space used by tmpfs: " >> /mnt/export/output.txt

Execute systemctl | grep osmon to verify that OSMON is running.
The output, in case OSMON is indeed running should be of the form:
# systemctl | grep osmon
osmonitor.service loaded active running osmonitor service
...
If the alarm is not cleared automatically, contact your local customer support.

1. If the alarm is raised for an external Ethernet interface, check that the cable is properly
connected in the front panel of the GW node.
2. Check the status of the interface with the following SCLI command:
show networking interface runtime node <node> iface <interface>
For example: show networking interface runtime node IB-0 iface management
3. If the above steps cannot resolve the situation, contact your local customer support.
Note:
In a deployment where not all external interfaces are used, a number of alarms equal to the
number of unused Ethernet links will be raised. To avoid getting those deceptive alarms (since
they are actually the result of unused links and not of failing links), the operator is advised
to set the admin state of all unused links to DOWN using the following SCLI command:
set networking interface <node> iface <iface-name> admin down

This is an informative alarm and does not require any actions.

This alarm is an informative alarm indicating that the whole cluster has been (re)started. As this
operation is critical, check carefully the alarm status in the cluster after the restart.

This is an informative alarm and does not require any actions.

This is an informative alarm which requires no user actions.

This is an informative alarm and does not require any actions.

Verify that the switchover operation is successful. The alarm is automatically cleared if the
switchover is successful. However, depending on the type of the application, the time for starting
(or activating) a standby RU can vary from a few seconds to tens of minutes. The state of the new
active RU can be checked using the structured command line interface:
1. Log into the cluster.
2. Use the "show has state" SCLI command to see the state of the new active RU. The MO name of
the new active RU can be found in the application additional information field 1.
For example, execute the following command to check the state of the /AS-10/ApplServer-0 recovery
unit:
> show has state managed-object /AS-10/ApplServer-0
An operational RU has an UNLOCKED administrative state, ENABLED operational state, an empty
procedural status, and "ACTIVE" role.
The procedural status of INITIALIZING means that the RU is still starting up.
If the switchover fails (operational state of the new active RU is DISABLED), check the syslog for
a possible explanation for the failure and if required, contact your local customer support.
Note that if both Recovery Units in the active standby RG fail repeatedly, this alarm may be
raised by both Recovery Units.
In that case the situation has to be corrected immediately.

The system clears the alarm automatically when the measurement result goes up and is continuously
held at the minimum threshold clearing level or above.

Disclaimer: The instructions below make use of either unsupported SCLI commands or commands from
the unsupported full bash shell.
Please carefully read the disclaimer that is shown when either entering the SCLI unsupported
vendor mode or the full bash shell.
Do not use the commands in any other context. Please check from the product documentation or from
local customer support for more information.
1. Check that the name of the alarm log file that is defined by the parameter fsLogFileName in the
alarm processor configuration in Configuration Directory, is the same as /var/log/master-alarms.
Use the following SCLI command:
show config fsClusterId=ClusterRoot fsFragmentId=AlarmMgmt fsFragmentId=AlarmProcessors fsAlarmProcessorId=AlarmProcessor1 fsAlarmProcessorConfigurationId=Default
2. If the value in Configuration Directory is different, then modify the value
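
The arithmetic in steps 2.c and 2.e of the memory-usage procedure above (the commands that read "MEM LIMITS" and /proc/meminfo) can be sketched as two small shell functions. This is a minimal sketch under stated assumptions, not the product's own tooling. The function names mem_percentages and mem_limits are hypothetical. The MEM LIMITS line is assumed to look exactly like the sample output, with the clearing value first and the raising value second. The expression quoted in the procedure, (MemFree + Cached) * 100 / MemTotal, is treated here as the share of memory still available, so the sketch also prints 100 minus that figure as the used share. How OSMON itself derives the figure it compares with the thresholds is not stated in this document.

```sh
#!/bin/sh
# Sketch: read a /proc/meminfo-style file and print "<available%> <used%>".
# available = MemFree + Cached (the sum the procedure calls TotalFree);
# used = 100 - available (an assumption, see the text above).
mem_percentages() {
    awk '$1 == "MemTotal:" { total = $2 }
         $1 == "MemFree:"  { free = $2 }
         $1 == "Cached:"   { cached = $2 }
         END {
             if (total <= 0) exit 1
             avail = int((free + cached) * 100 / total)
             print avail, 100 - avail
         }' "$1"
}

# Sketch: print the two MEM LIMITS values from an osmon-template.conf-style
# file, or the documented defaults 80 (clearing) and 90 (raising) if the
# file has no uncommented MEM LIMITS line.
mem_limits() {
    awk '!/^#/ && /MEM LIMITS/ { print $3, $4; found = 1; exit }
         END { if (!found) print "80 90" }' "$1"
}

# Demo with made-up figures in a temporary file.
tmp=$(mktemp)
printf 'MemTotal: 1000 kB\nMemFree: 150 kB\nCached: 100 kB\n' > "$tmp"
echo "available% used%: $(mem_percentages "$tmp")"
echo "clearing/raising limits: $(mem_limits /dev/null)"
rm -f "$tmp"
```

Matching the field name exactly ("Cached:") keeps SwapCached out of the sum, which the "^Cached" pattern in the procedure also achieves.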
Disclaimer: The instructions below make use of either unsupported SCLI commands or commands from
the unsupported full bash shell.
Please carefully read the disclaimer that is shown when either entering the SCLI unsupported
vendor mode or the full bash shell.
Do not use the commands in any other context. Please check from the product documentation or from
local customer support for more information.
1. Find the attribute with the invalid value. The name of the attribute can be found in the
"Managed Object Id" field of the alarm.
fsParameterId=<name of attribute>, fsAlarmProcessorConfigurationId=Default,
fsAlarmProcessorId=AlarmProcessor1, fsFragmentId=AlarmProcessors, fsFragmentId=AlarmMgmt,
fsClusterId=ClusterRoot
The alarm can be shown using following SCLI command:
show alarm active filter-by specific-problem 70243
2. Modify the attribute's value in the Configuration Directory using the following SCLI command:
set config attribute fsClusterId=ClusterRoot fsFragmentId=AlarmMgmt fsFragmentId=AlarmProcessors fsAlarmProcessorId=AlarmProcessor1 fsAlarmProcessorConfigurationId=Default attribute-list <name of attribute> <correct value>
Then restart the alarm processor with the following SCLI command:
set has restart managed-object /AlarmSystemLight
3. The default values of the alarm processor attributes used when correcting the situation are
listed below:
Attribute                                 Default value
fsAlarmProcessorConfigurationId:          Default
fsLogFileName:                            /var/log/master-alarms
fsLightParserThreadSleepTime:             1
fsLightAlarmHistorySize:                  25000
fsLightAlarmListSize:                     10000
fsLightSnapShotTimeInterval:              60
fsLightSnapShotMinNumRecords:             1000
fsLightManualAlarmClearingEnabled:        false
fsTimersForRNWAlarmsEnabled:              false
fsRaise70280insteadOf70005forUnknownSP:   true
fsHeartbeatAlarm70246Enabled:             true
fsMultiSpBlockingRuleEnabled:             false
fsHeartbeatAlarm70246TimeInterval:        300
fsAutoAcknowledgeWhenCleared:             follow-alarm-definition
fsAlarm70247raise:                        true
fsLightEventsProcessed:                   12
fsLightProcessingInterval:                200
fsLightProcessorSleepInterval:            800
fsLightClearWarningAlarmsEnabled:         false
fsLightClearAlarmsOnNEReset:              none
fsLightExcludeRangeFromNEResetRule:       none
fsLightExternalFlowControlValid:          none
objectClass:                              FSAlarmProcessorConfiguration
objectClass:                              extensibleObject
fsAlarmCompareAAIforWildcardIAAI:         false
4. If Application Additional Information field contains "LDAP server unavailable, using default
configuration parameters" then please contact your local customer support.

1. Fill in a problem report with the alarm data and send it to your local customer support.

Fill in a problem report with the alarm data, and then send it to your local customer support.

1. If automatic alert is not supported in situations where the alarm system heartbeating is not
functioning, check occasionally that the heartbeating functions properly. The time of the alarm
and the value of the heartbeat interval (specified in the 'Application Additional Info' field)
should be used in analyzing the situation.
2. Perform such checks also when the system does not generate any alarm events for a long time.
3. If these occasional checks reveal that the heartbeat alarm events are not continuously
generated at each heartbeat interval, restart the alarm processor (this also forces the restart of
the alarm system database) using the following SCLI command:
set has restart managed-object /AlarmSystemLight

Disclaimer: The instructions below make use of either unsupported SCLI commands or commands from
the unsupported full bash shell.
Please carefully read the disclaimer that is shown when either entering the SCLI unsupported
vendor mode or the full bash shell.
Do not use the commands in any other context. Please check from the product documentation or from
local customer support for more information.
This alarm is raised when the fsHeartbeatAlarm70246Enabled is set to false and the
fsAlarm70247raise attribute is set to true.
When 70247 alarm is raised do either one of the following (but not both)
- Switch ON Alarm System heartbeating
If the heartbeating is required please set the attribute to true using the following SCLI commands
and the alarms will be cleared automatically:
set config attribute fsClusterId=ClusterRoot fsFragmentId=AlarmMgmt fsFragmentId=AlarmProcessors fsAlarmProcessorId=AlarmProcessor1 fsAlarmProcessorConfigurationId=Default attribute-list fsHeartbeatAlarm70246Enabled true
When the clearing is complete, restart the Alarm Light Processor by using the following SCLI
command:
set has restart managed-object /AlarmSystemLight
- Switch OFF 70247 Raising
When the alarm system's heartbeating is desired to be switched OFF, along with setting
fsHeartbeatAlarm70246Enabled false set FALSE as the value for the fsAlarm70247raise attribute of
the AlarmLight Processor configuration in the Configuration Directory.
Do this by using the following SCLI command:
set config attribute fsClusterId=ClusterRoot fsFragmentId=AlarmMgmt fsFragmentId=AlarmProcessors fsAlarmProcessorId=AlarmProcessor1 fsAlarmProcessorConfigurationId=Default attribute-list fsAlarm70247raise false
Once done, restart the Alarm Light Processor by using the following SCLI command:
set has restart managed-object /AlarmSystemLight

To find out why the other CLA node is unavailable, perform the following steps:
1. Log in to the controller node.
2. Check the state of the VM using the command:
>nova list
3. If the VM is stopped then start it using the command:
>nova start <VMid >

This problem can indicate one or both of the following:
- A failure situation that is, for example, caused by node reboots or node failures. In these
cases, the alarm severity is MINOR and the problem is likely to disappear quickly. The severity of
this alarm will be raised to MAJOR if the node(s) do not restart within a few minutes.
- An application problem caused, for example, by a program error, a configuration error, or data
corruption. In this case, the alarm severity is MAJOR and manual intervention may be needed.
If the severity of this alarm is MINOR, you may choose to wait a few minutes to see if the alarm
is cancelled. In node reboot and transient failure situations, the system will cancel the alarm as
soon as the node reboots have been completed and the service instance(s) has been successfully
reassigned to the recovery unit and restarted.
If the severity of this alarm is MAJOR, perform the following steps:
1. Log into the active CLA as root user.
2. Check the system syslog (/var/log/master-syslog) for possible failure reasons and contact your
local customer support if you need assistance.

Find out why an FSDirectoryServer recovery unit has been locked. The system can be restored to a
safer state by performing the following steps:
1. Log into the system.
2. Start the structured command line interface:
$ fsclish
3. Check the status of the FSDirectoryServer recovery units using the following SCLI command:
show has state managed-object /*/FSDirectoryServer
OBJECT                    ADMINISTRATIVE  OPERATIONAL  USAGE   ROLE         PROCEDURAL      DYNAMIC
/CLA-0/FSDirectoryServer  UNLOCKED        ENABLED      ACTIVE  ACTIVE       -               -
/CLA-1/FSDirectoryServer  LOCKED          ENABLED      IDLE    COLDSTANDBY  NOTINITIALIZED  -
4. If one of the recovery units is LOCKED, unlock it by using the SCLI command "set has unlock".
For example:
set has unlock managed-object /CLA-1/FSDirectoryServer
Following message is displayed:
/CLA-1/FSDirectoryServer unlocked successfully.
5. Quit the fsclish session using the following command:
quit

1. Log in to the network element as root user to check the situation.
2. Check the state of all the recovery units within the recovery group (the name of the recovery
group is in the Application Additional Information field). If the recovery group is providing
service, each of its UNLOCKED recovery units that have the ACTIVE role, has the ENABLED
operational state, and an empty procedural status. For example, the state of recovery units in the
/Directory recovery groups can be checked using the "show has" SCLI command:
> show has regex filter ru state managed-object *Directory*
OBJECT ADMINISTRATIVE OPERATIONAL USAGE ROLE PROCEDURAL DYNAMIC
fshaRecoveryUnitName=FSDirectoryServer,fsipHostName=CLA-0,fsFragmentId=Nodes,fsFragmentId=HA,fsClusterId=ClusterRoot
UNLOCKED ENABLED ACTIVE ACTIVE - -
fshaRecoveryUnitName=FSDirectoryServer,fsipHostName=CLA-1,fsFragmentId=Nodes,fsFragmentId=HA,fsClusterId=ClusterRoot
UNLOCKED ENABLED IDLE COLDSTANDBY NOTINITIALIZED - -
In the case above, the recovery unit of CLA-1 node is acting as a cold standby backup, and the
recovery unit on CLA-0 is running the service normally. Note that the grep command in the example
is used to filter out information regarding individual processes in each recovery unit.
Since this is a situation that may be caused by various faults, contact your local customer
support to analyze the root-cause.

1. To ensure the proper functionality of the system, switch off the inert mode after the problem
analysis is done.
You can switch off the inert mode from all nodes of the cluster by issuing the following SCLI
command:
set has inert off managed-object /
Note that this should be done by the supplier's field engineer that is currently analyzing the
system.
When the inert mode is switched off, pending recovery actions take place.
For example, if an important process in a cold active/standby recovery group has failed in a node
that was in the inert mode, switching the inert mode off for the node causes a switchover of the
recovery group.

Extract the error type from the application additional information field from alarm output. Below
is the list of actions corresponding to each error type number.

USER_NAME_DUPLICATE_ERROR - A username cannot be the same as one of the reserved names from the
list: root, wheel, daemon, adm, sync, shutdown, halt, lp, mail, uucp, operator, games, nobody,
gopher, nfs, nfsnobody, named, ntp, ldap, mysql, postgres, rpm, apache, sshd, dbus, vcsa, nscd.

USER_NAME_RESERVED_ERROR - A username cannot start with one of the prefixes reserved for network
elements: "_nok", "_nsn".

USER_NAME_TOO_LONG_ERROR - A username cannot exceed more than 32 characters.

USER_NAME_CONTAINS_INVALID_CHARS_ERROR - User name must start with a letter, a digit, an
underscore or a full stop. The last character must be a letter, a digit, an underscore, a hyphen,
a dollar sign or a full

Depending on the problem type (see the Application Additional Information field 2), the cause of
the problem can be:

LDAP_DOWN - Network configuration problems.
Check that the primary and secondary NetAct LDAP server addresses defined in the network element
(NE) internal LDAP server are reachable.

INVALID_CREDENTIALS - NE account credentials to connect to the NetAct LDAP server are invalid
(wrong account name, wrong password and so on).
Check that the NE account stored in the NE internal LDAP server to connect to the NetAct LDAP
server exists in NetAct, has not expired, has the correct password and so on.

BAD_DATA - NetAct LDAP server is overloaded or shut down.
Disclaimer: The instructions below use either unsupported SCLI
commands, or commands from the unsupported full bash shell. Please,
read carefully the disclaimer that is shown when either entering the
unsupported SCLI vendor mode or the full bash shell. Do not use the
commands in any other context. For more information, see the product
1. Observe the application in question by checking the "Application ID"
documentation or your local technical support.
field of the alarm.
2. Lock the application in case of a CRITICAL severity alarm. To prevent
All currently active SSH sessions opened with the indicated username
its infinite restart, use the following SCLI command:
(see the 1st additional information field of the alarm) must be closed and
> set has lock managed-object <mo-name>
This
reopened alarmifimplies needed. anAftererrorreopening
situation which a session, prevents the process
the correct from
permissions
3. Observe the invalid or missing attribute by checking the "Identifying
starting.
are taken into use if the account is still in use for the NE.
Additional Info" field of the alarm.
Please contact your local customer support for assistance.
4. Observe the configuration location by checking the "Managed Object"
Initiate SSH sessions:
field. This contains, for example, a branch of the
1.
The Log in toannounces
alarm the active cluster theattempt management
to raise annode. alarm Forwithexample
an the
unknownMMN-0.
Configuration Directory-based configuration or a path for file-based
2. Use
specific the following
problem. command to check the open SSH sessions:
one.
1.
5. Check
Add or the correctIdentifying
the invalid Application
attributeAdditional
mentionedInformation in the "Identifyingfield of the
>
70280showalarm user-management
that is raised. login-history
Additional Info" field. FollowItthe contains
guidelines the specific
provided in
problem
customer which the alarm system
documentation foristhe was unable to
application recognize.
using the appropriate tool
1.
Note: Check Theifabove the targetcommand ID field shows displayed
all the as "<Empty>"
"still logged in"inusers the output
and of
show
(for alarm
example, active
a text filter-by
editor specific-problem
for the file-based 70280
configuration).
the
already SCLIlogged-out
commandusers. "show license target-id" to
2.
6. Check
Unlock whether
(if the
the alarm the specific
second problem
used)isorpresent restart in the alarm system
confirm that is step
reallywas valid. the application using the
repository
following usingcommands:
SCLI the following SCLI command:
2. Check
For example, if thethe licenseresultmanagement server
of invoking <unknown-specific-problem>
the above fetches
command the targetmightID lookof the NE
show
-successfully
unlock: alarm settype has specific-problem
unlock managed-object
as follows: upon its restart by executing<mo-name> the following
The
3.
- restart following
Check whether
set hasdiagnosis the specific
restart command
managed-objectproblem mustisbe invoked
present
<mo-name> by the
in the list operator,
of known in
SCLI command:
order
alarmstoingather the customersome diagnostics documentation. data for subsequent investigation on the
> set has restart managed-object /CLM.
Fillshow
reason in a of user-management
the alarm.
problem report andlogin-history send it to your local customer support.
3. Check that the target ID (e.g. NE ID) has been properly configured via
CAM.
User If the license
name Loginmanagement time server
Logout continues
time to fail Host name
/opt/nokia/SS_RCPDBHAMgmt/tools/fsdbdiag.sh
The operatorthe
in fetching shouldtargetinform ID of the local NE, pleasecustomer
------------------------------------------------------------------------------------ contact support, yourand localreport
customer this as
a possible
support for problem
further caused
information.
extuser Mon Jan 16 08:22:09 2017 still logged in by maximnumn connection limitation of
10.157.3.230
database.
extaccountThe Fri configuration
Jan 13 12:49:06 may2017 have to FribeJan changed,
13 15:11:15 if the2017database has
been
10.157.3.230configured with too few connections. There are two possibilities to
avoid such situation. First, increase the maximum number of possible
Identify the application raising the alarm using the Application ID field in
connections to the DB and, second, reduce the number of applications
the alarm. From the SCLI shell, enter the bash shell
that
3. are simultaneously of accessing the database.
to Thegain preferred
access toway the system. closingexampleaccount@CLA-0
a session is a graceful exit. It is,
[test] > however,
shell
possible to close
[exampleaccount@CLA-0(test) it forcefully. The following example illustrates forceful
The
cleanup Operator
of a sessionshould for provide
user the information about which application uses
"extuser".
/home/exampleaccount] #
1.
which ResolveIP-address.the VRFEach name based on
database the VrfId reported
application has to describe in the alarm which
Now, if the subsystem (for example the PM9 server) is unable to write
additional
database
4. First, enterinformation
connections
the full by
bashare theused,
shellSCLI and
and command:
is
check responsible
the sshd for
process id of the child
result files into the result directory then the following
> showofnetworking
connecting/disconnecting
process "01256": vrf idto/from
<VrfId>the database. Additionally, each
shall be done at the management node where the Subsystem (here PM9
application must provide a description what actions are to be performed if
server) is active:
2.
this Check
alarm the network
occurs. Thisconnectivity
information to the peer
should network in element using ping
#
1.ps
Network -ef |reason
If the grep
leakage 1256
incould be caused
the identified by
applicationeitherbe anstored
incorrect
additional
a text field
infoconfiguration
in the
field of the alarm of
and tracerouteDirectory.
Configuration commands. Get the source node information from alarm
virtual
indicates switches,
issue "No or aspace virtualleft switch malfunction.
on device", then In case
Managed
root 1256 object 7701 field0fsipHostName=<node>
13:05 ?thebyvirtual
00:00:00 section
sshd: and execute
extuser [priv] the
ofcheck
an incorrect
whether configuration,
the disk is full executing switch the configuration
following should be
command at
following
The
10009 limit SCLI
can be commands:
increased by changing the value of the parameter
fixed shell1276
bash immediately. prompt: 1256 0 13:05 ? 00:00:00 sshd: extuser@pts/5
roota. Ping
"max_connections" in0the database
pts/4 configuration
00:00:00 grepfile
df -h if2504
Check <TARGET 17382DIRECTORY>
a new license
13:06
for the feature
1256
if BFD corresponding
/mnt/db/<dbname>/db_data/postgresql.conf address familyisisrequired,
IPv4: (Note:and install
actual DBitworking
if required.
If For feature
the example: itself dfis-h not /var/opt/nokia/SS_PM9/storage/
required, set the
start
directory
Terminate networking
depends
it:is full, get instance <VRF Name> diagnostics
on used deployment) of all the nodes. The database ping node <node>
If the disk
feature admin state to the
OFF. list of the files by executing the following
destination
has to be then<Destination
restarted IP>restarting
by source <Source the relevant IP> Recovery Group.
command
1. atthethestatusbash shell:
ifTo
# kill BFD check
-9 corresponding
1276 ofaddress
a feature, familyexecute is IPv6: the following SCLI command:
ls -lrt <TARGET DIRECTORY>
The start networking
following
"Disclaimer: severity
The instructions instance
level is<VRF Name>
supported: diagnostics
below make use of either unsupported ping6 node SCLI
For
>show example:
license ls -lrt /var/opt/nokia/SS_PM9/storage/results
feature allno equivalent
<node>
Note:
commands destination
There are
or space <Destination
currently
commands IP> sourceSCLI <Source
commands. IP>
Create
The free
feature required thefrom
onwould disk the unsupported
notby beremoving
displayed some full of
in the
bashthe shell.
output. old files.
b.
Warning:
Please TraceroutefsdbConnectionsAlarmLimit
carefully read the disclaimer (value
that from
is shown Configuration
when either Directory)
entering the
2.
2. If the reason in the identified application additional info field of the alarm
The
SCLIifTo
reached,BFD install
SSH a newfor
corresponding
minimum
session
unsupported
license,
number
vendoruser addressexecute family
of connections
"extuser"
mode or
the
is
the
following
is IPv4:
whichshell.
terminated.
full bash
SCLI
must command:
be free with check
indicates
>add issue
license "Permission
file <file> denied", then check
start
frequency
Dowhether
not use networking
therequired
commands instancein<VRF any otherName> diagnostics
context. traceroute node
where,destination the
<file> is the<Destinationpermissions
license file name arealong givenwith byPlease
executing check
its absolute
from
thepath. the
following
<node>
fsdbConnectionsCheckFreq
product documentation (value IP>from source <Source
Configuration
or from local technical support for more IP>
Directory)
command
3.ifTo at bash
setcorresponding
the feature admin shell: statefamily
to OFF, execute the following SCLI
BFD
information." address is IPv6:
ls -lrd <TARGET DIRECTORY>
command:
The
Followstart networking
following instance
diagnosisgiven
the instructions command <VRF
belowmust Name>
to clear diagnostics
be invoked
this alarm: by the traceroute6
operator, node in
For
> setto example:
license ls -lrd /var/opt/nokia/SS_PM9/storage/results
feature-mgmt codeIP> <code> feature-admin-state off on the
<node>
order
1)IfVerify destination
gather
if the some
SCLI <Destination
diagnostics
daemon is up. data
If source
you for can <Source
subsequent
access IP>
investigation
the fsclish shell, it
the output
where, does not indicate rwx permission for feature
user _nokfssyspm9,
reason
indicates of<code>
the alarm.
that
specifies
thecustomer
SCLI daemon
the feature
is up.
code whose admin state
contact
has the
to be set log local
to OFF. Support. A correct output will be
3. Check
Correctthe files (/var/log/master-syslog)
/opt/nokia/SS_RCPDBHAMgmt/tools/fsdbdiag.sh
2)displayed the for related faults.
aserrors,
shownifbelow: any, in the script provided by the user/subsystem.
The name of8the
drwxr-xr-x erroneous script
_nokfssyspm9 is indicated1024
_nokfssyspm9 in the Febalarm as the second
9 20:17
4. Check the
Identifying state of the
Application peer network
Additional Informationelement. field. This script can be
/var/opt/nokia/SS_PM9/storage
located at either one of the following locations:
/opt/nokia/configure/sh/
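The disk-space and permission checks described above for the result directory (the "df -h" and "ls -lrd" commands) can also be sketched in Python. This is a minimal illustration, not a product tool: the function name directory_report and the keys of the returned dictionary are assumptions made for this example, and the path passed in is whatever directory the subsystem writes to (for example /var/opt/nokia/SS_PM9/storage/results).

```python
import os
import shutil
import stat


def directory_report(path: str) -> dict:
    """Summarise free space and owner permissions for a result directory.

    Hypothetical helper: roughly mirrors what "df -h <dir>" and
    "ls -lrd <dir>" show, using only the Python standard library.
    """
    usage = shutil.disk_usage(path)   # total, used, free in bytes
    mode = os.stat(path).st_mode      # file type and permission bits
    return {
        "is_directory": stat.S_ISDIR(mode),
        "free_bytes": usage.free,
        "used_fraction": usage.used / usage.total if usage.total else 0.0,
        # True only if the owner has read, write and execute permission
        "owner_rwx": (mode & stat.S_IRWXU) == stat.S_IRWXU,
        # Same notation as the first column of "ls -l", e.g. drwxr-xr-x
        "mode_string": stat.filemode(mode),
    }


if __name__ == "__main__":
    report = directory_report(".")
    print(report["mode_string"], report["free_bytes"], "bytes free")
```

A mode string such as drwxr-xr-x with owner_rwx set corresponds to the correct output shown above. Whether the owner is the expected subsystem user still has to be checked separately.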
Try to create the subsystem certificate again and check why the certificate
cannot be created.
Disable the subsystem which raised the alarm, since it could be potentially
dangerous to run it in a non-secure mode.
Check why TLS connections cannot be made and potentially disable
Note: The procedure to properly disable a subsystem must be obtained
RUIM for the
from the Certificate Management Guide.
time of investigation.
Check the following options depending on the error code displayed (for
To disable RUIM, enter the following SCLI command:
If needed,
more save thesee
information, active configuration
the Identifying to clear the
Application alarm. Information
Additional
> set user-management ruim disable
field
1):
Depending on the Error code (see Application Additional Information field
- Check if the /etc/certs/<CertMan domain> directory is in a read-only
1)
mount.
Thiscause
is an informative alarm and
the for the problem can be:does not require any actions.
- Check if there is enough free disk space available for creating a
1. Certificates are not present in the "default" or "ruim" domain.
certificate file.
Verify whether Certificates are present in the "default" or "ruim" domain.
Use the following SCLI commands to check whether certificates are
Note: CertMan domain is the domain name where the Certificate is stored
installed:
1. In case of warning Management
alarm (with warning severity):
under the Certificate DN in LDAP.
> show security cert ruim ca-cert all
> showthe
a. Reduce security
number cert
of default ca-cert
routes in all
the forwarding table so that the route
count is below the supported limit.
2. Wrong server certificate is used.
Disclaimer: The instructions below use either unsupported SCLI
b. Ensure that the number of routes present in the node local forwarding
commands,
3. Theisexternal
table less thanLDAP server does
the maximum not support
number of routes TLS protocol. Execute the
supported.
or commands from the unsupported full bash shell. Please carefully read
following SCLI show commands, to view the routes in the forwarding table
the
Please disclaimer
contact that
yourislocal shown when either
customer support entering the unsupported
to resolve the issue. SCLI
of the node for which the alarm is raised:
vendor mode or the full bash shell. Do not use the commands in any other
Disclaimer: The instructions below make use of either unsupported SCLI
context. Please check from the product documentation or from your local
Commands
Ex: show networking or commands from the unsupported
forwarding-table runtime node full <node
bash shell.name> Please
technical support for more information.
carefully read the disclaimer
show networking forwarding-tablethat is shown runtime when
ipv6either
node entering
<node name> the SCLI
unsupported vendor mode or the full bash shell. Do not use the
1. Execute the following SCLI command to start an external bash shell
commands
2. In case inmajor
any other context. Please check raised
from the product
Check
session: the of environment alarmin (with
which major
the NE severity)
is installed after
(NOKIA warning alarm:or
laboratories
documentation or from local technical support for more information.
customer environment). If this alarm is seen in a customer
a. Reduce thecontact
environment, numberthe of local
routes in the node
customer localfor
support forwarding table so that
further information on
shell bash full
1.
the Execute
route the following
count is below SCLI
the commandlimit.
supported to start an external bash shell
how to change the NE's configuration to accept commercial
session:
licenses.
2. Execute If this alarm is seen
the action
following command in NOKIA to laboratories,
check whether then this can is
/ClusterNTP bein
The
b. corrective
Restart fornode
this alarm is to install a new valid bycertificate. If the
disabled
sync with if or
therereboot is no the need tofor which
allow the the alarm
installation is of
raised executing
certificate
shell
the bash
following is
full not
SCLI needed anymore, it can be
NOKIA
server internal
NTP: testcommand:
licenses into the NE. To disable this capability in
deleted
set as well.managed-object
The alarm will get automatically name> cleared when it is
NOKIA laboratories, execute the<node
has restart SCLI command
replaced
2. Check by thearoutes
new valid to thecertificate
NTP server or when the certificate
(depends on the configuration
"set license test-license state disabled"
/opt/nokia/SS_RCPNTP/bin/ntpdc -c peer from-n the unsupported vendor
is deleted, but this automatic clearance
made) could take up to 24 hours to
The
mode. corrective action(s) for this alarm depend on the reason for failure of
happen. So it is recommended to manually clear this alarm
automatic
In the output, KUR thereoperation,
should as
be described
an below:
after
3. Check the certificate
UDP port is 123 replaced
using the or asterisk
deleted.
steps below:
mark (*) on the NTP server IP,
which
Note: For more detailed description of the below mentioned SCLI
1. UNABLE_TO_FETCH_ROOTCA:
indicates that itwell
is synchronizing with the NTPused server andcommands,
the alarm has
commands
ps as as the parameters being in the
a. -ef
been Check | grep
cleared
-i ntp
whether the NE service domain contains the root CA certificate
automatically.
Check
please
root new
refer certificates
9760either 3638 to0the installed
15:12SCLI or
? command not:
00:00:00 online help or to the product
(trust anchor) by executing the SCLI command "show security cert
documentation.
/opt/nokia/SS_RCPNTP/bin/NTPMonitor
<domain>
If there is aca-cert
crosscheck trust-anchor
mark (x) on { [cert-type
the NTPpresentserver<cert-type>]?}".
IP, refer to step 7.
a.
ntpRun
As the
the first9765 below
step,9759 SCLI command
the
0 15:12 "C"
? value 00:00:00 in the "Appl. addl. info" of the
> show
alarm to find security
out if the
/opt/nokia/SS_RCPNTP/bin/ntpd certcertificate
<domain>is-u ee-cert
an EE -g -n -c /etc/ntp_master.conf
ntp:ntp
b.Use
3. If it isthe missing
"exit" then, install
command to the
exitcorresponding
the bash shell.root CA certificate by
certificate
Add more the or a
physicalCA certificate.
or virtualcert RAM to the node which raised the alarm.
executing "set security <domain> ca-cert" SCLI command.
For
Check Here
example, domain
that nameaddl.
theif source
"Appl. can forwarder
and be
info"fetched
of the from
alarmthe
addresses has"Application
"C:EE",
listed above Additional
then thebind to
are
4. Execute
Information
certificate isthean following
fields" EE field ofSCLI
certificate. the command
alarm.
And if "Appl. to find
addl. the NTP
info" of server
the IP: has
alarm
port 123 using the command below:
c. For more details, see the "Centralized certificate management" chapter
In thethen
"C:CA", output it isverify
a CA Not After field should contain new expiration date.
certificate.
in
showthe networking-service
platform administration guide.
ntpcertificate,
A. If the-alpn
netstat certificate
| grepis":123" an EE then follow these instructions:
A.1) To verify the validity of the certificate, execute the following SCLI
2.
5. CMP_REQUEST_FAILED:
If theany asterisk (*)raised
is the
not "Validity"
present the on the NTP server IP (in outputare of step
Check
command
4. Execute alarms
and
the check
following against
command to services
field:exit whose
the bash certificates
shell:
#a.2),Check
then whether
wait for the "default"
approximately domain
20 contains
minutes and the
check CMP configuration
whether alarm has
renewed
. showusing security the certfollowing
<domain> command: ee-cert
parameters
been by executing the "show security cert default cmp" SCLI
#>exit
show alarm active
For example, if "Appl. addl. info" of the alarm is "C:EE D:default DTE:2
command.
cleared using the 09:19:24-10:00",
following SCLI command:
CET:2013-03-15 execute command
Verify
5. "show
Check thatthe thestatus
security NE cert service
of has ee-cert".
default
/ClusterNTP beenor using the updated
/NodeNTP certificate(s)
SCLI and is
b. If they
show areactive
not configured, then configure themusing below
suitably by executing
A.2)
command Ifalarm
functioning onnormally
the certificate
the
filter-by
cluster.by specific-problem
hasreferring
expired
If the to
or the
is about
administrative
70377
corresponding
tostate
expire,is sections
obtain and
locked, of install
unlockproduct
them a
the "set security cert default cmp" SCLI command.
application
new one bydocumentation following the following guides. instructions.
6. IfNote
alarm thatis not
the cleared,
steps use thediffer,
to follow followingbased SCLI command
on whether thetocertificates
sync the
>c.show
For more has state
details, managed-object
see the "Centralized /ClusterNTP certificate management" chapter
system
are manually installed or automatically fetched via USAGE ROLE
OBJECT
in the platformADMINISTRATIVE administration guide. OPERATIONAL
Install more physical or virtual CPUs/cores

These steps can be performed to check the storage device status:


If it is a service error, use "systemctl status SERVICE_NAME" to check
the fault status for this service.
If it is a process error, use "ps aux | grep PROCESS_NAME" to check if
the process is running.
The supervision failure reason can be checked from journal log.
In case the service/process is running normally, the alarm is cleared
automatically.
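As an illustration of the process check above, the following Python sketch decides from captured "ps aux" output whether a named process is running. It is a minimal example under stated assumptions: the function name process_is_running is hypothetical, the input is assumed to use the usual 11-column "ps aux" layout, and the grep command itself is ignored so that it does not count as a match.

```python
import os


def process_is_running(ps_output: str, process_name: str) -> bool:
    """Return True if process_name appears in the COMMAND column of ps aux output.

    Hypothetical helper: expects text in the standard "ps aux" layout
    (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).
    """
    for line in ps_output.splitlines():
        fields = line.split(None, 10)   # the 11th field is the full command line
        if len(fields) < 11 or not fields[1].isdigit():
            continue                    # header line or malformed line
        command = fields[10]
        executable = os.path.basename(command.split()[0])
        if executable == "grep":
            continue                    # skip the grep used to filter the listing
        if process_name in command:
            return True
    return False
```

The output of "ps aux" can be captured on the node and passed to the function as a string. The sketch does not run any command itself.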
If the alarm is not cleared automatically, contact your local Nokia customer
support.

To check the local storage agent target object, follow the steps:
1. Check the SSH connection between the related management node and
the target object node.
2. If the SSH connection can be established, log onto target object node
and use "systemctl status SERVICE_NAME" to check the fault status for
this target object service. The SERVICE_NAME to be used in the
command can be found in the first additional information field of this
alarm.
In case the SSH connection is normal and the target object service is
running normally, the alarm is cleared automatically.
If the alarm is not cleared automatically, contact your local customer
support.

1. Check the TWAMP session configuration.
a. Get the session-id from alarm information, note it as sid here.
b. Show TWAMP Sender session configuration through SCLI command:
> show networking monitoring twamp sender session-id <sid>
c. Check the reference documentation for IP parameters to make sure
all session configuration fields are filled with correct values.
2. Check the network connectivity to the peer network element using ping
and traceroute commands. Get the source node information from alarm
Application field fsipHostName=<node> section and execute the following
SCLI commands:
a. Ping
> start networking diagnostics ping node <node> destination
<Destination IP>
b. Traceroute:
> start networking diagnostics traceroute node <node> destination
<Destination IP>
3. Check if any network-related alarms are active on this node.
4. Check the state of the peer network element.

1. Check the TWAMP session configuration:
a. Get the session-id from alarm information, note it as sid here.
b. Show the TWAMP Sender session configuration using a SCLI
command:
> show networking monitoring twamp sender session-id <sid>
c. Check the reference documentation for IP parameters to make sure
all session configuration fields are filled properly.
2. Check the network connectivity to the peer network element using ping
and traceroute commands. Get the source node information from alarm
Application field fsipHostName=<node> section and execute the following
SCLI commands:
a. Ping
> start networking diagnostics ping node <node> destination
<Destination IP>
b. Traceroute:
> start networking diagnostics traceroute node <node> destination
<Destination IP>
3. Check if any network-related alarms are active on this node.
4. Check the state of the peer network element.

The issue with system creation data inconsistency cannot be corrected
once the system has been installed.
In order to fix the problem, the yaml used in system installation needs to
be corrected for the nodes indicated in Managed Object ID of the alarm
and the system needs to be reinstalled.

Duplicated IPv6 address situation must be resolved.
1. Check the IPv6 address configuration locally in the system and validate
that the IPv6 addresses are configured correctly.
2. Check the IPv6 network planning and other network elements'
configuration.

If there are multiple snapshots in the system, the startup snapshot needs
to be manually switched to point to an intact (backup) snapshot. This can
be done by executing fsconfigure tool with option --set-new-startup in the
management node bash shell. Tool fsconfigure will execute automatically
management node (and whole VNF) reboot after user confirmation.
If there have been configuration changes after chosen backup snapshot
has been created, then those configuration changes will be lost.
Using fsconfigure requires full bash shell access to management node.
Example:
# fsconfigure --set-new-startup

Verified (clean) snapshots for delivery: VNF_BUILD_54.17.0-r9183-171027

BACKUP_2
BACKUP_1

Enter new startup configuration name and press [ENTER]

BACKUP_1
New startup configuration:
BACKUP_1
Do you want to reboot MMN-0? "Yes/No, Y/N." Y

Triggering reboot...

If there is only one snapshot for VNF configuration data in the system and
that has been corrupted, operator must recover the system by re-installing
VNF from scratch.

If the access key for the automated remote access account is missing or
is invalid, it needs to be re-installed. For this, see the Customer
Documentation Troubleshooting section for User Management.

The actions to be performed in order to avoid a completely full database
are database-specific, so contact your local customer support and provide
them with the information you obtained from the alarm's additional
information fields.

Delete or remove data stored in DB memory to make the memory usage
lower than the pre-defined memory cancelling threshold, then the alarm is
cancelled automatically.

Please contact your local customer support for resolving the issue.

Cancellation of alarm will be triggered automatically when negotiation
response and sync message are received from master successfully.
Below are some steps for troubleshooting PTP issues:
1. Check PTP configuration.
a. validate that same domain number set is used between slave and
master
b. validate that master IP address is configured correctly
2. Check the network connectivity from slave to the PTP master device.
Get the source node information from alarm Managed object field
fsipHostName=<node> section and execute the following SCLI
commands:
a. Ping to check if PTP master is reachable:
> start networking diagnostics ping node <node> destination <PTP
master IP>
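The connectivity checks above take the source node from the fsipHostName=<node> part of the alarm's managed object field. The following Python sketch shows one way to extract that value and assemble the ping diagnostics command text. It is only an illustration: the function names are hypothetical, the managed object string in the usage example is an assumed format modeled on names used elsewhere in this document, and the sketch builds the command as text without executing it.

```python
import re


def source_node_from_managed_object(managed_object: str) -> str:
    """Return the value of the fsipHostName component of a managed object name."""
    match = re.search(r"(?:^|,)\s*fsipHostName=([^,]+)", managed_object)
    if match is None:
        raise ValueError("no fsipHostName component in managed object name")
    return match.group(1).strip()


def ping_command(managed_object: str, destination_ip: str) -> str:
    """Build the SCLI ping diagnostics command for the alarm's source node."""
    node = source_node_from_managed_object(managed_object)
    return f"start networking diagnostics ping node {node} destination {destination_ip}"


if __name__ == "__main__":
    # Assumed example value; real alarms carry their own managed object name.
    mo = "fsipHostName=CLA-0,fsFragmentId=Nodes,fsClusterId=ClusterRoot"
    print(ping_command(mo, "192.0.2.10"))
```

The destination address in the usage example is a documentation-only placeholder. Use the peer or PTP master address from the actual configuration.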
No actions needed other than keep monitoring if the alarm begins to
appear frequently. And if it does, contact your local customer support.

Cancellation of alarm is triggered automatically when master clock status
is detected OK.
Below are some steps for troubleshooting PTP issues:
1. Monitor IP traffic related to ports 319 and 320 with, for example,
Wireshark/tcpdump tool to check that the PTP UDP packets are sent and
received normally.
2. Check the Clock Class attribute of the announced message received
from the PTP master.

As already indicated above, the number of potential errors in the
configuration data is large. First, check carefully the
additional info fields of the alarm; the daemon attempts to pinpoint the
error it encountered. Try to identify whether the illegal
data relates to the external connectivity configuration or to some other
"internal" configuration. The latter case is more difficult
in the sense that it most likely requires re-commissioning of the entire
system. Note that this situation would most likely occur
during the first commissioning of the system; if this error would be
encountered spontaneously e.g. after rebooting of
an earlier commissioned network element, this would most likely indicate a
corrupted LDAP directory. The distinguished name in the alarm
identifies the invalid object in LDAP which caused the alarm. Try to fix the
object (or delete it if fixing seems impossible) with
SCLI commands such as add/set/delete networking.

If the alarm is not cancelled automatically within 10 minutes, perform the
following steps:
1. Identify the missing node from the alarm message.
2. Restart the faulty node from the cloud manager.
3. If the fault persists, contact your local customer support.

If this alarm is raised, perform the following steps:
1. Identify the node with the corrupted brick from the alarm message.
2. Restart the faulty node identified in previous step from the cloud
manager.
3. If the fault persists, contact your local customer support.

The alarm is cleared automatically.
If the fault persists, contact your local customer support.

If the recovery of the affected files is needed, contact the local customer
support.

The alarm is cleared automatically as soon as fabric path connectivity is
restored. If there is hardware failure on one of the compute or networking
equipment, then restoration of the service is possible by moving the nodes
to another availability zone.

If this alarm is raised during an in-service upgrade this means that the
UVM is unavailable. Therefore, a rollback of the in-service upgrade
procedure should be performed according to the customer documentation
instructions.

1. Check if the IP address of the CLS is reachable from the Management Node.
2. Check if the CLS IP address and port are properly configured.

Check that the target type has been properly configured via CAM. If the
CLSClient fails to fetch the target type of the NE, please contact your local
customer support for further information.

The corrective actions for this alarm depend on the failure reason in the
3rd additional information field.
Note: The below mentioned issuer-id is the first alarm additional
information field present in the alarm.
1. CERTMAN_CRL_DP_HOST_RESOLUTION_FAILED
CERTMAN_SOURCE_IP_NOT_CONFIGURED
a. Check if a valid IP address is configured and reachable (ping). If not, try
setting dedicated IP address by using SCLI command "set networking
address <current_node_name> ip-address <ip-address value> iface
<interface-name> add-user /CertMan"
b. Check if DNS service is active in the NE. If it is not active, try to check the
respective corrective actions for it.
c. If DNS service is not active, check if it is still possible to
resolve FQDN based distribution point to IP address using out-of-band
mechanism. If it is resolved, try to modify FQDN based distribution point
to IP using "set security cert config crl dist-point <primary-uri>
<issuer-id> <secondary-uri>"
d. If DNS resolves to a different type of IP address (for example IPv6) than
what is assigned to CertMan RG, try to get the same type of address if
possible for PKI operator CRL repository.
e. Contact the local customer support team for more details, if the problem
persists.
2. CERTMAN_CRL_DP_URI_NO_HOST
CERTMAN_CRL_DP_LDAP_OPEN_CONNECTION_FAILED
a. Check if the IP address is reachable (ping) and there are no network or
routing issues to reach to PKI operator's CRL repository.
b. Check if the IP address is valid and is not discontinued/shunted by PKI
operator.
c. Check if there is an alternate IP address available from same issuer
using out-of-band mechanism and try to configure it using "set security
cert config crl dist-point <primary-uri> <issuer-id> <secondary-uri>"
d. Contact the local customer support team for more details, if the problem
persists.
3. CERTMAN_CRL_DECODE_FAILED
CERTMAN_CRL_VALIDATION_FAILED
CERTMAN_CRL_UNSUPPORTED
CERTMAN_CRL_EXPIRED
CERTMAN_CRL_SIGNATURE_CHECK_FAILED
CERTMAN_CRL_ISSUER_CERT_MISSING
a. Check with PKI operator if the distribution point is still valid and not
shunted or discontinued.
b. Check if there is alternate distribution point where valid CRL content is
present for the same issuer using out-of-band mechanism and try to
configure it using "set security cert config crl dist-point <primary-uri>
<issuer-id> <secondary-uri>"
c. Since this is PKI operator's CRL repository specific issues and not
CertMan framework specific issues, further action needs to be discussed
with PKI operator to solve such kind of errors.
d. Check if the issuer certificate is missing, install its respective issuer
certificate.
e. Contact the local customer support team for more details, if the problem
persists.

The corrective action for this alarm is to install a new valid certificate. If the
certificate is not needed anymore, it can be deleted as well. The alarm is
automatically cleared when it is replaced by a new valid certificate or
when the certificate is deleted, but this automatic clearance could take up
to 24 hours. So, it is recommended to manually clear this alarm after the
certificate is replaced or deleted.

Note: For more detailed description of the following SCLI commands as
well as the parameters being used in the commands, please refer either to
the SCLI command online help or to the product documentation.

First find out if the revoked certificate is an EE certificate or a CA
certificate.
To verify if it is an EE certificate, execute the following SCLI command:
- show security cert <domain> ee-cert
Now compare the value in the 1st alarm additional information field with
the "Serial Number" field present in the above output. If both are the
same, then the revoked certificate is the EE certificate otherwise it is the
CA certificate.

A. If the certificate is an EE certificate, then follow these instructions:

A.1) Install a new certificate by following the below instructions.

Note that the steps to follow differ, based on whether the certificates are
manually installed or automatically fetched via the Certificate Management
Protocol (CMP) initialization request (IR).

A.1.a) Manual installation - Upload the certificate/key files to the NE using
the file transfer process.
For example, SFTP/SCP from an external location, and use the below
SCLI command to install them:
- set security cert <domain> ee-cert cert-file <cert-file> key-file <key-file>

For example, if the <domain> is default, <cert-file> is /home/user1/ee-cert-file.pem,
and <key-file> is /home/user1/ee-key-file.pem, then execute
"set security cert default ee-cert cert-file /home/user1/ee-cert-file.pem
key-file /home/user1/ee-key-file.pem" command.

A.1.b) Using CMP - Steps to retrieve certificates using the CMP
initialization request (IR):
- set security cert <domain> ee-cert-profile subject-name <subject-name>
- set security cert <domain> ee-candidate-key auto-generate
- start security cert <domain> cmp-request initialize cert-directory <cert-dir>
- set security cert <domain> ee-cert cert-file <cert-file>

Note: The above instructions assume that the CMP configuration, like the
CMP server IP address, pre-shared key, etc., are already configured in
1. Check if the IP address of the CLS is reachable from the Management Node.
2. Check that the CLS IP address and port are properly configured.
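The two checks above can be approximated from any host that has Python available. The following is a minimal sketch, not a product tool: it only tests whether a TCP connection to a given address and port can be opened, which is a weaker check than verifying the CLS configuration itself. The function name is_reachable and any address or port passed to it are hypothetical placeholders, not values taken from the product.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout.

    Assumption: the service of interest listens on TCP. This does not
    send an ICMP ping and does not validate the service configuration.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A call such as is_reachable("192.0.2.10", 8080) uses a documentation-range address and an arbitrary port purely for illustration. A False result points to a routing, firewall, or address and port configuration problem, but does not say which one.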

Disclaimer: The instructions below make use of commands from the
unsupported full bash shell. Please carefully read the disclaimer that is
shown when entering the full bash shell. Do not use the commands in any
other context. Please check from the product documentation or from local
customer support for more information.
Disclaimer: The instructions below use either unsupported SCLI
commands, or commands from the unsupported full bash shell. Please
The target node has to join the cluster properly, so as to activate the
carefully read the disclaimer that is shown when either entering the
dynamic configuration. The suggested workaround is to restart the nodes
unsupported SCLI vendor mode or the full bash shell. Do not use the
for which the alarm was raised, restart the systemd service cluster-
commands in any other context. Please check the product documentation
If the alarm is not cancelled
membership-mgr.service, orautomatically
restart both inwithin case 10 thatminutes,
the issue perform
persists. the
or contact your local customer support for more information.
following
These twosteps: actions will force the nodes to join the cluster again and
1. Login instructions:
1. Identify
activate theirthedynamic
missing configuration
node from theifalarm message.
it is currently either not activated, or
a. Log into the vNE.
2. Restart
partially the faulty node from the cloud manager.
activated.
b. Switch to root account (root privilege required).
If the fault persists, contact your local customer support.
Disclaimer
USER@NODENAME : The instructions [NE] >below set user make use of either
username root unsupported SCLI
To restart the failing nodes, perform the following actions:
commands
...
1.
or Find
commands the node from forthewhich the alarm full
unsupported wasbash raised. shell. The alarmcarefully
Please containsread such
info
the in the field "Managed object".
2. Check the logs of the sync-user-files script:
2. Restart the
disclaimer that node.
is shown when either entering the SCLI unsupported
#
1. journalctl
Use the /opt/nokia/bin/sync-user-files
SCLI commands to check the attributes defined for the SNMP
3. Verify
vendor that
mode it joined the cluster successfully after the restart.
3.
agent. The attributesofare
Check the status thelocated
storage undernodes. theMust
following be "accepted"
entry: and
or the full bash shell. Do not use the commands in any other context.
"running":
fssnmpNEId=<agent
To restart cluster-membership-mgr.service, IP / hostname>, perform the following actions:
Please
# cmmcli -g -f state,is-running | grep <Storage Node-0>
fssnmpAttributeType=NEattrs,
1. Check the status of the service cluster-membership-mgr.service:
check
# cmmcli from -g the product documentation
-f state,is-running | grep <Storage or from Node-1>local customer support for
fssnmpMediatorName=1,
"systemctl :status cluster-membership-mgr.service"
Disclaimer
more
# cmmcli -g The
-f instructions
state,is-running below
| grep make
<Storage use of either unsupported SCLI
Node-n>
fsFragmentId=SNMP,
It should beor active with a timestamp of the specific time it shell.
was last
commands
information.
If commands
none of the Storage Nodes is operational, corrective from the unsupported full bash actions Please
need to be
fsClusterId=ClusterRoot
started/restarted.
carefully
taken. The directory /mnt/_global/etc needs to be mounted to each node
2.
read
1. Restart
Find this
the service:
disclaimer that with is shown when either entering the SCLI
from
Use theout
the shared
following
attribute
storage.
SCLI
unacceptable
command to check the attributes:
value. The name of the
"systemctl
unsupported
attribute can restart
be cluster-membership-mgr.service"
found in the 'Identifying Application Additional Information'
4. Check
Disclaimer the : Theinternal interface status . Must
instructions below make use of either the unsupported be "UP":
3.
fieldCheck
vendor of mode
the the status
alarm. the again:
orinternal full bash shell. Do not use the commands in any other
#
SCLIip
> show addr
commands |
configgrep or commands from thefsFragmentld=SNMP
fsClusterld=ClusterRoot unsupported full bash shell.
"systemctl
context. Pleasestatus cluster-membership-mgr.service"
check from thetoproduct documentation or from local
If not, corrective
Please carefully
fssnmpMediatorName=1 actions
read the need
disclaimer be takenthat itisto
fssnmpAttributeType=V2traps restore internal
displayed when network
entering the
If
For it
customeris active
example, supportwith
the a new timestamp,
for more entry
following information. raises then was
alarm restarted
70001- if successfully.
xxx is not a host
connectivity
SCLI unsupported vendor mode
name
5. the that
Check can
ifbash
thethe be
ssh resolved:
key Doauthentication between nodes
or
2.
If aVerify
switch
fullthat shell.
isisrestarted, optional then
not use the
attribute
we have
commands
fssnmpUDPPort
a redundant
in anyhasisotherconfigured
the context.
value thatandthe
1.If the trap
fssnmpNEId=xxx,
working. Each unnecessary,
node check
fssnmpAttributeType=NEattrs,
needs to have whether
/root mounted isswitch
on waythrough
there fssnmpMediatorName=1,
a the to disable
shared whichthe
storage
Please
SNMP agent is listening to. The default value is 161.
the
sending traffic
fsFragmentId=SNMP,
and use flows.keys forfsClusterId=ClusterRoot
same
check from the product ssh authentication.
documentation or from local technical support for
of
6. the
After trap in the SNMP
resolving the agent,
issues, run orthe useuser filtering in the SNMP
management Mediator. The
synchronization
more
3. Verify that the optional attribute fssnmpProtocolVersion
If
SNMP
2. this
Modify
script alarm
again: the isattribute's
raised, then value it isinan the indication
Configuration that the switchisis
Directory
the same that
byrestarted.
using the
information.
the SNMP agent supports. The default value is V2c.
Mediator
As explained above, this alarm does not typically indicate an errorSCLI
following
#./sync-user-files may
SCLI be configured
command: to filter out traps using the following
The
command alarm can be safely ignored if the restarting of the switch was
condition,
4. Verify that butthe thisoptional
relates to other maintenance actions or state
attributes changes
intentional,
add config for example, as a part offssnmpReadCommString
a system reboot. and
in
1. the
Log in toobject
system.
fssnmpWriteCommString theHowever, fsClusterId=ClusterRoot
active ifare
there
management theisones nonode.
clear fsFragmentId=SNMP
that reason
the SNMP for this
agent alarm
expects.and
> add config
fssnmpMediatorName=1
especially if theattribute
alarm fsClusterId=ClusterRoot
system does not cancel fsFragmentId=SNMP the alarm automatically
Rule out the possible operation
fssnmpMediatorName=1
fssnmpAttributeType=NEattrs errors suchobject-class
fssnmpAttributeType=V2traps
attribute-list as an accidental issuing of a
attribute-list
within
2.
5. Log
Send reasonable
inSNMPto the switchtime(specified
(depending tointhe on
the the state
Identifying of Application
theagent
system), certain
Additional
"Disclaimer:
restart
fssnmpV2TrapId command,
FSSNMPNESpecificAttrs Thetrap orinformation
instructions
<trap physical
OID>" below
removal useknown
and either SNMP
unsupported
reinsertion of thefromSCLI
switch.the Note
SNMP
corrective
Information
Mediator to actions
field)
check should
using
whether ssh bethe
orperformed.
telnet
SNMP session.
agent is able to send response or
commands
that depending
object-name orfssnmpNEId=xxx
commands
on your hardware, from fssnmpNEId
the thereunsupported
mayxxx full
befssnmpMOID
other bash shell.abcde
functions Please
- like fiber
not.
carefully
channel If the SNMP
read
(FC) agent
the disclaimer
switches does
- integrated not
that respond
is in shown
the sameit should
when time
eitherwith out
enteringand alarm
thethe
2.If
Start thebyistraptrying contains
toby detect important
whether information,
the link shouldthe blade
implementation
be up the or not.
the Ethernet
Noteof that
3.
70003Verify if authentication-failure-trap is full
enabled using command:
unsupported
switch.
SNMP
3. Restart Inraised
these
Mediator SCLI
/SNMPMediatorcases,the SNMP
vendor
should be mode
operations Mediator.
updated.
by using ordone the
Note
the withbash
that
following thetheshell.
FC rules
SCLI Do
switch not
that may,
command: useas
define thea side
how
"link"
#show
Detailed refers
snmp-server to either hardwired
instructions to perform Ethernet
the above connections
actions are inlisted
the under Test
commands
effect,
the
set has cause
restart inrestarting
any otherof
managed-object context.
the Pleaseswitch.
Ethernet
/SNMPMediator check from the product
No specific actions are
backplane/midplane,
Instructions. orrequired.
Ethernet cables between switches and other
documentation
SNMP Mediator orresponds
from your whenlocalitcustomer
receives support
traps for
areactive moreofinformation."
a part the and
devices.
Output: During normal operation, all links between switches
1. A
The
However, possible
implementation.
4. possible
it shouldreason
values for
bebe for
notedpotential
NEId failure
attributes could
will bebe a power
typically failure
any of the
physical
other devices should up. that a broadcast storm is traditionally
1.
Fill
nodesLocate
switch.in a Use the
problem
present the backup-specific
following
report
in the and SCLI
cluster,oflikesend log from
command
it to the
your tolocation
check
local /mnt/backup/share/.
the
customer power status
support. of the
considered
snmp-server asenable
an indication the/CLA-0
existence or /AS-1
of a loop etc, in which can be found
the network.
2. See the backup
switch:
in "/etc/hosts" file onsummary
the active at cluster
the endmanagement of the log. node
Loops,
Try to rule
snmp-server however, theshould
outsystem-name possibility notSwitch-1-bi
be
thatencountered
a device at in theFlexiServer-based
other end of the link systems
is
3. Execute the following command to search the log contents for
since
down rapid
because
authentication-failure spanning of a power tree
trap protocol
feed failure.
disable (RSTP) For is used forinblocking
example, the
a host computer,
"ERROR"
show hardware and "WARNING" statements andshould
to identify thesuchfailed backup
Setting
use the fssnmpNEId
redundant links.state
following SCLI tonodeany
command
<nodename>
of these to values
check the network avoid alarms.
connectivity:
module:
4. If it is enabled, verify the community/user/group defined for the switch
<nodename>
Therefore, if the shouldalarm bedoes the namenot clear of switch, that is, in
automatically, thiscase of AHUB4
indicates that it
show
using networking
thefile command: interface runtime
cat
would <log be path> | egrep -i "ERROR|WARNING"
AHUB4-A-1-8.
there
#showare severe errors in the system configuration. In this case, include
running-config
additional
Executing debugging this command information
will display to ainformation
problem report of theand send it to your
interfaces.
4.
2. Refer
Watch toforthe any troubleshooting
other power fluctuations instructionsindicating for Backup. a failing power stage.
local
5. If customer
70007 alarm support.
is not desired ,disable authentication-failure-trap using
5. If the backup has failed before the log file was created, search the
If
the both devices are up and running, the next probable cause of a link being
commands:
syslog
3. Rulefor outthe thelatest "fsbackup"
possibility of someone entries.accidentally
Use the following turningcommand:
the power off of
down is a loose cable. terminal
Switch-0-bi#configure Check for a bad connectivity, especially if "link"
the whole chassis.
refers to a cable. Even snmp-server
Switch-0-bi(config)#no in a case of aauthentication-failure-trap
backplane/midplane link, check that
Note: The term "TCU-0" refers to the name of the node. The node names
used in the examples may not be valid for certain products. The actual
node names vary across products. The exact node names for each
product can be found in the product-specific documentation.
This situation is most likely caused by a hardware fault. Contact your local
customer support to have the disk replaced.

The node may restart, but this cannot be verified by HAS. The immediate
task is to ensure that services can be activated on remaining operational
nodes. If the node becomes available when restarted, HAS performs recovery
actions and no user action is required.

This situation can be caused by a node or network overload. Contact your
local customer support to get assistance in the analysis.

Check
Disclaimer: the status of the nodebelow
The instructions by using usethe following
either procedure:
unsupported SCLI
commands or commands from the unsupported full bash shell. Please
1. Log in to the cluster.
carefully
2.
read Execute
the disclaimerthe following that is command
shown when to verifyeither that the Administrative
entering the unsupported state is
UNLOCKED and Operational state
SCLI vendor mode or the full bash shell. Do not use the commands in any is ENABLED for the node:
1.
show Check hasthat statethe air flows freely through the cabinet and the chassis.
other context. Please check from the product documentation or from
technical
2.
For If example:
the alarm>isshow persistent,
has state replace the faulty plug-in
managed-object /TCU-0 unit:
support for more information.
- Refer to the hardware maintenance documentation for detailed replacing
instructions.
Expected output:
1. LoginExecute instructions
the following for switches:
SCLI command to check the availability of the
-OBJECT
The details of the faulty plug-in unit (cabinet, chassis and slot) are found
DHCP A. Login ADMINISTRATIVE
instruction for AHUB2 OPERATIONAL
& AHUB4-A: USAGE ROLE
in the ApplicationDYNAMIC
PROCEDURAL Additional Info field of the alarm.
server a. bash_prompt#
and the associated telnet <switch_name>
TFTP server for (You alarms maywithneed
IAAItoTrivial
change Filethe
switch
Transfer name, replace this with the switch name from the alarm)
3.
/TCU-0 If thereUNLOCKED are numerous alarms ENABLED of this kind ACTIVEfrom several
- -onplug-in - units,
Enter password
Protocol (TFTP) get or putthe failure or server. those falling back default fabric
1.
check Reload the the image
air conditioning from andTFTP temperature in the network element (NE)
>
configuration:
equipment
3. If room.
show b.theUse
has state
the
state of managed-object
the node
"enable" is displayed
command to turn ason mentioned
the privileged above within 10
mode.
2. If the
minutes, problem remains, reload/DHCPD the image from the original source to the
TFTP
4.
then Checktheserver,if there
following andisreload any high
instructions the power
same
to reset image
surge the from the
because TFTP
of which server.
thebealarm
2.B.If Login
the instruction
recovery group for isAHUB3-A:
not providing thenode
service manually can
i.e. if the status skipped.
is
could
Else,
1. Login have
the HAS been
instructions raised.
recovery operations
for output,
switches: will be pending and waiting to verify
not a.matching
bash_prompt# the below telnet <switch_name>
then execute (Youthe may
next need
step. to change the
3.
the Ifnode
the problemis isolated remains,is, compare the md5sum of the imageorfile on the
switch name, replace(that this with it cannot
the switch accessname anyfrom databases
the alarm) other
TFTP
5. If the server
problem to that remainsofthis
the case,
original source.
shared
OBJECT A.
Enter Loginresources).
instruction
password
ADMINISTRATIVE In for after
AHUB2 applying
the userthe
& AHUB4-A:
OPERATIONAL must instructions,
manually
USAGE verify
pleasethat
ROLE the
contact
your
node local
is down:customer support.
a. bash_prompt# telnet <switch_name> (You may need to change
AHUB3#
PROCEDURAL
4. If the md5sumsDYNAMIC are the same, the original image file is also corrupted,
theb.switch For BI, name,
start basereplace this with
ethernet CLIthe switch name from the alarm)
1.
in Login instructions.
which
Procedure case for contact
ATCA your local customer support in order to get the valid
Hardware:
/DHCPD EnterUNLOCKED
AHUB3# password
base-ethernet ENABLED ACTIVE - - -
image.
- PressA. Login thestart instruction
hot fabric
swap etherenetfor AHUB2
button of the &blade.
AHUB4-A:
For> FI,
a. bash_prompt# telnet CLI
<switch_name> (You may need to change
-3.Remove b. Use
AHUB3#
Execute the the blade
"enable"
fabric-ethernet
the following and wait
command for a while.
to turnto on the privileged mode.
-the Ifswitch
5.Re-insert
the md5sums name,
the blade. differ, SCLI
replace something commands
this with the switchunlock
is continually name the DHCPD
from
corruptingthe the
alarm)server:
image
set
during hasEnter unlock
the password
transfer managed-object
from the original /DHCPD source to the TFTP server. If possible,
1.
2. Check
B. Login
Use the scenariosfor
instruction listed under
AHUB3-A: thethatblocktheMeaning of the alarm. Ifisthe
replace
Procedure > the the
instructions
suspected
for BCN
below
component,
Hardware:
to check
for example,
CPU usage
cable, switch
threshold
unit etc.
not
alarm
set toa.
4. Execute isbash_prompt#
raised,
abnormally the it is
low.
following cleared
telnet
SCLI whencommandsthe scenario
<switch_name> to is over.
(You
restart may
the need to change
DHCPD server:
-the
Log b.
inUse to name,
thethecluster.
"enable" to turn on privileged mode.
set switch
has restart replace this with
managed-object the switch name from the alarm)
/DHCPD
6.
2. If the
- Restart
Execute problem
the the faulty remains
followingnode command
byafter
using applying
the the instructions,
following
to AHUB2:
check thatSCLI please
command:
the reported FRU contact
is
A. Enter
CPU password
usage threshold check for
your
> B.
set Login
local
hardware instruction
customer restart fornode
support. AHUB3-A:
<node-name>
plugged AHUB3#
device-name#
5. Check
Fora.alarms in and active:
show monitor cpu-usage
1. that with
bash_prompt# IAAItelnet
the affected Switch
unit Issue,
is healthy
<switch_name> checkand/var/log/master-syslog
plugged
(You mayinneed
properly.
to changefor
show
Switch b.tools
Forthe system-status
BI,unitstartif base brief list CLI:
ethernet
2.
the
4. Restart
switch
Once name,
theusage node needed.
replace
has on been this with
reset, the switch
logAHUB3-A:
in towait name
the forclusterfrom the alarm)
B. AHUB3#
related
3. CPU
Replace
Enter errors. thebase-ethernet
Based
password unitthreshold thischeck
if needed. for
information, next and
DHCP setlease
the node timeto
isolate
For example:
Forproblem
FI, start
AHUB3(blade_mgmt)#
expiry > show fabric tools system-status
etherenet
show CLI: these
snmp-traps brief list
4. If
state the
AHUB3#
by using theremains
following after following
procedure: instructions, please contact
alarm
your AHUB3#
to
local clear.
customer fabric-ethernet
support.
>
1. set
Checkb.
has For BI,
isolate start base
managed-object ethernet CLI:
<node-name>
Expected CPUthat
C. AHUB3# output:
usage the threshold
affected unit
base-ethernet check is healthy
for AHUB4-A:and plugged in properly.
2.
6. Restart
Please
Use the
wait...
the
device-name(enable)#
If the alarm unit
instructions
does if not
necessary.
belowshow
clear, to
the check
cpuload
execute thatthethefollowing
memory SCLI usagecommand
thresholdto is
5.
3. AfterForthe
Replace FI,the start
node unitfabric
is ifset etherenet
to isolationCLI:
necessary. state, HAS performs recovery actions
not
restartset to abnormally low.
that
4. AHUB3# fabric-ethernet
theIf the problem remains below after following
to checkthese
---------------------------------------------------------------------------------
3. Follow
Switch: the instructions CPU instructions,
usage: please contact
were
your
Node/
set pending.
local
A.hardware customer
Memory restart usage node support.
threshold <node> HW
check for AHUB2: HW Node RU Active
2. Use the instructions below for the switches to check that the port error
HW Unit
device-name#
A. For Location
AHUB2: show monitor ram-usage Type State State State Alarms
monitoring
Note: If threshold
thevalueisolation is setofcorrectly.
state a node is set without actually verifying the
---------------------------------------------------------------------------------
device-name#show
Note: The of the cpu
<node> utilization
can be taken from the fsipHostName="
node is down,
B. Memoryofusage
" parameter then serious
the Managed threshold data corruption
checkfield
Object may
for AHUB3-A: occur.
in the alarm information.
A. Foryour
Contact AHUB2: local customer support to find out the reason
ADPE2-A
Not supported.
B. Command
For /chassis-1/power-slot-1
AHUB3-A: ADPE2-A ON forN/A failure.N/A 0
not supported
AHUB3-<Base/Fabric>#
7. If the above steps cannotshow resolve process cpu
the situation, contact your local
The C.HW
customer Memory State
support. ON indicates
usage threshold thecheck FRU displayed
for AHUB4-A: under HW Unit section is
B. For AHUB3-A:
active.
Not
C.AHUB3
Forsupported
AHUB4-A:
(blade-mgmt)# show snmp-traps
The HW State OFF indicates
device-name(enable)# show thecpuload
FRU displayed under HW Unit section is
inactive.
3. Follow the instructions below to check memory usage:
C. For AHUB4:
If the HW Unit is not displayed in the output but it is configured (present in
1. Determine the sensor number from the "Sensor" field of Identifying
Application Additional Information section of the alarm.
2. Determine the affected field-replaceable unit (FRU) from the "Position"
field of Identifying Application Additional Information section of the alarm.
3. Determine the severity of the alarm from the "Severity" field of the
alarm.
4. Please follow the steps given in the hardware documentation to check
the sensor value using the sensor number and FRU name.
5. Correct the sensor value by replacing the FRU as mentioned in the
hardware documentation.
6. If the problem exists even after following these instructions, contact
your local customer support with your observations on the sensor values.
Acknowledge automatic: Once the alarm is cleared, it will be automatically
acknowledged.

1. Check that the affected unit is healthy and plugged in properly.
2. Restart the unit if necessary.
3. Replace the unit if necessary.
4. If the problem remains after following these instructions, please contact
your local customer support.

Note: The terms "CFPU-0", "CFPU-1", "CSPU-1" in the instructions below
refer to node names in a cluster, and node names may vary across
different products. The node names used in the examples below may not be
valid for certain products. In such a case, use the node name that is
applicable for the specific product. For more details, refer to the
product documentation.

The following instructions are only applicable for the causes "No bootable
media", "PXE (preboot execution environment) Server not found", "Invalid
boot sector", and "Timeout waiting for user selection of boot source":
1. Try to restart the unit.
2. Check that boot media exists.
3. Check the validity of the boot media (disk is bootable, PXE server is
available).
4. Execute the following SCLI command to check the status of the node (if
it has booted up or not):
show has state managed-object /<value>
Provide an appropriate value for the managed object.
5. If the problem persists after following the instructions, please contact
your local customer support.

The following instructions are only applicable for the cause "Local
boot error while executing from flash":
1. Restart the affected unit. The position of the affected unit in the
example below is "Position=/chassis-1/slot-7":
a) Check the node name of the affected unit. For example:
show hardware state list
<...>
CSPU-1: node available /cabinet-1/chassis-1/piu-1/addin-7/CPU-1/core-0,1,2,3,4,5,6,7,8,9,10,11
<...>
b) Restart the affected unit. For example:
set hardware restart node CSPU-1
Resetting CSPU-1 [ok]
2. If this does not fix the issue, try to reflash the embedded software
of the affected unit by following the instructions in the product
documentation.
3. If the problem remains after following these instructions, please contact
your local customer support.

The following instructions are only applicable for the cause "Network
boot error":
This error may happen if the cluster manager node is not yet available
to provide PXE service, for example, during system restart.

1. If the alarm is raised when you try to log in at boot phase, check that
the caps lock is off.
2. Check if the alarm is being caused by another user.
3. If the problem remains after following these instructions, please contact
your local customer support.

1. Determine the sensor number from the "Sensor" field of Identifying
Application Additional Information section of the alarm.
2. Determine the affected field-replaceable unit (FRU) from the "Position"
field of Identifying Application Additional Information section of the alarm.
3. Determine the severity of the alarm from the "Severity" field of the
alarm.
4. Please follow the steps given in the hardware documentation to check
the sensor value using the sensor number and FRU name.
5. If the problem is permanent, check the status of the power units in
accordance with the hardware documentation and replace them if
necessary.
6. If the problem exists even after following these instructions, contact
your local customer support with your observations on the sensor values.
Acknowledge automatic: Once the alarm is cleared, it will be automatically
acknowledged.

1. Check that the affected FRU is healthy and plugged in properly.
2. Restart the FRU if necessary.
3. Replace the FRU if necessary.
4. If the problem remains after following these instructions, please contact
your local customer support.

1. Replace the battery of the affected unit.
2. If the problem remains after following these instructions, please contact
your local customer support.

1. Determine the sensor number from the "Sensor" field of Identifying
Application Additional Information section of the alarm.
2. Determine the affected field-replaceable unit (FRU) from the "Position"
field of Identifying Application Additional Information section of the alarm.
3. Determine the severity of the alarm from the "Severity" field of the
alarm.
4. Please follow the steps given in the hardware documentation to check
the sensor value using the sensor number and FRU name.
5. Find another corresponding or related alarm in the system to resolve
the issue scope (for example: FRU, Shelf, Cabinet, System, Site).
6. Correct the temperature issue by checking if any of the fans is broken,
the air filter is dirty, the air flow is blocked, or the ambient
temperature is out of range.
7. Correct the temperature issue by replacing the FRU as mentioned in
the hardware documentation.
8. If the problem exists even after following the instructions, contact your
local customer support with your observations on the sensor values.
Acknowledge automatic: Once the alarm is cleared, it will be automatically
acknowledged.

1. Determine the sensor number from the "Sensor" field of Identifying
Application Additional Information section of the alarm.
2. Determine the affected field-replaceable unit (FRU) from the
"Position" field of Identifying Application Additional Information section
of the alarm.
3. Determine the severity of the alarm from the "Severity" field of the
alarm.
4. Please follow the steps given in the hardware documentation to check
the sensor value using the sensor number and FRU name.
5. Correct the sensor value by replacing the FRU as mentioned in the
hardware documentation.
6. If the problem exists even after following these instructions, contact
your local customer support with your observations on the sensor values.
Acknowledge automatic: Once the alarm is cleared, it will be automatically
acknowledged.

1. Check that the shelf managers are appropriately plugged in, and are
running in a healthy state.
2. Check the configuration of shelf managers (whether username exists,
network configuration etc.).
3. If the problem persists even after following the instructions, please
contact your local customer support.

1. Check that all the FRUs are plugged into their correct places based
on the intended hardware configuration.
2. If the problem remains after applying the instructions, please contact
your local customer support.

THE RECOVERY PROCEDURE VARIES BETWEEN HARDWARE
ENVIRONMENTS, PAY ATTENTION IN BELOW PROCEDURES!

Disclaimer: The instructions below make use of either unsupported SCLI
commands or commands from the unsupported full bash shell. Please
carefully read the disclaimer that is shown when either entering the SCLI
unsupported vendor mode or the full bash shell. Do not use the
commands in any other context. Please check from the product
documentation or from local customer support for more information.

Note: The terms "CLA" and "CFPU" refer to a node that contains
centralized O&M and cluster management functionalities. The node
names used in the examples may not be valid for certain products. Actual
node names vary across different products. Exact node names for each
product can be found in the product-specific documentation.

Instructions for ATCA HW:
To recover the CLA node from Disk Out Of Sync (DOOS) on the ATCA
HW, perform the following steps:
1. Connect to the active CLA node via Secure Shell (SSH).
2. Enable the Preboot Execution Environment (PXE) boot.
Use the following SCLI command to enable PXE boot:
1. Determine the sensor number from the "Sensor" field of Identifying
Application Additional Information section of the alarm.
2. Determine the affected field-replaceable unit (FRU) from the "Position"
field of Identifying Application Additional Information section of the alarm.
3. Determine the severity of the alarm from the "Severity" field of the
alarm.
4. Please follow the steps given in the hardware documentation to check
the voltage sensor value using the sensor number and FRU name.
5. Correct the sensor value by replacing the FRU as mentioned in the
hardware documentation.
6. If the problem exists even after following these instructions, contact
your local customer support with your observations on the sensor values.
Acknowledge automatic: Once the alarm is cleared, it will be automatically
acknowledged.

Disclaimer: The instructions below use either unsupported SCLI
commands, or commands from the unsupported full bash shell. Please
carefully read the disclaimer that is shown when either entering the
unsupported SCLI vendor mode or the full bash shell. Do not use the
commands in any other context. Please check from the product
documentation or from your local customer support for more information.

NOTE: For some of the application additional information fields (AAI),
there are no corresponding recovery instructions. Only the events with
possible recovery are listed below:
AAI_SCCP_USER_OUTOFSERVICE - The alarm is raised for the
following reasons:
1. The remote network element (NE) is down, so there is no
communication between the local NE and remote NE.
2. The remote NE is up but the M3UA association configured between the
local NE and remote NE is down.

1. AAI_MTP3_EVENT_PC_CONGESTED: To reduce congestion, add
more links to the affected point code. If required, introduce more linksets
to the affected point code.
2. AAI_MTP3_EVENT_DPC_CONGESTED: To reduce congestion, add
more links to the affected point code. If required, introduce more linksets
to the affected point code.

Disclaimer: The instructions below use either unsupported SCLI
commands, or commands from the unsupported full bash shell. Please
carefully read the disclaimer that is shown while entering the unsupported
SCLI vendor mode or the full bash shell. Do not use the commands in any
other context. Please check from the product documentation or from your
local customer support for more information.

If the event raised is for an issue in the configuration
addition/modification/deletion, then first check the existing configuration.
The correct configuration can be verified from customer documentation.

NOTE: For some of the application additional information fields (AAI) there
will not be any corresponding recovery instructions. Only the events
having possible recovery are listed below:
1. AAI_SNM_INIT_CONFIG_NOK:
If the issue
Check the entry still persists,
for "SGWNetMgr" please collect inthere signalingon
/etc/hosts diagnostics
the cluster. symptom
An issues
entry or
The
3. The operator
M3UA should association checkiswhether up but Subsystem are any Test interoperability
(SST) and Subsystem
1.
MUSTAAI_MTP3_LINK_DOWN:
report and
be contact with
present with alocal valid customer
IP address. support/technical
For example: support.
configuration
Allowed (SSA)mismatches messages are between not exchangedthe SGW entity properly andbetween
the peerthe nodes.
local
Refer to Customer documentation chapter "Signaling symptom collection"
NE
a) and remote
Due to signalingtransmission NE. failures or high error rates on the link, it can also
to collect
169.254.0.10 symptom report.
SGWNetMgr.internalnet.localdomain SGWNetMgr
1. SSP
4. AAI_EMTP3_UNEXP_SLTA_RECV: This event is raised when the
go to itsmessage unavailable is received
state. Rectifying from the the remote NE because
transmission problem the SCCP
can bring
signaling
subsystem linkin test
the message
remote (SLTM)
network or signaling
element is down linkortestunregistered.
back
If the the entry link is tonotitspresent,
available state.contact your local customer support.
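
The /etc/hosts check described for SGWNetMgr can be illustrated with a short Python sketch. It parses hosts-file text and reports whether an entry with the given name exists and carries a syntactically valid IP address. The function name find_host_entry is hypothetical and not part of the product; the sketch assumes the usual hosts-file layout (address first, then names) and only mirrors the manual check.

```python
import ipaddress


def find_host_entry(hosts_text, name):
    """Return the IP address mapped to `name`, or None if absent or invalid.

    Hypothetical helper: mirrors the manual /etc/hosts inspection only.
    """
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if name in names:
            try:
                ipaddress.ip_address(address)  # raises ValueError if malformed
            except ValueError:
                return None
            return address
    return None


# Example entry taken from the text above.
sample = "169.254.0.10 SGWNetMgr.internalnet.localdomain SGWNetMgr\n"
print(find_host_entry(sample, "SGWNetMgr"))
```

On a live system the text would be read from /etc/hosts; a result of None corresponds to the case where customer support should be contacted.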
AAI_MTP3_EVENT_PC_INACCESSIBLE
acknowledgement (SLTA) please is enabled and an andSLTA message is received
b) If the link is inhibited, this alarm can be raised. The user can uninhibited
AAI_MTP3_EVENT_DPC_INACCESSIBLE:
before
Recovery sending an SLTM. This error occurs if SS7 links are not configured
the
2. link to instructions:
make the link available.
AAI_SNM_SERVER_IPPORT_NOK:
1. This
properly.
Check alarm
The
if remoteis raised
operator NE when must
is down. all the
check Bring links
the up configured
link remote for
theparameters NEat the
toMTP2 self point
make level
activecode
and
a)
are
MTP3Check
disabled.
level.that To theclearport the used by SGWNetMgr
alarm, one must enable is 49231 oneand or morenothing linkselse.
to
communication
2. AAI_MTP3_ROUTE_DOWN: with the local NE.
b)
makeCheck the ifpointthe IP code address
accessible. and port intended to be used by the
The
a) Due switch
SGWNetMgr to the may have a different
unavailability of theconfiguration
linkset, this alarm than what couldisbe configured
raised. in
2. AAI_EMTP3_INVALID_LINKSET_ID:
Execute This eventthe is status
raised of when the
the
is not beingthethe
configuration
Bringing used
following
linkset directory.
by toanyitsSCLIavailable
other
command
program. state will to check
Use clear
the the alarm.
following command
the M3UA
toand
2. If all ID
linkset
association theinlinks the
between are down
received the due NE
message
local to transport
andis invalid.
remote errors,
The
NE: rectifyingmust
operator the errors
check the
If
b) the
checkIf thealarm local
this: doesnode notreceived
get cleared, a TFP, thethis user will have
alarm couldtoberestart raised. theWhenswitch the
bringing
configuration up the of links
the mightand
SGW make the the peer point code accessible.
nodes.
manager
local nodeprocess receives (inacase
TFA of foraasoftware
particularfault), or the switch
destination PC, this (inalarm
case will of a
show
hardware
be signaling
cleared. fault). ss7 association all
netstat
The
3. If raising
TFP -npis -t of-lthis
received
AAI_EMTP3_INVALID_SIO: | grepalarm -i <IP
from theaddress
indicates remote
This that of
node
eventSGWNetMgr>
the digital
and
is the
raised signal
route
when processor
towards
erroneous device
the SIO
is no
parameters
Sample longer
Destination Output: functioning
PC
are is down,
configured properly.
the Destination
between As a
tworecovery
PC
nodes.can action,
go
The to the
its
operator proxy
inaccessible
must process
check
Internal
If the
This issue
command network still MUST connectivity
persists, NOT please problem
show collect
any from
the the
entries theexcept
activediagnostics
signaling switch manager
SGWNetMgr symptom
with the
in high
state.
the availability
Bringing
configuration the services
route uplevelwill
might try to
thereset
make the nodes faulty
Destination core. PC If accessible.
the core
report
same toand
IP the switchat
contact
address and
MTP3
management
your portlocal as
ofinterface
customer
the ones
local
----------------------------------------------------------------------------------------------------
node can also
support/technical
assigned for
and
be peer
the nodes.
root
support.
SGWNetMgr. cause Refer forto
becomes
--------- functional, then this alarm will be automatically cleared.
the alarm. documentation
Customer In this case, verify chapter that the used network
"Signaling symptom configurations
collection" toare not
collect
If the
issue
M3UA
The issue
alarm still still
Association
does persists,
persists, not please
require please collect
any collect
direct the
theaction signaling
signaling as diagnostics
diagnostics
the mirroring symptomsymptomis
application
blocking
the
3. signaling access symptomto
AAI_SLM_STACKHNDLR_SERVER_IPPORT_NOKthe switch
report. management interface.
report and the contact DSPyour local customer support/technical
----------------------------------------------------------------------------------------------------
running support. Refer
the to
a) Checkinthe
Customer Signaling
documentation
coreGatewayand cannot
chapter Units be(SGUs)
"Signaling
controlled. within
symptom
Aftereach some SGW
collection"
time,fragment
to collect
---------
system willfollowing
synchronize the DSP
1. the
in
the
UseConfiguration
the
signaling symptom
SCLI
Directory
report.
command andcores. to restart
ensure that the switch manager
ports defined are
association id : 1
correct and available on the cluster as explained in 2.b above. The IP
local-as-name
set has restart managed-object : /SwitchManagerLAS1
addresses to
Disclaimer : The be used for each
instructions belowSLMmake type (foruseSCCPof either and ISDN) are SCLI
unsupported
local-ip-addr : 10.22.115.31
defined in /etc/hosts
commands or commands in below from format:
the unsupported
local-server-port
2. Verify from the used :
network configuration2909 that nothingshell.
full bash Please
is blocking the
<ip-address>
carefully read <SCCP
the RG
disclaimer name>.internalnet.localdomain
that is shown when either <SCCP
entering theRG SCLI
vrf-name
access from the switch :
manager to default
the switch management interface.
name>
unsupported vendor mode or the full bash shell. Do not use the
node : /EIPU-0
commands
remote-as-name in any other context. : Please MSS1 check from the product
1.Below
3.
ForRestart are thethe different
switch bylocal recoverythe
following actions to be platform taken based on error type.
example:
documentation
remote-ip-addr or from : customerhardware support for more information.
10.102.232.130
specific
instructions.
remote-port
0 - No recovery needed : 2905
169.254.1.88
Note: The term "CLA-0" SCCPSGU-1-0-1.internalnet.localdomain
refers SCCPSGU-1-0-
exchange-mode
1 - Connect to Packet Timing : toUnit the name
usingdoubleof the node. The
the serial console and review
node can be
1
any node in the cluster. :The node names used in the examples may not
sctp-profile
the local eth2 interface status/configuration. If the local eth2 SS7
be valid for certain products. :Actual nodeipsp
communication-type names vary across different
configuration
Note: RG Exact andisRU correct,
names connect
are for to Ethernet
deployment SwitchFor
specific. andexact
review the
products.
admin-state node names : eachenabledproduct can be found innames
the refer
configuration
product documentation. for the interface 0/27
product-specific
role documentation.
2 - Check external :SFP cablingserver connections for the Active external SFP
priority
3 all
- Check external :
SFP cabling 1connections for the Backup external
For SLMs, the IP
1. Log in to the cluster :using the user addresses used are the same
account with as thethe Node privileges
required IP
status
SFP connection_down
addresses
to execute the below mentioned SCLI commands.
----------------------------------------------------------------------------------------------------
on4which
- Check theyexternal
run. SFP cabling connections for the Main external SFP.
---------
If cabling is working properly, consider replacing the SFP transceiver
2. Lock the node that possibly has the faulty disk with following SCLI
If 5 - issue
the Checkstill external
persists, SFP cabling
collect theconnections
signaling diagnosticsfor the Backup symptom externalreport
command:
Note: IfThe field is "node" in above output can bereplacing
any nodethe name
SFP.
and contact cabling your working
local customerproperly, consider
support/technical support. SFPwhere the
transceiver
> set has service
signaling lock force is themanaged-object
running. The node/CLA-0 name may vary across different
6,7 - Reconnect SFP transceiver
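
The error-type list above (types 0 to 7) is in effect a lookup table. A minimal Python sketch of that lookup follows; the names RECOVERY_ACTIONS and recovery_action are hypothetical, and the action texts are transcribed from the list rather than taken from any product interface.

```python
# Recovery actions keyed by error type, transcribed from the list above.
# The dictionary and function names are illustrative assumptions.
_SFP_REPLACE = " If cabling is working properly, consider replacing the SFP transceiver."

RECOVERY_ACTIONS = {
    0: "No recovery needed.",
    1: ("Connect to Packet Timing Unit using the serial console and review "
        "the local eth2 interface status/configuration. If the local eth2 "
        "configuration is correct, connect to Ethernet Switch and review "
        "the configuration for the interface 0/27."),
    2: "Check external SFP cabling connections for the Active external SFP.",
    3: "Check external SFP cabling connections for the Backup external SFP.",
    4: "Check external SFP cabling connections for the Main external SFP." + _SFP_REPLACE,
    5: "Check external SFP cabling connections for the Backup external SFP." + _SFP_REPLACE,
    6: "Reconnect SFP transceiver.",
    7: "Reconnect SFP transceiver.",
}


def recovery_action(error_type):
    """Return the documented action for an error type, or None if unlisted."""
    return RECOVERY_ACTIONS.get(error_type)


print(recovery_action(4))
```

Returning None for unlisted values keeps the sketch from inventing guidance for error types the list does not cover.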
1. Check according to the user (troubleshooting) guide that the
"bladeSelfTestComplete" simple network management protocol (SNMP)
trap is enabled.
2. Check if the blade self test status of the hardware unit has passed.
3. If the blade self test status of the hardware unit has failed, or is
pending, please contact your local customer support.

Fill in a problem report, and then send it to your local customer support.

Try to check what is wrong with the newly activated delivery. If there is no
possibility to continue the software upgrade with the given delivery,
contact the local customer support about the failure, and take the new
corrected version into use when it is available.

1. Use the "top" command to see ClusterTraceManager's CPU usage.
Execute the appropriate SCLI command to reduce the tracing level or stop
the trace sessions which could be the cause.
Use the following SCLI command to see the level at which the context
(application) is writing into the buffer:
show tracing config src-context all-nodes

A single instance of the alarm may be due to some transient cause, and
does not require specific actions. If the alarm from the same node
however does not clear automatically, or keeps repeating, replacing the
board or the reported memory module (DIMM-X) should be considered.

Execute the following instruction to check the connectivity:
1. Check that the peer network element is connected by using traceroute,
ping, or other similar utilities. The following commands could be executed:
a. Ping: ping <SESSION_DST_ADDRESS> -I <SESSION_SRC_ADDRESS>
b. Traceroute: traceroute <SESSION_DST_ADDRESS>
2. Check the log files (/var/log/master-syslog) for network-related fault(s).
3. Check the state of the peer network element.
4. Applications that need the ICMP monitoring alarms to be suppressed
should use the suppress-mhop-bfd-icmp-alarm feature type in the
NodeType.xml.
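
The peer connectivity check (ping from the session source address, then traceroute to the session destination) can be scripted. The Python sketch below only assembles the two command lines in the form given in this document; the function name build_connectivity_commands is hypothetical, and the addresses are placeholders from the documentation range 192.0.2.0/24, not real session addresses.

```python
import shlex


def build_connectivity_commands(session_dst, session_src):
    """Build the ping and traceroute argument lists for one session.

    Hypothetical helper; the command forms follow the document:
      ping <SESSION_DST_ADDRESS> -I <SESSION_SRC_ADDRESS>
      traceroute <SESSION_DST_ADDRESS>
    """
    ping = ["ping", session_dst, "-I", session_src]
    traceroute = ["traceroute", session_dst]
    return [ping, traceroute]


# Placeholder addresses, assumed for illustration only.
for argv in build_connectivity_commands("192.0.2.10", "192.0.2.1"):
    print(shlex.join(argv))
```

Keeping the commands as argument lists means they can be passed to subprocess.run without shell quoting concerns, should the check be automated.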
1.
Use Determine
the following the sensor
SCLI command name from to the "Sensor"
change field of the Identifying
the level:
Application Additional Information fields section of the alarm.
1. Use the following command to get the alarm details:
set tracing src-level off/fine/finer/finest/all all-nodes process <process-
show alarm active filter-by specific-problem 70370
2. Determine
name> context the affected field-replaceable unit (FRU) from the "Position"
<context-name>
field of the Identifying Application Additional Information fields section of
1. If theIDapplication :specific
Alarm 417 SW delivery pre-check command does not
the
2. Ifalarm.
the automatic
problem still persists,
provide
Specific problem cleaning : 70370 upfill in a problemfree
-operations
MEMORYtoERRORS
reportthe anddiskcontact
space, your the local
customer
following the support.
manual steps can be tried.
Managed object : fsipHostName=CLA-
3. Figure out other related alarms in the system to resolve the issue.
0,fsFragmentId=Nodes,fsFragmentId=HA,fsClusterId=ClusterRoot
Check
Severity for any unused SW deliveries which could be removed to free the
: 5 (warning)
1.
4. If the application
Please follow thespecific steps given SW delivery pre-check
in the hardware command does
documentation not the
to find
disk
Clearedspace (in case:ano new SW delivery needs to be installed to the
provide
FRU, or automatic
other hardware cleaning moduleup operations
associated to with
free thethe FRU,
disk space,
which is themissing
system).
Clearing All deliveries : manualcan be listed with the following SCLI command:
following
with the help manual of the steps
sensor cannamebe tried. and the FRU name.
Acknowledged : no
> show
Ack. user sw-manage
ID list
: N/A
Check
5. Try to forcorrect
any unused the issue SWby deliveries
inserting whichFRU could be removed by to free the
Disclaimer:
Ack. time The instructions below usethe properly,
either unsupported orSCLI replacing
disk
the space
FRU or (in case aN/A
hardware
: new SWassociated
module delivery needs with to be
the FRU installed
as to the in
mentioned
commands,
A currently
Alarm time or commands
active delivery
: 2015-03-31frombe
can the unsupported
listed with the EEST
12:40:40:576 full bash SCLI
following shell. command
Please
system).
the hardware All deliveries
documentation. can be listed with the following SCLI command:
carefully
(this
Eventone type should not :be x5removed):
(equipment)
read the disclaimer that
Application :list is shown when either entering the unsupported
fsClusterId=ClusterRoot
> show
6. If the sw-manage
problem remains after following
SCLI
> show
Identif vendor
sw-manage
appl. mode
addl. infoor the
current
: fullallbash
Class: shell.these
correctable, DoAffected
notinstructions,
useMemory: please contact
the commands in any
Disclaimer:
your local The instructions
customer support. below use either unsupported SCLI
other
DIMM-2 context. Please check from the product documentation or from your
commands,
A currently active or commandsdelivery can frombe thelisted
unsupported full bash SCLI
with the following shell. command
Please
local
A delivery
Appl. addl. can
infothe be disclaimer
removed
: Error with
rate: the following
10/24h, AffectedSCLI command:
location:
carefully
(this
NOTE: one read
should
This alarm notwill be removed):
also be that
raisedis shown
in case when a either
hardware entering
entity the
is present
customer
DIMM ID=2, support
Channel for moreID=3mode information.
unsupported
but not responding. SCLI vendor To identify or the
this, full bash
examine shell.
all the Do not
alarms use the
raised for the
> delete sw-manage in unit.
any other delivery
context. <delivery
Pleaselabel>
--------------------------------------------------------------------------------
commands check from the product
> show
same sw-manage
plug-in In current
case all
this hardware entity is left out intentionally,
Disclaimer:
Note: The output The instructions
displayed by belowthe showuse either
hardware unsupported
inventorySCLI list brief this
documentation
alarm can be or from your local customer support for more information.
ignored.
commands,
command
2. Check
Figure out inor
for the
any commands
instructions
unused frombelow the refer
configuration unsupported
to the from
snapshots nodefull bash
andMO
created shell.
board Please
thenames
byfield user.
of theTheof
A delivery
carefully canthe
read benode
the removed
disclaimer
that has
with
that
memory
the
is the
shown
errors
following when SCLI the
command:
either entering the
a cluster.
snapshots
alarm. For Thecan
example,node
be and
listed
from board
with
step the names
# following
1, vary between
SCLI
affected command:
node different
is CLA-0. products.
1. This alarm indicates
Acknowledge automatic: that the alarm signalingcleared,
connection it willcontrol block
unsupported SCLI vendorOnce mode or theisfull bash shell. be
Doautomatically
not use the
resource
> delete
acknowledged. has reacheddelivery
sw-manage the signaling<delivery connection
label> congestion limit defined in
commands
Out
>
3. showof memory
Figure in any
snapshot
out otherclass
situations
listall context.
as detected Please bycheck
this
typealarm from the the Identifying
are product
rare. If this alarm
the system bythe theerror
parameterand memory from
"sccp-connections-congestion-threshold"
documentation
is
Application or from your local customer support for more information.
which
2. Check is configured
for any unused as part of product deployment
configuration snapshots created data. Inby suchtheauser. The
raised,
Notice
Additional then
that thethesnapshots
Information basic recovery
field created mechanism
ofsignaling
the automatically
alarm. For isexample,
to during
restartfrom the plug-in
delivery
stepby unit.
installation
# 1,
situation,
snapshots thecan number
be listed of idle
with the followingconnections
SCLI as specified
command: thethe
For
Before
are
error CONNECTION_CONTROL_BLOCK
automatically
class is removed
"correctable" when
and the thememory case:
delivery is is removed
"DIMM". (so no need to
"Idle" sub-field in the Application Additional Information field will be less
doing
remove so,them
in order for your local customer support to investigate the
manually).
than
> show or equal
snapshot to the difference between the parameters "max-sccp-
listall
1. This
problem
4. Use the alarm indicates
following SCLI thecommand
signaling connection
to clear the control alarm and block resource
toThe
verify if and
connections" and "sccp-connections-congestion-threshold".
has reached
further,
Snapshots
how soon please
thecan the be
alarm maximum
collectremovedthe data
reappears. limit
with asdefined theininstructions
perfollowing
the the SCLI
system by thebelow:
listed
command: parameter
parameters
Notice "max-sccp-connections"
that the snapshotsIncreated and "sccp-connections-congestion-
"max-sccp-connections". such aautomatically
situation, the during number delivery
of idle installation
signaling
threshold"
are are defined
automatically removed as part when of deployment
the delivery data.
is removed (so no need to
connections
1.
>
set Determine
delete
alarm snapshot
clear asthe specified
affected
config-name
alarm-id by the Replacement
Field
<alarm-id> "Idle" sub-field
<name of snapshot> in the
Unit (FRU)Application
from the
remove
Additional them manually).
"Position" Information field is 0. It means that all the signaling connection
2. The signaling connections that are triggered to be closed as specified in
control
field block
If theofalarm
the isresources
Identifying are utilized
Application
still not cleared, contact andcustomer
Additionalno more connections
Information
support fields will be of
section
services.
1. This alarm indicates that there are signaling connections dropped. Use
the product documentation to check the signaling connection drop.

2. If the value of the "Disturbance Identifier" is "ROUTE_FAILURE" and


there is only one instance of alarm 70387 present in the active alarm
1. This alarm indicates that the SCTP buffers for the association is full and
database, then it does not imply that there has been only one SCCP
signaling messages are dropped. With this, there will be impact to the
connection dropped from that DPC. There is optimization done in the
signaling connections and thus degrading the KPI. Use the product
software to raise the alarm only once in a duration of 5 seconds. This is
documentation to check the statistics related to signaling connections.
done to reduce the overhead on the system and to avoid raising the same
"Disclaimer:
duplicate alarm. The instructions below make use of either unsupported SCLI
2. If the alarm is raised frequently, then actions are needed by the
Commands or commands from the unsupported full bash shell. Please
operator.
carefully
3. If the value read the of the disclaimer
"Disturbance that isIdentifier"
shown when either entering the
is "RLC_FAILURE", then SCLI
unsupported
consider enhancing vendor the mode SCCP or the timer full Tbash shell. in
(interval) Dothe notSCCP use the timer profile
3. Consider increasing the number of associations in the association set.
commands
that is assigned in any to other
the SCCP context. Please check
destination point code from the (DPC) product specified in the
1. This
This willalarm
help in indicates
distributing that the there load is inconsistent
and reduce the signaling
chances configuration
of bottleneck
documentation
Identifying Application or fromAdditional local technical Informationsupport(IAAI) for more field. information."
between
in the SCTP the transmit
local network buffers. elementFor M3UA/SS7 and the remote protocol, network the SCLI element. Also,
this alarm will be
administration raised only
commands arewhen the signaling object in question is
the following:
1.
4. This
Execute alarm indicates that there are signaling to findconnection refusals. This
activated
a. add signaling atthe both following
the association
ss7 local SCLI and commandremote
id <number> network the SCCP timer
elements. profile
can
assigned be understoodto the SCCP by checking
destination thePointsubfields Code:"ConnectionAttempts",
b.
This setalarm signaling notifies ss7the association of anidautomatic
user"ConnectionFailed" <number>protection action the system
"ConnectionSuccess" and as specified in the
2. Verify and align the signaling configurations as specified in the
takes,
Application
show and by
signaling itself,
Additionalsccp generally
Information
destination-point-code doesfield. not need<destination user actions.point Determine
code from
Application
In case increasingAdditional the Information
associations subfields
in thenumber: "ConfiguredParams"
association set is andto
other
name> temperature-related alarms (Alarm 70297) thenot able
scope of the
"ActivatedParams"
overcome the SCTPbetween association the local congestionand remote network elements for the
thermal
2. Whencondition the alarmcausing is active, this the alarm,
SCCP andstack usealarm, then this
its instructions
instance will logto
concerns
correct the
statistical
specific
very slow objectprocessing as specified of signalingin the Identifying
messages Application
at the peer Additional
network element.
condition.
information
5. Use thethe However,
such
following as the if the
SCLI thermal
number
command ofcondition
connection
to find is limited
the attempts
SCCP totimer
the component
towards values the for a
1. Check
Information
So, such cases configuration
subfields need "ObjectID"
to be and
jointly theand state of the
"InstanceID". IP network.
the localEach SCTP
raising
destination
specific this
SCCP alarm,
point timer follow
code, the
profile: thenumber stepsinvestigated
below:
of connection
with
failures
customer
towards the
payload
support and protocol the support has its own teaminquiry of the SCLI peer networkcommand for its configuration.
element's vendor.
destination
For M3UA, point
the SCLI code, and the top
command is "showreasons (restricted
signaling ss7 up to 3 reasons) for
association".
Collect
1. Determine the masterthe syslogs
position and
of the signaling
the destination
affected syslogs
component on the
using local network
connection
show
For IUA,signaling
"show failures sccp towards
signaling sccp-timer-profile
isdn association". name point
<SCCP code. timerThethe system
profile "position"logs
name>
element
fieldinformation using
from Identifying the the following SCLI command:
this in theApplication
signaling syslogs Additional every Information
5 minutes. section of the
save
Find
alarm. symptom-report
out what wrongname
isfollowing with the <SymptomSCTP associationsreport name> include
in SCCP
the association subreport- set.
6.
2. Execute
Check the the used SCTPSCLI profile command
parameters to createand compare an timer
it to the profile
peer
syslog
The
with
endpoint's possible
enhanced SCTP reasons
timer for signaling
value
parameters. for T connection refusal
(interval):
It is recommended to use are: exactly the same
1.
2.
a. Use
Check
End-user theif following
the chassis
originated SCLI coolingcommands systemtoisfind out theworking
in proper association condition,identifiers
that
parameters
Share the on
results both with endpoints.
the local support.
in
is, the
b.
add the
End-user association
fans are
signaling congestion
sccp set
in place that are serving
running
sccp-timer-profile at fullname the
speed, local
<SCCP application
filters are
timer not server
clogged,
profile> or local
and,
Signaling
adjacent
c. End-user Gateway
components
failure Process(SGP)
and the filler and plates theare partner
in place. remote application
Disclaimer:
SCTP parameters The instructions
mismatchbelow typically use either retransmissions
causes unsupported SCLI at one
server.
d.
7. SCCP-user
Execute For the M3UA: originated
following SCLI command to assign the newly created
commands,
endpoint andorreceived commands duplicatesfrom the atunsupported
the other, that fullis,bash shell.
retransmissions Pleaseare
3.
e.
SCCP Verify
Destination
carefully timerthat
readprofilethere
address
the is
to an
the
disclaimer unobstructed
unknownSCCP thatdestination
is shown air flow point
when through code:
entering the chassis slot.
done before data is acknowledged. In this case you need the unsupported
to either
show
f. signalinginaccessible
Destination ss7 association all
SCLI
increase vendor mode or the TimeOut
Retransmission full bash (rto-min/rto-max)
shell. Do not use or thedecrease
commands in any
Selective
4.
g.
set
other If the
Network
signaling application
context. resource
sccp
Please additional
- check
QoS
destination-point-code information
unavailable/non-transient
from the product fielddocumentation
name shows
<SCCP thedestination
textor "fallback,
frompoint your
Acknowledgement
1. For object failures Delay (sack-period) so that the ackowledgement delay
For
when
h.
code
local
has
IUA:
Network
a
thermal
name>
technical
smaller
controller
resource -that
sccp-timer-profile-name
support
value than QoS
for
are
exits
the more
raised by
possibly
unavailable/transient
peer's information.
the to
due
<SCCP
retransmission
SCCP timerstack
manual profile>
timeout.
instance(s),the
intervention",
Or to be more
resolve
Accessthe
PowerThrottler
i.specific, just recovery
failure introduced group inconsistency
or the recovery in the unit configuration
specific tobetween the nodethe will
refer below:
j.database
show
be
8. in
Access
If thesignaling
an and
inactive
congestion
value
For LM_CONNECTION_FAILURE:
the
of isdn
thesignaling
state. association service
Unlock/restart
"Disturbance all usingthe
Identifier" appropriate
recovery
is group
"IAR_EXPIRED", or relevant
or the signaling
recovery
then
SCLI
unit
k. incommands.
Subsystem
consider thisenhancing
case. failure Thisthe will
SCCP be applicable
timer T+(IAR) for configuration
in the inconsistencies
local
due torto-min
"add" =
orthepeer's
"modify" SACK.PERIOD
operations only. Round
For TripSCCP
delete Time
operation,
timer profile
(RTT) + safety
do not
that
2.
l.is Check
Expiration
assigned
Disclaimer: if theof
to "admin-state"
connection
SCCP of
establishment
destination the associations
point timer
code in
(DPC) the association
specified in set
the that
To
margin. clear theThe alarm, instructions
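
The rule for the retransmission timeout (local rto-min = peer's SACK.PERIOD + Round Trip Time (RTT) + safety margin) is plain arithmetic and can be sketched in Python. The function names and the millisecond values below are illustrative assumptions; actual values come from the SCTP profiles of both endpoints.

```python
def min_local_rto_ms(peer_sack_period_ms, rtt_ms, safety_margin_ms):
    """Smallest local rto-min suggested by the rule:
    peer's SACK.PERIOD + RTT + safety margin (all in milliseconds)."""
    return peer_sack_period_ms + rtt_ms + safety_margin_ms


def may_retransmit_before_ack(local_rto_min_ms, peer_sack_period_ms, rtt_ms):
    """True when the local timer can expire before the peer's delayed
    acknowledgement has had time to arrive."""
    return local_rto_min_ms < peer_sack_period_ms + rtt_ms


# Assumed example values: peer delays SACKs by 200 ms, RTT is 50 ms,
# and a 50 ms safety margin is added.
print(min_local_rto_ms(200, 50, 50))
print(may_retransmit_before_ack(150, 200, 50))
```

The second function expresses the mismatch symptom described in the text: retransmissions at one endpoint and duplicates at the other when data is retransmitted before it is acknowledged.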
follow thebelow procedure use commandsbelow: from either the
attempt
are
5.
m. If serving
no to undo
apparent
Incompatible
Identifying the
Application the
local configuration
external
user application
data reason
Additional change.
server,
is causing
Information or Next,
local
this
field. use the
SGP,
alarm, andthe following
the partner
hardware SCLI
unsupported
1. Check if the fullSNM bashRGs shell, HAS or unsupported
administrativeSCLI statecommands.
is LOCKED. Please
If so, then
command
remote
component
n. Hop counter toitself
application switchover may server
violation bethe is Active
faulty "disabled".
(sensor,SCCP Use
heatsothe that
sink) the
following
and Standby
may SCLI need unit (or new
commands
carefully
unlock
3. If the the read
alarm SNM the
stays RG.disclaimer
on permanently, that is shown when
ping/traceroute either can entering
help the
locating the
Active
to activate
replacement.
o.
9. No
Execute SCCP)
translation the
the will provide
associations.
forvendor
following an address SCLIthe For availability
of or such command of nature, signaling connections: problem. p. Unequipped user

unsupported SCLI mode, the full bash shell. Do not use the commands in any
other context. Please check from the product documentation or from your
local technical support for more information.

This alarm indicates that the SCTP association has terminated abnormally.
The signaling syslogs can be accessed at the location
"/var/log/signaling_syslog.log". Follow the procedure described in step(5)
for accessing the signaling syslog.

1. Check the configuration and the state of the IP network. Each SCTP
payload protocol has its own inquiry SCLI command for its configuration.
For M3UA, the SCLI command is "show signaling ss7 association all".
For IUA, the SCLI command is "show signaling isdn association all".
2. Check the used SCTP profile parameters and compare them to the peer
endpoint's SCTP timer parameters. It is recommended to use exactly the same
parameters on both the endpoints.
3. Correct the situation.
4. If the IP network conditions cannot be improved, increasing
"assoc-max-retransmits" should be considered. Changing the SCTP profile of
the association will require the association to be reset, that is, change the

First, make corrective actions depending on the failure reasons available in
the signaling syslogs, as mentioned in the previous step.

4. If the IP network conditions cannot be improved, increase the error limit
for this alarm (path-max-retransmits). In this case, increasing
"assoc-max-retransmits" should be considered too. Changing the SCTP profile
of the association will require the association to be reset, that is, change
the admin-state to "disable" and then later set it to "enable".

M3UA:
set signaling ss7 association id <> admin-state enabled
For IUA:
set signaling isdn association id <> admin-state enabled

6. If the problem persists even after following the instructions, please
contact your local customer support.

4. Corrective actions might also be needed in the peer network element if
there are inter-operability issues and such cases need to be jointly

3. Use the following SCLI command to check if there are "SCTP ASSOCIATION
FAILURE" alarms for the associations in the association set that are serving
the local application server and the remote application server:
show alarm active filter-by specific-problem <>
4. Consider each of the associations as per the instructions provided in the
alarm definition for "SCTP ASSOCIATION FAILURE".
When "status" of even one association becomes "asp_state_active", then the
system cancels this alarm.

3. If the alarm stays on permanently, the ping/traceroute actions can help
in locating the problem.

Instructions vary based on the service that has failed. Identify the service
which has failed by observing the "Managed object" field in the alarm.
1. netconsole: first verify if the netconsole package is available in the OS.
To check whether the netconsole package is included in your build, execute
the following command as root in the full bash shell:
# modinfo netconsole
Sample output:
modinfo: could not find module netconsole
Such an output confirms that the netconsole OS package is not available.

a. Use the following SCLI command to check the SNM RG status:
show has state managed-object /SGWNetMgr
The active SNM RG has its administrative state as UNLOCKED and the
operational state as ENABLED.
b. Use the following SCLI command to unlock the SNM RG:
set has unlock managed-object /SGWNetMgr
2. If the SNM RG operation state is ENABLED, then make sure the SNM
process is listening on the correct IP (as assigned to SGWNetMgr in
/etc/hosts file) and port (49231).
a. Use the following command to check the port where SNM is listening to:
netstat -nap | grep -i sgwNetMgr
Sample Output:

set has switchover managed-object <RU Name>
Next, wait until the role of the previously Active SCCP stack instance
becomes "HOTSTANDBY", and its procedural state is "INITIALIZED". Use the
following SCLI command to check the SCCP RU status:
show has state managed-object <RU name>
Next, switchback so that the original Active SCCP is having role as "ACTIVE".
2. For object failures that are raised by the SS7 stack instance(s), resolve
the just introduced inconsistency in the configuration between the database
and the signaling service using appropriate or relevant signaling SCLI
commands. This will be applicable for configuration inconsistencies due to
the "add" or "modify" operations only. For delete operation, do not attempt
to undo the configuration change. Next, perform a restart for each SS7 RU -
one at a time. Before restarting, ensure that the below precautions are
considered:

to find the SCCP timer profile assigned to the SCCP destination point code:
show signaling sccp destination-point-code <destination point code name>
10. Execute the following SCLI command to find the SCCP timer values of a
specific SCCP timer profile:
show signaling sccp sccp-timer-profile name <SCCP timer profile name>
11. Execute the following SCLI command to create an SCCP timer profile with
enhanced value for T(IAR):
add signaling sccp sccp-timer-profile name <SCCP timer profile>
12. Execute the following SCLI command to assign the newly created SCCP
timer profile to the SCCP destination point code:
set signaling sccp destination-point-code name <SCCP destination point code
name> sccp-timer-profile-name <SCCP timer profile>

a. For the possible reason in the Application Additional Information, the
SCCP partner

The following is the snapshot of the output from the signaling syslog file:
REDUCED SUCCESS RATE
(d) DPC STATS:
PointCodeName=RAN,ConnectAttempts=20,ConnectSuccess=10,Connec
tStarting=1,
CREF_END_USER_FAILURE=4,CREF_END_USER_CONGESTED=2,C
REF_END_USER_ORIGINATED=1
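The counters in the signaling syslog snapshot (PointCodeName, ConnectAttempts, ConnectSuccess) can be read programmatically. The following is a minimal Python sketch and not part of the product. It assumes that the statistics appear as comma-separated key=value pairs and that the success rate is ConnectSuccess divided by ConnectAttempts. The function names are hypothetical, and the sample line uses only the fields that are fully visible in the snapshot.

```python
def parse_stats(line: str) -> dict:
    """Split a comma-separated key=value syslog fragment into a dict."""
    fields = {}
    for part in line.split(","):
        key, sep, value = part.strip().partition("=")
        if sep:  # ignore pieces without "=", such as a truncated tail
            fields[key] = value
    return fields


def connect_success_rate(fields: dict) -> float:
    """Return ConnectSuccess / ConnectAttempts (assumed definition)."""
    attempts = int(fields["ConnectAttempts"])
    successes = int(fields["ConnectSuccess"])
    return successes / attempts if attempts else 0.0


# Sample taken from the visible part of the snapshot.
sample = "PointCodeName=RAN,ConnectAttempts=20,ConnectSuccess=10"
stats = parse_stats(sample)
print(stats["PointCodeName"], connect_success_rate(stats))
```

With the sample line, 10 successful connections out of 20 attempts give a rate of 0.5 for point code RAN.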
The way to manage this external alarm is described in separate
operator-provided instructions.


This alarm is raised with the intent to avoid flooding of alarms in cases
where the signaling objects are changing status frequently. The user must
refer to the "Instructions" section of those specific alarms that were raised
due to object status change.
1. Use the following SCLI command to check whether other alarms (for
example: 70399, 70397) affecting the D-channel in question are active:

show alarm active filter-by specific-problem <alarm ID>

If so, perform the necessary actions according to their instructions.

2. If the alarm is generated when you are activating the D-channel for the
first time, perform the following:

a. Check the configured TEI/SAPI values (where the values of TEI and
SAPI should be "0" for both) at remote end, and also check whether the
D-channel is successfully connected to the switching network. See Site
Documents for more information.

b. Check the functioning of the remote end (for example, PBX). Refer to
the relevant product customer documentation of the remote end.

3. If the alarm is generated when the D-channel has already been
successfully activated, it could be that after a successful activation, the
fault that prevents the functioning of the D-channel is most probably
located in the data link connection. The exchange terminal (ET) and
transmission equipment alarms can be used for locating the fault. Refer to
the instructions of those alarms.

4. If the fault cannot be found with the above instructions, contact your
local customer support.

1. Use the following SCLI command to list the eSW status of the NE:

show sw-manage embedded-sw status all

Use the output to investigate in detail which FRUs and eSW components
may have failed in installation or activation. It is also advised to go through
the eSW log files related to the failed installation or activation task as this
provides supplementary information.

2. Please refer to the Embedded software troubleshooting instructions in
the product documentation for details.

1. Execute the following command to restart the switch:

set hardware restart node <node>

Note: The name of the node can be obtained from the parameter
fsipHostName of the Managed Object field in the alarm information.

2. If the alarm is not cleared after restarting the switch, contact your local
customer support.

Determine the affected hardware component from the Identifying
Application Additional Information fields section of the alarm. If the
component reported is EXTERNAL DEVICE in position EXTERNAL
INPUT in BCN hardware, then follow the steps in Section B, otherwise
follow the steps in Section A.

Note: Shelf Manager (on ATCA) related operations such as upgrade,
switchover, and restart involve re-initialization of the shelf System Event
log. This causes a one time heavy hardware event burst, which in turn
may trigger the Rate Limiter protection action, and alarm 70434. In these
cases the top component event rate remains low, when compared to the
total event rate. It indicates that something else than the specified FRU in
the IAAI field is causing the flood and alarm. In these scenarios the alarm
can be cleared and no corrective actions are needed.

Section A

1. The hardware component in Identifying Application Additional
Information fields section of the alarm is the source of the flood.
2. As a first-aid action, attempt to power off the faulty component
remotely;
this is optional. If this option is used, do take due precautions related to
the component role in the cluster not to accidentally shut down essential
This alarm is an informative alarm indicating that a node has been
(re)started. Check the restart caused by the reported sensor offset and
related detailed meaning. If the reported cause is internal to the node or
system and node restart occurs again and again, the node may be
defective and needs to be replaced. For hints on what is causing restarts,
check the alarm status in the cluster after the restart.

Note: To check if the alarm is repeating follow the below steps:
1. Clear the alarm manually using below SCLI command and wait for few
minutes:
set alarm clear alarm-id <alarm_id>
where <alarm_id> can be found from the active alarm record.
2. If the alarm is raised again repeat the above step 1.
3. Repeat the same for at least 5 times.

If the problem persists after following the instructions, please contact your
local customer support.

Following are the different recovery actions to be taken, based on the
error type.

ErrorType = 0, 1, 2:
The Packet Timing Unit internal monitoring process (monitd) will attempt
to recover the GNSS process[0], HWM process[1] and SYNC process[2].
Therefore, the recovery will be automatic. If the problem persists, consider
restarting the unit.
To restart the unit do the following steps:
a. Determine the affected Field Replacement Unit (FRU) from the
"Position" field of the Identifying Application Additional Information fields
section of the alarm.
b. To get the node name for the faulty FRU, enter the following SCLI
command:
> show hardware inventory list brief
The column "Node/Host" displays the node name for the corresponding
entity as shown in the following example:

Actual Type Unused Node/Host Expected Type Admin-ignore Entity
HDSAM-A N/A HDSAM-A no /chassis-1/AMC-1
BS2AM-A PTU-1-1-1 BS2AM-A no /chassis-1/AMC-2
BMFU-B N/A BMFU-B no /chassis-1/ft-1
BMFU-B N/A BMFU-B no /chassis-1/ft-2
BAFU-A N/A BAFU-A no /chassis-1/ft-3
BCNMB-B LMP-1-1-1 BCNMB-B no /chassis-1/motherboard-1
BCNAP-B N/A BCNAP-B no /chassis-1/power-supply-1
BCNAP-B N/A BCNAP-B no /chassis-1/power-supply-2
BMPP2-B CFPU-0,SE-0 BMPP2-B no /chassis-1/slot-1

From this example, if the position of the faulty FRU is "/chassis-1/AMC-2",
then the node name would be "PTU-1-1-1".
c. Enter the following SCLI command to restart the unit:
> set hardware restart node PTU-1-1-1

ErrorType = 3:
The Packet Timing Unit internal monitoring process (monitd) attempts to
recover the NTP process. Therefore, the recovery is automatic. If the
problem persists, consider restarting the ntpd process or the unit.

Disclaimer: The instructions below use either unsupported SCLI
commands or commands from the unsupported full bash shell. Please
carefully read the disclaimer that is shown when either entering the
unsupported SCLI vendor mode or the full bash shell. Do not use the
commands in any other context. Please check the product documentation

1. Determine the affected Field Replacement Unit (FRU) from the
identifying application additional information (IAAI) field in alarm record.
The "Position" field in IAAI indicates the location of FRU.
2. Use the following SCLI command to get the node name for the faulty
FRU:
> show hardware inventory list brief
The column "Node/Host" will give the node name for the corresponding
entity or FRU, as shown in the following example:

Actual Type Unused Node/Host Expected Type Admin-ignore Entity
HDSAM-A N/A HDSAM-A no /chassis-1/AMC-1
BS2AM-A PTU-1-1-1 BS2AM-A no /chassis-1/AMC-2
BMFU-B N/A BMFU-B no /chassis-1/ft-1
BMFU-B N/A BMFU-B no /chassis-1/ft-2
BAFU-A N/A BAFU-A no /chassis-1/ft-3
BCNMB-B LMP-1-1-1 BCNMB-B no /chassis-1/motherboard-1
BCNAP-B N/A BCNAP-B no /chassis-1/power-supply-1
BCNAP-B N/A BCNAP-B no /chassis-1/power-supply-2
BMPP2-B CFPU-0,SE-0 BMPP2-B no /chassis-1/slot-1

From the example, if the position of the faulty FRU is "/chassis-1/AMC-2",
then the node name would be "PTU-1-1-1".
3. Following are the different recovery actions to be taken, based on the
error type.
[6] Use the default configuration file to load working settings and recover
the needed settings, so that a new startup configuration can be created.
Use the following SCLI command to load the default configuration
settings:
> set clock-sync server PTU-1-1-1 runtime default-config
[7] Use the startup configuration file to load working settings if the error 6
is not raised. If error 6 is raised, the startup configuration file needs to be
manually edited and recovered according to the content available in the
documentation.
If there are no problems in loading default configuration file, then restart
the sync-mgmt-app using the following SCLI command, so it reads the
current start-up configuration file to load the working settings.
> set clock-sync server PTU-1-1-1 restart
4. If the previous steps have not resolved the situation, contact your local

1. If the value of Switch Status is other than OK, check that all fibre
channel cables at the back of the chassis are properly connected to their
corresponding fibre channel switch modules.
2. If the problem still persists, replace the affected fibre channel switch
module in the chassis.
3. If the previous steps have not solved the situation, contact your local
customer support.

1. If the value of Port Status is other than OK, the reason for triggering this
alarm has to be studied properly. If the cause appears to be
in the application software or configuration, it has to be corrected.
2. If the problem still persists, replace the affected fibre channel switch
module in the chassis.
3. If the previous steps have not solved the situation, contact your local
customer support.

1. Determine the affected Field Replacement Unit (FRU) from the
Identifying application additional information (IAAI) field in alarm record.
The "Position" field in IAAI indicates the location of FRU.
2. Use the following SCLI command to get the node name for the faulty
FRU:
> show hardware inventory list brief
The column "Node/Host" will give the node name for the corresponding
entity as shown in the following example:

Actual Type Unused Node/Host Expected Type Admin-ignore Entity
HDSAM-A N/A HDSAM-A no /chassis-1/AMC-1
BS2AM-A PTU-1-1-1 BS2AM-A no /chassis-1/AMC-2
BMFU-B N/A BMFU-B no /chassis-1/ft-1
BMFU-B N/A BMFU-B no /chassis-1/ft-2
BAFU-A N/A BAFU-A no /chassis-1/ft-3
BCNMB-B LMP-1-1-1 BCNMB-B no /chassis-1/motherboard-1
BCNAP-B N/A BCNAP-B no /chassis-1/power-supply-1
BCNAP-B N/A BCNAP-B no /chassis-1/power-supply-2
BMPP2-B CFPU-0,SE-0 BMPP2-B no /chassis-1/slot-1

From the example, if the position of the faulty FRU is "/chassis-1/AMC-2",
then the node name would be "PTU-1-1-1".
3. Check the status of the troubled reference using the following SCLI
command. Check also whether the reference has valid parameter values
(priority and SSM).
> show clock-sync server PTU-1-1-1 hwclock dpll-ref-status
HW Clock Zl30310 DPLL All Reference Status:
ref0 - SFP0 status = invalid
ref1 - SFP1 status = invalid
ref2 - GNSS status = invalid
ref3 - NC status = invalid
ref4 - EXT FREQ INPUT status = invalid
ref5 - NC status = invalid
ref6 - TCLKA status = valid
ref7 - NC status = invalid
If all reference input type status are in the invalid state, that means no
reference clock type is enabled. In this case, the operator should choose
the appropriate reference clock input type using the following SCLI
command, to see if the reference status is valid.
> set clock-sync server PTU-1-1-1 hwclock dpll-input ref-input
BACKPLANE activation-option immediate
4. If the problem persists after following the instructions, contact your local
customer support. Operator can check in advance the reference source
and the transmission path of the reference clock for failure.
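The reference status listing printed by "show clock-sync server PTU-1-1-1 hwclock dpll-ref-status" in the instructions above can be scanned for valid references. The following is a minimal Python sketch under stated assumptions: it expects each reference line to look like "refN - NAME status = valid" or "status = invalid", as in the sample output in this document, and the function name is hypothetical. It only parses text that has already been captured; it does not run any SCLI command.

```python
import re

# One line per reference, e.g. "ref6 - TCLKA status = valid".
REF_LINE = re.compile(r"^(ref\d+)\s*-\s*(.+?)\s+status\s*=\s*(valid|invalid)\s*$")


def parse_ref_status(output: str) -> dict:
    """Map each reference id to a (source name, is_valid) tuple."""
    refs = {}
    for line in output.splitlines():
        match = REF_LINE.match(line.strip())
        if match:
            ref, source, state = match.groups()
            refs[ref] = (source, state == "valid")
    return refs


sample = """\
HW Clock Zl30310 DPLL All Reference Status:
ref0 - SFP0 status = invalid
ref1 - SFP1 status = invalid
ref2 - GNSS status = invalid
ref3 - NC status = invalid
ref4 - EXT FREQ INPUT status = invalid
ref5 - NC status = invalid
ref6 - TCLKA status = valid
ref7 - NC status = invalid
"""

refs = parse_ref_status(sample)
valid_refs = [ref for ref, (_, ok) in refs.items() if ok]
print(valid_refs)
```

For the sample listing, only ref6 (TCLKA) is reported as valid. An empty result would correspond to the case where all reference inputs are in the invalid state.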
Clearing Time to Live

After correcting the fault as presented in the Instructions 0
section (in other words, after sending the report to the local customer
support), clear the alarm with the following SCLI command:
set alarm clear alarm-id <alarm id of the alarm>
If the alarm id of the alarm is unknown, use the following SCLI
command (that requires full alarm information):
set alarm clear-matching-alarms filter-by specific-problem
70005 managed-object <managed object of the alarm>
application-id <application id of the alarm>
identifying-application-additional-info <identifying application additional
info of the alarm>

The system clears the alarm automatically when the fault 0
has been corrected.

The system clears the alarm automatically when the fault 0
has been corrected.

0

After correcting the fault, as presented in the Instructions 0
section, clear the alarm with the following SCLI command:
>set alarm clear alarm-id <alarm id of the alarm>
If the alarm id of the alarm is unknown, use the following
SCLI command (that requires full alarm information):
>set alarm clear-matching-alarms filter-by specific-problem
70025 managed-object <managed object of the alarm>
application-id <application id of the alarm>

The alarm is automatically cleared by the postgres 0
watchdog after the fault
has been corrected.
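When the alarm id is unknown, the clearing command has to be assembled from several alarm fields. The following is a minimal Python sketch that only builds the command text from the keywords shown in this document (specific-problem, managed-object, application-id, identifying-application-additional-info). The function name and the sample field values are hypothetical, and whether SCLI requires quoting of values is not covered here and is an open assumption.

```python
def build_clear_matching_command(specific_problem, managed_object,
                                 application_id, iaai=None):
    """Return the 'set alarm clear-matching-alarms' command as one string."""
    parts = [
        "set alarm clear-matching-alarms filter-by",
        "specific-problem", str(specific_problem),
        "managed-object", managed_object,
        "application-id", application_id,
    ]
    if iaai is not None:
        # Optional filter, used by the variants that need full alarm information.
        parts += ["identifying-application-additional-info", iaai]
    return " ".join(parts)


# Placeholder values, for illustration only.
command = build_clear_matching_command(70025, "/EXAMPLE-MO", "example-app")
print(command)
```

The generated text is meant to be reviewed by the operator before it is entered in SCLI.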

The system clears the alarm automatically when the 0


measurement result goes down and is continuously held at
the maximum threshold
clearing level or below.
Clear the alarm manually with an alarm management 0
application.

Do not clear the alarm. The alarm is cleared automatically 0


by the fault detector of the operating system when the CPU
usage is on
a relatively low level. The raising/clearing thresholds are
different to prevent unnecessary thrashing.
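The effect of using different raising and clearing thresholds can be illustrated with a small Python sketch. It is not the operating system's fault detector: the class name and the threshold values (raise at 90, clear at 80) are hypothetical and chosen only to show why the alarm does not toggle when the measurement hovers near a single limit.

```python
class HysteresisAlarm:
    """Raise at or above raise_at, clear only at or below clear_at."""

    def __init__(self, raise_at: float, clear_at: float):
        if clear_at >= raise_at:
            raise ValueError("clear_at must be lower than raise_at")
        self.raise_at = raise_at
        self.clear_at = clear_at
        self.active = False

    def update(self, value: float) -> bool:
        """Feed one measurement and return the alarm state afterwards."""
        if not self.active and value >= self.raise_at:
            self.active = True
        elif self.active and value <= self.clear_at:
            self.active = False
        return self.active


alarm = HysteresisAlarm(raise_at=90, clear_at=80)
states = [alarm.update(v) for v in (70, 91, 88, 92, 79)]
print(states)
```

In this run the alarm is raised at 91, stays active while the value moves between 88 and 92, and is cleared only when the value falls to 79.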
Do not clear the alarm. This alarm is automatically cleared 0
by the operating system's fault detector once
the amount of available disk space increases above the
specified limit. The raising/clearing thresholds are different
to prevent unnecessary thrashing.
The system automatically clears the alarm when the fault 0
has been corrected.

Do not clear the alarm. This alarm is automatically cleared 0


when memory usage drops below the memory usage limit.
The raising/clearing thresholds are different to prevent
unnecessary thrashing.
Do not clear the alarm. The alarm is automatically cleared 0
when the fault detector of the operating system is able to
capture the
statistics of the failed resource.
The alarm is automatically cleared by the operating 0
system's fault detector when the Ethernet link comes up. If
raised due to unused Ethernet interfaces, the alarm is cleared as
soon as the admin state of those interfaces is set to DOWN.

The clearing of the alarm is done automatically by the 1
AlarmSystem after its time to live has expired and is
therefore not visible as "ALARM CANCEL" messages in alarm logs.
Clear the alarm after carefully checking the alarm status in 0
the cluster.

The clearing of the alarm is done automatically by the 1


AlarmSystem after its time to live has expired and is
therefore not
visible as "ALARM CANCEL" messages in
alarm logs.
The alarm system clears this alarm automatically after its 1
time to live has expired.
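Clearing by time to live can be pictured with a short Python sketch. It is an illustration only, not the AlarmSystem implementation: the data class, the function name and the time unit are hypothetical, and treating a time to live of 0 as "never cleared by the timer" is an assumption based on the rows in this table where 0 accompanies other clearing methods.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    alarm_id: int
    raised_at: float  # time the alarm was raised (arbitrary unit)
    ttl: float        # time to live; 0 means no timer-based clearing (assumption)


def drop_expired(alarms, now):
    """Return the alarms that are still active at time 'now'."""
    return [a for a in alarms if a.ttl == 0 or now - a.raised_at < a.ttl]


alarms = [
    Alarm(alarm_id=1, raised_at=0.0, ttl=1),   # expires one unit after raising
    Alarm(alarm_id=2, raised_at=0.0, ttl=10),  # expires ten units after raising
    Alarm(alarm_id=3, raised_at=0.0, ttl=0),   # cleared by other means
]
remaining = [a.alarm_id for a in drop_expired(alarms, now=5.0)]
print(remaining)
```

At time 5 the first alarm has expired, while the second is still within its time to live and the third is unaffected by the timer.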

The clearing of the alarm is done automatically by the 1


AlarmSystem after its time to live has expired and is
therefore not visible
as "ALARM CANCEL" messages in alarm logs.
The alarm system will clear the alarm automatically after 0
the problem is solved (that is if the
switchover is successful).

The system clears the alarm automatically when the 0


measurement result goes up and is continuously held at
the minimum threshold
clearing level or above.
The alarm is cleared automatically when access to the 0
alarm log file is restored.
The system clears the alarm automatically when the fault 0
has been corrected.

After correcting the fault, as presented in the Instructions 0
section (in other words, after sending the report to the local
customer support), clear the alarm with the following SCLI command:
set alarm clear alarm-id <alarm id of the alarm>
If the alarm id of the alarm is unknown, use the following
SCLI command (the command can clear a few 70244
alarms as specified in
identifying-application-additional-info as the full alarm
identity is difficult):
set alarm clear-matching-alarms filter-by specific-problem
70244 managed-object <managed object of the alarm>
application-id <application id of the alarm>

After performing the steps as mentioned in the Instructions 0
section (in other words, after sending the report to the local
customer support), clear the alarm with the following SCLI command:
set alarm clear alarm-id <alarm id of the alarm>
If the alarm id of the alarm is unknown, use the following
SCLI command:
set alarm clear-matching-alarms filter-by specific-problem
70245 managed-object <managed object of the alarm>
application-id <application id of the alarm>

The alarm is automatically cleared. 0

The alarm system clears the alarm automatically after it is 0
restarted, if the alarm system heartbeating is switched ON
in the alarm processor configuration in the Configuration
Directory.
The system clears the alarm automatically when the fault 0
has been corrected. Verify that an alarm for the situation
has been cancelled
using the following SCLI command:
> show alarm active filter-by specific-problem 70249
The system clears the alarm automatically when the fault 0
has been corrected.

The system clears the alarm automatically when the fault 0


has been corrected.

After correcting the fault, as presented in the Instructions 0
section (in other words, after sending the report to the local
customer support), clear the alarm using the following SCLI
command:
> set alarm clear alarm-id <alarm id of the alarm>
If the alarm id is unknown, use the following SCLI
command (that requires the full alarm information):
> set alarm clear-matching-alarms filter-by specific-problem
<concrete specific problem> managed-object <managed
object of the alarm> application-id <application id of the
alarm> identifying-application-additional-info <identifying
application additional info of the alarm>

The system clears the alarm when the inert mode is 0
switched off from the managed object.

After correcting the fault, as presented in the Instructions 0
section, clear the alarm in the NE using the following SCLI command:
> set alarm clear alarm-id <alarm id of the alarm>
The alarm can also be cleared manually by an operator
with the following SCLI
command (that requires the full alarm information):
> set alarm clear-matching-alarms filter-by specific-problem
70267 managed-object <managed object of the alarm>
application-id <application id of the
alarm> identifying-application-additional-info <identifying
application additional info of the alarm>

Alarm is automatically cleared by the NE when the issue is 0
resolved, that is when replication is successful and/or when
a dedicated IP address is configured for RUIM.

After correcting the fault, as presented in the Instructions 0
section, clear the alarm in the NE using the following SCLI
command:
>set alarm clear alarm-id <alarm id of the alarm>
An operator can also clear the alarm manually with the
following SCLI
command (that requires the full alarm information):
>set alarm clear-matching-alarms filter-by specific-problem
70269 managed-object <managed object of the alarm>
application-id <application id of the alarm>
identifying-application-additional-info <identifying application additional
info of the alarm>

The system clears the alarm automatically when the fault 0
has been corrected.

The alarm is cleared automatically by the system. 0
The alarm system clears the alarm automatically after its 10
time to live has expired.

The system clears the alarm automatically when the fault 0


has been corrected.

The alarm will be automatically cleared by the postgres 0


watchdog as soon as the failure is corrected, and the
synchronization is established.

The alarm is automatically cleared by the postgres 0


watchdog, as soon as the number of free connections is
greater than the threshold limit set.

The alarm will be automatically cleared when the faulty 0


subsystem is able to write files again.
The alarm will also be cleared upon subsystem
restart/switchover, and raised again if the problem persists.
The system clears the alarm automatically when the BFD 0
session switches to the UP state, or when the BFD session
is deleted.

In case a BFD session Reference ID is modified when
there is an active alarm for the session, then the alarm with
an old Reference ID is cleared, and a new alarm with a
modified Reference ID is raised. The timestamp of the new
alarm no longer indicates the actual time when
the session went down, but the original failure time can be
found from the 7th additional information field.

The system automatically clears the alarm when the 0
messages from multiple senders are no longer received
and ten minutes pass
without any fresh messages being received.

The system clears the alarm automatically when the fault 0
has been corrected.

After correcting the fault, as presented in the Instructions 0


section, clear the alarm manually from the management
interface.
The system clears the alarm automatically when the 0
certificate is successfully generated next time.

The system clears the alarm automatically when the TLS 0


connection is established with the external LDAP Server.

The system clears the alarm automatically once the active 0


configuration has been saved or an old configuration has
been restored.

The system clears the alarm automatically when the alarm 1


time expires.

Do not clear the alarm manually. When the number of 0


routes goes below the supported limit, the warning alarm is
automatically cleared. The major alarm is automatically
cleared when the node for which the alarm is raised is
rebooted.
The system clears the alarm automatically when the fault 0
has been corrected.

The system clears the alarm automatically when the fault 0


has been corrected (such as connection to forwarder was
regained).

The system clears the alarm automatically when the fault 0
has been corrected. The corrective action to clear this
alarm is to set
the test-license state to "disabled" in the configuration
directory by executing the SCLI command "set license test-license state
disabled" from the unsupported vendor mode.

The alarm will get automatically cleared when the faulty 0
certificate is replaced with a new valid certificate or when
the
certificate (CA) is deleted, but this automatic clearance
could take up to 24 hours to happen. So it is recommended
to manually
clear this alarm soon after the certificate is replaced or
deleted. To clear the alarm manually, after correcting the
fault as
presented in the Instructions
section, execute the following
SCLI command:
set alarm clear alarm-id <alarm id of the alarm>
For example, if "Alarm ID" of the alarm is 1365, then
execute "set alarm clear alarm-id 1365".

Clear the alarm manually soon after performing the 0
required corrective actions. To clear the alarm manually,
execute the following SCLI command:
> set alarm clear alarm-id <alarm-id>
Example: If "Alarm ID" of the alarm is 1365, then execute
the SCLI command
"set alarm clear alarm-id 1365"

This alarm will be automatically cleared. 0

Do not clear the alarm. The alarm is automatically cleared 0
when the fault detector detects that there is enough
physical or virtual RAM according to the required amount
defined by the application running on it.
Do not clear the alarm. The alarm is automatically cleared 0
when the fault detector
detects that there are enough physical or virtual
CPUs/cores installed as required
by the application running on it.
Do not clear the alarm. The system clears the alarm 0
automatically when the fault has been corrected.

Do not clear the alarm. The system clears the alarm 0


automatically when the fault has been corrected.

Do not clear the alarm. The system clears the alarm 0


automatically when the fault has been corrected.

Do not clear the alarm. The system clears the alarm 0


automatically when the fault is corrected.

There is no need to clear this alarm. After reinstalling the 0


system with correct input data, the alarm will not be raised
anymore.

Do not clear the alarm. The system clears the alarm 0


automatically when the fault has been corrected.

The system clears the alarm automatically during startup 0


once the VNF startup configuration data validation passes.

The alarm is automatically cleared after a successful 0


authentication for the remote access account. A successful
authentication notes that the keys are valid and correctly
installed, since the account cannot be used via any other
authentication method.
Do not clear the alarm. The system clears the alarm 0
automatically when the fault has been corrected.
The alarm can also be cleared manually by an operator
with the following SCLI command:

1. show alarm active filter-by specific-problem 70462
This command displays the alarm-id
2. set alarm clear forced yes alarm-id <alarm id of the
alarm>

Do not clear the alarm. This alarm is cleared automatically 0
by the fault detector of the operating system when the file
system problem is fixed.
Do not cancel the alarm. The system clears the alarm 0
automatically when the fault has been corrected.
Do not clear the alarm. The system clears the alarm 0
automatically when the fault
is corrected.

The system clears the alarm automatically when the fault 0


has been corrected.

Alarm is cleared automatically by the CertMan framework, 0


after it can fetch, validate and store the CRL for a specific
issuer-id or if the DP configuration is deleted.

The alarm is automatically cleared when the revoked 0
certificate is replaced with a new valid certificate or when
the certificate (CA) is deleted, but this automatic clearance
could take up to 24 hours to happen. So, it is
recommended to manually clear this alarm soon after the
revoked certificate is replaced or deleted.

To clear the alarm manually, after correcting the revoked
certificate as presented in the Instructions section, execute
the following SCLI command:

set alarm clear alarm-id <alarm id of the alarm>
For example, if "Alarm ID" of the alarm is 1365, then
execute "set alarm clear alarm-id 1365".

After correcting the fault, as presented in the Instructions 0
section, clear the alarm by using the following SCLI
command:
set alarm clear alarm-id <alarm id of the alarm>
If the alarm id of the alarm is unknown, use the following
SCLI command (that requires the full alarm information):
set alarm clear-matching-alarms filter-by specific-problem
71061 managed-object <managed object of the alarm>
application-id <application id of the alarm>

Do not clear the alarm. The system clears the alarm 0
automatically when the fault has been corrected.

Do not clear the alarm. The system clears the alarm 0
automatically when the fault has been corrected.

Do not clear the alarm. The system clears the alarm 0


automatically when the fault has been corrected.

Do not clear the alarm. The system clears the alarm 0


automatically when the fault has been corrected.

Steps: 0
1. Roll back the in-service upgrade procedure according to
the customer documentation instructions.
2. Clear the alarm manually.
After correcting the fault, according to the Instructions
section, clear the alarm by using the following SCLI
command:
set alarm clear alarm-id <alarm id>

The system clears the alarm automatically when the fault 0
has been corrected.

The system clears the alarm automatically when the fault 0


has been corrected.
The system clears the alarm automatically when the fault 0
has been corrected.

The alarm will be cleared automatically when the target 0


node joins the cluster and the dynamic configuration is
activated successfully.

Do not clear the alarm manually. The alarm is automatically 0


cleared when the
user or group account synchronization succeeds.

Do not clear the alarm. The system clears the alarm 0


automatically when the fault has been corrected.

The alarm will be automatically cleared by the alarm 300000


system after 5 minutes. If the configuration is still out of
order after that, the alarm is raised again.
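
The numeric column next to each clearing instruction appears to be a time-to-live value. A minimal sketch of reading it, assuming the unit is milliseconds (which matches 300000 for "5 minutes" here and 86400000 for "24 hours" elsewhere in this list) and assuming 0 means that no timed clearing applies; the function name is hypothetical:

```python
def ttl_to_text(ttl_ms: int) -> str:
    """Render a TTL value, assumed to be in milliseconds, as readable text.

    A value of 0 is assumed to mean that the alarm has no timed clearing.
    """
    if ttl_ms == 0:
        return "no TTL"
    seconds, ms = divmod(ttl_ms, 1000)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours} h")
    if minutes:
        parts.append(f"{minutes} min")
    if secs:
        parts.append(f"{secs} s")
    if ms:
        parts.append(f"{ms} ms")
    return " ".join(parts)


# TTL values that appear in this column
for value in (0, 10000, 300000, 600000, 14400000, 86400000):
    print(value, "->", ttl_to_text(value))
```
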

After correcting the fault, as presented in the Instructions 0
section, clear the alarm by using the following SCLI
command:
>set alarm clear alarm-id <alarm id of the alarm>
If the alarm id of the alarm is unknown, use the following
SCLI command (that requires the full alarm information):
>set alarm clear-matching-alarms filter-by specific-problem
70003 managed-object <managed object of the alarm>
application-id <application id of the alarm>

After correcting the fault as presented in the Instructions 0
section, use the following SCLI command to clear the
alarm:
>set alarm clear alarm-id <alarm id of the alarm>
If the alarm ID of the alarm is unknown, use the following
SCLI command instead (that requires the full alarm
information):
>set alarm clear-matching-alarms filter-by specific-problem
70004 managed-object <managed object of the alarm>
application-id <application id of the alarm>

After correcting the fault, as presented in the Instructions 0
section, clear the alarm using the following SCLI command:
>set alarm clear alarm-id <alarm id of the alarm>
If the alarm id of the alarm is unknown, use the following
SCLI command (that requires the full alarm information):
>set alarm clear-matching-alarms filter-by specific-problem
70007 managed-object <managed object of the alarm>
application-id <application id of the alarm>
identifying-application-additional-info <identifying application additional
info of the alarm>

After correcting the fault, as presented in the Instructions 0
section, use the following SCLI commands to clear the
alarm:
show alarm active filter-by specific-problem 70008
set alarm clear alarm-id <Alarm_ID>

The system automatically clears the alarm when the fault 0
has been corrected.
If the user removes a blade or an AHUB, the raised alarms
will be cleared only when the blade or the AHUB is inserted
back into the system. If a new AHUB (having factory
defaults) is inserted or rebooted then there might be a case
where raised alarms will not be cleared since the inserted
AHUB does not have the SNMP configuration until the
DHCP cycle is completed.

1. Execute the following SCLI command to clear the alarm 0
after correcting the fault as presented in the Instructions
section:
set alarm clear alarm-id <alarm ID of the alarm>
2. Execute the following SCLI command as an alternative if
the alarm ID is unknown (this requires the full alarm
information):
set alarm clear-matching-alarms filter-by specific-problem
70064 managed-object <managed object of the alarm>
application-id <application id of the alarm>
identifying-application-additional-info <identifying application additional
info of the alarm>

The system clears the alarm automatically. 0
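
Several entries in this list use the same "set alarm clear-matching-alarms" form with different specific-problem numbers. As an illustration only, the following sketch assembles that command line from its fields. The helper name and its parameters are hypothetical and not part of SCLI, and the sketch makes no assumption about how SCLI expects values to be quoted or escaped:

```python
from typing import Optional


def clear_matching_alarms_command(
    specific_problem: int,
    managed_object: str,
    application_id: str,
    identifying_info: Optional[str] = None,
) -> str:
    """Build the clear-matching-alarms command text used in this list.

    Hypothetical helper: it only joins the keywords shown in the
    document with the supplied values, in the documented order.
    """
    parts = [
        "set alarm clear-matching-alarms filter-by",
        f"specific-problem {specific_problem}",
        f"managed-object {managed_object}",
        f"application-id {application_id}",
    ]
    # Some entries also require the identifying application additional info.
    if identifying_info is not None:
        parts.append(f"identifying-application-additional-info {identifying_info}")
    return " ".join(parts)


print(
    clear_matching_alarms_command(
        70064,
        "<managed object of the alarm>",
        "<application id of the alarm>",
        "<identifying application additional info of the alarm>",
    )
)
```

The placeholders in angle brackets are kept as they appear in the document; the real values come from the full alarm information.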
Alarm is cancelled automatically. 0
If the automatic cancellation does not succeed then after
correcting the fault clear the alarm using the following SCLI
command:
set alarm clear alarm-id <alarm id of the alarm>
If the alarm id of the alarm is unknown, use the following
SCLI command
(that requires the full alarm information):
set alarm clear-matching-alarms filter-by specific-problem
<concrete specific problem> managed-object
<managed object of the alarm> application-id <application
id of the alarm> identifying-application-additional-info
<identifying application additional info of the alarm>

In case of a real hardware failure, the alarm has to be 0
cleared manually after the disk has been replaced.
However, in case the failure is transient or simulated, the
system will clear the alarm automatically when it detects
that the fault situation is over.

The system clears the alarm automatically when the fault 0
has been corrected.

The system clears the alarm automatically when the fault 0
has been corrected.

Clear the alarm after correcting the fault as presented in 0
the instructions.
Use the following SCLI commands to clear the alarm:
show alarm active filter-by specific-problem 70275
set alarm clear alarm-id <Alarm_ID>

Clear the alarm with the alarm management application 0
after correcting the fault as presented in the instructions.
Use the following SCLI commands to clear the alarm:
show alarm active filter-by specific-problem 70276
set alarm clear alarm-id <Alarm_ID>

Clear the alarm with the alarm management application 0
after correcting the fault as presented in the instructions.
Use the following SCLI commands to clear the alarm:
show alarm active filter-by specific-problem 70277
set alarm clear alarm-id <Alarm_ID>

Clear the alarm with the alarm management application 0
after correcting the fault as presented in the instructions.
Use the following SCLI commands to clear the alarm:
show alarm active filter-by specific-problem 70278
set alarm clear alarm-id <Alarm_ID>

Clear the alarm with the alarm management application 0
after correcting the fault as presented in the instructions.
Use the following SCLI commands to clear the alarm:
show alarm active filter-by specific-problem 70279
set alarm clear alarm-id <Alarm_ID>

The system clears the alarm automatically when the fault 0
has been corrected.


After correcting the fault, as presented in the Instructions 0
section, use the following SCLI command to clear the
alarm:

set alarm clear alarm-id <alarm-id of the alarm>


After correcting the fault, as presented in the Instructions 0
section, use the following SCLI command to clear the
alarm:

set alarm clear alarm-id <alarm-id of the alarm>


The system clears the alarm automatically when the fault 0
has been corrected.

Execute the following SCLI command to clear the alarm: 0
set alarm clear alarm-id <alarm-id of the alarm>
NOTE: This alarm can be cleared (manually) for BCNOC-A
type hardware after the verification. The following
instructions require root privileges.
Disclaimer: The following instructions make use of either
unsupported SCLI commands or commands from the
unsupported full bash shell. Please carefully read the
disclaimer that is shown when either entering the SCLI
unsupported vendor mode or the full bash shell. Do not use
the commands in any other context. Please check from the
product documentation or from local customer support for
more information.
1) Execute the following SCLI command to start an external
bash shell session:
shell bash full
2) Log into the FRU on which the alarm is raised using the
SCLI command:
> shell remote-console fru-location <fru_location>
Example:
> shell remote-console fru-location /chassis-1/slot-3
Note: <fru_location> can be found from alarm field, IAAI.
3) Execute the command 'ipmitool sdr get "BOOT_ERROR"'
twice with a minimum time interval of 1 minute. If the state
of the sensor continues to be de-asserted (that is, the field
States Asserted is not present) after the command has been
executed for the second time, the user can clear the alarm
using the SCLI command provided.
Example:
# ipmitool sdr get "BOOT_ERROR"
<...>
Event Message Control : Entire Sensor Only
States Asserted : Boot Error
[No bootable media]
Assertion Events : Boot Error
[No bootable media]
<...>
The SCLI command to find the type of hardware is "show
hardware inventory
component <chassis and slot of the FRU>".
Example:

Clear the alarm with the alarm management application 0
after correcting the fault according to the instructions.

The system clears the alarm automatically when the fault 0
has been corrected.

After correcting the fault, as presented in the Instructions 0
section, use the following SCLI command to clear the
alarm:
set alarm clear alarm-id <alarm-id of the alarm>

The system clears the alarm automatically when the fault 0
has been corrected.

After correcting the fault, as presented in the Instructions 0
section, use the following SCLI command to clear the
alarm:
set alarm clear alarm-id <alarm-id of the alarm>

After correcting the fault, as presented in the Instructions 0
section, use the following SCLI command to clear the
alarm:
set alarm clear alarm-id <alarm-id of the alarm>

The system clears the alarm automatically when the fault 0
has been corrected.

The alarm should be cleared manually after correcting the 0
situation. Below are the steps to clear the alarm:
1. First retrieve the alarm id for the alarm 70303. Enter the
following command:
> show alarm active filter-by specific-problem 70303
Note down the value of the parameter Alarm ID.
2. Clear the alarm 70303.
To clear the alarm, enter the following command:
> set alarm clear alarm-id <Alarm-ID>
Replace the Alarm-ID with the value retrieved from the
previous step.

The system automatically clears the alarm when the shelf 0
manager becomes available through the primary IP
address. System also clears the backup shelf manager
unavailable alarm, once the ICMP pings to the backup shelf
manager succeed.

The system clears the alarm automatically when the fault 0
has been corrected.

The system clears the alarm automatically when the fault 0
has been corrected.
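
The BOOT_ERROR verification described earlier in this list (run 'ipmitool sdr get "BOOT_ERROR"' twice, at least one minute apart, and clear the alarm manually only if the States Asserted field is absent both times) can be sketched as a check on the two captured outputs. This is a sketch under stated assumptions: both function names are hypothetical, the code does not run ipmitool itself, and it assumes the field appears at the start of a line as in the example output shown in this document.

```python
def states_asserted(sdr_output: str) -> bool:
    """Return True if the 'States Asserted' field is present in the output."""
    return any(
        line.strip().startswith("States Asserted")
        for line in sdr_output.splitlines()
    )


def may_clear_manually(first_sample: str, second_sample: str) -> bool:
    """Return True only if the sensor is de-asserted in both samples.

    The caller is expected to take the two samples at least one minute
    apart, as the procedure requires.
    """
    return not states_asserted(first_sample) and not states_asserted(second_sample)


asserted = (
    "Event Message Control : Entire Sensor Only\n"
    "States Asserted : Boot Error\n"
    "[No bootable media]\n"
    "Assertion Events : Boot Error\n"
    "[No bootable media]\n"
)
deasserted = "Event Message Control : Entire Sensor Only\n"

print(may_clear_manually(deasserted, deasserted))
print(may_clear_manually(deasserted, asserted))
```
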

The alarm will be cleared automatically once the faulty 0


entity is restarted. It must be ensured that the configuration
and setup have been corrected as described in the
Instructions section; otherwise the alarm will be raised
again.
This alarm will be cleared when the Subsystem Allowed 0
(SSA) message is received from the remote subsystem or
when the remote subsystem configuration is deleted from
the local side.
The system clears the alarm automatically when a "no 0
congestion" event is received. No congestion event is
received when the congestion level is zero.

The system clears the alarm automatically after it is 10000


expired.

The system clears the alarm automatically when the state 0


of the link or the route changes to "Available".

The system clears the alarm automatically when the point 0


code becomes accessible.

Do not clear the alarm. 0


The system clears the alarm automatically when the fault
has been corrected.

Clearing of this alarm is not possible for missing hardware
in AB4, in case of any mismatch between the Configuration
Directory (that is, planned deployment configuration) and
Switch Configurations (that is, installed physical HW
configuration), as this alarm remains always in the active
list.

The alarm is automatically cleared by the DSP high 0
availability services proxy process when the core is working
again.

The alarm is automatically cleared by the DSPHASProxy (a 0
process that is monitoring and controlling the DSP units)
process when the number of out of sync DSP cores goes
below the configured threshold value.
Clear the alarm with an alarm management application 0
after correcting the fault as presented in the Instructions.

The alarm is automatically cleared when the Ethernet link 0


comes up.
After correcting the fault as presented in the Instructions 0
section, the alarm can be cleared with the following SCLI
command:
fsclish set alarm clear alarm-id ALARM-ID
The alarm can be acknowledged with the following command:
fsclish set alarm acknowledge alarm-id ALARM-ID

The system clears the alarm automatically when the ICMP 0
monitoring session switches to the UP state (that is, the
peer end is reachable with the configured session source
and destination addresses).
The alarm should be manually cleared by the operator. 0

The alarm is manually cleared by the operator by executing 0


the following SCLI command:

set alarm clear alarm-id <alarm-id>


When the ClusterTraceManager's CPU usage over the pre- 0
configured time period comes below the threshold limit, the
system clears the alarm automatically.

The alarm will be automatically cleared 24 hours after the 86400000
latest occurrence of the error. Use the following SCLI
command to clear the alarm:
set alarm clear alarm-id <alarm id of the alarm>
If the alarm ID of the alarm is unknown, use the following
SCLI command
instead (this requires the full alarm information):
set alarm clear-matching-alarms filter-by specific-problem
70370 managed-object
<managed object of the alarm> application-id <application
id of the alarm> identifying-application-additional-info
<identifying application
additional info of the alarm>

The system clears the alarm automatically when the fault 0
has been corrected.

The alarm is cleared automatically. 0

The alarm is cleared automatically. 0

Do not clear the alarm manually. This alarm is 0


automatically cleared by the fault detector of the operating
system when the memory usage is on a relatively low level.
The raising/clearing thresholds are different to prevent
unnecessary thrashing.
If the alarm is raised because the congestion limit defined 0
by the parameter "sccp-connections-congestion-threshold"
is reached, then the system clears the alarm automatically
when the number of active signaling connections goes
below or equal to 90% of the signaling connection
congestion threshold definition, which is configured as part
of the SCCP limits administration and is normally defined
during the product deployment phase.

The alarm is automatically cleared once the TTL (time to 600000
live timer) expires. If the failure persists, the alarm will be
raised again.
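
The clearing rule tied to "sccp-connections-congestion-threshold" (raise when the limit is reached, clear at or below 90% of it) is a hysteresis. A minimal sketch, assuming the alarm is raised when the number of active connections is greater than or equal to the threshold; the class and attribute names are hypothetical, and integer arithmetic is used to avoid rounding at the 90% boundary:

```python
class CongestionAlarm:
    """Track an alarm that is raised at a threshold and cleared at 90% of it."""

    def __init__(self, threshold: int, clear_percent: int = 90):
        self.threshold = threshold
        self.clear_percent = clear_percent
        self.active = False

    def update(self, active_connections: int) -> bool:
        """Feed the current connection count and return the alarm state."""
        if not self.active:
            # Assumption: "limit is reached" means count >= threshold.
            if active_connections >= self.threshold:
                self.active = True
        elif active_connections * 100 <= self.threshold * self.clear_percent:
            # Clear only when the count is below or equal to 90% of the threshold.
            self.active = False
        return self.active


alarm = CongestionAlarm(threshold=1000)
for count in (999, 1000, 950, 901, 900, 950):
    print(count, alarm.update(count))
```

With a threshold of 1000, the alarm stays active at 950 and 901 connections and clears only once the count reaches 900, so small oscillations around the threshold do not raise and clear the alarm repeatedly.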
The alarm is automatically cleared once the TTL (time to 600000
live timer) expires. If the failure persists, the alarm will be
raised again.

The alarm is automatically cleared once the TTL (time to 10000


live timer) expires. If the failure persists, the alarm will be
raised again when the outbound message fails at SCTP
with error code EAGAIN.
The system will clear this alarm when there is an increase 0
in the success rate of the overall signaling connections,
that is, both the incoming and outgoing connections
(pre-configured to a minimum of 100) in the network which are
handled within the signaling instance that earlier raised this
alarm. The success rate to clear the alarm should be at
least 97% defined in the product deployment.

The alarm is automatically cleared when the specific 0
instance of the object identified by the subfields "ObjectID"
and "InstanceID" in the Identifying Application Additional
Information field is re-administered by the user with valid
network configurations.
The system clears the alarm automatically when the 0
power-throttling mechanism has been stopped based on the
temperature reading from the processing unit sensor and
also the ambient temperature sensor.
In addition, the system clears the alarm when the recovery
unit is started.

The system cancels the alarm upon the successful 0
recovery of the failed path in the SCTP Multi-homed
association, for example, after a successful transfer of
traffic to the SCTP primary path, or after an SCTP
heartbeat message passes through the secondary path.
For M3UA protocol or IUA protocol, the system also
cancels the alarm if the SCTP association status takes
"connection_down", because the path state is supervised
inside an active association by the SCTP. That is why this
alarm can be ON and has meaning only if the SCTP
association is active.

For M3UA or IUA protocol, the system cancels the alarm if 0
at least one SCTP association's status in the
association set is in "asp_state_active" status.

The SIGNALING SERVICE INTERNAL FAILURE alarm, 0
FailureType as CONFIG_DB_FAILURE, will be cleared
when the next LDAP search operation is successful.
The SIGNALING SERVICE INTERNAL FAILURE alarm,
FailureType as LM_CONNECTION_FAILURE, will be
cleared when the SLM and SNM TCP connection recovers
from failure.
For FailureType
"DISTRIBUTED_STACK_SYNCUP_FAILURE",
"SCCP_ACTIVE_STANDBY_SYNCUP_FAILURE",
"LM_STACK_CONNECTION_FAILURE", and
"STACK_MESSAGE_QUEUE_CONGESTION", the alarm
is cancelled by the system upon the successful recovery of
the failures.
For FailureType "INSUFFICIENT_RESOURCES", the
affected signaling services will be restarted and naturally
the alarm raised is also cleared.
ACTIVATOR_SNM_DISCONNECT alarm will only
get cleared when the next configuration update is triggered.
The restart of SNM process will not clear this alarm.

The alarm will be cleared by following the instructions listed 0
under the "Instructions" section. If the problem persists
after restarting the RG/RU, then report the problem to your
local technical support.

This alarm will be cleared automatically when the service is 0
started successfully.

After noticing the termination of the association, the system 0
attempts to re-establish the association until this is
successfully done. The system cancels the alarm upon the
successful establishment of the association. In override
mode, the alarm is cancelled when the state for the
standby association goes from "ASSOC_STATE_DOWN"
to "ASSOC_STATE_INACTIVE".

The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The system clears the alarm automatically once the 0
external alarm input assumes its assigned normal state.

Note: Automatic clearing of external alarms at restart shall


be exempted.
The alarm is automatically cleared if the signaling object 0
does not change status frequently, or the object status
change is within the threshold as specified in
"ObjectFluctuationThreshold" within the time span of
"CriticalAlarmTimer".
After noticing the termination of the D-channel, the system 0
attempts to re-establish the D-channel until this is
successfully done. The system automatically cancels the
alarm upon the successful establishment of the D-channel.
This is also cleared automatically if the D-channel is
administratively "disabled".

Once faulty hardware is replaced the alarm needs to be 0
cleared manually. Use the following SCLI command to
clear the alarm manually:

set alarm clear alarm-id <alarm id of the alarm>


Use the following SCLI command to clear the alarm 0
manually:

set alarm clear alarm-id <alarm id of the alarm>


The system clears the alarm automatically when the fault 0
has been corrected.
The system clears the alarm automatically when the fault 0
has been corrected. The two conditions under which alarm
gets cleared are as follows:
1. DPLL is recovered from hold over mode, which means
suitable reference is available for synchronization.
2. Input reference is recovered.

The alarm will be cleared after Time-to-live (TTL) interval, 14400000
that is 12 hours. It is also possible to clear the alarm
manually using below SCLI command:
> set alarm clear alarm-id <alarm_id>
where <alarm_id> can be found from the active alarm
record.

The system clears the alarm automatically when the fault 0
has been corrected.

Clear the alarm with the alarm management application 0


after correcting the fault according to the instructions.

The alarm is automatically cleared by the system if the fibre 0


channel switch recovers from the malfunction. If the switch
module needs to be replaced, the alarm has to be cleared
manually.
The alarm is automatically cleared by the system when the 0
corresponding fibre channel port comes up. If the switch
module needs to be replaced, the alarm has to be cleared
manually.
Note:
See the How to Read This Report tab
for instructions on the usage of Alarm List

Changes between issues Changes between releases Alarm Number


01A and 01 LTE16A and LTE17A

Changed Changed 7650

Changed Changed 7651

Changed Changed 7652

Changed Changed 7653

Changed Changed 7654

Changed Changed 7655

Changed Changed 7656

7657

New New 7662

Changed Changed 7665


Alarm Name Probable Cause

BASE STATION FAULTY Indeterminate

BASE STATION OPERATION DEGRADED Indeterminate

BASE STATION NOTIFICATION Indeterminate

CELL FAULTY Indeterminate

CELL OPERATION DEGRADED Indeterminate

CELL NOTIFICATION Indeterminate

BASE STATION CONNECTIVITY LOST Indeterminate

BASE STATION CONNECTIVITY DEGRADED Indeterminate

Spare Base Station/Unit Alarm Indeterminate

BASE STATION TRANSMISSION ALARM Indeterminate


Event Type Default Severity

Equipment Critical

Equipment Major

Equipment Minor

Quality of service Critical

Quality of service Major

Quality of service Minor

Communications Critical

Communications Major

Quality of service Critical Major Minor Warning

Communications Minor
Meaning Effect

Meaning: A critical fault (or faults) has occurred in the base
station. Check the reason for the fault from the
supplementary text field of the alarm.
Effect: The effect of the fault on the functioning of the
network element depends on the fault description. For more
information, see base station fault descriptions in LTE
System Libraries.

Meaning: A major fault (or faults) has occurred in the base
station. Check the reason for the fault from the
supplementary text field of the alarm.
Effect: The effect of the fault on the functioning of the
network element depends on the fault description. For more
information, see base station fault descriptions in LTE
System Libraries.

Meaning: A minor fault (or faults) has occurred in the base
station. Check the reason for the fault from the
supplementary text field of the alarm.
Effect: The effect of the fault on the functioning of the
network element depends on the fault description. For more
information, see base station fault descriptions in LTE
System Libraries.

Meaning: A critical fault (or faults) has occurred in a unit (or
units) that belong to the sector indicated in the alarm.
Check the reason for the fault from the supplementary text
field of the alarm.
Effect: The effect of the fault on the functioning of the
network element depends on the fault description. For more
information, see base station fault descriptions in LTE
System Libraries.

Meaning: A major fault (or faults) has occurred in a unit (or
units) that belong to the sector indicated in the alarm.
Check the reason for the fault from the supplementary text
field of the alarm.
Effect: The effect of the fault on the functioning of the
network element depends on the fault description. For more
information, see base station fault descriptions in LTE
System Libraries.

Meaning: A minor fault (or faults) has occurred in a unit (or
units) that belong to the sector indicated in the alarm.
Check the reason for the fault from the supplementary text
field of the alarm.
Effect: The effect of the fault on the functioning of the
network element depends on the fault description. For more
information, see base station fault descriptions in LTE
System Libraries.

Meaning: A critical fault (or faults) has occurred in the base
station interface. Check the reason for the fault from the
supplementary text field of the alarm.

Meaning: A major fault (or faults) has occurred in the base
station interface. Check the reason for the fault from the
supplementary text field of the alarm.
Effect: The effect of the fault on the functioning of the
network element depends on the fault description. For more
information, see base station fault descriptions in LTE
System Libraries.

Meaning: This is a spare alarm for handling late churn in
the release. Please see (1) additional information fields in
the alarm to determine the BTS fault that raised this alarm
and the (2) reported dynamic severity in the alarm to
determine the urgency of the issue.
This alarm is applicable independent of the value of
actCategoryAlarms
Effect: The effect of the fault on the functioning of the
network element depends on the fault description.

Meaning: A transmission fault (or faults) has occurred in the
BTS. This alarm is an encapsulation alarm that is used to
transfer the Flexi Transport Submodule (FTM) alarm data
over the BTS O&M connection through iOMS to NetAct. In
NetAct this alarm is shown in opened format. This means
that the alarm number, alarm text, and supplementary
information are shown in the original FTM format.
Check the reason for the fault from the supplementary
information fields and supplementary text field of the alarm.
Effect: The effect of the fault on the functioning of the
network element depends on the fault description. For more
information, see base station fault descriptions in LTE
System Libraries.

Identifying Additional Information Fields

1. rack (cabinet) number
2. shelf number
3. slot
4. type of unit
5. unit number
6. subunit number
7. path (for alarms where field "type of unit" contains one of the values: FSM, FT, FSP, FBB, FR, FAN, AntennaLine, MHA, RET, FYG, SFP, ASIA, ABIA)

1. rack (cabinet) number
2. shelf number
3. slot
4. type of unit
5. unit number
6. subunit number
7. path (for alarms where field "type of unit" contains one of the values: FSM, FT, FSP, FBB, FR, FAN, AntennaLine, MHA, RET, FYG, SFP)

1. rack (cabinet) number
2. shelf number
3. slot
4. type of unit
5. unit number
6. subunit number
7. path
8. supplAlarmInfo (contains the real Alarming Object and Alarm Name)

1. probable cause reported by FTM
2. the managed object reported by FTM
3. alarm number reported by FTM. The FTM has reserved alarm numbers from space 61000-61999

Additional Information Fields

Destination IP address

Serial Number (when applicable)
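The field lists above can be illustrated with a minimal sketch that labels an ordered list of field values and checks whether an alarm number falls in the range the text reserves for the FTM (61000-61999). The function names, the short field keys, and the assumption that the values arrive as an ordered sequence of strings are all hypothetical; the source does not describe any programmatic interface.

```python
from typing import Dict, Sequence

# Field order taken from the "Identifying Additional Information Fields" list.
# The short key names are hypothetical labels chosen for this sketch.
FIELD_NAMES = (
    "rack", "shelf", "slot", "unit_type",
    "unit_number", "subunit_number", "path",
)

# Unit types for which the list says the "path" field is provided.
PATH_UNIT_TYPES = {
    "FSM", "FT", "FSP", "FBB", "FR", "FAN", "AntennaLine",
    "MHA", "RET", "FYG", "SFP", "ASIA", "ABIA",
}

# The text states that the FTM has reserved alarm numbers 61000-61999.
FTM_ALARM_NUMBERS = range(61000, 62000)


def label_fields(values: Sequence[str]) -> Dict[str, str]:
    """Pair ordered field values with their names (extra values are ignored)."""
    return dict(zip(FIELD_NAMES, values))


def path_applies(unit_type: str) -> bool:
    """Return True if the 'path' field is expected for this unit type."""
    return unit_type in PATH_UNIT_TYPES


def is_ftm_alarm_number(number: int) -> bool:
    """Return True if the alarm number lies in the FTM-reserved space."""
    return number in FTM_ALARM_NUMBERS
```

For example, `is_ftm_alarm_number(61059)` is true for the FTM_TIMING_SOURCE_LOS fault ID that appears later in this section.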
Instructions

Verify the configuration of a BTS by checking the commissioning, cabling and correct installation of the units/modules at the BTS. Make sure the environment does not cause the fault.

Before a unit/module is replaced at the BTS, the site should be reset to recover from any temporary faults which might have caused the malfunctioning of the unit/module. When an active alarm is a 'Start' alarm, a site reset is required to cancel the alarm. If the site reset or module block/unblock does not help, replace the faulty unit/module (see the source of the alarm and instructions fields of the corresponding alarms).

In case of the FSM failure (FSM or FSM sub-unit is reported as source) it is possible that other units are also reported as faulty since it is impossible for them to continue functioning without a system module. In such case replace the system module first and check if the modules previously marked as Faulty are now in Working state.

Below is the list of BTS fault(s) which might have caused this alarm in BTS FL17A/TL17A release. In case of earlier SW release and in case you need more detailed fault descriptions refer to LTE System Libraries. The BTS fault descriptions are also included in BTS SW release documentation. If that does not help, contact your local customer support.
-----------------------------------------------------------------------------
FAULT NAME
2M external reference missing

FAULT ID
1899: EFaultId_2MExtRefMissAl

MEANING
The reference clock monitoring has detected a loss of 2.048 MHz signal received from an external reference source connected to the Sync In interface of the System Module.

INSTRUCTIONS
Note: Perform the steps below in the listed order until the alarm disappears:

1. Check the cabling (connected to the Sync In interface).
2. Check if the 2.048 MHz reference source is working normally and the 2.048 MHz signal is available.
3. Replace alarming module.

-----------------------------------------------------------------------------
FAULT NAME
Antenna link is down

FAULT ID
476: EFaultId_Rp3BusError

MEANING
The antenna (RP3) link is down. The transmitter or receiver drops from synchronization.

INSTRUCTIONS
Note: Perform the steps below in the listed order until the fault disappears.

Flexi BTS FDD, Flexi BTS TDD, AirScale BTS FDD, AirScale BTS TDD:
1. Restart the BTS.
2. If that does not help, replace the alarming module.

Flexi BTS FDD/TDD: In case the FSP is causing the alarm, replace the FSM because the alarm is related to the fixed FSP inside FSM.

Flexi BTS FDD/TDD, AirScale BTS FDD/TDD: If FBBx or ABIx is causing the alarm, replace the FBBx or ABIx.

Flexi Zone Micro FDD, Flexi Zone Micro TDD, Flexi Zone, Flexi Zone SFN Antenna:
Replace the Flexi Zone Micro BTS, Flexi Zone Access Point, or Flexi Zone SFN Antenna.

-----------------------------------------------------------------------------
FAULT NAME
S1 interface recovery failure

FAULT ID
6317: EFaultId_S1ResetRetryOut

MEANING
The S1 interface reset has failed after several attempts. This indicates that a severe failure has occurred: there is some configuration problem; there is no S1 connectivity; the MME has failed; and so on.

INSTRUCTIONS
1. Check the availability of the MME.
2. Check the connection with the MME.
3. Disconnect and reconnect the MME.

-----------------------------------------------------------------------------
FAULT NAME
# autonomous reset as recovery action

FAULT ID
52: EFaultId_UnitAutonomousResetAl

MEANING
Flexi BTS FDD, Flexi BTS TDD, AirScale BTS FDD, AirScale BTS TDD:
The fault informs the operator that the BTS is trying to correct a fault situation by performing a recovery reset to a unit, the site, the BTS, or the BTS secondary unit. In case of an RF module reset in RF chaining configuration, the BTS also resets all RF modules that are further in the chain.

Flexi Zone Micro FDD, Flexi Zone Micro TDD, Flexi Zone, Flexi Zone SFN Antenna:
The BTS tries to recover from a fault situation by performing a site reset.

INSTRUCTIONS
Check the fault history and other active faults of the unit.

Note: This fault does not require any (special) actions.

-----------------------------------------------------------------------------
FAULT NAME
GTP-U Path Failure

FAULT ID
6150: EFaultId_GtpuPathFailure

MEANING
In GTP-U supervision, a network element did not respond to the GTP-U: Echo Request message within the allotted time.

INSTRUCTIONS
1. Check the GTP-U path supervision configuration of the eNB.
2. Check if the supervised network element is operating correctly.
3. Check the communication equipment between the eNB and the supervised network element.

In case of actDualUPlaneIpAddress = true, the S-GW might still be reachable although an alarm caused by fault 6150 is reported. Check whether any other alarms caused by fault 6150 are active, and check whether all GTP-U paths to the S-GW are concerned or not. If you find that all GTP-U paths to the same S-GW are faulty, the S-GW is not reachable anymore.

-----------------------------------------------------------------------------
FAULT NAME
Abnormal CPRI interface to next hop <OptIf ID>

FAULT ID
1974: EFaultId_AbnormalCpriInterfaceToNextHopAl

MEANING
The optical interface is not working properly.

The CPRI transmission is not working properly. This fault is valid only for radio modules in chain configuration.

Note: The Optif is from the perspective of the previous radio.

INSTRUCTIONS
1. Check cables between the RF module that is reported as the fault source and the RF module that is closer to the system module or ABIA.
2. Reset the RF module.
3. If that does not help, block or unblock the site.
4. If that does not help, replace the RF module that is causing the alarm, which is connected to the faulty optical link to the end or farther from the system module.

-----------------------------------------------------------------------------
FAULT NAME
10b8b coding error in optical interface device

FAULT ID
2004: EFaultId_OIC_LVDSRecAl

MEANING
The fiber cable is experiencing interference; data transmission is faulty.

A physical connection failure between the optical interface and the summing function has occurred.

The reason of the fault might be a low-quality or polluted SFP, or a low-quality optical cable used to connect radio module to system and/or extension baseband module.

-----------------------------------------------------------------------------
FAULT NAME
$Protocol Timing source lost [on unit $U] [, interface $IF]

FAULT ID
61059: FTM_TIMING_SOURCE_LOS

MEANING
One of the synchronization sources in the configured priority list is unavailable. This can be a primary or the secondary clock source.

In the alarm: Protocol can be PDH, Timing over Packet (ToP), and Synchronous Ethernet.

-----------------------------------------------------------------------------
FAULT NAME
Antenna Line Device failure

FAULT ID
3100: EFaultId_AntennaLineFailure

MEANING
The System Module, or the device itself, has detected an abnormal operation or a failure in the antenna line device.

-----------------------------------------------------------------------------
FAULT NAME
S1 interface setup failure

FAULT ID
6308: EFaultId_S1SetupRetryOut

-----------------------------------------------------------------------------
FAULT NAME
Address mismatch in summing

FAULT ID
2019: EFaultId_Muksu_SsubMmAl

-----------------------------------------------------------------------------
Refer to the instructions for the BTS fault reported in the alarm on how to handle the failure reported in the alarm.

Refer to the instructions for the root BTS fault reported in the alarm on
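The reachability rule in the fault 6150 instructions (an S-GW counts as unreachable only when all GTP-U paths to that same S-GW are faulty) can be sketched as follows. This is an illustration only: the tuple layout, the S-GW and path identifiers, and the function name are assumptions made for the sketch, not part of any eNB or NetAct interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unreachable_sgws(paths: Iterable[Tuple[str, str, bool]]) -> List[str]:
    """Return the S-GWs for which every supervised GTP-U path is faulty.

    Each item is (sgw_id, path_id, faulty). The identifiers are hypothetical;
    'faulty' means an alarm caused by fault 6150 is active for that path.
    """
    flags_by_sgw: Dict[str, List[bool]] = defaultdict(list)
    for sgw_id, _path_id, faulty in paths:
        flags_by_sgw[sgw_id].append(faulty)
    # An S-GW with at least one healthy path might still be reachable.
    return sorted(sgw for sgw, flags in flags_by_sgw.items() if all(flags))


observed = [
    ("sgw-a", "path-1", True),
    ("sgw-a", "path-2", False),  # one healthy path: sgw-a may still be reachable
    ("sgw-b", "path-1", True),
    ("sgw-b", "path-2", True),   # all paths faulty: sgw-b is not reachable
]
result = unreachable_sgws(observed)
```

With the sample data above, only `sgw-b` is returned, which mirrors the dual U-plane case where one faulty path alone does not mean the S-GW is lost.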
Clearing

Do not cancel the alarm. The system automatically cancels the alarm when the fault has been corrected.

Time to Live

0