TOPIC 4

FAULT MANAGEMENT
OUTLINE
• Introduction
• The process of locating faults
• Benefits of fault management (FM)
• FM steps
• FM tools
• FM reporting
• Event & Log Management

Introduction
Several common phrases related to faults
• “Network is DOWN!”
• “Request timed out”
• “No Connection”
• “Connection fails”
•………………………..

Introduction
• What is a fault? An error or problem.
• What causes faults?
• Components / hardware / hosts
• Software / application systems
• Links / media (heavily loaded / congested)
• Others: power failure

Fault Management
• The process of locating network problems or
faults.
• It is used to:
• Trace, detect and identify the occurrence of faults
on the data network (performing diagnostic tests)
• Isolate the cause of a fault (performing actions
upon error detection)
• Correct or repair the fault (if possible)
• Manage error logs (maintenance of error logs)

Fault Management (cont’d…)
• It encompasses activities such as:
• Tracing faults through the system
• Carrying out diagnostics
• Acting upon the detection of errors in order
to correct the faults
• Tracing errors through the log and time
stamping

Benefits

• Increases network reliability by providing
tools to quickly detect problems and
initiate a recovery procedure (solution).
• Increases the productivity of the network engineer (NE)
• Increases the effectiveness of the network

Accomplishing FM

• Fault Mgt involves 3 steps:
• Identify the fault
• Isolate the cause of the fault
• Correct the fault (if possible)

Fault Detection
• To learn that a problem exists, we need
to gather data about the network.
• Faults can be detected by either:-
• Error report generation (Logging critical
network events or notifications)
• Monitoring (Polling network devices)

Critical Network Events

• Transmitted by a network device when a fault
condition occurs. Examples of critical network
events (CNE):
• Failure of a link
• Restart of a device
• Lack of response from a host
• Relying solely on CNE may not always give
up-to-date status for every network device.
• E.g. a network device that fails completely cannot
send an event.

Polling of Network Devices
• Can help the NE find faults in a timely manner.
• However, the NE must weigh the degree of
timeliness in finding problems against the
bandwidth consumed.
• The shorter the polling interval, the sooner a fault
is noticed but the more bandwidth is used.
• Other factors when deciding on the polling
interval are the number of devices the NE wants
to poll and the bandwidth of the link.

Polling of Network Devices (example)
• Each query and each response (incl. data, header,
control info) = 100 bytes long
• Each device requires 200 bytes (one query plus one response).
• For 30 devices = 6000 bytes or 48 kb per
polling cycle
• Polling every second for 1 hour
transfers (48,000 bits × 3600 cycles)
≈ 173 megabits, a sustained load of 48 kb/s.
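
The arithmetic in this example can be checked with a short sketch. The function name and its parameters are illustrative assumptions, not part of any FM tool; the figures are the ones used in the example.

```python
def polling_load(devices, bytes_per_device, interval_s, duration_s):
    """Return (bits per polling cycle, total bits over the duration, average bits per second)."""
    bits_per_cycle = devices * bytes_per_device * 8
    cycles = duration_s // interval_s
    total_bits = bits_per_cycle * cycles
    return bits_per_cycle, total_bits, bits_per_cycle / interval_s


per_cycle, total, rate = polling_load(devices=30, bytes_per_device=200,
                                      interval_s=1, duration_s=3600)
print(per_cycle)  # 48000 bits per polling cycle
print(total)      # 172800000 bits in one hour (about 173 megabits)
print(rate)       # 48000.0 bits per second on average
```

Calling the same function with `interval_s=60` gives 2,880,000 bits per hour, which shows how strongly the polling interval drives the load.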

Faults priority
• Not all faults have the same priority
• The NE needs to decide which faults must be
managed (the most important types of faults
for the network environment).
• This is because:
• If the number of faults is high, the NE cannot handle the
volume.
• Limiting the event traffic reduces redundant or
useless info and minimizes wasted
bandwidth.

Faults Determination
• The determination of which faults to
manage will be influenced by:
• The NE's scope of control over the
network
• Size of the network.
• These factors affect the amount of info the NE
can obtain from network devices.

Tools Implementation
• After deciding which problems require mgmt
and determining how the NE will collect the data,
the next step is to implement the necessary
FM tools.
• Tool effectiveness relies heavily on the type of
info the network devices provide.

Fault Mgt Tools
• There are 3 types of tools:
• Simple Tool
• More Complex Tool
• Advanced Tool

Simple Tool
• Shows the existence of a problem but does not
indicate its cause.
• Queries devices (e.g. ping, to test
connectivity)
• Useful for old devices or hosts on the
network that are not sophisticated enough to
send network events.
• Output of this tool:-
• Log file
• Changing colors on network map.

Example of Ping Test
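
A minimal sketch of a simple tool of this kind, assuming the system `ping` command is available. The function names and the host names in the usage lines are hypothetical. The probe is passed in as a parameter so the status logic can be tried without sending packets.

```python
import platform
import subprocess


def ping(host, timeout_s=2):
    """Return True if one ICMP echo request to host gets a reply."""
    # The count flag is -n on Windows and -c on Unix-like systems.
    count_flag = "-n" if platform.system().lower() == "windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def check_hosts(hosts, probe=ping):
    """Map each host to "up" or "down"; this shows that a fault exists, not its cause."""
    return {host: ("up" if probe(host) else "down") for host in hosts}


# Usage with a stand-in probe (no packets sent); omit `probe` to use the real ping.
reachable = {"router-1"}
print(check_hosts(["router-1", "printer-7"], probe=lambda h: h in reachable))
```

The resulting dictionary could be written to a log file or used to color a network map, the two outputs named for the simple tool.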

More Complex Tool
• Useful for devices that can generate or report
network events.
• It informs the NE when it detects a problem by
logging network events or by polling.
• Finding a fault through a critical network event
also helps isolate the cause or identify the
affected device.
• It performs part of fault mgmt, but it doesn’t
correct the problem.

Advanced Tool
• Able to:
• Show / identify problems by using polling
or logging critical network events.
• Isolate the cause of the problems.
• Correct the faults.
• Report the problems.

Form of Reporting Faults
• The form in which a fault is reported is
important.
• The most common forms are:-
• Text messages
• Graphical messages
• Auditory signals

Form of Reporting Faults
• A text msg is the most portable choice since it works on
any type of terminal.
• A graphical msg is most effective, but it needs
a color display capable of
sophisticated graphics, for example to flash the picture of
the device with the fault.
• An audible bell or noise can quickly draw the NE's
attention.
• A combination of these message forms is
an extra advantage.

Advantage of color graphics
• Color graphics will help indicate the status of
a network device more efficiently.
• A graphical interface (network map) could
show every device on a map drawn by NMS.
• Color scheme can be used to indicate the
status of each device.
• Green ~ device up with no errors
• Yellow ~ device may have an error
• Red ~ device in error state
• Blue ~ device misconfigured
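
The color scheme can be written down as a small lookup, sketched below. The enum name, the function and the device names are hypothetical; only the four status-to-color pairs come from the list above.

```python
from enum import Enum


class DeviceStatus(Enum):
    UP = "green"               # device up with no errors
    POSSIBLE_ERROR = "yellow"  # device may have an error
    ERROR = "red"              # device in error state
    MISCONFIGURED = "blue"     # device misconfigured


def map_colors(statuses):
    """Return the color to draw for each device on the network map."""
    return {device: status.value for device, status in statuses.items()}


# Hypothetical device names, for illustration only.
network = {
    "core-switch": DeviceStatus.UP,
    "edge-router": DeviceStatus.ERROR,
    "print-server": DeviceStatus.MISCONFIGURED,
}
print(map_colors(network))
```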

Alarm Reporting Function (ARF)
• An ISO 10164-4 standard that provides
definitions of alarm types, alarm causes and
severity levels.
• ARF allows a managed object (MO) to send notifications to the
managing process (manager) about a variety of
problems it encounters.
• A notification is a message produced by an MO
when abnormal operation occurs. It contains the
details of the problem, why it happened, where it
happened and for whom it is
intended.

Alarm Reporting Function (ARF)
• An organization uses ARF to report the
following problems:
• Communication failure (distortion of signal,
CRC error)
• Processing failure (lack of memory, file
access errors)
• Quality of service failure (throughput,
response time)
• Equipment failure (cable problems, locked
ports)
• Environmental failure (high/low temperature,
smoke detection)

Alarm Reporting Function (ARF)
• The alarm is carried by
M_EVENT_REPORT from agent to
manager.
• It is packaged inside the alarm reporting
service, which is present at both the
agent and the manager.

Alarm Reporting Service

Alarm Reporting Service
• It includes the parameters that can be
found inside the M_EVENT_REPORT sent
to the manager.
• Parameters inside the alarm reporting
service:-
• Invoke Identifier
• Mode
• Managed Object Class
• Managed Object Instance

Alarm Reporting Service Parameter
• Event Type – showing types of alarm
• Communications
• Quality of Service
• Processing Error
• Equipment
• Environmental

Alarm Reporting Service Parameter
• Event Information, with parameters such as:-
• Probable cause
• Specific problems
• Perceived Severity (6 levels: critical,
major, minor, warning,
indeterminate, cleared)
• Backed-up status
• Backed-up object
• Trend Indication
• Threshold Information
• Notification Identifier
• Correlated Notification
• State Change Definitions
• Monitored Attributes
• Proposed Repair Actions
• Additional Text
• Additional Information
• Current Time
• Errors
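
To make the parameter list concrete, the sketch below models a few of these parameters as a Python record. It is an illustration only: the class and field names are assumptions, the numeric order of the severity levels is chosen here for sorting and is not the standard's encoding, and most Event Information parameters are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class EventType(Enum):
    COMMUNICATIONS = "communications"
    QUALITY_OF_SERVICE = "quality of service"
    PROCESSING_ERROR = "processing error"
    EQUIPMENT = "equipment"
    ENVIRONMENTAL = "environmental"


class Severity(IntEnum):
    # Numeric order is an assumption used for sorting, not the standard's encoding.
    CLEARED = 0
    INDETERMINATE = 1
    WARNING = 2
    MINOR = 3
    MAJOR = 4
    CRITICAL = 5


@dataclass
class AlarmReport:
    managed_object_class: str
    managed_object_instance: str
    event_type: EventType
    probable_cause: str
    perceived_severity: Severity
    additional_text: str = ""


def most_urgent(alarms):
    """Return the alarms sorted from most to least severe."""
    return sorted(alarms, key=lambda a: a.perceived_severity, reverse=True)


# Hypothetical alarms, using causes named in the ARF problem categories.
alarms = [
    AlarmReport("router", "r1", EventType.EQUIPMENT, "cable problem", Severity.MINOR),
    AlarmReport("router", "r2", EventType.COMMUNICATIONS, "CRC error", Severity.CRITICAL),
]
print([a.managed_object_instance for a in most_urgent(alarms)])
```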

Event Report Management Function
• ISO 10164-5; used for distributing and
controlling event reports.
• The function allows a manager to select the
events that are reported within a selected
period of time.
• All notifications from MOs need to be filtered
before being sent to the manager.
• It also determines whether an event report needs
to be logged to the database.

Event Report Management Function
• ERMF is built on an event mgmt model
which consists of:-
• Discriminator (mgmt service control
discriminator)
• EventForwardingDiscriminator / event
forwarding control function
• These two MOs add
flexibility and control over how
notifications are converted into event reports.

Event Report Management Function
• ERMF can support the following
services:-
• Receive and analyze notification from MO
• Initiate the event reporting service
• Terminate the event reporting service
• Suspend event reporting service
• Resume event reporting service
• Modify event reporting criteria
• Retrieve event reporting criteria

Discriminator
• Filters which notifications need to be sent within
the time interval.
• Attributes inside the discriminator:-
• Discriminator id
• Discriminator construct
• Administrative state
• Operational state
• Notifications from MOs are processed and
converted into event reports.

EventForwardingDiscriminator (EFD)
• The event report from the discriminator is the
input for the EFD, which decides
which event reports should be sent.
• The EFD sends the event report to the
destination given by the destination
attribute inside the EFD.

EventForwardingDiscriminator (EFD)
• The EFD has attributes such as:-
• Destination
• Optional backup destination
• Optional confirmed or non-confirmed mode
package
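
The filtering and forwarding behavior described for the discriminator and the EFD can be sketched as follows. This is a simplified model under stated assumptions: the discriminator construct is represented as a Python predicate, event reports are plain dictionaries, "unlocked" is taken as the administrative state that permits forwarding, and sending is replaced by appending to a list. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EventForwardingDiscriminator:
    discriminator_id: str
    destination: str
    # Discriminator construct: a predicate that an event report must satisfy.
    construct: Callable[[dict], bool]
    # Administrative state: only an "unlocked" discriminator forwards reports.
    administrative_state: str = "unlocked"
    forwarded: List[tuple] = field(default_factory=list)

    def process(self, report: dict) -> bool:
        """Forward the report to the destination if it passes the filter."""
        if self.administrative_state != "unlocked":
            return False
        if not self.construct(report):
            return False
        # A real agent would send an event report here; this sketch records it.
        self.forwarded.append((self.destination, report))
        return True


efd = EventForwardingDiscriminator(
    discriminator_id="efd-1",
    destination="manager-A",
    construct=lambda r: r["severity"] in {"critical", "major"},
)
print(efd.process({"source": "link-3", "severity": "critical"}))  # True
print(efd.process({"source": "host-9", "severity": "warning"}))   # False
print(len(efd.forwarded))                                         # 1
```

Setting `administrative_state` to "locked" corresponds to suspending the event reporting service, and changing `construct` corresponds to modifying the reporting criteria.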

Event Report Management Function

Event Report Management Function

• Conclusion: the two most important
functions in the ERMF model are:
• Event detection & processing function
• EFD processing function

Log Control Function (LCF)
• ISO 10164-6 ~ describes the operations
for the NM log.
• NM logs are used to record information
about the MOs in the network.
• LCF establishes the rules for the
operation of the log. These rules are
known as log behavior.

Log Control Function (LCF)
• Events and notifications received need to be
kept for future use or for analyzing
problems.
• Notifications sent by MOs need to be processed to
produce log records.
• Log records are passed through a filter or
discriminator and kept in log files inside the
system.
• The discriminator or filter applies the log behavior (rules)
that determines how each log record is kept.

Event And Log Management

Event And Log Management
• The Log MO has the attributes:-
• Log id
• Discriminator Construct
• Administrative State
• Operational State
• Log Full Action
• Availability Status
• Finite log size package
• Log alarm package
• Scheduling package
• A log record keeps the information in the log
file, and it can be deleted or retrieved.
• A log record has the attributes LogRecord Identifier and
Logging Time.
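
A minimal sketch of a log with a finite size and a log full action is given below. It assumes two possible actions, "wrap" (discard the oldest record) and "halt" (stop logging). The class and method names are hypothetical, and the discriminator, state and scheduling attributes are left out.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count


@dataclass
class LogRecord:
    log_record_id: int
    logging_time: datetime
    content: dict


class Log:
    def __init__(self, log_id, max_records, log_full_action="wrap"):
        if max_records < 1:
            raise ValueError("max_records must be at least 1")
        if log_full_action not in ("wrap", "halt"):
            raise ValueError("log_full_action must be 'wrap' or 'halt'")
        self.log_id = log_id
        self.max_records = max_records
        self.log_full_action = log_full_action
        self.records = deque()
        self._next_id = count(1)

    def add(self, content):
        """Store content as a new record; return it, or None if the log has halted."""
        if len(self.records) >= self.max_records:
            if self.log_full_action == "halt":
                return None          # log is full: stop logging
            self.records.popleft()   # wrap: discard the oldest record
        record = LogRecord(next(self._next_id), datetime.now(timezone.utc), content)
        self.records.append(record)
        return record

    def retrieve(self, record_id):
        """Return the record with the given identifier, or None if absent."""
        return next((r for r in self.records if r.log_record_id == record_id), None)

    def delete(self, record_id):
        """Remove the record with the given identifier, if present."""
        self.records = deque(r for r in self.records if r.log_record_id != record_id)


log = Log("alarm-log", max_records=2)
for n in range(3):
    log.add({"event": n})
print([r.log_record_id for r in log.records])  # [2, 3]
```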

Event And Log Management
• Manager should be capable of managing
the log file (create log, delete log,
suspend and resume log activities,
modify log attribute, delete log record,
retrieve log record).
• The function of log management is to
control the operation of logs and log
records in the agent. It also interacts with the
event log.

Trouble Ticket
• Also known as ‘tracking throughout the life
cycle’.
• It is the basis for a front-end application for fault
management, through which each fault is monitored.
• The items needed inside the trouble ticket are:-
• Date and time
• Place
• Description of the problem
• Priority in solving the problem
• Status
• Location
• Person in charge (verify, solve)
• Date & time when the problem is solved
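
The ticket items can be sketched as a small record that follows a fault through its life cycle. Field and method names are hypothetical, the priority scale (1 is most urgent) is an assumption, and the dates and texts in the usage lines are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TroubleTicket:
    opened_at: datetime
    location: str
    description: str
    priority: int              # assumption: 1 is the most urgent
    person_in_charge: str
    status: str = "open"
    resolved_at: Optional[datetime] = None

    def resolve(self, when: datetime) -> None:
        """Close the ticket and record when the problem was solved."""
        if when < self.opened_at:
            raise ValueError("resolution time precedes the opening time")
        self.status = "resolved"
        self.resolved_at = when

    def time_to_repair(self):
        """Return the time from opening to resolution, or None if still open."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.opened_at


ticket = TroubleTicket(
    opened_at=datetime(2024, 1, 15, 9, 0),
    location="Server room, rack 3",
    description="Link to edge router down",
    priority=1,
    person_in_charge="on-call engineer",
)
ticket.resolve(datetime(2024, 1, 15, 10, 30))
print(ticket.status, ticket.time_to_repair())  # resolved 1:30:00
```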

Architecture for Trouble Ticket

SUMMARY
• FM Definition
• FM Objectives
• FM Benefits & Advantage
• FM Tasks/Responsibilities
• FM Tools
• Other FM Functions
• Alarm Reporting Function (ARF)
• Event Report Mgt Function (ERMF)
• Log Control Function (LCF)
• Trouble Ticket
