
 What is fault management?

Fault management is the component of network management concerned with detecting, isolating and
resolving problems. Properly implemented, fault management can keep a network running at an optimum
level, provide a measure of fault tolerance and minimize downtime. A set of functions or applications
designed specifically for this purpose is called a fault-management platform.

Important functions of fault management include:

 Definition of thresholds for potential failure conditions.
 Constant monitoring of system status and usage levels.
 Continuous scanning for threats such as viruses and Trojans.
 General diagnostics.
 Remote control of system elements including workstations and servers from a single location.
 Alarms that notify administrators and users of impending and actual malfunctions.
 Tracing the locations of potential and actual malfunctions.
 Automatic correction of potential problem-causing conditions.
 Automatic resolution of actual malfunctions.
 Detailed logging of system status and actions taken.
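
The threshold, monitoring, alarm and logging functions listed above can be illustrated with a minimal Python sketch. It is not taken from any particular fault-management platform: the Threshold class, the classify and scan functions, the metric names and the limit values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Threshold:
    """Warning and critical limits for one monitored metric (hypothetical)."""
    metric: str
    warning: float   # level that signals an impending malfunction
    critical: float  # level that signals an actual malfunction


def classify(threshold: Threshold, value: float) -> Optional[str]:
    """Return the alarm severity for a reading, or None if it is within limits."""
    if value >= threshold.critical:
        return "CRITICAL"
    if value >= threshold.warning:
        return "WARNING"
    return None


def scan(thresholds: List[Threshold], readings: Dict[str, float]) -> List[str]:
    """Check each reading against its threshold and return alarm log entries."""
    alarms = []
    for t in thresholds:
        if t.metric not in readings:
            continue  # no sample for this metric in this polling cycle
        severity = classify(t, readings[t.metric])
        if severity is not None:
            alarms.append(f"{severity}: {t.metric}={readings[t.metric]}")
    return alarms


if __name__ == "__main__":
    # Assumed limits and readings, for illustration only.
    limits = [
        Threshold("cpu_percent", warning=80.0, critical=95.0),
        Threshold("disk_percent", warning=85.0, critical=98.0),
    ]
    for entry in scan(limits, {"cpu_percent": 91.5, "disk_percent": 40.0}):
        print(entry)
```

Using two levels per metric mirrors the distinction the list draws between impending and actual malfunctions. A real platform would typically add alarm clearing, de-duplication and notification routing on top of this.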

 Fault Management Manager (1st and 2nd line support)

Fault Management takes end-to-end responsibility for corrective maintenance of network problems,
ensuring that trouble tickets are actively managed until resolution in line with SLA (Service Level
Agreement) and OLA (Operational Level Agreement) requirements. An OLA defines the interdependent
relationships in support of an SLA: it describes the responsibilities of each internal support group
toward other support groups, including the process and timeframe for delivery of their services.
The role covers:

 Leadership and Management of Fault Management team
 Leadership and Management of the Shift teams including shift scheduling/handover
 Leadership of technical support specialists
 Leadership of Trouble Ticket & Works Order Dispatch & Admin
 End-to-end management of the Fault Management process
 Ensures 24x7 cover is available (shift & callout) for all technologies as needed
 Fault Management Process & Tool development
 Coaches and develops Fault Management team
 Ensures escalation management and technical support
 Manages escalations as required and is an active member of the escalation team
 Liaises with Customer Care organisations regarding Network outages
 Liaises with other service providers regarding network outages
 Ensures end-to-end support, coordination and control of assigned Trouble Tickets
 Provides reports for service outages and recommends follow-up actions
 Reviews Trouble Ticket reports against SLA/OLA and recommends follow-up actions
 Provides major service outage investigations and follow-up
 Ensures planned outages are carried out within the maintenance window
 Ensures Operator Customer Care is fully updated on service-affecting outages.

 Fault Management Engineer - BSS

Key exposure requirements:

 Sound knowledge of BSC (Flexi, mcBSC) & RNC (Classical & mcRNC) fault troubleshooting
 Good understanding of LTE (eNB & OMS)
 BTS/ NodeB/ eNB commissioning and troubleshooting knowledge
 Good knowledge of MS Office, especially Excel.
 Excellent communication skills
 Perform Daily Health Check of BSC/RNC/WBTS/BTS and resolution of faults.
 Handling Configuration Support & fault analysis
 Responsible for analysis & rectification of critical BSC Faults and alarms
 Taking RNC & BSC Backup
 First point of contact for resolution of any RF/KPI related issues
 BTS Troubleshooting & Commissioning.
 Providing Technical Support to Fault Management team on ongoing issues/faults
 Real time alarm analysis and outage handling and providing end to end support for rectification.
 Provide technical support to team for all faults and configuration related support.
 Participation in all governance meetings and closure of all action points opened on FM.
 Experience in alarm monitoring
 Hands-on experience in trouble ticket handling for outage alarms within SLA.
 Handling alarm monitoring and fault localization, correction and verification.
 Experience in network monitoring of alarms for all network elements.

Position Description

Fault Management takes end-to-end responsibility for corrective maintenance of network problems,
ensuring trouble tickets are actively managed until resolution to SLA and OLA requirements. It is
completely responsible for full fault management in the network, from alarm surveillance to resolution.
Alarm monitoring, fault localization, correction, verification and resolution from a remote delivery center.
Corrective maintenance (centralized routines).
Liaises with Customer Care organizations regarding network outages.
Provides support and coordination with subcontractors and 3rd parties to resolve faults.
Liaises with other service providers regarding network outages.
Supports end-to-end support, coordination and control of assigned Trouble Tickets and work orders.
Provides incident reports and root cause analysis (RCA) for outages in the network.
Supports major service outage investigations and follow-up.
Ensures planned outages are carried out or rolled back within the maintenance window.
Ensures Operator Customer Care is fully updated on service-affecting outages.
Operation and maintenance of multi-vendor transmission nodes (PDH, SDH, DWDM).
Carries out preventive, proactive measures to improve network availability and reduce MTTR.
Working experience in transmission link configuration and integration.
O&M for transmission equipment (optical and WiMAX), SDH, PDH.
Follows NOC procedures and processes and is ready to work in a 24x7 environment.
Knowledge of 2G, 3G, GSM, CDMA.
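
The MTTR figure mentioned above can be computed from trouble ticket timestamps. The sketch below assumes MTTR means mean time to repair, measured from the moment a fault is raised to the moment service is restored. The function name and the (raised, restored) tuple layout are hypothetical and do not come from any specific ticketing tool.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_repair(outages: List[Tuple[datetime, datetime]]) -> timedelta:
    """Average the (raised, restored) durations of resolved outages."""
    if not outages:
        raise ValueError("at least one resolved outage is required")
    # Sum the individual repair durations, starting from a zero timedelta.
    total = sum((restored - raised for raised, restored in outages), timedelta())
    return total / len(outages)


if __name__ == "__main__":
    # Two made-up outages lasting one hour and three hours.
    tickets = [
        (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 0)),
        (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 17, 0)),
    ]
    print(mean_time_to_repair(tickets))
```

Only resolved tickets should be passed in, since an open ticket has no restoration time to measure against.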

 NOC Manager (Responsibilities)

Provide leadership, mentoring, and management of Tier 1/2 engineers in the NOC

Make sure that trouble tickets are completed correctly, quickly and professionally by your staff

Analyze root causes of problems and find proactive ways to avoid them in the future

Develop team schedule, set clear expectations of daily activities, monitor the progress of service issues to
their timely resolution, review and post completed trouble tickets

Be the lead engineer for strategic accounts as required

Create Standard Operating Procedures for the NOC

Be the escalation point for issues that Tier 1/2 engineers are not able to solve

Deliver consulting engagements including but not limited to Exchange, Office 365, VMware, Hyper-V, MS
Server, migrations, upgrades, wireless and wired networks, disk array sizing and implementation, and
new deployments of technology

Experience with AppAssure or Veeam a plus

Experience dealing with complex IP networks and diverse equipment; Cisco experience a huge plus

Qualifications:

Supervisory/management experience in an IT services environment

Current IT industry certifications with relevant real-world experience to back up the claim of being a
Tier 3 engineer

Be action-oriented, hold yourself and others to a high standard of quality service delivery, and improve
the proactivity of service to our clients

Have a clear understanding of how an MSP model operates and what it takes to be successful in both
service delivery and revenue generation

Highly proficient at the configuration and management of PSA and RMM software with Autotask and
AVG/LPI experience being ideal

Must be self-driven and have the ability to know when things just need to get done

Must have the ability to learn “on the fly” when it comes to new technology

Must be able to commute to Carlsbad, CA to perform this job on site
