
Key Principles of Kyndryl Major Incident Management

Major Incident Management is one of the most critical elements of the
Incident Management process. It has high visibility and can make or break
our customers’ confidence in Kyndryl’s capabilities.

BE PREPARED
• Know your customer and customer stakeholders.

• Know your business, have architecture diagrams available and test
your environment credentials regularly.

• Understand what recovery options exist.

• Prepare your communications channels and review them regularly.

EXECUTE EFFECTIVELY
• Establish the overall Leader of the situation, involve the right
people at the right time and assign all responsibilities.

• Determine the scope of the incident and understand the
business impact.

• Concentrate on the recovery of the services; do not let
emotions rule the situation.

• Communicate status updates effectively and efficiently, keeping to
the communication plan and the agreed timeline.

LEARN AND APPLY


• Prepare and run a thorough Major Incident Review to understand the
gaps in all Major Incident Management areas.

• Review and understand the gaps in all Major Incident areas. Develop an
improvement plan and create lessons learned.
• Update architecture diagrams and configuration data as needed.

• Define follow-up activities, such as Hypercare, extensive Monitoring,
post-incident customer communication/interaction, etc.

REMEMBER: To our Customer, we are only as good as our capability to react
promptly to, and effectively restore service after, the last Major Incident.

Other important resources:

MAJOR INCIDENT REVIEWS


https://kyndryl.sharepoint.com/sites/SEISM_Global/SitePages/Tips-and-Techniques--.aspx

TECHNICAL SMEs ENGAGEMENT


https://kyndryl.sharepoint.com/sites/SEISM_Global/SitePages/Tips-and-Techniques---Timely-Engagement-of-Expertise.aspx

INCIDENT MANAGEMENT PRACTICE


https://kyndryl.sharepoint.com/sites/ServiceManagementinKyndryl/SitePages/Incident-Management.aspx

KYNDRYL SERVICE ASSURANCE TEAM (KSAT)


https://kyndryl.sharepoint.com/sites/Advanced-Global-Delivery/SitePages/Kyndryl-Service-Assurance.aspx

MAJOR INCIDENT MANAGEMENT AND ALERTING


https://kyndryl.sharepoint.com/sites/SEISM_Global/SitePages/Major-Incident-Management-Education.aspx

MAJOR INCIDENT MANAGEMENT YAMMER COMMUNITY


https://web.yammer.com/main/groups/eyJfdHlwZSI6Ikdyb3VwIiwiaWQiOiIxMjg2NTQ2MzkxMDQifQ

Major Incident Management – recommended knowledge

GENERAL

LEADERSHIP
• Crisis Management
• Communication Techniques
• Executive communication skills
• Handling difficult negotiations
• Mindfulness
• Global Major Incident Management and Incident alerting
• Kyndryl Internal Major Incident Support teams

ENVIRONMENT
• Communication Techniques
• Problem isolation methodologies
• Troubleshooting methodologies
• Understanding IT Service Environment

PROCESS
• Communication Techniques
• Digital RCA
• Executive communication skills
• Global Major Incident Management and Incident alerting
• Kyndryl Internal Major Incident Support teams

ACCOUNT SPECIFIC

LEADERSHIP
• Architectural solution overview
• IT service vs Business service map and criticality
• Account resources and their Focal Points (escalation matrix)
• Vendor services overview
• Vendors contacts/matrix

ENVIRONMENT
• Architectural solution overview
• IT service vs Business service map and criticality
• Vendor services overview
• Vendors contacts/matrix
• Systems dependability
• Risks (risk register)
• Account resources and their Focal Points (escalation matrix)

PROCESS
• Major Incident Process and Timelines
• Major Incident Communication templates and channels
• Vendors contacts/matrix
• Account resources and their Focal Points (escalation matrix)

How can you prepare for a failure?

CREATE ‘MUSCLE MEMORY’


• Do you know who needs a seat at the Major Incident table?

• Do you have contact and escalation information for them?

• Do they know what is expected of them?
• Do they have their standard health checklists? Does the Major Incident
team have a copy? Are the checklists executed consistently?
• Do you have the necessary information to raise service tickets with
your other partners (credentials, customer numbers, escalation, etc.)?

KNOW YOUR BUSINESS

• Do you have architecture and topology diagrams?

• Do you understand what business cycles can influence priorities?

• Do you have workarounds already built for some services?

• Do you have a DR program? Is it inclusive? When was it last tested, and
what were the results?

PREPARE COMMUNICATIONS PLAN


• Do you have pre-defined bridges established?

• Do you know who needs to be on which bridge?

• Do you have distribution lists that are regularly verified?

• Do you have a pre-established cadence in place for communications?


Major Incident Management – first minutes are crucial!

The first 5 minutes – what should each team member be doing?

Major Incident Manager

• If the call is already open – LISTEN – can you hear chaos or control?
• Is there someone who has already taken control of the call – is it the
right person?
• Is the triage program being executed and actions assigned, or is there
evidence that the issue is being narrowed down?
• If the bridge is not yet open – get it opened! Open a chat channel
with the technical team to enable easier communication.

Technical Support

• Have you checked for alerts and alarms?
• Is your team currently working on associated issues – could they be
related/causal?
• Have you initiated your standard health checklist (the 10–15 things
that you should always check as you enter the call)?
• Check in to the Major Incident chat, announce yourself by providing
the area you cover, status info and any errant conditions you found
that appear unusual or outside expected values.
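
As an illustration only, the standard health checklist and the chat check-in described above could be sketched in Python as follows. The source does not prescribe any tooling; the names Check and run_checklist, the probes and the thresholds are all hypothetical, and a real checklist would call the team's own monitoring tools.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    """One item of a standard health checklist (hypothetical structure)."""
    name: str
    probe: Callable[[], float]  # returns the current reading
    low: float                  # lowest expected value
    high: float                 # highest expected value


def run_checklist(area: str, checks: List[Check]) -> str:
    """Run every check and build the check-in message for the Major Incident chat."""
    errant = []
    for check in checks:
        try:
            value = check.probe()
        except Exception as exc:
            # A probe that cannot run is itself a condition worth reporting.
            errant.append(f"{check.name}: probe failed ({exc})")
            continue
        if not (check.low <= value <= check.high):
            errant.append(f"{check.name}: {value} outside {check.low}-{check.high}")
    status = "; ".join(errant) if errant else "all checks within expected values"
    return f"[{area}] {len(checks)} checks run. {status}"


# Hypothetical probes with fixed readings, standing in for real monitoring calls.
checks = [
    Check("cpu_percent", lambda: 42.0, 0, 85),
    Check("queue_depth", lambda: 1200.0, 0, 500),
]
message = run_checklist("Middleware", checks)
print(message)
```

The returned message names the area covered, the number of checks run and only the readings outside expected values, which mirrors the check-in the Technical Support bullets ask for.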

Client Interface

• How is the client describing impact?
• Is the client taking actions themselves to minimize impact/effect?
Will these actions affect our ability to triage/remediate?
• Has the client provided direction on communications cadence
requirements? If yes, have you shared that with the Major Incident
Manager?


The next 15 minutes…

Major Incident Manager

• Review recent changes – engage the change manager.
• Build the IS/IS NOT map – develop the Incident Statement.
• Work to create the first communication (‘we got it’).
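
As a hedged sketch, the IS/IS NOT map named in the Major Incident Manager steps can be held as a small data structure from which a first Incident Statement is drafted. The four dimensions used here (what, where, when, extent) are a common layout for this technique and an assumption, not something the source specifies; the class name IsIsNotMap and the sample observations are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed dimensions; adjust to the account's own IS/IS NOT template.
DIMENSIONS = ("what", "where", "when", "extent")


def _empty() -> Dict[str, List[str]]:
    return {dimension: [] for dimension in DIMENSIONS}


@dataclass
class IsIsNotMap:
    """Collects what IS affected and what IS NOT, per dimension."""
    is_: Dict[str, List[str]] = field(default_factory=_empty)
    is_not: Dict[str, List[str]] = field(default_factory=_empty)

    def add(self, dimension: str, observation: str, affected: bool = True) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        target = self.is_ if affected else self.is_not
        target[dimension].append(observation)

    def incident_statement(self) -> str:
        """Draft one line per dimension; gaps are shown as 'unknown'."""
        lines = []
        for dimension in DIMENSIONS:
            is_part = ", ".join(self.is_[dimension]) or "unknown"
            not_part = ", ".join(self.is_not[dimension]) or "unknown"
            lines.append(f"{dimension.upper()}: IS {is_part} | IS NOT {not_part}")
        return "\n".join(lines)


# Hypothetical observations gathered on the bridge.
incident_map = IsIsNotMap()
incident_map.add("what", "order submission fails")
incident_map.add("what", "order lookup", affected=False)
incident_map.add("where", "EU region")
statement = incident_map.incident_statement()
print(statement)
```

Entries still marked "unknown" show the Major Incident Manager which questions remain to be asked before the Incident Statement is complete.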

Technical Support

• Pull topology diagrams, transaction flows – present them on the Teams
call.
• Initiate log pulls for all components within the affected transaction flow.
• Identify which support tickets need to be opened.

Client Interface

• Ensure the client knows the technical teams are engaged.
• Work with the business to identify the required communications cadence
(based on time of day, level of impact, business cycles).

From then on…

Major Incident Manager

• Develop the Fix – Failover – Fallback plan.
• Continue communications cadence.
• When ready – develop the stand down plan.

Technical Support

• Continue execution of actions approved by Major Incident Manager.


• Review similar issues for any insights to recovery.
• Escalate for additional skills where needed.

Client Interface

• Determine likelihood of client escalation to Kyndryl – initiate/manage
Tier alerts as required.
