
PRACTICES FOR PAGER ROTATION DUTIES

PRODUCTION DEPLOYMENT AND RELEASES
Unexpected problems affecting production deployment and releases
• Incidents
• Crashes
• Unexpected breaks
• Outages (Kim et al., 2021)

Impacts
• Recurring problems for downstream Ops engineers
• Dissatisfied customers

Responses
• Fix
• Escalate
PRACTICE 1
• Every individual in the value stream shares the downstream
responsibilities of handling operational incidents.
• Everyone, including the developers, architects, and
development managers, is on pager rotation.
• This ensures that defects are fixed in a timely manner and
prompts the development of new functionality (Kim et al., 2021).
• When developers and managers get feedback on how their
applications behave in operation, including the work of fixing
defects, they can make upstream coding and architectural
decisions that improve the application and the customer experience.
PRACTICE 2
On-call rotations
• Engineers on the team responsible for maintaining software
availability are put on an on-call rotation, in which they are
paged when a defect occurs during their shift.
• The on-call engineer responds to the page and fixes the defect
immediately to avoid further problems (On-Call Rotations and
Schedules, n.d.).
• They must be available throughout their shift to perform
troubleshooting.
Some of the best on-call rotation practices include:
• Using on-call scheduling software that automatically routes
notifications to engineers according to a predefined schedule.
This saves time and ensures that the right experts get information
at the right time (On-Call Rotations and Schedules, n.d.).
• Making people aware of when they are on or off duty, so that
shifts are not missed.
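As a rough illustration of routing notifications by a predefined schedule, the sketch below picks the on-call engineer from a round-robin rotation. The roster names, start date, weekly shift length, and function names are all assumptions made for the example; they do not come from any particular scheduling product.

```python
from datetime import datetime, timedelta

# Hypothetical roster and shift length, chosen only for illustration.
ROSTER = ["alice", "bob", "carol"]
ROTATION_START = datetime(2023, 1, 2, 9, 0)  # the first shift begins here
SHIFT_LENGTH = timedelta(days=7)


def on_call_engineer(now, roster=ROSTER, start=ROTATION_START, shift=SHIFT_LENGTH):
    """Return the engineer whose shift covers `now` in a round-robin rotation."""
    if now < start:
        raise ValueError("rotation has not started yet")
    # Whole shifts completed since the rotation began.
    shifts_elapsed = (now - start) // shift
    return roster[shifts_elapsed % len(roster)]


def route_alert(message, now):
    """Build a notification addressed to the current on-call engineer."""
    return {"to": on_call_engineer(now), "message": message}
```

Because the schedule is a pure function of time, the same lookup can also be used to tell each person in advance when they are on or off duty.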
On-call practices
• Setting up teams of individuals who share on-call
responsibilities. When an incident arises, it is routed to the
on-call engineer on the team responsible for the service. A
collaboration tool such as chat is used to recruit the teammates
needed to work together on resolving the issue (On-Call
Rotations and Schedules, n.d.).
• Making the operations team responsible for maintaining the
service and responding to incidents is another practice. The
operations team has a tiered structure, with members at different
levels. When an incident occurs, the level 1 members try to fix
it, and escalate it to level 2 if they are unable to. This
reduces operations costs.
On-call practices
• Defining escalation policies, which state the actions that
should be taken to resolve an incident and who forms each line
of defense. For instance, the software engineer who developed the
code is in the first tier of defense. Escalation ensures that the
problem is noticed and resolved (On-Call Rotations and
Schedules, n.d.).
• Setting time limits, so that an incident is escalated if the
first responder does not act within the limit.
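An escalation policy with time limits can be sketched as an ordered list of tiers, each with its own acknowledgement window. The tier names, timeout values, and the `current_responder` helper below are hypothetical and serve only to illustrate the idea, with the developer of the code as the first line of defense.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Tier:
    responder: str          # who is paged at this tier
    ack_timeout: timedelta  # how long they have to act before escalation


# Hypothetical policy: names and timeouts are illustrative only.
POLICY = [
    Tier("developer-on-call", timedelta(minutes=5)),
    Tier("ops-level-2", timedelta(minutes=10)),
    Tier("engineering-manager", timedelta(minutes=15)),
]


def current_responder(elapsed, policy=POLICY):
    """Return who should hold an unacknowledged incident after `elapsed` time."""
    deadline = timedelta(0)
    for tier in policy:
        deadline += tier.ack_timeout
        if elapsed < deadline:
            return tier.responder
    # Every tier timed out: the incident stays with the last line of defense.
    return policy[-1].responder
```

With this policy, an incident nobody has acted on for 7 minutes has already moved from the developer to `ops-level-2`, and after 15 minutes it reaches the final tier.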
Effective on-call rotations ensure:
• Happy, satisfied customers who are helped by on-call employees
• Time saved by reaching on-call responders quickly
• Better service reliability
• Improved transparency in handling incidents

References
• Kim, G., Humble, J., Debois, P., Willis, J., & Forsgren, N.
(2021). The DevOps handbook: How to create world-class
agility, reliability, & security in technology organizations. IT
Revolution.
• On-call rotations and schedules. (n.d.). PagerDuty. Retrieved
February 9, 2023, from
https://www.pagerduty.com/resources/learn/call-rotations-schedules/
