
Artificial Intelligence for IT Operations – Basic Guide to Start with AIOps

Research · January 2023


DOI: 10.13140/RG.2.2.20295.16803


2 authors:

Mari Onkamo, S M Tahsinur Rahman
Lappeenranta – Lahti University of Technology LUT


ARTIFICIAL INTELLIGENCE FOR IT OPERATIONS

Basic Guide to Start with AIOps

2 January 2023
Mari Onkamo, S M Tahsinur Rahman
LUT University
ABBREVIATIONS

AI Artificial intelligence

AIOps Artificial Intelligence for IT Operations

I&O Infrastructure and operations

ITOM IT operations management

ITSM IT service management

KPI Key performance indicator

ML Machine learning

MLOps Machine learning operations

MTTD Mean time to detection

NLP Natural language processing

ROI Return on investment

SLA Service level agreement

SRE Site reliability engineering

Table of contents
1 Introduction
2 AIOps
2.1 What AIOps is and why it is used
2.1.1 Event correlation
2.1.2 Anomaly detection
2.1.3 Automation
2.1.4 Performance analytics
3 Background of AIOps and other IT operations
4 Impacts and benefits of AIOps
4.1 Impacts of AIOps
4.2 Benefits of AIOps
5 Future of AIOps
6 Implementing AIOps
6.1 Observability
6.2 Predictive analytics
6.3 Proactive response
7 Conclusion
References

1 Introduction

The volume of data is growing at high speed (“Total data volume worldwide 2010-2025,” 2021),
and only a small part of it is examined. One estimate put the share of all data that was
analyzed in 2015 at just 0.5% and expected that share to decrease further as the amount of
data increases (Guess, 2015). With the ever-increasing quantity of data, it is impossible to
wait for humans to interpret it and provide insights. Additionally, as digitalization pushes
all companies, including the most traditional ones, to transform to new data-driven business
models (Minashkina and Happonen, 2020; Palacin et al., 2020), there are no signs of a
reduction in the amount of generated data. Artificial Intelligence for IT Operations (AIOps)
takes advantage of artificial intelligence (AI) tools, like machine learning (ML), to
automate operational processes, for instance, incident handling. This contributes to better
utilization of data and increases the share of data that is analyzed.

This report aims to gather the basic information about AIOps that anyone new to the topic
should know to get a fundamental understanding of the subject. The report does not cover
real-life examples or detail all variations of AIOps; instead, it aims to provide a starter
pack from which to begin with the topic. The report is based on a literature review of
journals and articles found in various sources via digital libraries and the internet.

First, the report introduces the reader to AIOps: what it is and what it is used for. The
following chapter goes through the background of AIOps, where it comes from, and the reasons
why it was developed in the first place. The next chapter describes the impacts and benefits
of AIOps: what kind of influence it has on IT operations and the business environment, and
what the main benefits are that would lead an IT organization to start utilizing AIOps. The
fifth chapter looks towards the future and attempts to identify the trends AIOps is heading
towards in the coming years. The sixth chapter presents best practices for how organizations
can implement AIOps. The closing chapter draws together the conclusions of the report.

As a relatively new concept, AIOps has few scientific articles that cover an overall theory.
There are many books and articles about the separate tools used in AIOps, for instance, AI,
ML, and process automation. This report provides general information about AIOps by
introducing the concept, going through the common use cases, history, effects, and future,
and explaining how AIOps can be implemented.
2 AIOps

AIOps is a modern practice used to improve the performance of IT service management (ITSM).
It is one of the latest trends among widely adopted practices for streamlining work. This
chapter gives an overview of what AIOps is, what it is used for, and why.

2.1 What AIOps is and why it is used

AIOps, i.e., Artificial Intelligence for IT Operations, is an implementation in which
artificial intelligence is used in IT operations to improve and automate operational IT tasks
(“aiops,” 2022). The AI methods in AIOps include, for example, machine learning (ML), neural
networks, and natural language processing (NLP). These AI tools are used to improve and
automate ITSM workflows, from developing and delivering IT services to managing them. The
workflows can include pieces of work such as creating service incidents or managing event
notifications or remote connections.

Figure 1 AIOps Platform and tasks to automate with AIOps (Prasad et al., 2022)

Figure 1 illustrates Gartner’s view of the AIOps platform in IT operations management (ITOM)
and the tasks that could be automated by AIOps.

At the center of the AIOps platform are big data analytics and machine learning. Machine
learning is used to observe, engage, and act. Each of these workflows has signature tasks:
historical analysis and anomaly detection to observe, task automation and change risk
analysis to engage, and scripts and runbooks to act, to mention just a few. Figure 2 shows
Splunk’s interpretation of AI use cases in AIOps.

Figure 2 AI use cases in AIOps (Splunk, 2020a)

The following subchapters cover some of the most important use cases of AIOps: event
correlation, anomaly detection, automation, and performance analytics (Prasad et al., 2022).

2.1.1 Event correlation
Event correlation is the first widely adopted AIOps solution and the basis for current AIOps
solutions. The technique was already exploited in the first wave of AI in the late 1980s.
Event correlation uses rules and logic-based observations to filter and group the event data
stream. Its core idea is to construct rules that are applied to straightforward and static
IT solutions (Cappelli et al., 2019). Event correlation groups the events that should be
noted and prioritizes service issues. It is automated based on key performance indicators
(KPIs) or other business measurements.
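
The grouping and prioritizing described above can be illustrated with a minimal sketch. It is not taken from any of the cited sources: the `Event` fields, the single rule (same host, events within 60 seconds of each other), and the sample events are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since an arbitrary reference point
    host: str
    severity: int     # higher means more severe (assumed scale)
    message: str


def correlate(events, window_seconds=60.0):
    """Group events from the same host that arrive within window_seconds of the
    previous event in the group, then rank the groups by their worst severity."""
    groups = []
    open_groups = {}  # host -> group that is currently collecting events
    for event in sorted(events, key=lambda e: e.timestamp):
        group = open_groups.get(event.host)
        if group and event.timestamp - group[-1].timestamp <= window_seconds:
            group.append(event)
        else:
            group = [event]
            groups.append(group)
            open_groups[event.host] = group
    # Most severe group first; larger groups break ties.
    groups.sort(key=lambda g: (max(e.severity for e in g), len(g)), reverse=True)
    return groups


# Hypothetical event stream
events = [
    Event(0.0, "db-1", 2, "slow query"),
    Event(20.0, "db-1", 4, "connection pool exhausted"),
    Event(25.0, "web-1", 1, "high latency"),
    Event(300.0, "db-1", 2, "slow query"),
]
incidents = correlate(events)
for incident in incidents:
    print(incident[0].host, len(incident), max(e.severity for e in incident))
```

In this sketch the two early `db-1` events end up in one group that is ranked first, while the late `db-1` event and the `web-1` event form separate, lower-priority groups. A real rule set would typically key on more than the host name.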

2.1.2 Anomaly detection


Anomaly detection might be the most important use case in AIOps today. It discovers patterns
to distinguish normal behavior from abnormal behavior. Anomaly detection can follow a single
KPI or several services simultaneously to detect indicators of approaching issues and lessen
the impact they would cause (Splunk, 2020a). One example of an anomaly detection use case is
detecting a change in customer behavior.
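
As a hedged illustration of following a single KPI, the sketch below flags values that deviate strongly from the recent past. The technique (a z-score against a trailing window) is one simple choice among many and is not prescribed by the cited sources; the window size, the threshold, and the response-time readings are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the indices whose value deviates from the mean of the preceding
    `window` values by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical KPI: response time in milliseconds, one reading per minute
response_times_ms = [120, 118, 125, 121, 119, 122, 120, 310, 123, 121]
print(find_anomalies(response_times_ms))  # -> [7]
```

Note that the spike itself enters the trailing window afterwards and widens the baseline for the next few readings, which is one reason production systems tend to use more robust statistics or learned models.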

2.1.3 Automation
Automation is one of the key use cases of AIOps. Human labor is time-consuming and
vulnerable to errors. Robots help automate mundane work without deviations and can work
close to 24/7 without breaks. Robots can also perform the work much faster than humans
(Collins, 2020).

In AIOps, the goal is to automate as many workflows as possible. As a rule, all tasks that
can be standardized can also be automated. Yet it is not wise to automate everything.
According to a blog post by DigiCert (2021), tasks with a low return on investment (ROI) and
customer support tasks should be left unautomated because of their minimal impact and the
often negative effect on customer experience. Instead, the blog post lists the following
activities that should be automated:

• manual tasks that are repetitive
• tasks with high volume
• processes that are sensitive to human error
• processes that are audit-sensitive
• tasks that are done by multiple people
• tasks that are time-sensitive
• updates

2.1.4 Performance analytics


Because of the increasing amount of data, its volume and variety are too large to handle.
IT developers simply cannot analyze the data with conventional methods, even if those
include ML practices. AIOps tools exploit more sophisticated techniques to digest the
largest volumes of data and perform advanced analytics that check service levels or other
business promises and recognize issues even before problems occur (Splunk, 2020b).
Performance analytics are used to discover the root causes of issues and to automate the
analysis of, reporting on, and insights from business data.

According to Gartner’s research, AIOps came up in 40% of all inquiries that Gartner’s
clients made about IT performance analysis in the past 12 months (Prasad et al., 2022). The
inquiries addressed the following topics:

• Awareness of the technology and the market.
• Selection of platforms.
• Build-versus-buy decisions.
• Optimization of AIOps deployments for existing solutions.
• Initiation of a strategy for new solutions.
• Pros and cons of a shared platform across security, I&O (infrastructure and operations),
DevOps, and SRE (site reliability engineering) functions.
• Data visualizations, diagnostics, and recommendations for several AIOps use cases, both
related and unrelated to IT.
• New use cases for event correlation arising from the global pandemic.

3 Background of AIOps and other IT operations

AIOps, short for Artificial Intelligence for IT Operations, works as a combination of big
data and ML that helps automate IT operations processes, including event correlation,
anomaly detection, and causality determination. Although AIOps is a relatively new concept,
it is popular and in demand in the IT industry. However, it came into practice through
evolution.

The definition of AIOps gives some idea of how it works. Initially, however, AIOps was
introduced as Algorithmic IT Operations, and Gartner later redefined the term as Artificial
Intelligence for IT Operations (Levin et al., 2019).

A discussion of AIOps is incomplete without mentioning machine learning. Machine learning
was introduced in IT operations back in 2001, when it was used for analytics and pattern
recognition. As analytics using machine learning became more popular, by 2010 several
enterprises had shifted to cloud infrastructure. With that shift, the use of ML in IT
operations also grew.

In 2016, the term AIOps was coined by Gartner, a technology research and consulting firm.
Before officially coining the term, researchers at Gartner had realized how significant this
could become. Through their extensive network and research, they saw how machine learning
coupled with big data was favoured by global IT companies. Seeing the rising trend and how
AI could be a potential solution for automating several IT operations, reducing lost time
and increasing efficiency, Gartner put more emphasis on IT operations using AI and coined
the term AIOps (Lerner, 2017).

AIOps is often confused with machine learning operations (MLOps), but the two differ in how
they operate. Still, an introduction to AIOps is incomplete without a short brief on MLOps.
In general, MLOps is the standardization and streamlining of the machine learning life
cycle (Treveil, 2020). MLOps refers to a set of principles and practices that help carry out
DevOps-style tasks efficiently when deploying and maintaining machine learning models. This
set of principles helps automate the deployment of ML and deep learning in large-scale
production environments (Canuma, 2022).

Figure 3 The History of AIOps (Feldman, 2018)

4 Impacts and benefits of AIOps

Like any other development practice, AIOps has influences on business. Some of the
influences are valuable and give companies an edge, while others might be negative or
transform the ways of doing business. This chapter discusses the influences of AIOps, first
from the perspective of impacts and then from the perspective of benefits for businesses and
the work environment.

4.1 Impacts of AIOps

Today, there is a lot of enthusiasm around data, especially around big data analytics and
artificial intelligence. This is not a surprise, because data is said to be the new oil
(“Data is the New Oil,” 2006). Enterprises are trying to find ways to discover insights and
gain competitive advantages by means of data analytics. New projects are started, and modern
technologies are implemented at a rapid pace. There is pressure on developers to discover
innovative solutions and put advanced analytics into use. Further, AIOps reduces the mundane
work of IT professionals and leaves time to solve more complex problems, as Splunk (2022)
writes:

“Implementation of AIOps platforms will be done to augment the capacity of
existing IT departments, taking on repetitive or well-understood tasks and
leaving IT professionals free not only to solve more complex problems, but
also to plan and innovate. In other words, AIOps adds much-needed “slack” to
the system, to give teams precious time needed to work on longstanding
projects that otherwise never seem to receive attention.”

Developers are pioneers in tech, which is why their influence in adopting innovative
solutions has a crucial impact on organizations implementing AIOps. Hence, AIOps is
dependent on developers (Cappelli et al., 2019). Developers are also needed because AIOps
solutions are built by humans, at least for now (“Artificial intelligence is evolving all by
itself,” 2020). Another notable aspect when discussing the impacts of AIOps is the data.
Analyses are only as good as the data: high-quality data enables high-quality analyses, and
vice versa (Harding, 2019).

4.2 Benefits of AIOps

AIOps can analyze an enormous mass of network and machine data to find patterns that humans
could not find (Splunk, 2022). Splunk (2020b) enumerates some of the main benefits of AIOps:

• Customer and employee satisfaction increases as downtime decreases.
• Comprehensive analyses and insights become more feasible when siloed data sources are
brought together.
• Time, money, and resources are saved through systematic root-cause analyses and
enhancements.
• Service delivery improves through faster and more consistent incident response.
• IT’s capacity supports growth, which leads to quicker improvements in discovering and
fixing complex issues.
• IT teams can prioritize and focus on advanced analysis and optimization by recognizing
and checking errors proactively, even before they occur.
• System forecasting and application development can meet upcoming demand through
proactive reaction.
• Humans can focus on more demanding problems and increase productivity by allowing
artificial intelligence to handle mundane work.

IBM also highlights better and more productive cooperation between different IT roles, for
example, in DevOps, IT operations, governance, and security functions (“aiops,” 2022).

5 Future of AIOps

Though AIOps is new in the IT industry as a technology, it has already gained a good
reputation. Looking forward, it seems that the future of IT is closely aligned with AIOps.
Gartner, the firm that officially coined the term AIOps, released a report in 2020 which
predicted that 40% of DevOps teams will adopt AIOps to automate their work models (Gartner,
2020). On another note, a technical briefing published by Microsoft in 2019 predicted that
by 2024, 60% of firms globally will adopt AIOps for their DevOps teams (Dang et al., 2019).

Now the question arises: why is AIOps considered the future of DevOps? A blog post from IBM
gives an overall insight into why AIOps is the future. As many DevOps teams globally are
shifting towards AIOps, it is considered trustworthy because it provides clarity and
visibility throughout. IT leaders use AIOps to generate forward-looking analyses and
insights across the development lifecycle of an application. In this modern era, the use of
cloud infrastructure is growing exponentially. With that rise, systems are also becoming
more complex and need constant monitoring. Thanks to AIOps, this monitoring can be done in
real time without lag (IBM Cloud Education, 2021).

Figure 4 Vision of Future of AIOps (Dang et al., 2019)

Projecting the future of AIOps, it could be said that the value of AIOps is immense: it
would ensure a higher quality of service, including customer satisfaction. In addition,
productivity in engineering would increase while operational costs would decrease. The
future of AIOps can be envisioned in three segments (Dang et al., 2019). Figure 4 shows how
different areas of IT are expected to grow in the future because of AIOps. They are briefly
discussed below.

a. High service intelligence: Looking forward to the future of AIOps, it is expected that
AIOps-powered services would have timely, real-time awareness of changes. The system is
also expected to predict future changes in the service based on history and trends. This
would make AIOps-powered services run more efficiently.

b. High customer satisfaction: As most IT systems are being upgraded to modern standards,
it has also been noted that IT systems are becoming more complex for customers to use.
Whenever a customer experiences unwanted lag within the service, an AIOps-powered system
could engage with the customer directly, so that the customer understands what is going on.
In addition, an AIOps-powered system would be able to anticipate a customer’s needs and
suggest improvements to the service accordingly. This would result in high customer
satisfaction with AIOps-powered services.

c. High engineering productivity: Within any IT team, a major drain on time is manual work,
such as collecting information while solving an issue or ticket, or fixing recurring
problems. The forward-looking assumption is that with an AI-assisted service, or with
AIOps, engineers engaged in maintenance would have much better insight for fixing these
problems efficiently. Thus, productivity would also increase.

6 Implementing AIOps

Implementing any new concept requires flexibility to change. There is no single right way
to start implementing AIOps, but there are some suggestions for it. This chapter presents
ideas on how to start implementing AIOps.

According to IBM, there are three basic aspects to consider when beginning to utilize AIOps
tools: observability, predictive analytics, and proactive response (“aiops,” 2022). The
next subchapters cover these aspects.

6.1 Observability

Observability relates to the collection, aggregation, and analysis of operational data. The
core of observability is to monitor, troubleshoot, and debug live IT implementations so that
they meet customers’ expectations, such as service level agreements (SLAs) or other business
promises. Data is collected and aggregated from various data sources across applications,
infrastructure, and the network (“aiops,” 2022). Observability is the basis for further
development of AIOps.
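
The following minimal sketch shows one way the collection and aggregation step could look, and how an aggregated value might be compared with an SLA target. It is an illustration under stated assumptions only: the three source lists, the service and metric names, the record layout, and the 200 ms limit are all hypothetical and do not come from the cited IBM material.

```python
from collections import defaultdict

# Hypothetical records (service, metric, value) from three separate collectors
application = [
    ("checkout", "latency_ms", 180),
    ("checkout", "latency_ms", 240),
    ("search", "latency_ms", 90),
]
infrastructure = [("checkout", "cpu_percent", 71), ("search", "cpu_percent", 35)]
network = [("checkout", "packet_loss_percent", 0.4)]


def aggregate(*sources):
    """Merge records from all sources into {service: {metric: average value}}."""
    samples = defaultdict(lambda: defaultdict(list))
    for source in sources:
        for service, metric, value in source:
            samples[service][metric].append(value)
    return {
        service: {metric: sum(vals) / len(vals) for metric, vals in metrics.items()}
        for service, metrics in samples.items()
    }


def sla_breaches(summary, metric, limit):
    """List the services whose average for `metric` exceeds the assumed SLA limit."""
    return sorted(s for s, m in summary.items() if m.get(metric, 0) > limit)


summary = aggregate(application, infrastructure, network)
print(summary["checkout"]["latency_ms"])              # -> 210.0
print(sla_breaches(summary, "latency_ms", limit=200))  # -> ['checkout']
```

Averaging is used here only to keep the example short; percentiles are a common alternative for latency targets.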

6.2 Predictive analytics

Some AI solutions are used to analyze and correlate data in order to gain a finer
understanding and to automate actions in AIOps. Predictive analytics allows IT professionals
to maintain control over complex IT processes and ensure the performance of IT operations.
Predictive analytics is based on automation and data insights, with the intention of
correlating and isolating issues. It helps to find problems that might otherwise not have
been found. Examples of predictive analytics practices are anomaly detection, alerts,
recommendations, and optimization of IT performance (“aiops,” 2022).

6.3 Proactive response

AI solutions are used to analyze and correlate data in order to respond proactively to
unexpected events, in particular outages or slowdowns. The aim is to keep actual application
performance in line with resource planning, scheduling, and allocation. Predictive
algorithms can recognize patterns and trends in performance metrics that coincide with IT
issues. By forecasting, it is possible to prevent problems before they arise. Examples of
use cases where proactive response is utilized are resource management and reducing the
mean time to detection (MTTD).
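
To make the forecasting idea concrete, the sketch below fits a straight line to recent readings of a resource metric and estimates how many more samples remain before the trend reaches a limit. The linear trend is an assumption chosen for simplicity, not a method named in the sources, and the disk usage readings and the 90% limit are hypothetical.

```python
def fit_line(values):
    """Least-squares slope and intercept for values sampled at t = 0, 1, 2, ...
    Requires at least two samples."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    covariance = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    return slope, v_mean - slope * t_mean


def steps_until(values, limit):
    """Estimate how many future samples remain before the trend reaches `limit`.
    Returns None when the trend is flat or falling."""
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope  # time index where the line hits the limit
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical daily readings of disk usage in percent
disk_used_percent = [60, 62, 64, 66, 68]
print(steps_until(disk_used_percent, limit=90))  # -> 11.0
```

With these readings the trend rises by two percentage points per day, so the sketch estimates eleven more days before the limit is reached, which would leave time to add capacity before an outage.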

7 Conclusion

AIOps takes advantage of AI tools to automate operational processes in real time. This
report provided a starter pack on AIOps for anyone new to the topic, to help the reader get
a basic idea of the AIOps concept. The report started by introducing AIOps, what it means,
and its use cases. It then discussed the background of AIOps, the impacts and benefits of
using AIOps, and the future of AIOps. The final part of the report gave guidance on where
the implementation of AIOps could begin.

Throughout this report, the authors have gained several insights, although the approach was
only to scratch the surface of AIOps. As a relatively new concept, AIOps is not covered by
as many scientific journals as technologies that have been in practice in the industry for
20 to 30 years. However, new research on the topic is published continuously, and as a
growing technology it is being adopted widely.

References

aiops [WWW Document], 2022. URL https://www.ibm.com/cloud/learn/aiops (accessed 10.24.22).
Artificial intelligence is evolving all by itself [WWW Document], 2020. URL
https://www.science.org/content/article/artificial-intelligence-evolving-all-itself
(accessed 11.5.22).
Canuma, P., 2022. MLOps: What It Is, Why It Matters, and How to Implement It. URL
https://neptune.ai/blog/mlops (accessed 4.11.22).
Cappelli, W., Longbottom, C., Governor, J., 2019. AIOps Manifesto: The Role of AI in
Assuring Digital Transformation.
Collins, J., 2020. This robot scientist works 1000 times faster than human counterparts
[WWW Document]. Happy Mag. URL
https://happymag.tv/robot-scientist-works-1000-times-faster-than-human-counterparts/
(accessed 11.4.22).
Dang, Y., Lin, Q., Huang, P., 2019. AIOps: Real-World Challenges and Research
Innovations, in: 2019 IEEE/ACM 41st International Conference on Software
Engineering: Companion Proceedings (ICSE-Companion). Presented at the 2019
IEEE/ACM 41st International Conference on Software Engineering: Companion
Proceedings (ICSE-Companion), IEEE, Montreal, QC, Canada, pp. 4–5.
https://doi.org/10.1109/ICSE-Companion.2019.00023
Data is the New Oil [WWW Document], 2006. ANA Mark. Maest. URL
https://ana.blogs.com/maestros/2006/11/data_is_the_new.html (accessed 10.28.22).
Feldman, P., 2018. The History Of AIOps- THE INFOGRAPHIC. URL
https://www.loomsystems.com/blog/the-history-of-aiops-the-infographic (accessed
3.11.22).
Gartner, 2020. Everything You Need To Know About AIOps. URL
https://thechief.io/c/editorial/everything-you-need-to-know-about-aiops/
Guess, A.R., 2015. Only 0.5% of All Data is Currently Analyzed. DATAVERSITY. URL
https://www.dataversity.net/only-0-5-of-all-data-is-currently-analyzed/ (accessed
10.28.22).
Harding, K., 2019. The Impact of Poor Data Quality in 2021. Objective. URL
https://objectiveit.com/blog/what-is-the-impact-of-poor-data-quality/ (accessed
11.5.22).
IBM Cloud Education, 2021. Three Reasons AIOps Is the Future of ITOps. URL
https://www.ibm.com/cloud/blog/three-reasons-aiops-is-the-future-of-itops
Lerner, A., 2017. AIOps Platforms. URL
https://blogs.gartner.com/andrew-lerner/2017/08/09/aiops-platforms/ (accessed 4.11.22).
Levin, A., Garion, S., Kolodner, E.K., Lorenz, D.H., Barabash, K., Kugler, M., McShane,
N., 2019. AIOps for a Cloud Object Storage Service, in: 2019 IEEE International
Congress on Big Data (BigDataCongress). Presented at the 2019 IEEE International
Congress on Big Data (BigData Congress), IEEE, Milan, Italy, pp. 165–169.
https://doi.org/10.1109/BigDataCongress.2019.00036
Minashkina, D., Happonen, A., 2020. Decarbonizing warehousing activities through
digitalization and automatization with WMS integration for sustainability supporting
operations. E3S Web Conf. 158, 03002.
https://doi.org/10.1051/e3sconf/202015803002
Palacin, V., Gilbert, S., Orchard, S., Eaton, A., Ferrario, M.A., Happonen, A., 2020. Drivers
of Participation in Digital Citizen Science: Case Studies on Järviwiki and Safecast.
Citiz. Sci. Theory Pract. 5, 22. https://doi.org/10.5334/cstp.290
Prasad, P., Byrne, P., Siegfried, G., 2022. Gartner Reprint [WWW Document]. Mark. Guide
AIOps Platf. URL
https://www.gartner.com/doc/reprints?id=1-2A6HEH3Y&ct=220531&st=sb (accessed 10.24.22).
Splunk, 2022. 6 Myths of AIOps Debunked.
Splunk, 2020a. Modern IT Management With AIOps, in: Embrace Digital Transformation
with Splunk for IT Operations.
Splunk, 2020b. The Essential Guide to AIOps.
Total data volume worldwide 2010-2025 [WWW Document], 2021. Statista. URL
https://www.statista.com/statistics/871513/worldwide-data-created/ (accessed
10.28.22).
Treveil, M., 2020. How to Scale Machine Learning in the Enterprise, First. ed.
What to Automate and What Not to Automate | Automation Security Solutions | DigiCert
[WWW Document], 2021. URL
https://www.digicert.com/blog/to-automate-or-not-to-automate (accessed 11.4.22).
