You are on page 1of 17

Service analytics for IT Y.

Diao

service management E. Jan


Y. Li
D. Rosu
Outsourcing enterprise IT service management is an increasingly A. Sailer
challenging business. On one hand, service providers must deliver
with respect to customer expectations of service quality and
innovation. On the other hand, they must continuously seek
competitive reductions in the costs of service delivery and
management. These targets can be achieved with integration of
innovative service management tools, automation, and advanced
analytics. In this paper, we focus on service analytics, the subset
of analytics problems and solutions concerning specific service
delivery and management performance and cost optimization. The
paper reviews various service analytics methods and technologies
that have been developed and applied to enhance IT service
management. We use our industrial experience to highlight the
challenges faced in the development and adoption of service
analytics, and we discuss open problems.

Introduction delivery and management performance and cost


In recent years, the information technology service optimization. Service analytics comprise a collection
management (ITSM) industry has faced continual of data analysis, modeling, and optimization methods that
competitive pressure to improve service quality, while aim to improve performance and competency of ITSM.
simultaneously reducing service delivery and management Figure 1 illustrates the high-level interactions between the
costs. These objectives are particularly critical for IT service analytics architecture and the main ITSM processes
service outsourcing, in which service providers manage (as per best industry practices Information Technology
the IT infrastructures, applications, and business processes Infrastructure Library, or ITIL [1]), in the process of
on behalf of their customers. In IT service outsourcing, delivery to IT service customers. More specifically, ITSM
customers seek reduced costs, and accelerated time to comprises a set of management processes and technologies
market, along with external (i.e., outside of their company) that enable service providers to manage the IT
expertise, assets, and/or intellectual property. Providers infrastructure and applications from the perspective of
must efficiently address the scale and complexity of their customers and of their own business—following best
managed IT environment and the diversity of processes practices, during the execution of ITSM processes. IT
across all of their outsourcing customers. In current market service providers track customer interactions and IT
conditions, the approach to achieving economy of scale system performance, producing large volumes of data.
through traditional workforce management and process This data is processed by service analytics to produce
standardization is less effective than it was in the past. various types of insights for feeding back into the
Nowadays, IT service providers rely on innovative data management of ITSM processes.
analysis technologies to improve the efficiency of their The objective for service analytics is to analyze these
service management processes. Valuable business insights, processes and related business data, and produce valuable
extracted from analysis of service management data, insights related to service cost and quality. Analysis
drive to higher IT service efficiencies and quality. spans multiple dimensions of service delivery, such as
Service analytics identifies the subset of analytics service processes, workload types, customer domains, and
problems and solutions concerning specific service delivery geographies. The analytic insights are consumed
in many ways, including feedback for management of
IT service processes, decision support for operating
Digital Object Identifier: 10.1147/JRD.2016.2520620 personnel, and real-time reports on service quality and

ÓCopyright 2016 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without
alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed
royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

0018-8646/16 B 2016 IBM

IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016 Y. DIAO ET AL. 13 : 1
Figure 1
Service analytics for IT service management

performance for IT service customers. Service analytics A large set of service analytics methods have been
are materialized in a collection of tools, henceforth called studied and applied by both academic and industrial
the service analytics platform, which facilitates the researchers to address specific problems encountered in
extraction of analytic insights, from raw data collection ITSM. This paper is a review of service analytics research,
and preparation through insight (i.e., model) production taking a holistic perspective to classification of the
and consumption [2]. The service analytics platform is contributions with respect to the target business problems
often designed as hybrid distributed systems with a and applied technical methods. Drawing on our industrial
cloud-based API (application programming interface) experience, we highlight domain-specific challenges that
components platform [3–5]. The degree of integration must be addressed by service analytics methods and
across the analytics tools may vary widely. Similarly, the platforms. We believe both of these contributions will help
enterprise-level integration of content through taxonomies solidify the current service analytics research and set up a
and representation standards varies widely as well. new ground to motivate future studies.
Automation and content integration are the key enablers of Previous overview studies in the area of ITSM are rather
efficient real-time trustworthy business insight discovery. limited. Galup et al. [6, 7] focus on the architectural

13 : 2 Y. DIAO ET AL. IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016
aspects, including ITSM process structure and standards. Process optimization methods are often applied to
Galup et al. [6] also address the research methodology processes that traditionally involve manual operations,
appropriate for services research. Analysis methodology, including Service Operations processes, such as change
including the empirical variable approach of management request management [22] and workforce scheduling [23].
information systems (MIS), and the process theory Typically, service analytics solutions integrate service
approach, is considered with respect to high-level business modeling methods with prediction and optimization
goals of quality management and process reengineering. methods. For instance, the optimization of change
In this work, we review analytic methods for ITSM scheduling employs information on the effort spent with
processes with respect to specific low-level business the execution of various types of change operations,
problems of cost and efficiency. Ardagna et al. [8] review which is obtained with service modeling methods.
the analytics methods used for quality-of-service (QoS) The remainder of this paper is organized as follows.
management in cloud, classifying methods by the First, we provide an overview of the main ITSM
component that affects the QoS, i.e., workload and system processes, which will introduce the reader to the processes
modeling methods. While targeting a more limited scope themselves and to the opportunities within where analytics
of service management and different classification and optimization methods can be used. Next, we identify
criteria, reference [8] complements our work, addressing service analytics challenges that are specific to ITSM.
specific cloud-management business problems. Further, we discuss each of the categories of service
Let us now discuss a service analytics taxonomy. The analytics methods, providing examples of representative
service analytics methods proposed in the literature can be business problems and analytic tools. Also, we focus on
categorized into three main categories: the service analytics platform, discussing techniques
and requirements specific to IT service provider domains.
1. Service modeling, used to understand features that Finally, we conclude with a review of open problems.
characterize process execution as the basis for applying
efficiency improvement actions. Service management and analytics challenges
2. Performance prediction, used to gain awareness on
upcoming performance and workload events, and trigger IT service management
proactive cost-efficient actions. ITSM, henceforth also called IT services, encompasses
3. Process optimization, used to determine organization the processes involved in the management of IT for
and execution models that maximize service providing value to customers in the form of services. As
management effectiveness. we alluded to, the most adopted framework for ITSM is
ITIL [1], which systematizes the planning, delivery and
These categories that span service analytics are applicable support of IT services around a five-stage service lifecycle.
across all of the service management processes (see The lifecycle comprises Service Strategy, Service Design,
Figure 1). For instance, service modeling in Service Service Transition, Service Operation, and Continuous
Operation processes (see rectangle in Figure 1) relates to Service Improvement, as we have just discussed and as
service requests, such as incidents and changes. Analytics illustrated at the bottom layer in Figure 1. At each stage,
help characterize operational aspects such as arrival the focus of service management changes, evolving from
patterns, handling procedures, service effort, human error, marketing and strategy, to delivery and operations, and
etc. These insights are used as input for process closing the service lifecycle loop with continuous service
improvement actions such as personnel training [9, 10], improvement. Thus, the data associated with the service
service management tool tuning [11, 12], process management at each stage changes as well, requiring
automation [13], knowledge reuse [14–16], and many specific analytics methods to handle stage specific data
others. On the other hand, in Service Design (see rectangle types and extract stage specific insights, as well as to
in Figure 1), integrate across stages, for global insights.
service modeling relates to a high-level model of customer In Service Strategy, the focus is on transforming service
workload and infrastructure features that are further used to management into a strategic asset by understanding which
decide on the most efficient service-level agreements IT service offerings are of more interest to customers.
(SLAs) [17] and delivery organization models [18]. This helps the service providers to have the most “fitted”
Performance prediction methods are often applied in service offerings to achieve customers’ required business
service operations—for prevention of outages, SLA outcomes. Business models and service value models
violations [19, 20] or for capacity planning. In Service are used in microeconomics to formalize and analyze the
Transition, prediction is used for decision support, guiding relationships between customers, providers, and services.
process-management decisions with a useful tradeoff The key Service Strategy processes relate to (i) service
between operational efficiency and performance risks [21]. portfolio management, which ensures that the service

IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016 Y. DIAO ET AL. 13 : 3
provider has the right mix of services (i.e., service and access management. Specifically, the service requests
portfolio) to meet required business outcomes at an (a.k.a., tickets) contain embedded knowledge about the
appropriate level of investment, (ii) financial management, service performance trends, workforce efficiency, capacity
which concerns the provider’s own budget, accounting, planning, and configuration compliance, to name a few.
and charging requirements, and (iii) demand management Service modeling methods are used to model request
for understanding a customer’s demands and provisioning arrival [8], execution effort [9], errors users experience
to meet them. Service analytics for Service Strategy [30], process execution patterns [31, 32], and many other
supports marketing decisions that have an impact on the service operational features. These models are used to
design requirements and the go-to-market service catalog optimize performance in Service Operations and
and prices [24]. Analytics for service modeling based throughout the service lifecycle. Event and performance
on business and service request data provide the relevant prediction is used to prevent outages [19, 33, 34],
features for the optimization of service portfolio and provision resources for workload peaks [35, 36], and
sales success [25]. Optimization techniques use dispatch incidents [37, 38]. Process optimization is used
strategy-specific cost/utility functions to represent the to schedule personnel based on workload fluctuations [23],
nonfunctional requirements and map the business bundle change execution [22], prioritize service request
requirements into IT requirements [14]. execution, allocate computational resources [36], and
Service Design is focused on designing IT services to for many other purposes.
realize the strategy and ensure quality service delivery, Finally, the focus of Continuous Service Improvement
customer satisfaction, and cost-effective service (see rectangle in Figure 1) is on creating and maintaining
provision. Service Design processes include service value for customers through better design and operation.
catalogue management, service-level management, Service analytics help us identify changes in service
availability management, capacity management, and request patterns—and tune service management tools
IT service continuity management, information security and processes in order to attain the desired levels of
management, and supplier management. Service modeling productivity, and quality of service. Relevant service
and process optimization service analytics methods are modeling methods include process behavior models for
used in Service Design to develop more efficient detection of deviations and defect similarity analysis for
organization models [17, 18] and related cost models; detection of repeated defects [33]. Prediction models
cost models must tie into the operational models in order are used to recommend infrastructure upgrades [19],
for the designed service solutions to match in production automation solutions, knowledge transfers, or skill
the targeted service level agreements (SLAs) for upgrades [10].
service availability and support [26, 27]. Although the above introduction to ITIL depicts the
The focus of Service Transition is to develop the service management processes in a staged way, the
capabilities to transit customer new and transformed implementation of these processes follows a networked
services into operational, steady-state service delivery. model, where their roles intertwine and decisions made in
Service Transition processes include change management, earlier stages (e.g., design management and change
service asset and configuration management, knowledge management) have consequences in later stages (e.g.,
management, transition planning and support, release SLA management). Also, decision in one phase depends
and deployment management, service validation and on service parameters that are managed and can only
testing, and evaluation. A wide range of service analytics be collected or measurement in other phases. An
methods are used in this stage in order to drive efficient integrated implementation of service analytics within the
execution and minimal risks. Sample problems addressed enterprise Service Analytics Platform is imperative for
by analytics include optimization of project plan, risk deployment of analytics solutions with high ROI
prediction [21], workforce optimization, and optimal test (return on investment).
coverage [28]. A newly emerged paradigm in the
Service Transition area is the cloud transition modeling, Challenges of service analytics
which raises new types of problems for service analytics. In addressing the diverse analytics opportunities arising
For instance, performance prediction [29] and real-time in the context of ITIL processes, as described in the
anomaly detection [28] are paramount for minimization of previous section, IT service provider organizations face
go-to-market risks and service performance penalties. challenges that stem from the specifics of large-scale
Service Operation is focused on achieving effectiveness IT service operations and have an impact on the ROI of
and efficiency in service delivery to ensure value for both service analytics technologies. At the core of these
customers and service providers. Key Service Operation challenges is customer diversity, including service
processes include event management, incident requirements, managed environments, geographic
management, problem management, request fulfillment, locations, and other service dimensions. This translates in

13 : 4 Y. DIAO ET AL. IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016
diversity with respect to processes and tools. Another processes. Examples include the difficulty to correlate the
group of challenges derives from the SLA-driven business performance in Service Operation processes and the
model, which drives to a fast-paced and risk-adverse work performance in Transition and Transformation processes,
environment. The ROI of service analytic solutions is or, even worse, the correlation of service requests and
highly sensitive to managed environment—solutions configuration management data within the Service
developed for one customer or delivery environment might Operation processes. Service analytics must include the
not work as well in other environments because of the development of dedicated tools for entity resolution in
differences in available content, analytic models, and order to overcome this limitation. Another aspect of
model performance. In the following, we discuss the main process integration relates to difference across
challenges for development and adoption of service geographical delivery center. Difference within same
analytics solutions. service lifecycle phase, like Service Operations, might
occur because geography-level teams have autonomy in
Diversity of processes adoption and customization of service management
Most often, ITSM processes are implemented differently tools and processes. Also, country-level privacy
across service customers, geographies, and work cultures. regulations limit the content availability for development
While standardization is a well-known solution for of analytic solutions.
effective IT services, it is often not an option or too
expensive to implement as a large-scale transformation. Limited expert availability
Local or customer-specific implementation of IT Typical IT service personnel operate in high volume and
management solutions lead to limitations for service high risk situations, where risk draws from tightness of
analytics, such as lack of data from particular areas of the SLA constraints, high unpredictability of complex
business, and differences in data semantics and data computing systems, and high costs of SLA failures
quality. This cascades into increased complexity of data [39, 40]. As a result, the service personnel subject matter
preparation, limitations on choosing “universally” applied experts (SMEs) have limited time to spend helping
analytical methods, and lack of full consistency across analysts with laborious labeling of observations for
the resulting models. supervised learning methods [41]. Even more notable,
the quality of record details acquired during the
Diversity of tooling production processes is often low, as the personnel might
Aside from process differences, service management tools rush to move to the next task. Service personnel might
used to perform a specific process component may be shorten the time spent navigating complex taxonomy
different with respect of the service record details that are hierarchies for picking the right category choice, or might
available. For example, within the incident management choose avoiding provide details that are not mandatory,
process, multiple customer-specific tools might be used to or include typos. For instance, a ticket’s close time could
capture incident records. Across tools, it is common to be earlier than its open time, or a classification into
have different models to capture service details that are service-specific taxonomies are missed or set to the
beyond what is required for basic incident tracking. default value of “other”—thus providing no insightful
Hence, details (such as the references to related servers or values for data analysis.
applications), or categorizations of incident root cause and
resolution method, which can be highly relevant for Risk-adverse attitude
service analytics, may be missing. Analytics that rely on The SLA-driven operation raises the propensity for
these details would require more complex development risk-avoidance across service personnel and management
effort to integrate across models. For instance, in order to staff. As a result, the adoption of service analytic solutions
extract the missing incident features, unstructured data is highly dependent on their perceived impact on risk.
analysis is used, which adds a high development overhead The perceived impact depends on how analytics are used.
and quality risks as models must be tuned for each The higher the risk, the higher the expectation of accuracy
service domain. and the intuitive explanation for the analytics results.
For instance, low risk is perceived for decision-support
Limited process integration analytics in which service personnel are always available
A typical approach across service lifecycle processes is to to override, or the cost of an error is small. Samples
maintain per-stage ownership of service management include recommendation for incident solution reuse [14]
tools. As a result, these tools are often developed and and request dispatching [38]. Other decision-support
configured in a disconnected manner, i.e., without analytics may have higher costs of error, and thus higher
integration across processes. This makes it difficult to risks, such as financial risk prediction [21] or server
connect the insights about service data in related upgrade recommendations [19]. In risk-adverse processes,

IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016 Y. DIAO ET AL. 13 : 5
the analytic methods must provide high accuracy and a priori [42, 43]. Identification of incident repeat
provide an acceptable explanation, such as the occurrence and co-occurrence patterns [34] helps root
identification of factors with largest impact on the analytic cause analysis and more effective resolution.
results [19]. Analytics can support autonomous execution A second goal is to reduce the risk of SLA violation.
of operations, such as image relocation [36]. A sample approach to this problem is to ensure that each
Overall, these challenges are faced by most service request is handled by the service personnel that are
analytic projects, and affect the ROI. Awareness of these most efficient in solving the specific request type. To
challenges is critical for adopting the most appropriate this end, workload characterization methods are used
analytics solutions and establishing an agile roadmap for twofold. First, request types are classified into a specific
enterprise-level integrated analytics platform. taxonomy. Next, for each request types, efficiency profiles
are created to model how efficient a service staff is at
Service analytics methods solving the request of the particular type. The request
dispatching system will use the two pieces of information
Service modeling to map incoming requests to the service personnel likely
The goal of service modeling is to model features that to complete work the fastest [8]. Another instance of this
characterize customers and process behavior as the basis problem is related to the cloud where image migration
for understanding service requirements and delivery represents the main approach to controlling SLA violations
profiles. Service modeling methods are commonly applied in a likely overprovisioned infrastructure [36].
in all phases of the ITIL service lifecycle and provide the A third goal is to reduce the problem determination
enabling knowledge for efficiency improvement actions. effort. An approach to this problem is to create a
The most frequently occurring type of service modeling system for recommendation of solutions based on a
is related to workload, i.e., the request flow that a specific characterization of request details. Feature profiling
process or system component is observing. Workload techniques and text analytics are used to characterize the
characterization aims to model arrival patterns and service historical content and create a solution catalog using
effort patterns that are related to various process features. request description and related system components [3, 30],
In Service Operations, sample process feature include: [42]. A characterization of resolution type is used to trim
(i) IT components such as servers, middleware, and the catalog by removing duplicate solutions. At request
business applications, (ii) service request types such as arrival time, the profiling technique is used to determine
incident failure types, (iii) resolution methods such as the the type and retrieve relevant solution options.
types of tools used, and (iv) process patterns, such as A fourth goal is to improve the quality of change. An
organization-level interactions. Workload characterization approach to this problem is to prevent change operations
is fundamental to understanding how service-personnel that are likely to lead to error. For instance, service
effort is spent with respect to tasks and activities, and how personnel can analyze the error profile of specific change
customer experiences the service delivery. This enables operations or installed software, and can avoid use
an IT provider to solve various business problems alternate solutions if a specific operation has high
contributing to cost savings, competitive analysis, and likelihood of error. The error profile would allow answers
workforce and process optimization. to questions like “Which cloud management tools should
The following paragraphs provide a few examples of I install such that we get the least amount of capacity
business problems in the scope of Incident Management events?” The error profiles can be created by integration
processes that can be solved using service modeling across different types of service management
methods. For example, one goal concerns the reduction content—such as change management, incident
of the occurrences of the most prevalent failure types. management, and configuration management. Entity
The approach to address this goal is for SMEs to identify resolution methods can be used to identify server of
the high-volume incident types, perform investigations, software/tools references in structured and unstructured
and eventually implement changes that will reduce the change and incident text. Configuration records that
volume of those incident types. As incident classification related to servers can further provide features of interest
into the enterprise-specific taxonomy is often not such as installed middleware and applications as well
consistently capture in incident records, workload as server purpose, manufacturer, and age [16]. These
characterization methods help classify the requests by integrated records can be fed to search tools [44] or
failure type. The classification model can be build using question answering [45] and accessed manually or
supervised data mining methods applied to structured automatically [20] upon change record creation.
and unstructured incident-record fields using sample Service modeling is performed by applying statistical
content labeled by SMEs [38]. Alternative methods based and data mining methods to model the occurrence of target
on clustering can be applied when taxonomy is not known features within a selected scope (see [46, 47]). For a rich

13 : 6 Y. DIAO ET AL. IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016
representation of the business scope, analysis is most often classification taxonomies. For simple domains, one can
based on integration across several types of ITSM sources, use rules and keywords provided by SMEs or extracted
such as customer management content, incident from the web. For more complex domains, machine
management, configuration management, etc. learning methods—such as Latent Dirichlet Allocation
The service analytics challenges discussed in the earlier (LDA) [53], Latent Semantic Analysis (LSA) [54], and
section often result in missing or invalid references to Probabilistic Latent Semantic Analysis (PLSA) [55]—are
service components, resources, or service features. As a fairly popular and can yield good results in handling the
result, extensive data cleaning [20, 34, 43, 48] and entity service data [56]. Alternatively, part-of-speech (POS) [57]
resolution [11, 19] are necessary to link across all data analysis allows the identification of topics using specific
sources of interest. Such actions help us expand the set of POS patterns [42]. The challenge is that there are only
records available for analysis, improve the accuracy of limited SME resources for data annotation, so that methods
the service modeling, and provide knowledge beyond what must be specialized [16, 52]. Work-around approaches,
is captured by the service management data models. For although not desired, include the use of predefined rules
instance, in an sample scenario of operating system for semi-automatic data labeling, followed by
support services, extraction of server reference from the bootstrapping [58].
text of incident description and resolution increases the Once the set of features is cleansed and complete,
ability to match an incident with a server from ∼65% to descriptive data mining models can be created with
over 90% [19]. clustering [10, 19, 30, 42, 43], summarization [31, 59],
Another consequence of service analytics discussed association discovery [60], classification [10, 12, 20, 38],
above is dealing with missing information. A series of and sequence discovery [34]. Most often, the goal of
advanced data mining methods can be applied to create the modeling is to label the records based on taxonomy of
missing information. One approach is to discover specific interest. When the taxonomy is unknown, clustering
relationship across several structured attributes, and then methods are used to discover the taxonomy and label the
use business rules to fix the incorrect or missing values records accordingly. On the other hand, when taxonomy
[49]. Another approach is to identify the missing elements is known a priori, classification approaches are used
in text-based fields the record. Extraction can be done for record labeling based on the learning from an input
using dictionary-based identification, structural set of annotated records. When feature taxonomies are
decomposition [16], semantic analysis [42], or topic not of interest, yet records with a specific degree of
discovery [43]. similarity are sought, summarization, or text search
As illustrated by previous examples, text analysis is a methods can be used.
highly valuable method, as service delivery best practices
require that service records capture relevant details Summarization
about the actions taken. A common approach to exploit Summarization methods are the most frequently used
the information captured by text fields is to expand the service analytics methods, First, summarization methods
features set with elements extracted from or based on text provide the data that service personnel use to monitor
fields [42], such as incident description and resolution. service performance and perform root-cause analysis [59],
Natural language processing (NLP) tools are typically [61–64]. Specialized visualization, tailored to metrics of
used. Challenges arise because typical text in IT service interest to service request management can further expand
tickets is not as well-formed as newswire articles. The the tools that are available for process analysis. For
ticket-specific sub-language exhibits numerous ad-hoc instance, Cavalcante et al. [31] present a visualization
abbreviations, limited details, grammar mistakes, and method that illustrates the correlation between service
punctuation errors [50]. Text preprocessing and request response times and SLA failures, which helps
normalization are necessary for initial cleanup. In addition, target process anomalies. Process Behavior Charts (PBC),
compound word extraction and substitution, and alias which combine summarization with forecast, can be used
identification and replacement, can give better accuracy, for all of the processes and key performance indicators
if SME input is available. There are multiple methods that within the service delivery organization [33]. The PBC
can be used to extract text-based features. First, n-gram method has embedded rules for interpretation, based on
analysis can be extracted as collection of n-grams. the variation of running averages, providing consistent
Unigram decomposition is the most popular technique [51], insights compared to leaving the user to perform the
but bigram and trigrams give best results for small and interpretation alone, as with other visualization tools. The
large corpora, respectively. Second, semantic token method for organization of the data presented through
extraction can be used to extract keys and to link visualization tools is highly relevant for service
non-standard data templates based on domain specific rules management, in general, and for cloud management, in
[52]. Third, topic extraction can be used to extract new particular, where the volume of data and number of

IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016 Y. DIAO ET AL. 13 : 7
metrics to be visualized is very high and where the user record. While TF-IDF [71] is the most prevalent similarity
might not be always highly knowledgeable about the metrics, other metrics are used to better fit the domain.
details of the managed system [65]. Moreover, For instance, in [51] the search similarity metric is the
summarization provides the information necessary to run correlation coefficient of topics structures—where
process optimization methods [23], such as distributions of the topics are extracted for each pair of documents using
service request arrival times or resolution effort. Also, LDA or other equivalent method, and the correlation
summarization is integrated in tools of automated service coefficient is computed considering an array with the
management [12], where configurable or adaptive common set of features.
monitoring profiles are used to trigger service management
actions, such as file system clean-up or incident ticket Performance prediction
generation. Performance prediction methods are aimed to help the
service provider take early action towards prevention of
Classification failures or more efficient execution. Most frequent
Popular statistical classifiers such as Supporting Vector targets for prediction relate to process performance,
Machine (SVM) [66], Maximum Entropy Model (MaxEn) service effort, and service workload. Such insights can be
[67], Naive Bayesian with Maximum Likelihood, used to assist actions such as resource planning or staffing
Discriminative Training (DT) [68], and k-Nearest recommendation, process management, infrastructure
Neighbor are commonly used for classification of ITSM configuration, and many others.
records. Furthermore, feature selection methods are used For example, in Application Management Service
to improve data classification accuracy, such as the (AMS), it is estimated that organizations can spend as
Random Forest [69] (see [19, 32]), SVM (see [12]), much as 80% of their application budgets on maintenance
or MaxEn (see [67]). Classification methods that integrate related activities [72]. AMS providers are interested in
with on-line learning, such as the Perceptron learning reducing the maintenance costs, and analysis of tickets
algorithm used in [52], help overcome the specific IT trends helps forecast the type of skills that will be
service delivery challenges of limited SME availability needed for the plan period as well as forecast the
and customer diversity. Similarly, benefits are brought by effectiveness of skill development actions and other
efficient tools for specification of classification rules by process improvement actions [10].
the service personnel, such as [41], or by methods based Another application assessment is related to workload
on weak supervision, such as proposed in [70]. management in cloud environments. Workload forecasting
[73] allows for automated migration of virtual machines,
Clustering in order to minimize the resource utilization hot-spots
Some of the most widely used clustering methods are [36, 73], or provisioning of additional virtual machines for
k-means, and hierarchical clustering (see [47] for details). services with increasing demand [35].
The L2 Euclidean distance or Term Frequency Inverse Other applications of prediction include prediction of
Document Frequency (TF-IDF) weighted cosine distance delivery performance given customization of the SLA
(for text n-gram collections) [43] are the most common model [17], prediction of project performance based on a
dissimilarity functions. Dissimilarity metrics specifically large set of qualitative factors collected at project start
targeted to the service domain give better results. For time [21], and prediction of ticket reduction upon
instance, Li and Katircioglu [9] measure the dissimilarity system upgrades [19].
of ticket clusters using a combination of the number of Overall, performance prediction is based on the
shared resources solving these tickets and the scale of process-specific records such as incident tickets, change
sharing. An n-gram clustering technique is illustrated by request records, and monitoring events, which illustrate
Mani et al. [43], who apply hierarchical merging to the how service processes are managed as well as how
initial set of clusters defined by the principal components organizations utilize their IT and human resources. In the
of the TF-IDF term matrix. Another techniques for text following paragraphs, we discuss three targets for
clustering is fuzzy match, using the edit-distance as a prediction in IT services.
dissimilarity metric [i.e., how many changes must be
performed to change one string (or set of tokens) into Performance control
another], which allows the clustering of tickets Measuring and monitoring process KPIs (key performance
despite typos. indicators) is the basis of any cost and quality
management system. In IT services, each service
Search management process has specific KPIs. For example,
Information Retrieval (IR) methods can be used to in Service Operation processes, KPIs typically quantify the
determine the top-K most similar records given a target volume, effort, and quality regarding incident resolution.

13 : 8 Y. DIAO ET AL. IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016
Other common KPIs include ticket volume per server, optimization can be applied throughout the ITSM
number of failed changes, mean time to repair, and lifecycle. For example, in Service Strategy, optimizing the
backlog size. Statistical Process Control (SPC) is a Demand Management process helps to meet the customer
widely adopted method for KPI tracking in services workload demands taking into consideration both SLA
performance [74]. Its tools, such as cumulative sum charts violation cost and delivery labor cost. Other process
(CUSUM) and c-chart, help tracking KPI evolution optimization scenarios include optimizing change request
over time, detecting trends and anomalies, and enabling scheduling in the Change Management process to meet
the business to take proactive actions and drive early complex constraints and dispatching incident tickets in
root-cause analysis and remediation [75]. the Incident Management process to meet the desired SLA
attainment level.
Effort estimation Compared to resource optimization problems commonly
Service effort estimation is the key element for assessing encountered in computing system performance
labor cost and driving effort-reduction programs. management, lack of measurement data and limited data
Per-activity effort indicates the actual amount of time accuracy are two common challenges in service process
that a system administrator spends on a service activity, optimization. Our experience shows that these limitations
such as incident ticket resolution, request auditing, and stem from the impact that the overhead of collecting
proactive maintenance. Nevertheless, the nature of IT actual work-times (albeit small, 1 min/record [76]) has on
processes makes it difficult to measure the service effort the productivity of service personnel; data collections are
with high accuracy, because of the lack of standard limited to 1 to 2 months per year, and errors often occur
effort-recording tools, the manual process needed for effort to mapping interval of times to the correct activity. To
recording, and the common multi-tasking behavior due deal with the lack of data, one can use estimation from
to the managed resource readiness or the required process existing data using service modeling methods. When
transitions [39]. There indeed have been some efforts to the amount of data necessary for modeling cannot be
address this issue such as the ticket priority-based effort collected, process changes are necessary to acquire it.
estimation approach in [9] for modeling resources’ For example, even arduous, time-motion studies may need
multitasking behavior, and the time-volume capture tool to be introduced and enforced for the system
developed in [76] along with time-motion studies. administrators to record where and how they are spending
their time working on various service activities [76].
Service forecasting For data accuracy limitations, the two common solutions
Service volume forecasting uses statistical regression are: (i) to conduct rigorous data validation to identify
models to predict the volume expected for certain types problematic data and to remove statistical outliers, and
of service activities. Analytical methods combine volume (ii) to introduce feedback loops or feedback controllers
forecasting with effort estimation to address business to self-correct the modeling inaccuracy.
problems pertaining to staff planning, SLA feasibility Generally, process optimization methods can be
assessment, return of investment (ROI) of automation classified into two categories: workforce optimization and
projects, and contract negotiation. In addition, prediction workload optimization, which are two of the most
of hardware maintenance costs or prediction of troubled important factors in IT management processes.
customer engagements also offers significant value to With respect to workforce optimization, we note that
IT service providers. The state-of-the-art methods in time managing service personnel labor cost is essential in
series forecasting can usually be applied in the context of ITSM. However, it is often difficult to decide the right
ITSM. Examples include autoregressive integrated staffing level due to the complex relationships among
moving average (ARIMA), neural nets, and autoregressive dynamic customer workload, strict service level
conditional heteroskedasticity (ARCH) models [77]. constraints, and service personnel with diverse skill sets.
Other methods include stochastic processes with For modeling simple service activities such as answering
time-varying intensities such as non-homogeneous Poisson and making calls in call centers, analytical models or
process and Poisson-Gamma processes for load forecasting simple simulation models can work well [80, 81].
of a telephone call center (see [78]) and for load However, for more complex service operations, such as
simulation of an IT support organization (see [79]). platform support and storage management, complicated
discrete event simulation is typically required to model the
Process optimization service delivery environment with a relevant level of
By analyzing various service data created and collected details [37, 79]. Once a simulation model is available, a
during service process execution, process optimization simulation-optimization approach can be applied to
is aimed to improve IT management effectiveness through minimize the total staffing related variable cost while
resource planning and staffing recommendation. Process considering the contractual service level constraints, the

IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016 Y. DIAO ET AL. 13 : 9
skills required to respond to different types of service common type takes on the form of a tuple: (percentage
requests, and the shift schedules that the service agents attainment, scope, time frame, target time). For example,
must follow [23, 82]. 95% (percentage attainment) of all severity-1 incident
Another challenging workforce optimization problem tickets (scope) that are opened over each one month period
relates to the best composition of service delivery teams. (time frame) must be resolved within 3 hours (target time).
To accomplish this, a clustering-based approach can be One will typically find a large number of service level
used where service requests that require similar problem agreements associated with each customer contract. The
solving skills are first grouped into a single cluster objective of optimizing incident management is to find the
using a statistical clustering technique [46]. Subsequently, most cost-effective way to meet all relevant service level
associations are built between service request clusters and agreements while restoring normal service operations.
service agents with respective skills and confidence In the case where data accuracy is a concern, a closed-loop
levels [10]. The insights can be extended to optimize skill performance management solution makes use of feedback
management, encouraging service personnel to train in controllers to dynamically adjust the priority of the
skills with limited coverage. incident tickets based on both statically defined contractual
In the same scope area of skill management, Agarwal service attainment targets and dynamically measured (with
[83] presents a closed-formula solution for predicting the measurement inaccuracy) service attainment levels [84].
best workload assignment when service personnel can
bid for it, based on their skills and preferences. These two Service analytics platform
approaches to skill development, one driven by the Previous sections have identified multiple types of service
organization and the other by the individual wishing to analytics methods that can help IT service providers
gain more bids, complement each other towards the improve and maintain high levels of productivity and
creation of highly effective resources. quality. In this section, we focus on how service analytics
With respect to workload optimization, we note that a are implemented and delivered. We review a typical
large amount of service requests are being handled in architecture and requirements that are drawn from the
service delivery centers on a daily basis, and service nature of IT services.
delivery teams must handle them in an order that The service analytics platform comprises the collection
optimizes across costs and SLAs. There are two common of IT components—e.g., hardware, software, APIs,
type of service workload: (i) requests that have graphical user interface (GUI), and business artifacts
dependencies among them (e.g., change requests) and (e.g., taxonomies, rules, processes—that are integrated
(ii) requests that are independent but have a target finish for delivery of service analytics. The interactions of a
time (e.g., incident tickets). service analytics platform with the IT delivery processes
Workload optimization in the Change Management are illustrated in Figure 1 along with the main categories
process relates to change request scheduling. All change of components [2]:
requests need to be allocated to available change windows
taking into account various constraints such as the • Data preparation components, which integrate with IT
temporal sequence of changes, change window times, and infrastructure and applications for collection, cleanse,
service level penalty for missing the change window or and imputation of data.
passing the deadline. Optimizing change request • Model production components, which use service
scheduling typically requires the use of an optimization analytics methods to estimate and validate models for
model that takes into account both constraints and costs. analysis and optimization the ITSM processes. Analytics
While heuristics based optimization approaches are methods typically draw from the categories analyzed
more commonly used (e.g., see [22]), the optimization in previous sections, including service modeling,
model is formulated in a way that can be solved using performance prediction, and process optimization. These
standard mathematical programming techniques methods use the prepared data along with service
(i.e., mixed integer programming). This approach not specific artifacts, such as taxonomies and policies.
only results in strictly optimal solutions, but provides a • Model consumption components, which use the analytics
scalable tool for scheduling a large set of change models in production environment in order to support
requests with complex constraints and also performing service personnel or management to improve the quality
sensitivity analysis. of model production. Model consumption includes
In the Incident Management process, workload activities of model scoring for the generation of business
optimization is related to the SLAs on incident handling recommendations, generation of business artifacts such
that are established at the time of contract negotiation as taxonomies or configuration policies, and generation
as part of the service-level management process. Although of business metrics summaries and forecasts. A feedback
many types of service level agreements exist, the most loop, from model consumption to model production, is

13 : 10 Y. DIAO ET AL. IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016
Figure 2
Illustrative service analytics platform architecture

often necessary for model tuning and continual through existing service management tools (e.g., incident
improvement [11]. management tool), dedicated portals [3, 59], or reporting
engines [61, 62]. Also, artifacts can be delivered to
Figure 2 illustrates the architecture of a typical service automated enactment tools that adjust the service
analytics platform (for other examples, refer to [11, 59]). management environment (e.g., new service patches) or
The platform includes data preparation components for data collection engines (e.g., threshold for alert generation).
collection of service management environment data into An enterprise that exploits a wide range of service
databases or warehouses. The most common business analytics typically comprises multiple instances of the
artifacts considered for data collection include events, pipeline illustrated in Figure 2, such as effort analysis [59],
alarms, and tickets. Data collection may also happen outage prediction [34], monitoring event optimization [11],
through service management tools and infrastructures for recurring defect detection [59], and support for defect
running surveys and crowd-sourcing events [85]. The resolution [3].
model production components are represented as the Drawing from our experience with developing and
Analytics Engines (upper right in Figure 1), comprising a deploying service analytics tools, we identify several
collections of analytics tools such as the behavior modeler requirements for the analytics-platform architecture
to identify the service workload trend (e.g., using including: data consistency, awareness of confidence
performance control and prediction methods) or to levels, and efficient user interactions. All these contribute
characterize the interactive relationship among multiple to increased usability, better adoption, and overall better
service components (e.g., process the results of process ROI for service analytics projects. We discuss these
modeling). In addition, the labeling classification tool requirements next.
generates rules/policies to assist the service modeling With respect to data consistency, we note that across
methods, such as classification and clustering. The the multiple service analytics tools within the enterprise,
identified behavior models and policies feed into the and given the challenging diversity of customer
inference and optimization engine, which produce environments and processes, data consistency is an
recommendations, as model consumption artifacts. The imperative. Users expect that data that represents the same
model consumption artifacts are delivered to end-users service features to be acquired through equivalent data

IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016 Y. DIAO ET AL. 13 : 11
collection and preparation procedures. Difference in However, for service analytics that are not constrained
processing may lead to different statistics and, possibly by SLA deadlines (such as analytics related to service
affect the business insights. Similar requirements apply to design or change scheduling), given their response time is
business metadata, such as taxonomies and business not immediate, the analytics consumption model must use
catalogs, which are used in processing the data. In order to asynchronous interactions in order to avoid slowing the
address these requirements, model unification and end-user’s overall activity.
standardization processing must be performed at the With respect to agile and low cost development, service
service-management data warehouse. Standardization analytics solutions need to attain low costs for
relates to data cleaning and preparation, such as development and lifecycle management. The nature of IT
imputation of missing data, correction of input errors per service workloads, with high volumes and demanding
specific taxonomies, entity resolution, and linking across SLA deadlines, drives the requirement for efficient human
content domains (e.g., tickets and servers) [86]. interactions, especially in the data collection and model
With respect to a requirement for awareness of consumption stages. For instance, analytical methods must
confidence levels, note that previous research shows that be selected with the understanding that the experts (SMEs)
service personnel performing server, application, and are not readily available to annotate large volume of
database administration have particularly high expectations observations or revise extensive taxonomies. Interactions
for accuracy and trustworthiness, because of the high-risk for collecting data input and insight delivery need to have
nature of their work [39, 40]. This risk adversity is a low overhead and to possibly be asynchronous.
challenge for service analytics that we identified in earlier Agile and effective analytics platforms have been the
sections. While a certain degree of error is intrinsic to data focus of extensive research and development efforts in
mining and process modeling, the end-users must be both industry and academia. In the recent past, platforms
aware of the confidence level. Moreover, the implication such as IBM Smarter Computing and Oracle Business
of these errors must be carefully considered at time of Intelligence Foundations offered new hardware, software,
model production and process design. For instance, if and services technologies that facilitate the development
outage prediction has high likelihood of false positives, the and deployment of highly scalable service analytics
daily workload will increase due to the added overhead of platforms.
checking the system condition for false positive outage The emerging Cloud services platforms enable agile
predictions. Similarly, confidence levels must be obvious development of specific types of service analytics.
or understandable for decision-support analytics, given For instance, the Google** Analytics API can be used for
that risk of the business decision compounds on the workload summarization analytics for management of
confidence of all related business insights. In other web applications. The Google Cloud Platform Prediction
applications of service analytics, the impact of inaccuracy API [5] allows the development of service analytics
may be less critical. For instance, incident classification for performance prediction. Microsoft Azure** [3]
for workload dispatching may be erroneous at times provides built-in services for performance prediction
without a critical impact unless the resolution deadline is analytics, but also a Cloud-integrated environment for
very tight. As a result, throughout the analytics pipeline, development of custom big data analytics. IBM Bluemix*
from data preparation to model consumption, tools must [4] provides similar services and, in addition, some
provide visibility into model confidence levels in order to machine translation services [45] that facilitate data
provide the users with relevant risk assessments. However, preparation for service providers with a wide geographical
state of the art service analytics solutions are limited to footprint. Splunk** [76] is an integrated platform for
providing accuracy estimations for the individual analytics operational analytics, covering most of the functions of a
results, such as classification and prediction, e.g., [21]. service analytics platform, including data collection,
Methods and models for composition of accuracy search and analysis, and exploration. Integration with
estimations into overall risk assessments must be created, Hadoop** provides for a wide range of analytics, from
probably building on existing provenance and probabilistic summarization to custom prediction methods. In addition,
database research. a wide variety of visualization tools, such as Google
With respect to a requirement for efficient end-user Charts, and Domo** [64] can be provided for agile
interactions, we note that efficient delivery of business customization of presentation tools.
insights is critical for service analytics adoption. A A service provider can use cloud-based APIs when its
challenge discussed in previous sections, namely the service management tools are running in a private cloud or
SLA-driven service model, requires highly efficient tool it has scalable and secure methods for data transfer into
interactions. Analytics tools for SLA-driven processes the cloud. Enterprise versions, such as offered by Splunk
must add a minimal overhead to the existing work [76], allow for agile development of decision support
procedures in terms of wait times and tool interactions. tools. More extensive development efforts are required to

13 : 12 Y. DIAO ET AL. IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016
integrate analytics into service management automation, lifecycle. Service Operations and Service Design
or find optimal solutions using provider-specific simulation (Figure 1) (i.e., Service Steady State and Sales
models. Hybrid solutions for the service analytics platform Engagement) have large coverage with analytics. We
are required, integrating cloud APIs into in-house suggest that this is because these processes have potential
frameworks of ITSM process models and solution and for highly visible economic impact, and the related
content governance. analytics benefit from the largest set of integrated service
Flexible execution of analytics tools in dynamic management tools and data sources, which provide a rich
computing environments is supported by standards and base for content for analytics. Other areas, such as
developed for specification of method components: Service Transition, are less represented, because many
features, models, and scores. For example, Predictive business problems require integrated data across the
Model Markup Language (PMML) is addressing entire service lifecycle. For instance, to assess the risk
these elements with an XML (Extensible Markup associated with a specific service design solution requires
Language)-based approach [87], and some of the integration across the sales (solution design), transition,
cloud-based platforms, like Google Prediction, support it. and operations. This requires an integrated enterprise-level
Overall, there are multiple alternative technologies and content model which is often missing in older service
tools for development of a service analytics platform, organizations and very expensive to deploy. The challenge
ranging from the most agile (e.g., the cloud-based APIs), is to develop efficient and effective methods for content
to the most comprehensive (e.g., enterprise-level data integration across the various lifecycles of service
analytics frameworks). Several appropriate architecture management.
choices must be made in order to satisfy a minimal set of A second problem or challenge concerns the pervasive
usability requirements specific to service analytics. awareness of analytics confidence levels. As discussed
previously, awareness of confidence levels of analytic
Conclusion insights is a requirement, intrinsic to the business model of
The IT services industry faces continual pressure to IT service delivery. There is an open question on how to
improve the quality of its services while simultaneously achieve this in a way that is flexible to match the variety
reducing the service management cost. The scale, of data sources, analytics methods, and service
complexity, and diversity of the managed environment and management processes that integrated them. Also, we must
processes make it difficult to effectively address these consider this agility and open standards, as Cloud-based
conflicting requirements. Service analytics represents an analytics APIs are building momentum. Further relating to
important research and application area, where a variety of confidence awareness, the field evaluation of service
data analysis, modeling, and optimization methods can analytic solutions must be integrated in the service
improve ITSM performance and competency. This paper analytics platform. Current approach for evaluation is for
provided a review of service analytics research from the the data analyst to use the most appropriate data mining
perspective of target business problems. The categories or statistical methods applied to a given training set.
of service analytics include service modeling, performance However, the performance of analytic solutions on actual
prediction, and process optimization. In each of the content and operating conditions is not systematically
categories, we reviewed relevant business questions and collected and exploited. We suggest that in a large-scale,
analytic methods that are used to answer them. content-diverse ITSM domain, the service analytic
Throughout the paper, we drew from our industrial solutions should be monitored for performance in a similar
experience, to highlight the challenges related to approach as we monitor the managed resources. The
development and deployment of service analytics information will support realistic assessment of ROI, and
solutions. Indeed, some of the aforementioned service provide valuable feedback for retraining the models or
analytics research has resulted in significant value for the revising the methods.
IT services business. For example, a modeling solution A second challenge concerns automated identification of
coming from the workforce optimization research has relevant analytic insights. The volume and diversity of
been deployed globally in IBM’s service delivery centers data produced by service management infrastructures is
covering 15,000 service personnel in more than continuously growing. Data analysts, either formally
500 delivery teams. trained or ad hoc users, need to efficiently screen the
While a substantial amount of service analytics research large-volume and diverse content—and identify interesting
and applications exist today, there are several open cases for deeper investigation, rare cases, or new
problems that wait for even better solutions. One problem developments. Automated tools are necessary to guide
or challenge concerns the expansion of the service their comprehensive investigation of the data. It is an
analytics coverage across the service lifecycle processes. open question on how to model a highly customizable,
The focus of service analytics is uneven across the service role-based insight profile, and how to scale the automated

IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016 Y. DIAO ET AL. 13 : 13
search for insights to the volume of operational data from 4 “Bluemix overview,” IBM Corp , New York, NY, USA
[Online] Available https //www ng bluemix net/docs/#overview/
large-scale IT service provider. overview html#overview
A fourth challenge involves scalable text analytics tools. 5 “Google cloud platform Prediction API,” Google, Mountain
The application of NLP to text descriptors in IT service View, CA, USA [Online] Available https //cloud google com/
prediction/
records is difficult to scale, since each customer or 6 S Galup, J J Quan, R Dattero, and S Conger, “Information
operation domain uses its own abbreviations, notations, technology service management An emerging area for academic
and protocols. This further complicates the development research and pedagogical development,” in Proc ACM SIGMIS
CPR, 2007, pp 46–52
of a common cross customer/domain data set for 7 S Galup, R Dattero, J J Quan, and S Conger, “An overview
performance evaluation. A set of industry specific tools of IT service management,” Commun ACM, vol 52, no 5,
and artifacts will enable efficient and repeatable extraction pp 124–127, May 2009
8 D Ardagna, G Casale, M Ciavotta, J F Pérez, and W Wang,
of entities and relationships—for instance, process-specific “Quality-of-service in cloud computing Modeling techniques
taxonomies and related pattern matching rules, and their applications,” J Internet Serv Appl , vol 5, no 1,
dictionaries of terms and synonyms for software and pp 417–423, Jan 2014
9 Y Li and K Katircioglu, “Measuring and applying service
hardware products, or process semantic representation. request effort data in application management services,” in Proc
A highly relevant text analytics problem is anonymization IEEE Int Conf Serv Comput , Jun 2013, pp 352–359
for personal privacy reasons. As previously discussed, a 10 Y Li and K Katircioglu, “Improving application management
services through optimal clustering of service requests,”
challenge in ITSM organizations that span multiple in Proc SRII Global Conf , 2012, pp 885–894
countries, access to personal content is limited due to 11 S Kim, W Cheng, S Guo, L Luan, D Rosu, and A Bose,
country-level regulations. Very often, this limits the access “Polygraph System for dynamic reduction of false alerts in
large-scale IT service delivery environments,” in Proc Usenix,
of analytics tools to service management records. 2011, pp 24–28
Anonymization techniques are used to remove sensitive 12 L Tang, T Li, L Shwartz, F Pinel, and G Grabarnik,
content, yet course-granularity patterns also discard “An integrated framework for optimizing automatic monitoring
system in large IT infrastructures,” in Proc KDD, 2013,
elements that are relevant for analysis. Methods for pp 1249–1257
efficient anonymization should be developed, customizable 13 R Gupta, H Karanam, L Luan, D Rosu, and C Ward,
for maximum performance across a large variety of “Multi-dimensional knowledge integration for efficient incident
management in a services cloud,” in Proc IEEE SCC, 2009,
service management records. pp 57–64
We expect many of these open problems to be better 14 R Knapper, C M Flath, B Blau, A Sailer, and C Weinhardt,
resolved soon by development of novel artifacts for “A multi-attribute service portfolio design problem,” in Proc
Serv Oriented Comput Appl , 2011, pp 1–7
service process representation. Overall, the application and 15 H Li and M Vukovic, “Intelligent solution discovery for
adaptation of NLP and machine learning methods to IT self-service systems,” in Proc IEEE Int Conf Serv Comput ,
service analytics will continue bringing greater benefits to 2009
16 X Wei, A Sailer, and R Mahindru, “Enhanced maintenance
the management of IT services. services with automatic structuring of it problem ticket data,” in
Proc IEEE Int Conf Serv Comput , 2008, pp 621–624
Acknowledgments 17 Y Diao, L Lam, L Shwartz, and D Northcutt, “Predicting
service delivery cost for non-standard service level agreements,”
We would like to express our gratitude to Yu Deng, Vijay in Proc IEEE/IFIP NOMS, pp 1–9
Naik, and Ronnie Sarkar, all employed by IBM, for many 18 A Agarwal, R Sindhgatta, and G Dasgupta, “Does one-size-fit-all
helpful and constructive discussions that helped us suffice for service delivery clients?” in Proc ICSOC, 2013,
pp 177–191
improve the quality of this paper. 19 J Bogojeska, D Lanyi, I Giurgiu, G Stark, and D Wiesmann,
“Classifying server behavior and predicting impact of
*Trademark, service mark, or registered trademark of International modernization actions,” in Proc CNSM, 2013, pp 59–66
Business Machines Corporation in the United States, other countries, 20 S Güven, C M Barbu, D Husemann, and D Wiesmann,
or both “Change risk expert Leveraging advanced classification and
risk management techniques for systematic change failure
reduction,” in Proc IEEE/IFIP NOMS, 2012,
**Trademark, service mark, or registered trademark of Google, Inc , pp 795–809
Microsoft Corporation, Splunk Corporation, Apache Software 21 S Guven, M Steiner, T Ide, S Makogon, and A Venegas,
Foundation, or Domo, Inc , in the United States, other countries, or “Mining for gold How to predict service contract performance
both with optimal accuracy based on ordinal risk assessment data,”
in Proc IEEE SCC, 2014, pp 315–322
22 L Zia, Y Diao, D Rosu, C Ward, and K Bhattacharya,
References “Optimizing change request scheduling in IT service
1 “IT infrastructure library ITIL service support, version 3,” Office management,” in Proc IEEE Int Conf Serv Comput , 2008,
Government Commerce, London, U K , 2007 pp 41–48
2 R Grossman, “What is analytic infrastructure and why should 23 Y Diao and A Heching, “Staffing optimization in complex
you care?” ACM SIGKDD Explorations, vol 11, pp 5–9, 2009 service delivery systems,” in Proc Int Conf Netw Serv
3 “Microsoft azure machine learning frequently asked questions,” Manage , 2011, pp 1–9
Microsoft Corp , Redmond, WA, USA [Online] Available 24 Y Zhou, L Liu, C Perng, A Sailer, I Silva-Lepe, and Z Su,
http //azure microsoft com/en-us/documentation/articles/ “Ranking services by service network structure and service
machine-learning-faq/ attributes,” in Proc IEEE 20th ICWS, 2013, pp 26–33

13 : 14 Y. DIAO ET AL. IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016
25 R Knapper, B Blau, T Conte, A Sailer, A Kochut, and 45 “Watson services for bluemix,” IBM Corp , New York, NY, USA
A Mohindra, “Efficient contracting in cloud service markets [Online] Available https //console ng bluemix net/#/
with asymmetric information—A screening approach,” in solutions/solution=watson
Proc IEEE 13th CEC, 2011, pp 236–243 46 C Bishop, Pattern Recognition and Machine Learning New
26 A Sailer, M R Head, A Kochut, and H Shaikh, “Graph-based York, NY, USA Springer-Verlag, 2007
cloud service placement,” in Proc IEEE Int Conf SCC, 2010, 47 T Hastie, R Tibshirani, and J Friedman, The Elements of
pp 89–96 Statistical Learning Data Mining, Inference, and Prediction,
27 A Lall, A Sailer, and M Brodie, “A graphical approach to 2nd ed New York, NY, USA Springer-Verlag, 2011
providing infrastructure recommendations for IT,” in Proc IEEE 48 E Rahm and H Do, “Data cleaning Problems and current
Int Conf SCC, 2008, pp 161–168 approaches,” Bull Techn Committee Data Eng , vol 23, no 4,
28 M R Head, A Sailer, H Shaikh, and D G Shea, “Towards pp 3–14, 2000
self-assisted troubleshooting for the deployment of private 49 C Sapia, G Höfling, M Müller, C Hausdorf, H Stoyan, and
clouds,” in Proc IEEE CLOUD, 2010, pp 156–163 U Grimmer, “On supporting the data warehouse design by
29 B Tak, C Tang, H Huang, and L Wang, “PseudoApp data mining techniques,” in Proc GI-Workshop Data Mining
Performance prediction for application migration to cloud,” in Data Warehousing, 1999, pp 1–8
Proc IFIP/IEEE Int Symp IM, 2013, pp 303–310 50 S Symonenko, S Rowe, and E D Liddy, “Illuminating trouble
30 Y Song, A Sailer, and H Shaikh, “Hierarchical online problem tickets with sublanguage theory,” in Proc Hum Language
classification for IT support services,” in Proc IEEE TSC, Technol Conf NAACL, 2006, pp 169–172
2011, pp 345–357 51 C D Manning and H Schütze, Foundations of Statistical
31 V F Cavalcante, C S Pinhanez, R A de Paula, C S Andrade, Natural Language Processing Cambridge, MA, USA
C R B de Souza, and A P Appel, “Data-driven analytical MIT Press, 1999
tools for characterization of productivity and service quality 52 E -E Jan and B Kingsbury, “Rapid and inexpensive development
issues in IT service factories,” J Serv Res , vol 16, of speech action classifiers for natural language call routing
pp 295–310, 2013 system,” in Proc Spoken Language Technol , 2010,
32 I Giurgiu, J Bogojeska, S Nikolaiev, G Stark, and pp 336–341
D Wiesmann, “Analysis of labor efforts and their impact 53 D Blei, A Ng, and M Jordan, “Latent Dirichlet allocation,”
factors to solve server incidents in datacenters,” in J Mach Learn Res , vol 3, pp 993–1022, 2003
Proc Int Symp Cluster, Cloud Grid Comput , 2014, 54 S Deerwester, S T Dumais, G W Furnas, T K Landauer, and
pp 424–433 R Harshman, “Indexing by latent semantic analysis,” J Amer
33 G Stark, “Three Killer Graphics,” Research Gate, 2014 Soc Inf Sci , vol 41, no 6, pp 391–407, 1990
[Online] Available http //www researchgate net/publication/ 55 T Hofmann, “Probabilistic latent semantic indexing,” in Proc
259763980_Three_Killer_Graphics_v2 SIGIR, 1999, pp 50–57
34 R Liu and J Lee, “IT incident management by analyzing 56 G Miao, Z Guan, L E Moser, X Yan, S Tao, N Anerousis,
incident relations,” in Proc Int Conf Serv Oriented Comput , and J Sun, “Latent association analysis of document pairs,” in
2012, vol 7636, pp 631–638 Proc ACM SIGKDD Conf , 2012, pp 1415–1423
35 W T Tsai, P Zhong, and J Balasooriya, “Services utility 57 K Toutanova, D Klein, C Manning, and Y Singer
prediction on a cloud,” in Proc IEEE Int Conf SOCA, 2011, “Feature-rich part-of-speech tagging with a cyclic dependency
pp 1–4 network,” in Proc HLT-NAACL, 2003, pp 252–259
36 S Daniel and M Kwon, “Prediction-based virtual instance 58 E -E Jan, J Ni, N Ge, N Ayachitula, and X Zhang, “A statistical
migration for balanced workload in the cloud datacenters,” machine learning approach for ticket mining in IT service
Dept Comput Sci (GCCIS), Rochester Inst Technol , delivery,” in Proc Int Symp Integr Netw Manage , 2013,
Rochester, NY, USA, Tech Rep , 2011 pp 541–546
37 H S Gupta and B Sengupta, “Scheduling service tickets in 59 Y Li, T -H Li, R Liu, J Yang, and J Lee, “Application
shared delivery,” in Proc Int Conf Serv Oriented Comput , management services analytics,” in Proc IEEE Int Conf Serv
2012, pp 79–95 Oper Logistics, Informat , 2013, pp 366–371
38 D Loewenstern and Y Diao, “A dynamic request dispatching 60 R Agrawal, T Imieliński, and A Swami, “Mining association
system for IT service management,” in Proc IEEE CNSM, 2012, rules between sets of items in large databases,” in Proc ACM
pp 271–275 SIGMOD, 1993, pp 207–216
39 E Haber and J Bailey, “Design guidelines for system 61 “Cognos software—Business intelligence and performance
administration tools developed through ethnographic field management,” IBM Corp , New York, NY, USA [Online]
studies,” in Proc ACM Symp Comput Hum Interaction Available http //www-01 ibm com/software/analytics/cognos/
Manage Inf Technol , 2007 62 “Tableau server—Rapid-fire business intelligence, today,”
40 N F Velasquez and A Durcikova, “SysAdmins and the need Tableau Softw , Seattle, WA, USA [Online] Available http //
for verification information,” in Proc ACM Symp Comput www tableau com/products/server
Hum Interaction Manage Inf Technol , 2008, pp 4 1–4 8 63 “What is Splunk?” Splunk, San Francisco, CA, USA [Online]
41 Y Diao, H Jamjoom, and D Loewenstern, “Rule-based problem Available http //www splunk com
classification in IT service management,” in Proc IEEE Int 64 “What is domo?” Domo, American Fork, UT, USA [Online]
Conf Cloud Comput , 2009, pp 221–228 Available http //www domo com/
42 R Potharaju, N Jain, and C Nita-Rotaru, “Juggling the Jigsaw 65 Y Thoss, C Pohl, M Hoffmann, J Spillner, and A Schill,
Towards automated problem inference from network trouble “User-friendly visualization of cloud quality ,” in Proc IEEE
tickets,” in Proc NSDI, 2013, pp 127–141 CLOUD, 2014, pp 890–897
43 S Mani, K Sankaranarayanan, V S Sinha, and P Devandu, 66 C J Burges, “A tutorial on support vector machines for pattern
“Panning requirement nuggets in stream of software maintenance recognition,” Knowl Discovery Data Mining, vol 2,
tickets,” in Proc ACM SIGSOFT Int Symp FSE, 2014, pp 121–167, 1998
pp 678–688 67 A L Berger, V J D Pietra, and S A D Pietra, “A maximum
44 J Lenchner, D Rosu, N F Velasquez, S Guo, K Christiance, entropy approach to natural language processing,” Comput
D DeFelice, P M Deshpande, K Kummamuru, N Kraus, Linguistics, vol 22, pp 39–71, 1996
L Z Luan, D Majumdar, M McLaughlin, S Ofek-Koifman, 68 H -K J Kuo and C -H Lee, “Discriminative training of natural
D P, C -S Perng, H Roitman, C Ward, and J Young, language call routers,” IEEE Trans Speech Audio Process ,
“A service delivery platform for server management vol 11, no 1, pp 24–35, Jan 2003
services,” IBM J Res & Dev , vol 53, no 6, 69 L Breiman, “Random forests,” Mach Learn , vol 45, no 1,
pp 1–17, 2009 pp 5–32, 2001

IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016 Y. DIAO ET AL. 13 : 15
70 C Kadar, J Iria, “Domain adaptation for text categorization coauthor of the book Feedback Control of Computing Systems
by feature labeling,” in Advances in Information Retrieval, Lecture (Wiley 2004) He received an IBM Outstanding Innovation Award
Notes in Computer Science Berlin, Germany Springer-Verlag, in 2005, was named as IBM Master Inventor in 2007, and also
2011, pp 424–435 received an IBM Outstanding Technical Achievement Award in
71 H C Wu, R W P Luk, K F Wong, and K L Kwok, 2013 He is the recipient of the following the 2002 Best Paper
“Interpreting TF-IDF term weights as making relevance Award at IEEE/IFIP (Institute of Electrical and Electronics
decisions,” ACM Trans Inf Syst , vol 26, no 3, pp 1–37, Engineers/International Federation for Information Processing)
2008 Network Operations and Management Symposium, the 2002–2005
72 C Alexander and M Marzullo, “Rethinking applications Theory Paper Prize from the International Federation of Automatic
management A best practice approach to managing your Control, the 2008 Best Paper Award at IEEE International
portfolio,” HP, Palo Alto, CA, USA, HP Viewpoint Paper, 2013 Conference on Services Computing, and the Second Prize of the
[Online] Available http //www8 hp com/h20195/v2/GetPDF 2012 Innovation in Analytics Award from Institute for Operations
aspx%2F4AA4-5494ENW pdf Research and the Management Sciences He served as Program
73 T Wood, P Shenoy, A Venkataramani, and M Yousif, Co-chair for the 6th International Conference on Network and
“Black-box and gray-box strategies for virtual machine Service Management in 2010 and Program Co-chair for the 13th
migration,” in Proc USENIX NSDI, 2007, pp 17–28 IFIP/IEEE International Symposium on Integrated Network
74 D Wheeler, Understanding Statistical Process Control, 3rd ed Management in 2013 He is an Associate Editor of IEEE
Knoxville, TN, USA Statist Process Control Press, 2010 TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, Journal of
75 J Oakland, Statistical Process Control, 6th ed Evanston, IL, Network and Systems Management, and Computers & Electrical
USA Routledge, 2007 Engineering An International Journal
76 M Buco, D Rosu, D Meliksetian, F Wu, and N Anerousis,
“Effort instrumentation and management in service delivery
environment,” in Proc Int Conf Netw Serv Manage , 2012, Ea-Ee Jan IBM Global Technology Services Division,
pp 257–260 Poughkeepsie, NY 12601 USA (ejan@us ibm com) Dr Jan is a
77 J De Gooijer and R Hyndman, “25 years of time series Senior Data Scientist in the IBM Global Technology Services (GTS)
forecasting,” Int J Forecast , vol 22, no 3, pp 443–473, 2006 Division He is the technical lead for ATM predictive analytics
78 L Brown, N Gans, A Mandelbaum, A Sakov, H Shen, services Prior to joining GTS in 2015, he was a Research Staff
S Zeltyn, and L Zhao, “Statistical analysis of a telephone call Member at IBM Research He has worked on machine learning and
center A queueing science perspective,” J Amer Statist Assoc , data mining in the areas of speech recognition, natural language
vol 100, no 469, pp 36–50, 2005 processing, machine translation, and multi-lingual information
79 C Bartolini, C Stefanelli, and M Tortonesi, “SYMIAN A retrieval for software research He later worked on image analytics
simulation tool for the optimization of the IT incident for cloud computing, and service analytics for IT delivery Dr Jan
management process,” in Proc IFIP/IEEE Int Workshop was a Research Assistant Professor at Rutgers University before
Distrib Syst , Oper Manage , 2008, pp 83–94 joining IBM He holds 13 U S patents and has published more than
80 Z Aksin, M Armony, and V Mehrotra, “The modern call 50 technical papers He is an IBM Master Inventor and was recently
center A multi-disciplinary perspective on operations recognized by the IBM Corporate Data Science Board as one of
management research,” Prod Oper Manage , vol 16, no 6, the first data scientist practitioners in IBM
pp 665–688, Dec 2007
81 M Cezik and P LaEcuyer, “Staffing multi-skill call centers via Ying Li IBM Research Division, Thomas J Watson Research
linear programming and simulation,” Manage Sci , vol 54, Center, Yorktown Heights, NY 10598 USA (yingliatus ibm com)
no 2, pp 310–323, Feb 2008 Dr Li has been a Research Staff Member at the IBM T J Watson
82 Z Feldman and A Mandelbaum, “Using simulation based Research Center since 2003 She was with the Exploratory
stochastic approximation to optimize staffing of systems with Computer Vision Group before she joined the Service Research area
skills based routing,” in Proc Winter Simul Conf , 2010, in 2011 Dr Li’s research interests include audiovisual content
pp 3307–3317 analysis and management, computer vision, business analytics, and
83 S Agarwal, “Hybridsourcing A novel work allocation service management She has authored or coauthored 30 patents
mechanism to provide controlled autonomy to workers,” and approximately 50 peer-reviewed conference and journal papers,
in Proc IEEE/IFIP NOMS, 2014, pp 1–8 as well as five books and book chapters on various multimedia
84 Y Diao and A Heching, “Closed loop performance management and computer vision related topics Dr Li received her M S and
for service delivery systems,” in Proc IFIP/IEEE Netw Oper Ph D degrees from University of Southern California in 2001 and
Manage Symp , 2010, pp 61–69 2003, respectively She is a senior member of the Electrical and
85 M Vukovic, J Laredo, and S Rajagopal, “Challenges and Electronics Engineers (IEEE)
experiences in deploying enterprise crowdsourcing service,”
in Proc Int Conf Web Eng , 2010, pp 460–467
86 D Rosu, W Cheng, E Jan, and N Ayachitula, “Connecting Daniela Rosu IBM Research Division, Thomas J Watson
the dots in IT service delivery—From operations content to Research Center, Yorktown Heights, NY 10598 USA (drosu@us ibm
high-level business insights,” in Proc IEEE Int Conf Serv com) Dr Rosu is a Research Staff Member in the Service Delivery
Oper Logistics, Informat , 2012, pp 410–415 Management and Analytics department of the IBM T J Watson
87 PMML 4 1—General Structure of a PMML Document, Research Center Her current research interests include Q&A
D M Group [Online] Available http //www dmg org/v4-1/ systems for client-assist on IT trouble resolution In the past, she
GeneralStructure html worked in multiple areas including process modeling and
optimization of IT Service Management processes, productivity
Received March 3, 2015; accepted for publication tools for IT Service Operations, business goal-driven resource
April 3, 2015 management in complex IT environments, operating systems
support for high-performance web servers, distributed web caching
infrastructures, and real-time operating systems Dr Rosu received a
Yixin Diao IBM Research Division, Thomas J Watson Research Ph D degree in computer science from the Georgia Institute of
Center, Yorktown Heights, NY 10598 USA (diao@us ibm com) Technology in 1999, with a dissertation in the area of adaptation in
Dr Diao is a Research Staff Member at the IBM T J Watson complex real-time systems She also holds an M S degree in
Research Center He received his Ph D degree in electrical computer science from Georgia Institute of Technology (1995) and
engineering from Ohio State University in 2000 He has published an MS degree in theoretical computer science from the Faculty of
more than 80 papers in systems and services management, and is Mathematics, University of Bucharest, Romania (1987)

13 : 16 Y. DIAO ET AL. IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016
Anca Sailer IBM Research Division, Thomas J Watson
Research Center, Yorktown Heights, NY 10598 USA (ancas@us
ibm com) Dr Sailer received her Ph D degree in computer science
from UPMC Sorbonne Universités, France, in 2000 She was a
Research Member at Bell Labs from 2001 to 2003, where she
specialized in Internet services traffic engineering and monitoring In
2003, Dr Sailer joined the IBM T J Watson Research Center
where she is currently a Research Staff Member and IBM Master
Inventor in IBM Watson Health She was the architect lead for
the automation of the IBM SmartCloud* Enterprise Business
Support Services, which earned her, in 2013, an IBM Outstanding
Technical Achievement Award Dr Sailer holds more than two
dozen patents, has coauthored numerous publications in IEEE and
ACM-refereed journals and conferences, and co-edited three books
on network and IT management topics Her research interests
include cloud computing business and operation management,
self-healing technologies, and data mining She is a senior member
of the IEEE

IBM J. RES. & DEV. VOL. 60 NO. 2/3 PAPER 13 MARCH/MAY 2016 Y. DIAO ET AL. 13 : 17

You might also like