Improving Service Level Control With Process Mining

A study that shows how managers can control the service levels of their products using the event log of the incident management system
Subject: Research project
Student: Ing. R.H.J.C. van Wel
Date: 09 January 2013
Status: Complete
Summary
The objective of this research was to examine whether the information registered in the event log of the incident management system can add value in controlling the service level of a product. By using process mining techniques and tools, we gained insight into the distribution and handling activities of the incident management process. During the process discovery phase we discovered that the service level of the product types Desktop and Laptop decreases rapidly when incidents are handled by two or more assignment groups. In addition, we discovered that the incident management system does not always register the correct timestamp of executed incident handler activities. We also saw that some incident handlers execute unusual process activities and that the incident management system does not add extra service level time when an incident is reopened after it was closed. Finally, we discovered that the company is able to extract data from the event log that can be used as a predictive indicator for an increasing or decreasing workload. The conclusion of this research is that the event log of the incident management system contains enough information to visualize the distribution and handling activities of the incident management process. By using this information, the company can be more in control of the service levels of its products.
Colophon
Program: CAI Master of Science program, University Leiden
Course element: Research project
Student: Ing. R.H.J.C. van Wel
Email: Royvanwel@gmail.com
Version: 1.4
Index
Summary
1 Introduction
  1.1 Preface
  1.2 Business case
  1.3 Research relevance
  1.4 Theoretical framework
  1.5 Research question
  1.6 Scope & delineation
2 Research methodology
3 Research results
  3.1 Analyze event log
  3.2 Process discovery
  3.3 Process conformance
  3.4 Workload prediction
4 Conclusion
  4.1 Conclusion
  4.2 Recommendations
  4.3 Discussion
References
Appendix
1 Introduction
This research was conducted within a company whose name cannot be mentioned for security reasons.
1.1 Preface
Many IT companies use incident management processes and incident management systems to control their incident handling process. To manage this process, one can use Key Performance Indicators (KPIs). A KPI can, for example, be very helpful to see how well (or how badly) the service level of a certain product has performed, or how well a related business unit has performed in the handling of incidents.
1.2 Business case
In an interview, the Senior Process Manager (SPM) of the company stated that the company is currently not able to respond quickly enough to increasing workloads. One of the main reasons is that most Business Unit Managers (BUMs) focus on the monthly KPI Incidents Resolved in Time. With this KPI, BUMs can only act reactively, because the distribution and handling process has already taken place. This KPI also does not show how incidents were distributed and handled by the business units. It is therefore difficult to find the reason why the service level of a product has decreased.
1.3 Research relevance
The purpose of this research is to examine whether the information registered in the event log of the incident management system can add value in controlling the service level of a product. The first goal is therefore to gain insight into the distribution and handling activities of the incident management process. The second goal is to examine whether the information from this event log can be used to predict increasing workloads and to see what the effects of these increasing workloads are.
1.4 Theoretical framework
According to van der Aalst (2011, p.55) the performance of a process or organization can be defined in different ways. Typically, three dimensions of performance are identified: time, cost and quality. For each of these performance dimensions, different Key Performance Indicators (KPIs) can be defined. When looking at the time dimension, the following performance indicators can be identified:
- The lead time (also referred to as flow time) is the total time from the creation of the case to the completion of the case;
- The service time is the time actually worked on a case;
- The waiting time is the time a case is waiting for a resource to become available;
- The synchronization time is the time an activity is not yet fully enabled and waiting for an external trigger or another parallel branch.
Many systems have some kind of event log, often referred to as history, audit trail, transaction log, etc. The event log typically contains information about events referring to an activity and a case. The case (also named process instance) is the thing which is being handled, e.g., a customer order, a job application, an insurance claim, a building permit, etc. The activity (also named task, operation, action, or work item) is some operation on the case. Typically, events have a timestamp indicating the time of occurrence. Moreover, when people are involved, event logs will characteristically contain information on the person executing or initiating the event, i.e., the performer (van der Aalst & van Hee, 2002).
The idea of process mining is to discover, monitor and improve real processes (i.e. not assumed processes) by extracting knowledge from event logs readily available in today's systems (van der Aalst, 2011). According to van der Aalst (2011, p.9) event logs can be used to conduct three types of process mining, namely:
1. Process discovery. The first type of process mining is discovery. A discovery technique takes an event log and produces a model without using a-priori information. [...] If the event log contains information about resources, one can also discover resource-related models, e.g., a social network showing how people work together in an organization.
2. Process conformance. The second type of process mining is conformance. Here, an existing process model is compared with an event log of the same process.
3. Process enhancement. The third type of process mining is enhancement. Here, the idea is to extend or improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a-priori model.
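The event log structure described above (case, activity, performer, timestamp) can be illustrated with a minimal Python sketch that groups events into one time-ordered trace per case. The names Event and build_traces and all sample values are hypothetical; they are not taken from the company's event log or from any process mining tool.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    case_id: str         # the case, here an incident number
    activity: str        # the operation on the case, e.g. "Opened"
    resource: str        # the performer, here an assignment group
    timestamp: datetime  # time of occurrence


def build_traces(events):
    """Group events per case and order each case by timestamp."""
    traces = defaultdict(list)
    for event in events:
        traces[event.case_id].append(event)
    return {
        case_id: sorted(case_events, key=lambda e: e.timestamp)
        for case_id, case_events in traces.items()
    }


# Hypothetical sample events, deliberately out of order.
events = [
    Event("INC001", "Resolved", "Group B", datetime(2012, 1, 2, 15, 0)),
    Event("INC001", "Opened", "Group A", datetime(2012, 1, 2, 9, 0)),
    Event("INC002", "Opened", "Group A", datetime(2012, 1, 3, 10, 0)),
]
traces = build_traces(events)
```

Each trace is the ordered sequence of activities of one incident, which is the input that discovery and conformance techniques work on.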
This research addresses the process discovery phase and the process conformance phase. The manner in which these phases were executed is described in Section 2. The research conclusion and recommendations are defined in Section 4 and are meant to be used for the process enhancement phase in further research.
1.5 Research question
Which information should business unit managers extract from the incident management event log to control the service level of their products?
To answer the main research question, the following sub questions have to be answered:
Sub question 1: Which event log data must be used as information to visualize the distribution and handling activities of the incident management process?
Sub question 2: Which information should business unit managers extract from the event log to be able to predict a workload increase and see what the effects of these increasing workloads are?
1.6 Scope & delineation
This research focuses only on incident management activities that were managed by business units, and in particular on one business unit which we will call EUS (End User Services). Therefore this research will not examine human resource activities.
The process mining tools ProM and Disco will be used to execute the process discovery phase. ProM will be used because it is an open-source tool with many plugins (e.g. social networks and Petri nets) that can be used for process analyses. However, the commercial process mining tool Disco is easier to work with for the process conformance phase.
The results of the process discovery phase and process conformance phase will be based on quantitative measurements. The quality of the incident management process and related activities can be discussed when the process enhancement phase is executed.
The main research question will answer how managers can control the service level of their products. In this research, service level control means that one is able to explain the causes and effects of a service level performance, based on the information that is registered in the event log. If one is able to explain the causes and effects of a service level performance, one is also able to share this information and take action when necessary.
2 Research methodology
Analyze event log
First we need to extract the data that is registered in the event log of the incident management system. To analyze distribution and handling activities we need a substantial amount of historical data. Therefore we will extract an event log that contains information on all incidents closed between 01-09-2010 and 30-09-2012. After analyzing this event log, we will determine the research focus and define which data (Case ID, Activity ID, Resource ID and time dimension) can be used for the process discovery step.
Process discovery
To determine which information is valuable for controlling the service level of a product, we need to gain insight into how the distribution and handling activities were executed. Therefore, we will visualize the incident management process by using the process mining tool Disco. The results will show information about the distribution activities of the incident management process. To see how the incidents were handled by the business units, we will use a social network plugin from the process mining tool ProM. These results will show how the business units interacted with each other.
Process conformance
The results of the process discovery phase will be discussed with the SPM of the incident management system and each unusual observation will be defined. After executing the process discovery phase and the process conformance phase, we are able to identify which information is needed to visualize the distribution and handling activities of the incident management process (Sub question 1).
Workload prediction
To predict an increasing or decreasing workload, we will build hypotheses and examine them based on the information that is registered in the event log. For each hypothesis we will explain our assumptions, explain what must be done to examine these assumptions and analyze the results. This hypothesis cycle will be continued until we are able to define which information business unit managers should extract from the event log to be able to predict a workload increase and see what the effects of these increasing workloads are (Sub question 2). After answering sub question 1 and sub question 2, we are able to conclude which information business unit managers should extract from the incident management event log to control the service levels of their products (main research question).
Process enhancement
The research conclusion and recommendations are defined in Section 4 and are meant to be used for the process enhancement phase in further research.
3 Research results
3.1 Analyze event log
The extracted event log contains a large amount of information. To define which information we can extract from the event log, we have aggregated the data into one table (see footnote 1). The event log contains the numbers under which the incidents are registered. These incident numbers can be used as Case IDs. The executed activities (Opened, Assignment, Resolved, Closed) can be used as Activity IDs. Each Activity ID is linked to a Time Stamp and a Resource ID. This Resource ID shows the name of the business unit (named assignment group in the event log) that executed the activity.
To distinguish the different types of assignment groups, we will rename these assignment groups in the event log before we execute the process discovery phase. The assignment group that is linked to an opened activity will be called Control group. According to the SPM, this type of resource is responsible for routing the incident to the assignment group that is responsible for resolving the incident. The assignment groups that handle the activities after the control group will be called First Reassignment group, Second Reassignment group, Third Reassignment group, Fourth Reassignment group, Fifth Reassignment group and Nth Reassignment group (see footnote 2). The assignment group that is linked to a resolved activity will be called Resolved group. The assignment group that is linked to a closed activity will be called Closed group. This assignment group is always the same assignment group as the Control group.
The event log also shows which incidents have Breached the service level time and which incidents were Resolved in time. We can therefore use this information to determine the service level performance of a product. Each incident is registered on a Product type, so we can use this information to filter on specific product types that were managed by the business unit EUS. We can use the Elapsed time (see footnote 3) data to see how much time it has taken to resolve an incident. This time type only measures the time that is stipulated in the Service Level Agreement of a product.
Research focus
Our event log covers all incidents that were closed between 01 September 2010 and 30 September 2012. In this period the company resolved 162677 incidents. These incidents were registered on 1679 different product types. Of these incidents, 52546 were registered on the product types Desktop and Laptop and managed by the business unit EUS. Therefore we will continue this research by focussing on all incidents that were registered on the product types Desktop and Laptop.
1 See appendix Event log information.
2 Nth reassignment group means the sixth or a later reassignment group.
3 Elapsed time = service level time (measured time between opened time and resolved time).
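The renaming of assignment groups described in Section 3.1 can be sketched as follows. This is a minimal illustration that assumes each case is available as a time-ordered list of activity names (Opened, Assignment, Resolved, Closed); the function name role_labels is hypothetical and the sketch is not the procedure that was actually applied to the event log.

```python
ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]


def role_labels(activities):
    """Return the role label for each activity of one case, in order."""
    labels = []
    reassignments = 0
    for activity in activities:
        if activity == "Opened":
            labels.append("Control group")
        elif activity == "Assignment":
            reassignments += 1
            # The sixth and later reassignments share the label "Nth".
            if reassignments <= len(ORDINALS):
                ordinal = ORDINALS[reassignments - 1]
            else:
                ordinal = "Nth"
            labels.append(f"{ordinal} Reassignment group")
        elif activity == "Resolved":
            labels.append("Resolved group")
        elif activity == "Closed":
            labels.append("Closed group")
        else:
            # Unknown activities are kept unchanged.
            labels.append(activity)
    return labels


example = role_labels(["Opened", "Assignment", "Assignment", "Resolved", "Closed"])
```

With labels of this kind, the position of a group in the handling chain becomes an attribute that a process mining tool can filter and group on.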
Trimmed mean
Looking at the spread of the elapsed times, we see that there are several outliers. The maximum recorded elapsed time is 3403 hours and the minimum recorded elapsed time is 0.0 hours. We will call the outliers with a high elapsed time top-outliers and the outliers with a low elapsed time bottom-outliers.
Moore and McCabe (2006) describe outliers as individual values that fall outside the overall pattern. The trimmed mean is a measure of centre that is more resistant than the mean but uses more of the available information than the median. Trimming eliminates the effect of a small number of outliers. Identifying outliers is a matter for judgement. Look for points that are clearly apart from the body of the data, not just the most extreme observations in a distribution (Moore & McCabe, 2006).
According to the SPM, these outliers should not be taken into account in this research, because they represent unusual circumstances and would distort the research results. Therefore we will compute a 5% trimmed mean. To do so, we discarded the 5% of incidents with the highest elapsed times (top-outliers) and the 5% with the lowest elapsed times (bottom-outliers). After trimming, the event log consists of 47290 incidents. The maximum elapsed time is now 309 hours and the minimum elapsed time is 1.3 hours.
Table 3.1.1 shows the number of incidents that were controlled or resolved by the business unit EUS. Table 3.1.2 shows the number of incidents that were controlled and resolved by the business unit EUS. Table 3.1.3 shows the number of incidents that were controlled by the business unit EUS and resolved by other business units.
Table 3.1.1 Incidents divided per control group and Resolved group
            Business unit EUS   Other business units   Total
Opened                  47257                     33   47290
Resolved                20600                  26690   47290

Table 3.1.2 Controlled and Resolved incidents by Business unit EUS
Control group Business unit EUS, Resolved group EUS: 20598

Table 3.1.3 Control group EUS / All resolved groups except EUS
Control group Business unit EUS, Resolved group ALL except EUS: 26652
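The 5% trimming described above can be sketched in Python as follows. This is a minimal illustration under the assumption that the elapsed times are available as a plain list of hours; the function names and the sample values are hypothetical, and the exact rounding rule used in the research for the number of discarded incidents is not known.

```python
def trim(values, percent=5):
    """Drop the lowest and the highest `percent` percent of the values."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # number of values cut at each end
    return ordered[k:len(ordered) - k]


def trimmed_mean(values, percent=5):
    """Mean of the values that remain after trimming both tails."""
    kept = trim(values, percent)
    return sum(kept) / len(kept)


# Hypothetical elapsed times in hours: one bottom-outlier, one top-outlier.
sample = [0.0] + [10.0] * 18 + [3403.0]
kept = trim(sample)
result = trimmed_mean(sample)
```

In this sample one value is removed at each end, so the extreme values 0.0 and 3403.0 no longer influence the mean.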
3.2 Process discovery
Figure 3.2.1 shows the process model that Disco has discovered based on the 47257 incidents that were managed by the business unit EUS. The process model visualizes the flow of the incident distribution process. The arrows show how the incidents were forwarded between the Control group (01 Opened), the Assignment groups (02 First Reassignment group, 03 Second Reassignment group, 04 Third Reassignment group, 05 Fourth Reassignment group, 06 Fifth Reassignment group and 07 Nth Reassignment group), the Resolved group (08 Resolved) and the Closed group (09 Closed). The frequency of the activities is visualized by colour (low frequency = light blue, high frequency = dark blue), by number and by the thickness of the arrows (low frequency = thin arrow, high frequency = thick arrow). We will comment on the process model in Section 3.3.
Figure 3.2.1 Process model
Incident handling process
To visualize how the incidents were handled between the control groups and the resolved groups, we used a social network plugin within the process mining tool ProM. We divided the results into:
- Incidents that were managed (controlled) and resolved by the business unit EUS (Figure 3.2.2);
- Incidents that were managed (controlled) by the business unit EUS and resolved by other business units (Figure 3.2.3).
The size of each circle illustrates the number of incidents that the control group or resolved group handled. The arrows show the relations between the control groups and the resolved groups. The colours are used to distinguish the control groups from the resolved groups.
Figure 3.2.2 Control groups EUS & resolved groups EUS
Figure 3.2.3 Control groups EUS & all resolved groups except EUS
We see that Control group 1 EUS managed most of the incidents that were resolved within the business unit EUS, and also most of the incidents that were resolved by other business units. By generating these social networks (Figure 3.2.2 and Figure 3.2.3) we see how many different types of resolved groups exist. By creating these social networks, BUMs can judge the importance of each relationship and use this information to control the service level of their product.
3.3 Process conformance
The process model gives a good overview of the handling of the incident management process. However, looking at the process model and the data in the event log, we also observe some unusual process activities, namely:
Observation 1
We would not have expected that incidents need to be reassigned to an assignment group after an incident is closed. In each case where this activity occurs, the time difference between the registered activities Closed and Reassignment group is 1 second (e.g. Table 3.3.1). We assume that both activities were executed simultaneously by one resource, but that the system registered them with a small time difference. This activity should not occur, because it gives a wrong picture of the incident distribution and handling process. Therefore we recommend that the SPM examines this observation further.
Table 3.3.1 Example reassignment activity after closed activity
Activity             Resource              Date         Time
Opened               Control group 1 EUS   04.06.2011   23:43:56
Reassignment Group   Assignment group 1    06.06.2011   7:34:20
Resolved             Resolved group 1      06.06.2011   10:01:01
Closed               Closed group 1 EUS    06.06.2011   10:01:29
Reassignment Group   Assignment group 1    06.06.2011   10:01:30
Observation 2
We would not have expected that an incident needs to be reassigned to another assignment group after it is resolved. It seems that incident handlers execute additional tasks after the incident was resolved (e.g. Table 3.3.2). These activities do not influence the service level performance, because the incident is already resolved. However, according to the process model this sort of activity should not be executed. Therefore we recommend that the SPM examines this observation further.
Table 3.3.2 Example reassignment activity after resolved activity
Activity             Resource              Date         Time
Opened               Control group 1 EUS   17.05.2011   0:25:42
Reassignment Group   Assignment group 1    17.05.2011   7:38:59
Resolved             Resolved group 1      25.05.2011   9:00:11
Reassignment Group   Assignment group 2    25.05.2011   9:08:01
Closed               Closed group 1 EUS    25.05.2011   9:08:38
Observation 3
When an incident is closed, it is possible to reopen it, for example when the end user is not satisfied with the solution. However, when an incident is reopened, the incident management system does not resume measuring the elapsed time. As a result, the actual service level time is not registered correctly.
In Table 3.3.3 we can see that the registered elapsed time of a case is 6 hours and 28 minutes. This time is based on the opened activity (16.08.2012 / 8:17:11) and the first resolved activity (16.08.2012 / 14:45:45). Because the incident was reopened, the elapsed time should be measured up until the second resolved activity, which was executed on 30.11.2012 / 9:01:52. Because the incident management system does not register the actual elapsed time, it is likely that more incidents breached the service level time than the event log shows. Therefore we recommend that the SPM examines the cause of this type of occurrence and shows how it affects the service level performance.
Table 3.3.3 Actual elapsed time vs. registered elapsed time
Activity             Resource              Date         Time
Opened               Control group 1 EUS   16.08.2012   8:17:11
Reassignment Group   Assignment group 1    16.08.2012   8:21:12
Resolved             Resolved group 1      16.08.2012   14:45:45
Closed               Closed group 1 EUS    16.08.2012   14:47:18
Reopen               Control group 1 EUS   17.08.2012   12:35:33
Reassignment Group   Assignment group 1    17.08.2012   12:56:50
Reassignment Group   Assignment group 2    23.08.2012   9:08:25
Reassignment Group   Assignment group 3    23.08.2012   9:21:25
Reassignment Group   Assignment group 4    23.08.2012   16:33:00
Resolved             Resolved group 2      30.11.2012   9:01:52
Closed               Closed group 1 EUS    30.11.2012   9:06:22
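The gap between the registered and the actual elapsed time in Table 3.3.3 can be illustrated with a short Python sketch. It uses plain calendar time, which is an assumption: the incident management system measures only the service level window, so the second value is an upper bound rather than the true service level time. The function name is hypothetical.

```python
from datetime import datetime

# Opened, Resolved, Closed and Reopen events of the case in Table 3.3.3
# (the reassignment rows are omitted because they do not affect the result).
trace = [
    ("Opened", datetime(2012, 8, 16, 8, 17, 11)),
    ("Resolved", datetime(2012, 8, 16, 14, 45, 45)),
    ("Closed", datetime(2012, 8, 16, 14, 47, 18)),
    ("Reopen", datetime(2012, 8, 17, 12, 35, 33)),
    ("Resolved", datetime(2012, 11, 30, 9, 1, 52)),
    ("Closed", datetime(2012, 11, 30, 9, 6, 22)),
]


def elapsed_until_resolved(trace, use_last_resolved):
    """Calendar time from Opened to the first or the last Resolved event."""
    opened = next(ts for name, ts in trace if name == "Opened")
    resolved = [ts for name, ts in trace if name == "Resolved"]
    end = resolved[-1] if use_last_resolved else resolved[0]
    return end - opened


registered = elapsed_until_resolved(trace, use_last_resolved=False)
calendar_upper_bound = elapsed_until_resolved(trace, use_last_resolved=True)
```

The first value reproduces the registered 6 hours and 28 minutes; the second spans more than 106 calendar days, which shows how much handling time can remain invisible when the clock is not resumed after a reopen.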
Observation 4
Only the time dimension elapsed time is usable without modifying the original event log data. This is because the incident management system has already calculated the actual service level time. Therefore we cannot measure, for example, the waiting time between the activity opened and the activity first reassignment. In addition, the process mining tool Disco does not provide a filter method that exclusively measures the service level time. To solve this problem we built a formula into the event log that measures only the service level window time.
Observation 5
By using the formula described in Observation 4, we can measure the lead time and two types of waiting time, namely:
- Waiting time between the opened activity and the first reassignment activity;
- Waiting time between the resolved activity and the closed activity.
It is not possible to measure the time dimensions service time and synchronization time, because the event log does not provide data that can be used to measure them. As we cannot measure the service time and synchronization time, it is also difficult to determine the capacity (human resources) needed to cope with the current (or future) workload. Therefore we recommend that the SPM examines whether the incident management system is able to measure the time dimensions service time and synchronization time.
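The two waiting times and the lead time named in Observation 5 can be sketched in Python using the first four events of Table 3.3.1. The sketch measures plain calendar time, which is an assumption: the formula built into the event log counts only the service level window and is not reproduced here. The function names are hypothetical.

```python
from datetime import datetime

# Case from Table 3.3.1, up to and including the Closed activity.
trace = [
    ("Opened", datetime(2011, 6, 4, 23, 43, 56)),
    ("Reassignment Group", datetime(2011, 6, 6, 7, 34, 20)),
    ("Resolved", datetime(2011, 6, 6, 10, 1, 1)),
    ("Closed", datetime(2011, 6, 6, 10, 1, 29)),
]


def first_time(trace, activity):
    """Timestamp of the first occurrence of an activity in the trace."""
    return next(ts for name, ts in trace if name == activity)


def time_between(trace, start_activity, end_activity):
    """Calendar time between the first occurrences of two activities."""
    return first_time(trace, end_activity) - first_time(trace, start_activity)


wait_before_first_reassignment = time_between(trace, "Opened", "Reassignment Group")
wait_before_closure = time_between(trace, "Resolved", "Closed")
lead_time = time_between(trace, "Opened", "Closed")
```

Service time and synchronization time cannot be derived this way, because the event log records only one timestamp per activity and no start and end of the actual work.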
Observation 6
As we have distinguished the assignment groups from each other in Section 3.1, we can also examine the effect on the service level performance when incidents are handled by one or more assignment groups within the business unit EUS. Table 3.3.4 shows how the service level performance (% Resolved in time Business unit EUS) decreases as incidents are handled by more assignment groups. By comparing the % Resolved in time Business unit EUS with the % Average norm, we observe that incidents most likely do not meet the service level norm when they are not resolved by the first reassignment group. This effect is illustrated in Figure 3.3.5. In addition, we observe that more incidents were closed after they had been forwarded to three different assignment groups (Third Reassignment group) than after two assignment groups (Second Reassignment group).
Table 3.3.4 Service level performance
                             # Resolved in time   # Breached   # Total   % Resolved in time Business unit EUS   % Average norm
No Reassignment group                       108            4       112   96.4%                                  85%
First Reassignment group                  13063         2406     15469   84.4%                                  85%
Second Reassignment group                   911          308      1219   74.7%                                  85%
Third Reassignment group                    984          458      1442   68.2%                                  85%
Fourth Reassignment group                   176           86       262   67.2%                                  85%
Fifth Reassignment group                     89           80       169   52.7%                                  85%
Nth Reassignment group                       55           67       122   45.1%                                  85%
Total                                     15386         3409     18795   81.9%                                  85%
These results show how important it is that incidents are assigned to the correct assignment group in order to meet the service level norm. Therefore we recommend that the SPM examines how incidents can be forwarded more efficiently in order to meet the service level norm.
Figure 3.3.5 Service level performance
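The percentages in Table 3.3.4 follow directly from the counts of incidents resolved in time and breached. A minimal Python sketch of this calculation, using the counts from the table and the 85% norm, is given below; the variable and function names are hypothetical.

```python
NORM_PERCENT = 85.0

# (# Resolved in time, # Breached) per group, taken from Table 3.3.4.
counts = {
    "No Reassignment group": (108, 4),
    "First Reassignment group": (13063, 2406),
    "Second Reassignment group": (911, 308),
    "Third Reassignment group": (984, 458),
    "Fourth Reassignment group": (176, 86),
    "Fifth Reassignment group": (89, 80),
    "Nth Reassignment group": (55, 67),
}


def percent_resolved_in_time(resolved_in_time, breached):
    """Share of incidents resolved in time, as a percentage with one decimal."""
    return round(100 * resolved_in_time / (resolved_in_time + breached), 1)


performance = {
    group: percent_resolved_in_time(resolved, breached)
    for group, (resolved, breached) in counts.items()
}
meets_norm = [group for group, pct in performance.items() if pct >= NORM_PERCENT]

total_resolved = sum(resolved for resolved, _ in counts.values())
total_breached = sum(breached for _, breached in counts.values())
overall = percent_resolved_in_time(total_resolved, total_breached)
```

Computed this way, only the group without any reassignment reaches the 85% norm, and the overall figure reproduces the 81.9% in the Total row.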
Answering sub question 1
Which event log data must be used as information to visualize the distribution and handling activities of the incident management process?
To visualize the distribution and handling activities, the following data must be used:
Event log data        Information
Incident number       Case ID
Opened activity       Activity ID
Assignment activity   Activity ID
Resolved activity     Activity ID
Closed activity       Activity ID
Time stamps           Waiting time & total time
Elapsed time          Service level time
Control group         Resource ID
Assignment group      Resource ID
Resolved group        Resource ID
Closed group          Resource ID
While answering sub question 1 we also discovered that, by renaming the assignment groups, we were able to generate social networks that show how the assignment groups interact with each other. By using the data Resolved in time we were able to show the effect on the service level performance when incidents are handled by one or more assignment groups. The recommendations are defined in Section 4.2 and discussed with the SPM in Section 4.3.
3.4 Workload prediction
In this section we will use the answer to sub question 1 to find information that predicts an increasing workload. Based on the information from the event log we will build hypotheses and examine them. For each hypothesis we will explain our assumptions, explain what must be done to examine these assumptions and analyze the results. This hypothesis cycle will be continued until we are able to define which information business unit managers should extract from the event log to be able to predict a workload increase and see what the effects of these increasing workloads are.
Hypothesis 1
We assume that the time dimensions average waiting time until first assignment and total average elapsed time will be affected when the workload increases. We also assume that the number of incidents that breached the service level time will increase when the workload (the number of opened incidents) increases.
Examine hypothesis 1
To examine our hypothesis, we need to count the number of opened and closed incidents and compare these results with the number of incidents that were resolved in time and/or the number of incidents that breached the service level time. In addition we will add the time dimensions and analyze whether a time dimension can be related to an increasing workload.
Observation hypothesis 1
Based on Figure 3.4.1 and Figure 3.4.2 we conclude that we cannot relate a time dimension to an increasing workload. We see that the numbers Resolved in time and Breached are related to the numbers Closed and Resolved, but none of these numbers shows predictive signals that the BUM can act upon.
Figure 3.4.1 Results hypothesis 1
Figure 3.4.2 Results hypothesis 1
To be able to act proactively on an increasing workload, we need to find information within the event log that predicts this increasing workload. In the information visualized in Figure 3.4.1 and Figure 3.4.2 we do not see any warning signals that show that the service level of the product is increasing or decreasing. When the number of Opened incidents increases, this affects the values Closed, Resolved in time and Breached. Because we want to extract information that helps BUMs control the service level of their products, we need information that tells us what the effects are on the incidents that breached the service level time or were resolved in time. Therefore we created the second hypothesis.
Hypothesis 2
We assume that when a business unit is not able to handle an increasing workload, the value Resolved in time will decrease and the value Breached will increase. Therefore we think that the difference between the values Opened and Resolved in time will correlate with the value Breached.
Examine hypothesis 2
To examine our hypothesis we need to subtract the value Resolved in time from the value Opened and analyze the relation between this value (Opened - Resolved in time) and the value Breached. In addition we will show how these results affect the service level performance.
Figure 3.4.3 Results hypothesis 2: relation (Opened - Resolved in time | Breached)
Figure 3.4.4 Results hypothesis 2: service level performance
Observation hypothesis 2
In Figure 3.4.3 we can see that the value Opened - Resolved in time is related to the value Breached. In addition we see that the value Opened - Resolved in time has a predictive character when the workload rapidly increases or decreases. We also see that when the value Opened - Resolved in time increases or decreases, this affects the service level performance in a later time period. To examine how well the values Opened - Resolved in time and Breached correlate (see footnote 4) with each other, we measure the correlation coefficient of the two values.
Correlation (monthly results) between Opened - Resolved in time and Breached: r = 0.74
The correlation coefficient confirms that these two values have a relatively strong relationship.
4 The correlation measures the direction and strength of the linear relationship between two quantitative variables. The correlation (r) is always a number between -1 and 1. Values of r close to -1 or 1 indicate a close linear relationship (Moore & McCabe, 2006).
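The indicator Opened - Resolved in time and its correlation with Breached can be computed as in the Python sketch below. The monthly counts are hypothetical placeholders, not the company's data, and the function names are assumptions; the correlation is the Pearson coefficient r as defined by Moore & McCabe (2006).

```python
from math import sqrt


def workload_indicator(opened, resolved_in_time):
    """Opened minus Resolved in time, per period."""
    return [o - r for o, r in zip(opened, resolved_in_time)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical monthly counts, for illustration only.
opened = [100, 120, 150, 130]
resolved_in_time = [90, 95, 100, 105]
breached = [8, 20, 35, 30]

indicator = workload_indicator(opened, resolved_in_time)
r = pearson_r(indicator, breached)
```

The same functions can be applied to weekly counts, which makes the monthly and weekly correlation coefficients directly comparable.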
Now we will examine hypothesis 2 again to see whether the value Opened - Resolved in time also has a predictive character based on weekly results.
Figure 3.4.5 Weekly results hypothesis 2 (Opened - Resolved in time | Breached)
Figure 3.4.5 Weekly results hypothesis 2 service level performance
Observation hypothesis 2
In Figure 3.4.5 we can see that the value Opened - Resolved in time still shows a predictive character. However, the relation between the two values appears to be weaker. The correlation coefficient based on these weekly results confirms this.
Correlation (weekly results) between Opened - Resolved in time and Breached: r = 0.59
Answering sub question 2
Which information should business unit managers extract from the event log to be able to predict a workload increase and see what the effects of these increasing workloads are?
Based on our examinations, we observed that when the number of incidents that were Resolved in time is subtracted from the number of incidents that were opened, the resulting value has a predictive character with respect to the number of incidents that breached the service level time. In addition we see that when the value Opened - Resolved in time increases or decreases, this affects the service level performance in a later time period. Our results also show that the value Opened - Resolved in time has a stronger predictive character based on monthly results than on weekly results.
According to the SPM, most BUMs focus on the KPI Incidents Resolved in Time. Our research results show that the value Opened - Resolved in time can be used to predict the effect on the service level performance. By using this value, BUMs can act more proactively and can therefore be more in control of their service level.
Observation 7
Because the results of a monthly overview are more accurate than the results of a weekly overview, we recommend that BUMs use the monthly overviews for long-term decision making and use the weekly overviews to see the effects of short-term decisions.
4 Conclusion
4.1 Conclusion
Which information should business unit managers extract from the incident management event log to control the service level of their products?
To visualize the distribution and handling activities, the BUM should extract the following information from the event log of the incident management system and import this information into a process mining tool.
Event log data         Information
Incident number        Case ID
Opened activity        Activity ID
Assignment activity    Activity ID
Resolved activity      Activity ID
Closed activity        Activity ID
Control group          Resource ID
Assignment group       Resource ID
Resolved group         Resource ID
Closed group           Resource ID
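The mapping in this table can be sketched in Python as a conversion from one incident record to one event row per activity, which is the shape most process mining tools expect when importing a CSV file. This is a sketch under assumptions: the record is invented, the field name assigned_on and the group name Service Desk are hypothetical, and the pairing of each activity with one particular group as its resource is our own reading of the table.

```python
import csv
import io

# One hypothetical incident record (NOT the study's data). "assigned_on" is an
# assumed field name; the other keys mirror event log fields from the appendix.
incident = {
    "incident": "INC0001",
    "opened_on": "2012-03-05 09:00", "control_group": "Service Desk",
    "assigned_on": "2012-03-05 09:10", "assignment_group": "EUS",
    "resolved_on": "2012-03-05 12:00", "resolved_group": "EUS",
    "closed_on": "2012-03-06 08:00", "closed_group": "Service Desk",
}

# Assumed pairing of each activity with its timestamp field and resource field.
ACTIVITIES = [
    ("Opened", "opened_on", "control_group"),
    ("Assignment", "assigned_on", "assignment_group"),
    ("Resolved", "resolved_on", "resolved_group"),
    ("Closed", "closed_on", "closed_group"),
]


def to_events(record):
    """Flatten one incident into (case ID, activity, timestamp, resource) rows."""
    return [
        (record["incident"], activity, record[time_field], record[group_field])
        for activity, time_field, group_field in ACTIVITIES
        if record.get(time_field)  # skip activities that have not happened yet
    ]


def to_csv(records):
    """Write all events to CSV text with one row per activity."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Case ID", "Activity", "Timestamp", "Resource"])
    for record in records:
        writer.writerows(to_events(record))
    return buffer.getvalue()


print(to_csv([incident]))
```

Keeping the activity list in one place makes it easy to add further activities, such as a second reassignment, without changing the conversion itself.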
By renaming the assignment groups to first reassignment group, second reassignment group, etc., the BUM can create social networks and visualize how business units interact with each other.
By comparing the number of incidents that were resolved in time with the number of incidents that were closed in the same time period, the BUM can calculate the service level performance percentage (KPI incidents resolved in time). By adding the variable assignment groups, the BUM is also able to see the effect on the service level performance when incidents are handled by one or more business units. The BUM can filter on the data field Product type to focus on the products that fall under his or her responsibility.
By subtracting the number of incidents that were resolved in time from the number of incidents that were opened in that time period, the BUM obtains a predictive indicator that shows how the service level of a product will perform in the future if no action is taken. Our research results show that this predictive indicator is more accurate on a monthly overview than on a weekly overview.
4.2 Recommendations
Based on our observations, the following recommendations are made:
Recommendation 1: During the process conformance phase, we observed that in 80 cases the incident management system registers a wrong time stamp on the closed activity in the event log. This type of occurrence gives a wrong impression of how incidents were handled. Therefore we recommend that the settings of the incident management system be changed, so that BUMs have an accurate view of the incident management distribution process.
Recommendation 2: During the process conformance phase, we observed that in 72 cases an incident handler executes unusual activities in the incident management system between the moment the incident is resolved and the moment it is closed. According to the SPM, the incident management process does not provide for this type of activity. Therefore, we recommend that the SPM examines the cause of this type of activity. The solution can be found in two types of changes:
1. The incident handler must execute this type of activity to be able to close the incident. In this case the SPM needs to change the incident management process model;
2. The incident handler executes an unnecessary activity. In this case the incident handler needs to be briefed on how the incident should be handled within the incident management system.
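A check for the deviation described in Recommendation 2 can be sketched as follows. The traces are invented, the activity names follow the Opened, Assignment, Resolved and Closed activities of the event log, and the function names are assumptions. It is a simple filter on the order of activities, not the conformance checking algorithm of a process mining tool.

```python
# Hypothetical traces (NOT the study's data): case ID -> ordered activity names.
traces = {
    "INC0001": ["Opened", "Assignment", "Resolved", "Closed"],
    "INC0002": ["Opened", "Assignment", "Resolved", "Assignment", "Closed"],
    "INC0003": ["Opened", "Assignment", "Assignment", "Resolved", "Closed"],
}


def activities_between_resolved_and_closed(trace):
    """Return the activities after the first 'Resolved' and before the next 'Closed'."""
    if "Resolved" not in trace:
        return []
    start = trace.index("Resolved") + 1
    try:
        end = trace.index("Closed", start)
    except ValueError:
        return []  # not closed yet, so there is nothing to judge
    return trace[start:end]


def deviating_cases(all_traces):
    """Map each deviating case ID to the unexpected activities it contains."""
    result = {}
    for case_id, trace in all_traces.items():
        extra = activities_between_resolved_and_closed(trace)
        if extra:
            result[case_id] = extra
    return result


print(deviating_cases(traces))
```

The list of deviating cases gives the SPM concrete incidents to review when deciding which of the two changes applies.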
Recommendation 3: The incident management system does not add the extra elapsed time when an incident is reopened after it was closed. This means that there is a relatively high chance that many incidents breached the service level time after they were reopened. This automatically affects the service level performance of a product. Therefore we recommend that the SPM examines the cause of this type of occurrence and shows how it affects the service level performance. If the effects on the service level performance are relatively high, we recommend that the settings of the incident management system be changed, so that BUMs have more accurate information to control the service level of their products.
Recommendation 4: It is difficult to measure the amount of skills (human resources) needed to stay in control of the service level, because the event log does not provide activity data that can be used to measure the time dimensions service time and synchronization time. We recommend that the SPM examines the possibility of measuring the service times. If this is possible, the BUM can compare this information with the number of skills (human resources) and the KPI incidents resolved in time, and calculate the amount of extra skills (human resources) needed when a service level decreases.
Recommendation 5: The performance of the service level rapidly decreases when incidents are handled by two or more assignment groups. It is therefore important to increase the quality of the information with which incidents are registered, so that the incident coordinator knows which business unit must resolve the incident. If this quality can be increased, it will automatically affect the performance of the service level in a positive way. We recommend that the SPM examines how the quality of the information can be increased so that incident handlers can act more efficiently and more effectively.
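To monitor the effect named in Recommendation 5, the KPI incidents resolved in time can be split by the number of assignment groups that handled an incident. The sketch below illustrates this under assumptions: the records are invented and do not reproduce the results of this research, the group names Network and Server are hypothetical, and the function name is our own.

```python
from collections import defaultdict

# Hypothetical closed incidents (NOT the study's data): the assignment groups
# that handled each incident and whether it breached the service level time.
closed_incidents = [
    {"groups": ["EUS"], "breached": False},
    {"groups": ["EUS"], "breached": False},
    {"groups": ["EUS"], "breached": True},
    {"groups": ["EUS", "Network"], "breached": False},
    {"groups": ["EUS", "Network"], "breached": True},
    {"groups": ["EUS", "Network", "Server"], "breached": True},
]


def performance_by_group_count(records):
    """Return {number of distinct assignment groups: % resolved in time}."""
    totals = defaultdict(int)
    in_time = defaultdict(int)
    for rec in records:
        n_groups = len(set(rec["groups"]))
        totals[n_groups] += 1
        if not rec["breached"]:
            in_time[n_groups] += 1
    return {n: 100.0 * in_time[n] / totals[n] for n in sorted(totals)}


for n_groups, pct in performance_by_group_count(closed_incidents).items():
    print(f"{n_groups} assignment group(s): {pct:.0f}% resolved in time")
```

Tracking this split over time would show whether better incident registration reduces the share of incidents that pass through several assignment groups.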
Recommendation 6: By subtracting the number of incidents that were resolved in time from the number of incidents that were opened in that time period, the BUM obtains a predictive indicator that shows how the service level of a product will perform in the future if no action is taken. Because the results of a monthly overview are more accurate than the results of a weekly overview, we recommend that BUMs use the monthly overviews for long term decision making (approximately 4 weeks) and the weekly overviews to see the effects of short term decisions (approximately 1 week).
4.3 Discussion
Based on the conclusions and recommendations, the SPM stated the following:
The research results are very interesting, because now we know that we can use valuable information from the event log of the incident management system to control the performance of our service levels.
Recommendation 1: Although these unusual time stamp registrations will not affect the performance of the service level, it is interesting to see that process mining techniques can visualize these kinds of problems. We always strive to improve our processes, including the systems that support the handling of these processes. Therefore I will ask an expert to examine this problem and change the registration activities when this is possible.
Recommendation 2: This unusual activity also does not affect the performance of the service level. Based on your observation, I would like to know why this activity is executed. Therefore I will ask a process manager to examine this type of activity and make changes when this is needed.
Recommendation 3: It is important that the incident management system registers the absolute elapsed time. Therefore I will examine what percentage of the incidents were reopened after they were closed. If this percentage is significant, then we will look for ways to measure and register the absolute elapsed time within our incident management system.
Recommendation 4: At the moment it is not possible to measure the service time from the incident management system. To measure the service times, we extract data from our Enterprise Resource Planning (ERP) system and compare this information with the incidents that were resolved per employee. These results give us a good estimate of how many extra skills (human resources) are needed. Therefore we do not need to examine the possibilities of measuring the service time from the incident management system.
Recommendation 5: By visualizing the effects on the performance of a service level when incidents are handled by two or more assignment groups, we see that the quality of the information with which incidents are registered must be improved. If we are able to improve the quality of information, incidents will be resolved more quickly. In addition, if the incident coordinators are better able to assign the incidents to the correct assignment group, the workload of other assignment groups will decrease. These effects will increase the performance of the service levels. Therefore I will examine how we can improve the quality of the information with which incidents are registered.
Recommendation 6: The research results show that relatively simple data can be used as a predictive indicator to control our service levels in a proactive way. Unfortunately, we can only use these variables based on judgment. I will investigate whether we can use these variables on different product types. If so, I will inform the BUMs so that they can use these variables and see what the effects are on our service levels.
Appendix
Event log information

Nr.  Name                         Description
1    Agreement ID                 Service level contract number
2    Assignee                     Incident handler who executed the activity
3    Assignment group             Business unit to which the incident was assigned when the activity was handled
4    Breached                     Was the incident resolved in time (True, False)
5    Brief Description            One-line description of the incident problem
6    Calamity                     Did the incident lead to a calamity
9    Closed Group                 Business unit that closed the incident
10   Closed By                    Incident handler who closed the incident
11   Closed on (date/time)        Date and time when the incident was closed
15   Company                      Company that registered the incident
17   Control group                Business unit that was responsible for managing the incident
17   Impact                       Which impact is related to the incident (e.g. Users, Site, Enterprise)
18   Incident                     Registered incident number
19   Incident (type)              - Incident: a (potential) disruption of an agreed service
                                  - Pro-active incident: a (system) message that is not (yet) a disruption of a service
                                  - Information request: a question about a service
                                  - User support: a request to provide user support for a service
20   Linked to problem            If the incident is linked to a problem
21   Norm                         Service level resolve time (e.g. 4 hours, 11 hours, 33 hours, 110 hours)
22   Opened by                    Incident handler who opened the incident
23   Opened on (date/time)        Date and time when the incident was opened
27   Priority                     Which priority the incident got, based on the Impact and Urgency variables (e.g. Low, Standard, High, Major, Critical)
28   Problem Type                 Product type name (e.g. Desktop, Laptop)
31   Resolved group               Assignment group that resolved the incident
32   Resolved by                  Incident handler who resolved the incident
33   Resolved on (date/time)      Date and time on which the incident was resolved
35   SLA title                    Name of the Service Level Agreement
36   SLO end date/time            When was the incident closed (date and time)
38   SLO expiration date/time     When must the incident be resolved (date and time)
40   SLO name                     Service Level Object name
41   SLO start date               Date and time when the incident was opened
44   Suspended                    Is the incident currently suspended? (True/False)
45   Ticket status                Closed, Open
46   Urgency                      Which urgency variable the incident got (e.g. Low, Normal, Major)
47   Elapsed time                 Measured time between open activity and resolved activity