IMPLEMENTING A SOFTWARE PROBLEM MANAGEMENT MODEL, A CASE STUDY

Jäntti Marko, Miettinen Aki
University of Kuopio, Department of Computer Science, P.O.B 1627, FIN-70211 Kuopio, Finland mjantti@cs.uku.fi

Abstract. The primary goal of software problem management is to minimize the impact of problems on the business and to identify the root cause of problems. At present, many organizations are planning to implement a problem management model that is compliant with the IT Infrastructure Library (ITIL) framework. However, the ITIL framework is a heavy standard with a large number of difficult concepts. IT organizations need practical guidelines to be able to implement ITIL-based processes. The purpose of this study is to provide a checklist of issues that are essential for implementing the problem management process. The research question in this paper is: what are the requirements for implementing a software problem management model? A case study research method is used to evaluate the requirements.

Introduction
Software problem management is an important activity within software support and maintenance processes (ISO/IEC 1995). Both IT companies and IT customers need systematic approaches for managing software problems such as unavailability of IT services, software failures, poor performance, and poor usability. According to a recent IT service management study, many organizations are planning to implement the problem management process in the near future and consider it one of the most important process development targets (Materna 2005). Hence, we focus our research on software problem management and examine how IT service provider organizations can use problem management methods to improve the quality of IT services.
Most studies in this research area can be classified as either 1) software defect management studies or 2) software problem management studies. Traditional defect management studies (e.g. Hirmanpour and Schofield 2003, Leszak et al. 2000, Frederick and Basili 1998, Mays 1990, Quality Assurance Institute 1995) focus solely on software defect management. The main goal of defect management is to detect and remove defects early in the software life cycle (Florac 1992). Defect management is also one of the key process areas of the Capability Maturity Model (CMM) (Jalote 2000).
Problem management studies belong to the second category and are typically based on an IT service management framework such as the IT Infrastructure Library (ITIL), which was developed at the end of the 1980s by the British Government (Sallé 2004, OGC(1) 2002). ITIL provides guidance for IT service management, including a systematic framework for managing problems in IT services. IT services can be, for example, customer-tailored software projects by system integrators, ASP services, consultation, training, and hosting services. Many organizations have started to "ITILize" their old service management processes because ITIL has become the worldwide de facto standard in service management (OGC(1) 2002, Hochstein 2005). However, introducing the ITIL framework might be difficult because it includes many complex service management concepts.
This study continues our previous work, in which we identified the following difficulties regarding defect management models: the defect/problem management process is seldom company-wide; project teams use different problem management methods, which makes problem data analysis difficult; there are limited resources for fixing defects and problems; good metrics are needed for IT service problem management; customers complain that they do not receive problem resolution reports from IT providers and that the IT providers do not admit that there are bugs in their software; and customers want a single point of contact service instead of a "this is not our bug" service. A good problem management model should attempt to solve the above-mentioned problems.
The main problem is that organizations do not have enough information or experience to implement a problem management process that meets the requirements of the ITIL framework. One additional challenge is how to automate the problem management process with a tool. In this paper, we provide a list of requirements that help organizations to implement the software problem management process. The general research question in this paper is: what are the requirements for an ITIL-based problem management model? We examine the problem management process of an IT service provider company.
The rest of the paper is structured as follows. The second section describes the key issues required in problem management. The third section presents the research methods used in this study. The fourth section presents findings from a case study, and finally the discussion and conclusions are presented.

Requirements for a Software Problem Management Model
Software engineering standards use various definitions for software problems. The problems are called defects, faults, anomalies, errors, bugs, and so on. In the IEEE Standard Dictionary, defects are defined as product anomalies, for example "omissions and imperfections found during early life cycle phases and faults contained in software sufficiently mature for test or operation" (IEEE 1989). According to the IEEE standard, an anomaly is "any condition that deviates from expectations based on requirements specifications, design documents, user documents, standards, etc., or from someone's perceptions or experiences. Anomalies may be found during, but not limited to, the review, test, analysis, compilation, or use of software products or applicable documentation" (IEEE 1994). The Quality Assurance Institute (1995) has defined a defect as "an instance of one or more baselined product components not satisfying their given set of requirements". Additionally, a fault can be defined as "a manifested inability of a system or component to perform a required function within specified limits" (Binder 1999).
An ITIL-based problem management process focuses on minimizing the impact of incidents and problems on the business. In this paper we focus on identifying requirements for the implementation of a software problem management model. The following requirements were derived from the research literature.
Requirement 1. Establish a Service Desk. IT organizations should implement a service desk that is a single point of contact for users of IT services (Kajko-Mattsson 2003). The service desk is responsible for collecting all the incidents. The goal is for the service desk to resolve as many incidents as possible and to achieve a good first-time fix rate.
Requirement 2. Define the lifecycle for incidents (OGC 2002). The typical lifecycle is 1) Incident, 2) Problem, 3) Known error, and 4) Request for Change (RFC). An incident can be defined as "any event which is not part of the standard operation of a service and which causes, or may cause, an interruption to, or a reduction in, the quality of that service". An incident does not necessarily lead to a problem or a defect; it can also be a service request (for example, a user needs instructions or advice). A problem is "an unknown underlying cause of one or more incidents". A known error is "an incident or problem for which the root cause is known and for which a temporary Work-around or a permanent alternative has been identified". There should be a traceable chain between incidents, problems, known errors and change requests.
Requirement 3. Identify two different dimensions of problem management: 1) proactive and 2) reactive problem management. The purpose of proactive problem management is to identify and resolve problems and known errors before any incident related to them occurs (ITILPeople 2005). Reactive problem control focuses on identifying the underlying cause of reported incidents.
Requirement 4. Establish a problem management repository and a knowledge base. The ITIL framework recommends that problem records are stored in the Configuration Management Database (CMDB), but usually a CMDB is implemented as a separate database. A knowledge base is a database that contains information on problems, known errors, and their resolutions (Davis 2002).
Requirement 5. Establish the problem control activity as follows (OGC 2002). Problem control begins when the analysis of incident data reveals repetitive incidents, or when the analysed incident does not match any previously recorded problem or known error. Additionally, incidents that are classified as very serious and significant are sent directly to problem control. The process may also start when a problem is discovered in the infrastructure.
A) In the first phase of problem control, the problem management team identifies and records the problem. The problem record needs to be linked to the appropriate incident records. Hence, the problem solution or the work-around can be linked to similar incidents or problems in the future. The identification process also includes linking configuration items (CIs) with the problem.
B) Classify and categorize the problem as follows. In the second phase, the problem is classified by category, impact, urgency and priority.
The possible categories may be, for example, network, hardware, operating system, or software. The impact of a problem is its analysed effect on the business. Priority should be based on the urgency and the impact of the problem.
C) The third phase of problem control is investigation and diagnosis, which aims to find the underlying cause of the problem (Zhen 2005). The linked incidents and their work-arounds are analysed here. The investigation may show that the problem is not associated with any configuration item (CI) currently in the CMDB but is procedural; for example, the problem might be insufficient testing. If the cause of the problem relates to a registered CI, the status of the problem is changed to known error and error control handles it.
Requirement 6. Establish the error control activity as follows. The purpose of the error control process is to correct known errors by making changes to the infrastructure. The process works in cooperation with change management.
A) The first phase of error control begins by detecting a faulty CI or a CI that might cause an incident. This can be done using the known error data. This data is produced by either the development environment or the live environment. The known errors from the development environment are the errors already known in the development phase of the service or the product. The live environment errors are the errors discovered when the service or the product is already in operation.
B) In the second phase of error control, the possible means of resolving the error are assessed. If necessary, a request for change is generated. Priority is based on the urgency and the impact of the error.
C) In the third phase, the request for change is linked to the known error record to maintain the traceability chain described above. The impact analysis, detailed error assessment, testing and the final resolution of the error are escalated to change management. The resolution process for known errors has to be recorded in the system. All the data, including CIs, symptoms, and resolution, are stored in the known error database so that they can help in future investigations of incidents and problems. Finally, after a successful resolution, all the relevant known error, incident and problem records are closed.
Requirement 7. Define appropriate metrics for monitoring the problem management process. Metrics could be, for example, time-based performance metrics, the quality of resolution of incidents/problems (Litten 2004), and the number of incidents and problems classified by impact, status, service, or user group (OGC 2002).
Requirement 8. Monitor the problem management process. Problem management should continuously monitor the problem resolution process and the impact of problems and errors on users or customers. Problem management should be aware of this progress although change management is responsible for some parts of the resolution. The monitoring should be done against the service level agreement (SLA), the written agreement between the service provider and the customer (Kajko-Mattsson et al. 2004). Usually the SLA defines the maximum number of errors per period or service availability requirements. An SLA might also define penalties for breached service levels (OGC(2) 2002).
Requirement 9. Generate a request for change (RFC) to the change management team to implement the permanent resolution for the problem. Ensure that teams use standard methods to create RFCs (Dietel 2004).
Requirement 10. Continuously improve the problem management process. Improvement actions could include training of the service support staff, the development of problem management tools, frequent service reviews and inspections (Gilb and Graham 1993, Ebenau and Strauss 1994) with customers and third party service providers, and the continuous development and evaluation of working methods.
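The traceable chain of Requirement 2 and the classification step of Requirement 5 can be illustrated with a minimal Python sketch. It is not taken from ITIL or from the case organization's support tool: the three-step Level scale, the compute_priority rule, and the ProblemRecord fields are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Level(IntEnum):
    """Three-step scale assumed here for both impact and urgency."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def compute_priority(impact: Level, urgency: Level) -> int:
    """Hypothetical rule: 1 is the highest priority class, 5 the lowest."""
    return 7 - (int(impact) + int(urgency))


@dataclass
class ProblemRecord:
    problem_id: str
    category: str  # e.g. network, hardware, operating system, software
    impact: Level
    urgency: Level
    incident_ids: List[str] = field(default_factory=list)
    ci_ids: List[str] = field(default_factory=list)
    status: str = "problem"
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    rfc_id: Optional[str] = None

    @property
    def priority(self) -> int:
        # Priority is derived from impact and urgency, never stored separately.
        return compute_priority(self.impact, self.urgency)

    def link_incident(self, incident_id: str) -> None:
        """Link an incident record to this problem (no duplicates)."""
        if incident_id not in self.incident_ids:
            self.incident_ids.append(incident_id)

    def mark_known_error(self, root_cause: str, ci_id: str,
                         workaround: Optional[str] = None) -> None:
        """Root cause traced to a registered CI: the problem becomes a known error."""
        self.root_cause = root_cause
        if ci_id not in self.ci_ids:
            self.ci_ids.append(ci_id)
        self.workaround = workaround
        self.status = "known error"

    def raise_rfc(self, rfc_id: str) -> None:
        """Link a request for change; only allowed once the error is known."""
        if self.status != "known error":
            raise ValueError("an RFC is raised only for a known error")
        self.rfc_id = rfc_id

    def trace(self) -> dict:
        """Return the chain incident(s) -> problem -> known error -> RFC."""
        return {
            "incidents": list(self.incident_ids),
            "problem": self.problem_id,
            "known_error": self.status == "known error",
            "rfc": self.rfc_id,
        }


if __name__ == "__main__":
    record = ProblemRecord("P-1", "software", Level.HIGH, Level.MEDIUM)
    record.link_incident("I-1")
    record.mark_known_error("faulty release of a billing module", "CI-42")
    record.raise_rfc("RFC-7")
    print(record.priority, record.trace())
```

Deriving priority from impact and urgency, instead of storing it as a free field, keeps the classification consistent with Requirement 5; the concrete scale and formula would have to be agreed on by each organization.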

Methods
This case study is a part of the ongoing research project SOSE (Service Oriented Software Engineering) at the University of Kuopio, Finland. One objective of the SOSE project is to research methods for improving the quality of software development. This paper focuses on software problem management. The research question in this paper is: what are the requirements for implementing a software problem management model? According to Yin (1989) case studies can be categorized into exploratory, explanatory and descriptive case studies. An exploratory approach was used in this study. First, we present a list of key issues (requirements) required in problem management. Second, we analyze how these key issues of problem management fit the case organization. The case study included the following questions:
• Who are the stakeholders involved in the problem management process?
• What kind of problem management methods are used by the case organization?
• What kind of metrics are used within the process?
• What kind of challenges are related to problem management?
Informal interviews were the main source of evidence in this study. The qualitative data was collected during the problem management pilot project (February-March 2006). Our case organization is a large IT service company with over 15 000 employees. It supplies information systems to various industries, such as banking and insurance, energy, telecom and media, and healthcare. The business unit where this study was performed develops and maintains customer information systems and energy data management systems. The case organization was selected for this study because software problem management plays a very important role for it and because it is interested in adopting ITIL-based problem management methods.
The data was collected through participative observation in support and maintenance team meetings and through informal interviews with service desk workers, a service support manager, a problem manager, and a system analyst. Additionally, some challenges regarding problem management were gathered during an ITIL training session provided by the first author. Because we had access to the support tool and the problem database, we could also identify several tool-related difficulties. The persons who participated in support team meetings and training sessions hold different roles in the organization (product delivery, product support, configuration management). The researcher's role in support team meetings was to participate in discussions and to record their results.
A within-case analysis method was used in this study (Eisenhardt 1989). We consider the requirement checklist as an analysis framework. Our framework is a literature-based ideal process. Data analysis focused on how close the case organization is to the ideal process.

Problem Management Process: Main Findings from a Case Organization
In this section, we explore how the case organization's existing problem management process meets the requirements of the ITIL-based problem management model presented in Section 2. The major stakeholders (see Figure 1) involved in problem management in the case organization are the service desk teams, the product support team and the product development teams. There are different teams for different products. The case organization also uses third party service providers, for example, server and database providers.

Figure 1. Stakeholders involved in the software problem management.

Table 1 describes our analysis regarding the problem management process of the case organization.
| Requirements | Implemented | Case Organization |
|---|---|---|
| 1. Establish a Service Desk | Yes | Both an internal and an external service desk exist. |
| 2. Define the lifecycle of Incidents | Partially | The terminology is different than in ITIL (for example, the term "known error" is not used). Types of incidents are pure incidents that might lead to problems, development ideas, change requests and change orders. The incident category "service request" is not used. |
| 3. Identify two dimensions of Problem Management: a) Reactive and b) Proactive | Partially | PM methods are mainly reactive: reactive problem management means in this case resolving reported incidents. However, some teams also use proactive problem management methods such as an FAQ function. |
| 4. Establish the problem management repository | Yes | A knowledge base is not being used. The organisation uses a support tool to manage incidents, problems, RFCs and known errors. |
| 5. Establish the problem control activity: a) Identify and record the problem, b) Classify and categorize the problem, c) Investigate the problem | Yes | The service desk collects incidents reported by customers. The incident is assigned to the product support team if the service desk cannot resolve it. Therefore, the problem control activity is performed by product support teams. |
| 6. Establish the error control activity: a) Identify and record the error, b) Error assessment, c) Record error resolution | Partially | Product development teams are responsible for error control activities and for correcting defects and recording information on the type of the fault, the cause of the fault, the time to resolve the fault, and the resolution of the fault. |
| 7. Define appropriate metrics for monitoring the problem management process | Yes | Cases per period metrics are used, such as the ratio of open cases and closed cases per month. Time-based process performance metrics are not used. |
| 8. Monitor the problem management process | Partially | Currently, there are no service level requirements defined for incident and problem resolution. |
| 9. Generate a request for change for change management | No | There is no Change Advisory Board in the current process. Product development teams usually make decisions whether changes are needed. |
| 10. Continuously improve the problem management process | Yes | The case organization has recognized the need for continuous process improvement. |

Table 1. The problem management process of an energy company

The Challenges Related to ITIL-based Processes
This section presents our findings regarding the challenges of ITIL-based processes. Table 2 describes the ITIL process area, the goal for the process area, and challenges that were identified.

| ITIL Process Area | Goal |
|---|---|
| Incident Management | To restore normal service operation (energy delivery) as quickly as possible |
| Problem Management | To detect underlying causes of incidents. |
| Release/Configuration Management | Maintain the Configuration Management Database and the information about configuration items. |
| Service Level Management | Maintain and negotiate Service Level Agreements and Operational Level Agreements (such as agreements between business units within the case organization) |
| Availability Management | To ensure IT service availability |

Challenges in the process
- The lack of time-based performance metrics such as incident turnaround times.
- There are a lot of duplicate incidents recorded in the database. The service desk staff is not trained to merge or relate incidents.
- Customers cannot use products as search criteria for incidents.
- The known error concept is not visible in the current problem management process.
- There is no knowledge base available for known errors.
- There is no problem category for errors in third party products such as bugs in database applications.
- Problem records include many data fields that are seldom used.
- The connection between testing and support is unclear (reported problems and errors should have links to test cases).
- Incidents and problems cannot be targeted to hardware configuration items such as servers.
- Problem management is not connected to service level management (there are no service level requirements defined for problem resolution).
- It is difficult to close several incidents and problems with one product release because many customers use customized product versions.
- It is difficult to define an appropriate frequency for delivering bug fixes to different customers.
- The organization does not have a Service Level Manager.
- A lack of Service Level Agreement templates and SLM metrics: What is the number of breached/challenged SLAs per period? What percentage of services is covered by SLAs? How to ensure the availability of the online support site?

Table 2. The challenges related to problem management and its neighbour processes
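Several of the challenges above concern missing measurements: incident turnaround times, the number of breached SLAs per period, and the share of services covered by SLAs. The following Python sketch shows one possible way to compute such figures. It is an illustration under stated assumptions, not the case organization's tooling: the (opened, closed) case representation, the function names, and the target_hours threshold are hypothetical, and the case organization had no service level requirements defined at the time of the study.

```python
from datetime import datetime
from typing import List, Optional, Set, Tuple

# A case is an (opened, closed) pair; closed is None while the case is open.
Case = Tuple[datetime, Optional[datetime]]


def turnaround_hours(opened: datetime, closed: datetime) -> float:
    """Elapsed time between opening and closing a case, in hours."""
    return (closed - opened).total_seconds() / 3600.0


def mean_turnaround_hours(cases: List[Case]) -> Optional[float]:
    """Mean turnaround of the closed cases, or None if nothing is closed yet."""
    hours = [turnaround_hours(o, c) for o, c in cases if c is not None]
    return sum(hours) / len(hours) if hours else None


def breached_count(cases: List[Case], target_hours: float) -> int:
    """Number of closed cases whose turnaround exceeded the assumed target."""
    return sum(
        1 for o, c in cases
        if c is not None and turnaround_hours(o, c) > target_hours
    )


def sla_coverage_percent(services: Set[str], covered: Set[str]) -> float:
    """Percentage of the listed services that have an SLA."""
    if not services:
        return 0.0
    return 100.0 * len(services & covered) / len(services)


if __name__ == "__main__":
    cases: List[Case] = [
        (datetime(2006, 2, 1, 8, 0), datetime(2006, 2, 1, 12, 0)),
        (datetime(2006, 2, 2, 8, 0), datetime(2006, 2, 3, 8, 0)),
        (datetime(2006, 2, 3, 8, 0), None),  # still open
    ]
    print(mean_turnaround_hours(cases))
    print(breached_count(cases, target_hours=8.0))
    print(sla_coverage_percent({"billing", "metering"}, {"billing"}))
```

Open cases are excluded from the turnaround figures in this sketch; an organization might instead count their age so far, which would make a growing backlog visible earlier.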

Analysis of the Findings
The problem manager and the service support manager of the case organization identified two major improvement actions for software problem management: 1) reducing the growing number of open incidents and problems by focusing on proactive problem management methods and 2) creating service level agreements with IT customers to improve IT service quality.
Firstly, the case organization mainly uses reactive problem management methods to solve incidents reported by customers. In the long run, the organization has to focus on proactive problem management to be able to manage a large number of incidents and problems. A knowledge base might help as a proactive problem management method in this case. Secondly, the case organization needs to implement a service level management process and establish the role of a service level manager. Service level agreements (SLAs) are very useful for monitoring the quality of IT services. Both service providers and customers can use SLAs to monitor the availability, quality, usability, and performance of a service and to ensure that critical IT services are available.
The case organization has allocated a lot of resources to process improvement, such as adopting the problem management concepts of ITIL. It already has a well-organized service desk. The service interface between the case organization and its customers is based on the service desk and the online support site. The problem control activity is performed by product support or back office teams. Product development teams are responsible for error control, change management, and product development. In the future, the case organization is planning to implement a Change Advisory Board that would be responsible for approving all change requests. As a strength, the case organization has clearly defined processes, such as incident and problem management processes based on ITIL principles, in its business framework WayToExcellence.
The transition from the current process to the ITIL-based problem management process has caused several challenges. Combining ITIL-based problem management concepts with the organization's existing problem management process has been a challenge. A knowledge base function is under construction, and the support tool needs configuration work before it can be used to measure time-based performance data such as problem resolution times. New data fields need to be added to problem records, such as a problem category that helps the service desk and customers to find cases more rapidly. ITIL-based processes seem to be designed for large organizations. In practice, one person must hold several ITIL responsibility areas. In our case, the same person held the roles of problem manager and change manager. The release and configuration management roles were also assigned to one person. According to our observations, the customers of the case organization are also very interested in ITIL-based process improvement.
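The knowledge base and the additional problem category field discussed above can be sketched as a small searchable collection of known errors. The Python example below is only an illustration of the idea: the KnownError fields, the search_known_errors function, and the sample product and category values are hypothetical and do not describe the knowledge base that the case organization was building.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class KnownError:
    error_id: str
    product: str
    category: str   # e.g. "software" or "third party product"
    symptoms: str
    workaround: str


def search_known_errors(
    entries: Iterable[KnownError],
    product: Optional[str] = None,
    category: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[KnownError]:
    """Return entries matching every criterion that was given (case-insensitive)."""

    def matches(entry: KnownError) -> bool:
        if product is not None and entry.product.lower() != product.lower():
            return False
        if category is not None and entry.category.lower() != category.lower():
            return False
        if keyword is not None and keyword.lower() not in entry.symptoms.lower():
            return False
        return True

    return [entry for entry in entries if matches(entry)]


if __name__ == "__main__":
    kb = [
        KnownError("KE-1", "ProductA", "software",
                   "Invoice report shows wrong totals", "Regenerate the report"),
        KnownError("KE-2", "ProductA", "third party product",
                   "Database connection times out", "Restart the connection pool"),
        KnownError("KE-3", "ProductB", "software",
                   "Login page times out", "Clear the session cache"),
    ]
    for hit in search_known_errors(kb, product="producta", keyword="times out"):
        print(hit.error_id, hit.workaround)
```

Allowing product and category as search criteria addresses two of the challenges listed in Table 2: customers could not search incidents by product, and there was no category for errors in third party products.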

Discussion and Conclusions
This study aimed to explore the requirements for an ITIL-based problem management model. First, we presented a problem management checklist with ten process-related requirements. Second, we described the current problem management process of the case organization (an IT service provider). Finally, we analyzed how the case organization's existing problem management process meets the requirements of the ITIL-based problem management model. The main contribution of this study lies in helping IT organizations to identify the key issues required for the ITIL-based problem management model. These requirements are needed in implementing the transition from the current problem management model to the ITIL-based model. However, the requirement checklist we presented is not exhaustive. More research efforts are needed to explore proactive problem management methods.
As with all case studies, there are threats to the validity of this study. First, construct validity is problematic in case study research. Data for a case study should be collected from several sources. In order to get a richer view of problem management, we need to interview more members of the service desk and product support teams. Second, there is a threat to external validity, that is, the generalizability of the results. The results presented in this paper are valid only in our case organization. In future studies we intend to improve our research framework by exploring the introduction of a knowledge base as a part of the problem management framework.
This study also increases understanding of the importance of building a software problem management model and gives a general overview of the current methods used within problem management. A systematic problem management model adds value for both software companies and their customers.
Reactive and proactive problem management methods are used to minimize the impact of a problem on the business and prevent problems before they occur. Hence, problem management can be used as a way to improve the customer satisfaction.

Acknowledgments
This paper is based on research in the SOSE project (2004-2006), funded by TEKES (the National Technology Agency), European Regional Development Fund (ERDF), ICT and customer companies in the electricity domain. We wish to thank Professor Anne Eerola for her comments, research assistant Niko Pylkkänen for his help in data collection and people in TietoEnator for participating in interviews.

References
Benbasat, I., Goldstein, D. K. and M. Mead (1987). The Case Research Strategy in Studies of Information Systems. MIS Quarterly (11:3), pp. 369-386.
Binder, D. (2000). Testing Object Oriented Systems. Addison-Wesley.
Boardman, B. (2005). IT Best Practices. Network Computing, vol. 16, pp. 79.
Card, D. N. (1998). Learning from Our Mistakes with Defect Causal Analysis. IEEE Software, January-February.
Davis, K. (2002). Charting a knowledge base solution: empowering student-employees and delivering expert answers. In Proceedings of the 30th Annual ACM SIGUCCS Conference on User Services (Providence, Rhode Island, USA, November 20 - 23, 2002). SIGUCCS '02. ACM Press, New York, NY, 236-239.
Dietel, K. (2004). Mastering IT change management step two: moving from ignorant anarchy to informed anarchy. In Proceedings of the 32nd Annual ACM SIGUCCS Conference on User Services (Baltimore, MD, USA, October 10 - 13, 2004). SIGUCCS '04. ACM Press, New York, NY, 188-190.
Ebenau, R.G. and S.H. Strauss (1994). Software Inspection Process. New York, NY: McGraw-Hill.
Eisenhardt, K. (1989). Building Theories from Case Study Research. Academy of Management Review, Vol. 14:4, pp. 532-550.
Florac, W. A. (1992). Software Quality Measurement: A Framework for Counting Problems and Defects. Technical Report, CMU/SEI-92-TR-022, The Software Engineering Institute, Carnegie Mellon University.
Frederick, M. and V. Basili (1998). Using Defect Tracking and Analysis to Improve Software Quality, US Air Force Research Laboratory, DACS State-of-the-Art Report SP0700-98-D4000.
Gilb, T. and D. Graham (1993). Software Inspection. Addison-Wesley.
Hirmanpour, I. and J. Schofield (2003). Defect Management through the Personal Software Process. Article in Crosstalk, The Journal of Defense Software Engineering.
Hochstein, A., Tamm, G. and W. Brenner (2005). Service-Oriented IT Management: Benefit, Cost and Success Factors. In Proceedings of the Thirteenth European Conference on Information Systems (Bartmann D, Rajola F, Kallinikos J, Avison D, Winter R, Ein-Dor P, Becker J, Bodendorf F, Weinhardt C eds.), Regensburg, Germany.
IEEE (1989). IEEE Standard Dictionary of Measures to Produce Reliable Software, ANSI/IEEE Standard 982.1-1988, p. 13.
IEEE (1994). IEEE Standard Classification for Software Anomalies, IEEE Standard 1044-1993, p. 3.
ISO/IEC (1995). ISO/IEC 12207, Information Technology: Software Life-Cycle Processes. ISO/IEC Copyright Office.
ITILPeople.com (2005). What is ITIL? Retrieved November 11, 2005, from http://www.itilpeople.com/What%20is%20ITIL.htm.
Jacobson, I., Booch, G. and J. Rumbaugh (1999). The Unified Software Development Process. Addison-Wesley.
Jalote, P. (2000). CMM in Practice: Processes for Executing Software Projects at Infosys. Addison-Wesley.
Jäntti, M. and Toroi, T. (2004). UML-based Testing: A Case Study. Proceedings of 2nd Nordic Workshop on the Unified Modeling Language (Turku, Finland, August 19-20, 2004).
Kajko-Mattsson, M. (1998). A conceptual model of software maintenance. In Proceedings of the 20th International Conference on Software Engineering (Kyoto, Japan, April 19 - 25, 1998). International Conference on Software Engineering. IEEE Computer Society, Washington, DC, 422-425.
Kajko-Mattsson, M. (2003). Infrastructures of Virtual IT Enterprises. In Proceedings of the 19th IEEE International Conference on Software Maintenance (Amsterdam, The Netherlands, September 22-26, 2003). International Conference on Software Maintenance. IEEE Computer Society, Washington, DC, 199-208.
Kajko-Mattsson, M., Ahnlund, C., and Lundberg, E. (2004). CM3: Service Level Agreement. In Proceedings of the 20th IEEE International Conference on Software Maintenance (September 11 - 14, 2004). ICSM. IEEE Computer Society, Washington, DC, 432-436.
Kruchten, P. (2001). The Rational Unified Process, an introduction. Addison-Wesley.
Leszak, M., Perry, D. E., Stoll, D. (2000). A case study in root cause defect analysis. Proceedings of the 22nd International Conference on Software Engineering, June.
Litten, K. (2004). IT Service Management: Selecting the Right Metrics for Performance Measurement, INS Whitepaper. Retrieved November 10, 2005, from http://www.ins.com/knowledge/whitepapers.asp.
Materna Finland Oy (2005). ITSMF Research. Retrieved November 7, 2005, from http://www.materna.de/FI/Home/.
Mays, R.G. (1990). Experiences with Defect Prevention. IBM Systems Journal, Vol 29 No. 1.
Office of Government Commerce (1) (2002). ITIL Service Support. The Stationery Office, UK, Ref. use in text, OGC(1).
Office of Government Commerce (2) (2002). ITIL Service Delivery. The Stationery Office, UK, Ref. use in text, OGC(2).
Pink Elephant (2004). ITIL Process Maturity, Pink Elephant Whitepaper. Retrieved November 8, 2005, from http://www.pinkelephant.com/en-US/ResourceCenter/PinkPapers/PinkPapersList.htm
Quality Assurance Institute (1995). Establishing A Software Defect Management Process. Research Report number 8.
Sallé, M. (2004). IT Service Management and IT Governance: Review, Comparative Analysis and their Impact on Utility Computing. HP Technical Report, June 2.
Yin, R. K. (2002). Case Study Research, Design and Methods, 3rd ed. Newbury Park, Sage Publications.
Zhen, J. (2005). IT Needs Help Finding Root Causes. Computerworld, vol. 39, pp. 26.
