


Software Improvement through Benchmarking: Case Study Results
Dr. Hans Sassenburg, Dr. Lucian Voinea, Peter Wijnhoven

Abstract Since the early 1990s, many organizations have invested substantially in software process improvement. Starting in the military industry, the concept of process improvement has nowadays been widely adopted in many other industry segments. It is one of the few initiatives that has been sustained over time, in contrast to many hypes. Available models and standards help to define improved processes not only on paper, but also to institutionalize them in the daily way of working. However, a justified and often raised question is what the payoff is. Does the quality of products increase? Has efficiency improved? Are products being brought to the market faster? And an overall question: compared to what? Benchmarking is a technique that uses external comparisons to better evaluate real capability and identify possible actions for the future. As such, it is an important instrument to drive improvement efforts. Using a best practice set of Key Performance Indicators to benchmark capability in several industrial case studies, no strong correlation could be found between real capability and maturity levels. Satisfying models or standards is no guarantee for real performance improvements. It is recommended to focus on a multi-dimensional assessment of the capability of an organization and to derive improvements from benchmarked results.

Keywords Benchmarking, software process improvement, Key Performance Indicator, metrics, capability, performance.

1 Introduction
We manage things "by the numbers" in many aspects of our lives. These numbers give us insight and help steer our actions. Software metrics extend the concept of "managing by the numbers" into the realm of software development. The software industry still isn't doing a very good job of managing by the numbers. Intuition prevails where numbers should be used. Most projects are still managed by three indicators only: scheduled deadline, overall budget and removal of critical defects towards the end of the project. This is a narrow view of a multi-dimensional problem. Compare it with a contestant in a Formula 1 race looking only at the fuel gauge and speedometer: neglecting oil pressure, tyre pressure, fuel stop planning, weather conditions, and many other variables will definitely cause you to lose the race. Successful (software) organizations have found six measurement-related objectives extremely valuable [Sassenburg 2006]:
• Knowing the capability of one's organization through the analysis of historical project data. In addition, one's own capability may be benchmarked against industry averages.
• Making credible commitments in terms of what will be delivered when, against what cost. This involves project estimation based on known capability and analyzed requirements.
• Investigating ways to optimize project objectives (on dimensions like schedule or cost). This involves developing and comparing different project alternatives.

• Managing development once it starts. This involves project management, but more than generating simple PERT and Gantt charts.
• Deciding when a product can be released. This is a trade-off between an early release to capture the benefits of an earlier market introduction, and the deferral of product release to enhance functionality or improve quality.
• Analyzing the impact of new initiatives by assessing how capability is affected in which areas. This prevents organizations from chasing hypes.

Being able to meet these objectives requires an implemented measurement process that converts measured process and product attributes to meaningful management information, hereby enabling informed decision-making. Within a project or organization, it is often easy to get people enthused about metrics. But all too often, this enthusiasm does not translate into action. Even when it does, it is unlikely to be sustained, and people might get lost in incomplete details. Getting too little or too much data is easy; identifying the relevant data and converting it to meaningful information for everyone is the challenge. Management needs the ability to step back from the details and see the bigger picture. Dashboards with the right information perform that function. They should support answering the questions listed in Table 1.

Table 1: Typical management questions to answer.

  Category             Typical questions
  Project performance  How predictable is the performance of projects?
  Process efficiency   How efficient is the development process?
  Product scope        How large and stable is the scope of the planned effort?
  Product quality      What is the quality of the resulting outcome/product?

2 Best Practice KPIs
The critical success factor here is defining the appropriate Key Performance Indicators (KPIs) in each category. Based on research and industrial experience, a coherent set of KPIs has been selected that answers the questions of Table 1. These KPIs represent current best practice in industry [Sassenburg 2009]. The goal of these KPIs is to foster greater visibility and faster reaction to opportunities and threats. Efforts undertaken at improving development capability should have demonstrable effects on each of these KPIs, as indicated in the example in the rightmost column of Table 2.

Table 2: Overview of the best practice KPI set [Sassenburg 2009].

  Category             Typical Key Performance Indicators                             Effect
  Project performance  Schedule                                                       +
                       Effort                                                         +
                       Staffing rate (manpower build-up profile [Putnam 1992, 1997])  +
                       Productivity (LOC/hour or other ratio)                         +
  Process efficiency   Core activities (% of total effort)                            +
                       Support activities (% of total effort)                         +
                       Prevention activities (% of total effort)                      +
                       Appraisal/rework activities (% of total effort)                +
  Product scope        Number of features                                             +
                       Percentage of deferred features                                +
                       Size (in KLOC or other unit)                                   +
                       Re-use level (percentage of size)                              +
  Product quality      Complexity (architectural level, source code level)            +
                       Test coverage (unit, integration, system testing)              +
                       Defect density (released defects per KLOC or other unit)       +
                       Cumulative defect removal efficiency                           +
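To make the product quality indicators concrete, the sketch below shows how two of the Table 2 indicators could be computed from raw defect counts. The function names and the example numbers are our own, not part of the published KPI set.

```python
# Sketch: computing two of the product quality KPIs from Table 2.
# Function names and example numbers are illustrative, not from the KPI set.

def defect_density(released_defects: int, size_kloc: float) -> float:
    """Released defects per KLOC."""
    return released_defects / size_kloc

def cumulative_removal_efficiency(pre_release: int, post_release: int) -> float:
    """Fraction of all known defects removed before release."""
    total = pre_release + post_release
    return pre_release / total if total else 0.0

# Hypothetical project: 50 KLOC, 120 defects found pre-release, 30 post-release.
print(defect_density(30, 50.0))                # 0.6 released defects per KLOC
print(cumulative_removal_efficiency(120, 30))  # 0.8
```

Tracked per release, these two numbers already answer the Table 1 question on product quality and can be compared directly against published industry figures.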

3 Software Benchmarking
To manage processes efficiently, software development organizations must focus on understanding how they perform, and where to improve. Assessing the results of the software process is a starting point, but it does not provide context by itself – it is not sufficient for a complete understanding of status. Benchmarking is comparing one's own performance and operating methods with industry averages or best-in-class examples, with the goal of locating and improving one's own performance [Camp 1989]. As such, it is an important instrument to prioritize and drive improvement efforts. This enables making solid business decisions about software development practices and their results in terms of productivity and quality. It allows using economics as the basis of quality analysis [Boehm 2000, Wagner 2007] and balancing cost against productivity and quality. Key measures of performance include productivity rate, project time-to-market, and project deliverable quality.

For many years the lack of readily available benchmark data prevented software managers from analyzing the real economics of software. Through the work of Capers Jones [2008] and others, data on thousands of projects is now available to the software industry.

In a series of assignments conducted by the authors, the presented best practice KPI set was used to measure the performance capability of organizations. This implied the assessment of values for each indicator, covering among others productivity, re-use levels, test coverage during different test types, and complexity. Two important conclusions were drawn regarding the availability and quality of the data found [Sassenburg 2009]:
• Availability. Many low maturity organizations have the opinion that they lack quantitative data. In most cases, the contrary is true: although not centrally stored, many sources of data can normally be identified. The challenge is to identify these data sources and analyze them in order to obtain useful information. Higher maturity organizations, in turn, often have the opinion that they have access to rich sources of information. In most cases, this is not true.
• Quality. Despite many measurements, the availability of clearly defined measurement constructs and validation of measurement data before consolidation are exceptions. This leads to problems with respect to consistency, completeness and reliability [Maxwell 2001]. As such, the conversion of data to useful management information is a weakness for many organizations. Many (process) improvement initiatives resulted in satisfying standards/models instead of tangibly improving measured performance.

4 Case Study Results
The best practice KPI set was used in several benchmarking studies. Presented here are the case study results of two different organizations as an example; the results are representative for many other studies. Both organizations develop embedded systems for the business-to-business market. Although the systems are not consumer products, volumes are fairly high, varying from hundreds to many thousands. One system is considered safety-critical; the other system is complicated due to regulations in the area of security of information, which may not be manipulated in any way. In early discussions with both organisations, it was revealed that process improvement has been institutionalized for many years and that CMMI Maturity Level 3 compliance is within reach. In both cases, some common issues had to be dealt with:
• Although function points are preferred as a size measure, the only data available was lines of code. The backfiring technique was used to convert lines of code to function points [Jones 1995]. In both cases the programming language used was C. In most cases, the resulting number of function points was close to 1'000.
• In case a software organization does not have sufficient, reliable benchmark data available, it can make use of the published data of Capers Jones [2008] and ISBSG [www.isbsg.org].
• A common issue in embedded systems is the way feature size is calculated. So far, no strong benchmarking data has been published for feature deferral ratios. Instead, we used our own data from previous studies to benchmark against.

In the following paragraphs we highlight remarkable results from the studies that led to further analysis and improvement efforts.
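The backfiring conversion used in the case studies can be sketched as follows. The ratio of roughly 128 logical C statements per function point is the value commonly attributed to Jones; treat the exact ratios here as illustrative assumptions that should be calibrated locally.

```python
# Backfiring [Jones 1995]: estimate function points from lines of code
# using a language-specific LOC-per-FP ratio. The ratios below are
# commonly cited approximations, not calibrated values.
LOC_PER_FP = {"C": 128, "C++": 53, "Java": 53}

def backfire(loc: int, language: str = "C") -> float:
    """Approximate function point count for a code base of `loc` lines."""
    return loc / LOC_PER_FP[language]

# A 128 KLOC C system, as in the case studies, lands near 1'000 FP:
print(round(backfire(128_000, "C")))  # 1000
```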

Figure 1 shows how both case studies compare to benchmarking data regarding productivity in function points per staff month [Jones 2008]. It is obvious that both cases show a much lower productivity level than industry average. In both cases, the lower productivity was believed to be a consequence of safety requirements for case study A and security requirements for case study B. In a competitive market this is important to notice.

Figure 1: Productivity benchmarking.

Further remarkable results were found with respect to process efficiency. Here, a Cost-of-Quality approach is used, based on the work of Juran [1988] and Crosby [1979]. A distinction is made between four categories [Sassenburg 2010]:
• Core. Costs in this category are essential and bring direct value to a customer by changing the product in some way: requirements, architecture, coding.
• Support. Costs in this category are essential but do not bring direct value to a customer: project management, administrative support, configuration management.
• Prevention. These are costs incurred to prevent poor quality (keeping failure and appraisal costs to a minimum): quality planning, process improvement teams.
• Appraisal/rework. These are costs incurred to determine the degree of conformance to quality requirements (mainly testing, reviews, inspections and defect detection/removal) and to correct the defects found (rework).

This Cost-of-Quality approach is relatively easy to implement and to use as an instrument to measure, analyse and improve. Using these definitions, all scheduled activities from a project plan can be mapped to the four categories and the ratios can be calculated. Note that this enables management to validate the feasibility of a project plan if ratios from the past are known: if any of the projected ratios deviates substantially from values realized in the past, there should be assignable causes for this.

In Figure 2, the case study results are compared to industry averages (benchmarking ratios were obtained by mapping published project data [Jones 2008] to the four categories). In both cases, a ratio for Appraisal/rework of approximately 50% was found, which is very high, not only compared with industry averages but as an absolute figure as well. Also here, analysis led to the conclusion that this is a consequence of safety requirements for case study A and security requirements for case study B. However, still much higher ratios for prevention would be expected. It will be obvious that improving efficiency will normally mean reducing the overall costs by reducing appraisal and rework costs. This can be achieved by increasing prevention costs. This became one of the focus points for improvement activities. Without benchmarking against industry data, these improvement opportunities would most likely have gone unnoticed and been left unaddressed.
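The mapping from a project plan to the four Cost-of-Quality categories can be sketched as below. The activity names, the mapping, and the effort figures are invented for illustration; a real mapping would follow the organization's own work breakdown.

```python
# Sketch: map scheduled activities (effort in hours) to the four
# Cost-of-Quality categories and compute each category's effort ratio.
# Activity names, mapping, and numbers are illustrative assumptions.
CATEGORY = {
    "requirements": "core", "architecture": "core", "coding": "core",
    "project management": "support", "configuration management": "support",
    "quality planning": "prevention", "process improvement": "prevention",
    "reviews": "appraisal/rework", "testing": "appraisal/rework",
    "rework": "appraisal/rework",
}

def coq_ratios(effort: dict) -> dict:
    """Share of total planned effort per Cost-of-Quality category."""
    totals: dict = {}
    for activity, hours in effort.items():
        cat = CATEGORY[activity]
        totals[cat] = totals.get(cat, 0) + hours
    grand_total = sum(totals.values())
    return {cat: hours / grand_total for cat, hours in totals.items()}

plan = {"requirements": 200, "coding": 800, "project management": 150,
        "testing": 700, "rework": 300, "quality planning": 50}
ratios = coq_ratios(plan)
print(f"appraisal/rework: {ratios['appraisal/rework']:.0%}")  # 45%
```

Comparing such projected ratios against ratios realized on past projects is exactly the feasibility check described above: a large deviation without an assignable cause is a planning risk.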

Figure 2: Process efficiency benchmarking.

In Figure 3, the case study results are compared to benchmarking data regarding defect density in defects per 1'000 lines of code [Jones 2008]. While for software with safety requirements and security requirements one might expect better figures than industry average, the contrary is the case here. Figure 4 shows that the high defect density finds its origin in a low defect removal efficiency compared to industry average [Jones 2008]: too many defects remain undetected during development and are only found post-release. In both cases, test coverage was very low. In-depth analysis revealed that the primary causes for low removal efficiency were highly complex architectures and code implementations. The answer of management in both cases was that after releasing the software, many additional tests took place and delivery was to a limited number of users only. As a result, the defect density that finally reached the end user was lower. On the other hand, they acknowledged that post-release maintenance and support costs were extremely high and should be reduced.

Figure 3: Defect density benchmarking.

Figure 4: Removal efficiency benchmarking.

In other words, it was very clear to all stakeholders that there are two main weak areas:

• The effort distribution revealed a very inefficient process, with a high ratio for Appraisal/rework. If post-release efforts for fixing defects were included, the ratio would become substantially higher still.
• The architecture and code quality in both cases were low. At the architectural level, high fan-out values were indicators of low cohesion and tight coupling, resulting in a high level of change propagation. At the code level, high cyclomatic complexity [McCabe 1976] values were found. As a result, problems arise regarding understandability, modifiability and verifiability.

These two areas were considered the primary causes for low overall capability and were used as the basis to define improvements. The availability of quantitative and benchmarked data helped both organizations to derive a solid business case for improvements. The interesting fact is that improvements can only be achieved by changing the way of working. By focusing on the primary causes for low capability, the chances of sub-optimization are reduced or even eliminated.

5 Conclusions
Do higher maturity levels automatically lead to increased performance? In the studies performed, no strong correlation could be found between capability, expressed using the sixteen indicators, and maturity levels. Standardization is no guarantee for real capability improvement, and process improvement and standardization should not be goals in themselves. That is why aiming only at, for instance, higher CMMI levels is considered a wrong approach. On the other hand, process improvement makes sense, as it standardizes development processes. This creates transparency with respect to roles, responsibilities, activities and deliverables. Real capability improvement, however, is achieved by taking a quantitative view on processes and products and setting realistic and quantified improvement targets, whereas implementing and sustaining such improvements is structured by the use of process maturity models.

This brings us to the conclusion of this paper. The recommendation is to focus on a multi-dimensional assessment of the capability of an organization and derive improvements from benchmarked results. Using the presented KPI set in a benchmarking study reveals real capability by identifying strengths and weaknesses. This provides the basis for deriving improvements that make sense. A first step will be baselining the current capability using the best practice KPI set; in case measurements are not in place, the first improvement actions are identified. As a second step, realistic target values must be determined for a given period of time. The gap between target and actual values is the basis for deriving improvement steps. And of course, a validated improved way of working should be shared: standardization across projects then becomes a logical step.
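The two-step approach above (baseline, then quantified targets) can be sketched in a few lines; the KPI names and all values here are invented for illustration.

```python
# Sketch of the baseline-versus-target step: the gap per KPI is the
# basis for deriving improvement steps. KPI names and values are invented.
baseline = {"productivity_fp_per_pm": 4.0, "appraisal_rework_pct": 50.0,
            "removal_efficiency_pct": 75.0}
targets  = {"productivity_fp_per_pm": 6.0, "appraisal_rework_pct": 35.0,
            "removal_efficiency_pct": 90.0}

def improvement_gaps(baseline: dict, targets: dict) -> dict:
    """Target minus actual value for each KPI in the baseline."""
    return {kpi: targets[kpi] - baseline[kpi] for kpi in baseline}

for kpi, gap in sorted(improvement_gaps(baseline, targets).items()):
    print(f"{kpi}: {gap:+.1f}")
# appraisal_rework_pct: -15.0
# productivity_fp_per_pm: +2.0
# removal_efficiency_pct: +15.0
```

A negative gap (here, the Appraisal/rework ratio) marks an indicator that must come down; the signed gaps are the quantified inputs for the improvement plan.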

6 Literature

[Boehm 2000] Boehm, B.W., Sullivan, K.J., "Software Economics: A Roadmap", Proceedings of the Conference on The Future of Software Engineering, ICSE, 2000.
[Camp 1989] Camp, R.C., "Benchmarking: The Search for Industry Best Practices that Lead to Superior Performance", Milwaukee, Wisconsin: Quality Press for the American Society for Quality Control, 1989.
[Crosby 1979] Crosby, P.B., "Quality is Free", New York: McGraw-Hill Book Company, 1979.
[Jones 1995] Jones, C., "Backfiring: Converting Lines of Code to Function Points", IEEE Computer, November 1995.
[Jones 2008] Jones, C., "Applied Software Measurement", McGraw-Hill, 2008.
[Juran 1988] Juran, J.M., Gryna, F.M., "Juran's Quality Control Handbook", 4th ed., New York: McGraw-Hill, 1988.
[Maxwell 2001] Maxwell, K.D., "Collecting Data for Comparability: Benchmarking Software Development Productivity", IEEE Software, Sep/Oct 2001.
[McCabe 1976] McCabe, T.J., "A Complexity Measure", IEEE Transactions on Software Engineering, Vol. 2, No. 4, pp. 308-320, 1976.
[Putnam 1992] Putnam, L.H., Myers, W., "Measures for Excellence: Reliable Software On Time, Within Budget", Yourdon Press Computing Series, 1992.
[Putnam 1997] Putnam, L.H., Myers, W., "Industrial Strength Software: Effective Management Using Measurement", IEEE Computer Society, 1997.
[Sassenburg 2006] Sassenburg, H., "Design of a Methodology to Support Software Release Decisions", Doctoral thesis, University of Groningen, 2006.
[Sassenburg 2009] Sassenburg, H., Voinea, L., Wijnhoven, P., "Standardization does not necessarily imply Performance Improvement", Automatisering Gids (in Dutch), 2009.
[Sassenburg 2010] Sassenburg, H., "From Testing to Designing", Automatisering Gids (in Dutch), 2010.
[Wagner 2007] Wagner, S., "Using Economics as Basis for Modelling and Evaluating Software Quality", Proceedings of the First International Workshop on The Economics of Software and Computation, 2007.

7 Author CVs

Dr. Hans Sassenburg
Dr. Hans Sassenburg received a Master of Science degree in electrical engineering from the Eindhoven University of Technology (Netherlands) in 1986 and a PhD degree in economics from the University of Groningen (Netherlands) in 2006. He worked as an independent consultant until 1996, when he co-founded a consulting and training firm. This company specialized in software process improvement and software architecture and was sold in 2000. In 2001 he moved to Switzerland, where he founded a new consulting and training firm, SECURE AG. Starting from 1999, he has been a visiting scientist at the Software Engineering Institute (www.sei.cmu.edu). In 2009, he co-founded the Software Benchmarking Organization (www.sw-benchmarking.org), a consortium of international accredited partners whose aim is to create a framework for benchmarking capability in the software development industry. Dr. Sassenburg is an internationally published author on software engineering and economics.

Dr. Lucian Voinea
Dr. Lucian Voinea received a Professional Doctorate in Engineering degree (PDEng) from the Eindhoven University of Technology (Netherlands) in 2003 and a PhD degree in computer science from the same university in 2007. In 2007 he co-founded SolidSource (www.solidsourceit.com), a company that provides solutions to support software development and maintenance. In 2009 he co-founded the Software Benchmarking Organization (www.sw-benchmarking.org). Dr. Voinea is an internationally published author on software engineering and visualization topics.

Peter Wijnhoven
Peter Wijnhoven holds a BSc degree in mechanical engineering and a BSc degree in computer science. He has over 25 years of experience in embedded systems development, architecture, and integration and verification. Until 2000 he worked as a freelance software engineer for companies in Europe and North America; during this period he fulfilled a wide variety of roles, ranging from programmer to architect, from team to project leader, and from consultant to manager. Since January 2005, Mr. Wijnhoven has been active in the consulting group of Sioux Embedded Systems (www.sioux.eu), where he is managing a group of consultants in the field of development process improvement. During these years he has been actively involved in process improvement projects in multi-disciplinary R&D organizations in Europe, North America, and Asia.

© 2009-2010, Software Benchmarking Organization. All rights reserved.