Robin Holt and Andrew Graves
Robin Holt, Management Research Officer, took his PhD in Government at LSE, and is now investigating links between social theory, management practice and construction. Andrew Graves, MIT Professor of Automotive Management and Director of Cross-Industry Research at the University of Bath, is responsible for research on developing ``lean production'' in the car, aerospace and construction industries.

Abstract: Benchmarking is introduced as a practice of non-financial assessment that promotes continual performance improvement. Its relevance to, and possible consequences for, the public sector are discussed in relation to a case study in construction procurement. A pilot study investigating the achievements of government clients in construction procurement has identified a need for better client ``ownership'' of project risk and opportunity. The article argues that benchmarking can provide the vehicle for this ``ownership''. In conjunction with the clients and HM Treasury, a second stage of project assessment has just been completed, and the methods and results are described. The aim is to realize consistent, relevant and feasible metrics, co-operatively authored by client practitioners and academics with reference to private construction organizations, that will then be used for purposes of on-going self-assessment at both project and strategic levels.

Keywords: Benchmarking, Public sector, Performance, Risk management, United Kingdom

The UK Government is increasingly interested in policy reform that seeks to tie the realization of the public interest to specific performance improvements in service provision (Grout, 1997). The definition and attainment of these service levels is, we argue, determinable through a benchmarked procedure of iterative improvement in which links between policy and results can be made explicit (Coe, 1999). This article describes how the measurement and consequent development of such service provision has been undertaken with specific regard to the procurement of construction projects. We address criteria informing public sector perspectives that bear on the development and use of metrics and that potentially threaten the widespread and prolonged adoption and adaptation of a benchmarking process. Informed by these criteria, we discuss the development of our benchmarking method and its intent. From the description of a pilot study, designed to begin a benchmarking process amongst key central government clients, we present a refined set of metrics that support a second phase of research and then discuss the findings. In conclusion we argue that unless good performance indicators are used in conjunction with one another, and are ``owned'' by practitioners, any performance improvement is at best sporadic and often temporary.

Benchmarking as a process

Benchmarking operates at two levels: as a method for measurement and as a focus of activity. As a method for measurement, benchmarking assesses relative performance (in terms of productivity, profit or speed) to establish metrics that reveal good and bad performance. The aim is to convey the existing delivery of value as against potential delivery. Metrics only make sense through their capture of narrowly defined aspects of value. For example, a metric of safety is the number of reportable accidents; but the validity of this metric stems not from itself, but from the quality of life it helps promote for operatives, the benefits realized from an absence of disruption, and so on. Metrics must also be consistent with one another: their aggregated portrayal of cost, function and quality must be comprehensive so that, for example, cost metrics are understood in terms of their impact upon function and quality. Metrics can frustrate the process of benchmarking if their definition fails to promote genuine comparison. In the context of construction, for example, different procurement options incur the transfer of different risks. More traditional contracts require the client to retain design responsibility, whereas recent ``one stop shop'' relations

Measuring Business Excellence, Vol. 5 No. 4, 2001, pp. 13-21. © MCB University Press, ISSN 1368-3047


transfer this risk to the constructor. As a consequence, variations in cost must be identified at agreed points in the procurement process and be related to a particular procurement option. Under ``divided'' procurement, variations are more prevalent than under ``one stop shop'' or ``turnkey'' arrangements, and do not necessarily impinge on performance.

Attending to the exact scope of any metrics requires, first, that the ``worldview'' of those undertaking benchmarking is acknowledged. In this way the idea that benchmarking is primarily a sub-optimal strategy of ``catch-up'' (Cox and Thompson, 1998) is avoided. Benchmarking is seen as setting horizons through an active and on-going awareness of where others have reached and re-configured theirs. The research therefore adopted the use of ``good'' practice and not ``best'' practice. Best practice is static, and suggests performance is measured as the ability to imitate. Good practice is more dynamic: it looks at measuring the impact of what was learnt through adaptation, and requires practitioners to understand the circumstances of achievement as much as the level of achievements. Benchmarking is not a list or even a procedure; it is a responsive activity, inviting consideration by offering itself as something to be taken up and adapted.

With regard to benchmarking being a focus for business activity, bespoke services can extend from the analysis of a specific process (for example, the delivery of construction services) to cover a general social vision that looks to use metrics as milestones for general quality of life issues (for example, the creation of an effective transport network) (Ammons, 1999). Metrics must also allow for the inclusion of stakeholders. In construction projects these include: users, local residents, planning authorities, taxpayers, future generations, social, environmental and economic pressure groups, national and international political interests, and the construction suppliers (Kouzmin et al., 1999). Bestowing such a strategic role upon metrics must be accompanied by a dynamic interaction with practitioners, so as to invest in them a progressive ``ownership'' of emerging metrics (Hinks and McNay, 1999), in order that what is measured is appropriate and meaningful to practitioners.

The public sector and construction context

The concern of current UK Government policy is to achieve value for money in the delivery of services such as construction, of which, in a UK domestic market of £55bn (FT, 1999), the Government procures 40 percent (Graves et al., 1998a). Through benchmarking, the UK Government can assess and refine what it means by the delivery of ``value for money'', not just in terms of price, but in terms of out-turn cost linked to function and quality. This recent emphasis upon delivering ``value'' enhances the scope for using ``evidence'' as a decision tool in public policy, because metrics can be used to discover ``what works'' (Davies et al., 1999), and what could work better.

Having identified good practice, the recipient must not only recognise the benefits of using metrics but also be in a financial and cultural position to use them. Public clients of different asset strength, maturity or ``worldview'' are often able to appreciate the good practices of others without having the time, resources or context to apply them (O'Dell and Grayson, 1998). Any benchmarking initiative will also incur considerable direct and indirect cost in terms of initiation, capture, analysis and presentation of information, and has to be considered in conjunction with the current status of performance reporting in the public sector. The growth of procedural constraints experienced by public managers, the growing use of performance measurement as regulation and surveillance, monitoring and compliance (Hood et al., 1998), management systems of reactive design (Kouzmin et al., 1999) and partial coverage of collected information (Hyndman and Anderson, 1998) serve to dilute the argument for the radical level of change necessary to realise the potential of benchmarking. The prevailing and emerging influences upon public managers within construction can be summarized as follows.

Political influence

Political realities face those implementing benchmarking. Political cycles form a persistent reality and any benchmarking strategy must inform itself of prevailing political norms and emerging shifts in power. The multiple reality facing the public manager (Dunleavy and Hood, 1994), in which efficiency and the public interest need to be reconciled, needs to be understood to pre-empt conflict in the design and implementation of metrics and promote their appropriate interpretation (Holt and Rowe, 2000).

Departing from the culture of regulation

Benchmarking risks being seen as yet one more mode of regulation and surveillance, monitoring and compliance (Hood et al., 1998). Metrics designed as controls that make performance more auditable can stymie increased managerial discretion, and this source of inertia may prove particularly powerful given the growing use of performance measurement itself. To overcome this problem, any reliance upon metrics must be accompanied by a dynamic interaction with practitioners, rather than just their adoption (Ammons, 1999).

The dimension of complexity and scope

Construction is a complex activity. The structural variations between projects in terms of potential influence (political, economic, skill availability) and scope (planning, material use, re-use, life-span, flexibility) impact heavily upon the value of information comparison, both between construction projects and between construction and manufacturing processes (Mohamed, 1996). One example might be the tendency to look at capabilities, as opposed to satisfying utility preferences (Sen, 1999). The measurement of capabilities (such as road space provision) is typically far easier than that of utility preference (in this case, the means for satisfying travel need, which may include roads amongst a host of alternatives). Metrics will tend to orient themselves to what is most readily captured in terms of existing data, and using metrics itself might therefore privilege a capability above a utility view.

A dynamic culture

We have already argued that benchmarking is reliant upon measurement, and that in construction the performance measurement capacity of benchmarking is tied intimately with the mapping and management of risk. The reliance of public sector bureaucracies on explicit, codified information is also a source of frustration for would-be benchmarking practitioners: metrics themselves tend to concentrate upon explicit, quantifiable data, rather than the tacit knowledge of relationships and personal experience (Nonaka et al., 1998). As a process to catalyse change, benchmarking has to be carefully managed to avoid any resulting undue restriction of scope. The desire to avoid uncomfortable investigations and outcomes by limiting the scope of investigation and reported evidence would promote benchmarking on the basis of what was feasible rather than on rational decision (Dorsch and Yasin, 1998). Metrics and their application should be malleable and under-determined, approachable and practical, to achieve a workable balance between completeness and contemporaneity.

Research methodology

Informed by the foregoing discussion, our research embraces an iterative strategy to implement performance improvement using benchmarking within central government. The process has undergone two cycles of development: the pilot and Stage two studies.

Pilot study

The pilot study began with a focus group of the Government Construction Client Panel (GCCP), comprising representatives from 13 government clients. The group identified hypotheses that represented procurement risks and proposed metrics to inform those hypotheses. The pilot study focused on project output metrics and the use and relative success of different types of contract forms in the procurement of construction services. The approach was pragmatic: the number and scope of metrics were geared towards assured data capture, and the themes addressed by metrics were limited principally to information of established interest to the client. It was assumed that this would stimulate dialogue and an exchange of perspectives as to why particular projects might have excelled or gone awry. The two studies have progressed using the following generally characterised methodology:

- The project selection criteria addressed: stage of development; type (road, rail, water, building); and budget. Clients were asked to make project selections without regard to the perceived outcome or history of projects.
- Data anomalies evident in the samples for each of the two studies were ``cleaned'' using follow-up interviews.
- Client anonymity has been preserved to promote confidence through a non-threatening environment.

Key characteristics of the two mutually exclusive data samples are shown in Table I. The widespread variation in performance (discussed later) was in line with the UK Audit Commission's findings for local authorities. This provided a metric of confidence that the aim for representative selection was at least moderately successful.
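The earlier requirement that metrics stay consistent with one another — a cost metric read alongside function and quality — can be sketched as a small data model. This is an illustration only: the metric names, values and reference levels below are invented, not drawn from the GCCP study.

```python
# Illustrative sketch only: metric names, values and reference levels are
# invented, not taken from the study. The point is the structure: each
# narrowly defined metric is tagged with the dimension of value it
# captures, so cost is never reported in isolation from quality/function.
from dataclasses import dataclass

@dataclass
class Metric:
    name: str          # e.g. "reportable accidents"
    dimension: str     # "cost", "function" or "quality"
    value: float       # observed level for one project
    reference: float   # good-practice reference level

    def gap(self) -> float:
        """Relative gap between the observed level and the reference."""
        return (self.value - self.reference) / self.reference

metrics = [
    Metric("out-turn cost vs budget", "cost", 1.12, 1.00),
    Metric("reportable accidents", "quality", 3, 2),
]

# Group gaps by dimension so a cost gap is always read alongside the
# quality and function gaps for the same project.
by_dimension = {}
for m in metrics:
    by_dimension.setdefault(m.dimension, []).append((m.name, round(m.gap(), 2)))

print(by_dimension)
```

Grouping by dimension is one way of keeping the aggregated portrayal comprehensive; the hypothetical reference levels stand in for whatever good-practice benchmarks a client panel agrees.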

Table I - Comparison of pilot and Stage two data samples

Data sample characteristic               Pilot          Stage two
Number of projects                       60             65
Total value of projects                  £500 million   £500 million
Number of client departments/agencies    6a (8b)        7a (10b)
Project type: Building                   100%           52%
              Infrastructure             -              48%
Project status: New build                62%            48%
                Refurbishment            38%            52%
Average service commissioning date       June 1995      January 1998

Notes: a Clients submitting four or more projects; b All clients submitting projects

Pilot study findings and discussion

The evaluated metrics demonstrated considerable variation in performance and suggested a lack of consistency in approach, both within and between clients. A key role of the pilot study was to demonstrate the post-project availability of data and the feasibility of capture through a generic format. The two main findings were as follows.

A culture of limited expectation

Notions of possible ``role fuzziness'' and confusion surrounding the performance of product and service were endorsed by the juxtaposition of generally high levels of client and end-user satisfaction with poor cost and time predictability. The pilot study revealed an extensive reality gap between performance and performance assessment by practitioners. Given the UK Government's insistence upon delivering performance improvements, the pilot study provided a steady reminder that value assessment was so judgmental that even poor cost and programme performance did not meet with a rating of unsatisfactory performance. The results describing this ``gap'' are shown in Figure 2. This gap could be the result of public managers looking foremost to rules of accountability, judging their ``success'' in terms of their having fulfilled their required functional role.

Underestimating risks

The pilot study found that clients accepted two thirds of realised programme risk. When compared to the cost and programme overrun figures in Figure 2, this suggests an underestimation of risk. The case for reviewing exposure to risk was enhanced by the fact that one third of the projects were procured on the basis of ``one stop shop'' style contracts such as Design and Build, a mechanism that is intended to determine much of the client's risk in terms of a premium built into the tender price paid to suppliers (rather than explicit determination at the end of construction based on the actual risks encountered). Figure 3 shows how, in the majority of projects investigated, the risk was not spread through suppliers using partnering or other teamworking agreements, but retained by the client.

Stage two study

Feedback from the pilot study was taken from the sampled clients as to the validity and reliability of the metrics and the appropriateness of the benchmarking study. This feedback was used, in conjunction with further focus group sessions of the GCCP and expert consultations with construction research groups, to re-configure metrics so as to better reveal existing and potential project risks and so increase the relevance and ``ownership'' of the benchmarking practices. Informed by revealed trends and feedback from the process of data capture itself, a process of client-informed metric refinement could be initiated.

The second stage of work continued to promote close contact with the GCCP in developing a second questionnaire. A joint responsibility was assumed for its promotion within each client between the research team and client representatives. Stage two constituted both reinforcement of the successes of the pilot and an opportunity to establish a maturing model. The risk identification and control model that emerged is shown in Figure 1. It is divided into four areas, each with associated metrics, reflecting the extended concern of the Stage two study over the pilot study. The metric count illustrates how the focus on outputs as seen by project funders and end users in the pilot study was supplanted by a balanced measurement of risk, and of the success of all project participants/stakeholders, in the Stage two study. We felt that the delicately balanced task of building a data model capable of population without excessive investigation, yet sufficient for robust and challenging interpretations, should not be under-stated. The need to balance achievability and usefulness is an important lesson drawn from benchmarking experiences in other industrial sectors.
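The cost and programme predictability behind these findings reduces to a simple relative-growth calculation against the approved budget and schedule. The sketch below uses invented project figures; neither the function name nor the data come from the study.

```python
# Sketch of the overrun arithmetic behind cost/programme predictability.
# All project figures are invented for illustration.
def growth(out_turn: float, approved: float) -> float:
    """Relative growth from the approved figure, e.g. 0.15 = 15% overrun."""
    return (out_turn - approved) / approved

projects = [
    # budget / out-turn cost in £m, programme in weeks
    {"budget": 10.0, "cost": 11.5, "planned_weeks": 80, "actual_weeks": 96},
    {"budget": 25.0, "cost": 24.0, "planned_weeks": 120, "actual_weeks": 126},
]

for p in projects:
    p["cost_growth"] = growth(p["cost"], p["budget"])
    p["time_growth"] = growth(p["actual_weeks"], p["planned_weeks"])

print([(round(p["cost_growth"], 2), round(p["time_growth"], 2)) for p in projects])
```

A project reported as ``satisfactory'' despite positive cost and time growth would exhibit exactly the reality gap described above, which is why the two measures are plotted against client perception of success in Figure 2.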

Figure 1 - Linking metrics with themes under the risk-based evaluation framework
Figure 2 - Client perception of success measured against programme and cost in construction phase

Towards stage two

Analyses were undertaken to investigate relationships between metrics, in particular between each metric and cost and time. This failed to establish strong trends, and supported the notion that construction projects bear a complex structural identity in which the specific demographic, financial and political context has a large influence. This general point of learning, combined with the issues of unpredictable cost and time, was seminal in moving us towards articulating the project to include many more non-financial metrics, hence shifting the emphasis in Stage two towards risk identification and its control.

Stage two study findings and discussion

The research highlighted variations in the application of good practices. Where risk management did occur, it was not consistent in delivering good project performance, and it was limited, as shown in Figure 4. Figure 5 shows that construction cost overrun amongst the clients remained high, further suggesting that the type of risk management systems in place required review. Despite this, and in line with the findings of the pilot study, supplier satisfaction ratings made by the clients remained consistently high, as shown in Figure 6. There remained a ``reality gap'' between performance and performance assessment. Perhaps the most significant finding was that ``bolting on'' specific good practices did not result in performance improvement (see Figure 7). Where improvement was experienced, however, it was in those projects where clients used a ``basket'' of good practices, combining strategic relationship assessment with teambuilding, safety audits and so on (see Figure 8). This analysis lends further support to an approach to benchmarking that is underpinned through more effective risk mapping to promote shared ownership of risk.

Figure 3 - Use of partnering or teamworking set against cost growth in the construction phase
Figure 4 - The use of risk management set against cost growth in the construction phase
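The ``bolt-on'' versus ``basket'' contrast amounts to grouping projects by how many good practices they combined and comparing mean cost growth across the groups. The sketch below is hypothetical throughout: the practice names, project records and growth figures are invented for illustration, not taken from the study's data.

```python
# Hedged sketch of the "basket of good practices" comparison: mean cost
# growth for projects that bolted on one practice versus projects that
# combined several. All records below are invented.
from statistics import mean

projects = [
    {"practices": {"partnering"}, "cost_growth": 0.14},
    {"practices": {"risk register"}, "cost_growth": 0.11},
    {"practices": {"partnering", "teambuilding", "safety audits"}, "cost_growth": 0.03},
    {"practices": {"partnering", "risk register", "safety audits"}, "cost_growth": 0.01},
]

def mean_growth(min_practices: int, max_practices: int) -> float:
    """Mean cost growth over projects whose practice count is in range."""
    return mean(
        p["cost_growth"]
        for p in projects
        if min_practices <= len(p["practices"]) <= max_practices
    )

bolt_on = mean_growth(1, 1)   # a single "bolted on" practice
basket = mean_growth(3, 99)   # three or more practices combined

print(round(bolt_on, 3), round(basket, 3))
```

In this invented sample the basket group shows lower mean cost growth, which is the shape of the relationship the study reports; with real project data the same grouping would populate Figures 7 and 8.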

Figure 5 - Construction cost increase from budget
Figure 6 - Supplier assessment of client

Conclusions

Overall, the public sector client was found to have both a strategic and operational role in achieving performance improvement in value for money service delivery, but this had been underutilised. It is suggested that there needs to be greater ``ownership'' of projects from public sector clients, and risk sharing with suppliers. It is also suggested that public sector management needs to expand its scope to deliver whole life value for money. Through benchmarking, public sector clients can better assess the relevance of their existing aims and purposes, and can appreciate how best to pursue those aims. The development of benchmarking to include whole life issues, value and the embodiment of both service-oriented and product-oriented approaches to procurement has a long way to go. There remains, however, a suspicion that metrics amount to a regulatory audit (if not now, then at some time) and that those insisting upon value for money criteria are doing so for reasons of political expediency as much as a genuine desire to create an open, innovative public management culture. We are optimistic that the gaining momentum of benchmarking will continue, gathering the support of new clients in the UK and internationally.

Figure 7 - Plotting outcome success against the use of mature relationships
Figure 8 - Plotting outcome success against a ``basket'' of good practices

References

Ammons, D. (1999), ``A proper mentality for benchmarking'', Public Administration Review, Vol. 59 No. 2, pp. 105-14.
Coe, C. (1999), ``Local government benchmarking: lessons from two major multigovernment efforts'', Public Administration Review, Vol. 59 No. 2, pp. 110-21.
Cox, A. and Thompson, I. (1998), ``On the appropriateness of benchmarking'', Journal of General Management, Vol. 23 No. 3, pp. 1-20.
Davies, H., Nutley, S. and Smith, P. (1999), ``Editorial'', Public Money and Management, January-March.
Davis, P. (1998), ``The burgeoning of benchmarking in British local government'', Benchmarking for Quality Management & Technology, Vol. 5 No. 4, pp. 261-70.
Dorsch, J. and Yasin, M. (1998), ``A framework for benchmarking in the public sector'', International Journal of Public Sector Management, Vol. 11 No. 2/3, pp. 90-115.
Dunleavy, P. and Hood, C. (1994), ``From old public administration to new public management'', Public Money and Management, Vol. 14 No. 3, July-September, pp. 9-16.
Graves, A., Rowe, D., Sheath, D. and Sykes (1998a), Constructing the Government Client, HMSO, London.
Graves, A., Rowe, D. and Sheath, D. (1998b), Pilot Benchmarking Study, HMSO, London.
Grout, P. (1997), ``The economics of the private finance initiative'', Oxford Review of Economic Policy, Vol. 13 No. 4, pp. 53-66.
Hinks, J. and McNay, P. (1999), ``The creation of a management by variance tool'', Facilities, Vol. 17 No. 1/2, pp. 31-53.
Holt, R. and Rowe, D. (2000), ``Total quality, public management and critical leadership in civil construction projects'', International Journal of Quality & Reliability Management, Vol. 17 No. 4/5, pp. 541-53.
Hood, C., James, O., Jones, G., Scott, C. and Travers, T. (1998), ``Regulation inside government: where new public management meets the audit explosion'', Public Money and Management, April-June, pp. 61-8.
Hyndman, N. and Anderson, R. (1998), ``Performance information, accountability and executive agencies'', Public Money and Management, July-September, pp. 23-30.
Kouzmin, A., Löffler, E., Klages, H. and Korac-Kakabadse, N. (1999), ``Benchmarking and performance measurement in public sectors'', International Journal of Public Sector Management, Vol. 12 No. 2, pp. 121-44.
Leicester, G. (1999), ``The seven enemies of evidence based policy'', Public Money and Management, January-March, pp. 5-7.
Mohamed, S. (1996), ``Benchmarking and improving construction productivity'', Benchmarking for Quality Management & Technology, Vol. 3 No. 3, pp. 50-8.
Nonaka, I., Reinmoeller, P. and Senoo, D. (1998), ``The art of knowledge: systems to capitalise on market knowledge'', European Management Journal, Vol. 16 No. 4/5, pp. 673-84.
O'Dell, C. and Grayson, C. (1998), ``If only we knew what we know: identification and transfer of internal best practices'', California Management Review, Vol. 40 No. 3, pp. 154-74.
Sen, A. (1999), Commodities and Capabilities, Oxford University Press, Oxford.