
The Journal of Systems and Software 59 (2001) 43–55

www.elsevier.com/locate/jss

The relationship between ISO/IEC 15504 process capability levels, ISO 9001 certification and organization size: An empirical study

Ho-Won Jung a,*, Robin Hunter b,1

a Department of Business Administration, Korea University, Anam-dong 5Ka, Sungbuk-gu, Seoul 136-701, Republic of Korea
b Department of Computer Science, University of Strathclyde, Richmond Street, Glasgow G1 1XH, UK

Received 13 November 2000; received in revised form 31 December 2000; accepted 12 February 2001

Abstract

The gradual spread in the use of ISO 9001 and ISO/IEC 15504 (also known as software process improvement and capability determination (SPICE)) has raised questions such as ``At what ISO/IEC 15504 capability level would one expect an ISO 9001 certified organization's processes to be?'' and ``Is there any significant difference between the ISO/IEC 15504 capability levels achieved by the processes of ISO 9001 certified organizations and those of non ISO 9001 certified organizations?''. This paper provides answers to those questions as well as to the following question: ``Is there any significant difference in the capability levels achieved by the ISO/IEC 15504 processes of organizations with a large information technology (IT) staff and those with a small IT staff?'' In order to answer these questions, we analyzed a data set including 691 process instances (PIs) taken from 70 SPICE phase 2 trial assessments performed over the two years from September 1996 to June 1998. Results show that the ISO/IEC 15504 processes of the ISO 9001 certified organizations attained capability levels of around 1–2.3 in 15504 terms. Results also show differences between the capability levels achieved by ISO 9001 certified organizations and non ISO 9001 certified organizations, as well as between organizations with a large IT staff and those with a small IT staff. © 2001 Elsevier Science Inc. All rights reserved.

Keywords: Bootstrap; Confidence interval; Capability level; Permutation test; ISO 9001 certification; SPICE assessments

1. Introduction

The software process improvement and capability determination (SPICE) project is an ongoing project supporting ISO/IEC JTC1/SC7/WG10 (2) in the development and trialing of the emerging standard ISO/IEC 15504 for software process assessment, capability determination, and software process improvement. ISO/IEC JTC1/SC7/WG10 has developed a type 2 technical report (TR2) (ISO/IEC TR2, 1998) consisting of the document set shown in Table 1.

* Corresponding author. Tel.: +82-2-3290-1938; fax: +82-2-922-7220.
E-mail addresses: hwjung@mail.korea.ac.kr (H.-W. Jung), rbh@cs.strath.ac.uk (R. Hunter).
1 Tel.: +44-141-548-3585; fax: +44-141-552-5330.
2 Working Group 10 of Subcommittee 7 (Software Engineering Standardization) under Joint Technical Committee 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). WG10 works on the development of standards and guidelines covering methods, practices and application of process assessment in software product procurement, development, delivery, operation, maintenance and related service support.

In the development of these documents, the SPICE project has empirically evaluated successive versions of the document sets of the emerging international standard through a series of trials (El Emam and Goldenson, 1995; Goldenson and El Emam, 1996; Maclennan and Ostrolenk, 1995; Smith and El Emam, 1996; ISO/IEC, 1999; Woodman and Hunter, 1998; El Emam and Jung, 2001). More information about ISO/IEC 15504 may be found in ISO/IEC 15504 (1998) and in El Emam et al. (1998).
ISO 9001 (1997) contains 20 clauses (see Appendix A) that collectively provide the minimum requirements for setting up a quality management system for use in software development and maintenance, as well as in other industries. Satisfaction of all the requirements leads to ISO 9001 certification. ISO 9000-3 (1997) contains software specific guidelines for the use of ISO 9001, and TickIT (1999) is an initiative, which originated in the UK, to promote the application of ISO 9001 to software.



Table 1
A set of documents in ISO/IEC 15504

Part 1: Concepts and introductory guide
Part 2: A reference model for processes and process capability
Part 3: Performing an assessment
Part 4: Guide to performing assessments
Part 5: An assessment model and indicator guidance
Part 6: Guide to competency of assessors
Part 7: Guide for use in process improvement
Part 8: Guide for use in determining supplier process capability
Part 9: Vocabulary

Although ISO 9001 and ISO/IEC 15504 have different origins, i.e., ISO 9001 is a generic standard for quality management and assurance while 15504 was created solely for software process assessment, capability determination, and process improvement, the two standards are intuitively similar, as has been shown in a comparative study of the two standards (Hailey, 1998). To the best of our knowledge, there are no studies which assess the degree of similarity between the two standards such as is seen in the comparative study of ISO 9001 and the capability maturity model (CMM) (Paulk et al., 1993) conducted by Paulk (1995). In that study, Paulk attempted to answer questions such as ``At what level in the CMM would an ISO 9001 compliant organization be?'' and ``Can a CMM level 2 (or 3) organization be considered compliant with ISO 9001?''.
The aim of this study is to consider and, if possible, to provide an answer to the following questions relating to ISO 9001 and ISO/IEC 15504:
• At what ISO/IEC 15504 capability level would one expect an ISO 9001 certified organization's processes to be?
• Is there any significant difference in the SPICE capability levels achieved by the processes of ISO 9001 certified organizations and those of non ISO 9001 certified organizations?
• Is there any significant difference in the capability levels achieved by the SPICE processes of organizations with a large information technology (IT) staff and those with a small IT staff?
This study answers the three questions above empirically by analyzing a data set taken from 70 SPICE phase 2 trial assessments.
The SPICE trials can be seen as a response to a statement by Pfleeger et al. (1994) that

Standards have codified approaches whose effectiveness has not been rigorously and scientifically demonstrated. Rather, we have too often relied on anecdote, `gut feeling', the opinions of experts, or even flawed research.

Fenton et al. (1993) and Fenton and Page (1993) have made similar arguments.
The remainder of this paper is organized as follows: Section 2 provides an overview of the ISO/IEC 15504 architecture, Section 3 presents the research method, and Section 4 presents the main results of the study and discusses their implications. Finally, Section 5 concludes with a summary of the paper and some final remarks.

2. A brief overview of ISO/IEC 15504

2.1. Two dimensional architecture

The architecture of the emerging standard ISO/IEC 15504 consists of both process and capability dimensions. Fig. 1 shows the structure of the process and capability dimensions.
In the process dimension, the processes associated with software development and maintenance are defined and classified into five categories known as the customer–supplier (CUS), engineering (ENG), support (SUP), management (MAN), and organization (ORG) categories. Table 2 gives a brief overview of the five categories and the processes contained in each category.
As shown in Fig. 1 and Table 3, the capability dimension is represented by a set of process attributes (PAs), which can be applied to any process and represent measurable characteristics required to manage a process and to improve its performance capability. The capability dimension comprises six capability levels ranging from 0 to 5. The higher the level, the higher the process capability.

2.2. Capability level determination

An ISO/IEC 15504 assessment is applied to an organizational unit (OU) (El Emam et al., 1998). An OU is the whole or part of an organization that owns and supports the software process. In this paper the term organization will sometimes be used when the term OU would be strictly more correct. During an assessment, an organization can cover only the subset of processes that are relevant to its business objectives. In most cases, it is not necessary to assess all of the processes in the process dimension.


Fig. 1. Two dimensional architecture of ISO/IEC 15504.


Table 2
The processes in the process dimension

Customer–supplier process category (CUS): processes that have a direct impact on the customer, support development and transition of the software to the customer, and provide for the correct operation and use of the software products and/or services. Its processes are:
CUS.1: Acquire software
CUS.2: Manage customer needs
CUS.3: Supply software
CUS.4: Operate software
CUS.5: Provide customer service

Engineering process category (ENG): processes that directly specify, implement, or maintain the software product, its relation to the system and its customer documentation. Its processes are:
ENG.1: Develop system requirements and design
ENG.2: Develop software requirements
ENG.3: Develop software design
ENG.4: Implement software design
ENG.5: Integrate and test software
ENG.6: Integrate and test system
ENG.7: Maintain system and software

Support process category (SUP): processes that may be employed by any of the other processes (including other supporting processes) at various points in the software life cycle. Its processes are:
SUP.1: Develop documentation
SUP.2: Perform configuration management
SUP.3: Perform quality assurance
SUP.4: Perform work product verification
SUP.5: Perform work product validation
SUP.6: Perform joint review
SUP.7: Perform audits
SUP.8: Perform problem resolution

Management process category (MAN): processes which contain generic practices that may be used by those who manage any type of project or process within a software life cycle. Its processes are:
MAN.1: Manage the project
MAN.2: Manage quality
MAN.3: Manage risks
MAN.4: Manage subcontractors

Organization process category (ORG): processes that establish the business goals of the organization and develop process, product, and resource assets which, when used by the projects in the organization, will help the organization achieve its business goals. Its processes are:
ORG.1: Engineer the business
ORG.2: Define the process
ORG.3: Improve the process
ORG.4: Provide skilled human resources
ORG.5: Provide software engineering infrastructure

In ISO/IEC 15504, the capability level of each process is determined by rating the PAs. For example, to determine whether a process has achieved level one or not, it is necessary to determine the rating achieved by PA 1.1 (the process performance attribute). A process that fails to achieve capability level 1 is at capability level 0. Capability levels 2–5 each have two attributes associated with them, as shown in Table 3. A more detailed description of the attributes can be found in ISO/IEC 15504: Parts 2 and 5.
Each process attribute is measured by an ordinal rating F (Fully), L (Largely), P (Partially), or N (Not achieved) that represents the extent of achievement of the attribute. The object that is rated is the process instance. A process instance (PI) is defined to be a singular instantiation of a process that is uniquely identifiable and about which information can be gathered in a repeatable manner (ISO/IEC, 1999; El Emam et al., 1998).

The capability level of a process is determined from the ratings of its PAs. The achievement of capability level k requires that all PAs below level k are rated F and that the attributes at level k are rated F or L. As an example, achievement of capability level 3 requires F ratings for PA 1.1, PA 2.1 and PA 2.2, and F or L ratings for PA 3.1 and PA 3.2. ISO/IEC 15504: Part 5 provides more detailed information on determining capability levels.
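The level determination rule above is mechanical, so a short sketch may help. The following Python fragment is our own illustration, not part of the standard; the dictionary of PA ratings and the function name are hypothetical:

```python
# Sketch of the ISO/IEC 15504 capability level rule described above.
# PA ratings use the ordinal scale F (Fully), L (Largely), P (Partially),
# N (Not achieved); attribute names per level follow Table 3.

LEVEL_ATTRIBUTES = {
    1: ["PA1.1"],
    2: ["PA2.1", "PA2.2"],
    3: ["PA3.1", "PA3.2"],
    4: ["PA4.1", "PA4.2"],
    5: ["PA5.1", "PA5.2"],
}

def capability_level(ratings):
    """Return the capability level (0-5) for one process instance.

    Level k is achieved when all attributes below level k are rated F
    and all attributes at level k are rated F or L.
    """
    achieved = 0
    for level in range(1, 6):
        # All attributes at the candidate level must be F or L ...
        if not all(ratings.get(a) in ("F", "L")
                   for a in LEVEL_ATTRIBUTES[level]):
            break
        # ... and all attributes below it must be F.
        lower = [a for lv in range(1, level) for a in LEVEL_ATTRIBUTES[lv]]
        if not all(ratings.get(a) == "F" for a in lower):
            break
        achieved = level
    return achieved

# Example: F ratings through level 2 and L ratings at level 3 give level 3.
print(capability_level({"PA1.1": "F", "PA2.1": "F", "PA2.2": "F",
                        "PA3.1": "L", "PA3.2": "L"}))  # -> 3
```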
3. Research method
3.1. Source of data
The data for this study was collected during the
SPICE phase 2 trials. The SPICE project has been


Table 3
Process attributes for each capability level (ISO/IEC 15504: Part 5)

Level 0 (Incomplete): There is a general failure to attain the purpose of the process. There are little or no easily identifiable work products or outputs of the process. Thus, there are no PAs.

Level 1 (Performed process): The purpose of the process is generally achieved. The achievement may not be rigorously planned and tracked. There are identifiable work products for the process, and these testify to the achievement of the purpose.
PA 1.1, Process performance attribute: The extent to which the process achieves the process outcomes by transforming identifiable input work products to produce identifiable output work products.

Level 2 (Managed process): The process delivers work products according to specified procedures and is planned and tracked. Work products conform to specified standards and requirements.
PA 2.1, Performance management attribute: The extent to which the performance of the process is managed to produce work products that meet the defined objectives.
PA 2.2, Work product management attribute: The extent to which the performance of the process is managed to produce work products that are appropriately documented, controlled, and verified.

Level 3 (Established process): The defined process is performed and managed based upon good software engineering principles. Individual implementations of the process use approved, tailored versions of standard, documented processes to achieve the process outcomes. The resources necessary to establish the process definition are also in place.
PA 3.1, Process definition attribute: The extent to which the performance of the process uses a process definition based upon a standard process to achieve the process outcomes.
PA 3.2, Process resource attribute: The extent to which the process draws upon suitable resources (for example, human resources and process infrastructure) that are appropriately allocated to deploy the defined process.

Level 4 (Predictable process): The defined process is performed consistently in practice within control limits to achieve its process goals. Detailed measures of performance are collected and analyzed. This leads to a quantitative understanding of process capability and an improved ability to predict and manage performance. Performance is quantitatively managed. The quality of work products is quantitatively known.
PA 4.1, Measurement attribute: The extent to which product and process goals and measures are used to ensure that performance of the process supports the achievement of the defined goals in support of the relevant business goals.
PA 4.2, Process control attribute: The extent to which the process is controlled through the collection, analysis, and use of product and process measures to correct, where necessary, the performance of the process to achieve the defined product and process goals.

Level 5 (Optimizing process): Process performance is optimized to meet current and future business needs, and the process achieves repeatability in meeting its defined business goals. Quantitative process effectiveness and efficiency goals (targets) for performance are established, based on the business goals of the organization. Continuous process monitoring against these goals is enabled by obtaining quantitative feedback, and improvement is achieved by analysis of the results.
PA 5.1, Process change attribute: The extent to which changes to the definition, management and performance of the process are controlled to achieve the relevant business goals of the organization.
PA 5.2, Continuous improvement attribute: The extent to which changes to the process are identified and implemented to ensure continuous improvement in the fulfillment of the relevant business goals of the organization.

empirically evaluating the emerging international standard through a series of trials consisting of three phases. Phase 2 of the trials was based on the proposed draft technical report (PDTR) version of ISO/IEC 15504 (ISO/IEC PDTR 15504, 1998) and took place over the two years from September 1996 to June 1998. The PDTR version of the standard preceded the TR2 version.
The data came from 70 assessments of 44 organizations in five regions: Europe (24 trials), South Asia-Pacific (34 trials), North Asia-Pacific (10 trials), USA (1 trial), and Canada/Mexico (1 trial) (Hunter, 1998; ISO/IEC, 1999). Note that the number of assessments is greater than the number of OUs due to multiple assessments

in some organizations. In total, 169 projects and 691 PIs were involved.
The data set submitted to the international trials coordinator (ITC) for each trial included the ratings data from each assessment and answers to a set of questionnaires concerning the assessment, the OU, the project, etc., completed by lead assessors and OUs following each assessment. At the country or state level, local trials coordinators (LTCs) liaised with the assessors and OUs to ensure assessors' qualifications, to make the questionnaires available, to answer queries about the questionnaires, and to ensure the timely collection of data.


3.2. Data analysis


3.2.1. Explanation of the term confidence interval
This section provides a brief explanation of the term confidence interval, which is used in the answer to the question ``At what SPICE capability level would one expect an ISO 9001 certified organization's processes to be?''
To give an example, in our data associated with the ISO 9001 certified organizations, CUS.1 (Acquire software) consists of two process instances (PIs) with capability levels 0 and 1, respectively. The average capability level of the sample is therefore 0.5. However, different samples will produce different values for the average capability level of a process. Thus, we used a confidence interval, $l_B \le \mu \le u_B$, to delimit the true (unknown) value $\mu$ of the average capability level of each process. Since the sample size is small for some processes, the generic method of assuming a normal distribution cannot be used (Montgomery et al., 1998). Thus, this study used a nonparametric statistical approach called

the bootstrap method (Kenett and Zacks, 1998; Efron and Tibshirani, 1993) to compute the confidence interval. This bootstrap method should not be confused with the BOOTSTRAP method for process assessment (Kuvaja, 1999).
The bootstrap method does not depend on a specific distribution function. It samples n times from the original observations with replacement and then computes a sample mean. This process is repeated M times, where M is a large number. The distribution of the M sample means is called the empirical reference distribution. In this study, 1000 sets of random samples were taken, and the lower and upper limits of the confidence interval were then determined as the 2.5% and 97.5% percentiles, respectively, of the empirical reference distribution formed by the 1000 bootstrap replications.
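As an illustration of the procedure just described, the following sketch (standard-library Python; the rating data is invented for illustration, not taken from the trials) computes a 95% percentile bootstrap confidence interval for the mean capability level of one process:

```python
import random

def bootstrap_ci(observations, M=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, as described above.

    Draws M resamples (with replacement) of the same size as the
    original sample, records each resample mean, and reads the limits
    off the alpha/2 and 1 - alpha/2 percentiles of the empirical
    reference distribution.
    """
    rng = random.Random(seed)
    n = len(observations)
    means = sorted(
        sum(rng.choice(observations) for _ in range(n)) / n
        for _ in range(M)
    )
    lower = means[int(M * alpha / 2)]
    upper = means[int(M * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical capability levels for the PIs of one process.
levels = [1, 2, 2, 1, 3, 2, 1, 2, 0, 2, 2, 1, 3, 2, 1]
print(bootstrap_ci(levels))  # an interval such as (1.3, 2.0); varies with seed
```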
3.2.2. Difference in capability level of two groups
One objective of our analysis is to find the difference (if any exists) between the capability levels of ISO 9001 certified organizations and the capability levels of non ISO 9001 certified organizations. In order to do this, we divided the data set into two groups corresponding to the ISO 9001 certified organizations and the non ISO 9001 certified organizations, respectively. A popular method of testing for the existence of a true difference is to use a hypothesis test for the difference in means of two independent normal populations (Montgomery et al., 1998). However, our data set is characterized as being small and unbalanced (or skewed), as well as being non normal. Thus, this study used a nonparametric test called the Permutation test (Gibbons, 1985; Good, 1993; StatXact, 1998) that accommodates the characteristics of our data set. Most nonparametric tests can be used on data from an ordinal scale (Conte et al., 1986). See Figs. 2, 3 and 6 for the skewed characteristic and Tables 4 and 5 for the small and unbalanced characteristics of the observations. The null hypothesis of the Permutation test is

$H_0: F_1 = F_2.$

The above hypothesis states that the two populations have the same distribution, as opposed to having different distributions of capability level. Here it is not necessary to make any assumption about the underlying distributions from which the observations were drawn. The only assumption is that the data is independent both ``within each sample'' and ``across the two samples''. Appendix B gives a brief explanation of the Permutation test.

Fig. 2. Box and whisker plots showing the variation of the capability levels for the 29 ISO/IEC 15504 processes (see Appendix C for interpretation of the box and whisker plot).

Fig. 3. Box and whisker plot showing the variation of IT staff size.
Table 4
Capability level of SPICE processes of ISO 9001 certified (group 1) and non ISO 9001 certified (group 2) organizations
PIs: number of process instances; Mean: average capability level; Diff: Mean(1) - Mean(2).

Process  PIs(1)  Mean(1)  PIs(2)  Mean(2)  Diff    One-sided exact P-value
CUS.1      2      0.50      3      1.00    -0.50    0.400
CUS.2     15      1.87     16      1.50     0.37    0.058
CUS.3      8      1.63     14      0.93     0.70    0.017
CUS.4      3      1.00     10      0.90     0.10    0.604
CUS.5      3      1.33     16      0.81     0.52    0.159
ENG.1     13      2.00      4      1.00     1.00    0.134
ENG.2     26      1.77     30      1.40     0.37    0.143
ENG.3     30      1.90     15      1.53     0.37    0.181
ENG.4     21      1.76     11      0.91     0.85    0.015
ENG.5     20      1.80     16      0.88     0.93    0.002
ENG.6      9      2.00      5      2.20    -0.20    0.481
ENG.7      9      2.00     15      1.53     0.47    0.134
SUP.1     16      1.75     17      1.00     0.75    0.018
SUP.2     22      2.09     24      0.79     1.30    0.000
SUP.3      7      2.00     11      0.73     1.27    0.015
SUP.4     12      1.42      4      0.25     1.17    0.074
SUP.5      8      1.75      4      0.75     1.00    0.176
SUP.6      8      1.38     11      0.91     0.47    0.154
SUP.7      5      2.20      5      0.40     1.80    0.060
SUP.8     15      2.13      8      1.25     0.88    0.025
MAN.1     27      1.59     36      1.11     0.48    0.005
MAN.2     16      1.00      9      0.33     0.67    0.118
MAN.3     22      1.14     10      0.60     0.54    0.060
MAN.4      4      2.25      2      0.50     1.75    0.200
ORG.1      5      1.60      9      0.67     0.93    0.119
ORG.2      7      2.29      6      0.17     2.12    0.001
ORG.3      6      1.00      5      0.20     0.80    0.056
ORG.4      5      1.60     15      0.87     0.73    0.007
ORG.5      6      1.00     10      0.80     0.20    0.419

The processes whose P-value is shown in bold denote the existence of a difference in the capability level distribution of the two groups at the α = 0.05 level of significance.


Table 5
Capability level of SPICE processes of organizations with a large IT staff (group 1) and organizations with a small IT staff (group 2)
PIs: number of process instances; Mean: average capability level; Diff: Mean(1) - Mean(2).

Process  PIs(1)  Mean(1)  PIs(2)  Mean(2)  Diff    One-sided exact P-value
CUS.1      3      1.00      2      0.50     0.50    0.400
CUS.2     20      1.65      9      1.56     0.09    0.483
CUS.3     14      1.29      8      1.00     0.29    0.290
CUS.4      8      1.13      5      0.60     0.53    0.093
CUS.5     12      1.00      7      0.71     0.29    0.327
ENG.1      9      2.00      6      1.50     0.50    0.272
ENG.2     42      1.60     10      1.50     0.10    0.460
ENG.3     32      1.94      9      1.56     0.38    0.192
ENG.4     22      1.59      8      1.25     0.34    0.258
ENG.5     23      1.43     11      1.55    -0.11    0.447
ENG.6      7      2.43      5      1.60     0.83    0.138
ENG.7     13      2.08     10      1.10     0.98    0.002
SUP.1     45      1.40     14      1.07     0.33    0.117
SUP.2     16      0.69      7      1.14    -0.46    0.257
SUP.3     22      1.05      8      0.75     0.30    0.300
SUP.4      3      2.00      3      1.33     0.67    0.400
SUP.5      9      1.11      5      0.80     0.31    0.469
SUP.6      7      1.57      6      1.00     0.57    0.314
SUP.7      6      0.83      5      0.40     0.43    0.284
SUP.8     11      1.18      9      0.89     0.29    0.204
MAN.1     11      0.91      5      0.80     0.11    0.544
MAN.2     23      1.52      9      0.89     0.63    0.070
MAN.3     32      1.72     11      0.73     0.99    0.018
MAN.4     10      1.60      7      0.86     0.74    0.145
ORG.1      8      1.38      5      1.20     0.18    0.506
ORG.2      5      1.60      5      1.00     0.60    0.333
ORG.3     13      1.46      6      0.33     1.13    0.007
ORG.4      5      1.60      5      1.00     0.60    0.381
ORG.5     15      2.00      6      1.50     0.50    0.208

The processes whose P-value is shown in bold denote the existence of a difference in the capability level distribution of the two groups at the α = 0.05 level of significance.

3.2.3. Measurement scale
According to classical measurement theory concerned with ``permissible statistics'', developed by Stevens (1951), variables should be measured on an interval scale if the arithmetic mean and variance are to be computed (see also Nunnally and Bernstein, 1994).
To analyze our data set, the capability levels were coded such that ``capability level 5'' was coded 5, down to ``capability level 0'', which was coded 0. El Emam and Birk (2000a,b) state that this coding scheme for process capability lies between ordinal and interval level measurement. However, in the above papers they treated capability level as being on an interval scale, since capability level is a single item measure that is treated as if it were interval in many instances. Furthermore, restricting noninterval scale data to nonparametric methods would exclude much useful study (Nunnally and Bernstein, 1994). Many authors, as well as Stevens himself, noted that useful study can be conducted even if the proscriptions are violated (Briand et al., 1996; Gardner, 1975; Stevens, 1951; Velleman and Wilkinson, 1993). A detailed discussion of the scale type issue for process capability is given by El Emam and Birk (2000a,b).
3.2.4. Effects of context factors
In a recent review of the empirical literature on software process assessment (El Emam and Briand, 1999), it was noted that the effectiveness of process improvement actions depends on the context in which the actions are performed. Thus, it is interesting to examine how each of the context factors, possession of ISO 9001 certification and organizational size, relates to capability level as defined by ISO/IEC 15504.
In investigating organization size as a context factor, we divided the PIs into two groups based on whether the organization had a large or a small IT staff, where small means less than or equal to 50 IT staff. The same definition of `small' organizations was used in a European project that provides process improvement guidance for small organizations (SPIRE project, 1998). In the validity study of ISO/IEC 15504, El Emam and Birk (2000a,b) used the same IT context factor.
4. Results
4.1. Descriptive statistics
44 OUs participated in the SPICE phase 2 trials, and a total of 691 PIs were rated. Fig. 2 gives box and whisker plots showing the variation of the capability levels for the 29 ISO/IEC 15504 processes, distributed over the five process categories. See Appendix C for an explanation of the box and whisker plot.
Half of the assessed OUs (22 OUs) had ISO 9001 certification. Of the 691 PIs assessed, 350 PIs were from the ISO 9001 certified OUs and 341 PIs were from non ISO 9001 certified OUs. In addition, 446 PIs were assessed in OUs with a large IT staff and 206 PIs were assessed in OUs with a small IT staff; there was some missing data as far as IT staff size was concerned. Fig. 3 shows a box and whisker plot of the variation in IT staff size. The average and median size of IT staff in the non ISO 9001 certified OUs are 94.98 and 50, respectively, while those of the ISO 9001 certified OUs are 163.47 and 75.
Fig. 4 shows the percentage distribution of capability levels of PIs in the ISO 9001 certified OUs and the non ISO 9001 certified OUs. Comparing the two pie charts gives the impression that the ISO 9001 certified OUs have greater capability than the non ISO 9001 certified OUs. For example, 3% of the processes associated with ISO 9001 certified OUs are at level 4, while none of the processes associated with the non ISO 9001 certified OUs are above level 3.
Fig. 5 shows the percentage distribution of capability levels by IT staff size. Although OUs with a small IT staff have 5% of processes at capability level 4, comparing the two charts shows that OUs with a large IT staff appear generally to have higher capability than those with a small IT staff.

Fig. 4. Distribution of capability levels in ISO 9001 certified and non ISO 9001 certified organizations.

Fig. 5. Distribution of capability levels in organizations with a large IT staff and organizations with a small IT staff.

4.2. Capability levels of SPICE processes in ISO 9001 certified organizations

In this section we try to answer the question ``At what SPICE capability level would one expect an ISO 9001 certified organization's processes to be?'' Fig. 6 shows the variation in the capability levels in each process category for the 350 PIs (covering all 29 SPICE processes) of the ISO 9001 certified organizations. All of the ISO 9001 certified organizations attained a minimum capability level of 1 in six processes, namely CUS.2 (Manage customer needs), CUS.3 (Supply software), CUS.4 (Operate software), ENG.1 (Develop system requirements and design), ENG.7 (Maintain system and software), and ORG.4 (Provide skilled human resources). For the other 23 SPICE processes, there is at least one process instance at capability level zero.
The third column in Table 4 shows the average capability level of each of the 29 ISO/IEC 15504 processes for the ISO 9001 certified organizations. The average capability level for each SPICE process except CUS.1 (for which there is very little data) lies between 1 and 2.3. Since different samples will produce different values for the average capability level of a process, this study provides a confidence interval for the true value $\mu$ of the average capability level of each process. Fig. 7 shows the 95% bootstrap confidence interval of the true mean capability level for each of 15 ISO/IEC 15504 processes of the ISO 9001 certified organizations (the fourteen processes with small sample sizes, less than nine, are not displayed). As an example, in the case of the confidence interval of CUS.2, we can say, with a confidence of 95%, that the mean capability level is in [1.533, 2.267]. Note that the average capability values should be considered conservative, because some assessments did not rate processes beyond capability level 3.
From Table 4, the processes that have an average capability level greater than or equal to 2 are ENG.1 (Develop system requirements and design), ENG.6 (Integrate and test system), ENG.7 (Maintain system and software), SUP.2 (Perform configuration management), SUP.3 (Perform quality assurance), SUP.7 (Perform audits), SUP.8 (Perform problem resolution), MAN.4 (Manage subcontractors), and ORG.2 (Define the process). The average capability level of CUS.1 (Acquire software) is less than 1; however, the sample size in this case is too small for this value to be considered significant. For the non ISO 9001 certified organizations, only ENG.6 (Integrate and test system) attained an average capability level greater than or equal to 2.
How can we explain the relatively large variation in the capability levels of the SPICE processes of ISO 9001 certified organizations? There are several possible explanations:


• The SPICE processes that are at capability level 0 may not have been maintained since ISO 9001 certification took place.
• The SPICE processes with the lower capability levels may not be directly related to the ISO 9001 requirements.
• The SPICE processes with the higher capability levels may be directly related to one or more clauses of ISO 9001 and/or may be involved in process improvement.
• The variation may be related to the relatively small sample size (number of PIs rated).

Fig. 6. Variation of the capability levels in ISO 9001 certified OUs.
4.3. Difference in the capability levels of ISO 9001 certified and non ISO 9001 certified organizations

In this section, we try to answer the question ``Is there a significant difference in the SPICE capability levels achieved by the processes of ISO 9001 certified organizations and those of non ISO 9001 certified organizations?''
Table 4 shows the number of PIs rated and the average capability level achieved for each of the 29 SPICE processes, both for the ISO 9001 certified and the non ISO 9001 certified organizations. The average capability level of the ISO 9001 certified organizations is greater than that of the non ISO 9001 certified organizations for all processes apart from CUS.1 and ENG.6. Different samples will produce different values for the difference. Thus, we need a statistical test to determine the existence of true differences in capability level between the two groups (the ISO 9001 certified and the non ISO 9001 certified organizations).
Although the assumption of an interval scale for capability measurement (see Section 3.2.3) would allow the use of a parametric test for determining the existence of true differences in capability level between the two groups, a nonparametric test (the Permutation test), suitable for nonnormal, small, unbalanced or skewed data, was employed in this study. This analysis of the difference between the two groups used the capability levels of the 350 PIs from the ISO 9001 certified



organizations and the 341 PIs from the non ISO 9001 certified organizations.
The analysis was conducted using StatXact-4 (1998) with the exact option of the Permutation test at the α = 0.05 level of significance. StatXact-4 can compute the exact P-value for testing whether the response (capability level) distributions of the two groups are the same. As shown in Table 4, for 11 processes (CUS.3, ENG.4, ENG.5, SUP.1, SUP.2, SUP.3, SUP.8, MAN.1, MAN.3, ORG.2, and ORG.4) we can say, with a confidence level of 95%, that the capability level of the ISO 9001 certified organizations is greater than that of the non ISO 9001 certified organizations, the P-value being less than or equal to 0.05. For the remaining processes, since the P-value is greater than 0.05, we cannot say that one group (the processes associated with ISO 9001 certified organizations or those associated with non ISO 9001 certified organizations) has a significantly different capability distribution from the other.

Fig. 7. Confidence interval of the capability levels of ISO 9001 certified organizations (processes with sample size ≥ 9; α = 0.05).


4.4. Difference in capability levels of organizations with a large IT staff and organizations with a small IT staff

In this section we try to answer the question ``Is there any significant difference in the capability levels achieved by the SPICE processes of organizations with a large IT staff and those with a small IT staff?''
The basic method used to compare the response distributions of the capability levels achieved according to IT staff size is the same as the one used in the previous section. Unfortunately, in five projects assessed, the necessary data was incomplete. Thus, we used a total of 652 PIs: 446 that belonged to organizations with a large IT staff and 206 that belonged to organizations with a small IT staff. The processes with the P-value shown in bold in Table 5 denote the existence of a difference in the capability level distribution of the two groups at the α = 0.05 level of significance.
As shown in Table 5, for all SPICE processes except ENG.5 and SUP.2, the average capability level for organizations with a large IT staff is greater than for organizations with a small IT staff. However, since the P-value for all processes except ENG.7 and ORG.3 is greater than 0.05, it is only for ENG.7 and ORG.3 that we can reject the hypothesis that the capability level distribution of organizations with a large IT staff is the same as that for organizations with a small IT staff.

5. Final remarks

The results of this study may be summarized as follows:
• A SPICE process capability level of around 1.0–2.3 corresponds to an ISO 9001 compliant organization.
• In almost all SPICE processes, the average capability level of the ISO 9001 certified organizations is greater than that of the non ISO 9001 certified organizations.


In addition, in 11 processes out of 29, there was statistical evidence indicating that the capability level is greater for the ISO 9001 certified organizations than for the non ISO 9001 certified organizations.
• The capability levels of the processes associated with organizations with a large IT staff were greater than those of the processes associated with organizations with a small IT staff. However, only two processes (ENG.7 and ORG.3) showed a statistically significant difference in capability levels.
In interpreting the results obtained in comparing SPICE and ISO 9001, the similarities between the requirements of the two standards should be borne in mind. In this sense the results are perhaps not too surprising, though it is encouraging to have empirical evidence suggesting that SPICE and ISO 9001, applied to software, produce broadly similar results.
One limitation of this study should be made clear in interpreting our results. This limitation is not unique to our study, but is characteristic of most comparison studies, and is worth explaining here.
Suppose that an experimenter is interested in investigating the effect of a specific binary factor (i.e., ISO 9001 certification versus non ISO 9001 certification) to determine whether this factor has a significant effect on an observed quantity. In retrospective studies of this sort, designed to ``look into the past'' (Agresti, 1996), observed data is obtained through the analysis of historical data concerning the system or process (Montgomery et al., 1998). Thus, retrospective (observational) studies of this sort are limited in that they cannot account for possibilities such as non ISO 9001 certified companies not going through the certification process because it was not necessary for their business, but nonetheless having satisfied all the clauses of ISO 9001. This limitation can be overcome by performing a randomized experiment, which is a more appropriate way of investigating the effect of such factors (Montgomery et al., 1998). However, such randomized experiments are sometimes not possible due to cost, ethics, or legal reasons. Cost was a major barrier preventing randomization in the SPICE trials data collection.

Acknowledgements

The authors are members of the SPICE trials team, and wish to acknowledge the contributions of other past and present members of the trials team, in particular those of Khaled El Emam, Inigo Garro, Dennis Goldenson, Peter Krauth, Bob Smith, Kyungwhan Lee, Angela Tuffley and Alastair Walker. The research of Ho-Won Jung was supported by Korea Research Foundation Grant (KRF-99-041-C00329). This support is gratefully acknowledged.


Appendix A. ISO 9001 Quality systems: Model for quality assurance in design, development, production, installation and servicing
The twenty clauses of ISO 9001 are as follows:

4.1 Management responsibility
4.2 Quality system
4.3 Contract review
4.4 Design control
4.5 Document and data control
4.6 Purchasing
4.7 Control of customer-supplied product
4.8 Product identification and traceability
4.9 Process control
4.10 Inspection and testing
4.11 Control of inspection, measuring, and test equipment
4.12 Inspection and test status
4.13 Control of nonconforming product
4.14 Corrective and preventive action
4.15 Handling, storage, packaging, preservation, and delivery
4.16 Control of quality records
4.17 Internal quality audits
4.18 Training
4.19 Servicing
4.20 Statistical techniques

Appendix B. Permutation test

The Permutation test is known to be a good method for comparing two populations when the data set is small, unbalanced, or skewed, as well as not being normal (Gibbons, 1985; Good, 1993; StatXact, 1998). Our data set has these characteristics.
Suppose that samples 1 and 2 have $n_1$ and $n_2$ observations from distributions $F_1(u)$ and $F_2(u)$, respectively, where $u$ is the observed raw data and $F_j(u) = P(U \le u \mid j)$, $j = 1, 2$. The null hypothesis that we want to test is

$H_0: F_1 = F_2.$

The original observations in $u$ are replaced by the corresponding scores obtained in testing $H_0$ by nonparametric methods. Let $w_{ij}$ be the score corresponding to $u_{ij}$, where $j$ ($= 1$ or $2$) and $i$ denote the group and the observation within the group, respectively. The scores can be obtained by various ways of ranking the data in the pooled sample of size $(n_1 + n_2)$. Let $\tilde{w}$ be a permutation of $w$ if $\tilde{w}$ has the same scores as $w$, and let $W$ denote the set containing $w$ and all its permutations. Then $W$ can be represented by

$W = \{\tilde{w} \mid \tilde{w} = w \text{ or } \tilde{w} \text{ is a permutation of } w\},$

where $\tilde{w}$ is a random variable and $w$ is a specific value.


The test statistic $T$ of any linear rank test is the sum of the $n_1$ scores associated with the sample from population 1. The Permutation test can assign arbitrary scores $w_{ij}$ in the definition of the linear rank test statistic; a special case is Pitman's test, which uses the raw data as scores. The test statistic $T$ and its observed value $t$ are given by

$T = \sum_{i=1}^{n_1} \tilde{w}_{i1} \quad \text{and} \quad t = \sum_{i=1}^{n_1} w_{i1}.$

The exact P-value is computed by generating the true permutation distribution of $T$ under the assumption $F_1 = F_2$. This distribution is derived by partitioning the $(n_1 + n_2)$ values in all possible ways such that groups 1 and 2 include $n_1$ and $n_2$ values, respectively. There are

$\binom{n_1 + n_2}{n_1} = \frac{(n_1 + n_2)!}{n_1!\, n_2!}$

ways to partition the $(n_1 + n_2)$ values into two groups of $n_1$ and $n_2$; for example, with $n_1 = 2$ and $n_2 = 3$ there are $5!/(2!\,3!) = 10$ partitions. Each partition is equally likely under the null hypothesis.
The Permutation statistic $T$ is considered extreme if it is either very small or very large. The exact one-sided P-value is defined as

$P = P(T \ge t)$ if $t > E(T)$, and $P = P(T \le t)$ if $t \le E(T)$.

One way to state whether the null hypothesis ($H_0: F_1 = F_2$) is or is not rejected is to use the P-value. The P-value is the smallest level of significance that would lead to rejection of the null hypothesis. Hence, if the P-value $\le \alpha$, then the null hypothesis is rejected at the significance level $\alpha$. In this study we used $\alpha = 5\%$.
If the data set is large enough for the test statistic to converge to an appropriate limiting normal or chi-square distribution, asymptotic P-values may be obtained by evaluating the tail area of the limiting distribution. Otherwise, exact P-values can be computed by actually deriving the true distribution of the test statistic and then evaluating its tail area. For small, unbalanced, and heavily tied data, the exact and asymptotic P-values can differ and may lead to opposite conclusions about the hypothesis. The characteristics of our data sets require exact P-values to be used.
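To make the computation concrete, here is a minimal sketch of Pitman's exact test (standard-library Python; the two samples are invented for illustration and do not come from the trials data):

```python
from itertools import combinations
from statistics import mean

def exact_permutation_pvalue(sample1, sample2):
    """One-sided exact P-value for H0: F1 = F2, using Pitman scores.

    The statistic T is the sum of the n1 values assigned to group 1.
    The reference distribution enumerates all C(n1 + n2, n1) equally
    likely ways of choosing which pooled values form group 1.
    """
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    t = sum(sample1)                    # observed value of T
    expected_t = n1 * mean(pooled)      # E(T) under H0
    stats = [sum(pooled[i] for i in idx)
             for idx in combinations(range(len(pooled)), n1)]
    if t > expected_t:
        count = sum(s >= t for s in stats)   # P(T >= t)
    else:
        count = sum(s <= t for s in stats)   # P(T <= t)
    return count / len(stats)

# Hypothetical capability levels for two small groups of PIs.
print(exact_permutation_pvalue([2, 2, 3, 1], [1, 0, 1, 2, 0]))
```

Enumerating all partitions is feasible only for small samples; for larger data sets, packages such as StatXact resort to specialized algorithms or Monte Carlo sampling to obtain exact or near-exact P-values.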
Appendix C. Explanation of a box and whisker plot

The box and whisker plot in Fig. 8 provides a graphical presentation of data, displaying features such as dispersion, location, and skewness. The bottom of the box corresponds to the first quartile ($Q_1$), the value below which 25% of the observations lie. Similarly, the top of the box corresponds to the third quartile ($Q_3$). The length of the box, called the interquartile range (IQR $= Q_3 - Q_1$), is a measure of the dispersion of the data. A line within the box indicates the median (the 50th percentile), a statistic of central location; in this study the median line is drawn with the symbol O to avoid overlaying the median with the lines of the box. Two whiskers extend from the box. The lower whisker extends down to $\max\{X_{(1)},\; Q_1 - 1.5(Q_3 - Q_1)\}$ and the upper whisker extends up to $\min\{X_{(n)},\; Q_3 + 1.5(Q_3 - Q_1)\}$, where $X_{(1)}$ and $X_{(n)}$ are the smallest and largest observations. Outliers are data points beyond the lower and upper whiskers, plotted with asterisks.

Fig. 8. Explanation of a box and whisker plot.
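For readers who want to reproduce the plot elements numerically, the following sketch (Python; the quantile interpolation shown is one common convention, and statistical packages differ in detail; the staff sizes are invented) computes the box, whisker, and outlier summary:

```python
def box_plot_summary(data):
    """Quartiles, whisker ends, and outliers per the rules in Appendix C."""
    xs = sorted(data)
    n = len(xs)

    def quantile(q):
        # Linear-interpolation quantile; exact conventions vary by package.
        pos = q * (n - 1)
        lo = int(pos)
        frac = pos - lo
        return xs[lo] + frac * (xs[min(lo + 1, n - 1)] - xs[lo])

    q1, median, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
    iqr = q3 - q1
    lower_whisker = max(xs[0], q1 - 1.5 * iqr)   # max{X(1), Q1 - 1.5 IQR}
    upper_whisker = min(xs[-1], q3 + 1.5 * iqr)  # min{X(n), Q3 + 1.5 IQR}
    outliers = [x for x in xs if x < lower_whisker or x > upper_whisker]
    return {"Q1": q1, "median": median, "Q3": q3,
            "whiskers": (lower_whisker, upper_whisker), "outliers": outliers}

# Hypothetical IT staff sizes for ten OUs; the long right tail puts
# the largest observation beyond the upper whisker, i.e., it is an outlier.
print(box_plot_summary([5, 12, 20, 35, 50, 60, 75, 90, 160, 420]))
```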
References

Agresti, A., 1996. An Introduction to Categorical Data Analysis. Wiley, New York.
Briand, L., El Emam, K., Morasca, S., 1996. On the application of measurement theory in software engineering. International Journal of Empirical Software Engineering 1 (1), 61–88.
Conte, S.D., Dunsmore, H.E., Shen, V.Y., 1986. Software Engineering Metrics and Models. Benjamin/Cummings, Menlo Park, CA.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
El Emam, K., Briand, L., 1999. Costs and benefits of software process improvement. In: Messnarz, R., Tully, C. (Eds.), Better Software Practice for Business Benefit: Principles and Experience. IEEE Computer Society Press, Silver Spring, MD.
El Emam, K., Goldenson, D.R., 1995. SPICE: An empiricist's perspective. In: Proceedings of the Second IEEE International Software Engineering Standards Symposium, pp. 84–97.
El Emam, K., Birk, A., 2000a. Validating the ISO/IEC measure of software development process capability. Journal of Systems and Software 51, 119–149.
El Emam, K., Birk, A., 2000b. Validating the ISO/IEC measure of software requirement analysis process capability. IEEE Transactions on Software Engineering 26 (6), 541–566.
El Emam, K., Drouin, J.-N., Melo, W. (Eds.), 1998. SPICE: The Theory and Practice of Software Process Improvement and Capability Determination. IEEE Computer Society Press, Silver Spring, MD.
El Emam, K., Jung, H.-W., 2001. An evaluation of the ISO/IEC 15504 assessment model. Journal of Systems and Software 59, 23–41.
Fenton, N., Page, S., 1993. Towards the evaluation of software engineering standards. In: Proceedings of the Software Engineering Standards Symposium, pp. 100–107.
Fenton, N., Littlewood, B., Page, S., 1993. Evaluating software engineering standards and methods. In: Thayer, R., McGettrick, A. (Eds.), Software Engineering: A European Perspective. IEEE Computer Society Press, Silver Spring, MD.
Gardner, P., 1975. Scales and statistics. Review of Educational Research 45 (1), 43–57.
Gibbons, J.D., 1985. Nonparametric Statistical Inference, 2nd ed. Marcel Dekker, New York.
Goldenson, D.R., El Emam, K., 1996. The international SPICE trials: Project description and initial results. In: Proceedings of the 8th Software Engineering Process Group Conference.
Good, P., 1993. Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer, New York.
Hailey, V., 1998. A comparison of ISO 9001 and the SPICE framework. In: El Emam, K., Drouin, J.-N., Melo, W. (Eds.), SPICE: The Theory and Practice of Software Process Improvement and Capability Determination. IEEE Computer Society Press, Silver Spring, MD.
Hunter, R.B., 1998. SPICE trials assessment profile. IEEE Software Process Newsletter 12, 12–18.
ISO 9000-3, 1997. Quality Management and Quality Assurance Standards, Part 3: Guidelines for the Application of ISO 9001:1994 to the Development, Supply, Installation and Maintenance of Computer Software. ISO, Geneva, Switzerland.
ISO 9001, 1997. Quality Systems: Model for Quality Assurance in Design, Development, Production, Installation and Servicing. ISO, Geneva, Switzerland.
ISO/IEC JTC1/SC7/WG10, 1999. SPICE Phase 2 Trials Final Report, vol. 1.
ISO/IEC PDTR 15504, 1998. Information Technology: Software Process Assessment, Parts 1–9.
ISO/IEC TR2 15504, 1998. Information Technology: Software Process Assessment, Parts 1–9. Geneva, Switzerland.
Kenett, R.S., Zacks, S., 1998. Modern Industrial Statistics: Design and Control of Quality and Reliability. Duxbury Press, Pacific Grove, CA (Chapter 7).
Kuvaja, P., 1999. BOOTSTRAP 3.0: A SPICE conformant software process assessment methodology. Software Quality Journal 8, 7–19.
Maclennan, F., Ostrolenk, G., 1995. The SPICE trials: Validating the framework. In: Proceedings of the 2nd International SPICE Symposium, Brisbane.
Montgomery, D.C., Runger, G.C., Hubele, N.F., 1998. Engineering Statistics. Wiley, New York.
Nunnally, J.C., Bernstein, I.H., 1994. Psychometric Theory. McGraw-Hill, New York.
Paulk, M.C., Weber, C.V., Garcia, S.M., Chrissis, M., Bush, M., 1993. Key Practices of the Capability Maturity Model, version 1.1. Software Engineering Institute, CMU/SEI-93-TR-25.
Paulk, M.C., 1995. How ISO 9001 compares with the CMM. IEEE Software, 74–83.
Pfleeger, S.-L., Fenton, N., Page, S., 1994. Evaluating software engineering standards. IEEE Computer, 71–79.
Smith, B., El Emam, K., 1996. Transitioning to phase 2 of the SPICE trials. In: Proceedings of SPICE'96, pp. 45–55.
StatXact-4 for Windows, 1998. Software for Exact Nonparametric Inference. Cytel Software, Cambridge, MA.
Stevens, S., 1951. Mathematics, measurement, and psychophysics. In: Stevens, S. (Ed.), Handbook of Experimental Psychology. Wiley, New York.
The SPIRE project, 1998. The SPIRE Handbook: Better, Faster, Cheaper Software Development in Small Companies. ESSI Project 23873.
TickIT, 1999. A Guide to Software Quality Management System Construction and Certification Using EN29001, Issue 4.0, UK. TickIT web site: http://www.tickit.org.
Velleman, P., Wilkinson, L., 1993. Nominal, ordinal, interval, and ratio typologies are misleading. The American Statistician 47, 65–72.
Woodman, I., Hunter, R., 1998. Analysis of assessment ratings from the trials. In: El Emam, K., Drouin, J.-N., Melo, W. (Eds.), SPICE: The Theory and Practice of Software Process Improvement and Capability Determination. IEEE Computer Society Press, Silver Spring, MD, pp. 307–342.
Ho-Won Jung received his B.S. in IE (Industrial Engineering) from Korea University, his M.S. in IE from KAIST (Korea Advanced Institute of Science and Technology), and his Ph.D. in MIS from the University of Arizona. He has held a visiting position at Clemson University and has worked at NCA (National Computerization Agency) in Korea. He is currently a professor in the Department of Business Administration at Korea University. He is a member of the SPICE trials analysis team and a member of the IEEE Computer Society. His interest areas include software quality management and assurance, and performance analysis of communications networks.

Robin Hunter received his B.Sc. and Ph.D. degrees at the University of Glasgow. He is now a Senior Lecturer in Computer Science at the University of Strathclyde in Glasgow, Scotland, and is a member of the IEEE Computer Society. His research interests are in both product and process aspects of software quality. He was a grant holder for the ESPRIT II project SCOPE on software certification, and is contributing to the SPICE software process standardization project as a member of the trials analysis team.
