6 authors, including:
Shankar G. Aggarwal
National Physical Laboratory - India
All content following this page was uploaded by Barun Kumar Ghosh on 26 October 2014.
Received: 16 May 2013 / Accepted: 30 July 2013 / Published online: 8 September 2013
Abstract: Participation in proficiency testing (PT) is an important means of meeting the requirements of ISO/IEC 17025 in the area of quality assurance of laboratory results. A PT program in the field of chemical analysis of iron ore was organized by CSIR-National Metallurgical Laboratory, Jamshedpur (nodal laboratory) and CSIR-National Physical Laboratory, New Delhi (PT coordinator) during November 2011–January 2012. Twenty-two laboratories in India participated in the PT program. The results of the participating laboratories were first analyzed to identify the distribution patterns and the presence of outliers. Several parametric and robust statistical methods were used to identify the outliers. Correct outlier rejection is of utmost importance because the choice of the outlier test method influences the consensus value and standard deviation, which in turn determine the Z-score of a laboratory result in a PT program. In the present study, five parametric outlier tests were compared: Dixon's Q test, Grubbs single test, Grubbs double test, t test, and Z-scores. In addition, three robust tests were chosen as alternatives to the parametric tests: box plot, Huber test and MAD-based test. It was observed that multiple outlier test methods should be used to identify the outliers in a PT program, especially when the number of participating laboratories is small. They complement each other and help give diverse information and a better overview of the data set. Among the 22 participating laboratories, the Z-scores of 4 laboratories for analysis of total iron fell outside the acceptable limit of ±2. Similarly, for each of alumina and silica, five laboratories had unacceptable Z-scores.
Keywords: Proficiency testing; Outliers; Iron ore analysis; Statistical methods; Interlaboratory comparison
88 S. Chakravarty et al.
SiO2 (%) in homogenized iron ore. CSIR-NML acted as the nodal laboratory for conducting the PT and CSIR-NPL was the PT coordinator. Twenty-two laboratories in India, both accredited by the National Accreditation Board for Testing and Calibration Laboratories (NABL) and non-accredited, participated in the PT program during November 2011–January 2012. The data were analyzed to find the best statistical approach for rejecting outliers. The performance of the participating laboratories was assessed on the basis of Z-scores. Among the 22 participating laboratories, the Z-scores of 4 laboratories for analysis of total iron fell outside the acceptable limit of ±2. Similarly, for each of alumina and silica, five laboratories had unacceptable Z-scores. An effort was made to identify the plausible causes of the incorrect analytical results reported by these laboratories, and the laboratories were informed so that they could initiate the necessary corrective actions.

laboratories participated in the PT program. One bottle of iron ore (100 g) was sent to each participating laboratory in the second week of December 2011. Participants were required to determine Fe (total), Al2O3, and SiO2 quantitatively. A standard proforma was provided for data and information input. Analytical results for iron ore were requested in % (w/w) to two significant figures, with the expanded measurement uncertainty and coverage factor stated. Technical details were also requested. Participating laboratories were allowed to use any validated test method they normally use for the analysis of iron ore. The names of the participating laboratories are tabulated in Table 1. To maintain confidentiality, each laboratory was assigned a random code number (1–22), and these numbers are used in this paper.

2.3. Statistical Analysis of Results
Proficiency Testing in Chemical Analysis of Iron Ore 89
robust, were used to find the best statistical approach for rejecting outliers. The parametric outlier tests used were Dixon's Q test, Grubbs single and double tests, t test and Z-score. The robust outlier tests used were box plot, Huber test and MAD-based test. None of the tests used takes measurement uncertainty into account. Uncertainties were left out deliberately, as not all laboratories have a uniform approach to evaluating uncertainty and some of the laboratories did not report any uncertainty estimate together with their results. The details of the outlier test methods used are discussed below.

2.3.1. Parametric Tests

Most well-known elementary statistical tools are parametric. Parametric tests make assumptions about the distribution of the data. If those assumptions are correct, parametric tests can produce accurate and precise estimates. If those assumptions are incorrect, parametric tests can be very misleading. For that reason they are often considered non-robust. Four parametric tests were employed to identify outliers in the present study.

for-purpose criterion. The result obtained by a laboratory is rejected as an outlier if the numeric value of Z ≥ 2.

2.3.1.3. Grubbs Test. The test most commonly used and recommended by ISO for outliers is the Grubbs test [13]. This test compares the deviation of the suspect value from the sample mean with the standard deviation of the sample. The Grubbs test is also known as the maximum normed residual test.
The Grubbs test statistic (GS) can be represented as:

$$GS = \frac{100\,|S - S_H|}{S} \qquad (3)$$

$$GS = \frac{100\,|S - S_L|}{S} \qquad (4)$$

where $S$ is the standard deviation of the whole data set, $S_L$ is the standard deviation of the data set with the lowest result left out, and $S_H$ is the standard deviation of the data set with the highest result left out. The larger GS is compared with the tabulated critical value [13]. To achieve more confidence, the Grubbs double test is also used in PT schemes. It determines
whether the two largest or the two smallest values, taken together, might be outliers.

2.3.1.4. Dixon's Q Test. The Dixon Q test is based on calculation of the experimental Q value, defined as the distance of the suspect value from its nearest neighbor divided by the range of the values. If n is the number of participating laboratories, the corresponding n values are arranged in ascending order, $x_1 < x_2 < \dots < x_n$. For testing the smallest value ($x_1$) or the largest value ($x_n$), the following equations are used, respectively:

$$Q = \frac{x_2 - x_1}{x_n - x_1} \qquad (5)$$

$$Q = \frac{x_n - x_{n-1}}{x_n - x_1} \qquad (6)$$

If the experimental Q value exceeds the tabulated critical value Qcrit (derived for a normal distribution) at a given confidence level, the suspect value can be rejected.

2.3.2. Robust Tests

Robust statistics include parameters that are largely unaffected by the presence of extreme values. The median (Me) and the median absolute deviation (MAD) are two such parameters. Three different types of robust tests were used to determine outliers in the present study.

2.3.2.1. Huber Test. In the Huber test, the deviation between the laboratory result (x) and the median (Me), |x − Me|, is calculated first. Then |x − Me| is compared with cσ, where c is usually taken to be 1.5 and σ = MAD/0.6745. MAD is calculated using the formula MAD = Me[|xi − Me(xi)|]. The laboratory result is an outlier if |x − Me| > 1.5σ.

2.3.2.2. MAD Based Test. The MAD based test is a very useful robust method for evaluating outliers. The result of a participating laboratory is rejected as an outlier if |x − Me|/MAD > 5.

2.3.2.3. Box Plot. Tukey introduced the box plot (also known as the box-and-whisker plot) as a graphical display on which outliers can be indicated [12]. It is a diagram consisting of a rectangle (the box) with two lines (the whiskers) extending from opposite edges of the box, and a further line in the box, crossing it parallel to the same edges. The ends of the whiskers indicate the range of non-outlier data, the edges of the box represent the upper (Q3) and lower (Q1) quartiles, and the line crossing the box represents the median (Q2) of the data. The first quartile is the value under which 25 % of the data lie, and the third quartile is the value over which 25 % of
Fig. 2 Histogram (a) and Q–Q plot (b) for Al2O3 content in iron ore
Fig. 3 Histogram (a) and Q–Q plot (b) for SiO2 content in iron ore
the data are found. Outliers are defined as data points that are lower than the lower quartile (Q1), or higher than the upper quartile (Q3), by more than 1.5 times the interquartile range.

3. Results

It is very important to have information about the distribution pattern of the data (such as normality) before applying any statistical analysis tools, because typical statistical tests incorporate assumptions about the underlying distribution of the data. Therefore, the distribution patterns of the data obtained in the iron ore PT program were investigated first. The distribution of each data set was assessed by drawing histograms and normal Q–Q plots. The histograms and normal Q–Q plots were also used to visually detect any unusual observations (outliers) or gaps in the data sets. Figures 1, 2 and 3 show the histograms and Q–Q plots for the Fe (total), Al2O3 and SiO2 results obtained from all the participating laboratories. The histogram of Fe (total) content (Fig. 1a) looked very close to a normal distribution with slight tailing in the higher value region. The histogram also clearly shows the presence of some results that are much higher than the others, and these results can be considered outliers. The histogram gave useful information about the data set, but since the number of participating laboratories in the PT program was low, it is difficult to draw definite conclusions about the distribution. Therefore, a normal Q–Q plot was drawn and visually evaluated, as these plots are more robust than histograms with respect to the number of participating laboratories. The normal Q–Q plot for Fe (total) content is presented in Fig. 1b. The presence of outlying data is clearly seen in Fig. 1b, and if the highest value is left out, the distribution becomes normal. The histograms and normal Q–Q plots of Al2O3 content and SiO2 content are presented in Figs. 2 and 3, respectively. For Al2O3 content, tailing of the data is observed in the lower value region of the histogram, and rejection of the low value data in Fig. 2b as outliers results in a normal distribution. Similarly, the histogram of SiO2 content is tailed towards the high value region (Fig. 3a), and rejection of the high value data as outliers in the Q–Q plot (Fig. 3b) results in a normal distribution.
Table 2 Identification of outliers using different methods for Fe (total) content in iron ore
Lab ID Lab data Grubbs single Grubbs double t test Z-score MAD based test Dixon Q test Huber test
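The three robust criteria of Sect. 2.3.2 can be sketched as follows. This is only an illustration under stated assumptions: the function names and the data are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is a choice made for this example, since the paper does not state how the quartiles were computed.

```python
from statistics import median, quantiles

def mad(values):
    """Median absolute deviation, MAD = Me[|x_i - Me(x)|]."""
    me = median(values)
    return median([abs(x - me) for x in values])

def huber_outliers(values, c=1.5):
    """Results with |x - Me| > c * sigma, where sigma = MAD / 0.6745."""
    me = median(values)
    sigma = mad(values) / 0.6745
    return [x for x in values if abs(x - me) > c * sigma]

def mad_outliers(values, limit=5.0):
    """Results with |x - Me| / MAD > limit (written without the division)."""
    me = median(values)
    return [x for x in values if abs(x - me) > limit * mad(values)]

def boxplot_outliers(values, k=1.5):
    """Results more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in values if x < q1 - k * iqr or x > q3 + k * iqr]

# Hypothetical results in % (w/w); these are not data from the PT program.
results = [63.5, 63.6, 63.7, 63.7, 63.8, 63.8, 63.9, 64.0, 65.9]
print(huber_outliers(results))    # [63.5, 65.9]
print(mad_outliers(results))      # [65.9]
print(boxplot_outliers(results))  # [65.9]
```

In this hypothetical set the Huber criterion flags one more result than the other two, which is in line with the observation in this study that the Huber test is the strictest of the robust tests.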
The histograms and normal Q–Q plots indicated the presence of outliers in the data set. Different outlier rejection methods (parametric and robust) were then employed to identify the outliers. The results are summarized in Tables 2, 3 and 4. The first two columns present the laboratory number and the result actually obtained. The subsequent columns present the results obtained with the different outlier rejection tests. The results rejected before calculation of the consensus values and standard deviations are highlighted. The box plots for Fe (total), Al2O3, and SiO2 showing outliers (as red marks) are shown in Figs. 4 and 5.

Among the total Fe content data of all participating laboratories (Table 2; Fig. 4), the Dixon Q test and Z-score identified one outlier, the Grubbs single test, MAD based test and box plot identified three outliers, and the Grubbs double test identified four outliers. The t test and Huber test were more severe, and both identified 7 outliers. It can be noted that, despite the different numbers of outliers detected by the different test methods, the maximum difference between the outlier-corrected average values was only 0.2 % (range from 63.89 to 63.73 %) and the outlier-corrected median value remained practically unchanged. However, the outlier-corrected standard deviation differed by up to 72.7 % (range from 0.44 to 0.12). In the case of Al2O3 content (Table 3; Fig. 5), the Dixon Q test and Z-score identified one outlier, the MAD based test detected two outliers, the Grubbs single test and box plot detected three outliers, and the Grubbs double test detected four outliers. The Huber test detected 5 outliers, whereas the t test detected the maximum of 9 outliers. The outlier-corrected averages differed by at most 1 % and the outlier-corrected median values remained unchanged. The outlier-corrected standard deviation differed by up to 55.5 % (range from 0.08 to 0.18). Similarly, in the case of SiO2 content (Table 4; Fig. 5), the Dixon Q test and Z-score identified one outlier, the Grubbs single test and MAD based test identified three outliers, the Grubbs double test and box plot identified four outliers, the Huber test identified five outliers and the t test identified the maximum of 8 outliers. The maximum difference between the outlier-corrected average values was 1.8 % (range from 3.64 to 3.71 %) and the outlier-corrected median value differed by only up to 0.8 %.
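The text does not state how the percentage differences quoted above were calculated. One reading that reproduces the quoted figures to within rounding is the difference between the two extremes expressed as a percentage of the larger one. The sketch below assumes that reading; the function name `percent_spread` is hypothetical.

```python
def percent_spread(low, high):
    """Difference between two extremes as a percentage of the larger one."""
    return 100.0 * (high - low) / high

fe_sd = percent_spread(0.12, 0.44)      # total Fe, outlier-corrected standard deviations
al_sd = percent_spread(0.08, 0.18)      # Al2O3, outlier-corrected standard deviations
fe_mean = percent_spread(63.73, 63.89)  # total Fe, outlier-corrected averages

print(f"{fe_sd:.1f} {al_sd:.1f} {fe_mean:.2f}")
```

Under this assumption the standard deviations differ by about 72.7 % and 55.6 %, whereas the total Fe averages differ by only about 0.25 %, which illustrates why the choice of outlier test matters far more for the standard deviation than for the consensus value.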
Table 3 Identification of outliers using different methods for Al2O3 content in iron ore
Lab ID Lab data Grubbs single Grubbs double t test Z-score MAD based test Dixon Q test Huber test
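The Dixon Q statistics of Eqs. (5) and (6) and the Grubbs statistics of Eqs. (3) and (4) can be computed as in the sketch below. Only the test statistics are calculated; the critical values are tabulated [13] and are not reproduced here, so no accept/reject decision is made. The function names and the data are hypothetical.

```python
from statistics import stdev

def dixon_q(values):
    """Q for the smallest (Eq. 5) and the largest (Eq. 6) value."""
    x = sorted(values)
    value_range = x[-1] - x[0]
    return {"low": (x[1] - x[0]) / value_range,
            "high": (x[-1] - x[-2]) / value_range}

def grubbs_gs(values):
    """Percentage change in standard deviation when one extreme is left out."""
    x = sorted(values)
    s = stdev(x)
    s_high = stdev(x[:-1])  # highest result left out, Eq. (3)
    s_low = stdev(x[1:])    # lowest result left out, Eq. (4)
    return {"high": 100 * abs(s - s_high) / s,
            "low": 100 * abs(s - s_low) / s}

# Hypothetical results in % (w/w); these are not data from the PT program.
results = [63.5, 63.6, 63.7, 63.7, 63.8, 63.8, 63.9, 64.0, 65.9]
q = dixon_q(results)
gs = grubbs_gs(results)
print(q, gs)
```

For both tests the larger of the two statistics, here the one for the highest value, is the one that would be compared with the tabulated critical value for the given number of results.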
Table 4 Identification of outliers using different methods for SiO2 content in iron ore
Lab ID Lab data Grubbs single Grubbs double t test Z-score MAD based test Dixon Q test Huber test
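The Z-score criterion that appears as a column in these tables can be sketched as follows, assuming the usual definition of a Z-score as the deviation of a result from a central value divided by a standard deviation. The function names, the central value of 63.8 and the standard deviation of 0.2 are hypothetical and are not the assigned values of this PT program.

```python
def z_score(result, central_value, standard_deviation):
    """Standardized deviation of a laboratory result from the central value."""
    return (result - central_value) / standard_deviation

def is_unacceptable(result, central_value, standard_deviation, limit=2.0):
    """True if the numeric value of Z reaches the limit of 2."""
    return abs(z_score(result, central_value, standard_deviation)) >= limit

# Hypothetical central value and standard deviation, for illustration only.
print(is_unacceptable(63.9, 63.8, 0.2))  # False
print(is_unacceptable(64.4, 63.8, 0.2))  # True
```

Because both the central value and the standard deviation depend on which results were rejected beforehand, the same laboratory result can receive different Z-scores under different outlier tests.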
However, the outlier-corrected standard deviation differed by up to 66.6 % (range from 0.09 to 0.27).

4. Discussions

In this study, the number of participating laboratories was low (22). The histograms and Q–Q plots showed tailed distributions and the presence of outliers. The data sets therefore do not fulfill the criteria of parametric tests in the strict sense, as parametric tests require a normal distribution of the data. The problem can be overcome by increasing the number of participating laboratories in the PT program. However, in the present study, both parametric and robust test methods for outlier determination were utilized to get more detailed information, i.e. which outlier method should be used and which should be avoided in a PT program where the number of participating laboratories is small.

It was found that different outlier tests give different results. Also, it was observed that several parametric and robust tests give similar numbers of outliers. In fact, it was found that every parametric test has an alternative robust test. For example, the Grubbs single test (parametric) and the box plot (robust) identified equal numbers of outliers for Fe (total) and Al2O3. So the number of outliers itself does not give a direct answer as to which methods (parametric or robust) should be preferred for PTs with a small number of participating laboratories. Both parametric and robust methods include stricter and less strict outlier tests. The number of outliers does indicate which outlier tests should not be used for PTs with a small number of laboratories. It has been proposed that the number of outliers should not exceed 2/9 of the whole data set [14]; for 22 laboratories this corresponds to at most four outliers (2/9 × 22 ≈ 4.9). Based on this, in the present study, all the outlier tests except the t test and the Huber test were considered suitable for outlier rejection. The next question is which outlier test method should be adopted in a PT program. The choice of the outlier test influences the consensus value and standard deviation. These two characteristics determine the Z-score of a laboratory result. It can be seen from Tables 2, 3 and 4 that the outlier-corrected averages for Fe (total), Al2O3 and SiO2 do not
for both. The final assigned/recommended values of Fe (total), Al2O3, and SiO2 and the target values for the standard deviations are tabulated in Table 5. These values were used to calculate the Z-scores of the participating laboratories (Table 6). Among the 22 participating laboratories, the Z-scores of 4 laboratories (Labs 5, 9, 21, and 22) for analysis of total iron fall outside the acceptable limit of ±2. Similarly, for analysis of alumina, Labs 2, 4, 9, 20 and 22 have unacceptable Z-scores, whereas for silica, Labs 4, 5, 20, 21, and 22 have unacceptable Z-scores. It is interesting to note that laboratory 22 reported unacceptable results for all three parameters, whereas laboratories 4, 5, 9, 20, and 21 reported unacceptable results for two parameters. Laboratory 2 reported an unacceptable result for alumina only.

An effort was then made to identify the plausible causes of the adverse performance of the above laboratories in the PT program. It was observed that laboratory 22, which reported unacceptable results for all three parameters, is not an NABL accredited laboratory. It was advised to seek NABL accreditation so that it follows standard procedures for chemical analysis and uses appropriate certified reference materials to validate its results. Similarly, laboratories 5, 9, and 20 did not use any certified reference materials to validate their results.

5. Conclusion

A number of statistical test methods are available for outlier detection in a data set, depending on the application, assumptions and number of participating laboratories. Correct outlier rejection is of utmost importance because the choice of the outlier test influences the consensus value and standard deviation. These two characteristics determine the Z-score of a laboratory result in a PT program. In the present study, five parametric outlier tests were compared: Dixon's Q test, Grubbs single test, Grubbs double test, t test, and Z-scores. In addition, three robust tests were chosen as alternatives to the parametric tests: box plot, Huber test and MAD-based test. The strictness of these outlier tests varied over a wide range. The Huber test and t test were the strictest tests and eliminated a large number of data as outliers. The Grubbs single test, Grubbs double test, MAD based test and box plot were less strict and were found to be most appropriate in the present case. Dixon's Q test and Z-score were very mild and rejected only one outlier. Multiple outlier test methods should be used to identify the outliers in a PT program, especially when the number of participating laboratories is small. They complement each other and help give diverse information and a better overview of the data set, both for the PT organizers and for the participating laboratories. If outliers are present in a data set, the median is a better choice than the mean for the determination of the consensus value in a PT program, as medians are more stable than mean values. The overall performance of the 22 laboratories in the quantitative analysis of Fe (total), Al2O3 and SiO2 content in iron ore was satisfactory. The present PT program provided an external quality control mechanism to improve the testing capability of the testing laboratories in elemental analysis of iron ore.

Acknowledgments The authors thank the participating laboratories, the Director, CSIR-National Metallurgical Laboratory, Jamshedpur and the Director, CSIR-National Physical Laboratory, New Delhi for their support for the program.

References

[1] ISO/IEC 17025, General Requirements for the Competence of the Testing Laboratories (2005).
[2] EA-03/04 European Co-operation for Accreditation, Use of Proficiency Testing as a tool for Accreditation in Testing (2001).
[3] Y.P. Singh, S.K. Nijhwan, R.P. Singal, A.K. Saxena and S.U.M. Rao, Interlaboratory Proficiency Testing: Liquid in Glass Thermometer Intercomparisation, MAPAN J. Metrol. Soc. India, 19 (2004) 169–175.
[4] S. Yadav and A.K. Bandyopadhyay, Evaluation of Laboratory Performance Through Interlaboratory Comparison, MAPAN J. Metrol. Soc. India, 24 (2009) 125–138.
[5] M.K. Mittal, J.C. Biswas, and A.S. Yadav, Proficiency Testing in AC Power and Energy, MAPAN J. Metrol. Soc. India, 24 (2009) 42–66.
[6] S. Yadav, O. Prakash, V.K. Gupta, B.V. Kumaraswamy and A.K. Bandyopadhyay, Evaluation of Laboratory Performance Through Proficiency Testing Using Pressure Dial Gauge in the Hydraulic Pressure Measurement up to 70 MPa, MAPAN J. Metrol. Soc. India, 23 (2008) 79–99.
[7] S.K. Wong, Evaluation of the use of Consensus Values in Proficiency Testing Programmes, Accred. Qual. Assur., 10 (2005) 409–414.
[8] S. Basak, S.S. Mukherjee, S.N. Mandal, R. Das, A.K. Mazumder, J.K. Mondal, R. Sammaddar, S. Mondal, and D. Kundu, Interlaboratory Proficiency Testing: Intercomparison in Relation to the Measurement of Alumina, Iron(III) Oxide and Titania Present in Homogenised China Clay, MAPAN J. Metrol. Soc. India, 25 (2010) 265–272.
[9] R. Hegazy, M.I. Mohamed, and A. Abu-Sinna, A Comparative Study of Statistical Methods Used in Analyzing the Proficiency Testing Results of Yield Stress, MAPAN J. Metrol. Soc. India, 25 (2010) 107–113.
[10] IS 1493 (Part I): (1981) Methods of chemical analysis of iron ores, Part 1, Determination of common constituents.
[11] S. Basak, J.K. Mondal, S.S. Mukherjee, S. Mandal and D. Kundu, Homogeneity Testing of China Clay by Complexometry and Spectrophotometry Using Statistical Technique, J. Ind. Chem. Soc., 84 (2007) 396–397.
[12] J.C. Miller, J.N. Miller, Statistics for Analytical Chemistry, 3rd ed., Ellis Horwood, Chichester, 1993.
[13] W. Horwitz, Protocol for the Design, Conduct and Interpretation of Method-Performance Studies, Pure Appl. Chem., 67 (1995) 331–343.
[14] AOAC International, Official Methods of Analysis of AOAC International, 16th ed., AOAC International, Gaithersburg, MD, 1999.