MAPAN-Journal of Metrology Society of India (June 2014) 29(2):87–95
DOI 10.1007/s12647-013-0059-8

Proficiency Testing in Chemical Analysis of Iron Ore: Comparison of Statistical Methods for Outlier Rejection

S. Chakravarty¹*, A. Mohanty¹, B. Ghosh¹, M. Tarafdar¹, S. G. Aggarwal² and P. K. Gupta²
¹ CSIR-National Metallurgical Laboratory, Jamshedpur, Jharkhand, India
² CSIR-National Physical Laboratory, New Delhi, India

Received: 16 May 2013 / Accepted: 30 July 2013 / Published online: 8 September 2013

© Metrology Society of India 2013

Abstract: Participation in proficiency testing (PT) is an important means of meeting the requirements of ISO/IEC 17025 in the area of quality assurance of laboratory results. A PT program in the field of chemical analysis of iron ore was organized by CSIR-National Metallurgical Laboratory, Jamshedpur (nodal laboratory) and CSIR-National Physical Laboratory, New Delhi (PT coordinator) during November 2011–January 2012. Twenty-two (22) laboratories in India participated in the PT program. The results of the participating laboratories were first analyzed to identify their distribution patterns and the presence of outliers. Several parametric and robust statistical methods were used to identify the outliers. Correct outlier rejection is of utmost importance because the choice of outlier test method influences the consensus value and standard deviation, which in turn determine the Z-score of a laboratory result in a PT program. In the present study, five parametric outlier tests were compared: Dixon’s Q test, the Grubbs single and double tests, the t test, and Z-scores. In addition, three robust tests were chosen as alternatives to the parametric tests: the box plot, the Huber test and the MAD-based test. It was observed that multiple outlier test methods should be used to identify outliers in a PT program, especially when the number of participating laboratories is small. They complement each other and help give diverse information and a better overview of the data set. Among the 22 participating laboratories, the Z-scores of 4 laboratories for analysis of total iron fell outside the acceptable limit of ±2. Similarly, five laboratories each had unacceptable Z-scores for the analysis of alumina and of silica.

Keywords: Proficiency testing; Outliers; Iron ore analysis; Statistical methods; Interlaboratory comparison

*Corresponding author, E-mail: sanchita@nmlindia.org

1. Introduction

Proficiency testing (PT) is an important way to meet the requirements of ISO/IEC 17025 in the area of quality assurance of laboratory results [1]. Accreditation bodies also require laboratories to participate in PT programs for all types of analysis undertaken in the laboratory. PT provides objective evidence that laboratories are competent and that they can achieve the level of uncertainty for which they are accredited. All laboratories, accredited or not, are expected to participate in PT, which indirectly helps them improve the uncertainty, reliability and reproducibility of their analytical results [2–6]. The performance of an individual participating laboratory in PT is evaluated by quantifying the deviation of its results from the assigned reference value, which in most cases is a consensus value obtained from the participants’ results [7–9]. As an external quality assessment tool, a PT program has two objectives. First, it allows accreditation bodies to monitor laboratory competence in specific tests. Second, it gives participating laboratories an opportunity for improvement, since PT program reports are usually accompanied by an analysis of the results and recommendations for improved performance. Hence, the manner in which the performance of an individual participating laboratory is evaluated in a given PT program is very important.

The present paper reports the results of a PT program, PT-Iron ore/2011-12, organized by CSIR-National Metallurgical Laboratory (CSIR-NML), Jamshedpur and CSIR-National Physical Laboratory (CSIR-NPL), New Delhi, in the area of chemical analysis of iron ore, for measurement of the major constituents, i.e., total Fe (%), Al2O3 (%) and SiO2 (%), in homogenized iron ore. CSIR-NML acted as the nodal laboratory for conducting the PT and CSIR-NPL was the PT coordinator. Twenty-two laboratories in India, both accredited by the National Accreditation Board for Testing and Calibration Laboratories (NABL) and non-accredited, participated in the PT program during November 2011–January 2012. The data were analyzed to find the best statistical approach to rejecting outliers. The performance of the participating laboratories was assessed on the basis of Z-scores. Among the 22 participating laboratories, the Z-scores of 4 laboratories for analysis of total iron fell outside the acceptable limit of ±2. Similarly, for analysis of alumina and silica, five laboratories had unacceptable Z-scores. An effort was made to identify the plausible causes of the incorrect analytical results reported by these laboratories, and the laboratories were informed so that they could initiate the necessary corrective actions.

2. Materials and Methods

2.1. Preparation of Samples

Preparation of samples is a fundamental and critical step in a PT program, as it ensures that identical samples are analyzed by all participating laboratories. In this study, iron ore of Indian origin (50 kg) was collected from Bolani mines, Keonjhar, Odisha. The ore was first air dried and then crushed sequentially to -200 mesh using a jaw crusher, a roll crusher and a pulverizer. Finally, the powdered ore was homogenized by the coning and quartering technique and stored in 100 g glass bottles before the commencement of the PT program. Ten bottles were randomly selected from the pool of 300 bottles as 10 samples, and each sample was analyzed 5 times for determination of total Fe (%), Al2O3 (%) and SiO2 (%) as per IS 1493 (Part I): 1981 [10]. For homogeneity testing, the chemical analysis data were evaluated statistically using one-way analysis of variance (ANOVA) [11]. The significance level chosen for the critical value of F was $\alpha = 0.05$. The estimators of the between-bottle and within-bottle variances of the iron ore samples were calculated to obtain $F_{\mathrm{calculated}}$, which was always found to be less than $F_{\mathrm{critical}}$. Hence the variance analysis showed that the material was sufficiently homogeneous and suitable for distribution to the participants of the PT program.

2.2. Organizing the PT

Testing laboratories in India were invited to participate in this PT program, and the names of interested laboratories were registered before 15 November 2011. A total of 22 laboratories participated in the PT program. One bottle of iron ore (100 g) was sent to each participating laboratory in the second week of December 2011. Participants were required to determine Fe (total), Al2O3 and SiO2 quantitatively. A standard proforma was provided for data and information input. Analytical results for iron ore were requested in % (w/w) to two significant figures, with the expanded measurement uncertainty and coverage factor stated. Technical details were also requested. Participating laboratories were allowed to use any validated test method they normally use for the measurement of iron ore. The names of the participating laboratories are listed in Table 1. To maintain confidentiality, each laboratory was assigned a random code number (1–22), and these numbers are used in this paper.

Table 1 Names of the participating laboratories in the PT program
Laboratory name
Aglow Quality Control Lab., Kolkata
Bhagavati Ana Labs Ltd., Hyderabad
Tata Steel, Jamshedpur
Delhi Test House, Delhi
Dr. More RCA Laboratories, Mumbai
Inspection Survey and Surveillance (India) Pvt. Ltd., Kolkata
Inspectorate Griffith India, Bhubaneswar
Inspectorate Griffith India, Chennai
Inspectorate Griffith India, Gandhidham
Inspectorate Griffith India, Goa
Inspectorate Griffith India, Haldia
Inspectorate Griffith India, Hospet
Inspectorate Griffith India, Jaipur
Inspectorate Griffith India, JODA
Inspectorate Griffith India, Mangalore
Inspectorate Griffith India, Raipur
Inspectorate Griffith India, Visakhapatnam
National Metallurgical Laboratory, Jamshedpur
Quality Services & Solutions (Goa), Goa
Quality Services & Solutions (QSS), Gandhidham
Spectro Analytical Lab, Delhi
Usha Martin Limited, Jamshedpur

2.3. Statistical Analysis of Results

Different statistical methods were used to analyze the data received from the participating laboratories. Histograms and normal Q–Q plots were used to examine the distribution patterns of the experimental data. The number of participants in this PT program was relatively small (22), and a closer look at the data suggested the presence of some outliers. An outlier is a result that appears to differ unreasonably from the others in the set. Different outlier test methods, both parametric and
robust, were used to find the best statistical approach to rejecting outliers. The parametric outlier tests used were Dixon’s Q test, the Grubbs single and double tests, the t test and the Z-score. The robust outlier tests used were the box plot, the Huber test and the MAD-based test. None of these tests takes measurement uncertainty into account. Uncertainty was deliberately left out because not all laboratories follow a uniform approach to evaluating uncertainty, and some laboratories did not report any uncertainty estimate with their results. The outlier test methods used are described below.

Fig. 1 Histogram (a) and Q–Q plot (b) for Fe (total) content in iron ore

2.3.1. Parametric Tests

Most well-known elementary statistical tools are parametric. Parametric tests make assumptions about the distribution of the data. If those assumptions are correct, parametric tests can produce accurate and precise estimates; if they are incorrect, parametric tests can be very misleading. For that reason they are often considered non-robust. The following parametric tests were employed to identify outliers in the present study.

2.3.1.1. t Test

The t test is a statistical test that compares two mean values to see whether their difference is too large to be explained by indeterminate (random) error. The value of t is calculated using Eq. 1:

$$ t = \frac{(x - \bar{X})\sqrt{n}}{s} \tag{1} $$

where $t$ is the t test statistic, $x$ is the individual laboratory result, $\bar{X}$ is the mean of the participants’ results, $s$ is the sample standard deviation, and $n$ is the sample size. Note that Eq. (1) has been modified to adapt it to the conditions of PT. The value of t is compared with a critical value from Student’s t distribution, $t(\alpha, \nu)$, which is determined by the chosen significance level $\alpha$ and the degrees of freedom $\nu$ of the sample. Values of $t(\alpha, \nu)$ are given in Student’s t distribution tables [12]. If $|t|$ is greater than $t(\alpha, \nu)$, the result is considered an outlier.

2.3.1.2. Z-Scores

In the present study, Z-scores are used not only to evaluate laboratory performance but also as one way to reject outliers. The Z-score, $Z_i$, is calculated from the laboratory result, the assigned value and the standard deviation using Eq. 2:

$$ Z_i = \frac{x - X_a}{s} \tag{2} $$

where $x$ is the laboratory result, $X_a$ is the assigned value, calculated as the best estimate of the true value obtained from the different laboratories, and $s$ is the ‘target value for standard deviation’, derived from a fitness-for-purpose criterion. The result obtained by a laboratory is rejected as an outlier if $|Z| \ge 2$.

2.3.1.3. Grubbs Test

The most commonly used outlier test, and the one recommended by ISO, is the Grubbs test [13]. This test compares the deviation of the suspect value from the sample mean with the standard deviation of the sample; it is also known as the maximum normed residual test. In the form used here, the Grubbs test statistic ($G_S$) is expressed as the percentage reduction in standard deviation when the highest or the lowest result is left out:

$$ G_S = \frac{100\,|S - S_H|}{S} \tag{3} $$

$$ G_S = \frac{100\,|S - S_L|}{S} \tag{4} $$

where $S$ is the standard deviation of the whole data set, $S_L$ is the standard deviation of the data set with the lowest result left out, and $S_H$ is the standard deviation of the data set with the highest result left out. The larger $G_S$ is compared with a tabulated critical value [13]. For greater confidence, the Grubbs double test is also used in PT schemes. It determines
whether the two largest or the two smallest values might be outliers at the same time.

2.3.1.4. Dixon’s Q Test

The Dixon Q test is based on calculating an experimental Q value, defined as the distance of the suspect value from its nearest neighbor divided by the range of the values. If n is the number of participating laboratories, the n values are arranged in ascending order, $x_1 \le x_2 \le \dots \le x_n$. For testing the smallest value ($x_1$) or the largest value ($x_n$), the following equations are used, respectively:

$$ Q = \frac{x_2 - x_1}{x_n - x_1} \tag{5} $$

$$ Q = \frac{x_n - x_{n-1}}{x_n - x_1} \tag{6} $$

If the experimental Q value exceeds the tabulated critical value $Q_{\mathrm{crit}}$ (derived for normally distributed data) at the chosen confidence level, the suspect value can be rejected.

Fig. 2 Histogram (a) and Q–Q plot (b) for Al2O3 content in iron ore

Fig. 3 Histogram (a) and Q–Q plot (b) for SiO2 content in iron ore

2.3.2. Robust Tests

Robust statistics include parameters that are largely unaffected by the presence of extreme values; the median (Me) and the median absolute deviation (MAD) are two such parameters. Three different robust tests were used to determine outliers in the present study.

2.3.2.1. Huber Test

In the Huber test, the deviation between the laboratory result ($x$) and the median (Me), $|x - \mathrm{Me}|$, is calculated first. It is then compared with $c\sigma$, where $c$ is usually taken to be 1.5 and $\sigma = \mathrm{MAD}/0.6745$. MAD is calculated as $\mathrm{MAD} = \mathrm{Me}\left[\,|x_i - \mathrm{Me}(x_i)|\,\right]$. The laboratory result is an outlier if $|x - \mathrm{Me}| > 1.5\sigma$.

2.3.2.2. MAD-Based Test

The MAD-based test is a very useful robust method for evaluating outliers. The result of a participating laboratory is rejected as an outlier if $|x - \mathrm{Me}|/\mathrm{MAD} > 5$.

2.3.2.3. Box Plot

Tukey introduced the box plot (also known as the box-and-whisker plot) as a graphical display on which outliers can be indicated [12]. It is a diagram consisting of a rectangle (the box) with two lines (the whiskers) extending from opposite edges of the box, and a further line inside the box, crossing it parallel to those edges. The ends of the whiskers indicate the range of the non-outlier data, the edges of the box represent the lower (Q1) and upper (Q3) quartiles, and the line crossing the box represents the median (Q2) of the data. The first quartile is the value below which 25 % of the data lie, and the third quartile is the value above which 25 % of
the data are found. Outliers are defined as data points that are lower than the lower quartile (Q1), or higher than the upper quartile (Q3), by more than 1.5 times the interquartile range.

3. Results

It is very important to have information about the distribution patterns of the data (such as normality) before applying any statistical analysis tools, because typical statistical tests incorporate assumptions about the underlying distribution of the data. Therefore, the distribution patterns of the data obtained in the iron ore PT program were investigated first. The distribution of each data set was assessed by drawing histograms and normal Q–Q plots, which were also used to detect visually any unusual observations (outliers) or gaps in the data sets. Figures 1, 2 and 3 show histograms and Q–Q plots for the Fe (total), Al2O3 and SiO2 results obtained from all the participating laboratories. The histogram of Fe (total) content (Fig. 1a) looked very close to a normal distribution, with slight tailing in the higher value region. The histogram also clearly shows some results that are much higher than the others, and these results can be considered outliers. The histogram gave useful information about the data set, but since the number of participating laboratories was low, it is difficult to draw definite conclusions about the distribution from it alone. Normal Q–Q plots were therefore drawn and evaluated visually, as these plots are less sensitive than histograms to a small number of data points. The normal Q–Q plot for Fe (total) content is presented in Fig. 1b. Outlying data are clearly seen in Fig. 1b, and if the highest value is left out, the distribution becomes approximately normal. The histograms and normal Q–Q plots of Al2O3 and SiO2 content are presented in Figs. 2 and 3, respectively. For Al2O3 content, tailing of the data is observed in the lower value region of the histogram, and rejecting the low value in Fig. 2b as an outlier results in a normal distribution. Similarly, the histogram of SiO2 content is tailed towards the high value region (Fig. 3a), and rejecting the high value in the Q–Q plot (Fig. 3b) as an outlier results in a normal distribution.
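The visual Q–Q assessment above can be made reproducible. The Python sketch below (standard library only) computes the points of a normal Q–Q plot for the Fe (total) results listed in Table 2 and, as an extra numerical summary that the paper itself does not use, the correlation between theoretical and observed quantiles. The plotting positions (i − 0.5)/n and the helper names normal_qq_points and qq_correlation are illustrative assumptions, not the authors’ procedure.

```python
from math import sqrt
from statistics import NormalDist, fmean


def normal_qq_points(values):
    """Return (theoretical normal quantile, observed value) pairs for a normal Q-Q plot.

    Assumption: plotting positions p_i = (i - 0.5) / n; other conventions exist.
    """
    observed = sorted(values)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(observed, start=1)]


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; values closer to 1 look more normal."""
    points = normal_qq_points(values)
    qs = [q for q, _ in points]
    xs = [x for _, x in points]
    mq, mx = fmean(qs), fmean(xs)
    sxy = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    sqq = sum((q - mq) ** 2 for q in qs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sqrt(sqq * sxx)


# Fe (total) results (%) of the 22 laboratories, as listed in Table 2
fe_total = [63.97, 63.38, 63.38, 64.0, 66.57, 63.69, 63.87, 63.92, 64.5, 63.9, 63.9,
            63.94, 63.85, 63.86, 63.87, 63.59, 63.93, 64.03, 63.31, 64.04, 62.53, 62.8]

points = normal_qq_points(fe_total)
highest = max(fe_total)
r_all = qq_correlation(fe_total)
r_without_highest = qq_correlation([x for x in fe_total if x != highest])

for q, x in points:
    print(f"{q:+.3f}  {x:.2f}")
print(f"Q-Q correlation, all results:           {r_all:.3f}")
print(f"Q-Q correlation, highest value omitted: {r_without_highest:.3f}")
```

Plotting the returned pairs gives a Q–Q plot of the kind shown in Fig. 1b. For these data, omitting the highest result (66.57 %) raises the quantile correlation, which agrees with the visual reading that the Fe (total) distribution is close to normal once that value is left out.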

Table 2 Identification of outliers using different methods for Fe (total) content in iron ore
                          Grubbs   Grubbs                      MAD      Dixon    Huber
Lab ID          Lab data  single   double   t test   Z-score   based    Q test   test
1               63.97     0.56     0.77     0.74     0.15      0.65              0.085
2               63.38     -0.79    -0.65    -3.00    -0.63     -3.88             0.505
3               63.38     -0.78    -0.64    -3.00    -0.63     -3.88             0.505
4               64        0.63     0.84     0.93     0.19      0.88              0.115
5               66.57     6.43     6.94     17.11    3.64      20.65    0.512    2.685
6               63.69     -0.08    0.1      -1.04    -0.22     -1.50             0.195
7               63.87     0.32     0.52     0.07     0.018     -0.12             0.015
8               63.92     0.43     0.64     0.39     0.08      0.27              0.035
9               64.5      1.76     2.03     4.06     0.86      4.73              0.615
10              63.9      0.39     0.6      0.28     0.05      0.12              0.015
11              63.9      0.39     0.6      0.28     0.05      0.12              0.015
12              63.94     0.49     0.7      0.55     0.11      0.42              0.055
13              63.85     0.27     0.47     -0.06    -0.00     -0.27             0.035
14              63.86     0.3      0.5      0.03     0.00      -0.19             0.025
15              63.87     0.32     0.52     0.07     0.01      -0.12             0.015
16              63.59     -0.32    -0.15    -1.7     -0.35     -2.27             0.295
17              63.93     0.46     0.67     0.47     0.09      0.35              0.045
18              64.03     0.68     0.9      1.08     0.23      1.12              0.145
19              63.31     -0.94    -0.81    -3.44    -0.73     -4.42             0.575
20              64.04     0.71     0.93     1.16     0.24      1.19              0.155
21              62.53     -2.7     -2.65    -8.34    -1.78     -10.42   0.066    1.355
22              62.8      -2.09    -2.02    -6.66    -1.41     -8.35             1.085
No. of outliers 0         3        4        7        1         3        1        7
Final mean      63.85     63.84    63.80    63.89    63.73     63.84    63.73    63.89
Final SD        0.74      0.28     0.23     0.12     0.44      0.28     0.44     0.12
Final median    63.88     63.9     63.88    63.9     63.87     63.9     63.87    63.9
Bold values indicate data omitted from the final calculation. Dixon Q values are given only for the lowest and highest results; the MAD based column gives (x - Me)/MAD and the Huber column gives |x - Me|. In the summary rows, the Lab data column refers to all results with none rejected.
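The robust screens of Section 2.3.2 need only a median and a MAD, so they are easy to re-run. The following Python sketch (standard library only) applies the MAD-based test, a single-pass Huber screen and Tukey box-plot fences to the Fe (total) results in Table 2. The function names, the dictionary layout and the choice of quartile method are assumptions made for illustration; the paper does not state which quartile convention it used.

```python
from statistics import median, quantiles

# Fe (total) results (%) keyed by laboratory code, taken from Table 2
fe_total = {1: 63.97, 2: 63.38, 3: 63.38, 4: 64.0, 5: 66.57, 6: 63.69, 7: 63.87,
            8: 63.92, 9: 64.5, 10: 63.9, 11: 63.9, 12: 63.94, 13: 63.85, 14: 63.86,
            15: 63.87, 16: 63.59, 17: 63.93, 18: 64.03, 19: 63.31, 20: 64.04,
            21: 62.53, 22: 62.8}


def mad(values):
    """Median absolute deviation about the median: MAD = Me(|x_i - Me(x_i)|)."""
    values = list(values)
    me = median(values)
    return median(abs(x - me) for x in values)


def mad_test(results, limit=5.0):
    """MAD-based test: flag labs with |x - Me| / MAD > limit (assumes MAD > 0)."""
    me, m = median(results.values()), mad(results.values())
    return {lab for lab, x in results.items() if abs(x - me) / m > limit}


def huber_test(results, c=1.5):
    """Single-pass Huber screen: flag labs with |x - Me| > c * sigma, sigma = MAD / 0.6745."""
    me = median(results.values())
    sigma = mad(results.values()) / 0.6745
    return {lab for lab, x in results.items() if abs(x - me) > c * sigma}


def box_plot_test(results, method="exclusive"):
    """Tukey fences: flag labs below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(results.values(), n=4, method=method)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {lab for lab, x in results.items() if x < low or x > high}


fe_median = median(fe_total.values())
fe_mad = mad(fe_total.values())
print(f"Me = {fe_median:.3f} %, MAD = {fe_mad:.3f} %")
print("MAD-based test flags labs:", sorted(mad_test(fe_total)))
print("Huber screen flags labs:  ", sorted(huber_test(fe_total)))
print("Box plot flags labs:      ", sorted(box_plot_test(fe_total)))
print("Box plot (inclusive quartiles):", sorted(box_plot_test(fe_total, method="inclusive")))
```

On these data the MAD-based test flags Labs 5, 21 and 22, the three outliers reported in Table 2. The single-pass Huber screen flags eight laboratories, one more than the seven in Table 2: Lab 16 (|x − Me| = 0.295) lies just above 1.5σ ≈ 0.289, so this borderline case presumably depends on how MAD and σ were rounded. For the box plot, the default method="exclusive" flags Labs 5, 21 and 22 (three outliers, as reported in the text), whereas method="inclusive" also flags Lab 9 (64.5 %).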

The histograms and normal Q–Q plots indicated the presence of outliers in the data sets. Different outlier rejection methods (parametric and robust) were then employed to identify the outliers. The results are summarized in Tables 2, 3 and 4. The first two columns give the laboratory number and the reported result; the subsequent columns give the results obtained with the different outlier rejection tests. Results rejected before calculation of the consensus values and standard deviations are highlighted. The box plots for Fe (total), Al2O3 and SiO2, with outliers shown as red marks, are given in Figs. 4 and 5.

Among the total Fe results of all participating laboratories (Table 2; Fig. 4), Dixon’s Q test and the Z-score identified one outlier; the Grubbs single test, the MAD-based test and the box plot identified three outliers; and the Grubbs double test identified four outliers. The t test and the Huber test were more severe, each identifying 7 outliers. Although the test methods detected different numbers of outliers, the maximum difference between the outlier-corrected averages was only 0.2 % (range 63.73 to 63.89 %), and the outlier-corrected median remained practically unchanged. However, the outlier-corrected standard deviation differed by up to 72.7 % (range 0.12 to 0.44). For Al2O3 (Table 3; Fig. 5), Dixon’s Q test and the Z-score identified one outlier, the MAD-based test two, the Grubbs single test and the box plot three, and the Grubbs double test four. The Huber test detected 5 outliers, whereas the t test detected the maximum of 9. The outlier-corrected averages differed by at most 1 %, and the outlier-corrected medians remained unchanged. The outlier-corrected standard deviation differed by up to 55.5 % (range 0.08 to 0.18). Similarly, for SiO2 (Table 4; Fig. 5), Dixon’s Q test and the Z-score identified one outlier, the Grubbs single and MAD-based tests three, the Grubbs double test and the box plot four, the Huber test five, and the t test the maximum of 8. The maximum difference between the outlier-corrected averages was 1.8 % (range 3.64 to 3.71 %), and the outlier-corrected median differed by only up to 0.8 %.

Table 3 Identification of outliers using different methods for Al2O3 content in iron ore
                          Grubbs   Grubbs                      MAD      Dixon    Huber
Lab ID          Lab data  single   double   t test   Z-score   based    Q test   test
1               2.84      -0.35    -0.26    -0.06    -0.01     -0.63             0.06
2               3.15      1.31     1.78     4.25     0.91      2.63              0.25
3               3.02      0.61     0.92     2.44     0.52      1.26              0.12
4               3.4       2.65     3.42     7.71     1.64      5.26     0.14     0.5
5               2.79      -0.61    -0.59    -0.75    -0.16     -1.16             0.11
6               2.7       -1.10    -1.18    -2.00    -0.43     -2.11             0.2
7               3         0.51     0.79     2.16     0.46      1.05              0.1
8               2.9       -0.03    0.13     0.78     0.17      0.00              0
9               2.48      -2.27    -2.63    -5.05    -1.08     -4.42             0.42
10              2.9       -0.03    0.13     0.78     0.17      0.00              0
11              2.98      0.40     0.66     1.89     0.40      0.84              0.08
12              2.91      0.03     0.20     0.92     0.20      0.11              0.01
13              2.99      0.45     0.72     2.03     0.43      0.95              0.09
14              2.93      0.13     0.33     1.19     0.25      0.32              0.03
15              2.87      -0.19    -0.07    0.36     0.08      -0.32             0.03
16              2.83      -0.40    -0.33    -0.19    -0.04     -0.74             0.07
17              2.85      -0.29    -0.20    0.08     0.02      -0.53             0.05
18              3         0.51     0.79     2.16     0.46      1.05              0.1
19              3.02      0.61     0.92     2.44     0.52      1.26              0.12
20              2.62      -1.52    -1.71    -3.11    -0.66     -2.95             0.28
21              2.83      -0.40    -0.33    -0.19    -0.04     -0.74             0.07
22              1.57      -7.14    -8.62    -17.68   -3.77     -14.00   0.50     1.33
No. of outliers 0         3        4        9        1         2        1        5
Final mean      2.84      2.90     2.88     2.87     2.9       2.88     2.9      2.9
Final SD        0.34      0.12     0.11     0.08     0.18      0.15     0.18     0.09
Final median    2.9       2.9      2.9      2.84     2.9       2.9      2.9      2.9
Bold values indicate data omitted from the final calculation.
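Dixon’s Q (Eqs. 5 and 6) and the Grubbs statistic in its percentage-reduction form (Eqs. 3 and 4) can be sketched in a few lines of Python. The example below applies both to the Al2O3 results in Table 3. Critical values are deliberately not hard-coded: they must be looked up in published tables (for the Grubbs test, e.g. [13]). The function names dixon_q and grubbs_sd_reduction are hypothetical.

```python
from statistics import stdev

# Al2O3 results (%) of the 22 laboratories, taken from Table 3
al2o3 = [2.84, 3.15, 3.02, 3.4, 2.79, 2.7, 3.0, 2.9, 2.48, 2.9, 2.98,
         2.91, 2.99, 2.93, 2.87, 2.83, 2.85, 3.0, 3.02, 2.62, 2.83, 1.57]


def dixon_q(values):
    """Eqs. (5) and (6): return (Q for the smallest value, Q for the largest value)."""
    x = sorted(values)
    spread = x[-1] - x[0]
    return (x[1] - x[0]) / spread, (x[-1] - x[-2]) / spread


def grubbs_sd_reduction(values):
    """Eqs. (3) and (4): percentage drop in SD when the lowest or highest result is left out."""
    x = sorted(values)
    s = stdev(x)
    s_low = stdev(x[1:])    # S_L: lowest result left out
    s_high = stdev(x[:-1])  # S_H: highest result left out
    return 100 * abs(s - s_low) / s, 100 * abs(s - s_high) / s


q_low, q_high = dixon_q(al2o3)
gs_low, gs_high = grubbs_sd_reduction(al2o3)
print(f"Dixon Q: smallest value {q_low:.2f}, largest value {q_high:.2f}")
print(f"Grubbs G_S: lowest left out {gs_low:.1f} %, highest left out {gs_high:.1f} %")
# The larger statistic of each pair would then be compared with a tabulated critical value.
```

The Q values reproduce the Dixon entries in Table 3 (0.50 for Lab 22 and 0.14 for Lab 4). Leaving out the lowest result reduces the standard deviation far more than leaving out the highest, consistent with the low-value tailing of the Al2O3 data seen in Fig. 2.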

Table 4 Identification of outliers using different methods for SiO2 content in iron ore
                          Grubbs   Grubbs                      MAD      Dixon    Huber
Lab ID          Lab data  single   double   t test   Z-score   based    Q test   test
1               3.67      -0.03    -0.16    -0.89    -0.19     0.09              0.01
2               3.87      0.71     0.66     1.18     0.26      1.91              0.21
3               3.68      0.02     -0.12    -0.78    -0.17     0.18              0.02
4               3.09      -2.14    -2.53    -6.87    -1.50     -5.18    0.13     0.57
5               4.49      2.93     3.19     7.57     1.65      7.55              0.83
6               3.77      0.32     0.25     0.14     0.03      1.00              0.11
7               3.52      -0.56    -0.77    -2.43    -0.53     -1.27             0.14
8               –         –        –        –        –         –
9               3.59      -0.31    -0.49    -1.71    -0.37     -0.64             0.07
10              3.55      -0.46    -0.65    -2.12    -0.46     -1.00             0.11
11              3.63      -0.19    -0.32    -1.30    -0.28     -0.27             0.03
12              3.64      -0.14    -0.28    -1.20    -0.26     -0.18             0.02
13              3.53      -0.55    -0.73    -2.33    -0.51     -1.18             0.13
14              3.73      0.19     0.09     -0.27    -0.06     0.64              0.07
15              3.56      -0.43    -0.61    -2.02    -0.44     -0.91             0.1
16              3.71      0.12     0.00     -0.47    -0.10     0.45              0.05
17              3.58      -0.34    -0.53    -1.82    -0.40     -0.73             0.08
18              3.66      -0.08    -0.20    -0.99    -0.22     0.00              0
19              3.83      0.54     0.49     0.76     0.17      1.55              0.17
20              3.37      -1.13    -1.38    -3.98    -0.87     -2.64             0.29
21              5.3       5.89     6.49     15.93    3.48      14.91    0.37     1.64
22              4.1       1.53     1.60     3.55     0.77      4.00              0.44
No. of outliers 0         3        4        8        1         3        1        5
Final mean      3.75      3.64     3.68     3.68     3.68      3.71     3.67     3.65
Final SD        0.44      0.12     0.15     0.09     0.27      0.24     0.27     0.10
Final median    3.66      3.64     3.66     3.67     3.65      3.65     3.65     3.65
Bold values indicate data omitted from the final calculation.
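Equations (1) and (2) can be sketched for the SiO2 results in Table 4 (Lab 8 reported no SiO2 value and is omitted). The same z_score function serves both roles described in Section 2.3.1.2: as an outlier screen against the unscreened mean and standard deviation of all results, and as the performance score against the assigned value (3.64 %) and target standard deviation (0.12) adopted for SiO2 in Table 5. Function and variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# SiO2 results (%) keyed by laboratory code, taken from Table 4 (Lab 8 reported no SiO2 result)
sio2 = {1: 3.67, 2: 3.87, 3: 3.68, 4: 3.09, 5: 4.49, 6: 3.77, 7: 3.52, 9: 3.59,
        10: 3.55, 11: 3.63, 12: 3.64, 13: 3.53, 14: 3.73, 15: 3.56, 16: 3.71,
        17: 3.58, 18: 3.66, 19: 3.83, 20: 3.37, 21: 5.3, 22: 4.1}


def t_statistic(x, values):
    """Eq. (1): t = (x - mean) * sqrt(n) / s over all reported results."""
    values = list(values)
    return (x - mean(values)) * sqrt(len(values)) / stdev(values)


def z_score(x, assigned, sd):
    """Eq. (2): Z = (x - X_a) / s."""
    return (x - assigned) / sd


values = list(sio2.values())

# Screening: Z against the unscreened mean and SD of all results (Z-score column of Table 4)
t_lab21 = t_statistic(sio2[21], values)
z_lab21 = z_score(sio2[21], mean(values), stdev(values))
print(f"Lab 21: t = {t_lab21:.2f}, screening Z = {z_lab21:.2f}")

# Performance: Z against the assigned value and target SD adopted for SiO2 (Table 5)
ASSIGNED_SIO2, TARGET_SD_SIO2 = 3.64, 0.12
performance_z = {lab: z_score(x, ASSIGNED_SIO2, TARGET_SD_SIO2) for lab, x in sio2.items()}
unacceptable = sorted(lab for lab, z in performance_z.items() if abs(z) > 2)
print("Labs with |Z| > 2 for SiO2:", unacceptable)
```

Up to rounding, the t and Z values for Lab 21 agree with Table 4 (15.93 and 3.48), and the laboratories with |Z| > 2 are Labs 4, 5, 20, 21 and 22, the five reported with unacceptable silica Z-scores. The critical value for the t test is not included; it would be taken from Student’s t tables [12].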

For SiO2, however, the outlier-corrected standard deviation differed by up to 66.6 % (range 0.09 to 0.27).

4. Discussion

In this study, the number of participating laboratories was low (22). The histograms and Q–Q plots showed tailed distributions and the presence of outliers. The data sets therefore do not strictly fulfill the assumptions of parametric tests, which require normally distributed data. This problem can be addressed by increasing the number of participating laboratories in the PT program. In the present study, however, both parametric and robust test methods for outlier determination were used to obtain more detailed information, i.e. which outlier methods should be used and which should be avoided in a PT program where the number of participating laboratories is small.

Different outlier tests were found to give different results. It was also observed that several parametric and robust tests give a similar number of outliers; in fact, each parametric test was found to have a robust alternative. For example, the Grubbs single test (parametric) and the box plot (robust) identified the same number of outliers. The number of outliers by itself therefore does not show which methods (parametric or robust) should be preferred for PTs with a small number of participating laboratories, since both families include stricter and less strict outlier tests. The number of outliers does, however, indicate which outlier tests should not be used in such PTs. It has been proposed that the number of outliers should not exceed 2/9 of the whole data set [14]; for 22 laboratories this means at most four outliers (2/9 × 22 ≈ 4.9). On this basis, all the outlier tests except the t test and the Huber test were considered suitable for outlier rejection in the present study. The next question is which outlier test method should be adopted in a PT program. The choice of outlier test influences the consensus value and standard deviation, and these two characteristics determine the Z-score of a laboratory result. It can be seen from Tables 2, 3 and 4 that the outlier-corrected averages for Fe (total), Al2O3 and SiO2 do not
vary much, although different numbers of outliers were rejected by the different test methods. Similarly, the median values for Fe (total), Al2O3 and SiO2 remained practically unchanged after rejection of different numbers of outliers by the different test methods. Median values are therefore more stable than mean values when outliers are present in a data set, and hence are a better choice for determining the consensus value in a PT program. In the present case, however, both the median and the mean can be considered as the consensus value. On the other hand, the outlier-corrected standard deviations for Fe (total), Al2O3 and SiO2 vary significantly with the choice of outlier rejection method (Tables 2, 3 and 4). Therefore, multiple outlier test methods should be used to identify outliers in a PT program, especially when the number of participating laboratories is small. They complement each other and help give diverse information and a better overview of the data set, both for the PT organizers and for the participating laboratories.

In the present study, the t test and the Huber test were observed to be very strict, eliminating a large number of outliers from the data set; these tests should therefore be avoided for outlier determination. On the other hand, Dixon’s Q test and the Z-score were mild and rejected only one outlier. The Grubbs single test, Grubbs double test, MAD-based test and box plot were the most appropriate tests for determining outliers in the present study. Accordingly, the Grubbs single test was chosen as the method to identify the outliers. Thus, the accepted consensus value for Fe (total) is 63.84 % with a standard deviation of 0.28. Similarly, for alumina and silica the accepted consensus values are 2.90 and 3.64 %, respectively, with a standard deviation of 0.12 for both. The final assigned (recommended) values of Fe (total), Al2O3 and SiO2 and the target values for standard deviation are listed in Table 5. These values were used to calculate the Z-scores of the participating laboratories (Table 6).

Table 5 Final assigned values and targeted standard deviations used to calculate Z-scores of participating laboratories
Parameter                        Fe (total)  Al2O3  SiO2
Assigned value (Xa) (%)          63.84       2.90   3.64
Targeted standard deviation (s)  0.28        0.12   0.12

Table 6 Z-scores of the participating laboratories in the PT program
Lab ID  Total iron  Alumina    Silica
1       0.46        -0.5       0.25
2       -1.64       2.083333   1.916667
3       -1.64       1          0.333333
4       0.57        4.166667   -4.58333
5       9.75        -0.91667   7.083333
6       -0.54       -1.66667   1.083333
7       0.11        0.833333   -1
8       0.29        0          -0.41667
9       2.36        -3.5       -0.41667
10      0.21        0          -0.75
11      0.21        0.666667   -0.08333
12      0.36        0.083333   0
13      0.04        0.75       -0.91667
14      0.07        0.25       0.75
15      0.11        -0.25      -0.66667
16      -0.89       -0.58333   0.583333
17      0.32        -0.41667   -0.5
18      0.68        0.833333   0.166667
19      -1.89       1          1.583333
20      0.71        -2.33333   -2.25
21      -4.68       -0.58333   13.83333
22      -3.71       -11.0833   3.833333
Bold values indicate data omitted from the final calculation.

Fig. 4 Box plot for Fe (total) content in iron ore

Fig. 5 Box plot for (1) Al2O3 and (2) SiO2 content in iron ore

Among the 22 participating laboratories, the Z-scores of 4 laboratories (Labs 5, 9, 21 and 22) for analysis of total iron fall outside the acceptable limit of ±2. Similarly, for analysis of alumina, Labs 2, 4, 9, 20 and 22 have unacceptable Z-scores, whereas for silica, Labs 4, 5, 20, 21 and 22 have unacceptable Z-scores. It is interesting to note that Laboratory 22 reported unacceptable results for all three parameters, whereas Laboratories 4, 5, 9, 20 and 21 reported unacceptable results for two parameters. Laboratory 2 reported an unacceptable result for alumina only.

An effort was then made to identify the plausible causes of the adverse performance of these laboratories in the PT program. It was observed that Laboratory 22, which reported unacceptable results for all three parameters, is not a NABL-accredited laboratory. The laboratory was advised to seek NABL accreditation, so that it would follow standard procedures for chemical analysis and use appropriate certified reference materials to validate its results. Similarly, Laboratories 5, 9 and 20 had not used any certified reference materials to validate their results.

5. Conclusion

A number of statistical test methods are available for outlier detection in a data set, depending on the application, the underlying assumptions and the number of participating laboratories. Correct outlier rejection is of utmost importance because the choice of outlier test influences the consensus value and standard deviation, and these two characteristics determine the Z-score of a laboratory result in a PT program. In the present study, five parametric outlier tests were compared: Dixon’s Q test, the Grubbs single and double tests, the t test and Z-scores. In addition, three robust tests were chosen as alternatives to the parametric tests: the box plot, the Huber test and the MAD-based test. The strictness of these outlier tests varied over a wide range. Huber’s test and the t test were the strictest, eliminating a large number of data as outliers. The Grubbs single test, Grubbs double test, MAD-based test and box plot were less strict and were found to be the most appropriate in the present case. Dixon’s Q test and the Z-score were very mild and rejected only one outlier. Multiple outlier test methods should be used to identify outliers in a PT program, especially when the number of participating laboratories is small. They complement each other and help give diverse information and a better overview of the data set, both for the PT organizers and for the participating laboratories. If outliers are present in a data set, the median is a better choice than the mean for determining the consensus value in a PT program, as medians are more stable than means. The overall performance of the 22 laboratories for quantitative analysis of Fe (total), Al2O3 and SiO2 content in iron ore was satisfactory. The present PT program provided an external quality control mechanism to improve the testing capability of laboratories in the elemental analysis of iron ore.

Acknowledgments The authors thank the participating laboratories, the Director, CSIR-National Metallurgical Laboratory, Jamshedpur, and the Director, CSIR-National Physical Laboratory, New Delhi, for their support of the program.

References
[1] ISO/IEC 17025, General Requirements for the Competence of Testing and Calibration Laboratories (2005).
[2] EA-03/04 European Co-operation for Accreditation, Use of Proficiency Testing as a Tool for Accreditation in Testing (2001).
[3] Y.P. Singh, S.K. Nijhwan, R.P. Singal, A.K. Saxena and S.U.M. Rao, Interlaboratory Proficiency Testing: Liquid in Glass Thermometer Intercomparisation, MAPAN J. Metrol. Soc. India, 19 (2004) 169–175.
[4] S. Yadav and A.K. Bandyopadhyay, Evaluation of Laboratory Performance Through Interlaboratory Comparison, MAPAN J. Metrol. Soc. India, 24 (2009) 125–138.
[5] M.K. Mittal, J.C. Biswas and A.S. Yadav, Proficiency Testing in AC Power and Energy, MAPAN J. Metrol. Soc. India, 24 (2009) 42–66.
[6] S. Yadav, O. Prakash, V.K. Gupta, B.V. Kumaraswamy and A.K. Bandyopadhyay, Evaluation of Laboratory Performance Through Proficiency Testing Using Pressure Dial Gauge in the Hydraulic Pressure Measurement up to 70 MPa, MAPAN J. Metrol. Soc. India, 23 (2008) 79–99.
[7] S.K. Wong, Evaluation of the Use of Consensus Values in Proficiency Testing Programmes, Accred. Qual. Assur., 10 (2005) 409–414.
[8] S. Basak, S.S. Mukherjee, S.N. Mandal, R. Das, A.K. Mazumder, J.K. Mondal, R. Sammaddar, S. Mondal and D. Kundu, Interlaboratory Proficiency Testing: Intercomparison in Relation to the Measurement of Alumina, Iron(III) Oxide and Titania Present in Homogenised China Clay, MAPAN J. Metrol. Soc. India, 25 (2010) 265–272.
[9] R. Hegazy, M.I. Mohamed and A. Abu-Sinna, A Comparative Study of Statistical Methods Used in Analyzing the Proficiency Testing Results of Yield Stress, MAPAN J. Metrol. Soc. India, 25 (2010) 107–113.
[10] IS 1493 (Part I): 1981, Methods of Chemical Analysis of Iron Ores, Part 1, Determination of Common Constituents.
[11] S. Basak, J.K. Mondal, S.S. Mukherjee, S. Mandal and D. Kundu, Homogeneity Testing of China Clay by Complexometry and Spectrophotometry Using Statistical Technique, J. Ind. Chem. Soc., 84 (2007) 396–397.
[12] J.C. Miller and J.N. Miller, Statistics for Analytical Chemistry, 3rd ed., Ellis Horwood, Chichester, 1993.
[13] W. Horwitz, Protocol for the Design, Conduct and Interpretation of Method-Performance Studies, Pure Appl. Chem., 67 (1995) 331–343.
[14] AOAC International, Official Methods of Analysis of AOAC International, 16th ed., AOAC International, Gaithersburg, MD, 1999.
