
Document: ISO/TC 147/SC6 N 331

Our ref: ISO/TC147/SC6

Date: 9 Feb 2005

Secretariat of ISO/TC147/SC 6
Water quality - Sampling (general methods)

Dear Member,

DRAFT TO ACCOMPANY NEW WORK ITEM PROPOSAL - ISO/WD 5667-20 WATER QUALITY -
SAMPLING – PART 20: GUIDANCE ON THE USE OF SAMPLE DATA FOR DECISION MAKING (See
doc ISO/TC147/SC6 N 332)

The attached draft is to be read in conjunction with the New Work Item Proposal in document
ISO/TC147/SC6 N 332. Please note that this NWIP is being processed as part of a pilot study on the
electronic committee-internal balloting application and respond as detailed in N332.

The attached draft contains an updated version of the text that was developed during the time at which
this project was at Stage 0 (preliminary item) on the TC147/SC6 programme and also the response of
the author of the draft to comments made by the Canadian member body on the Stage 0 draft.

Please note that the reply date on the NWIP is 9 May 2005. The results will be discussed at the
meeting of ISO/TC147/SC6 planned for 3 June 2005 in Japan.

Yours sincerely

David Upstone
Secretary, ISO/TC147/SC6

BSI, 389 Chiswick High Road, London W4 4AL, UK. Tel: + 44 20 8996 7174. Fax: + 44 20 8996 7799.
E-mail: david.upstone@bsi-global.com
NEW WORK ITEM PROPOSAL
Date of presentation: 2005.02.09
Reference number (given by the Secretariat): ISO/TC 147/SC 6 N 331

Proposer: ISO/TC147/SC6
Secretariat: BSI
A proposal for a new work item within the scope of an existing committee shall be submitted to the secretariat of that committee with a copy to
the Central Secretariat and, in the case of a subcommittee, a copy to the secretariat of the parent technical committee. Proposals not within the
scope of an existing committee shall be submitted to the secretariat of the ISO Technical Management Board.
The proposer of a new work item may be a member body of ISO, the secretariat itself, another technical committee or subcommittee, an
organization in liaison, the Technical Management Board or one of the advisory groups, or the Secretary-General.
The proposal will be circulated to the P-members of the technical committee or subcommittee for voting, and to the O-members for information.
See overleaf for guidance on when to use this form.
IMPORTANT NOTE: Proposals without adequate justification risk rejection or referral to originator.
Guidelines for proposing and justifying a new work item are given overleaf.

Proposal (to be completed by the proposer)


Title of proposal (in the case of an amendment, revision or a new part of an existing document, show the reference number and current title)
English title ISO 5667-20 WATER QUALITY - SAMPLING – PART 20: GUIDANCE ON THE
USE OF SAMPLE DATA FOR DECISION MAKING
French title
(if available)

Scope of proposed project


Will establish guidance on general principles for dealing with the use of water
quality sample data for decision-making, including assessment of:
· Compliance with standards
· Change
· Classification into groups
Concerns known patented items (see ISO/IEC Directives Part 1 for important guidance)
Yes / No. If "Yes", provide full information as annex.
Envisaged publication type (indicate one of the following, if possible)
International Standard / Technical Specification / Publicly Available Specification / Technical Report

Purpose and justification (attach a separate page as annex, if necessary)


ISO 5667-20 will be the most recent in a significant series of ISO standards giving
guidance on general aspects of sampling for determination of water quality and will
give guidance on the use of data obtained by taking samples. The purpose is to
deal with the use of such data in taking decisions and in measuring success or
failure in the presence of the errors associated with sampling. The guide aims to
help control the risk that such errors lead to wrong decisions. This guide will
look also at the problems that are caused when compliance with standards for water
quality is assessed using data obtained by sampling.
Target date for availability (date by which publication is considered to be necessary) 2008.05.31
Relevant documents to be considered
Draft attached as annex
Relationship of project to activities of other international bodies

Liaison organizations: Existing liaisons of ISO/TC147/SC6 are satisfactory.

Need for coordination with: IEC / CEN / Other (please specify)


Preparatory work (at a minimum an outline should be included with the proposal)
A draft is attached / An outline is attached / It is possible to supply a draft by …
The proposer or the proposer's organization is prepared to undertake the preparatory work required: Yes / No

Proposed Project Leader (name and address):
Mr T Warn
tony.warn@environment-agency.gov.uk

Name and signature of the Proposer (include contact information):
ISO/TC147/SC6 Secretariat
D Upstone, BSI
david.upstone@bsi-global.com
Comments of the TC or SC Secretariat
Supplementary information relating to the proposal
This proposal relates to a new ISO document;
This proposal relates to the amendment/revision of an existing ISO document;
This proposal relates to the adoption as an active project of an item currently registered as a Preliminary Work Item;
This proposal relates to the re-establishment of a cancelled project as an active project.
Other: If accepted, this work will be allocated to ISO/TC147/SC6/WG1 'Design of
sampling programmes', convener Mr R West of the UK
Voting information
The ballot associated with this proposal comprises a vote on:
Adoption of the proposal as a new project
Adoption of the associated draft as a committee draft (CD)
(see ISO Form 5, question 3.3.1)
Adoption of the associated draft for submission for the enquiry vote (DIS or equivalent)
(see ISO Form 5, question 3.3.2)
Other: This NWIP ballot is being processed as part of a pilot study on the electronic
committee-internal balloting application. Please respond as detailed in doc
ISO/TC147/SC6 N332.
Annex(es) are included with this proposal (give details)
ISO/WD 5667-20
List of comments made on Stage 0 draft and project leader's responses
Date of circulation: 2005.02.09
Closing date for voting: 2005.05.08
Signature of the TC or SC Secretary: David Upstone

Use this form to propose:


a) a new ISO document (including a new part to an existing document), or the amendment/revision of an existing ISO document;
b) the establishment as an active project of a preliminary work item, or the re-establishment of a cancelled project;
c) the change in the type of an existing document, e.g. conversion of a Technical Specification into an International Standard.
This form is not intended for use to propose an action following a systematic review - use ISO Form 21 for that purpose.
Proposals for correction (i.e. proposals for a Technical Corrigendum) should be submitted in writing directly to the secretariat concerned.

Guidelines on the completion of a proposal for a new work item


(see also the ISO/IEC Directives Part 1)
a) Title: Indicate the subject of the proposed new work item.
b) Scope: Give a clear indication of the coverage of the proposed new work item. Indicate, for example, if this is a proposal for a new document,
or a proposed change (amendment/revision). It is often helpful to indicate what is not covered (exclusions).
c) Envisaged publication type: Details of the types of ISO deliverable available are given in the ISO/IEC Directives, Part 1 and/or the
associated ISO Supplement.
d) Purpose and justification: Give details based on a critical study of the following elements wherever practicable. Wherever possible
reference should be made to information contained in the related TC Business Plan.
1) The specific aims and reason for the standardization activity, with particular emphasis on the aspects of standardization to be covered, the
problems it is expected to solve or the difficulties it is intended to overcome.
2) The main interests that might benefit from or be affected by the activity, such as industry, consumers, trade, governments, distributors.
3) Feasibility of the activity: Are there factors that could hinder the successful establishment or general application of the standard?
4) Timeliness of the standard to be produced: Is the technology reasonably stabilized? If not, how much time is likely to be available before
advances in technology may render the proposed standard outdated? Is the proposed standard required as a basis for the future development
of the technology in question?

5) Urgency of the activity, considering the needs of other fields or organizations. Indicate target date and, when a series of standards is
proposed, suggest priorities.
6) The benefits to be gained by the implementation of the proposed standard; alternatively, the loss or disadvantage(s) if no standard is
established within a reasonable time. Data such as product volume or value of trade should be included and quantified.
7) If the standardization activity is, or is likely to be, the subject of regulations or to require the harmonization of existing regulations, this should
be indicated.
If a series of new work items is proposed having a common purpose and justification, a common proposal may be drafted including all elements
to be clarified and enumerating the titles and scopes of each individual item.
e) Relevant documents: List any known relevant documents (such as standards and regulations), regardless of their source. When the
proposer considers that an existing well-established document may be acceptable as a standard (with or without amendment), indicate this with
appropriate justification and attach a copy to the proposal.
f) Cooperation and liaison: List relevant organizations or bodies with which cooperation and liaison should exist.


ISO TC 147/SC 6 N 331


Date: 2005.02.09

ISO/WD 5667-20

Secretariat: BSI

Water quality — Sampling — Part 20: Guidance on the use of sample


data for decision making

Warning

This document is not an ISO International Standard. It is distributed for review and comment. It is subject to
change without notice and may not be referred to as an International Standard.

Recipients of this draft are invited to submit, with their comments, notification of any relevant patent rights of
which they are aware and to provide supporting documentation.

Document type: International Standard


Document subtype:
Document stage: (20) Preparatory
Document language: E


Copyright notice
This ISO document is a working draft or committee draft and is copyright-protected by ISO. While the
reproduction of working drafts or committee drafts in any form for use by participants in the ISO standards
development process is permitted without prior permission from ISO, neither this document nor any extract
from it may be reproduced, stored or transmitted in any form for any other purpose without prior written
permission from ISO.

Requests for permission to reproduce this document for the purpose of selling it should be addressed as
shown below or to ISO's member body in the country of the requester:
[Indicate the full address, telephone number, fax number, telex number, and electronic mail address, as
appropriate, of the Copyright Manager of the ISO member body responsible for the secretariat of the TC or
SC within the framework of which the working document has been prepared.]

Reproduction for sales purposes may be subject to royalty payments or a licensing agreement.

Violators may be prosecuted.



Contents

Introduction
1 Scope
2 References
3 Terms and definitions
4 Summary of key points
5 Types of error and variation
5.1 General
5.2 Analytical error
5.3 Sampling error
6 Activities
6.1 Estimation of summary statistics
6.2 Limit values and compliance
6.3 Confidence of failure
6.4 Methods for percentile standards
6.5 Non-parametric methods
6.6 Look-up tables
7 Water quality limit values
7.1 General
7.2 Ideal limit values
7.3 Absolute limits
7.4 Percentage of failed samples
7.5 Calculating limits in effluent discharges
8 Declaring that a substance has been detected
9 Detecting change
10 Classification
10.1 General
10.2 Confidence that class has changed
Annex A (informative) Calculation of confidence limits in Clause 6.4
Annex B (informative) Calculation of confidence limits in Clause 6.5



Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 5667-20 was prepared by Technical Committee ISO/TC 147, Water quality, Subcommittee SC 6,
Sampling (general methods).

ISO 5667 consists of the following parts, under the general title, Water quality — Sampling:

 Part 1: Guidance on the design of sampling programmes;

 Part 2: Guidance on sampling techniques;

 Part 3: Guidance on the preservation and handling of water samples;

 Part 4: Guidance on sampling from lakes, natural and man-made;

 Part 5: Guidance on sampling of drinking water and water used for food and beverage processing;

 Part 6: Guidance on sampling of rivers and streams;

 Part 7: Guidance on sampling of water and steam in boiler plants;

 Part 8: Guidance on sampling of wet deposition;

 Part 9: Guidance on sampling from marine waters;

 Part 10: Guidance on sampling of waste waters;

 Part 11: Guidance on sampling of groundwaters;

 Part 12: Guidance on sampling of bottom sediments;

 Part 13: Guidance on sampling of sludges from sewage and water-treatment works;

 Part 14: Guidance on quality assurance of environmental water sampling and handling;

 Part 15: Guidance on preservation and handling of sludge and sediment samples;

 Part 16: Guidance on biotesting of samples;

 Part 17: Guidance on sampling of suspended sediments;



 Part 18: Guidance on sampling of groundwater at contaminated sites;

 Part 19: Guidance on sampling in marine areas;

 Part 20: Guidance on the use of sample data for decision-making.




Introduction
This guide is about the use of data obtained by taking samples. The purpose is to deal with the use of such
data in taking decisions and in measuring success or failure in the presence of the errors associated with
sampling. The guide aims to help control the risk that such errors lead to wrong decisions.

Poor decisions can also stem from the way in which water quality standards for discharges and environmental
waters are framed or set out in regulations and permits. This guide looks also at the problems that are caused
when compliance with these standards is assessed using data obtained by sampling.

NOTE Decisions might result in the commendation or criticism of people, sites, companies, sectors or nations. They
may lead to legal action or decisions to reduce or increase discharges to the environment.

There are several sampling methods available and their respective merits are under debate. This guide does
not deal with sampling methods. It deals with the additional issue of using the results from sampling to take
decisions, even where the choice of method of sampling is correct and it is used properly.

1 Scope
This guide establishes general principles for dealing with the use of sample data for decision-making. The
scope includes assessment of:

• Compliance with standards

• Change

• Classification into groups

It is not the purpose of this guide to recommend particular statistical techniques. Nor does it cover a wide
range of techniques and the circumstances in which they should be used. The purpose is to establish the
principle that sampling errors (and errors generally) must be assessed and taken into account as part of the
process of taking decisions.

NOTE A few statistical techniques are used as illustrative examples. These are techniques that have seen routine
use in some regulatory regimes.

2 References
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references the latest edition of the referenced
document (including any amendments) applies.

3 Terms and definitions


For the purposes of this document, the terms and definitions given in … apply.

4 Summary of key points

The following points are brought out in the text.



Sampling error is caused by the action of random chance. It can be present, for example, in a set of
measurements of water quality taken over a period of time. The values for chemical analysis of those samples
depend on the particular small volumes of water that were extracted or measured. If water quality varies, a
second set of samples will tend to have different values because this set is made up of different small volumes
of water. Sampling error is the term given to this effect. It applies even if errors in chemical analysis are
trivial and there are no appreciable errors in the methods by which samples are taken and handled.

Sampling error should be quantified and taken into account in all cases where water quality varies and
sampling is used to estimate information that is used to take decisions. This includes assessing compliance
with standards and thresholds (see Clause 7), deciding whether water quality has changed (see Clause 9),
and putting waters into grades in classification systems (see Clause 10). This guide recommends that:

 Standards where compliance is assessed by sampling should be defined or used so that sampling error
can be estimated and dealt with appropriately (see Clause 7.2);

 Absolute limits should be treated as percentiles when assessing compliance using sampling (see 7.3);

 Standards defined as limits to be met by a percentage of samples should be defined or used as the
corresponding percentiles (see Clause 7.4); and,

 The degree of confidence should be estimated when aiming to demonstrate failure of water quality
standards (see Clause 6).

 The degree of confidence in changes or differences should be estimated when aiming to demonstrate
change or no change (see Clause 10.2).

This guide sets out basic requirements and illustrative methods that will be adequate in many decisions of the
type set out in Clause 1 above and Clauses 5.1 and 5.3 below, or that will serve as a preliminary look at the
sensitivity of a decision to sampling error. This guide does not cover a full range of statistical techniques.

This guide does not deal with the mechanics of taking the samples themselves, or how to ensure the samples
are representative. Neither does it deal with how to perform chemical analysis on the samples. These are big
topics in themselves and they are covered in other guides. If badly done they add to the difficulties from
sampling error and, in some cases, the resulting errors may dominate those from sampling error.

5 Types of error and variation

5.1 General

In many procedures by which data are used to take decisions, there will be a set of results taken over a period
of time (e.g. a year). This information might be used to make judgements such as the following:

 Water quality in this river failed to meet the required standards during this year;

 This treatment works performed better this year;

 Water quality in this lake needs improvement;

 This company has better effluent discharge compliance than that one; or,

 Most of the risk of environmental impact is from this particular type of effluent discharge.

It is unlikely that there are many significant changes in water quality from second to second throughout a year,
but variations from day to day are common. These can be due to diurnal cycles, the play of random errors and
bias from the laboratory, the weather, step changes and day-to-day variations (perhaps in the natural
processes in water or caused by discharges and abstractions and changes in these), seasonal and economic
cycles, and several underlying and overlapping long-term trends.

In addition, the set of samples must be representative of the average quality of the masses of water from which
they were taken. For example, a set of samples must be representative of the period of time being reviewed,
i.e. when estimating an annual mean, it would not be acceptable for all the samples to be taken in April.



NOTE Guidance on all these aspects is given in more detail in ISO 5667-1.

5.2 Analytical error

Analytical errors are those errors that are introduced by the process of chemical analysis and reflect that these
measurements are not error free. It may be that the result for a single sample can be specified to within a
specific range, e.g. ± 15 %.

NOTE This depends on the capabilities of the equipment and the laboratory that has been used to perform the
analysis.

When a mean is calculated from 12 samples, the errors in the chemical analysis tend to average out
according to the square root of the number of samples. For example if the analytical error associated with a
single sample were ± 15%, then the error in the estimate of the mean of a set of chemical analyses would
tend to reduce to something like ± 4% for 12 samples or to ± 2,5 % for 36 samples.
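This square-root averaging is easy to verify numerically. The following minimal Python sketch (illustrative only; the function name is hypothetical) divides the single-sample error by the square root of the number of samples, assuming independent random analytical errors:

    import math

    def error_in_mean(single_sample_error_pct: float, n_samples: int) -> float:
        # Independent random analytical errors average out in the
        # estimate of the mean in proportion to 1/sqrt(n).
        return single_sample_error_pct / math.sqrt(n_samples)

    print(error_in_mean(15, 12))  # ~4.3 %, quoted above as "something like +/- 4 %"
    print(error_in_mean(15, 36))  # 2.5 %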

In using samples to take decisions this kind of error from chemical analysis augments but is often smaller than
that from statistical sampling error.

NOTE The estimate of the mean and its errors can be misleading if the samples are unrepresentative, for example if the
samples for the estimate of the annual mean are all taken in winter, or from a particular patch or depth of water.
Sometimes several or most of the data will be reported as less than a specified limit of detection and, depending on the
types of decisions that depend on the data, may require special techniques¹.

When the sample results are used to estimate the value of other summary statistics such as percentiles², the
picture is similar, i.e. the errors are inversely proportional to the square root of the number of samples but are
larger than for the mean.

5.3 Sampling error

Sampling error is due to quality variations in the water being sampled, and the ability of the sampling process to
accurately reflect these variations. In a set of samples taken over a period of time, the results are affected by the
operation of the laws of chance in the way the particular samples came to be collected. This produces error even
if analytical errors happened to be close to zero, and if the sampling were truly representative.

In using sampling, the main source of error is usually associated with the number of samples taken. In the
types of decision listed below sampling error is usually a bigger issue than, for example, that associated with
errors of chemical analysis. (Though in practice, errors in chemical analysis, for example, are bound up within
what is observed as statistical sampling error, say, in the list of measurements taken in a year.)

Sampling error should be assessed in cases where water quality varies, such as the following:

 when sampling to measure water quality;

 when using samples to estimate summary statistics, e.g. the monthly mean, the annual percentile or the
annual maximum;

 when making statements about whether this year’s summary statistics are higher or lower than last
year’s;

 when establishing whether summary statistics exceed a threshold;

 when using summary statistics to place water quality in a particular class within a classification system; or

 when assessing whether a class change has occurred.

1 Such data are called censored data. Special techniques are available for getting the best out of such data.

2 The 95-percentile is the value exceeded for 5 % of the time.



In all these cases the aim is to assess whether the change or status is statistically significant, and to require that
laws, regulations and guidance assert the requirement to assess and report statistical significance.

6 Activities

6.1 Estimation of summary statistics

An estimate of a summary statistic depends on the values that happen to be captured by sampling. The
estimate, due to sampling error, is almost certain to differ from the true value – that which would be obtained if
it were possible to achieve continuous error-free monitoring.

Sampling error can be managed by calculating confidence limits. Confidence limits define the range within
which the true value of the summary statistic is expected to lie. In the example in Table 1, the
estimate of the mean from 8 samples is 101 mg/l and there is a pair of 95 % confidence limits, 47 and 155.
There is 95 % confidence that the true mean exceeds the 95 % (optimistic) confidence limit of 47 and 95 %
confidence that the true mean is less than the 95 % (pessimistic) confidence limit of 155. Overall there is 90 %
confidence that the true mean occupies the range between 47 and 155.

Table 1 — Example of confidence limits for the mean

Estimate of the mean: 101
Standard deviation: 82
Number of samples: 8
Optimistic and pessimistic confidence limits (90 % confidence interval): 47 and 155

The gap between the confidence limits widens as the sampling rate is decreased. It is also larger for estimates
of extreme summary statistics such as the 95-percentile and 99-percentile. For a typical water quality pollutant,
the confidence limits for a mean estimated from 12 samples are ±30 % around this estimate. For an estimate
of the 95-percentile, this range is –20 % to +80 % around the estimate of the 95-percentile.
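The method used to produce the limits in Table 1 is not prescribed here. As an illustration only, the Python sketch below computes one-sided confidence limits from a normal-theory t-interval (an assumption; other methods could equally lie behind the Table) and gives values close to the 47 and 155 quoted:

    from scipy import stats

    def confidence_limits_for_mean(mean, sd, n, limit_confidence=0.95):
        # One-sided confidence limits for the true mean; the pair of
        # 95 % one-sided limits encloses the true mean with 90 % confidence.
        se = sd / n ** 0.5                       # standard error of the mean
        t = stats.t.ppf(limit_confidence, n - 1)
        return mean - t * se, mean + t * se      # (optimistic, pessimistic)

    lo, hi = confidence_limits_for_mean(101, 82, 8)   # data of Table 1
    print(round(lo), round(hi))                       # ~46 and ~156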

6.2 Limit values and compliance

It will be argued in Clause 7 that the impact of sampling error makes it vital that water quality standards and
similar controls are defined or used as means or percentiles and not, for example, as absolute limits. In other
words the decision to use sampling means that definitions of water quality standards should be restricted to
summary statistics that can be assessed properly by sampling. In this clause the discussion is limited to
summary statistics that are means and percentiles.

NOTE The use of a limit that is expressed as an annual mean implies, for example, that the pollutant causes damage
that builds up over time. But it can also apply if the impact of the pollutant is associated with values higher
than the mean, so long as the shape of the statistical distribution can be expected to be fairly stable and if action
to reduce the mean will also reduce the number or scale of peak events. The extent to which this is not risky
is often covered by the size of the safety factors built into the standard in the first place. Given all these
conditions, the use of a mean as a standard has the advantage that it is generally efficient in terms of getting
the smallest sampling errors from a set of samples.

If the water quality standard is defined as a mean, then it is a simple matter to estimate the mean from a set of
samples. This estimate can then be compared with the value of the mean that is set down as the water quality
standard. If the value estimated from the samples is worse than this value, the site under test can be said to
have failed. If the estimate is better than this value, the site under test can be said to have passed. This type
of assessment is called a face-value assessment.

A difficulty with the face-value approach is that any estimate of the mean depends on the values captured by
sampling. There is a risk, caused by sampling error, that a compliant site (one which met the mean standard)
might be reported as a failure purely because the set of samples happened by chance to capture a few high



values. Similarly, a non-compliant site might evade detection if it happened by pure chance to hold mainly
good quality samples.

Therefore, sampling error carries a risk of making bad decisions.

This risk can be controlled by allowing for the doubt that the sampling error puts into the estimate of the mean.
One way of doing this is to calculate confidence limits (see Table 1).

In Table 1, the face-value estimate of the mean is 101. Around this there is the pair of confidence limits, 47
and 155. These define a confidence interval. With a 90 % confidence interval, there is 95 % confidence that
the true mean is less than the upper or pessimistic confidence limit (155 in Table 1). There is a chance of only
5 % that a true mean as high as this could have come about from the action of chance in sampling.

Similarly, there is 95 % confidence that the true mean exceeds the optimistic or lower confidence limit (47 in
Table 1). There is a chance of only 5 % that a value as low as this could have arisen by chance.

To assess compliance, the confidence limits should be compared with the standard.

 If the pessimistic confidence limit is less than the mean limit (for a 90% confidence interval), there is at
least 95 % confidence that the standard was met.

 If the optimistic confidence limit exceeds the mean limit, there is at least 95 % confidence that the
standard was not met.

 Where the mean limit lies between the optimistic and pessimistic confidence limits, it is not possible to
state compliance or failure with at least 95 % confidence.

With the results shown in Table 1, the site would pass a water quality standard set as a mean of 160 and fail a
standard of 40. These decisions have at least 95 % confidence. Performance against a standard of 100 is
unresolved at 95 % confidence because the value of 100 lies between the confidence limits.
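The three-way rule in the bullet points above can be written directly as a small decision function. The sketch below is illustrative Python only, using the Table 1 limits of 47 and 155:

    def compliance_decision(optimistic, pessimistic, limit):
        # Compare a mean standard with the pair of one-sided 95 % limits.
        if pessimistic < limit:
            return "pass: at least 95 % confidence the standard was met"
        if optimistic > limit:
            return "fail: at least 95 % confidence the standard was not met"
        return "unresolved at 95 % confidence"

    for limit in (160, 40, 100):              # the three standards discussed
        print(limit, compliance_decision(47, 155, limit))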

A limit expressed as a percentile may be preferred to the mean where, for example, damage is associated
with high concentrations rather than with the average. It can also apply if the impact of the pollutant is
associated with values higher than the 95-percentile limit, so long as the shape of the statistical distribution can be
expected to be fairly stable and if action to reduce the 95-percentile will also reduce the number of peak
events. Similarly the use of a standard like the annual 95-percentile implies knowledge or an assumption that the
duration of individual events of high concentration is not important so long as the total exceedence is less than
5 %, though the standard can apply where the distribution of the duration of events is expected to remain
fairly stable.

Where none of this applies other types of standard can be used, though these will imply that compliance may
need to be assessed by monitoring that is nearly continuous, and not by a small set of samples.

6.3 Confidence of failure

In taking decisions, the response could vary from “report the failure” to “take legal action” to “spend a lot of
money” to “rectify the problem”. The consequences of being wrong vary, and, in principle, each type of
decision requires its own degree of confidence, i.e. its own accepted risk of being wrong. The more important
the decision, the less the decision-taker should allow the play of errors in sampling and measurement to
lead to a wrong decision.

The confidence of failure is a single statistic that replaces the need to compute different confidence limits for
each type of decision. It varies on a scale from 0 % to 100 % (see Table 2).



Table 2 — Example of confidence of failure in mean standards

Mean: 101
Standard deviation: 82
Number of samples: 8

Confidence of failure (%):
For a mean standard of 180: 1
For a mean standard of 120: 27
For a mean standard of 30: 98

In Table 2, the 98 % confidence of failure states that there is a risk of only 2 % that the site under test met the
mean standard of 30 but it appeared to fail because the action of chance produced a set of bad samples. In
this case it is appropriate to take any action where it is acceptable to live with a risk of up to 2 % that such
action is unnecessary.

This looks at the risk that failure is wrongly reported³. The parallel exercise, though less common, is to look at
the risk that success is claimed wrongly⁴. In Table 2, there is 1 % confidence that a standard of 180 is failed.
In other words there is 99 % confidence that the standard was met.
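Confidence of failure for a mean standard can be computed from the same t-distribution assumption used for the sketch in 6.1 (an assumption about the method, not a prescription). The following illustrative Python reproduces the figures in Table 2 closely:

    from scipy import stats

    def confidence_of_failure(mean, sd, n, standard):
        # Confidence (%) that the true mean is worse than the standard.
        se = sd / n ** 0.5
        return 100 * stats.t.sf((standard - mean) / se, n - 1)

    for standard in (180, 120, 30):           # data of Table 2
        print(standard, round(confidence_of_failure(101, 82, 8, standard)))
    # prints roughly 1, 27 and 98, as in Table 2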

6.4 Methods for percentile standards

One way of estimating percentiles from the results of sampling is to use an assumption about the statistical
distribution from which the samples were taken, e.g. whether it is log-normal or normal. Such methods are called
parametric methods, as distinct from non-parametric methods (which generally make no assumption about
the distribution).

Parametric methods depend on the fact, for example, that the 95-percentile for a normal distribution is 1.64
standard deviations above the mean. The example in Table 3 gives an estimate of the 95-percentile of 250 for
a mean and standard deviation of 101 and 82 respectively. In this a log-normal distribution is assumed and
the mean and standard deviation are converted to the log domain using the Method of Moments⁵.

The 95 % confidence limits are 160 and 760⁶.
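The Method of Moments calculation can be sketched as follows (illustrative Python; the confidence limits quoted above come from the Shifted T-Distribution described in Annex A and are not reproduced here):

    import math
    from scipy import stats

    def lognormal_percentile(mean, sd, pct=0.95):
        # Method of Moments: convert the mean and standard deviation to
        # log-domain parameters without taking logs of the data, then
        # read off the required percentile of the fitted log-normal.
        sigma2 = math.log(1 + (sd / mean) ** 2)   # variance of log(data)
        mu = math.log(mean) - sigma2 / 2          # mean of log(data)
        z = stats.norm.ppf(pct)                   # 1.64 for the 95-percentile
        return math.exp(mu + z * math.sqrt(sigma2))

    print(round(lognormal_percentile(101, 82)))   # ~253; Table 3 quotes 250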

NOTE It is not the intention in this example to advocate the assumption of a log-normal distribution in circumstances
where it is not appropriate, or the use of the particular methods of calculating percentiles and confidence limits that were
used for this example. But it is useful to do these sorts of calculations to indicate the scale of error, if only as a preliminary.
What is being advocated is that it is folly to act as if the true 95-percentile in the above example were 250 and to act in
ignorance that the range on this was something like 160 to 760, or worse.

It may be important in the context of the decisions made as a result of these data to use different statistical techniques. It
may also be that the data are unrepresentative in time or space, or that there were mistakes in the mechanics of taking and
handling the samples. The data may be affected by limitations in analytical technique and expressed as “less than” some
detection limit. There may be underlying trends. Many of these factors will mean that the true error is bigger than
suggested by the range from 160 to 760.

3 Statisticians call this a Type I Error.

4 Statisticians call this a Type II Error.

5 This provides equations that convert the mean and standard deviation into estimates of the mean and standard deviation
for the logarithms of the data without the need to take logarithms of the sample results themselves (see Annex A).

6 They are calculated in this particular case using the properties of the Shifted T-Distribution (see Annex A).



Table 3 — Example of confidence of failure in percentile standards

Mean: 101
Standard deviation: 82
Number of samples: 8
Estimate of 95-percentile: 250
Optimistic confidence limit: 160
Pessimistic confidence limit: 760

Confidence of failure (%):
For a standard of 800: 4
For a standard of 300: 40
For a standard of 150: 96

In Table 3, the confidence of failure of 4 % states that there is 96 % confidence that the standard was met.
There is a risk of only 4 % that the site truly met the 95-percentile standard of 800 but appeared to fail
because the set of samples contained, through chance, an unexpected number of high values.

The assumption of log-normality is worth making if data can be assumed to be roughly compatible with the
log-normal distribution. The assumption injects extra information into the calculation and so can boost the
precision of estimates of percentiles, when compared with non-parametric methods.

NOTE A parametric method may be risky where there is no evidence that the data follow a parametric distribution.

6.5 Non-parametric methods

There are instances where parametric methods cause difficulties. It has been noted in Clause 6.4 that it might
sometimes be wrong to assume, for example, a log-normal distribution.

Non-parametric methods for the estimation of percentiles are based on ranking the sample results from
smallest to largest. An estimate of the 95-percentile is given as the value that is approximately 95 % of the
way along this ranked list, interpolating where this point falls between a pair of samples.

Since assumption (or information) is excluded, the estimates from non-parametric methods tend to be less
precise than those from parametric methods.

NOTE Estimates from non-parametric methods might be less risky than a false assumption of log-normality.

The parametric methods may be unhelpful in applications that involve legal actions. This may happen if
there is a need to avoid assumptions that might be contested, e.g. whether the sample results follow a log-
normal distribution (or any other parametric distribution).

When assessing compliance with percentile standards, an alternative and simpler way of using a non-
parametric approach is to count the number of failed samples, i.e. the number of sample results whose
concentration exceeds the concentration in the percentile standard. The proportion of failed samples is an
estimate of the time spent in excess of the threshold. If more than 5 % of samples exceed the limit in a 95-
percentile standard, it is tempting to say that the standard was not met. However, this is a face-value
assessment of the percentage of time spent outside the limit, vulnerable to sampling error.

This method of using data, counting failed samples, means that some of the information in the samples is not
used, and this can sometimes be significant. For example, there is no difference between a sample that only
just exceeded a limit, and one that exceeded it grossly. Both are merely failed samples under this method.



Similarly, a sample that nearly failed the limit is equivalent to one with a concentration of zero. Both are just
compliant samples.

For 26 samples with one failed sample, the percentage of failed samples is 1/26 × 100 or 3,8 %. This is an
estimate of the true failure rate, i.e. the true time spent in failure. The value of 3,8 is less than 5 %. This states,
at face value, that a 95-percentile standard has not been failed (whereas a 99-percentile standard would have
been).

Table 4 shows that for 26 samples and one failed sample, the 95 % confidence limits about the value of 3,8 %
are 0,20 and 17⁷. The values of 0,2 and 17 define the pair of 95 % confidence limits on the estimate of the
time spent in failure (the percentage of failed samples). There is 90 % confidence that the true failure rate is in
this range.

Table 4 — Non-parametric method for percentiles

Number of samples   Number of failed samples   Percentage of failed samples (%)   True failure rate (%, 90 % confidence interval)
4                   1                          25.0                               1.27–75.1
12                  1                          8.33                               0.43–33.9
26                  1                          3.85                               0.20–17.0
52                  1                          1.92                               0.099–8.8
150                 1                          0.67                               0.034–3.1

The third row of Table 4 shows that there is a risk of 5 % that a result as bad or worse than one failed sample
in a set of 26 could have been produced from a site whose true failure rate was as low as 0,20 %. Similarly,
there is a risk of only 5 % that a result as good or better than one failed sample in a set of 26 could have been
produced from a site whose true failure rate was as bad as 17 %.

As before, sampling is used to estimate a summary statistic that in this case refers to the time spent in failure.
Again the use of sampling introduces sampling error.

Suppose there were 12 samples and one of the samples exceeded the limit. This would represent 8,33 % of failed
samples. This is the face-value estimate of the time spent in failure. Table 4 gives the corresponding 95 %
confidence limits as 0,43 % and 33,9 %.

Suppose the standard was a 95-percentile. As before, it is necessary to compare not only the face-value
estimate of 8,33 with the allowance of 5 %, but to do the same with the optimistic confidence limit. If this
exceeds 5 % there is at least 95 % confidence that the site has truly failed the 95-percentile standard. In this
case, the optimistic confidence limit is only 0,43 % and the failure is not significant at 95 % confidence.

The above deals with the assessment of failure. To be sure of a pass it should be a requirement that the
pessimistic confidence limit is less than 5 %.

Any other position, where the figure of 5 % lies between the two confidence limits, is unresolved at 95 %
confidence.
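The confidence limits used in this clause follow from the binomial distribution (see the footnote and Annex B). One common construction that reproduces Table 4 is the Clopper-Pearson interval, sketched below in illustrative Python (an assumption about the method, not a prescription):

    from scipy import stats

    def failure_rate_limits(n, failed, limit_confidence=0.95):
        # One-sided Clopper-Pearson limits (%) for the true failure rate.
        lo = 0.0 if failed == 0 else \
            stats.beta.ppf(1 - limit_confidence, failed, n - failed + 1)
        hi = stats.beta.ppf(limit_confidence, failed + 1, n - failed)
        return 100 * lo, 100 * hi

    print(failure_rate_limits(26, 1))   # ~(0.20, 17.0), as in Table 4
    print(failure_rate_limits(12, 1))   # ~(0.43, 33.9)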

Table 5 parallels the contents of Table 4 but gives the confidence of failure of a 95-percentile limit for
particular outcomes from sampling.

7 In this particular case this is estimated from the properties of the binomial distribution (see Annex B).



The second row of Table 5 shows the case for a set of 12 samples in which 1 sample exceeded the 95-
percentile limit. The Table shows that there is 54 % confidence that this outcome, 12 samples and 1 failed
sample, indicates that the 95-percentile limit has been failed, i.e. that the concentration in the 95-percentile limit
has been exceeded for more than 5 % of the time. In this case it is appropriate to take decisions as a
consequence of this set of samples, so long as it is acceptable that the risk is 46 % that the action is
unnecessary. This might rule out expensive or irreversible decisions.

Table 5 — Confidence of failure of a 95-percentile limit

Number of samples   Number of failed samples   Confidence of failure (%)
4                   1                          81
12                  1                          54
26                  1                          26
52                  1                          7
150                 1                          0.05
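One way to reproduce Table 5 (again an assumption consistent with the binomial footnote) is to take the confidence of failure as the binomial probability of observing fewer failed samples than were actually seen, if the true exceedence rate were exactly the 5 % allowed:

    from scipy import stats

    def confidence_of_failure_percentile(n, failed, allowed_rate=0.05):
        # Confidence (%) that the true exceedence rate is worse than the
        # 5 % allowed by a 95-percentile standard, given the observed count.
        return 100 * stats.binom.cdf(failed - 1, n, allowed_rate)

    for n in (4, 12, 26, 52, 150):
        print(n, confidence_of_failure_percentile(n, 1))
    # roughly 81, 54, 26, 7 and 0.05, matching Table 5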

6.6 Look-up tables

The preceding clause has discussed the use of the optimistic confidence limit or confidence of failure to
determine which sets of samples show failure that is statistically significant. This process can be simplified by
allowing more failed samples than, for example, the 5 % suggested by a 95-percentile standard.

In this, the permitted number of failed samples is increased to a point where there is at least 95 % confidence
that the site has failed for the percentage of time allowed by the percentile standard, i.e. 5 % for the 95-
percentile. This gives at most a risk of one-in-20 that a site is wrongly declared as a failure. Table 6 contains
figures for the 95-percentile standard and 95 % confidence of failure.

Table 6 — Look-up table for the 95-percentile

Number of samples   Minimum number of failed samples for at least 95 % confidence of failure
4–7                 2
8–16                3
17–28               4
29–40               5
41–53               6
54–67               7

A similar table forms part of the permits sanctioning discharges from sewage treatment plants under the EU
Directive concerning urban wastewater treatment (91/271/EEC). It defines “failure” as a state where there is 95 %
confidence that the 95-percentile standard was failed.



Look-up tables can be set up for any combination of percentile and required confidence. Table 7a gives figures for
the 99,5-percentile and 95 % confidence of failure and Table 7b gives a version for the 95-percentile and 99,5 %
confidence of failure⁸.

Table 7a — Look-up table for a 99,5-percentile standard

Number of samples   Minimum number of failed samples for at least 95 % confidence of failure
1–10                1
11–71               2

Table 7b — Look-up table for a 95-percentile standard

Number of samples   Minimum number of failed samples for at least 99,5 % confidence of failure
3–7                 3
8–14                4
15–23               5
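Tables 6, 7a and 7b can be generated from the same binomial relationship. The illustrative Python below finds the smallest number of failed samples for which the confidence of failure reaches the required level:

    from scipy import stats

    def min_failed_samples(n, allowed_rate=0.05, confidence=0.95):
        # Smallest number of failed samples in n for which there is at
        # least the required confidence that the standard has been failed.
        failed = 1
        while stats.binom.cdf(failed - 1, n, allowed_rate) < confidence:
            failed += 1
        return failed

    for n in (4, 8, 17, 29, 41, 54):     # first sample count of each row
        print(n, min_failed_samples(n))  # reproduces Table 6: 2, 3, 4, 5, 6, 7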

Just as extra failures are allowed in order to give proof of failure, so fewer failures than the 5 % associated
with the 95-percentile standard is a condition of demonstrating proof of success. In the design of rules for
awarding prizes for proven compliance, a different look-up table would be needed. This table would be
designed to control the risk that a site that has truly failed is wrongly reported as compliant because of
sampling error.

There is a limitation in using this type of look-up table to show high confidence of compliance with standards
such as the 95-percentile. This is because at low rates of sampling, reaching the required level of confidence
can appear to require fewer than zero failed samples. Confirming at least 95 % confidence that a limit is met
for 95 % of the time cannot be done with fewer than 57 samples.

NOTE The use of fewer than 57 samples may require the use of parametric methods.

7 Water quality limit values

7.1 General

Water quality can be assessed by the use of limit values⁹. The results from samples are compared with limit
values in order to assess compliance. This clause looks at the extra difficulties that can be caused by the way
limit values are defined.

7.2 Ideal limit values

8 The calculations in this Clause are based on the properties of the binomial distribution (see Annex B).

9 Often called water quality standards, or environmental quality standards, or set as conditions in the permits for
discharges.



Difficulties can be avoided if limit values are defined or treated as ideal limit values, i.e. values that address
five criteria, the first three of which are listed below:

 a limit, e.g. a concentration of 10 µg/l;

 a summary statistic, e.g. how often the limit may be exceeded, e.g. 1 % of the time¹⁰;

 the period of time over which this statistic applies, e.g. a calendar year.

These three criteria are key and set the limit value. A fourth is relevant when deciding the action to improve
water quality. When the action is finished, the question arises: what residual risk of failure is acceptable in the
long run, for example, as a consequence of rare patterns in the weather? The fourth point covers this as follows:

 the definition of the design risk, i.e. the proportion of time periods for which failure to meet the criterion
(enshrined in the above three bullet points) is accepted (e.g. one in 20 calendar years).

In other words, using the numbers introduced so far, it is acceptable that an annual estimate of the 99-
percentile exceeds 10 µg/l in 1 year in 20.

A fifth criterion may be added that deals with the actual assessment of compliance, i.e. from the samples taken in
a particular calendar year:

 the statistical confidence with which non-compliance is to be demonstrated before failure is reported.

There is a trade-off between the fourth and fifth points. The fourth point, perhaps not as important as the first
three, is best regarded as the outcome given even continuous error-free monitoring. It relates to the
acceptability of the physical consequences of truly failing the standard. The fifth deals with compliance. Failure
might be defined as the case where the monitoring or sampling shows at least 95 % confidence that the failure
is true and not attributable to the play of chance in sampling.

In other words, again using the numbers introduced so far, it is acceptable that an annual estimate of the
upper 95 % confidence limit exceeds 10 µg/l in 1 year in 20.

In an ideal limit all five items are defined. If any were undefined, it would take an arbitrary, unknown
value that could vary from decision to decision as the limit was used. It is useless to know that the limit is 10
µg/l, whilst allowing any or several of the other four items to vary in an unknown manner for each decision.

Examples of ideal limits include the following.

 Over a long period of time, e.g. 20 years, the 95-percentile value of the concentration should be less than
200 in 19 summers out of 20 and failure will be declared when monitoring shows non-compliance with
95 % confidence.

 Over a long period of time, e.g. 20 years, the mean value of the concentration should be less than 0,6 in
five calendar years out of 10 and failure will be declared when monitoring shows non-compliance with
95 % confidence.

7.3 Absolute limits

The purpose of a limit is compromised if it is defined in a way that ignores the fact that compliance will be
checked by sampling. One type of limit that runs this risk is the maximum value (or absolute limit). This type of
limit is popular because it is easy to understand and use, especially in legal actions.

These benefits should be set against the errors that arise when maxima are assessed against data collected
by sampling. These errors can lead to faulty assessments of performance and so to wrong decisions, e.g. on
legal action and investment to improve quality that does not, in reality, require improvement, or failure to invest
where improvement is truly necessary.

10 This is the annual 99-percentile. Standards might also be expressed as other percentiles and averages for a particular
period of time, e.g. a month.



In particular, great care should be taken over absolute limits and over the amount of data used to assess
compliance. This is because a relatively small number of samples are taken from a relatively wide range of
time and material. This leads to a strong risk that there will be high concentrations (and exceedences) outside
the instances captured in the samples. The fact that few failures are seen may encourage regulators to
position limits at values that seldom elicit failure under, say, monthly sampling.

The problem is that:

 increasing sampling will lead to more failed samples; and

 a report that the standard has been failed is almost guaranteed under continuous monitoring or very
frequent sampling.

To illustrate, consider a site that exceeds a standard for 1 % of the time. Such a site will always be reported as a
failure if assessed using continuous error-free monitoring. Table 8 shows that under sampling this failure can
escape detection, with a probability of detection that depends strongly on the number of samples¹¹.

Table 8 — Effect of sampling rate on reported compliance

Number of samples   Probability of reporting failure (%)
4                   4
12                  11
52                  39

When we use sampling, the impression of failure depends on the number of samples. The illusion of improved
performance can be manufactured by taking fewer samples. In the meantime, the “true quality” may have
deteriorated.
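The figures in Table 8 follow from the formula in the footnote; a minimal sketch in Python:

    def probability_of_reporting_failure(n, true_exceedence=0.01):
        # Chance (%) that at least one sample fails, for a site that truly
        # exceeds the limit for 1 % of the time.
        return 100 * (1 - (1 - true_exceedence) ** n)

    for n in (4, 12, 52):
        print(n, round(probability_of_reporting_failure(n)))
    # ~4, 11 and 41: close to the 4, 11 and 39 quoted in Table 8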

As discussed below, absolute limits monitored solely by sampling are not true absolute limits at all. This is
because of the mathematical implication that failure is permitted at times when samples are not taken, i.e.
failure is tolerated for a proportion of the time. Such absolute limits are, in truth, percentiles.

When seeking a solution to the problems caused by an absolute limit, we should:

 translate it into a percentile in the permits and regulations; or,

 treat it as a percentile when assessing compliance.

As an ideal limit, the absolute limit has the required clarity for the first item, which might be a value, e.g. a
concentration of 10 µg/l. However, for the second item, the summary statistic is ambiguous. The absolute limit
requires compliance by 100 % of samples in a year.

This has two meanings. The first meaning is that the limit is a 100-percentile, a value that should be met for 100 %
of a year. It has been discussed above that this is illogical if sampling alone is used to assess compliance. The
use of only twelve samples leaves a lot of time where failure could have occurred and might not have been
noticed.

11 These calculations are based on the probability of no failures in a set of N samples. The probability of reporting failure
is 1 minus p to the power N, where p is the probability of a compliant sample. Thus for 12 samples in Table 8 it is
1 − 0.99¹².



The second possible meaning relies on the fact that an absolute limit coupled with a sampling rate actually
defines candidate pairings of a percentile and a level of proof for declaring failure. For example, consider 12
samples, and a rule where none of these is permitted to exceed the limit. This outcome is exactly the same as a
limit set, for example, as a 95-percentile concentration that requires that sampling demonstrate 50 % confidence
of failure before a site is declared to have failed. The same rule, 12 samples and no failures, is also equivalent to
a 99,5-percentile concentration with a 95 % level of proof¹². Any number of other pairings of percentile and level
of proof is possible.
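These equivalences can be checked with the binomial formula already used: for a rule that permits no failed samples, the level of proof implied by the first failed sample is the probability that all n samples comply when the site is exactly on the percentile. A minimal sketch:

    def level_of_proof_zero_failures(n, exceedence):
        # Confidence (%) of failure triggered by one failed sample, when an
        # absolute limit checked by n samples is read as a percentile that
        # allows the given exceedence rate.
        return 100 * (1 - exceedence) ** n

    print(level_of_proof_zero_failures(12, 0.05))    # ~54: 95-percentile, ~50 % proof
    print(level_of_proof_zero_failures(12, 0.005))   # ~94: 99.5-percentile, ~95 % proof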

This second option, treating the absolute limits as, for example, 99,5-percentiles, is attractive in cases where
limits have been made so strict that occasional failed samples are likely, but where the occasional failure is of
low concern. This option controls the problem, illustrated in Table 9a, that the percentile (and the severity of
the limit or standard) changes with the sampling rate. It also avoids the untidy and uncomfortable alternative of
inventing rules by which operators and regulators discount certain failed samples.

Table 9a — Effect of sampling rate on the severity of an absolute limit

Number of samples    Equivalent percentile    Confidence of failure (%)
 4                   75-percentile            50
12                   95-percentile            50
52                   98-percentile            50

In Table 9a an absolute limit of, say, 10 µg/l is equivalent to a 75-percentile if assessed from 4 samples, but to a
98-percentile if checked against 52 samples. This change of percentile with sampling rate is typically equivalent
to a move from 10 to 30 µg/l in the applied limit. This can be an arbitrary and unfair change in
severity.

Similarly, for a fixed percentile, increasing the sampling rate increases the severity of the standard: a report of
failure is produced on progressively weaker proof (see Table 9b).

Table 9b — Effect of sampling rate on the severity of an absolute limit

Number of samples    Percentile       Confidence of failure (%)
 4                   95-percentile    81
12                   95-percentile    54
52                   95-percentile     7
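
One plausible reading of Table 9b is that the quoted confidence of failure is the binomial probability that a site sitting exactly on the stated percentile would have produced no exceedences at all, so that the first failed sample triggers the report. A minimal sketch under that assumption (Python):

# Confidence of failure when a single sample exceeds an absolute limit,
# read as the probability that a site exactly meeting the 95-percentile
# would have shown no exceedences in n samples.
p_compliant = 0.95

for n in (4, 12, 52):
    confidence_of_failure = p_compliant ** n
    print(f"{n:2d} samples: {100 * confidence_of_failure:3.0f} %")

# Prints 81, 54 and 7 - the values in Table 9b.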

These problems are controlled if the limit is set, as for an ideal limit, to some particular combination of percentile
and level of proof, e.g. the 99,5-percentile with a level of proof of 95 %.

However, it may be the case that the limit must never be exceeded because exceedence is known to cause
immediate damage. In this case, it may be best to move the limit to a lower concentration, e.g. 4 µg/l as a
99-percentile instead of 10 µg/l as a “100-percentile”, perhaps retaining the original 10 µg/l as an additional
control. A failure of the 99-percentile of 4 µg/l might then be taken as indicating an unacceptable risk of
reaching an actual value of 10 µg/l.

If the absolute limit is to be used on its own, there is an implication that continuous, accurate monitoring is
required, and that there are real-time controls that can prevent exceedence.

12 These conclusions follow from the properties of the binomial distribution. One failed sample in a set of 12 gives a
probability of greater than 95 % that the standard was failed for at least 0,5 % of the time covered by the samples.



Also, an absolute standard may have been set on the understanding that more information than sampling will
be used to judge performance. Such extra information may include the fact that high concentrations cannot
occur for the type of process being monitored, or other records kept by plant operators that demonstrate the
treatment plants are working well. There is a risk, however, that the sample results are used outside these
contexts.

In adopting, e.g. the 99,5-percentile instead of the “100-percentile”, the regulator acknowledges that it is
impossible to demonstrate from sampling alone that a limit was met for 100 % of the time. This method, declaring
the percentile point and level of proof, allows the taking of more or fewer samples, whilst retaining the same
severity of standard and level of proof.

The absolute limit has the advantage that lawyers and the public easily understand it. An absolute limit might be
retained in law, but used with a declared policy that the basis for taking a decision, i.e. to prosecute for non-
compliance, treats the absolute limit as a 99,5-percentile.

7.4 Percentage of failed samples

If the concentration in the sample exceeds a certain limit, then the sample can be said to fail. Some water quality
standards are expressed as the maximum percentage of failed samples in a set of samples taken over a period of
time.

This betrays a lack of appreciation of the difference between a statistical population and a set of samples. A
statistical population is the distribution of all the values that actually occur over a period of time, e.g. in one year.
For one year it can be thought of as approximating to the set of error-free results of chemical analysis taken from
totally representative samples taken for each of the 31 536 000 seconds in the year. In contrast, there might be
only 12 actual samples, whose results are used to estimate properties of the population.

Limit values should be expressed as a function of the population, e.g. that the concentration should be below the
limit for at least 95 % of the time. They should not be expressed as limits to be met by, e.g., 95 % of samples.
The percentage of failed samples is only an estimate of the proportion of time spent in failure.

If a limit value is defined as one to be met by at least 95 % of samples, and 8 % of samples fail, the site is
declared to have failed because 8 % is greater than 5 %, the allowed percentage of failed
samples.

Although the 8 % of failed samples exceeds 5 %, the conclusion that the time spent in failure exceeds 5 %, is
certain only if sampling is continuous, representative and accurate. There may be only 12 samples in a year, and
purely by chance, the true failure rate might have been less than 5 %, but a set of samples with high
concentrations might have been collected. In this case the site under test would be wrongly condemned because
of sampling error (or the decision to take only 12 samples).

Limits defined as having to be met by a percentage of samples should be redefined and treated as the
corresponding percentiles. Confidence limits, or the confidence of failure, should then be calculated and action
taken according to the accepted risk of acting unnecessarily. This was discussed for Tables 1, 2 and 3.
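
To quantify the example above, the sketch below (Python; the choice of 1 failed sample out of 12, about 8 %, is an illustrative assumption) computes the confidence that the true time in failure exceeds the allowed 5 %.

from math import comb

def confidence_of_failure(n, failed, allowed=0.05):
    # Confidence that the true proportion of time in failure exceeds
    # `allowed`, given `failed` exceedences in `n` samples: one minus
    # the probability of at least this many failures when the water
    # only just meets the standard (binomial model, as in Annex B).
    p_value = sum(comb(n, k) * allowed**k * (1 - allowed)**(n - k)
                  for k in range(failed, n + 1))
    return 1 - p_value

# 1 failed sample out of 12 (about 8 % of samples failed):
print(f"{100 * confidence_of_failure(12, 1):.0f} %")   # about 54 %

Despite 8 % of samples failing, the data give only about 54 % confidence that the true time in failure exceeded 5 %, far short of, say, a 95 % level of proof.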

7.5 Calculating limits in effluent discharges

Data collected by sampling is also used to set the limits needed in permits in order to meet environmental
standards. Table 10 illustrates the calculation (by Monte-Carlo Simulation) of the mean and 95-percentile of
discharge quality in order to meet a 90-percentile standard of 1,3 in a river.

These calculations ensure that the permit conditions are justified as necessary to meet the environmental
requirement, and that they go no further than this. The need to perform them reinforces the requirement for
the “ideal limits” discussed in Clause 7.2 and the need to define limits as means or percentiles. The
calculations themselves are, however, outside the scope of this guidance.

Environmental problems may be due to short-term events. For example, a river may be fully saturated with
oxygen for more than 95 % of the time, but a few minutes of zero oxygen will kill the fish. A 5-percentile
standard in this case works only to the extent that the extreme events are correlated with the 5-percentile
(Clause 6.2), in the context of the safety factors built into the 5-percentile standard. As discussed in
Clauses 6.2 and 7.2, there may be cases where this does not apply; if this risk is to be lived with and
managed by water quality standards, there is an implication that compliance cannot be assessed by sampling,
but requires some form of continuous assessment.

Table 10 — Calculating discharge standards for a river with a 90-percentile standard of 1,3

Input data
  Mean river flow upstream of discharge         325,00
  5-percentile of river flow                     40,00
  Mean upstream quality                           0,08
  Standard deviation                              0,07
  Mean flow of discharge                         59,00
  Standard deviation                             19,00
  Present mean quality of discharge               7,20
  Standard deviation                              4,10

Results
  Mean river quality downstream of discharge      0,63
  90-percentile river quality                     1,30
  River quality standard (90-percentile)          1,30
  Required mean quality in discharge              2,51
  Required 95-percentile discharge quality        5,22
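
A minimal Monte-Carlo sketch of the kind of mass-balance calculation behind Table 10 is given below (Python). The distributional choices are assumptions for illustration: qualities and river flow are taken as log-normal (the river-flow standard deviation, not given in Table 10, is assumed to be 400, which gives a 5-percentile near the quoted 40), discharge flow as normal, all inputs independent, and the discharge is assumed to keep its present coefficient of variation. Because Table 10 does not state its distributions or correlations, this sketch will not reproduce its results exactly.

import math
import random

random.seed(1)

def lognormal(mean, sd):
    # Convert an arithmetic mean and standard deviation to log-space
    # parameters (method of moments, as in Annex A) and draw one value.
    s2 = math.log(1.0 + (sd / mean) ** 2)
    return random.lognormvariate(math.log(mean) - s2 / 2.0, math.sqrt(s2))

def downstream_90pct(discharge_mean, n=10000):
    # Mass balance of river and discharge, returning the simulated
    # 90-percentile of the downstream concentration.
    mixed = []
    for _ in range(n):
        river_flow = lognormal(325.0, 400.0)   # sd assumed, see lead-in
        river_qual = lognormal(0.08, 0.07)
        dis_flow = max(random.normalvariate(59.0, 19.0), 1.0)
        dis_qual = lognormal(discharge_mean, discharge_mean * 4.1 / 7.2)
        mixed.append((river_flow * river_qual + dis_flow * dis_qual)
                     / (river_flow + dis_flow))
    mixed.sort()
    return mixed[int(0.9 * n)]

# Bisect for the discharge mean quality that just meets the 90-percentile
# river standard of 1.3 (Monte-Carlo noise makes the search approximate).
lo, hi = 0.01, 7.2
for _ in range(25):
    mid = 0.5 * (lo + hi)
    if downstream_90pct(mid) < 1.3:
        lo = mid
    else:
        hi = mid
print(f"Required mean discharge quality: about {0.5 * (lo + hi):.2f}")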

8 Declaring that a substance has been detected

A variation on the absolute limit lies in answering questions such as: “was the substance detected by chemical
analysis?” This should be answered by calculating, for example, whether there is at least 95 % confidence
that the substance is present, above an agreed limit of detection, for at least 10 % of the time. This can be
done by using Table 11 (see footnote 13).

Table 11 — Defining “detected”

Number of samples    Minimum number of samples above the limit of detection
 2–3                 2
 4–8                 3
 9–13                4
14–18                5

13 These tables use the same methods as before (see Annex B).



A parallel process might be to ask how many blank samples are needed before it can be concluded that a
substance is not present. Table 12 gives rules for demonstrating that a substance is not present for more than a
specified percentage of the time.

Table 12 — Confirming “absence”

To demonstrate absence for the          Minimum number of samples
following percentage of time (%)        below the limit of detection
50                                        5
20                                       14
10                                       29
 5                                       59
 1                                      298
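
Both tables follow from the binomial model described in Annex B. The sketch below (Python) recomputes them under the decision rules stated above (95 % confidence; presence for at least 10 % of the time in Table 11); values may differ from the printed tables by one sample at the boundaries.

from math import comb

def binom_tail(n, k, p):
    # P(at least k detects in n samples) when the substance is present
    # (and detectable) for a fraction p of the time.
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Table 11: smallest number of detects giving at least 95 % confidence
# that the substance is present for at least 10 % of the time.
for n in range(2, 19):
    k = next(k for k in range(1, n + 1) if binom_tail(n, k, 0.10) <= 0.05)
    print(f"{n:2d} samples -> at least {k} above the limit of detection")

# Table 12: smallest number of blank samples giving 95 % confidence
# that the substance is present for less than a fraction q of the time.
for q in (0.50, 0.20, 0.10, 0.05, 0.01):
    n = 1
    while (1 - q) ** n > 0.05:
        n += 1
    print(f"absence for {100 * q:.0f} % of the time -> {n} blank samples")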

9 Detecting change

Sometimes sampling has to be used to demonstrate that water quality has improved or not deteriorated. If the
estimate of the mean was 35 in 2003 and 41 in 2004, this looks like an increase of 17 %.

As before, this conclusion is affected by sampling error. There is a need to calculate the statistical confidence
that the recorded difference is real. A test (see footnote 14) can be used to calculate the significance of an
apparent difference in the mean. Table 13 gives an example.

In Table 13, an apparent difference in the mean of 25 % is confirmed as significant at a level of confidence of
97 %. This indicates that there is a chance of only 3 % that a difference as large as this could arise by chance.

Table 13 — Assessment of change in mean

Year    Mean    Standard deviation    Number of samples    Confidence of change from 2003 to 2004 (%)
2003    20      10                    25                   97
2004    25       9                    33
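
A sketch of such a test from the summary statistics alone (Python with SciPy; treating the quoted confidence as one-sided is an assumption, but it matches the printed 97 %):

from scipy.stats import ttest_ind_from_stats

# Summary statistics from Table 13 (Welch's two-sample t-test).
stat, p_two_sided = ttest_ind_from_stats(
    mean1=20, std1=10, nobs1=25,   # 2003
    mean2=25, std2=9, nobs2=33,    # 2004
    equal_var=False)

confidence = 1 - p_two_sided / 2   # one-sided confidence of an increase
print(f"Confidence of change: {100 * confidence:.0f} %")   # about 97 %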

Similarly, if the estimate of the 95-percentile was 39 in 2003 and 42 in 2004, this looks like an increase of 8 %.
Again this is the face-value conclusion. As in Table 13, it is necessary to calculate the statistical confidence
that the recorded difference is real, as in the following example of the output from a parametric calculation
(see Table 14).

14 There are many tests for detecting change. The example above uses a test suitable for small sets of samples (a
t-test). The aim of the text is to illustrate the need to consider the error, using a technique that produces the sort of
information shown in the tables.



Table 14 — Parametric assessment of change in a percentile

Year    95-percentile    Number of samples    Confidence of change from 2003 to 2004 (%)
2003    39               25                   72
2004    42               33

Table 14 states that there is a risk of 28 % that the increase from 39 to 42 is due to chance (and a 72 %
probability that the change is a real one).

Non-parametric methods can be used to tackle the same issue. The first step is to look at the uncertainties in
the estimate of a limit, L. L might be an estimate of the 95-percentile for 2001, i.e. the concentration exceeded
for 5 % of the time. This estimate might be based on N, the number of samples used to estimate L, and E, the
number of exceedences of L in 2001.

Suppose that in 2002 M samples were taken and F exceedences of L were observed. A test (see footnote 15)
can be carried out to compare the proportion of exceedences in 2001 (E/N) with the proportion in 2002 (F/M).
Table 15 illustrates the outcome.

Table 15 — Non-parametric assessment of deterioration

Year    Number of samples    Number of exceedences    Percentage of failed samples (%)    Confidence of change from 2001 to 2002 (%)
2001    40                   3                         7,5                                93
2002    12                   3                        25,0
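
A sketch of the comparison named in footnote 15 (Python with SciPy). The exact confidence depends on the variant of test used; this particular version gives a somewhat lower confidence than the 93 % quoted in Table 15.

from scipy.stats import fisher_exact

# Rows are years; columns are exceedences and compliant samples.
table = [[3, 37],   # 2001: 3 exceedences in 40 samples
         [3, 9]]    # 2002: 3 exceedences in 12 samples

# One-sided test that the exceedence rate increased in 2002.
odds_ratio, p_value = fisher_exact(table, alternative="less")
print(f"Confidence of deterioration: {100 * (1 - p_value):.0f} %")   # about 87 %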

In Table 15, the exceedence rate is apparently higher in 2002 (25 %) than in 2001 (7,5 %). At face value this
is a deterioration. It turns out that there is a chance of 7 % that a discrepancy this big could have arisen by
chance.

In summary, to demonstrate change, confidence of change should always be calculated. This distinguishes
changes that can be ascribed to sampling error, from those that really need attention.

10 Classification

10.1 General

Sometimes water quality is described in terms of a classification system. In such a system there may be sets
of water quality standards, for one or more pollutants. Table 16 illustrates a classification system for a single
pollutant. The class limits might be summary statistics of water quality, such as the 95-percentile.

15 This can be done, for example, using “Fisher’s exact test” for 2-by-2 contingency tables.



Table 16 — Example classification system for a single pollutant using 95-percentile class limits

Class    95-percentile class limits
1        <10
2        10–20
3        20–40
4        40–80
5        >80

To assign the class, an estimate is made of the 95-percentile. If this value were 25, it could be
said that the site was in Class 3 because 25 falls within the range from 20 to 40 (see Table 16). Again, this is
a face-value assessment. The true 95-percentile might have differed from 25, and the true class might have
been Class 2 or 4. The risk of a mistake is caused by sampling error.

As in the examples discussed above (for example Tables 2, 3, 4 and 12, 13, 14), the statistical confidence
should be estimated for each class. In this case this means the confidence that the 95-percentile exceeded 40,
the confidence that it was less than 20, and the confidence that it lay between 20 and 40. This might give the
example in Table 17.

Table 17 — Example classification for a single pollutant using confidence of class (%)

Class 1    Class 2    Class 3    Class 4    Class 5
–          5          56         35         4

In Table 17, there is 56 % confidence that Class 3 is the true class, 35 % that the true class is
Class 4, 5 % that it is Class 2, and even 4 % that it is Class 5, i.e. two classes away from the face value.
The possibility of Class 1 has zero confidence.

In practice the classification may be based on several pollutants, and the “worst” result might be used to define
the class. Table 18 illustrates the case of four pollutants (A to D), with the resulting overall assessment shown
as row E. The fourth pollutant sets the face-value class, i.e. Class 4, with 68 % confidence. There is 4 %
confidence of Class 5 as a result of the third pollutant. The residual confidence is 28 % (100 − 68 − 4). In this
case the residual can be assigned to the worst class of better quality than the face-value class; this means it is
assigned to Class 3.



Table 18 — Example classification system based on four pollutants

Pollutant      Confidence of class (%)
               Class 1    Class 2    Class 3    Class 4    Class 5
A              –          94         6          –          –
B              –          23         76         1          –
C              –          5          56         35         4
D              32         –          –          68         –
E (overall)    –          –          28         68         4

The confidence of failure of a target class can be used to rank priorities and to manage the risk of action that
later turns out to have been unnecessary. For a target of Class 1 or 2, Table 18 gives 100 %
confidence of failure. For a target of Class 3, the confidence of failure is 72 % [(68 + 4) %].

10.2 Confidence that class has changed

Suppose that the face-value class changed from Class 4 in 1997 to Class 3 in 2002. At face value this looks
like an improvement (see footnote 16 and Table 19).

Table 19 — Example confidence of class change (%)

Class    1997    2002
1        –       –
2        –       40
3        20      60
4        70      –
5        10      –

The confidence of a change from Class 4 to Class 3 is the product of the confidence of Class 4 in 1997 and
the confidence of Class 3 in 2002, i.e. 0,7 × 0,6, or 42 %.

All the possible combinations are given in Table 20. This shows 42 % confidence of the change from Class 4
in 1997 to Class 3 in 2002. It also shows 8 % confidence of a change from Class 3 in 1997 to Class 2 in 2002,
12 % confidence that the site stayed in Class 3, and 4 % confidence of a change from Class 5 to Class 2.
Finally, there is 6 % confidence in a move from Class 5 to Class 3.

16 This assumes that low-numbered classes denote good water quality.



Table 20 — Confidence of a change in class

                          Class in 2002
                   1      2      3      4      5     Confidence in 1997 (%)
Class     1        —      —      —      —      —      —
in        2        —      —      —      —      —      —
1997      3        —      8      12     —      —     20
          4        —      28     42     —      —     70
          5        —      4      6      —      —     10
Confidence
in 2002 (%)        —      40     60     —      —

The sum of the numbers on the diagonal of Table 20 gives the overall confidence of no change in class. This
is 12 %, i.e. the confidence that the site stayed in Class 3; the diagonal entries are zero for no change from
Class 1, 2, 4 or 5.

The sum of the squares immediately below the diagonal gives the confidence of an upgrade by exactly one
class. There is 50 % confidence of an improvement by one class, made up of a 42 % confidence of a change
from Class 4 to Class 3 and an 8 % confidence of a change from Class 3 to Class 2.

Taking the whole of the lower triangle of Table 20 together, the confidence of an improvement of one class or
more is 88 % (see footnote 17). Following this logic, the situation can be summarised as in Table 21, which
shows 50 % confidence that quality improved by one class and 34 % confidence that the improvement was by
two classes.

Table 21 — Example confidence of change in class (%)

Change Confidence (%)


Down 2 classes —
Down 1 class —
No change in class 12
Up 1 class 50
Up 2 classes 34
Up 3 classes 4
Up 4 classes —

The data can also be presented as a cumulative sum, accumulated from the largest improvement upwards and
stopping at “no change”, to give the numbers in Table 22.

17 Similarly, the sum of the squares above the diagonal gives the confidence of an overall drop in class, which here is zero.



Table 22 — Confidence of a change in class

Change Confidence (%)


No downgrade 100
Up at least one class 88
Up at least 2 classes 38
Up at least 3 classes 4
Up at least 4 classes —
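
The arithmetic behind Tables 20 to 22 is the outer product of the two columns of Table 19, on the stated assumption that the class estimates for the two years are independent. A minimal sketch (Python):

# Confidence of class (as fractions) from Table 19.
p1997 = [0.0, 0.0, 0.2, 0.7, 0.1]   # Classes 1 to 5 in 1997
p2002 = [0.0, 0.4, 0.6, 0.0, 0.0]   # Classes 1 to 5 in 2002

# Table 20: joint confidence of each (1997 class, 2002 class) pairing.
joint = [[a * b for b in p2002] for a in p1997]

# Table 21: confidence of each size of change (positive = improvement).
change = {}
for i in range(5):            # index of class in 1997
    for j in range(5):        # index of class in 2002
        change[i - j] = change.get(i - j, 0.0) + joint[i][j]

for k in range(4, -5, -1):
    if change.get(k):
        label = ("no change" if k == 0 else
                 f"up {k} class(es)" if k > 0 else f"down {-k} class(es)")
        print(f"{label}: {100 * change[k]:.0f} %")

# Table 22: confidence of improvement by at least k classes.
for k in range(1, 5):
    total = sum(v for d, v in change.items() if d >= k)
    print(f"up at least {k}: {100 * total:.0f} %")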

Tables 17 to 22 are real examples from the management of river water quality.

It may be that over a 5-year period different methods and instruments were used. Random errors in
these will come through in the analysis. If the results for one of the years were based on fewer samples, or on
less accurate methods of chemical analysis, this will come through as a wider spread of the confidences of
class and a reduced ability to pick out changes in class between 1997 and 2002. On the other hand, this
approach controls the risk that expensive action to improve water quality, or complacency that all is well, is
caused by errors in monitoring.

A more difficult issue occurs if the methods in 1997 were biased or based on unrepresentative samples. This
undermines the assessment of change, but a lack of knowledge of the bias is no excuse for failing to assess
the impact of sampling error itself.



Bibliography

[1] ISO 5667-1, Water quality — Sampling — Part 1: Guidance on the design of sampling programmes.

[2] Council Directive of 21 May 1991 concerning urban waste water treatment (91/271/EEC), O.J. L 135 of
30.5.1991.

[3] JOHNSON, N.L. and WELCH, B.L. (1939). Applications of the Non-central t-Distribution. Biometrika, 31,
362-389.

[4] PEARSON, E.S. and HARTLEY, H.O. (1972). Biometrika Tables for Statisticians. Volume II. Cambridge
University Press.



Annex A
(informative)

Calculation of confidence limits in clause 6.4

A.1 Clause 6.4 used as an example a standard parametric method (the Method of Moments, described below)
to estimate confidence limits around percentiles. The values of m and s, the sample mean and standard
deviation, are converted to the values for the logarithms of the data using the Method of Moments:

$$M = \ln\!\left(\frac{m}{\sqrt{1 + s^2/m^2}}\right)$$

$$S = \sqrt{\ln\!\left(1 + s^2/m^2\right)}$$
A.2 M and S stand for estimates of the mean and standard deviation of the logarithms of the data. The
characters, ln, denote the natural logarithm. The face-value estimate of the 95-percentile is then:

$$q = \exp(M + 1.6449\,S)$$
A.3 To calculate confidence limits, the factor 1.6449 is replaced by t0, a value which depends on the
sampling rate. The value of t0 is given by:

$$t_0 = \frac{\delta + \lambda\sqrt{1 + \delta^2/2f - \lambda^2/2f}}{\sqrt{n}\,\left(1 - \lambda^2/2f\right)}$$

where f is the number of degrees of freedom, in this case n − 1, where n is the number of samples. Also,
in this equation:

$$\delta = z\sqrt{n}$$

and z is the Standard Normal Deviate: 1.6449 for the 95-percentile.

A.4 The value of λ approximates to the Standard Normal Deviate for the confidence level used to define the
optimistic confidence limit. For 95 % confidence, λ approximates to 1.6449. The true value of λ is calculated
more precisely as a function of f and z.
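
A sketch of the whole of this calculation (Python; it assumes the equations as reconstructed above and uses the approximate value of λ rather than the more precise tabulated value):

import math

def upper_limit_95pct(m, s, n, z=1.6449, lam=1.6449):
    # Upper confidence limit on the 95-percentile of log-normal data,
    # from the sample mean m, standard deviation s and sample size n
    # (Method of Moments plus the t0 approximation of this annex).
    # Needs n >= 3 so that 1 - lam**2 / (2 * f) stays positive.
    M = math.log(m / math.sqrt(1 + (s / m) ** 2))
    S = math.sqrt(math.log(1 + (s / m) ** 2))
    f = n - 1
    delta = z * math.sqrt(n)
    t0 = ((delta + lam * math.sqrt(1 + delta**2 / (2 * f) - lam**2 / (2 * f)))
          / (math.sqrt(n) * (1 - lam**2 / (2 * f))))
    return math.exp(M + t0 * S)

# Example: 12 samples with mean 7.2 and standard deviation 4.1
# (the illustrative discharge quality of Table 10).
print(f"{upper_limit_95pct(7.2, 4.1, 12):.1f}")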



Annex B
(informative)

Calculation of confidence limits in clause 6.5

B.1 If a water achieves the limit for a proportion, p, of the time, the chance that f samples will fail out of a total
of n is R in:

$$R = \frac{n!}{f!\,(n-f)!}\; p^{\,n-f}\,(1-p)^{f}$$

B.2 In the equation the term f! is f factorial: where f is 4, f! is 4 × 3 × 2 × 1. Similarly n! is n factorial, and
so on.

B.3 Usually we know that f samples have failed out of a total of n, and we want to estimate the proportion of
time, p, for which the water achieved the limit, comparing this with, say, the proportion 0.95 (95 %) required by
the standard. (A 95-percentile is a value that is exceeded for 0.05 of the time.)

B.4 A face-value estimate of p is given by the proportion of compliant samples, (n − f)/n.

B.5 A confidence limit on the estimate of p is obtained by summing the f + 1 values of R calculated from the
above equation for numbers of failed samples of 0, 1, 2, …, f, starting from an initial guess of the value of p.
This gives:

[Total R] = R0 + R1 + R2 + … + Rf

where, for example, R3 is the value of R for 3 failed samples in n.

B.6 By iteration, the value of p is found which gives a sum of the values of R, [Total R], equal to 0.95.
This value of p is the 95 % confidence limit.

B.7 To derive a look-up table we set p to 0.95 (for a 95-percentile standard) and choose a value of n, say 20
samples. We then work out [Total R] several times: for zero failed samples (f = 0), one failed sample (f = 1),
two failed samples (f = 2), and so on. We continue until we first obtain a value of [Total R] that exceeds the
required degree of confidence that the 95-percentile has been failed. One less, f − 1, gives the maximum
permitted number of failed samples. The outcome of these calculations for 20 samples is shown in Table B.1:

Table B.1 — Failed samples

Number of failed samples    Total R (%)
0                            0.0
1                           35.85
2                           73.58
3                           92.45
4                           98.41
5                           99.74

The entire process is repeated for other sampling rates to give the full look-up table.
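
A sketch that regenerates Table B.1 and the derived rule (Python; it follows the cumulative reading of [Total R] implied by the table, i.e. the value tabulated against f failed samples is the probability of fewer than f failures when p = 0.95):

from math import comb

def total_R(n, f, p=0.95):
    # Probability of fewer than f failed samples out of n when the
    # water achieves the limit for a proportion p of the time.
    return sum(comb(n, k) * p**(n - k) * (1 - p)**k for k in range(f))

n = 20
for f in range(6):
    print(f"{f} failed samples: Total R = {100 * total_R(n, f):5.2f} %")

# Maximum permitted failed samples for 95 % confidence of failure:
f = next(f for f in range(n + 1) if total_R(n, f) > 0.95)
print(f"maximum permitted failed samples out of {n}: {f - 1}")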



Comments and responses
(Type of comment: GE = general, TE = technical, ED = editorial)

Overall (GE)
Comment: More explanation is required.
Response: I’ve assumed these first two bullets can be addressed by my action on the points made for particular clauses.

Overall (GE)
Comment: A number of examples were felt by several Canadian scientists to be incorrect.
Response: Looking at the details, it seems that the techniques are regarded as too simple, or less than optimal, rather than incorrect. I have tried to amend the text to deal with this.

Overall (GE)
Comment: Some approaches were, shall we say, obscure.
Response: I’m happy to add examples. All the techniques described are used to take many regulatory decisions.

Overall (GE)
Comment: Overall opinion was that assessment standards were a good idea and would be very useful, but a huge amount of work would be necessary to organize the necessary background material.
Response: The purpose of the standard is to point out the need to take account of statistical principles. It uses examples, but these are not meant to be exhaustive or to perform the role of a textbook. Other examples would be welcome.

Overall (TE)
Comment: The use of a control chart approach would improve the clarity and applicability of the standard.
Response: I’m not sure how to do this. Happy to consider a contribution from TE (Canada).

Overall (TE)
Comment: Comparisons of continuous vs. discrete data would also be useful.
Response: I don’t know what is meant here.

Overall (TE)
Comment: There is a need to refer to many supplementary aspects, including type of risk (severe and acute vs. weak and long-term effects) and circumstances/purposes of decision making (long history at a site vs. first examination).
Response: I’ve tried to action this in terms of mentioning the risks. Not sure how to action severe and acute vs. weak and long-term effects.

Terms and Definitions, 1st para. (ED)
Comment: Reminder about fixing XXXX.

5.1 General, 4th para., 1st sentence (ED)
Comment: Suggest changing “day to day” to “hour to hour”, since that ties better to the diurnal cycles discussed in the following sentence. Sometimes there are second-to-second changes, so why not delete that discussion.
Response: I disagree. I’ve tried to clarify the text.

6.1 Estimation of summary statistics, all (TE) (same comments apply to 6.2)
Comment: Numerous over-simplifications.
Response: I’ve tried to point these out.

Comment: Why has analytical and sampling error suddenly become sampling error?
Response: Perhaps the text should be clearer here. I’ve tried to improve it.

Comment: No such thing as “error-free” monitoring – possibly “mistake-free”?
Response: I was trying to make the point that even if sampling and analysis were perfect there would still be sampling error. Many decision makers act as if ignorant of this: they appreciate analytical error but ignore sampling error.

Comment: Need to emphasize that the argument is theoretical and for a stationary random process.
Response: Many data sets are sufficiently stationary and random in the context of making regulatory decisions. Text changed.

Comment: Please avoid the use of loaded terms such as “optimistic” and “pessimistic” – use upper and lower.
Response: I find upper and lower get muddled for water quality standards that must be exceeded. Not yet actioned.

Comment: No mention of common problems of left-censoring, quantitative versus semi-quantitative data, pseudoreplication, lack of homogeneity, non-normal population distributions, data gaps, mixed analytical and sampling methods, lack of independence, lack of representativeness, outlying values.
Response: I have tried to do this within a theme that errors are bad enough anyway. If these complications apply as well, the errors will be that much larger. If we need better estimates of the error we shall need other techniques.

Comment: Also, equations or algorithms should be provided for hypothesis testing.
Response: This plus the other comments would lead to a textbook. Would this not encourage users to use the techniques? Have added two Appendices.

6.3 Confidence of Failure, 2nd paragraph (TE)
Comment: Insert a discussion of one-tailed versus two-tailed tests.
Response: Not yet done. Not sure what to do. Suggestions?

6.4 Methods for percentile standards, 2nd para. (TE)
Comment: While one can’t disagree with an assumption of log-normal distribution in this particular example, the weakness of having this discussion in a standard is that many readers might infer that log-normal is commonplace and will use the example method in inappropriate situations.
Response: The text is weak if the discussion of this example gives an impression that this technique is best and universal. I’ve tried to change it. But I’m concerned not to frighten people away from using techniques in cases where log-normal is reasonable in the context of the decision being considered. In my experience the log-normal assumption is also useful as a first step in quantifying uncertainty.

Comment: Suggestion: add a flow chart (plus explanatory text) to direct users when to use different statistical manipulations, tests of assumptions and statistical tests.
Response: I’m not qualified to do this. Happy to consider a contribution from TE (Canada).

6.4 Methods for percentile standards, Footnote 4 (ED)
Comment: Suggest a more detailed explanation be provided.
Response: Appendix added.

6.4 Methods for percentile standards, examples (TE)
Comment: Need clearer examples. Also please do a word-processor search to find and delete all discussions of “lucky” and “unlucky”!
Response: I think this is a misguided request but I have acted on it.

Comment: Better to use figures rather than tables for the examples.
Response: I’ve thought about this but I can’t see how to do it. Suggestions?

6.5 Non-Parametric Methods, overall (TE)
Comment: These individual methods should be replaced by a flow-chart schematic leading users to appropriate assumption testing and choice of statistical methods.
Response: Above answers apply.

6.5 Non-Parametric Methods, Paragraph 3 and accompanying note (TE)
Comment: Delete since not true. (Rather an old-fashioned view.)
Response: Not always true, but more often true than not?

6.5 Non-Parametric Methods, Table 1 (ED)
Comment: Again, use of a figure would be much clearer.
Response: Above answer applies.

6.5 Non-Parametric Methods, Footnote 5 (ED)
Comment: More detailed explanation required.
Response: Appendix added.

6.5 Non-Parametric Methods, final paragraph (TE)
Comment: Add explanatory detail regarding the meaning and practical consequence of the term “confidence of failure”.
Response: Paragraph added.

Comment: Also, is duration of failure important?
Response: Text added to Section 6.2 and elsewhere.

6.6 Look-up table, overall (TE)
Comment: One-tailed test re success mentioned but without example.
Response: Example added.

Comment: Two-tailed tests not discussed.

Comment: Type 1 and Type 2 error not discussed.
Response: Example added.

7.3 Absolute Limits, overall (TE/ED)
Comment: Good discussion. Should specify N a priori.
Response: Not sure how to action.

Comment: Some Canadian provinces have used maximum limits without being overwhelmed by non-compliance, as this section infers.
Response: There will be no problem if standards are very lax or if water quality is very good compared with the standard.

Comment: Most importantly, maximum limits are straightforward to explain to a judge in court.
Response: Agreed and covered. But they are not fair if failure can be generated simply by taking more samples. Can keep the maximum for legal purposes but make the decision to prosecute depend on statistically significant failure of the percentile.

Comment: This section would be greatly enhanced by replacing the table with a detailed procedure/algorithm (or spreadsheet procedure) so people can generate the probability of reporting failure and the confidence of failure for any N.
Response: Footnote added that allows this.

7.4 Percentage of failed samples, overall (TE)
Comment: Canadian experience is that failure in some circumstances is more serious than failure in other circumstances. Seasonal limits can be established to reflect this reality, whereas annual average limits mix important and trivial failures.
Response: More details please? Agreed on the use of seasonal standards, though with a regular seasonal cycle this can be equivalent to an annual percentile (in which the 95-percentile value will tend to occur, say, in summer). Failure of a mean standard should alert to a risk of damage that requires a response. If statistically significant failure of the annual mean is of no consequence then the form of standard is not right in this case.

7.5 Calculating discharge standards, 2nd paragraph (ED)
Comment: Not sure what this means.
Response: Text expanded.

7.5 Calculating discharge standards, overall (TE)
Comment: The flaw in this strategy is that environmental problems often may be due to short-term events. For example, a river may well be fully saturated with oxygen 90 % of the time or more, but a few minutes of zero oxygen will kill the salmonid fish.
Response: Text added to explain. The system should be set up so as to ensure that failure of the percentile indicates an unacceptable risk of actual kills. This risk can be acted on before real kills happen, hopefully. A percentile standard depends on a rough stability in the statistical distributions over time and will not work for a pristine water subject only to the risk of a 1-in-10-year accident.

8 Declaring that a substance has been detected, overall (TE)
Comment: Suggest adding the consequences of false positives and false negatives. The likelihood of false positives and false negatives varies greatly with the definition of the detection limit.
Response: Or do you mean this in the sense of Type I and Type II – a test to confirm the chemical is not present? Section added on the latter.

9 Detecting change, overall (TE)
Comment: t for small samples, Z for large. Overall, there are other more appropriate approaches.
Response: Text added to indicate the breadth of approaches.
