

Published on IVT Network (http://www.ivtnetwork.com)

On Cleaning Validation Recovery Studies: Common Misconceptions

By Mohammad Ovais | Jun 22, 2017, 7:51 am PDT

Abstract

The reliability of cleaning validation results depends on the validity of sampling procedures used. In order
to ensure that the sampling procedures are suitable for their intended purpose, these procedures are
qualified under simulated laboratory conditions, using recovery/spiking studies. Unfortunately, there are a
number of misconceptions about recovery studies that could potentially result in incorrect design of these
studies and misleading assessment of the recovery data. This article reviews the current approaches to
qualifying sampling procedures and addresses some of those misconceptions.

Keywords: Cleaning validation; Recovery study; Sampling qualification; Recovery factor

Introduction

The effectiveness of cleaning procedures is assessed by collecting samples from cleaned product contact
surfaces using appropriate sampling methods (e.g., swabbing, rinsing). Prior to their use, it is expected
that these sampling procedures should be qualified to ensure that they are appropriate for the intended
purpose and do not lead to false results (1). In order to demonstrate that the sampling procedures can
recover process residues from the surfaces at an acceptable level, studies are designed to challenge
these procedures under laboratory conditions (2). These studies, often referred to as recovery or spiking
studies, involve recovering known amounts of test substance spiked onto representative surfaces. The
results obtained from the studies are then used for estimating the level of recovery, expressed as a percentage of the amount spiked (2). According to current industry practices, various approaches are used for conducting
recovery studies and assessing recovery data (3). However, these approaches remain inconsistent across
pharmaceutical companies, mainly because they are driven by unsound assumptions.

This article reviews current approaches to qualifying sampling procedures and discusses some of the common misconceptions related to recovery studies.

Current Approaches to Qualifying Sampling Procedures

Sampling procedures are essentially qualified for accuracy and precision (usually repeatability) through
appropriately designed recovery studies. In these studies, known amounts of test substance are spiked
onto representative surface(s) by spreading solution aliquots of the analyte over a defined area (e.g., 5 cm x 5 cm). The spiked surfaces, after drying or holding for a defined time, are sampled with an appropriate method (e.g., swabbing, rinsing). The collected samples are analyzed, and the amounts recovered are then used to evaluate the effectiveness of the procedure. Replicates and factors such as samplers and materials of construction are often included in the study design (2, 4, 5). The most critical element of recovery studies is the choice and number of spiking levels, i.e., the amounts of substance to be spiked. In practice, two approaches are used for selecting spiking levels. In one, the surface is spiked
with the test substance at only 100% of the area-normalized acceptance limit (i.e., the surface residue limit in µg/cm²). For example, if the surface residue limit is 0.5 µg/cm², then for a surface of area 25 cm² an amount equivalent to 12.5 µg is spiked. In other studies, multiple levels of the test substance around
the surface residue limit are spiked (4, 5). The Parenteral Drug Association’s (PDA) guidance on cleaning
validation (2) strongly recommends the first approach and asserts that, owing to the inherent variability associated with sampling procedures, the information obtained from multiple spiked levels is of little value. A recent article (6) on the subject maintained that the most appropriate level for spiking is the surface residue limit. However, for safer products (where the residue might be visible at the surface residue limit), the article proposed spiking at levels around the visible residue limit. Another article (7) suggests evaluating recovery at expected residue levels in addition to levels around the acceptance limit, and that spiking levels should preferably be multiples of the limit of quantitation.
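
As a simple illustration of how spiking amounts follow from an area-normalized limit, the short Python sketch below computes the amounts to spike on a hypothetical 25 cm² coupon at 50%, 100%, and 150% of a 0.5 µg/cm² limit. The limit, coupon size, and level fractions are illustrative assumptions, not recommendations drawn from the cited guidances.

surface_residue_limit = 0.5           # µg/cm², hypothetical acceptance limit
coupon_area = 25.0                    # cm², e.g., a 5 cm x 5 cm coupon
level_fractions = [0.50, 1.00, 1.50]  # 50%, 100% and 150% of the limit

for fraction in level_fractions:
    # amount (µg) to spread over the coupon for this level
    spike_amount = fraction * surface_residue_limit * coupon_area
    print(f"{fraction:.0%} of limit -> spike {spike_amount:.2f} µg")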

Percent recoveries are estimated from the ratio of amounts recovered to amounts spiked. The recovery (or
correction) factor is computed from these recovery percentages by applying a suitable approach. Some of
these approaches, as reported in the literature (2, 3, 4, 8), are summarized in Table I. The Active Pharmaceutical Ingredients Committee (12) and PDA (2) guides recommend correcting analytical results when the recovery factor is found to be ≥ 50% but lower than 90% and 70%, respectively. Recoveries below 50% are generally considered "questionable" (12, 13). In addition, the coefficient of variation (CV), as a measure of precision, is also estimated from the recovery data. CV values of <15% (8) or ≤30% (2) at any spiked level have been suggested as acceptable levels of precision.
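
The Python sketch below shows these calculations on hypothetical replicate data: percent recoveries are computed as the ratio of amount found to amount spiked, and the mean recovery and coefficient of variation are then estimated at each spiked level. The numbers are illustrative only; the acceptance values quoted above come from the cited references, not from the code.

import statistics

# Hypothetical replicate results (µg found) grouped by amount spiked (µg)
recovery_data = {
    6.25:  [5.1, 5.4, 4.9],
    12.50: [10.8, 11.2, 10.1],
    18.75: [16.0, 15.2, 16.9],
}

for spiked, found in recovery_data.items():
    percent = [100.0 * f / spiked for f in found]               # percent recoveries
    mean_recovery = statistics.mean(percent)
    cv = 100.0 * statistics.stdev(percent) / mean_recovery      # coefficient of variation, %
    print(f"Spiked {spiked:5.2f} µg: mean recovery {mean_recovery:5.1f}%, CV {cv:4.1f}%")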

Previous work by a number of authors (7, 9-11) has suggested that the data obtained from recovery studies involving multiple spiked levels also be evaluated for linearity. There is a considerable body of literature on the linearity aspect of recovery studies [e.g., (14-18)] that explicitly discusses its relevance and expectations. Some of these works maintain that, owing to the high variability commonly seen in swab recovery data, linearity is not expected (14, 15) and that attempting to establish linearity is of no practical value (17). The conclusions presented in these works were primarily based on the assumption that, to demonstrate linearity, one has to establish a relationship between percent recoveries and spiked levels (16, 18).

Common Misconceptions

Recovery studies are used for qualifying sampling procedures, but there are some common
misconceptions concerning them that need to be dispelled. The following is a list of common
misconceptions and their rebuttals:

1. Spiking at multiple levels does not yield any meaningful information.

The first misconception is about the number of spike levels used for recovery studies. It is often assumed
that spiking at multiple levels does not yield any meaningful information and that spiking should only be done at the surface residue limit. The argument often presented in support of this assumption is that sampling procedures are inherently highly variable (2). The primary objective of recovery studies is to assess the type (i.e., constant, proportional, or both) and extent of bias, and, if significant bias is observed, a
correction factor is estimated from the data. However, spiking at only one level would fail to achieve this
objective for three reasons: (a) data generated from such a study can provide some information, useful
though incomplete, regarding constant bias only, and not proportional bias; (b) such studies do not provide
any information on how the amounts spiked and found are related to each other (i.e., are they linearly or
non-linearly related? Are the amounts found an increasing or decreasing function of the amounts spiked?); and (c)
the recovery factor derived from such a study may not accurately reflect the recovery for levels lower or
higher than the level considered for the study, thereby possibly yielding false-negative or false-positive
results. In a cGMP Warning Letter (19) issued to a pharmaceutical company, the US Food and Drug Administration questioned the firm's practices for validating swab methods, noting that "the recovery swab studies to validate the swab methods did not provide sufficient data to demonstrate manual recovery variability". Although the details of this inspection finding are not known, the citation points to an important fact: recovery variability cannot be assumed but has to be empirically demonstrated.
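
To make the distinction between constant and proportional bias concrete, the sketch below fits a straight line to hypothetical multi-level recovery data: the intercept estimates constant bias and the slope estimates proportional bias. With data from only one spiked level, the two parameters cannot be separated. The data and the use of ordinary least squares here are illustrative assumptions, not the cited authors' prescribed method.

import numpy as np

# Hypothetical amounts (µg) from a multi-level recovery study
spiked = np.array([6.25, 6.25, 12.50, 12.50, 18.75, 18.75])
found = np.array([5.2, 5.0, 10.9, 10.4, 16.3, 15.8])

# Ordinary least-squares fit of the model: found = intercept + slope * spiked
slope, intercept = np.polyfit(spiked, found, deg=1)
print(f"slope (proportional bias estimate):  {slope:.3f}")
print(f"intercept (constant bias estimate):  {intercept:.3f} µg")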

2. Recovery data are not expected to be linear.

This misunderstanding is evident in some of the literature [e.g., (14-15)] as well, the implication being that
spiking at more than two levels should not be carried out. The argument, however, is debatable as it is
based on the inaccurate premise that recovery linearity is equivalent to demonstrating linear relationship
between percent recoveries and spiked levels. Plots of percent recoveries versus spiked levels do not
always yield sufficient information that could be used for evaluating recovery data. Furthermore, these
plots may not always demonstrate that a relationship exists. This is because (percent) recoveries are ratios: a slight deviation from the expected value at a lower spiked level would result in recoveries far from 1 (or 100%), while similar deviations at higher levels might not show the same effect. Figure 1 summarizes
recovery data from some published studies (20-23). The upper row of this figure shows the plots of
percent recoveries versus spiked levels (Figure 1a-d), and the bottom row shows the plots of amounts
spiked versus amounts found (Figure 1e-h). As seen in Figure 1a-d, except for the plot in Figure 1c, there
is no apparent linear or non-linear relationship between the two variables. On the other hand, the plots
(Figure 1e-h) of the same recovery data, with amounts found plotted against amounts spiked, show that
the two variables are linearly related. From these examples, it is thus clear that published studies do not
support the notion that the recovery data are not expected to be linear, and when assessing recovery data
for linearity (or non-linearity or atypical trend), it is more logical and rational to plot amounts spiked against
amounts found.
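
A minimal plotting sketch of the two views discussed above is given below, using hypothetical data: the left panel plots percent recovery against spiked level, and the right panel plots amount found against amount spiked. Matplotlib is assumed to be available; the numbers are illustrative, not taken from the cited studies.

import matplotlib.pyplot as plt

# Hypothetical recovery data (µg)
spiked = [6.25, 12.50, 18.75, 25.00]
found = [5.6, 10.7, 16.4, 21.3]
percent = [100.0 * f / s for f, s in zip(found, spiked)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(spiked, percent, "o")          # percent recovery versus spiked level
ax1.set_xlabel("Amount spiked (µg)")
ax1.set_ylabel("Percent recovery (%)")
ax2.plot(spiked, found, "o")            # amount found versus amount spiked
ax2.set_xlabel("Amount spiked (µg)")
ax2.set_ylabel("Amount found (µg)")
fig.tight_layout()
plt.show()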

3. Percent recoveries tend to decrease with increasing spiked levels.

From the literature (15, 17), it is apparent that this argument is based on a hypothetical assumption and not on scientific data. The plots shown in Figure 1b-c, where mean percent recoveries can be seen increasing with increasing spiked levels, call this assumption into question. Again, as stated earlier,
understanding the relationship between amounts spiked and amounts found provides for an empirical
assessment of recovery data, in particular the linearity or non-linearity of these results. In cases where the relationship is found to be linear, the value of the slope indicates whether the amounts found increase proportionally faster than (slope > 1) or slower than (slope < 1) the amounts spiked.

4. Any approach can be used for estimating recovery factors.

Another misconception concerning recovery studies is that any approach can be used, or a one-size-fits-
all approach exists, for setting recovery factors. Currently, various approaches are used for estimating
recovery factors (see Table I). Among them, the most commonly used are the ones based on the lowest
(overall or mean) or average recovery values (3). However, the major drawback of these approaches is
that the estimated recovery factors do not reflect the variability observed in the recovery data and worst-
case values are typically assumed, but not empirically derived. An example of such a case is illustrated in
Figure 2, which shows recovery results of two sampling personnel. As seen in the figure, since all the
recovery values lie above the PDA’s recommended 70% acceptability criterion, one may decide not to
estimate a recovery factor. However, when the lower one-sided 95% confidence limit of the overall mean recovery (which here exceeds the 70% limit) is considered, would that still be a valid reason for not correcting the measured results? Similarly, if 90% is taken as the benchmark for setting a recovery factor, what should the recovery factor then be: the overall minimum, the overall mean, or the lowest mean? Because these approaches do not account for the variability in recovery data, none of them, in the author's view, should be used for setting recovery factors. Instead, recovery factors should be data-derived, incorporating the measurement variability. Since recovery factors apply to future measurements, factoring in this uncertainty when estimating recovery factors yields more consistent and reliable estimates of the amounts recovered. For instance, using the lower confidence limit of the mean instead of the mean recovery itself would, in most cases, yield more reliable estimates of recovery factors than those obtained using any of the current approaches. Unfortunately, to the author's knowledge, there is no discussion in the literature on how to incorporate these uncertainty values into recovery factor estimation.
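
One possible way to derive such an uncertainty-aware factor, sketched below with hypothetical data, is to use the lower one-sided 95% confidence limit of the mean recovery instead of the mean itself. This is offered as an illustration of the idea, not as a guidance-mandated formula, and it assumes SciPy is available.

import statistics
from scipy import stats

# Hypothetical fractional recoveries pooled across samplers and levels
recoveries = [0.82, 0.78, 0.85, 0.74, 0.80, 0.77, 0.83, 0.79]

n = len(recoveries)
mean_recovery = statistics.mean(recoveries)
sd_recovery = statistics.stdev(recoveries)
t_crit = stats.t.ppf(0.95, df=n - 1)                        # one-sided 95% t quantile
lower_cl = mean_recovery - t_crit * sd_recovery / n**0.5    # lower confidence limit of the mean

print(f"mean recovery:          {mean_recovery:.3f}")
print(f"lower one-sided 95% CL: {lower_cl:.3f}")
# A measured result could then be corrected by dividing by lower_cl
# rather than by the mean recovery.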

5. Lowest (overall or mean) recovery represents the worst case.

This argument is frequently found in online discussion forums and in other literature, often to support the use of the lowest recovery as a correction factor. The statement indicates a lack of understanding of statistics and its application to recovery data analysis. If one observes that the lowest overall and lowest mean recoveries are, respectively, 0.60 and 0.65, but the lower one-sided 95% confidence limit of the mean recovery is 0.55, would the former two still represent the worst case? A significant limitation of basing recovery factors on the lowest values is the risk that an outlier, which might otherwise be addressed, could be selected as the recovery factor, thereby penalizing higher recoveries. Even Forsyth (8) argued against the use of the lowest recovery, suggesting that the approach is not based on science or statistics.

6. Recoveries below 50% are unacceptable.

The misconception about "questionable" recoveries has mainly emanated from various guidelines on cleaning validation, such as the one from the World Health Organization (13). Though these numbers (50%, 70%, or 90%) are widely used in the pharmaceutical industry, there is no underlying scientific basis for any of them. From any perspective, it is difficult to understand why a recovery of 52% would be acceptable while a recovery of 48% would not. There are a number of factors - such as surface characteristics (e.g., material of construction), the limited choice of sampling solutions, and the amounts spiked - that are not related to the sampling method or technique itself but could contribute to poor recoveries. For example, at very low spike levels, significant analyte loss during sample preparation or adsorption of the analyte onto the spiked surface may result in very low recoveries. In such cases, requiring recoveries to be ≥ 50% would only result in unnecessary delays and effort spent on developing alternative sampling methods. The decision to correct analytical results for recovery should be data-driven: if the recovery values are found to be statistically significantly different from 1 (i.e., 100%), a recovery factor should be estimated from the data, irrespective of the level of recovery.
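
As an illustration of such a data-driven decision, the sketch below applies a one-sample t-test to hypothetical fractional recoveries to check whether the mean recovery differs significantly from 1 (i.e., 100%). The choice of test and the 0.05 significance level are assumptions made for this example, not requirements stated in the cited guidelines.

from scipy import stats

# Hypothetical fractional recoveries from a recovery study
recoveries = [0.47, 0.52, 0.44, 0.49, 0.51, 0.46]

t_stat, p_value = stats.ttest_1samp(recoveries, popmean=1.0)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Mean recovery differs significantly from 100%: estimate a recovery factor from the data.")
else:
    print("No significant difference from 100%: a correction factor may not be needed.")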

Conclusions

In the validation of cleaning procedures, the qualification of sampling methods is a regulatory expectation, the success of which determines the acceptance of validation data by stakeholders such as quality assurance and regulators. It is therefore important that qualification studies be designed and evaluated based on sound scientific principles, and not on mere assumptions, with the objective of understanding the variability and uncertainty in recovery results.

I conclude with the following recommendations for designing recovery studies and evaluating recovery
results:

1. Recovery studies should include at least three spike levels, with a sufficient number of replicates at each level, for estimating between- and within-level variability.

2. Amounts spiked and amounts found should be plotted to assess whether any relationship exists between the two variables.

3. Decisions to accept recovery results should not be based on individual data points or observed mean recoveries, but on the spread of the confidence limits.

4. Estimation of recovery factors should be data-driven, accounting for sampling uncertainty and variability.

Acknowledgements

I thank Dr. Michel Crevoisier, Osamu Shirokizawa and Andrew Walsh for critically reviewing the
manuscript and for their helpful discussions and suggestions.

Conflict of Interest Declaration

The author declares that he has no competing interests.

References

1. FDA, Guide to Inspections Validation of Cleaning Processes, July 1993, www.fda.gov/ICECI/Inspections/InspectionGuides/ucm074922.htm (accessed May 12, 2017).
2. PDA, Technical Report No. 29 (Revised 2012): Points to Consider for Cleaning Validation, 2012.
3. D.A. LeBlanc, "PDA Survey Results: Cleaning Validation Sampling Recovery Practices," PDA Letter 45(7), 9-15, 2009.
4. J. Ermer, "Accuracy," in Method Validation in Pharmaceutical Analysis: A Guide to Best Practice, J. Ermer and J.H. McB. Miller, Eds. (Wiley-VCH, Weinheim, 2005), pp. 74-76.
5. R.B. Kirsch, "Validation of Analytical Methods Used in Pharmaceutical Cleaning Assessment and Validation," Analytical Validation in the Pharmaceutical Industry, supplement to Pharm. Tech., pp. 40-46, Feb. 1998.
6. R.J. Forsyth, "Rethinking Limits in Cleaning Validation," Pharm. Tech. 39(10), 52-60, 2015.
7. M. McLaughlin, "Pharmaceutical Cleaning Validation Method References for Alconox, Inc. Detergents," Alconox Inc., 2001, www.alconox.com/Resources/Web/PharmaceuticalCleaningValidation.aspx (accessed Jan 3, 2017).
8. R.J. Forsyth, "Best Practices for Cleaning Validation Swab Recovery Studies," Pharm. Tech. 40(9), 40-53, 2016.
9. C. Glover, "Validation of the Total Organic Carbon (TOC) Swab Sampling and Test Method," PDA J Pharm Sci Technol. 60(5), 284-290, 2006.
10. K. Bader, J. Hyde, P. Watler, and A. Lane, "Online Total Organic Carbon (TOC) as a Process Analytical Technology for Cleaning Validation Risk Management," Pharm. Eng. 29(1), 8-20, 2009.
11. S. Lombardo, P. Inampudi, A. Scotton, G. Ruezinsky, R. Rupp, and S. Nigam, "Development of Surface Swabbing Procedures for a Cleaning Validation Program in a Biopharmaceutical Manufacturing Facility," Biotechnol. Bioeng. 48(5), 513-519, 1995.
12. European Chemical Industry Council/Active Pharmaceutical Ingredients Committee, Guidance on Aspects of Cleaning Validation in Active Pharmaceutical Ingredient Plants, APIC, 2014.
13. World Health Organization, "Appendix 3: Cleaning Validation," in Annex 4, Supplementary Guidelines on Good Manufacturing Practices: Validation, Technical Report Series 937, World Health Organization, 2006.
14. D.A. LeBlanc, "Dispelling Cleaning Validation Myths: Part II," Pharm Technol Europe 17(12), 45-47, 2005.
15. D.A. LeBlanc, "Spiking Amounts for Sampling Recovery Studies," in Cleaning Memos, Cleaning Validation Technologies, Kodak, TN, July 2007.
16. D.A. LeBlanc, "Revisiting Linearity of Swab Recovery Results," in Cleaning Memos, Cleaning Validation Technologies, Kodak, TN, July 2009.
17. D.A. LeBlanc, "Swab Sampling Recovery as a Function of Residue Level," in Cleaning Memos, Cleaning Validation Technologies, Kodak, TN, October 2010.
18. D.A. LeBlanc, "Revisiting Linearity of Recovery Studies," in Cleaning Memos, Cleaning Validation Technologies, Kodak, TN, September 2013.
19. FDA, Warning Letter issued to Capricorn Pharma, Inc., April 20, 2010, www.fda.gov/ICECI/EnforcementActions/WarningLetters/2010/ucm227896.htm (accessed Jan 3, 2017).
20. T. Mirza, M.J. Lunn, F.J. Keeley, R.C. George, and J.R. Bodenmiller, "Cleaning Level Acceptance Criteria and a High Pressure Liquid Chromatography Procedure for the Assay of Meclizine Hydrochloride Residue in Swabs Collected from Pharmaceutical Manufacturing Equipment Surfaces," J Pharm Biomed Anal. 19(5), 747-756, 1999.
21. T. Mirza, R.C. George, J.R. Bodenmiller, and S.A. Belanich, "Capillary Gas Chromatographic Assay of Residual Methenamine Hippurate in Equipment Cleaning Validation Swabs," J Pharm Biomed Anal. 16(6), 939-950, 1998.
22. C. B'Hymer, T. Connor, D. Stinson, and J. Pretty, "Validation of an HPLC-MS/MS and Wipe Procedure for Mitomycin C Contamination," J Chromatogr Sci. 53(4), 619-624, 2015.
23. R. Raghavan, M. Burchett, D. Loffredo, and J.A. Mulligan, "Low-Level (ppb) Determination of Cisplatin in Cleaning Validation (Rinse Water) Samples. II. A High-Performance Liquid Chromatographic Method," Drug Dev Ind Pharm. 26(4), 429-440, 2000.

Table I: Some of the approaches for setting recovery factor(s)

Approach | Description
Overall minimum | The lowest of all the recovery values, regardless of the spiked level (concentration)
Lowest mean | The lowest of all the mean recoveries
Overall / grand mean | The mean of the pooled recoveries
Recovery at the surface residue limit | The lowest or mean recovery at the cleaning acceptance limit
Spiked-level- or concentration-dependent recovery factor(s) | A variable recovery factor that depends on the measured residue level

Figure 1. Analysis of some published recovery data*

* Data source: Plots (a) & (e) - Recovery from spiked stainless steel surface, Mirza et al (20); Plots (b) &
(f) - Recovery from spiked stainless steel surface, Mirza et al (21); Plots (c) & (g) - Recovery from spiked
stainless steel surface using Texwipe swab, B'Hymer et al (22); Plots (d) & (h) - Recovery from spiked
stainless steel surface, Raghavan et al (23).

Figure 2. Graphical summary of recovery results

The (+) symbols in the figure denote the averages of the recoveries. The dashed line shows the lower one-sided 95% confidence limit of the average recovery, and the solid vertical blue line shows the PDA's 70% recovery acceptability criterion.
Source URL: http://www.ivtnetwork.com/article/cleaning-validation-recovery-studies-common-
misconceptions
