
PERSPECTIVE AIChE Journal

DOI: 10.1002/aic.16744

Bayesian Probabilistic Modeling in Pharmaceutical Process Development

On the occasion of her 2018 Industry Leadership Award from AIChE

Jose E. Tabora, Federico Lora Gonzalez, and Jean W. Tom


Chemical & Synthetic Development, Product Development
Bristol-Myers Squibb Company

Keywords: pharmaceutical, Quality by Design, QbD, process capability, process robustness, Bayesian statistics,
probabilistic modeling, risk assessments, design space, product specifications, data science

Correspondence concerning this article should be addressed to Jean Tom at jean.tom@bms.com.

Introduction

The first AIChE perspective discussing the state and future of Pharmaceutical Process Development was published in December 2006, asking the question ‘Can Pharmaceutical Process Development Become High Tech?’[1] This perspective came out shortly after the Food and Drug Administration (FDA) published its guidance on the Quality by Design (QbD) initiative,[2] which emphasized a systematic approach to pharmaceutical development and manufacturing through greater scientific understanding of the process and improvements in the ability to control it, grounded in quality risk management. The goal was to promote a more efficient, agile, flexible pharmaceutical manufacturing sector that reliably produces high quality drugs without extensive regulatory oversight.[3] Three key elements advocated in this article are now commonplace in pharmaceutical development across the industry: 1) the approaches proposed by the guidance, which included the use of statistical, multivariate analysis, the use of design of experiments, and an emphasis on physical organic chemistry and kinetics, 2) the use of modeling tools to reduce empirical experimentation and to verify mechanistic understanding and process control, and 3) the use of parallel experimentation using automated work streams for process options, design space exploration and productivity.[4]

This article has been accepted for publication and undergone full peer review but has not been through the
copyediting, typesetting, pagination and proofreading process which may lead to differences between this
version and the Version of Record. Please cite this article as doi: 10.1002/aic.16744

© 2019 American Institute of Chemical Engineers



In the ensuing 13 years, additional AIChE perspectives of significance for pharmaceutical process
development have been written. In 2008, Variankaval, Cote and Doherty provided a perspective on the
state of crystallization of active pharmaceutical ingredients, tying in elements of QbD and highlighting
opportunities to advance polymorph selection, prediction and detection.[5]

In the last three years, five additional AIChE perspectives have been published that are highly relevant to pharmaceutical process development, covering recent trends: continuous processing and manufacturing, the application and impact of QbD, and data science. Ierapetritou, Muzzio and Reklaitis gave a perspective on
continuous manufacturing processes to make the pharmaceutical drug product, focusing on powder-
based product manufacturing which incorporated many of the key elements of QbD: online
measurements, process modeling, and process control.[6] They forecasted that success in continuous
drug product manufacturing would help accelerate similar efforts in continuous processing for small
molecule and biologic active ingredients and other pharmaceutical product forms. Jensen’s perspective
on the state of flow chemistry and microreaction technology [7] covered the expansion of flow
processes and technologies to enable continuous multi-step synthesis to make active pharmaceutical
ingredients and demonstration of on-demand continuous production of pharmaceutical compounds, but
highlighted gaps in analytical chemistry, work-up unit operations, handling of solids and automation
needed to further advance the adoption of continuous processing. Most recently, Collins’ perspective [8]
looked at the impact of the regulations and FDA guidance on the state of Quality by Design and the
opportunity for chemical engineers to drive advanced process control to achieve the ultimate goal of
QbD: enabling a much higher level of quality in pharmaceutical manufacturing. The fourth article of
interest to pharmaceutical process development was Beck, Carothers, Subramanian and Pfaendtner’s
perspective [9] on data science in accelerating innovation and discovery in chemical engineering. While
Beck et al. discuss this in the context of research and advances in the field of computational molecular
science and engineering and the field of energy systems and management, we see similar impact of data
science in driving advances for pharmaceutical process development and towards the goals of QbD in
manufacturing processes. Beck et al. looked at the three aspects of data science: data management,
statistical and machine learning, and visualization, and made the case that chemical engineers are often in the position of being asked to manipulate, transform and analyze complex data sets whether they are
working in discovery, research and development or manufacturing. The fifth article, recently published by Venkatasubramanian [10] on Artificial Intelligence in Chemical Engineering, reviews the phases of this area over the past 40 years and discusses possible directions to develop AI-based models and use machine learning. Given these previous articles, we would like this perspective to focus on the opportunity for chemical engineers to connect specific aspects of data science to the goals of Quality by Design in pharmaceutical development. Specifically, we see the opportunity to drive a probabilistic modeling approach using Bayesian statistics because it provides a framework aligned with pharmaceutical development, where the limited process data generated during the development stage must be used to project risks and establish controls to meet quality and robustness expectations upon product launch. This approach is also highly relevant and potentially beneficial for other industries where manufacturing performance and reliability must be projected from a minimal dataset during product or process development.

Quality by Design

The FDA introduced the first draft guidance on Quality by Design for comment in 2004 to drive pharmaceutical manufacturers towards the adoption of new technologies to improve manufacturing reliability and productivity and the resulting product quality. Quality control in pharmaceutical manufacturing facilities is based on the regulatory agencies’ guidance, which emphasizes a set of procedural approaches referred to as current Good Manufacturing Practices (cGMPs) regulations. The key components of cGMPs are expectations to establish strong quality management systems, obtain appropriate quality raw materials, establish robust operating procedures, detect and investigate product quality deviations, and maintain reliable testing laboratories. cGMPs are required as a formal system of controls expected to minimize contamination, mix-ups, deviations, failures, and errors.

Given the regulatory agencies’ responsibility for protecting the public health by ensuring the safety,
efficacy, and security of pharmaceutical products, pharmaceutical manufacturers maintained strict adherence to cGMPs as paramount for approval of new drug applications. However, such a strong focus on procedural compliance generally inhibited the rapid adoption of new technologies, particularly in comparison to other chemical processing industries without such regulatory oversight or procedural expectations.

The principles and elements of Quality by Design were to drive an approach that would lead to 1) meaningful product quality specifications that are based on clinical performance, 2) increased process capability and reduced product variability and defects by enhancing product and process design, understanding, and control, and 3) increased product development and manufacturing efficiencies. [11] In 2006, QbD was codified into the eighth quality guidance (Q8) by the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH), and is used by ICH regulators, which include the FDA for the US, the European Medicines Agency for the European Union, and agencies from numerous other countries.

The definition of Quality by Design is now a systematic approach to pharmaceutical development and manufacturing beginning with predefined objectives, emphasizing product and process understanding and process control, all based on sound science and quality risk management. [12] Some of the approaches mentioned specifically in the guidance for Pharmaceutical Development Q8(R2) to enable this systematic, enhanced understanding of the product and process are summarized in Table 1.

Concept supporting QbD | Definition | Engineering workflow
Quality Target Product Profile | The product description that summarizes the characteristics expected during development to respond to the therapeutic drug target. | The basis of design for the development of the product.
Critical Quality Attributes (CQA) | A physical, chemical, biological or microbiological property or characteristic that should be within an appropriate limit, range, or distribution to ensure the desired product quality. | Determine whether the product properties could impact the patient, typically purity (safety) and bioavailability (efficacy).
Risk Assessment | A systematic process of organizing information to support a risk decision to be made within a risk management process. It consists of the identification of hazards and the analysis and evaluation of risks associated with exposure to those hazards. | Linking material attributes and process parameters to CQAs and the corresponding probability of excursion outside the design space.
Design Space | The multidimensional combination and interaction of input variables (e.g., material attributes) and process parameters that have been demonstrated to provide assurance of quality. | Relationship [f(x)] between process inputs [x] and the critical quality attributes [y], and the corresponding parameter space with an acceptable failure rate.
Control Strategy | A planned set of controls, derived from current product and process understanding, which assures process performance and product quality. | Includes but is not limited to: control of input material attributes; product specifications; controls for unit operations; in-process or real-time release testing; monitoring program.

Table 1. Key concepts supporting the Quality by Design approach. Source: Guidance for Industry Q8(R2) Pharmaceutical Development [2].
This guidance and subsequent ones [13] have provided a framework for the industry to convey its scientific process understanding as the basis of its manufacturing processes. While there is general consensus that overall product and process understanding and manufacturing quality have improved as a whole, there are still occurrences of product recalls and, as a result, drug shortages, particularly for older, legacy products. It is important, at this point, to introduce two important concepts associated with Quality by Design as we talk about improving manufacturing quality: process robustness and process capability. The robustness of a process is its ability to demonstrate acceptable quality and performance while tolerating variability in inputs, which may include variability of raw materials, operating conditions, process equipment, environmental conditions and human factors. [14] Process capability is a measurable property of a process relative to its specification, often expressed as a process capability index or process performance index, and can provide a framework to quantify robustness and risk.

The FDA has a vision for pharmaceutical manufacturing capability to be much higher (more robust), with Yu and Kopcha [3] recently proposing that the future of pharmaceutical quality should be six sigma (or 3.4 defects per million, ppm)1, a significant improvement over the ~three sigma (6.7 defects per 100) observed today. Towards achieving the goal of six sigma, Yu and Kopcha discuss the role that different types of regulation (management-based, means-based and performance-based) play. Of interest to this perspective are the performance-based approaches, which intervene at the output stage and require the regulated entities to have a high capacity to measure output. This is aligned with newer approaches that utilize the vast array of data to quantify risk, calculate process capability and measure robustness, using new infrastructures for data science and incorporating methodologies such as Bayesian statistics and machine learning. It is noteworthy that the language used in the ICH guidelines (Q8, Q9) invokes eminently probabilistic notions and suggests that the characterization of the process and the analysis that define the product properties should be approached from a quantitative assessment of the uncertainties associated with the process outcomes.

A Probabilistic Approach to QbD

To achieve the level of control proposed by Yu and Kopcha [3], and to meet the FDA’s vision for robust pharmaceutical manufacturing, chemical engineers must adapt the way risk assessments and process characterization are done in order to quantify risk more effectively. The potential for unexpected results from a process depends on both our ignorance about the process and the inherent process variability. Quantification of risk, in other words, means having access to quantitative estimates of the intrinsic variability of a process while accounting for the lack of knowledge about the process. The intrinsic robustness of a process can then be assessed, allowing scientists to propose a control strategy that delivers six sigma-level control while accounting for typical manufacturing variability.

1
Yu and Kopcha’s definition of six sigma includes a long term shift of the initially measured mean of 1.5 sigma.



Specifically, the way the concept of a control strategy is implemented should account for epistemic
uncertainty (ignorance) and aleatory uncertainty (inherent to the process) in each of the control strategy
components: input material attributes, process parameters, analytical release testing or in-process
analysis methods, operations and measured product CQAs. The role of the scientist is to determine how
much data is required to reduce epistemic uncertainty and quantitatively characterize aleatory
uncertainty to achieve a certain level of reliability.

Applications of Bayesian inferences in QbD

Bayesian methods are exceptionally effective at characterizing variability from limited data, as they are
naturally suited to easily produce the posterior predictive probability distributions necessary to explore
process capability and reliability. Additionally, uncertainty about the model terms is incorporated in
the estimate. Peterson, Miro-Quesada, del Castillo, and others have written extensively on this topic,
introducing the concept of a posterior predictive approach to response surface optimization in drug
development,[15, 16] applying these concepts to the ICH Q8 approach to design space, [17, 18] noise
variables, [17, 19, 20] and analytical method development [21, 22] among others. The methodology can
also be applied to mechanistic models including reaction kinetics parameter estimation [23]. Detailed
descriptions of the incorporation of Bayesian analysis to the prediction of process yields and the
forecasting of process robustness and process capability specifically in the context of pharmaceutical
development have been described recently [24, 25].

Process Capability and Process Robustness Metrics

Process capability is the measure of how often a process can deliver a product that meets the
specifications for the product. This measure is typically expressed as a capability index or performance
index [26, 27]. These indices are typically measured when there is sufficient data, generally in a
manufacturing setting. Samples are grouped in time, and the sample mean and sample standard deviation
are used to calculate the process capability or performance index [28]. Cpk is the process capability index assuming that the process may not necessarily be centered between the upper specification limit (SU) and the lower specification limit (SL), and that the process is normally distributed. The process capability index is given by [29]

Ĉpk = min{ (SU − μ̂)/3σ̂ , (μ̂ − SL)/3σ̂ }    (1)

where μ̂ and σ̂ are the sample mean and sample standard deviation. However, in pharmaceutical
development, scientists rarely have enough data to accurately measure and therefore predict process
capability. In the context of limited data, there is uncertainty about the sample mean and sample
standard deviation (epistemic uncertainty), and how it applies to the underlying process. In this case, it is
very difficult (and indeed dangerous) to predict future capability based on limited data, because the
calculated process capability index can vary significantly based on the sample size, even if the data
comes from the same process. Estimation of future process capability must account for uncertainty in the
measured values, the sample size, and sample mean and standard deviation. Methods to address
limitations in estimating failure rates from limited sample size have been proposed and generally
require a separate calculation [26, 30, 31]. The Bayesian methodology described above can provide an
estimate of failure rate, which can be translated into predictions of future capability. For a six sigma
process, the corresponding failure rate is 3.4 parts per million. This estimate depends on not only the
sampled mean and sampled standard deviation, but also the number of samples due to the “lack of
knowledge” when there is limited data.
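To make this point concrete, the short sketch below (a minimal Python illustration; the process mean, standard deviation and specification limits are assumed values, not taken from any real process) repeatedly computes Ĉpk from small samples drawn from one known process and reports the spread of the estimates.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed process: true mean 1.00, true sd 0.05, spec limits 0.85 / 1.15,
# so the true Cpk is 1.0. Repeatedly estimate Cpk from small samples.
mu, sigma, s_l, s_u = 1.00, 0.05, 0.85, 1.15

def cpk(sample):
    """Estimated Cpk per equation (1) using the sample mean and sample sd."""
    m, s = sample.mean(), sample.std(ddof=1)
    return min((s_u - m) / (3 * s), (m - s_l) / (3 * s))

for n in (5, 15, 50):
    estimates = [cpk(rng.normal(mu, sigma, n)) for _ in range(10_000)]
    lo, hi = np.percentile(estimates, [5, 95])
    print(f"N={n:2d}: 90% of Cpk estimates fall in [{lo:.2f}, {hi:.2f}]")
```

Even though every sample comes from the same process, the estimates at N = 5 scatter widely around the true value of 1.0; this scatter is precisely the epistemic uncertainty discussed above.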

Predicted failure rate can be used to guide process development, both to drive the reduction of epistemic
uncertainty through efficient experimentation (through model-guided experimental design), and through
process optimization via failure rate reduction. Other process metrics that are not controlled as CQAs,
such as cost, yield, and cycle time can also be modeled with a Bayesian framework to provide a basis by
which a thorough optimization could be performed.

The Bayesian estimation of the parameters of a normal distribution (mean and standard deviation) from
a set of data of size N and uninformative priors gives a posterior predictive distribution which is the Student’s t-distribution with N-1 degrees of freedom. [32] This gives a straightforward application of the Bayesian posterior predictive distribution in a workflow, which can be repeated for all risks in a
process. If the predictive probability of the risk is too high, then the scientist may gather more data about
the process, better model the process using the design space formalism described below, or try to
mitigate the risk by optimization, control points, or specifications. Figure 1 shows an example of this
calculation: given a test distribution that we can sample from, we can calculate the posterior predictive
distribution for different random draws from the test distribution for different sample sizes (N = 5, 15,
and 50). The resulting posterior predictive distribution can then be used to calculate the percentile (red
lines, 99.993% in this case, dashed lines are 5% and 95% quantiles for 100,000 replicate samples), or
predict the failure rate for the scenario given a limit (blue line in the test distribution corresponding to a
failure rate of 63 parts per million in the test distribution). As the number of samples increases, the predicted failure rate generally decreases, because uncertainty about the mean and standard deviation decreases. Therefore, implementation of a Bayesian formalism results in most cases in a conservative estimate of the true process capability. Table 2 shows the results of the failure rates (in PPM) that would
be estimated from a single random sample (as shown in Figure 1), the range of the estimates, and those
obtained from the Bayesian analysis.

Sample size | Naïve: single random sample (PPM) | Naïve: Monte Carlo range* (PPM) | Naïve: % underpredicted | Bayesian: single random sample (PPM) | Bayesian: Monte Carlo range* (PPM) | Bayesian: % underpredicted
5  | 0.03 | 0-8615    | 58% | 2807 | 388-37923 | 1%
15 | 2.57 | 0.01-1981 | 55% | 223  | 30-6042   | 10%
50 | 24.8 | 1.74-566  | 53% | 89   | 13-1028   | 25%
* 5% and 95% quantiles

Table 2. Failure rate estimates from the single random samples shown in Figure 1 and estimates from 100,000 random samples (Monte Carlo [MC]). The % underpredicted corresponds to the proportion of the MC simulations that resulted in an underestimation of the true failure rate (63 PPM).

Figure 1: Using a Student’s t-distribution to calculate the posterior predictive distribution given normality. For this example, N samples (red x’s) are randomly drawn from the test distribution (gray, µ = 1, σ = 0.05) and used to calculate the posterior predictive distributions given by the Student’s t-distribution with N-1 degrees of freedom (blue distributions). The posterior predictive distribution is then used to calculate the limits for a desired reliability (red lines; dashed lines are 5% and 95% quantiles for 100,000 replicate draws, R = 99.9936%) or the failure rate (PPM) associated with a limit (blue line, 1.19). As the number of samples increases, the calculated limit (red) for a desired reliability converges to the expected limit given normality.
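A minimal sketch of the calculation illustrated in Figure 1 is given below (Python with NumPy/SciPy; the test distribution, upper limit and random seed are assumed for illustration and will not reproduce the exact figures in the table above). It contrasts the naïve plug-in normal estimate of the failure rate with the Bayesian posterior predictive estimate based on the Student’s t-distribution with N-1 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed "true" process: mu = 1, sigma = 0.05, with an upper limit of 1.19
# (true failure rate on the order of tens of PPM).
mu_true, sigma_true, upper_limit = 1.0, 0.05, 1.19

def failure_rate_ppm(sample, limit):
    """Naïve (plug-in normal) vs. Bayesian posterior predictive failure rates.

    With a non-informative prior, the posterior predictive for a new
    observation is ybar + s*sqrt(1 + 1/n) * t_{n-1}.
    """
    n = len(sample)
    ybar, s = np.mean(sample), np.std(sample, ddof=1)
    naive = stats.norm.sf(limit, loc=ybar, scale=s) * 1e6
    scale = s * np.sqrt(1.0 + 1.0 / n)
    bayes = stats.t.sf((limit - ybar) / scale, df=n - 1) * 1e6
    return naive, bayes

for n in (5, 15, 50):
    sample = rng.normal(mu_true, sigma_true, size=n)
    naive, bayes = failure_rate_ppm(sample, upper_limit)
    print(f"N={n:2d}: naive = {naive:10.2f} PPM, posterior predictive = {bayes:10.2f} PPM")
```

As in the figure, the posterior predictive estimate is deliberately conservative for small N and approaches the plug-in estimate as the sample size grows.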

This example showcases the value of quantitatively incorporating the ignorance of the true distribution
associated with limited data. Only in a small fraction of cases would the formalism result in an
underestimate of the process capability. Other approaches to dealing with this problem have been investigated and applied successfully: Kane [26] addressed the small sample problem by specifying a
critical value for the target performance index that would provide a high probability of meeting the
target control. As seen in this example, the Bayesian framework provides a direct incorporation of
ignorance of the “true” distribution due to limited sampling and provides a conservative assessment of
the risk. Considering that the Bayesian framework results in the inference of a distribution (that is generally not normal), estimating the failure rate directly against the limit, rather than through a capability index, results in a more informative and powerful metric. Sullivan [27] advocated for a common statistical language to quantify the number of defects associated with a given production and a way to track quality over time. Furthermore, there is inconsistency in the literature as to what failure rate (number of non-conforming units) is associated with a given Cpk. In some instances, a long-term drift of the mean by 1.5 sigma units is assumed, and the failure rates at the three and six sigma levels are estimated at 66807 and 3.4 ppm, respectively [3, 33]. These estimates are much higher than those corresponding to direct estimation of the population (i.e., 2700 and 0.00198 ppm, respectively) [29, 34]. The pharmaceutical development community would benefit from moving away from the use of estimated Cpk as a development metric for process capability and instead favor the use of predicted failure rates, specifically predicted defects per million (PPM), which are not burdened by the assumptions, limitations, and ambiguity of the former and provide a more direct measure of the level of control of the process, which is ultimately the aim of the QbD formalism [3, 11, 35].
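The figures quoted above follow directly from the normal distribution; the short check below (Python/SciPy) computes the two-sided non-conforming fraction at the three and six sigma levels, with and without the conventional 1.5-sigma long-term shift of the mean.

```python
from scipy.stats import norm

# Two-sided non-conforming fraction (in PPM) at a given sigma level, centered
# and with the conventional 1.5-sigma long-term shift of the mean.
for k in (3, 6):
    centered = (norm.sf(k) + norm.cdf(-k)) * 1e6          # ~2700 and ~0.00198 PPM
    shifted = (norm.sf(k - 1.5) + norm.cdf(-k - 1.5)) * 1e6  # ~66807 and ~3.4 PPM
    print(f"{k} sigma: centered = {centered:.5f} PPM, with 1.5-sigma shift = {shifted:.1f} PPM")
```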

Risk Analysis and Evaluation

Central to the successful implementation of a QbD strategy is the management of risk to quality which
has been documented in the ICH guidelines Q8, Q9 and Q10[13]. The first step in the Risk Assessment
process is risk identification which is identifying the failures that can impact the process or quality of the
product. Inherent in this step is also the identification of the potential critical quality attributes. This
information forms the basis to begin the risk analysis and evaluation which is asking the question of
what are the risks of these potential failures. Here, we are assessing the adverse impact of the risk and its
likelihood of occurrence, taking into consideration the ability to detect the risk. A scoring system is applied to the analysis to allow prioritization of the highest risks. One widely used methodology is the Failure Mode Effect Analysis (FMEA), a tool developed in the late 1950s to study potential
malfunctions of military systems. The tool is well suited to detailed assessment of pharmaceutical
manufacturing processes due to its thorough and detailed approach and broad acceptance by global
health authorities. [36] [37] FMEA provides a numerical evaluation of the potential failure modes based
on the mode’s severity, its probability of occurrence and detectability. The risk is then calculated as a
product (Risk Priority Number or RPN) by

𝑅𝑃𝑁 = 𝑆𝑒𝑣𝑒𝑟𝑖𝑡𝑦 × 𝑃𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 × 𝐷𝑒𝑡𝑒𝑐𝑡𝑎𝑏𝑖𝑙𝑖𝑡𝑦 (2)

The severity score reflects the impact or consequence of the hazard, which in the process analysis is the impact on product quality. The detectability score reflects the ability to discover or determine the existence of a failure mode and provide a control action on that mode. The probability is defined as the likelihood of the occurrence of the failure mode. This approach requires an assessment of the failure mode in terms of its probability and consequence. The potential failure modes for a complex pharmaceutical process can be rooted in human, mechanical and system failures. Suggested implementations of FMEA analysis focus the analysis on the relationship between the quality attributes of the product (intermediate, drug substance or drug product) and the process parameters [35, 38]. Approaches to incorporate the probability of failure in implementations of FMEA analysis include an assessment of the relationship between the anticipated process parameter range during commercial manufacture (normal operating ranges [NOR]) and the known limits of the process parameter ranges beyond which a given quality attribute is known to be affected, or beyond which no information is known. [39] Other implementations include the use of random sampling from estimated process variability and evaluation of the impact on the quality attribute with a mechanistic or statistical model. [40, 41]

In many instances, the failure mode is associated with a parameter excursion away from the desired
control limit due to common cause variation. In these cases, a Bayesian framework allows one to
associate a distribution to the parameter which, as stated before, accounts for both the intrinsic
distribution and the ignorance around the true distribution due to limited sampling of the parameter. This procedure results in a rigorous quantification of the probability of failure and removes the
subjectivity associated with the generally implemented scoring criteria. [42]
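As a rough illustration of replacing a subjective occurrence score with a posterior predictive probability, the sketch below (Python/SciPy; the batch data, parameter limits and the 1-5 scoring bins are hypothetical) estimates the probability that the next batch of a process parameter falls outside the range known to affect a quality attribute, and maps that probability to an FMEA-style occurrence score.

```python
import numpy as np
from scipy import stats

# Hypothetical batch records of a process parameter (e.g., reaction temperature, C)
# and the range beyond which a quality attribute is known to be affected.
observed = np.array([59.2, 60.1, 60.5, 59.8, 60.3, 59.9, 60.0, 60.4])
lower, upper = 58.0, 62.0

n = len(observed)
ybar, s = observed.mean(), observed.std(ddof=1)
scale = s * np.sqrt(1 + 1 / n)          # posterior predictive scale, non-informative prior

# Probability of an excursion outside the limits for the next batch.
p_excursion = (stats.t.cdf((lower - ybar) / scale, df=n - 1)
               + stats.t.sf((upper - ybar) / scale, df=n - 1))

# Map the probability to an illustrative 1-5 FMEA occurrence score.
bins = [1e-6, 1e-4, 1e-2, 1e-1]
occurrence_score = int(np.digitize(p_excursion, bins)) + 1
print(f"P(excursion) = {p_excursion:.2e}, occurrence score = {occurrence_score}")
```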

Design Space

The ICH Q8 introduced the concept of a design space (DS), which is defined as “the multidimensional
combination and interaction of input variables (e.g., material attributes) and process parameters that
have been demonstrated to provide assurance of quality.” [2].

As a drug candidate approaches regulatory filing, the development of the design space is one of the main
goals of process chemical engineers to support implementation of the process in manufacturing. It is at this stage of the development process that the characterization of the process is formalized. Typically,
exploration of the parameter spaces of each critical unit operation (reactions, crystallizations,
distillations, etc.) are conducted using design of experiments, and the results are analyzed by some sort
of data collection, manipulation, and model building. [11, 43, 44] The data and the models may lead the
scientists to choose new operating conditions for the process due to an optimized mean response. For
multiple responses (e.g., purity, isomer, particle size), multiple models are built and typically the area of
overlapping means, where each response is within the desired limits, is used as the design space.
However, this approach has several problems relating to the reliability of such a surface. First, and most
simply, if the design space limit is based on where the mean of the response is equal to the specification,
then the conditional probability of failure (at the limit) is 50%, if the model is perfectly predictive. [45,
46] Second, for multiple responses, the probability of success for the batch is given by the joint
probability of meeting all of the specifications for all of the responses; characterization of the
overlapping means does not provide any measure of assurance of meeting all of the specifications. For
typical processes, this results in reliabilities of much less than 50% at the “sweet spot” given by the
mean responses. [45]



Naturally, the design space is a key component of the control strategy for a pharmaceutical process, and
comes directly from the concept of QbD; it also provides a clear example of why a probabilistic
approach to QbD aligns with the FDA’s vision of process control.

John Peterson was one of the first to ask the question of “how much assurance” is acceptable to define a
design space [17]. If the industry is heading towards six sigma control, then risk quantification at the
appropriate levels must be incorporated into the design space criteria. Together, the notions of risk to quality, process robustness, and process capability suggest a probabilistic characterization of the pharmaceutical process.

The problem of chemical process design under uncertainty has been successfully approached by Grossmann et al. [47-49], in which the concepts of operability, flexibility and resiliency of design under uncertainties are introduced and formalized in a quantitative framework. Although intended for plant design, Garcia-Muñoz et al. [50, 51] extended the concept to pharmaceutical process design; a key aspect of this contribution is the reliance on graphical instruments to complement the analysis. The framework used was not Bayesian, but the Monte Carlo (MC) approach used to evaluate uncertainty is similar to that which would result from a Bayesian formalism with non-informative priors.

Lebrun et al. [21] applied a Bayesian methodology to establish the design space of a spray drying process with three quality attributes. The defined design space had a 45% probability of meeting all the quality attributes. Castagnoli et al. [46] incorporated a Bayesian analysis to define the design space for a crystallization with multiple CQAs. The authors considered an 80% probability of meeting all the CQAs as the criterion for conditions to be incorporated in the design space definition.

A Bayesian approach to define the Design Space natively incorporates uncertainty, both from the lack of
data and observed variability, and can incorporate prior knowledge about the system in its model output.
The formalism is particularly effective at providing estimates of variability from limited data, as is the
case in pharmaceutical processing. Specifically, the number of scale-relevant manufactured lots that are accessible before the product’s commercial launch is almost always insufficient to apply the conventional tools of process robustness that are intended for industries with thousands or millions of samples. A recent review by Debevec et al. summarizes some of the applications of a variety of methodologies to design space development [43].

Utilization of a Bayesian methodology provides a quantification of the process reliability across a parameter space, which directly translates to a more suitable constraint on the design space. Mathematically, this was introduced in an elegant way by Peterson et al. [17] and is given by:

{x : Pr(Y ∈ A | x, data) ≥ R}    (3)

where x is the set of control parameters, Y is the set of responses, A is the set of specification constraints, and R is a reliability criterion. Additionally, the input parameters x can be modeled as noise variables, where historical manufacturing data can be used to inform the distribution. Generally, under the conditions used for model calibration as part of the pharmaceutical development process, the parameters x (which are known to causally impact the quality attribute Y) are controlled (or measured) much more tightly than is anticipated under manufacturing conditions. Therefore, the impact of the variability in these factors needs to be quantified and accounted for specifically in the estimates of process reliability (i.e., probability of failure against a given specification). Figure 2a shows the charge amounts from a pilot plant over a 10-year history; these data can be used to inform how the noise variables x are incorporated into the model. If data are not available, noise variables can be incorporated as normal distributions given typical ranges in the manufacturing setting (Figure 2b). The model can be used to calculate the predictive distribution of the quality attribute Y at any processing target point (Figure 2c), and therefore the failure rate at any point in the parameter space; from this, failure rate maps (Figure 2d) across parameter ranges can be used to determine the design space given (3).



Figure 2: Workflow to generate the design space using a probabilistic framework: input noise variables from data (a) or
assumed distributions (b) are used in conjunction with a Bayesian regression to generate a posterior predictive distribution
(c). Given a limit (dashed red line, c), the failure rate surface can be calculated by iterating over the parameter space (d). The
design space is given by equation 3.
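The workflow in Figure 2 can be sketched in a few lines of Monte Carlo code. In the illustration below (Python/NumPy; the linear response model, its posterior coefficient uncertainty, the noise-variable spreads and the specification limit are all assumed for illustration and are not taken from the article), posterior draws of the model parameters are combined with noise-variable draws around each set point to estimate a failure rate, and set points meeting the reliability criterion of equation (3) are flagged as inside the design space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear model for a quality attribute Y (e.g., an impurity) as a
# function of temperature and reagent equivalents; coefficients and their
# posterior spread are assumed for illustration only.
n_post = 2000
beta = rng.multivariate_normal(
    mean=[0.50, 0.030, -0.40],                        # intercept, temperature, equivalents
    cov=np.diag([0.01**2, 0.004**2, 0.05**2]), size=n_post)
sigma_res = np.abs(rng.normal(0.05, 0.01, size=n_post))   # residual sd posterior draws

spec_limit = 1.0      # Y must stay below this limit (the set A)
reliability = 0.999   # acceptance criterion R in Pr(Y in A | x, data) >= R

def failure_rate(temp_target, equiv_target, n_noise=200):
    """Failure rate at a set point, propagating posterior and noise-variable draws."""
    # Noise variables: actual plant values scatter around the set point.
    temp = rng.normal(temp_target, 1.0, size=(n_post, n_noise))
    equiv = rng.normal(equiv_target, 0.02, size=(n_post, n_noise))
    y_pred = (beta[:, [0]] + beta[:, [1]] * temp + beta[:, [2]] * equiv
              + rng.normal(0, sigma_res[:, None], size=(n_post, n_noise)))
    return np.mean(y_pred > spec_limit)

# Failure-rate map over a coarse grid of set points; the design space is the
# region where the predicted reliability meets the criterion.
for t in (20, 25, 30):
    for q in (1.0, 1.1, 1.2):
        fr = failure_rate(t, q)
        in_ds = (1 - fr) >= reliability
        print(f"T={t} C, equiv={q:.1f}: failure rate = {fr*1e6:8.0f} PPM, in design space = {in_ds}")
```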

Specification Setting and Long Term Stability

An area in which probabilistic assessment of the process and product relationship is anticipated to provide significant value is the workflow associated with setting specifications for products at the time of their validation or the filing of the new drug application, specifically the balance between the product’s release specification and the limits for shelf life. From a regulatory perspective, the accepted lower and upper limits for Assay (a measure of purity) are typically 95.0% and 105.0%, respectively, at release. The Limits for Shelf Life (LSL) vary depending upon the product’s stability profile. Endorsed models, outlined in ICH guidance (Q6A, Q6B, Q1D, Q1E),[13] are utilized to determine the limits proposed to the health authority, taking into account the degradation rate observed during Long Term Stability Studies (LTSS) and analytical method variation. A typical example is shown in Figure 3. In this
approach, the degradation kinetics, the variability in the initial conditions, and the analytical variability
are evaluated separately to arrive at a proposed LSL. This proposed LSL is based on an estimate of the lower 5th percentile of the distribution of values at release (Base Value), the lower 5th percentile of the confidence interval of the decay slope (95% LCLb), or zero if this value happens to be positive, and finally the error of the assay at an arbitrary confidence level (z1-α·σ̂assay).

LSL = Base Value + min{0, 95% LCLb} × expiry − z1−α × σ̂assay    (4)

Figure 3: Frequentist statistical analysis based on ICH Q1E to estimate the Limit for Shelf Life (LSL) for a target expiry date. The dashed blue line indicates the 95% limit for the worst-case slope (slope + ϵ). The dashed yellow line applies the worst-case slope to the base value (95% limit of the distribution of initial material purity). Finally, the dashed red line adds the assay error to estimate the LSL value by projecting to the 24-month time point.
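A worked example of equation (4) is shown below (Python/SciPy; the base value, slope confidence limit, assay standard deviation and expiry are hypothetical numbers chosen only to illustrate the arithmetic).

```python
from scipy import stats

# Hypothetical inputs (assumed for illustration, not from the article):
base_value = 99.0        # lower 5th percentile of release assay values (%)
lcl_b = -0.08            # 95% lower confidence limit of the decay slope (%/month)
expiry_months = 24       # target expiry
sigma_assay = 0.5        # assay standard deviation (%)
alpha = 0.05

z = stats.norm.ppf(1 - alpha)
lsl = base_value + min(0.0, lcl_b) * expiry_months - z * sigma_assay
print(f"Proposed lower shelf-life limit: {lsl:.2f}%")   # ~96.3% for these assumed inputs
```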

The proposed model takes into account the data (base value) together with the slope and its uncertainty, in addition to analytical variability, as an additive approach to determine the proposed limit. Unfortunately, one limitation of this model is that it does not provide a distribution and therefore cannot estimate the expected failure rate at either of the two quality gates (release and shelf life). Another problem of this approach is that the distribution of initial release values, the possible values of the degradation rate, and the analytical error are estimated independently; however, in practice these uncertainties are observed simultaneously.

The proposed LSL is often challenged by the health authority and a negotiated limit is agreed upon.
When the conventional statistical models indicate that there is risk that product released at 95.0% may
not meet the accepted LSL, a more restrictive internal limit will be imposed at release. The application
of the internal limit is intended to ensure that released product will always meet shelf life requirements.
However, increasing the internal limit increases the risk of batch failure at release while setting the
internal limit closer to the shelf life limit potentially increases the probability of an out-of-specification
(OOS) finding on shelf life. A probability model allows for quantification of this risk trade-off.

A Bayesian approach to this problem would develop a probabilistic model for the degradation kinetics
of the drug product to estimate the probability of OOS as a function of release value. In the case shown
in Figure 4, a probabilistic model was developed for the rate of impurity formation (measured as a
relative area percent [RAP] of the analytical HPLC trace) of the drug product as a function of the initial
level of the impurity and the storage temperature. The figure shows the probabilistic model for the 35 °C simulation only. In the time-dependent distribution associated with the 35 °C storage condition, 99% of the estimated population is above the LSL value after 250 days, indicating a nearly 100% failure rate for these conditions, as is apparent in the 2-D projection in the left graph of Figure 4.



Figure 4. Probabilistic evaluation of release limit vs. OOS risk. Left: Stability data and model for impurity generation. The lines represent the model means. For the 35 °C data, the Bayesian probabilistic model is shown as a density distribution. For this (35 °C) estimated distribution, 1% of the estimates are outside the red lines. Right: Contour map of the estimated probability of OOS for an LSL of 15 RAP as a function of initial impurity level and storage temperature. The corresponding estimate for the 35 °C case is indicated by the red square.

Analysis of the distribution of the impurity in the drug substance following the drug substance control strategy enables estimation of the failure rate of the drug product given a release limit for the drug substance. The resulting truncated distribution (with a maximum impurity level Io at the drug substance’s release value) is used to estimate the degradation level of the drug product following two years of storage at different temperatures. The resulting distributions are subsequently used to calculate the failure rate against an LSL, denoted as 15% in Figure 4. This information would be used to enable a rational decision based on the impact of a given failure rate at release against the probability of observing an OOS. The final limit would be based on a judgement of the perceived impact of each outcome.
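A Monte Carlo sketch of this release-limit versus shelf-life trade-off is shown below (Python/NumPy; the impurity distribution, release limit, degradation kinetics and their spreads are assumed for illustration and are not the data behind Figure 4). Draws of the release-level impurity, truncated at the release limit, are propagated through an assumed temperature-dependent growth rate to estimate the probability of an OOS result at the end of shelf life.

```python
import numpy as np

rng = np.random.default_rng(3)

# All numbers are assumed for illustration: impurity in relative area percent (RAP),
# a shelf-life limit of 15 RAP at 24 months, and a simple temperature-dependent
# growth rate whose spread stands in for posterior parameter uncertainty.
n_sim = 100_000
lsl, months = 15.0, 24
release_limit = 3.0                     # drug-substance release limit for the impurity (RAP)

# Initial impurity level: truncated at the release limit (batches above it are rejected).
initial = rng.normal(2.0, 0.8, size=n_sim)
initial = initial[(initial > 0) & (initial < release_limit)]

def oos_probability(temp_c):
    """Probability that the impurity exceeds the shelf-life limit after storage."""
    mean_rate = 0.08 * np.exp(0.20 * (temp_c - 25.0))        # RAP/month (assumed kinetics)
    rate = rng.normal(mean_rate, 0.15 * mean_rate, size=initial.size)
    final = initial + np.clip(rate, 0.0, None) * months
    return float(np.mean(final > lsl))

for temp in (5, 25, 35):
    print(f"{temp:2d} C storage: P(OOS at {months} months) = {oos_probability(temp):.3f}")
```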

Path to Broad Implementation across the Pharmaceutical Industry



As in other areas of modeling with applications to pharmaceutical development, we anticipate increased
use of quantitative analytics in all aspects of decision making across the full development timeline.
Monte Carlo methods, both as Markov-Chain Monte Carlo and in the direct random sampling of
distributions, are extremely powerful in the assessment of uncertainty of outcomes. Improvements in
sampling algorithms from Metropolis-Hastings to Gibbs sampling[52], No-U-Turn Sampler
(NUTS)[53], Metropolis-adjusted Langevin algorithm (MALA)[54] and many others, and in computing
power have resulted in much more accessible and easy-to-use software packages [55] such as Stan[56],
PyMC3[57, 58], JAGS[59], BUGS/WinBUGS/OpenBUGS [60, 61], and JASP [55, 62]. These
techniques have intuitive resonance for chemical engineers and as the associated software (both
proprietary and open source) becomes more accessible, we anticipate a much wider adoption across the
pharmaceutical industry. Naturally, despite the current wide accessibility of Bayesian-enabling
software, significant gaps need to be surmounted before a Bayesian formalism is universally adopted.
For instance, a great deal of the modeling that informs the pharmaceutical development is mechanistic
and is powered by highly specialized and powerful software such as gProms, Dynochem, COMSOL, Aspen, etc., which do not, at the moment, incorporate Bayesian capabilities as part of model regression and prediction.
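To give a sense of how accessible these packages have become, the sketch below uses PyMC3 (syntax assumed for PyMC3 version 3.8 or later; the assay data, priors and specification limit are hypothetical) to fit a normal model to a handful of batch results and estimate a predicted failure rate from posterior predictive samples.

```python
import numpy as np
import pymc3 as pm

# Hypothetical assay results (%) from a small number of batches, and an assumed
# lower specification limit.
data = np.array([98.9, 99.4, 99.1, 99.6, 99.2])
spec_lower = 98.0

with pm.Model():
    mu = pm.Normal("mu", mu=99.0, sigma=5.0)          # weakly informative priors (assumed)
    sigma = pm.HalfNormal("sigma", sigma=2.0)
    pm.Normal("y", mu=mu, sigma=sigma, observed=data)
    trace = pm.sample(2000, tune=1000, cores=1, random_seed=0)
    ppc = pm.sample_posterior_predictive(trace, random_seed=0)

# Fraction of posterior predictive draws below the specification, expressed in PPM.
failure_ppm = np.mean(ppc["y"] < spec_lower) * 1e6
print(f"Predicted failure rate: {failure_ppm:.0f} PPM")
```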

We believe that the use of these tools, in conjunction with appropriate data management (e.g., databases,
datalakes) is the foundation for reproducible research in modeling and data analysis within the context
of predictive risk analysis. [63] In short, reproducible research is enabled by a platform that captures the
data and analysis (software + scripts/models) and the computational environment in a way that allows any scientist to replicate, explore and share the analysis. Increasingly, platforms that enable reproducible research are hosted virtually in the cloud, further increasing flexibility, collaboration, and reproducibility. [64] However, there are still challenges that must be overcome for these tools to be widely adopted and for their use to have an impact on pharmaceutical development. As in other disciplines of data science, there is a danger of these tools being abused or misused [9, 10] due to lack of understanding or misunderstanding, ease of use, or inappropriate data.
These tools, powerful as they are, can merely complement the fundamental quantitative process characterization provided by the phenomenological understanding derived from mass balances, chemical thermodynamics, kinetics and transport phenomena, which is the basis of a sound chemical engineering
formalism. The pharmaceutical development community must maintain high scientific standards while
adopting these new technologies.

The engineering educational system plays a crucial role in preparing students to be productive
professionals, with a key skill set being the ability to analyze and interpret data. It is our opinion that there needs to be more exposure than the current basic statistical analysis applied to a simple design of experiments in the typical unit operations or laboratory classes. The educational experience should enable
engineers to be ready to rapidly learn, and then appropriately and thoughtfully apply these methods in
industry. We need to include data science content in the undergraduate and graduate curriculum for
chemical engineering to teach the fundamentals of statistics, coding, data visualization and modeling
philosophies, and to teach new engineers how to work with data (both large and small), in addition to
teaching the core chemical engineering subjects (thermodynamics, kinetics, mass transport, math, etc.).
Specifically, innovations in data science should be contextualized in any course that discusses mathematical modeling. Furthermore, we believe that concepts of data science are directly applicable
to applied lab courses, introducing the concepts of uncertainty, risk, and model-driven experimental
design. Much in the way J W Gibbs emphatically declared “mathematics is a language” as a justification
for the inclusion of mathematics in higher education [65], the prevalence of data science requires it to be
part of any quantitative endeavor, strongly suggesting that it should be part of any engineering
discipline. The capability to naturally associate uncertainty as part of predictive modeling is powerful as
it readily provides a direct assessment of the strength of the predictions without additional analysis. As
discussed above, there are many formalisms that enable model prediction uncertainty; however, we strongly feel that the Bayesian framework, unlike other approaches, is both accessible and intuitive [66].

Finally, regulatory agencies should encourage the use of probabilistic risk-based methods to foster the
adoption of quantitative modeling, and evolve from using traditional statistical approaches. Within the US, the International Consortium for Innovation and Quality (IQ) 2 and the FDA’s newly created
Emerging Technologies Team [67] are forums in which these topics can be discussed and debated pre-
competitively to bring about this evolution. Pharmaceutical companies will be hesitant to adopt new
methods for risk prediction, design space determination, etc., unless it is clear that the regulatory
agencies are willing to accept the change.

The adoption of more complex modeling, such as the Bayesian formalism presented here, will invariably result in a much more nuanced decision-making process, for which static visualizations will generally be insufficient to explore the model results. Interactive tools are rapidly gaining popularity as potential solutions that enable the end user to further explore the analysis for a specific purpose. These tools are also enabled by cloud solutions with persistent, elastic computing resources [10].

Where does Bayesian reliability modeling fit into the future of data science in pharmaceutical
development?

Since the introduction of the QbD principles by the regulatory agencies, the pharmaceutical
development community has developed workflows and methodologies to produce more data in a
structured fashion. Over the last ten years, the increased use of highly automated and robotic experimental techniques, high-throughput experiments, process analytical technology (PAT), and continuous processing across lab development and manufacturing has produced vast amounts of data spanning many projects and stages of development. However, we are of the opinion that the pharmaceutical development community has much greater scope and opportunity to take advantage of these data than is reflected in past practice. The application of advanced data science methods, including machine learning and artificial intelligence, has the potential to become disruptive, contingent on the

2
The International Consortium for Innovation and Quality in Pharmaceutical Development (IQ) is a not-for-profit
organization of pharmaceutical and biotechnology companies with a mission of advancing science-based and
scientifically driven standards and regulations for pharmaceutical and biotechnology products worldwide.
Today, IQ represents 39 companies. Additional information can be found at www.iqconsortium.org.



massive structured data sets that are generated through these automated workflows [10]. The Jensen group has recently demonstrated the potential of these technologies in chemical route selection [68] and automated optimization. [69] Scientists at Bristol-Myers Squibb have utilized unstructured data and developed algorithms to predict a compound’s synthetic complexity [70] and the process mass intensity (PMI) of a synthesis [71, 72]. However, these technologies have not matured to the point of widespread adoption in the pharmaceutical industry. The industry trend of more structured and bigger
data has started to enable the use of these technologies, and we will see the impact in the next ten years
[10]. Furthermore, advanced data science techniques will have an increased use in extracting useful and
usable data from unstructured data sets such as proprietary technical reports, electronic notebooks,
journal and patent literature, and other public or proprietary data sources (Figure 5). Figure 5a shows
the different groups of data used in the pharmaceutical development industry plotted by data type and
data volume. The industry trend is moving towards more structured and larger volume, represented by
the yellow arrow in the figure. Figure 5b shows a selection of data science methods that may be useful to
derive value from the type of data represented in Figure 5a.

In the context of robustness and probabilistic modeling, using these data can also be valuable. Historical
batch data from PAT, release testing, stability, and in-process control testing could be used to inform
priors for early phase projects, better identifying aspects like analytical variability for early model
development, which in turn can be used to guide experimental designs. Additionally, as introduced
above, manufacturing records can be used to generate the distributions for noise variables in models
(temperatures, charges, age times, etc.). Ideally, models that generate probability distributions for
critical quality attributes draw from a range of disciplines and departments, requiring companies to be
more integrated than in the past. This paradigm presents its own challenges and significant opportunities
for information technology integration, security, data management, communications and company
culture. Process development requires management of both epistemic and aleatory uncertainty; we see
an expanding role for chemical engineers to calculate and communicate probabilistic process metrics to
drive efficient and impactful process development.



Figure 5: (a) Schematic representation of the data type and data volume generated in the pharmaceutical industry. The
industry trend is towards larger data sets that are more structured, primarily because of the use of automation, high
throughput experimentation, and the use of PAT. (b) Some of the data science methods that may be useful to generate
useful insights and knowledge from the corresponding data in (a).

Acknowledgements

The authors would like to thank Jacob Albrecht, Brendan Mack, Joe Hannon, Andrew Bird, and Martin
Eastgate and the peer reviewers for their thoughtful, relevant and insightful suggestions on scope,
content, and presentation.



REFERENCES

1. McKenzie, P., et al., Can pharmaceutical process development become high tech? AIChE Journal, 2006.
52(12): p. 3990-3994.
2. ICH. Q8 (R2) Pharmaceutical Development. 2009; Available from:
https://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Quality/Q8_R1/Step4/Q8_R
2_Guideline.pdf.
3. Yu, X.L. and M. Kopcha, The future of pharmaceutical quality and the path to get there. International
journal of pharmaceutics, 2017. 528(1-2): p. 354-359.
4. Toda, F., K. Tanaka, and M. Yagi, Highly selective photoreactions of α-oxoamides and α-tropolone alkyl
ethers in crystalline inclusion complexes. Tetrahedron, 1987. 43(7): p. 1495-1502.
5. Variankaval, N., A.S. Cote, and M.F. Doherty, From form to function: Crystallization of active
pharmaceutical ingredients. AIChE Journal, 2008. 54(7): p. 1682-1688.
6. Ierapetritou, M., F. Muzzio, and G. Reklaitis, Perspectives on the continuous manufacturing of powder-
based pharmaceutical processes. AIChE Journal, 2016. 62(6): p. 1846-1862.
7. Jensen, K.F., Flow chemistry—microreaction technology comes of age. AIChE Journal, 2017. 63(3): p.
858-869.
8. Collins, P.C., Chemical engineering and the culmination of quality by design in pharmaceuticals. AIChE
Journal, 2018. 64(5): p. 1502-1510.
9. Beck, D.A., et al., Data science: Accelerating innovation and discovery in chemical engineering. AIChE
Journal, 2016. 62(5): p. 1402-1416.
10. Venkatasubramanian, V., The promise of artificial intelligence in chemical engineering: Is it here, finally?
AIChE Journal, 2019. 65(2): p. 466-478.
11. Yu, X.L., et al., Understanding pharmaceutical quality by design. The AAPS journal, 2014. 16(4): p. 771-
783.
12. Moore. Quality by Design – FDA Lessons Learned and Challenges for International Harmonization.
International Conference on Drug Development 2012 [cited 2018; Available from:
https://www.fda.gov/downloads/aboutfda/centersoffices/officeofmedicalproductsandtobacco/cder/uc
m341204.pdf.
13. Maafi, M. and W. Maafi, Montelukast photodegradation: Elucidation of Ф-order kinetics, determination
of quantum yields and application to actinometry. International Journal of Pharmaceutics, 2014. 471(1):
p. 544-552.
14. Glodek, M., et al., Process robustness—A PQRI white paper. Pharm. Eng, 2006. 26(6): p. 1-11.
15. Peterson, J.J., A posterior predictive approach to multiple response surface optimization. Journal of
Quality Technology, 2004. 36(2): p. 139-153.
16. Peterson, J.J., G. Miro-Quesada, and E. del Castillo, A Bayesian reliability approach to multiple response
optimization with seemingly unrelated regression models. Quality Technology & Quantitative
Management, 2009. 6(4): p. 353-369.
17. Peterson, J.J., A Bayesian approach to the ICH Q8 definition of design space. Journal of
biopharmaceutical statistics, 2008. 18(5): p. 959-975.



18. Peterson, J.J. and K. Lief, The ICH Q8 definition of design space: A comparison of the overlapping means
and the Bayesian predictive approaches. Statistics in Biopharmaceutical Research, 2010. 2(2): p. 249-
259.
19. Miro-Quesada, G., E. Del Castillo, and J.J. Peterson, A Bayesian approach for multiple response surface
optimization in the presence of noise variables. Journal of applied statistics, 2004. 31(3): p. 251-270.
20. Rajagopal, R., E. Del Castillo, and J.J. Peterson, Model and distribution-robust process optimization with
noise factors. Journal of Quality Technology, 2005. 37(3): p. 210-222.
21. Lebrun, P., et al., A Bayesian design space for analytical methods based on multivariate models and
predictions. Journal of biopharmaceutical statistics, 2013. 23(6): p. 1330-1351.
22. Peterson, J.J. and M. Yahyah, A Bayesian design space approach to robustness and system suitability for
pharmaceutical assays and other processes. Statistics in Biopharmaceutical Research, 2009. 1(4): p. 441-
449.
23. Albrecht, J., Estimating reaction model parameter uncertainty with Markov Chain Monte Carlo.
Computers & Chemical Engineering, 2013. 48: p. 14-28.
24. Tabora, J.E., Jacob Albrecht, Brendan Mack, Probabilistic Models For Forecasting Process Robustness, in
Chemical Engineering in the Pharmaceutical Industry, Active Pharmaceutical Ingredients, M.T.a.E. David
J. am Ende, Editor. 2019, John Wiley & Sons, Inc. p. 920-935.
25. Li, Y.F. and V. Venkatasubramanian, Leveraging Bayesian approach to predict drug manufacturing
performance. Journal of Pharmaceutical Innovation, 2016. 11(4): p. 331-338.
26. Kane, V.E., Process capability indices. Journal of quality technology, 1986. 18(1): p. 41-52.
27. Sullivan, L.P., Reducing variability: A new approach to quality. Quality Progress, 1984. 17(7): p. 15-21.
28. Peng, D.Y. Use Process Capability to Ensure Product Quality. in CDER/FDA IFPAC Annual Meeting. 2014.
Arlington, Virginia.
29. NIST/SEMATECH. e-Handbook of Statistical Methods. [cited 2019 June, 12]; Available from:
http://www.itl.nist.gov/div898/handbook/.
30. Schilling, E.G., Acceptance sampling, in Juran's Quality Control Handbook, J.M. Juran and F.M. Gryna, Editors. 1999, McGraw-Hill: New York.
31. Dudewicz, E.J., Basic statistical methods. Juran’s Quality Control Handbook, 1988: p. 1-121.
32. Gelman, A., et al., Bayesian data analysis. 2013: Chapman and Hall/CRC.
33. Sharma, O., et al., Six Sigma in pharmaceutical industry and regulatory affairs: A review. Journal of
Natura Conscientia, 2011. 2(1): p. 273-293.
34. Alasandro, M. and T.A. Little, Process and Method Variability Modeling to Achieve QbD Targets. AAPS
PharmSciTech, 2016. 17(2): p. 523-527.
35. Yu, L.X., Pharmaceutical quality by design: product and process development, understanding, and control. Pharmaceutical research, 2008. 25(4): p. 781-791.
36. Rantanen, J. and J. Khinast, The future of pharmaceutical manufacturing sciences. Journal of
pharmaceutical sciences, 2015. 104(11): p. 3612-3638.
37. Braem, A. and G. Turner, Applications of Quality Risk Assessments in Quality by Design (QbD) Drug Substance Process Development, in Chemical Engineering in the Pharmaceutical Industry: Active Pharmaceutical Ingredients, D.J. am Ende and M.T. am Ende, Editors. 2019, John Wiley & Sons, Inc. p. 1073-1090.

38. Fahmy, R., et al., Quality by design I: application of failure mode effect analysis (FMEA) and Plackett–
Burman design of experiments in the identification of “main factors” in the formulation and process
design space for roller-compacted ciprofloxacin hydrochloride immediate-release tablets. AAPS
PharmSciTech, 2012. 13(4): p. 1243-1254.
39. Hulbert, M.H., et al., Risk management in the pharmaceutical product development process. Journal of
Pharmaceutical Innovation, 2008. 3(4): p. 227-248.
40. De Beer, T., et al., Optimization of a pharmaceutical freeze-dried product and its process using an
experimental design approach and innovative process analyzers. Talanta, 2011. 83(5): p. 1623-1633.
41. Peeters, E., et al., Reduction of tablet weight variability by optimizing paddle speed in the forced feeder
of a high-speed rotary tablet press. Drug development and industrial pharmacy, 2015. 41(4): p. 530-539.
42. Stocker, E., et al., Use of mechanistic simulations as a quantitative risk-ranking tool within the quality by
design framework. International journal of pharmaceutics, 2014. 475(1-2): p. 245-255.
43. Debevec, V., S. Srčič, and M. Horvat, Scientific, statistical, practical, and regulatory considerations in
design space development. Drug development and industrial pharmacy, 2018. 44(3): p. 349-364.
44. Thomson, N.M., et al., Case studies in the development of drug substance control strategies. Organic
Process Research & Development, 2015. 19(8): p. 935-948.
45. Peterson, J.J., et al., Predictive distributions for constructing the ICH Q8 design space. Comprehensive
Quality by Design for Pharmaceutical Product Development and Manufacture, 2017: p. 55-70.
46. Castagnoli, C., et al., Application of quality by design principles for the definition of a robust
crystallization process for casopitant mesylate. Organic Process Research & Development, 2010. 14(6):
p. 1407-1419.
47. Halemane, K.P. and I.E. Grossmann, Optimal process design under uncertainty. AIChE Journal, 1983.
29(3): p. 425-433.
48. Swaney, R.E. and I.E. Grossmann, An index for operational flexibility in chemical process design. Part I:
Formulation and theory. AIChE Journal, 1985. 31(4): p. 621-630.
49. Grossmann, I.E. and M. Morari, Operability, resiliency, and flexibility: process design objectives for a
changing world. 1983.
50. Garcia-Munoz, S., et al., Definition of design spaces using mechanistic models and geometric projections
of probability maps. Organic Process Research & Development, 2015. 19(8): p. 1012-1023.
51. Laky, D., et al., An Optimization-Based Framework to Define the Probabilistic Design Space of
Pharmaceutical Processes with Model Uncertainty. Processes, 2019. 7(2): p. 96.
52. Arminger, G. and B.O. Muthén, A Bayesian approach to nonlinear latent variable models using the Gibbs
sampler and the Metropolis-Hastings algorithm. Psychometrika, 1998. 63(3): p. 271-300.
53. Hoffman, M.D. and A. Gelman, The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian
Monte Carlo. Journal of Machine Learning Research, 2014. 15(1): p. 1593-1623.
54. Girolami, M. and B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2011. 73(2): p. 123-214.
55. Mockus, L., et al., Batch-to-batch variation: a key component for modeling chemical manufacturing processes. Organic Process Research & Development, 2015. 19(8): p. 908-914.
56. Gelman, A., D. Lee, and J. Guo, Stan: A probabilistic programming language for Bayesian inference and
optimization. Journal of Educational and Behavioral Statistics, 2015. 40(5): p. 530-543.

57. Salvatier, J. and C. Fonnesbeck, PyMC3: Python probabilistic programming framework. Astrophysics
Source Code Library, 2016.
58. Salvatier, J., T.V. Wiecki, and C. Fonnesbeck, Probabilistic programming in Python using PyMC3. PeerJ
Computer Science, 2016. 2: p. e55.
59. Plummer, M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. in
Proceedings of the 3rd international workshop on distributed statistical computing. 2003. Vienna,
Austria.
60. Sturtz, S., U. Ligges, and A. Gelman, R2OpenBUGS: a package for running OpenBUGS from R. URL http://cran.r-project.org/web/packages/R2OpenBUGS/vignettes/R2OpenBUGS.pdf, 2010.
61. Lunn, D.J., et al., WinBUGS-a Bayesian modelling framework: concepts, structure, and extensibility.
Statistics and computing, 2000. 10(4): p. 325-337.
62. Marsman, M. and E.-J. Wagenmakers, Bayesian benefits with JASP. European Journal of Developmental
Psychology, 2017. 14(5): p. 545-555.
63. Waltemath, D. and O. Wolkenhauer, How modeling standards, software, and initiatives support
reproducibility in systems biology and systems medicine. IEEE Transactions on Biomedical Engineering,
2016. 63(10): p. 1999-2006.
64. Stodden, V., F. Leisch, and R.D. Peng, Implementing reproducible research. 2014: CRC Press.
65. Silver, D.S., The New Language of Mathematics: Is it possible to take all words out of mathematical
expressions? American Scientist, 2017. 105(6): p. 364-372.
66. Knill, D.C. and A. Pouget, The Bayesian brain: the role of uncertainty in neural coding and computation.
TRENDS in Neurosciences, 2004. 27(12): p. 712-719.
67. O’Connor, T.F., L.X. Yu, and S.L. Lee, Emerging technology: A key enabler for modernizing pharmaceutical manufacturing and advancing product quality. International journal of pharmaceutics, 2016. 509(1-2): p. 492-498.
68. Coley, C.W., W.H. Green, and K.F. Jensen, Machine learning in computer-aided synthesis planning.
Accounts of chemical research, 2018. 51(5): p. 1281-1289.
69. Bédard, A.-C., et al., Reconfigurable system for automated optimization of diverse chemical reactions.
Science, 2018. 361(6408): p. 1220-1225.
70. Li, J. and M.D. Eastgate, Current complexity: a tool for assessing the complexity of organic molecules.
Organic & biomolecular chemistry, 2015. 13(26): p. 7164-7176.
71. Li, J., et al., Evolving Green Chemistry Metrics into Predictive Tools for Decision Making and
Benchmarking Analytics. ACS Sustainable Chemistry & Engineering, 2017. 6(1): p. 1121-1132.
72. Li, J., E.M. Simmons, and M.D. Eastgate, A data-driven strategy for predicting greenness scores,
rationally comparing synthetic routes and benchmarking PMI outcomes for the synthesis of molecules in
the pharmaceutical industry. Green Chemistry, 2017. 19(1): p. 127-139.
