Multi-Scale Chemical Product Design Using The Reverse Problem Formulation

20th European Symposium on Computer Aided Process Engineering – ESCAPE20
S. Pierucci and G. Buzzi Ferraris (Editors)

© 2010 Elsevier B.V. All rights reserved.
Charles C. Solvason, Nishanth G. Chemmangattuvalappil, Mario R. Eden
Department of Chemical Engineering, Auburn University, Auburn, AL 36849, USA
Abstract
The main objective of this research is to extend the reverse problem formulation
algorithm to include aspects of multiple length scales, thereby creating a framework
where product synthesis, design, and optimization can be achieved with significantly
reduced computational time. To achieve this objective, a centralized framework
was developed that combines property clustering with chemometric techniques such as
principal component analysis (PCA) and partial least squares projection to latent structures
(PLS) to solve the design problem in a property descriptor sub-domain. Information
from the molecular scale on short range order, such as group structure, conformation,
and stereoregularity, is combined with information from the mesoscale on long range
order, such as the particle size, to design a set of alternative excipients that exhibit
superior properties for use in an acetaminophen tablet without the use of parallel, grid,
or supercomputing techniques.
Keywords: Reverse problem formulation, Property clustering, Multi-scale design
1. Background
The U.S. National Research Council has recently published a report on an emerging
paradigm in product design focused on integrated computational materials engineering
(CICME, 2008). In the report, the committee noted that in order to alleviate the strain
put on U.S. manufacturers from the swiftly changing and increasingly global
marketplace, integrated design closely coupling computational models with
manufacturing processes would be required. The term ‘integrated’ recognizes that the
properties of products are controlled by a multitude of separate and often competing
mechanisms that operate over a wide range of length and time scales. It is the linkage
of the scales that remains the ‘Grand Challenge’ (CICME, 2008). The traditional
method of linking the various length-scales has been to compute information at smaller
scales and pass it to models at larger scales by removing degrees of freedom (coarse-
graining) with the objective being to predict macroscopic properties from molecular
information (Fermeglia and Pricl, 2009). While often the most accurate method for
predicting properties, simulation utilizes a priori knowledge of the molecular
architecture and the computational cost of such a hierarchical nesting method typically
prevents an accurate modeling of mesoscopic structure such as the morphology of
polymers (Fermeglia and Pricl, 2009). Furthermore, when this method is integrated
within the product-process design framework (Hill, 2004), the computational
intensiveness exponentially increases since each projected molecular architecture must
be simulated to determine its physico-chemical properties. To minimize the
computational cost in these types of problems, property prediction simulations are
typically approximated with structure descriptor models based on the property-
molecular architecture relationships. These descriptor based models can be of the group
contribution type (GCM) that directly relate the presence of functional groups to
thermodynamic intensive properties, GCM that describe a set of core properties from
which other thermodynamic properties are calculated, topological indices from which
QSPR/QSAR models can be used to describe properties, case-specific characterizations
that describe molecular architecture which can be empirically related to macro-scale
properties, and/or many other types.
Since it is often the non-linear nature of these property models that complicates design
problems, resulting in MINLP formulations which require tedious solution strategies,
one way to reduce the complexity of the problem is to use the reverse problem
formulation (Gani and Pistikopoulos, 2002; Eden et al., 2004). In this solution strategy,
the complexity of the problem is reduced by using the duality of linear programming to
reformulate the design problem as a series of reverse problems solved in the property
domain. Essentially, the target properties can be determined from the solution of the
reverse simulation problem and, as long as these targets are matched, any number of
property models may be used to generate the molecular architectures at the various
scales to ensure a solution (Eden et al., 2004). The advantage of using reverse problem
formulation in multi-scale product design is that the problem complexity is significantly
reduced by decoupling the constitutive property models from the design.
2. Concept
Chemical product design is a relatively new concept in the systems community which
has shifted focus from developing only the physical form, function, and aesthetics of
assembled products to the design of chemically formulated products (Hill, 2004).
Chemically formulated products are products designed at the molecular, quantum-,
nano-, meso-, macro-, and mega-scale levels to deliver a specific desired attribute.
Examples of formulated products include personal care products (Hill, 2004), nano-
structured materials (Fermeglia and Pricl, 2009) and many other product types. Hill
(2004) notes that the objective of computational product design is to guide and focus
experimentation, and CICME (2008) notes that, because of a lack of robust design
models, complementary experimental and theoretical approaches are of profound
importance. Combining these observations with the reflection that the requisite
properties will be consumer attributes (since consumer preference drives the value of a
product) it follows that the relationship between the underlying fundamental physical-
chemical properties and the consumer attributes will most likely involve empirical
relationships. Based on these observations, the objective of this work is to utilize
chemometrics to demonstrate how to deconstruct the computational difficulty of a
multi-scale product design problem by solving it in a latent property sub-domain under
reduced dimensionality.
3. Method
Chemometrics are used to illustrate the solution concept shown in Fig. 1 where a set of
user defined attributes are mapped to the latent sub-domain so that simultaneous
molecular and meso-scale design can take place. The solution method is as follows:
1. Characterization is used to describe the molecular and micro-structures
2. Decomposition (e.g., principal component analysis, PCA) is used to find the latent
property sub-domain and its relationship with the descriptor properties
3. Design of experiments (DOE) and partial least squares projection to latent
structures (PLS) are used to determine the attribute-latent property relationship
4. Reverse problem formulation (RPF) and property clustering are used to map
the molecular, meso, and macro-scale domains into the latent sub-domain
5. Simultaneous mixture, molecular, and meso-scale particle size design is
accomplished via characterization based group contribution methods and
mixture analysis.
[Figure 1 diagram not reproduced. Recoverable labels: Molecular Structure Characterization, Microstructure Characterization, Other Scale-Based Characterizations; Reparameterization & Scale Mapping; Property-Structure Relationship; Property-Mapping; Attribute Targets; Attribute-Property Relationship; Standardization; Design (TARGET).]
Figure 1. Solution concept for simultaneous multi-scale product design
3.1. Characterization
Characterization is a class of tools associated with the determination of not only
chemical constituency or molecular structure, but also of larger structural characteristics
describing the orientation and alignment of these molecules (often called microstructure
at the meso-scale). Some examples of characterization techniques include nuclear
magnetic resonance (NMR), x-ray diffraction (XRD), and infrared spectroscopy (IR).
The techniques are applied to a training set of molecules defined by an experimental
design used to explore the interesting facets of a set of property attributes $A_{n \times q}$. A more
detailed description of the types of characterization techniques utilized in this method
can be found in Solvason et al. (2009a) and Gabrielsson et al. (2002).
3.2. Decomposition
The most common decomposition technique is principal component analysis (PCA). By
definition, PCA uses the variance-covariance structure to compress the property data to
principal component data that contains much of the system variability. This result also
improves the interpretation of the data structure by consolidating multiple property
descriptors $P_{n \times p}$ into single, underlying latent variables which are devoid of co-linearity.
Using the inverse definition of the latent property substructure, the loadings $L_{p \times m}$
represent the underlying latent properties, the scores $T_{n \times m}$ are mixtures of those
properties, and the property descriptors are the weights of the underlying latent
variables:
$$T_{n \times m} = P_{n \times p} L_{p \times m} \tag{1}$$
This structure can be normalized through the use of a standardization matrix to ensure
that the property descriptors are weighted between 0 and 1. A detailed description of
this analysis can be found in Solvason et al. (2009b,c).
$$Q_{n \times m} = R_{n \times p} L_{p \times m} \tag{2}$$
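As a rough illustration of the decomposition step, the sketch below extracts the first principal component of a small descriptor matrix by power iteration on its variance-covariance matrix, then projects the centered data onto that loading to obtain scores, mirroring the scores = descriptors x loadings structure of Eq. 1. The data values, the function names, and the choice of power iteration (rather than whichever PCA routine was actually used in this work) are assumptions made purely for illustration.

```python
import math

def column_center(P):
    """Subtract each column mean so PCA works on deviations from the mean."""
    n, p = len(P), len(P[0])
    means = [sum(row[k] for row in P) / n for k in range(p)]
    return [[row[k] - means[k] for k in range(p)] for row in P]

def covariance(Pc):
    """Sample variance-covariance matrix (p x p) of centered data."""
    n, p = len(Pc), len(Pc[0])
    return [[sum(row[a] * row[b] for row in Pc) / (n - 1) for b in range(p)]
            for a in range(p)]

def first_loading(S, iters=500):
    """Dominant eigenvector of S by power iteration, scaled to unit length."""
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(S[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical descriptor matrix P (n = 5 samples, p = 3 descriptors).
P = [[2.0, 4.1, 1.0],
     [3.0, 6.0, 1.4],
     [4.0, 7.9, 2.1],
     [5.0, 10.2, 2.4],
     [6.0, 11.8, 3.1]]

Pc = column_center(P)
S = covariance(Pc)
L1 = first_loading(S)                                    # loading vector (p x 1)
T1 = [sum(r[k] * L1[k] for k in range(3)) for r in Pc]   # scores (n x 1), T = P L

# Fraction of the total variance captured by the first latent variable.
score_var = sum(t * t for t in T1) / (len(T1) - 1)
total_var = sum(S[k][k] for k in range(3))
explained = score_var / total_var
print(f"loading = {[round(x, 3) for x in L1]}, explained = {explained:.3f}")
```

Because the three made-up descriptors are nearly collinear, a single latent variable captures almost all of the variance, which is the kind of compression the decomposition step relies on.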
3.3. DOE & PLS

The relationship between the principal component scores $T_{n \times m}$ and the attribute
properties $A_{n \times q}$ is then developed using a PLS model of a new DOE factorial design
where the scores are varied between their high (+1) and low (-1) levels. A detailed
description of the analysis can be found in Gabrielsson et al. (2002).
$$A_{n \times q} = T_{n \times m} B_{m \times q} \tag{3}$$
where $B_{m \times q}$ are the regressed coefficients found using PLS. It should be noted that the
PLS model uses a separate set of scores and loadings to develop the relationship
between $A_{n \times q}$ and $T_{n \times m}$.
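To make the structure of Eq. 3 concrete, the following sketch builds a two-level full factorial design in three scores and fits the coefficient matrix by ordinary least squares. OLS is used here only as a simple stand-in for PLS; for an orthogonal design with levels of -1 and +1, the normal equations reduce to B = T'A / n. The coefficient values, the synthetic responses, and all function names are hypothetical.

```python
from itertools import product

def full_factorial(m):
    """All 2**m runs with each score at its low (-1) or high (+1) level."""
    return [list(levels) for levels in product((-1.0, 1.0), repeat=m)]

def fit_orthogonal(T, A):
    """Least-squares B (m x q) for A = T B when the columns of T are orthogonal +/-1."""
    n, m, q = len(T), len(T[0]), len(A[0])
    return [[sum(T[i][k] * A[i][j] for i in range(n)) / n for j in range(q)]
            for k in range(m)]

def predict(T, B):
    """Matrix product T (n x m) times B (m x q)."""
    return [[sum(t[k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for t in T]

# Synthetic "true" coefficients (m = 3 scores, q = 2 attributes); not from the paper.
B_true = [[1.5, -0.2],
          [0.4, 0.9],
          [-0.7, 0.3]]

T = full_factorial(3)            # 8 runs x 3 scores
A = predict(T, B_true)           # noise-free synthetic attribute responses
B_hat = fit_orthogonal(T, A)     # recovered coefficients

for row in B_hat:
    print([round(b, 3) for b in row])
```

With noise-free responses the fitted coefficients reproduce the synthetic ones exactly; with measured attributes, a PLS routine would replace `fit_orthogonal`.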
3.4. Property Clustering
To visually represent the latent property subspace, property clustering is used to
deconstruct the design problem into a Euclidean vector in the cluster domain and a
scalar called the Augmented Property Index (AUP). The clusters themselves are
conserved surrogate properties described by property operators, which have linear
mixing rules, even if the operators themselves are nonlinear. Methods for the
application of group contribution methods for molecular design have previously been
developed using property clustering by Eljack et al. (2007) and Solvason et al. (2009b).
To utilize the latent variables in the property clustering algorithm, it is important to
recognize that the data structure of Eq. 2 follows a linear mixing rule. By extension,
the subspace properties of other complete molecules, molecular groups, or microstructures,
$Q_{n \times m}$, $Q_{r \times m}$, and $Q_{s \times m}$, can be found by multiplying the latent variables $L_{p \times m}$ by the
associated fractions $R_{n \times p}$, $R_{r \times p}$, and $R_{s \times p}$, respectively.
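A minimal sketch of the clustering step follows. It assumes the usual property clustering definitions, in which the AUP is the sum of the dimensionless property operators and each cluster is an operator divided by the AUP, and it treats the subspace values directly as those operators. The numerical values and function names are hypothetical. The sketch also checks that mixing in the operator domain and lever-arm mixing in the cluster domain give the same point.

```python
def to_clusters(omega):
    """Split dimensionless operators into (cluster vector, AUP); assumes AUP > 0."""
    aup = sum(omega)
    return [w / aup for w in omega], aup

def mix_operators(fractions, omegas):
    """Linear mixing rule: the mixture operator is the fraction-weighted sum."""
    return [sum(x * om[j] for x, om in zip(fractions, omegas))
            for j in range(len(omegas[0]))]

def mix_clusters(fractions, omegas):
    """Lever-arm form of the same rule, written in the cluster domain."""
    parts = [to_clusters(om) for om in omegas]
    aup_mix = sum(x * aup for x, (_, aup) in zip(fractions, parts))
    betas = [x * aup / aup_mix for x, (_, aup) in zip(fractions, parts)]
    c_mix = [sum(b * c[j] for b, (c, _) in zip(betas, parts))
             for j in range(len(omegas[0]))]
    return c_mix, aup_mix

# Hypothetical operator values for two constituents (three latent properties each).
omega_a = [1.2, 1.8, 0.6]
omega_b = [1.8, 0.9, 0.3]
x = [0.4, 0.6]                       # mixture fractions, summing to one

c_direct, aup_direct = to_clusters(mix_operators(x, [omega_a, omega_b]))
c_lever, aup_lever = mix_clusters(x, [omega_a, omega_b])
print([round(c, 4) for c in c_direct], round(aup_direct, 4))
```

The cluster vector always sums to one, so it can be plotted on a cluster diagram, while the AUP carries the magnitude information that the normalization removes.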
3.5. Simultaneous Design
Since the molecular group and particle subspace property relationships in Eqs. 1-3 were
derived using a decomposition technique, the constraints imposed by decomposition
should also be observed for any new molecular architectures created.
$$Q_{j \times m} = Z_{j \times r} Q_{r \times m}, \qquad j = n, r, s, \ldots, x \tag{4}$$
For example, the molecular design of Eq. 4 is a representation of a linear mixture of the
underlying latent variable subspace properties, all of which are linear in nature; it is
essentially a linear mixture of linear mixtures. This observation assumes that any
nonlinearity in the attribute system is handled by the attribute-latent property
relationship and not the molecule-group subspace property relationship (Muteki and
MacGregor, 2006).
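The "linear mixture of linear mixtures" reading of Eq. 4 can be sketched with plain matrix products. In the example below, group subspace properties are first built from descriptor fractions and latent variables (as in Eq. 2) and then combined into candidate molecules using a matrix of group counts. Every number is a hypothetical placeholder, and interpreting the rows of Z as group counts is an assumption.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[k] * Y[k][c] for k in range(len(Y))) for c in range(len(Y[0]))]
            for x in X]

# All numbers below are hypothetical placeholders, not values from the paper.
L = [[0.8, 0.1],               # latent variables (p = 3 descriptors x m = 2)
     [0.5, 0.6],
     [0.2, 0.9]]
R_groups = [[0.7, 0.2, 0.1],   # descriptor fractions for r = 3 molecular groups
            [0.1, 0.6, 0.3],
            [0.3, 0.3, 0.4]]
Z = [[2.0, 0.0, 1.0],          # assumed group counts for 2 candidate molecules (j x r)
     [1.0, 1.0, 1.0]]

Q_groups = matmul(R_groups, L)        # group subspace properties (r x m), as in Eq. 2
Q_molecules = matmul(Z, Q_groups)     # candidate molecules (j x m), as in Eq. 4

# Same result when the groups are mixed first and projected afterwards.
Q_alt = matmul(matmul(Z, R_groups), L)
print([[round(v, 3) for v in row] for row in Q_molecules])
```

Because both steps are linear, the order of mixing and projection does not matter, which is why new architectures can be assembled directly in the latent subspace.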
4. Case Study: Pharmaceutical Tablet Design

The case study discussed in this work is an extension of the review published by
Solvason et al. (2009a), but specifically focused on simultaneous molecular and particle
size design of pharmaceutical excipients. Three attributes that are important to direct
compression tablet manufacturing are disintegration time, crushing strength, and
ejection force. These attributes have been notoriously difficult to analyze based on
traditional mixing design because of the complex and highly nonlinear nature of
pharmaceutical excipients. In order to characterize the molecular architecture that
contributes to these attributes, they are mapped down to a latent domain subspace where
they can be approximated as linear combinations of molecular group and particle size
parameters. The domain subspace was found to be characterized by three latent
properties using a training set of 24 excipients $P_{n \times p}$, which primarily consisted of
mannitol, maltitol, xylitol, maltodextrin, dextrin, isomalt, and microcrystalline cellulose
from various suppliers. To reduce the number of parameters in the subspace,
decomposition was performed using PCA. It was found that the 1st latent property
explained 96% of the variance in the data; nevertheless, all three latent properties were kept for illustrative
purposes. Using PLS models developed from the training set, a consumer specific set
of targets $A_{n \times q}$ was mapped to the domain subspace $Q_{n \times m}$ as shown in Table 1.
Table 1. Attribute targets mapped to the descriptor sub-domain
Subspace Targets     Q1      Q2      Q3
Upper limit (UL)     2.00    2.00    1.00
Lower limit (LL)     1.00    0.00    0.00
The molecular groups present in the training set and identifiable by the characterization
were CH, CH2, OH, CHO, O, CH2-O, CHOH, CH2OH, CHCH2OH, CHCHO, α-pyranose,
Ocyc, β-pyranose, and cellulose. The last two groups were only present in the
microcrystalline cellulose excipient and, as such, were excluded from the short chain
molecular design. In the previous molecular design using these molecular groups, the
selection of the α stereoisomer of pyranose compounds as candidates was preferred,
while the microcrystalline cellulose (MCC), which relies on the β stereoisomer, was
notably outside the target range (Solvason et al., 2009a). This was an unexpected result
since MCC is not only one of the most common excipients used in acetaminophen
tablets, but also contained the most information on long-range order when
characterized. In order to better understand this conclusion, a simple investigation of
the meso-scale micro-structure was conducted by analyzing the influence of particle
size (50-700 μm) on the attribute properties.
There are two ways to illustrate the influence of particle size on the design problem.
The first is to construct an average IR/NIR characterization for each particle size.
Although this is a mathematically valid way to visualize the particle size effect, a
particle size ‘IR/NIR spectrum’ has no physical meaning. A more appropriate method is
to observe the effect of particle size on the individual chemical constituents in the
model. Since increasing particle size results in larger absorbances (and smaller
transmittances), the primary effect was an increase in the AUP values of the training set
molecules and individual groups. A smaller, secondary effect, due to a disproportionately
strong response in the 4500-6000 cm⁻¹ range for polyols (Storz and Steffens,
2004), slightly shifted the locations of the training set and molecular group
clusters (Fig. 2). Lever arm analysis on the average and individual particle sizes,
adjusted for positive AUPs (not shown), suggests that the final particle size should be in
the 180-200 μm range. Reparameterizing the molecular design for this particle size
results in the updated candidates shown in Table 2. In comparison to the candidate list
published by Solvason et al. (2009a), some of the α-pyranose candidates are no longer
preferred, indicating that as the particle size is increased, the orientation of the
glycosidic linkages has a diminishing effect on the product attributes.
Table 2. Designed molecules
Candidate Molecules         Q1      Q2      Q3
CH2OH-CH2OH                 1.25    1.94    0.91
CH2OH-CH2-O-CH2OH           1.78    1.97    0.54
CH2OH-O-CH2OH               1.78    1.93    0.52
OH-CH2OH                    1.19    0.94    0.32
OH-(α)pyranose-CH2OH        1.82    0.97    0.66
CH2OH-CH(OH)-OH             1.76    0.92    0.19
OH-CH(CH2OH)-CH2OH          1.77    1.30    0.33
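The candidates in Table 2 can be checked against the subspace limits in Table 1 with a simple screen. The sketch below uses the tabulated values as given; treating feasibility as an independent interval check on each of Q1 to Q3 is a simplifying assumption, since the design itself is carried out on the cluster diagram, and the function and variable names are hypothetical.

```python
# Subspace target limits from Table 1 (Q1, Q2, Q3).
LOWER = (1.00, 0.00, 0.00)
UPPER = (2.00, 2.00, 1.00)

# Candidate molecules and their subspace properties from Table 2.
CANDIDATES = {
    "CH2OH-CH2OH": (1.25, 1.94, 0.91),
    "CH2OH-CH2-O-CH2OH": (1.78, 1.97, 0.54),
    "CH2OH-O-CH2OH": (1.78, 1.93, 0.52),
    "OH-CH2OH": (1.19, 0.94, 0.32),
    "OH-(α)pyranose-CH2OH": (1.82, 0.97, 0.66),
    "CH2OH-CH(OH)-OH": (1.76, 0.92, 0.19),
    "OH-CH(CH2OH)-CH2OH": (1.77, 1.30, 0.33),
}

def within_targets(q, lower=LOWER, upper=UPPER):
    """True when every subspace property lies inside its [lower, upper] window."""
    return all(lo <= v <= hi for v, lo, hi in zip(q, lower, upper))

feasible = {name: within_targets(q) for name, q in CANDIDATES.items()}
for name, ok in feasible.items():
    print(f"{name:24s} {'inside' if ok else 'outside'} target window")
```

Under this interval check, every tabulated candidate falls inside the target window, consistent with their listing as designed molecules.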
[Figure 2 plot not reproduced. Cluster diagram in C1, C2, and C3 with particle-size labels on the plotted points; legend entries: C1 zero, C2 zero, C3 zero, Target Feasibility Region, Property Loadings, Mannitol, MCC, HDMC, Xylitol, Xylitab, Maltitol, Avg. Part. Size.]
Figure 2. Design cluster diagram featuring particle sizes in μm.
5. Conclusions
In summary, visualizing the reverse problem formulation using the combination of
property clustering and chemometrics provides a framework to solve property driven
processes without commitment to components and/or structures a priori. In particular,
it has been shown that using multivariate characterization techniques to describe both
the molecular and micro-structure of excipient particles combined with PCA and PLS to
map the attribute data into a latent property subspace provides insight into the structure
of pharmaceutical tablets. Planned further analysis is intended to show that this interdependency
can be exploited for simultaneous molecular and microstructure design (including
polymorphism), circumventing the heavy combinatorial expense typically
associated with solving problems at the meso-scale.
References
Committee on Integrated Computational Materials Engineering, National Research Council
(2008). The National Academies Press, USA.
M.R. Eden, S.B. Jørgensen, R. Gani, and M. El-Halwagi (2004). Chem. Eng. Proc., 43.
F.T. Eljack, M.R. Eden, V. Kazantzi, and M.M. El-Halwagi (2007). AIChE Journal, 53(5).
M. Fermeglia and S. Pricl (2009). Comp. & Chem. Eng., 33(10).
J. Gabrielsson, N. Lindberg, and T. Lundstedt (2002). J. Chemometrics, 16.
R. Gani and E.N. Pistikopoulos (2002). Fluid Phase Equilibria, 194-197.
M. Hill (2004). AIChE Journal, 50(8), 1656-1661.
K. Muteki and J.F. MacGregor (2006). Chemom. & Intell. Lab. Sys., 85, 186-194.
C.C. Solvason, N.G. Chemmangattuvalappil, and M.R. Eden (2009a). Comp. Aided Chem. Eng., 26.
C.C. Solvason, N.G. Chemmangattuvalappil, and M.R. Eden (2009b). Comp. & Chem. Eng., 33(5).
C.C. Solvason, N.G. Chemmangattuvalappil, F.T. Eljack, and M.R. Eden (2009c). I&EC Res., 48(4).
E. Storz and K. Steffens (2004). Starch/Stärke, 56.
