
www.acsami.org Research Article

Predicting Young’s Modulus of Linear Polyurethane and
Polyurethane−Polyurea Elastomers: Bridging Length Scales with
Physicochemical Modeling and Machine Learning
Joseph A. Pugar, Calvin Gang, Christine Huang, Karl W. Haider, and Newell R. Washburn*
Cite This: ACS Appl. Mater. Interfaces 2022, 14, 16568−16581 Read Online


ABSTRACT: Predicting the properties of complex polymeric
materials based on monomer chemistry requires modeling physical
interactions that bridge molecular, interchain, microstructure, and
bulk length scales. For polyurethanes, a polymer class with global
commercial and industrial significance, these multiscale challenges
are intrinsic due to the thermodynamic incompatibility of the
urethane and polyol-rich domains, resulting in heterogeneities
from molecular to microstructural length scales. Machine learning
can model patterns in data to establish a relationship between the
monomer chemistry and bulk material properties, but this is made
difficult by small data sets and a diverse set of monomers. Using a
data set of 63 industrially relevant and complex elastomers, we
demonstrate that accurate machine learning predictions are possible when monomer chemistry is used to estimate interactions at
interchain length scales. Here, these features were used to accurately (r² = 0.91) predict the Young’s modulus of polyurethane and
polyurethane−urea elastomers. Furthermore, by a query of the trained model for compositions that yield a target modulus within the
range of accessible values, the capabilities of using this methodology as a design tool are demonstrated. The presented methodology
could become increasingly useful in building models for materials with small data sets and may guide the interpretation of the
underlying physicochemical forces.
KEYWORDS: Young’s modulus, thermoplastic polyurethane, machine learning, small datasets, multiscale modeling

1. INTRODUCTION
An unmet challenge in polymer chemistry and polymer physics is predicting bulk material properties solely on the basis of a knowledge of the monomers and the synthesis conditions, particularly in predicting properties that emerge across multiple length scales. Previous multiscale modeling of both amorphous and semicrystalline polymers has demonstrated the ability to accurately describe behaviors from the atomic to the macroscale,1−3 and strategies such as quantum chemical calculations, molecular dynamics simulations, and continuum mechanics treatments have provided substantial insight into the physicochemical basis of many scale-specific phenomena. However, the primary challenge remains the integration of these models, in which the predictions must accurately cascade from molecular constituents to bulk physical properties.
In modeling the multiscale phenomena in polymeric materials, polyurethanes (PUs) present a particularly significant challenge. PUs are structurally diverse and tunable to many applications, including their use in adhesives, coatings, rigid and flexible foams, and elastomers in the automotive, furniture, insulation, construction, sporting goods, and medical device industries, among others. However, the diversity of material properties possible for PUs is due to the formation of heterogeneous and multiphase copolymers. As a result, the variety of competing physicochemical forces within the material leads PUs to be fundamentally dissimilar from their polyolefin counterparts, making them an important system for the development of modeling methodologies.4−6 The microstructures of resulting PUs will in part dictate the mechanical response of the bulk material to an external stress or load, and here the material’s Young’s modulus (E) is used as an important property that is modeled. While there have been numerous works on modeling the linear elasticity and/or viscoelastic mechanical responses of polymers with continuum mechanics, many of these models have been derived and validated for simpler single-phase homopolymer (polyolefin, polyester, etc.) systems and thus have limitations in the treatment of complex heteropolymers.7,8

Received: December 21, 2021
Accepted: March 14, 2022
Published: March 30, 2022

© 2022 American Chemical Society https://doi.org/10.1021/acsami.1c24715


16568 ACS Appl. Mater. Interfaces 2022, 14, 16568−16581
ACS Applied Materials & Interfaces www.acsami.org Research Article

Figure 1. Polyurethane elastomer microstructure illustrated as a colored schematic. The legend in the bottom right corner of the figure identifies
each unique chemical structure in the polymer morphology.

In this study, we focus on one particular class of PU, linear polyurethane and polyurethane−polyurea elastomers. In the interest of brevity, we will at times refer to these systems simply as polyurethanes or PUs, but it should be recognized that, in cases where a diamine chain extender was employed, they are actually polyurethane−polyureas. Like many PU systems, these materials are incompatible with many theoretical investigations of viscoelasticity because they do not have a defined homogeneous reference state and microphase segregation alone has not been shown to adequately describe the elastic mechanical response of the system. Other approaches have used neo-Hookean models with series of Maxwell elements aimed to combine the differing mechanical responses, but these do not incorporate the thermodynamic forces known to be present between the hard and soft segments.9,10 The mechanical responses in polyurethanes stem from both enthalpic and entropic forces and consequently yield a distribution of relaxation and retardation time scales.11 However, the development of a model which is parametrized by experimentally tunable variables, such as the chemical structure of the polymer itself, that adequately captures the complex modalities which exist on different length scales and accurately predicts bulk mechanical response has not yet been demonstrated.
Machine learning (ML) introduces the capabilities needed to leverage the multiscale modeling of materials and has become an increasingly popular strategy in the field of material science.12,13 ML can be used to build predictive models from sets of training data and learn behaviors and trends in materials. The conversion of structural information into chemical and physical parameters to predict material properties was developed long before the use of data-driven techniques.14,15 Transformations through functional group contribution methods and the generation of quantitative structure−property relationships (QSPR) have been used to predict the thermodynamic and electronic properties of materials accurately.16−18 More recently, Ramprasad et al. have presented models that predict bulk polymer properties such as the glass transition temperature (Tg) and band gap (Eg) and have captured influential trends such as side-chain length and aromaticity degree.19,20 In addition to informatics, other descriptor sources such as quantum chemical calculations, thermodynamic calculations, and experimental/process variables have also been used to parametrize training data sets spanning atomic, micro, and bulk scales.21−23 These additional strategies can offer a more precise approximation of parameters and can provide insight into material trends through varying driving forces and/or using different scales. However, traditional ML methodologies often rely on large data sets, and the resulting models are built from a significant number of unique material instances. Most of these works used data sets of 150+ unique data points (and in some cases 500+), which can be time-intensive when traditional experimental methods are used. Building models that incorporate the state-of-the-art multiscale modeling mentioned previously, while not depending on large data sets to learn from, remains a significant challenge.
Hierarchical machine learning (HML) is a methodology that introduces an additional layer of physicochemical latent variables to traditional input−output ML modeling, creating a hierarchy of data and providing constraints on the structure of the response surface. This approach has been shown previously to overcome problems associated with having a limited number of training samples by modeling the system response with parameters describing the underlying forces in the material.24,25 The model then predicts the responses from those physical forces (“descriptors” or “features”), rather than simple material input information (structure or formulation data). A hierarchy is generated in which chemical variables are converted into physical descriptors through a series of

equations or transformations, and a second learned transformation of these physical descriptors is used to predict the response variable. This series of transformations allows the model to capture physically realizable trends, rather than just statistical significance, and can overcome the challenge of small data sets.26 Not only has this approach demonstrated improved accuracy on model test sets, but the interpretability of the model increases because physically meaningful features are parametrized. This model architecture can also lead to the discovery of new trends that may govern material properties beyond the instances used to train the model and therefore may also be used as a design tool to tailor a formulation to the desired properties. Through the utilization of HML, here we show a methodology for the prediction of the Young’s modulus of PUs from a small data set of 63 unique samples using physicochemical data that additionally can provide insight into chain interactions. We propose that this approach can allow data-driven algorithms to predict bulk material properties from smaller data sets more accurately than black-box modeling.

2. METHODS
2.1. Material Library. Polyurethane and polyurethane−polyurea elastomers are a product of a polyaddition process, involving the copolymerization of three major components: macrodiols, also known as polyols, diisocyanates, and low-molecular-weight diol or diamine chain extenders. During the polyaddition process, the alcohol and isocyanate groups of the monomers react to yield polymer chains with many urethane and/or urea junctions (i.e., polyurethanes or polyurethane−polyureas, respectively). While the polymerization proceeds, the product phase segregates into “hard” (urethane-rich regions resulting from the reaction of the diisocyanate and chain extender organized by hydrogen bonding) and “soft” (amorphous or semicrystalline backbone of the macrodiol) segment domains (Figure 1). This thermodynamic and kinetic phase segregation results in atomic-scale interactions between monomers that govern complex intra- and intermolecular arrangements of growing chains, ultimately yielding complex microstructures. In this work, a material training library was synthesized using systematic variations in the chemical structures of the polyurethane monomers and the composition of the formulations. A total of 63 unique linear polyurethane or polyurethane−polyurea formulations were prepared that spanned six polyols (including polyethers, polyesters, and polycarbonate), five diisocyanates (including aromatic and aliphatic), and two chain extenders (diol and diamine). The full list of product names and structures can be found in Table S.4 and Figure S.1 in the Supporting Information. Each sample was synthesized using a standard two-step polyaddition method. In the first of two synthesis steps, a prepolymer was made by mixing the polyol and diisocyanate in a reaction beaker heated to 110 °C for 2 h with constant mechanical stirring. By introducing a stoichiometric excess of diisocyanate to the polyol (at least 2:1), the product of the first step was a mixture of isocyanate-capped chains and excess diisocyanate, typically referred to as a prepolymer. In a second reaction step, the prepolymer was allowed to react with a stoichiometrically predetermined amount of chain extender, and a linear, high-molecular-weight network of chains was formed. The ratio of isocyanate groups to alcohol and/or amine groups in the second step is referred to as the NCO:OH index, and for this sample library, the index was kept constant at 1.05 (slight isocyanate functional group excess). Similarly, the hard segment content, defined as the weight percent of the chain extender and diisocyanate in the formulation, was also varied by adjusting the reagent amounts, to prepare polyurethanes with 25%, 35%, and 45% hard segments. After the chain extender was introduced and stirred into the prepolymer for 3 min, the increasingly viscous and exothermic mixture was poured into a 100 × 20 × 2 mm Teflon mold and cured in an oven at 100 °C for 24 h. All but one sample class in the training library were synthesized without the use of catalysts or solvents. For the series of samples that included polypropylene glycol (PPG), 100 ppm of the catalyst dibutyltin dilaurate (DBTDL) was used to accommodate the slower relative reactivity of the secondary hydroxyl group on the terminal end of the polyols, ensuring that the reactions yielded high-molecular-weight polymers. Once the samples were removed from the vented oven 24 h postreaction, they were left to completely cure at room temperature for 1 week before mechanical testing began.
2.2. Measurements. Once each sample was cured, Fourier-transform infrared (FT-IR) spectroscopy was performed with a Frontier spectrometer (PerkinElmer) in the 4000−700 cm−1 wavenumber range. The spectra indicated the loss of the free isocyanate peak (2200 cm−1) and urea group formation (1690−1640 cm−1) for the polyurethane−polyurea systems and a lack thereof for the polyurethane systems. Then, the Young’s modulus of each sample was determined from stress−strain data measured in tensile mode using a rheometer (Discovery HR-2) at room temperature. The strain rate during tensile measurements was kept constant at 0.05 mm/s for all samples to avoid deviations in mechanical response due to different deformation rates.27 The samples were analyzed with the TRIOS software package, and the modulus was documented using the onset slope from the stress−strain data. Each sample modulus value was an average of five tensile experiments, and the final reported values had standard deviations of 5 MPa or less.
2.3. Data Set Generation. 2.3.1. Chemical Variables. Each sample in the data set was initially accompanied by the chemical structure, formulation, or known chemical data. This initial feature space is an explicit representation of the chemical structure and is similar in format to popular molecular fingerprinting techniques in which functional groups are represented by placeholding integers in a unique sequence that represents the molecule/repeating unit.18 In addition to the chemical structure, the formulation and known chemical data were also included. These are values that can be found on the isocyanate or polyol’s technical data sheet, such as density and molecular weight. In addition, stoichiometry that was directly calculated during the synthesis of each sample was also included, e.g., the NCO/OH index of the prepolymer. In total 18 chemical variables were used to represent the simplest parameters that determine chemical structure and formulation (the data set is provided in the Supporting Information). Throughout the remainder of the manuscript the models (Ê = f(xi)) trained on just these 18 chemical variables {xi} were compared to models (Ê = f(yi)) trained on just the physical variables {yi}, which were mapped from the chemical variables f: {xi} → {yi} using the methods discussed in the following sections.
2.3.2. Cheminformatics. By representation of the hard and soft segments as simplified molecular-input line-entry system (SMILES) strings, physicochemical features were computed using the RDKit database.28 The software uses patterned algorithms that employ a series of filtering mechanisms based on topology and electronic similarity to generate features for each unique input.29 The SMILES strings for each unique PU repeating unit can be found in Table S.1 in the Supporting Information. Every feature calculation was performed on the polymerized unit structure: i.e., terminal urethane groups on the hard segment structure with one chain extender structure included. The soft segment structure calculations were performed on the repeat unit of the polyols, and all repeating-unit structures were capped with hydrogen atoms.
2.3.3. Density Functional Theory (DFT) Calculations. The ground-state electronic parameters of each hard-segment repeating unit were calculated as additional features. Using the Gaussian 16W software, each structure’s HOMO/LUMO gap, electric dipole moment, ionization potential, polarizability (α), and hyperpolarizability (β) were calculated.30 The B3LYP/6-31G* level of theory was employed because of its previous utilization in quantum chemical calculations with polyurethanes.31 Then the five descriptors were extracted from the calculation output files and were tabulated into the feature set.
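The two formulation quantities defined in section 2.1, the hard segment content and the isocyanate index, reduce to simple arithmetic. The sketch below is illustrative only. The function names, the example masses, and the equivalent weights are assumptions rather than values from the sample library, and the index is computed over the whole formulation, which is a simplification of the second-step definition given in the text.

```python
def hard_segment_content(m_diisocyanate, m_chain_extender, m_polyol):
    """Weight percent of diisocyanate plus chain extender in the formulation."""
    hard = m_diisocyanate + m_chain_extender
    return 100.0 * hard / (hard + m_polyol)


def isocyanate_index(m_diisocyanate, eq_wt_nco, active_h_components):
    """Ratio of NCO equivalents to OH and/or NH2 equivalents.

    active_h_components: iterable of (mass_g, equivalent_weight_g_per_eq)
    pairs, one for the polyol and one for the chain extender.
    """
    nco_eq = m_diisocyanate / eq_wt_nco
    active_h_eq = sum(mass / eq_wt for mass, eq_wt in active_h_components)
    return nco_eq / active_h_eq


# Hypothetical formulation in grams (placeholder values, not from the paper):
# a diisocyanate with 125 g/eq, a diol chain extender with 45 g/eq,
# and a 2000 g/mol macrodiol (1000 g/eq).
hs = hard_segment_content(m_diisocyanate=28.2, m_chain_extender=6.8, m_polyol=65.0)
idx = isocyanate_index(28.2, 125.0, [(65.0, 1000.0), (6.8, 45.0)])
print(f"hard segment content = {hs:.1f} wt %, index = {idx:.2f}")
```

Fixing the index and solving for the reagent masses that give a target hard segment content is the inverse of this calculation, and it corresponds to how the 25%, 35%, and 45% series are described in the text.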


2.3.4. Thermodynamic Models. The underlying thermodynamic driving forces are implicit to phase segregation and mechanical response and have been historically used to describe polyurethanes theoretically and analytically.32,33 On the basis of work by Ginzburg et al. in their analytical PU Young’s modulus model,34 thermodynamic and phase interaction descriptors were added to the feature set. Each feature was calculated from either chemical structure or formulation information. They include features such as the Hildebrand solubility parameters (δ), the densities (ρ), and equivalent and molecular weights (E and M, respectively) of the hard and soft segments and the phase incompatibility parameter (χ) for a multiblock copolymer derived from the Flory−Huggins approach shown in eq 1.

$$\chi N = \frac{(\delta_H - \delta_S)^2}{RT}\left[\frac{M_H}{2\rho_H} + \frac{E_S}{\rho_S}\right] \qquad (1)$$

2.3.5. Experimental Data. FT-IR absorbances have been shown to estimate the relative degrees of hydrogen bonding in polyurethanes.35 For each sample, 10 spectral features were extracted to be included in the model training space. The peak wavenumber and intensity of the hydrogen-bonded carbonyl (∼1700 cm−1), free carbonyl (∼1750 cm−1), and amine (∼3300 cm−1) peaks were used alongside the peak wavenumber and intensity of the hydrogen-bonded and free urea peaks (1640 and 1690 cm−1, respectively).36
2.4. Machine Learning. The complete chemical and physical data sets were finalized by compiling and sorting each unique feature vector with the corresponding modulus that was experimentally measured. The validation set containing samples with 35% by weight hard segment was then removed from each data set and was withheld during the training and testing of each model. The set, now consisting of 54 unique PUs, was randomly split into training and test sets containing 80% and 20% of the data points, respectively. All modeling was performed in Python using the Scikit-Learn library and using data standardized about a zero-centered mean.37 The model performance was measured using the coefficient of determination (r²) and the root-mean-squared error (RMSE) metrics of each output prediction against the experimentally determined modulus.
2.4.1. Modeling Strategies. Once the chemical to physical descriptor transformations were completed, the ML modeling from the physicochemical descriptors to the material response variable was performed. Any experimental or empirical data, e.g., FT-IR features, that could not be directly mapped from chemical data using the processes mentioned previously were subject to sequential ML models within the overall HML structure: i.e., the values ultimately used were output from models built from the curated or measured data that were parametrized by the chemical variables.
Two regression approaches were compared in this study: linear regression and nonparametric Bayesian estimation. The two modeling types provide different strengths and weaknesses that were leveraged for specific tasks in building the model hierarchy. Linear regression determines a function of best fit about a response variable (y) by using least-squares approximations to regress to a sum of optimal linear coefficients (β) of independent variables (x). Subsequently, regularized linear regression, with the least absolute shrinkage and selection operator (LASSO), is a popular tool in forming a linear function that relates the response variable to a subset of the total feature space.38 LASSO introduces an L1 penalty term to the least-squares linear regression, which effectively removes parameters with lower correlation to responses, and the weight of the penalty can be optimized through cross-validation. The primary benefits of using LASSO modeling are (1) it prevents overfitting from the use of too many variables in the model, (2) the result is interpretable with the physical form of E = ∑i βi xi, (3) modeling parameters are easy to tune, and (4) the final model is sparse (reduced dimensionality). However, one key disadvantage is that LASSO is poor at capturing nonlinear or nonlocal trends in the data and, as mentioned previously, such trends may be crucial in creating a robust model for a complex physical system.
The second modeling type employed was Gaussian process regression (GPR). This is a Bayesian modeling approach found to be useful with small data sets containing data characterized by nonlinear behavior.39,40 This nonparametric model generates a prior distribution of the latent function space using a mean function (typically the average value of the response variable) and a covariance kernel describing the similarities in the training data features. The model then makes predictions for test data on the basis of a normal distribution of values parametrized through the covariance matrix. GPRs are typically excellent with nonlinear data and interpolating on a complex feature space, allowing for improved interpretation on a continuous and multivariate Gaussian distribution. GPR has the following differences from LASSO. (1) The model’s hyperparameters are strictly dependent on the kernel(s) and are automatically optimized during the covariance computation. (2) It does not have an inherent regularization capability: i.e., it uses the entire sample feature vector to generate the posterior distribution and may overfit unless a feature-ranking method is used to decrease the dimensionality of the feature space manually. Here, through a combination of both LASSO and GPR models, a low-dimensional feature space of physicochemical forces was also developed to model and predict the Young’s modulus.
2.4.2. Feature Selection and Model Tuning. Two different feature selection methodologies were employed during the modeling. Doing this reduced the dimensionality of the models and kept them from becoming susceptible to overdetermination due to the number of available features (88) being greater than the number of training points (43 after train/test split). The first method was to use the regularization inherent to LASSO linear regression to select a small subset of features correlated with the modulus response. As was mentioned previously, through the incorporation of an L1 norm, the linear regression cost function was adapted to sparsify the available feature space while regressing to a linear function of best fit (eq 2). In the equation, the cost (J) is minimized as a function of the coefficients (β) about a linear function (y = ∑i βi xi).

$$J(\beta) = \lVert y - h(x, \beta) \rVert^2 + \lambda L_1(\beta), \qquad L_1 = \sum_i |\beta_i| \qquad (2)$$

In all LASSO feature selections, leave-one-out cross-validation (LOOCV) was employed. The regularization tuning parameter (λ) corresponding to a minimized mean-squared error (MSE) from the cross-validation was specified to the model, and the resulting coefficients from the regression were a sparse list of features that most accurately predict the response through linear trends in the data. This process was performed over 15 different separations of the training and testing data during the train/test splitting process. Averaging feature frequency and performance metrics over 15 unique splits ensured that features unimportant to a generalizable model were not kept as artifacts of a particular or convenient split. For each split, the 10 features with the highest coefficients were stored and the top five most frequent descriptors from the 15 lists were chosen as the final sparse set of LASSO-selected features. Due to the variability limitations of a data set of this size, this rigorous methodology for feature selection was performed to maximize each feature’s ability to demonstrate a robust and general relationship to the target property.
However, it is very likely that, in modeling a physical system, nonlinear relationships exist between input and output variables. Through the use of GPR and a permutation importance (PI) ranking algorithm (GPR-PI), a sparse list of features that may represent nonlinear and linear trends was computed. In GPR, the kernel determines the distance metrics between points in the model. There are numerous choices for kernels, but in practice, that which yields the most accurate predictions in training and testing is used. The models were tuned by performing cross-validation of the data set over various kernels and kernel combinations, and the PI algorithm ranked the importance of each feature by randomly shuffling the values of each feature and measuring the model performance each time. The features that decrease the model’s r² score most significantly throughout the shuffling process were selected to have greater predictive utility. Similarly to LASSO, the GP feature selection was performed over 15 unique train/test splits and the most frequent features of high importance were pooled in a final sparse feature space. The kernel, the


measure of covariance between features, found to most frequently fit the data most accurately is shown in eq 3.

$$k(x, x') = \sigma^2 \exp\left(-\frac{d(x_i, x_j)^2}{2 l_{RBF}^2}\right) + \frac{\sigma^2}{\Gamma(\nu)\, 2^{\nu - 1}} \left(\frac{\sqrt{2\nu}}{l_M}\, d(x_i, x_j)\right)^{\nu} K_\nu\left(\frac{\sqrt{2\nu}}{l_M}\, d(x_i, x_j)\right) + \text{noise} \qquad (3)$$

It is a summation of kernels including the radial basis function (RBF), the Matern function, and a white noise function.37,39 In the first two terms d(xi, xj) is the Euclidean distance between features, σ is the signal standard deviation, and l is the length scale of each kernel (RBF and Matern, respectively) and represents the separation length at which the features become uncorrelated to the response. In the Matern expression, Γ(ν) and Kν are the gamma and modified Bessel (second kind) functions, respectively, and are used in combination to create a “less smooth” RBF parameter. The smoothing parameter (ν) was always optimized to a value of 1.5 in this model, indicating that the function is once-differentiable. The simplified kernel expression including what is often called the 3/2 Matern is shown in eq 4.

$$k(x, x') = \sigma^2 \exp\left(-\frac{d(x_i, x_j)^2}{2 l_{RBF}^2}\right) + \sigma^2 \left(1 + \frac{\sqrt{3}\, d(x_i, x_j)}{l_M}\right) \exp\left(-\frac{\sqrt{3}\, d(x_i, x_j)}{l_M}\right) + \text{noise} \qquad (4)$$

The combined kernel computed the covariance for each feature pair and forms a covariance matrix n features by n features in size. The feature covariance matrix and the feature mean vector were then used to parametrize the normal posterior distribution to predict modulus values, as shown in eq 5.

$$\begin{pmatrix} E \\ \hat{E} \end{pmatrix} \approx \mathcal{N}(\vec{\mu}, \vec{K}) \qquad (5)$$

2.4.3. Optimization. After a trained and validated model was created, all 63 data points were used to train a final model aimed at designing new formulations within the chemical boundaries of the sample library. The new model was subjected to a chemical variable design grid (∼700 theoretical formulations) and predicted the modulus of samples with fixed chemical structure as a function of changing prepolymer stoichiometry (NCO/OH index). The transformation of the chemical variable grid to a physical variable grid (both grids are provided in the Chemical and Physical Data set in the Supporting Information) was analogous to the previously mentioned transformations. Thus, the trained model made predictions for every point in the grid, yielding an accessible chemical structure and prepolymer stoichiometry for a desired modulus. Three formulations were chosen to demonstrate the HML model as a design tool, targeted to have Young’s moduli of 10, 40, and 50 MPa, respectively. A discussion in section 3 covers the three chemical structures explored and the accuracy of the model at those points.

3. RESULTS AND DISCUSSION
3.1. Machine Learning Models Parameterized by Composition. To benchmark our modeling approach, we began by using linear regression and GPR to predict the Young’s modulus of the PUs in the data set as a function of the blend of monomers that comprised each polymer alongside stoichiometric information that described the formulation. Both linear regression and GPR with the entire chemical variable feature set were performed, which represented a traditional “black box” approach to machine learning. The performance metrics of each model on the training and testing sets are presented in Table 1. In these models, the original data set comprised of 63 samples was divided into a portion for training, in which 80% of the data (selected at random) were used to optimize the parameters or hyperparameters in the models and the remaining 20% were used to test the performance of the model. Neither linear regression nor GPR models showed sufficient predictive capabilities using the chemical features exclusively. While statistical benchmarks vary with requirements, having a test r² value of at least 0.7 can be considered necessary for obtaining meaningful predictions from the model. Furthermore, the reduction in r² from training to testing suggests that both models were overfitting the data used to train the algorithms, resulting in a lower value when new data were used to test.

Table 1. Average Performance Metrics of Models Predicting the Modulus As a Function of Monomer Composition and Stoichiometry: i.e., Using Only the Chemical Variables^a

model                         train r²   test r²   train RMSE (MPa)   test RMSE (MPa)
linear regression             0.84       0.63      12.71              14.60
Gaussian process regression   0.86       0.65      9.01               14.47

^a These performances represent a “black box” approach to machine learning. The results, obtained over 15 unique train/test splits, indicate a high level of overfitting as evidenced by the reduction in r² from training to testing. In all 15 splits the same 18 features in the chemical descriptor set were used to train the models.

3.2. Machine Learning Models Parameterized by Physicochemical Forces. Physicochemical forces represent latent variables that determine the system responses. Selecting a set of forces for data-driven modeling requires a mechanistic understanding of the factors that determine the modulus, but here tools of statistical learning were used to identify those that are most strongly correlated with the responses. The mechanical response of a polymeric material involves a complex interplay between competing enthalpic and entropic forces, where the tensile force required to perform work upon and deform the material can be described by first principles as the change in free energy of the system when elongated. This general relationship among tensile force, enthalpy, and entropy is shown in eq 6, where f is the force required to deform the material and the enthalpy (H) and entropy (S) changes are in response to elongation (∂L), at constant pressure (P) and temperature (T):8

$$f = \left(\frac{\partial H}{\partial L}\right)_{P,T} - T\left(\frac{\partial S}{\partial L}\right)_{P,T} \qquad (6)$$

For the theoretical “ideal” polymer (nonpolar, homogeneous, and isotropic), the elastic response of the material can be approximated by the entropic portion of this equation alone, assuming it to have no change in cohesive forces while being stretched. The mechanical behavior of ideal Gaussian networks is thus a function of the conformational and configurational entropy of the material and temperature and is simply governed by component mixing. In heteropolymers such as PUs, the multiphase microstructure results in a thermodynamic competition between the two free energy terms mediated through many physicochemical interactions. The focus of this work is on the methodologies to generate modeling descriptors across multiple scales, and processes to capture these factors are discussed.
3.3. Training Set and Associated Physicochemical Descriptors. The 63 samples in this data set were a

chemically diverse group of industrially relevant linear polyurethane and polyurethane−polyurea elastomers that contain complex intra- and intermolecular interactions, multiphase incompatibilities, and dispersion, dipole, and hydrogen-bonding forces. The use of cheminformatics through the RDKit library allowed the chemical structure of the polymers to be represented by electronic, shape, and topological descriptors. Quantum chemical DFT calculations offer a second approach to describing the unique electronic behavior of varying hard segment chemical structures. Thermodynamics and analytical equations based on Flory−Huggins theory can provide information on the phase interaction and describe the nonequilibrium state of the PUs. Finally, the frequency shifts and the relative intensities of the FT-IR peaks can describe hydrogen bond strength. The final physicochemical feature vector was generated to provide information beyond the atomic scale by describing the molecular-range interactions.
3.4. Feature Selection with Physicochemical Forces. LASSO and GPR-PI were then performed on the multiscale physicochemical features (“physical feature set”) generated by computational and experimental methods. The selected features from both strategies are represented in eqs 7 and 8. These sparse feature sets were determined using their frequency over multiple data splits as mentioned previously. At this point in the analysis, the other features were removed from the physical feature set and the model performance statistics presented in the following tables are for models parametrized by only those features shown. The average performances over 15 different divisions of data did not have different feature groupings. The functional form of each model is different due to the mathematical differences presented in section 2.4. LASSO yields a linear equation of best fit with optimized coefficients, while GPR yields a feature vector that most effectively describes the variance in a normal distribution formed about the known modulus values.

$$ \text{LASSO:}\quad \hat{E} = -0.33\,\mathrm{qed}_{HS} + 0.19\,\mathrm{CO}_{W,free} - 0.39\,\mathrm{DP}_{PP} + 0.17\,T_{m,SS} + 0.12\,\mathrm{Hyperpolar}(\beta) \qquad (7) $$

$$ \text{GPR:}\quad P(\hat{E} \mid w_{HS},\ \mathrm{qed}_{HS},\ \mathrm{CO}_{W,free},\ \mathrm{DP}_{PP},\ T_{m,SS},\ U_i,\ \mathrm{MPC}_{HS} - \mathrm{MPC}_{SS}) \qquad (8) $$

Immediately apparent is the degree of overlap in selected features between both modeling strategies; both contain the lump electronic molecular property of the hard segment (qedHS), the wavenumber of the free carbonyl peak (COW,free), the degree of polymerization of the prepolymer (DPPP), and the melting temperature of the soft segment (Tm,SS). However, the model performances with the GPR-PI model selected features outperformed LASSO in accurately predicting the responses in the test data by a significant margin (Table 2). The GPR selected three additional features compared to the LASSO: the hard segment weight fraction (wHS), a urea binary classification variable (Ui), and the difference in maximum absolute partial Gasteiger charges of the hard and soft segments (MPCHS − MPCSS). Through the utilization of modeling with a probabilistic function more suitable for small nonlinear data sets and the incorporation of these three additional features, using a GPR for predicting the modulus as a function of the physical feature space was chosen as the top half of the model hierarchy. However, two items should be noted. (1) The sequential models mapping chemical structure to both the FT-IR feature and the soft segment Tm had not yet been included in the HML architecture. (2) The GPR model may have been overfitting due to the high training accuracies of r2 = 0.99 and RMSE = 2.72 MPa, most likely due to the complexity of the kernel employed. While this iteration of the model was not entirely parametrized by the chemical variables, the goal of this first iteration was to evaluate the model predictive performance when it was supplemented with a sparse set of physical descriptors.26 The next iteration presented will include the sequential models mentioned previously. The RBF + Matern kernel, although it led to overfitting, was initially chosen because it outperformed all other covariance expressions when the testing data were fitted. Therefore, feature selection and cross-validation were performed with this specific kernel, and a further dimensionality reduction significantly decreased the performance metrics (Δr2 ≈ −0.12). While the model adopted may be overfit, the primary aim of this work was to explore the incorporation of physical variables in the model framework. A more in-depth analysis of the sparse parametrization of the model across other kernels and modeling techniques will be the subject of future work.

Table 2. Average Performance Metrics of Machine Learning Models Predicting the Modulus as Parametrized by the Physical Features in Eqs 7 and 8 over 15 Unique Train/Test Splits^a
model     train r2   test r2   train RMSE (MPa)   test RMSE (MPa)
LASSO     0.78       0.76      11.21              11.67
GPR-PI    0.99       0.92      2.72               5.89
^a This feature space included descriptors derived from cheminformatics, quantum chemistry, thermodynamics, and experimentally determined values of FT-IR and curated values for the soft segment melting temperature.

In the final modeling step, the sequential models were created and added to the hierarchy. In summary, the chemical variables were transformed into the physicochemical variables through three different transformations: (1) known equations to directly calculate DPPP and wHS from stoichiometry from eqs 9 and 10, (2) cheminformatics determination of qedHS and MPCHS − MPCSS, from the SMILES input (Supporting Information), and (3) GPR modeling of COW,free and Tm,SS from the chemical variable feature space. The results of the regression models in predicting the IR and melting temperature features were both sufficiently accurate to use in the broader model (average test scores of at least r2 = 0.85). A schematic of the final model hierarchy is presented in Figure 2.

$$ \mathrm{DP}_{PP} = \frac{1 + \left(\frac{\mathrm{OH}}{\mathrm{NCO}}\right)_{PP}}{1 - \left(\frac{\mathrm{OH}}{\mathrm{NCO}}\right)_{PP}} \qquad (9) $$

$$ w_{HS} = \frac{m_{iso} + m_{ce}}{m_{poly} + m_{iso} + m_{ce}} \qquad (10) $$

Figure 3 compares testing performances of the chemical (“black box”) model and the final physicochemical model. The HML model had average training scores of r2 = 0.98 and RMSE = 2.72 MPa and testing scores of r2 = 0.91 and an RMSE = 6.88 MPa, which were slight dropoffs in comparison to the models without reparametrizing the output (with measured or documented values of COW,free and Tm,SS,

Figure 2. Final HML model schematic. Chemical structure and formulation variables are transformed into physical and latent forces spanning
multiple length scales through a series of transformations. The transformations include two sequential GPR models, two RDKit computations, and
two outputs from the known equations “Chemical Descriptors”, all from chemical layer inputs (xi). The physical layer is then used to parametrize a
GPR which estimates each sample’s Young’s modulus.
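The two "known equation" outputs in this schematic correspond to eqs 9 and 10. A minimal Python sketch of those two transformations follows, assuming that the prepolymer ratio (OH/NCO)PP is simply the reciprocal of the NCO/OH prepolymer index; the function names and the example masses are hypothetical and are not taken from the paper.

```python
def prepolymer_dp(nco_oh_index: float) -> float:
    """Eq 9: DP_PP = (1 + r) / (1 - r), with r = (OH/NCO)_PP.

    Assumption: r is the reciprocal of the NCO/OH prepolymer index.
    """
    if nco_oh_index <= 1.0:
        raise ValueError("eq 9 needs an NCO/OH index above 1 so that r < 1")
    r = 1.0 / nco_oh_index
    return (1.0 + r) / (1.0 - r)


def hard_segment_fraction(m_poly: float, m_iso: float, m_ce: float) -> float:
    """Eq 10: w_HS = (m_iso + m_ce) / (m_poly + m_iso + m_ce)."""
    return (m_iso + m_ce) / (m_poly + m_iso + m_ce)


# Prepolymer indexes as listed in Table 4; the masses are made-up illustration values.
for index in (2.70, 2.44, 5.30):
    print(f"NCO/OH = {index:.2f} -> DP_PP = {prepolymer_dp(index):.2f}")
print(f"w_HS = {hard_segment_fraction(65.0, 30.0, 5.0):.2f}")
```

Both features depend only on formulation stoichiometry, which is why they can be computed directly rather than learned by a sequential model.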

Figure 3. Performance comparison of test data using Gaussian process regression with chemical variables to model the sample modulus and the
HML approach with the physicochemical hierarchy. The plots show the actual E value in comparison to the E value predicted. The 45° blue line has
been added to show where predictions of perfect accuracy would have been. These plots show one unique training test split instance included in the
average score value reported.
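For readers unfamiliar with Gaussian process regression, the sketch below shows how a summed RBF + Matern covariance (the kernel family named in section 3.4) produces a posterior mean prediction. It is an illustrative standard-library implementation, not the authors' code: the Matern smoothness (ν = 3/2), the shared length scale, the zero-mean prior, the noise level, and the toy data are all assumptions.

```python
import math


def rbf_plus_matern(x, y, length=1.0):
    """Sum of an RBF kernel and a Matern kernel (nu = 3/2 is an assumption)."""
    d = math.dist(x, y)
    rbf = math.exp(-0.5 * (d / length) ** 2)
    s = math.sqrt(3.0) * d / length
    matern = (1.0 + s) * math.exp(-s)
    return rbf + matern


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gp_posterior_mean(train_x, train_y, query, noise=1e-6):
    """Zero-mean GP posterior mean: k(query, X) (K + noise * I)^-1 y."""
    gram = [
        [rbf_plus_matern(a, b) + (noise if i == j else 0.0) for j, b in enumerate(train_x)]
        for i, a in enumerate(train_x)
    ]
    alpha = solve(gram, train_y)
    return sum(rbf_plus_matern(query, xi) * ai for xi, ai in zip(train_x, alpha))


# Made-up 1-D feature values and moduli (MPa), for illustration only.
X = [[0.0], [1.0], [2.0]]
E = [10.0, 40.0, 50.0]
print(gp_posterior_mean(X, E, [1.5]))
```

With a very small noise term the posterior mean nearly interpolates the training points, which mirrors the high training scores and the overfitting concern raised in the text.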

respectively) by chemical composition. This was expected, given that each regression model contained error contributions to the overall model due to synthetic and experimental variance in the empirically determined COW,free feature as well as limitations of the chemical feature space to fully capture a continuous function output that Tm,SS has in the latent space. The limited accuracy of the sequential models results in an error that propagates to the final modulus prediction in the final hierarchy. However, by introduction of regression as a mode of transformation to obtain additional physicochemical descriptors, the final HML model can make predictions solely from the chemical structure and formulation input. Additionally, by using the multiscale physicochemical descriptors in the final model, a significant increase in model performance was still observed between the physicochemical model and the original baseline model despite being parametrized by nine fewer features (16-parameter feature vector in the chemical space vs 7-parameter feature vector in the physicochemical space). The average RMSE of the final model was comparable

Figure 4. Black box versus HML modeling comparison. On the left, the chemical variables are used to directly predict the modulus, resulting in a
test score of r2 = 0.65 and RMSE = 14.47 MPa, whereas on the right, the chemical variables are transformed into physical variables on the molecular
scale through a series of transformations, resulting in a test score of r2 = 0.91 and RMSE = 6.88 MPa. The physical layer is then used to parametrize
a GPR that estimates each sample’s Young’s modulus.

Figure 5. Performance comparison of validation data using Gaussian process regression with chemical variables to model the sample modulus and
the HML approach with the physicochemical hierarchy. The plots show the actual E value in comparison to the E value predicted. The 45° blue line
has been added to show where predictions of perfect accuracy would have been. These plots show one unique training test split instance included
in the average score value reported.
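The validation protocol described in section 3.5 (withholding every 35% hard segment formulation, then repeating random train/test splits on the remainder) can be sketched as follows. The record layout, the 20% test fraction, and the synthetic library are assumptions made for illustration; only the totals (63 samples, nine withheld, 15 splits) come from the text.

```python
import random


def split_dataset(samples, holdout_w_hs=0.35, test_fraction=0.2, seed=0):
    """Hold out one hard segment level for validation, then split the rest at random."""
    validation = [s for s in samples if abs(s["w_hs"] - holdout_w_hs) < 1e-9]
    rest = [s for s in samples if abs(s["w_hs"] - holdout_w_hs) >= 1e-9]
    rng = random.Random(seed)
    rng.shuffle(rest)
    n_test = round(test_fraction * len(rest))  # test_fraction is an assumed value
    return rest[n_test:], rest[:n_test], validation


# Hypothetical 63-sample library: nine formulations at 35% hard segment.
library = (
    [{"id": i, "w_hs": 0.25} for i in range(27)]
    + [{"id": 27 + i, "w_hs": 0.35} for i in range(9)]
    + [{"id": 36 + i, "w_hs": 0.45} for i in range(27)]
)

for seed in range(15):  # 15 unique train/test splits, same validation set each time
    train, test, validation = split_dataset(library, seed=seed)
print(len(train), len(test), len(validation))
```

Keeping the validation set fixed across splits means the averaged validation score measures generalization to a hard segment level the model never saw, independent of how the remaining samples were divided.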

to the standard deviation observed in measuring the moduli of the samples experimentally.
3.5. Validation Set. Demonstrating the model’s ability to capture trends outside of the original data space is essential in applications. By withholding nine samples from the total data set that were formulated with 35% hard segment by weight, the model predicted the modulus without requiring any prior instances of such formulations in the training process. While “black box” models would rely on statistical interpolation, HML utilizes the physicochemical trends learned from the training data, and the model was able to predict the validation set with r2 = 0.91 and an RMSE of 4.15 MPa accuracy on average (Figure 4). Figure 5 shows the performance comparison between the baseline and HML model prediction of the validation set on one particular training test split. Table

Table 3. Average Performance Metrics of the Baseline Chemical Variable-Exclusive GPR Model and the Final Physicochemical HML Model over 15 Unique Training Test Splits^a
feature space     train r2   test r2   validation r2   train RMSE (MPa)   test RMSE (MPa)   validation RMSE (MPa)
chemical          0.86       0.65      0.65            9.01               14.47             8.30
physicochemical   0.98       0.91      0.91            2.72               6.88              4.15
^a The “physicochemical” feature space here refers to the features shown in eq 8 and in the hierarchy represented in Figure 2. The difference between the “physicochemical” scores here and the “GPR-PI” scores reported in Table 2 is the incorporation of the sequential models to parametrize all layers of the model using the original 18 chemical features.
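The r2 and RMSE columns above are averages over repeated splits. A short sketch of how such metrics are commonly computed is given below, assuming the usual coefficient of determination (1 − SSres/SStot) and root-mean-square error definitions; the paper does not spell out its exact formulas in this section, and the numbers used are made up.

```python
import math


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root-mean-square error, in the units of the response (MPa here)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def average_over_splits(per_split_pairs, metric):
    """Average a metric over (actual, predicted) pairs, one pair per train/test split."""
    scores = [metric(actual, predicted) for actual, predicted in per_split_pairs]
    return sum(scores) / len(scores)


# Made-up moduli (MPa) for two hypothetical splits, for illustration only.
splits = [
    ([10.0, 20.0, 30.0], [12.0, 19.0, 29.0]),
    ([15.0, 25.0, 45.0], [14.0, 27.0, 43.0]),
]
print(average_over_splits(splits, r2_score), average_over_splits(splits, rmse))
```

Reporting both metrics is useful because r2 is scale-free while RMSE can be compared directly with the experimental standard deviation of the modulus measurements.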

3 shows the average performance comparison between the models.
The accuracy observed may be due to mapping the modulus to continuous physical variables rather than fitting it to chemical structure fingerprints. Thus, the HML model demonstrates the ability to generalize to new formulations by leveraging their molecular-scale latent forces. It should be noted that predicting test set values may be more challenging than predicting validation set values for some training test splits, hence the lower validation RMSE. For example, the model may have received the data for PPG/MDI/BDO/25% during training, may or may not have predicted PPG/MDI/BDO/45% during testing, and then predicted PPG/MDI/BDO/35% during validation. Therefore, it is expected that the validation set outperformed the test set on average, given that the validation extrapolations spanned less distance in the learned feature space.
In addition to increasing the accuracy and generalizability of models, building models from physicochemical feature spaces enhances interpretability. Using physically and chemically meaningful features, one can subject those selected to cross-analysis with domain knowledge and the literature to observe correlations or discrepancies. These observations can aid in understanding what drives system responses in the material and may become crucial in expanding material searches beyond what is experimentally available in existing data sets, thus providing new design principles as well as extending our knowledge of complex materials.
3.6. Physical Significance of Selected Features. The final sparse feature space included seven variables describing various underlying driving forces in PUs. The interpretation of the selected features can be separated into two length scale categories: microstructure and molecular interaction. Figure 6 shows the final permutation importance scores for the features in descending order.

Figure 6. Permutation feature importance box and whisker plot of the final seven features in descending order of importance to the final Gaussian process model. Notched boxes in red represent features on the microscale. Notched boxes in blue represent structural features on the intermolecular scale. Notched boxes in green represent electronic features on the intermolecular scale.

The hard segment weight fraction (wHS) and average degree of polymerization of the prepolymer (DPPP) have direct implications on the amount, size, and dispersion of the hard and soft segment phases. The two features are highly correlated, both being functions of the equivalent weight of the monomers and ratio of OH contributions from the polyol and chain extender, respectively. Despite their correlation, when wHS is removed from the final feature set, the corresponding coefficient of determination decreased by Δr2 = −0.02 for the average test score and Δr2 = −0.11 for the average validation score, suggesting its importance in interpolating across the relevant chemical space. It has been consistently observed that by increasing wHS, the entropic mixing of the phases decreases and larger, more phase-segregated structures form and percolate into the bulk microstructure, increasing the modulus.41,42 Second, additional trends mapped through the DPPP values, which were unique for each prepolymer, became the second most important feature in the model. The effects of soft segment length and concentration have been systematically studied for their effects on bulk properties and are highly sensitive to varying ether, ester, and carbonate groups in the polyol backbone.43,44
By capturing the degree of entropic phase mixing in each sample with these two features, the model was able to leverage the other features describing smaller length scales to map trends more enthalpic in nature. The urea variable (Ui) is a simple binary identification of whether the sample was synthesized with a diamine chain extender or not: i.e., the feature was simply 0 for BDO formulations, and 1 for MOCA formulations. The differences between polyurethane and polyurethane/polyurea were significant, most notably in intermolecular hydrogen bonding in the hard segment, and have been extensively studied elsewhere.36,45 The presence of urea linkages in the hard segment may allow bidentate hydrogen bonding to occur, in comparison to monodentate bonding in urethane groups. The increase in hydrogen-bonding potential led to polyurethane−polyurea having more order in the hard segment and an increase in phase segregation, resulting in larger but more compact physically cross-linked structures, imparting a higher resistance to deformation to the material. The wavenumber of the carbonyl peak (COW,free) can also be attributed to the extent of hydrogen bonding and phase segregation in PUs. Previous studies have directly estimated the degree of phase segregation (DPS) from IR data.46 However, in PUs with varying degrees of crystallinity and hydrogen bonding in the soft segment, the hard segment characteristics did not exclusively correlate with descriptors associated with order and secondary bonding. The vibrational shift patterns provided by this feature may be providing the model with features associated with hydrogen-

Figure 7. Schematic representation of the TPU microstructure with highlighted regions pertaining to some of the final features in the
physicochemical model. The soft segment melting temperature (Tm,SS) and free carbonyl peak wavenumber (COW,free) (subschematics on the left
side of the figure) represent intermolecular parameters aiming to capture order and secondary bonding characteristics in both the hard and soft
phases. The COW,free visualization represents the lack of hydrogen bonding between the carbonyl functional group in the polyol backbone and the
urethane linkage in the hard segment. The lump molecular property of the hard segment (qedHS) and difference in maximum partial charges
(MPCHS − MPCSS) (middle and top right of the figure, respectively) aim to capture the electronic nature and interfacial polarizability between
phases. The degree of polymerization of the prepolymer (DPPP) at the bottom of the image represents entropic mixing as well as hard segment
dispersion in the multiphase system. The hard segment weight fraction (wHS) and the urea identification (Ui) features are not present in this
visualization.
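The ranking of these features in Figure 6 comes from permutation feature importance. A generic sketch of that technique is shown below: one feature column is shuffled at a time and the resulting loss in accuracy is recorded. Using the increase in RMSE as the score, the number of repeats, and the toy linear model are assumptions, since the scoring details are not given in this section.

```python
import math
import random


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(targets, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the response
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(rmse(targets, [predict(r) for r in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model (hypothetical): the response depends on feature 0 only.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(20)]
targets = [3.0 * r[0] for r in rows]
scores = permutation_importance(lambda r: 3.0 * r[0], rows, targets)
print(scores)
```

A feature the model ignores scores zero, while a feature the model relies on scores higher the more the prediction degrades; for correlated features such as wHS and DPPP the scores should be read with care, because shuffling one leaves the other intact.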

bonding degree and phase segregation information as functions of implicit chemical structure effects on secondary bonding in both the hard and soft segments. In contrast, the melting temperature of the soft segment (Tm,SS) likely provided the model with crystallization trends through dispersion forces in the soft segment. The manipulation of the soft segment structure has been previously studied for how the resulting crystallization in the phase affects the mechanical properties.47 Crystallization through intermolecular dispersion forces in the polyol backbone is a major source of enthalpic strength in the bulk material and has also been found to lead to an increased phase segregation in PUs.48 The model found high importance in these intermolecular features that aim to describe how the hard and soft segments order through dispersion and secondary forces and how segregated they may become as a result.
The two electronic features in the final model computed by cheminformatics capture interactions between the chemical units and offer an atomic-scale measure of the internal energy in the polymer. The difference in maximum absolute partial Gasteiger charges between the segments (MPCHS − MPCSS) is a measure of the interfacial polarizability, which contributes to the cohesive energy. It is interesting to note that this feature was highly correlated with the other DFT-computed features and the highly collinear hyperpolarizability (β) feature was present in the final LASSO equation. By use of the difference between the two individual informatics values, the model identified the importance of the relative electrostatic interaction between unique structures. The dielectrics of the soft/hard segment boundary have previously been related to PU behavior.49 Finally, the most important feature in the model was the quantitative estimation of druglikeness of the hard segment (qedHS), which is an empirical lump parameter of characteristic structural and electronic molecular properties of the repeating unit. While this informatics descriptor was not designed for polymers, qedHS is a function of the molecular weight, partition coefficient, polar surface area, hydrogen bond donors and acceptors, and aromaticity, among others.50 It characterizes the hard segment with this series of relevant electronic and size parameters spanning molecular and electronic length scales. The HML model leveraged a final feature space that contained seven unique physicochemical terms that can be interpreted to govern the mechanical response of the complex material across multiple length scales. Five of the final seven features are depicted within the PU elastomer microstructure in Figure 7.
3.7. Application of HML as a Design Tool. While predicting the properties in virtual experimentation is an important assessment of a model, experimental validation of ML models as a design tool is becoming a standard approach to demonstrating capabilities.51 When the GPR trained on all 63 data points was subjected to a grid of ∼700 unique PU formulations, modulus predictions were made throughout the grid and the desired responses could be mapped to structure and stoichiometry. Here the design aim was to identify unique samples that would yield three different modulus values while

Table 4. Target, Closest Model Conditions Found on the Grid, and Experimentally Measured Moduli Observed during the Experimental Validation and Design^a
target modulus (MPa)   model prediction (MPa)   experimental modulus (MPa)   polyol       diisocyanate   chain extender   NCO/OH index of prepolymer
10.0                   9.54                     8.30                         PTMEG 2000   HMDI           BDO              2.70
40.0                   40.34                    46.20                        PEA 2000     TDI            MOCA             2.44
50.0                   51.19                    60.40                        PHMC 2000    HDI            BDO              5.30
^a All three samples were constrained to a 1.05 NCO/OH index for the entire polymer: i.e., including the chain extender stoichiometry. The index of the prepolymer was chosen as the independent variable for the optimization study.

Figure 8. GPR model’s learned relationship for each chemical structure combination explored during the experimental validation study. The blue
points are data points the model trained on with the corresponding uncertainty being represented by the blue error bars. The green points are the
grid used to determine the trend the model learned, where the light green shaded region represents the uncertainty throughout the trend. The red
points are the experimentally measured values of the design samples, coinciding with the NCO/OH index of the prepolymer used in their synthesis.
Perfect design accuracy would be represented by the red point falling on the green point trend.
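The design query described in sections 2.4.3 and 3.7 amounts to predicting the modulus at every grid point and selecting the formulation closest to the target. A minimal sketch is given below; the stand-in predictor (a square-root trend in the prepolymer index, loosely echoing the "rooted power function" behavior noted in the text), its coefficient, and the grid are hypothetical and do not reproduce the trained GPR.

```python
import math


def closest_formulation(grid, predict, target):
    """Return the grid point whose predicted modulus is nearest the target."""
    return min(grid, key=lambda formulation: abs(predict(formulation) - target))


def toy_predict(formulation):
    """Hypothetical stand-in for the trained model: E = 20 * sqrt(index - 1) MPa."""
    return 20.0 * math.sqrt(formulation["index"] - 1.0)


# Hypothetical grid: one fixed chemistry, prepolymer NCO/OH index from 1.5 to 6.0.
grid = [{"chemistry": "fixed", "index": i / 10.0} for i in range(15, 61)]

for target in (10.0, 40.0, 50.0):
    best = closest_formulation(grid, toy_predict, target)
    print(target, best["index"], round(toy_predict(best), 2))
```

In practice the grid would also span polyol, diisocyanate, and chain extender choices, and the predictive uncertainty at the selected point (the shaded band in Figure 8) indicates how much confidence to place in the suggested formulation.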

spanning differing polyol, diisocyanate, chain extender, and prepolymer indexes: 10, 40, and 50 MPa, respectively. The specific formulations provided by the model, along with the experimentally measured moduli of each synthesized in practice, are provided in Table 4.
Figure 8 shows the learned trend for each fixed chemical composition combination alongside the sample instances the model either trained on or designed. The 10 and 40 MPa samples both fall within the model’s confidence interval, whereas the 50 MPa sample falls slightly outside of it, suggesting that the model has more accuracy when it generalizes to modulus values in the lower regime of the data set, albeit with the small sample size chosen here. It should also be noted that the distribution of modulus values in the training set was skewed to contain many more instances in a low-modulus regime (45 of the 63 data points had E < 36 MPa). In general, the model has learned that for fixed compositions the relationship between the modulus and NCO/OH prepolymer index resembles a rooted power function. Other systematic studies of varying prepolymer stoichiometry (or, similarly, hard segment weight fraction) have reported experimental results describing a variety of different trends including root,52 high-order,41 and linear53 relationships between the modulus and prepolymer stoichiometry (hard segment content). While the experimental validation accuracy demonstrates the design applicability of the method presented, there is an insufficient amount of data for each unique structure to further elucidate the learned trends of the model.
3.8. Future Directions. This work presents the modeling of materials that are heterogeneous at multiple length scales using physicochemical modeling in a framework of machine learning. This multiscale physicochemical approach to predicting Young’s modulus in PUs has provided insights into how modeling efforts in the future may be improved and expanded. First, the design of the experiment (or organization of the data set) could be constructed in ways to better capture the underlying forces of interest: for example, eliminating categorical features such as the presence of urea groups (Ui) to parametrize the entire physical feature space in continuous variable forces to improve generalizability. Similar to the way we varied polyol and isocyanate structures in this study, others have found that a variation of the chain extender has significant effects in polyurethane systems.54 By introducing more structural diversity to the chain extender and having descriptors that represent the physicochemical forces corresponding to those structural changes, we may be able to elucidate the effects of the hard segment further. Second, while the physical feature space employed in this work spanned length scales from electronic to microstructural, the prediction of a latent morphology was not apparent. As was mentioned previously, modeling the complexities in PU microstructure is a significant challenge and often requires rigorous experimental determination through small-angle X-ray scattering (SAXS).55,56 To truly bridge the gap from the chemical structure to bulk behavior, the inclusion of a physicochemical determination of the intermediate length scales may be crucial. We propose that direct measurements of the microstructure (SAXS) accompanied by other experimental descriptors, such as COW,free, may allow the model to construct a latent morphology and thus better equip the model for more challenging prediction and design tasks. Finally, the methodology can also be augmented by incorporating sampling approaches into data set construction. Specifically, it has been well demonstrated that Bayesian optimization throughout data set creation is an effective tool for identifying the data points of highest model importance.57 Simply put, the model can identify which sample should be synthesized next to decrease uncertainty in the learned feature space. Such a methodology, coupled with a structure-to-continuous and physical latent force feature space presented here, introduces the

potential for ML-based models to extrapolate beyond learned material spaces and become invaluable tools in the fields of material discovery. The combination of increased structural diversity, improved featurization of the physical space across larger length scales, and exploration into state-of-the-art modeling approaches is the basis of proposed work.

4. CONCLUSION
The model hierarchy presented here uses the frequency of chemical functional units in the monomers and stoichiometry exclusively as the input. Then, through the use of a rich literature base on phenomenological forces in PUs, a series of physicochemical descriptors were generated to parametrize a Gaussian process ML model. Through transformations, which include established equations, cheminformatics, and predictive sequential models, the HML model predicted the modulus of test sets with an average of r2 = 0.91 and RMSE = 6.88 MPa accuracy, which was greater than the performance of the “black box” model parametrized solely by the monomer composition. While it improved the prediction accuracy and interpretability through parametrization via multiscale physical variables, the model also showed the capability of interpolating through the data space and accurately predicting the modulus of unseen formulations (r2 = 0.91, RMSE = 4.15 MPa). The final feature space contained seven parameters spanning electronic to microstructural length scales, and the identification of how each had been previously linked to PU behavior was discussed. Finally, the model was experimentally validated through a design exercise to target three specific moduli values within the model’s learned trends between chemical structure and stoichiometry. The presented methodology and resulting performance suggest the importance of incorporating multiscale physicochemical parameters into complex material model design with small data sets.

ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsami.1c24715.
Repeating units and SMILES, full physical descriptor list, GPR hyperparameters, commercial reagents, and chemical structures (PDF)
NCO/OH and E values (XLSX)

AUTHOR INFORMATION
Corresponding Author
Newell R. Washburn − Department of Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States; Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States; Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States; orcid.org/0000-0001-7843-8860; Email: washburn@
Christine Huang − Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
Karl W. Haider − Covestro LLC, Pittsburgh, Pennsylvania 15205, United States
Complete contact information is available at: https://pubs.acs.org/10.1021/acsami.1c24715

Notes
The authors declare the following competing financial interest(s): N.R.W. has started a company to explore the industrial application of these methods and discloses a potential conflict of interest.

ACKNOWLEDGMENTS
The authors thank James A. Thompson-Colón (Covestro LLC) for help in designing the feature space and Professor Stephanie Sydlik (CMU) for access to the experimental equipment. C.G. was supported by ARPA-E (DE-AR0001138). J.A.P. and N.R.W. gratefully acknowledge support from a Covestro Science Award.

REFERENCES
(1) Jackson, N. E.; Webb, M. A.; de Pablo, J. J. Recent Advances in Machine Learning Towards Multiscale Soft Materials Design. Current Opinion in Chemical Engineering 2019, 23, 106.
(2) Li, Y.; Abberton, B. C.; Kröger, M.; Liu, W. K. Challenges in Multiscale Modeling of Polymer Dynamics. Polymers 2013, 5, 751−832.
(3) Bouvard, J. L.; Ward, D.; Hossain, D.; et al. Review of Hierarchical Multiscale Modeling to Describe the Mechanical Behavior of Amorphous Polymers. J. Eng. Mater. Technol. 2009, 131, 0412061−04120615.
(4) Yilgör, I.; Yilgör, E.; Wilkes, G. L. Critical Parameters in Designing Segmented Polyurethanes and their Effect on Morphology and Properties: A Comprehensive Review. Polymer 2015, 58, A1−A36.
(5) Uddin, M. S.; Ju, J. Enhanced Coarse-Graining of Thermoplastic Polyurethane Elastomer for Multiscale Modeling. J. Eng. Mater. Technol. 2017, 139, 011001.
(6) He, Y.; Qiu, D.; Yu, Z. Multiscale Investigation on Molecular Structure and Mechanical Properties of Thermal-Treated Rigid Polyurethane Foam Under High Temperature. J. Appl. Polym. Sci. 2021, 138, 51302.
(7) Painter, P. C.; Coleman, M. M. Fundamentals of Polymer Science; Routledge: 1997.
(8) Hiemenz, P. C. Polymer Chemistry; Marcel Dekker: 2007.
(9) Johlitz, M.; Steeb, H.; Diebels, S. Experimental and Theoretical Investigation of Nonlinear Viscoelastic Polyurethane Systems. J. Mater. Sci. 2007, 42, 9894.
(10) Khan, A. S.; Lopez-Pamies, O.; Kazmi, R. Thermo-Mechanical Large Deformation Response and Constitutive Modeling of Viscoelastic Polymers Over a Wide Range of Strain Rates and Temperatures. Int. J. Plast. 2006, 22, 581.
(11) Landel, R. F. Mechanical Properties of a Polyurethane Elastomer in the Rubber-to-Glass Transition Zone. J. Colloid Sci. 1957, 12, 308−320.
andrew.cmu.edu (12) Clegg, S.; Characterising, P. Soft Matter Using Machine
Learning. Soft Matter 2021, 17, 3991−4005.
Authors
(13) Butler, K. T.; Davies, D. W.; Cartwright, H.; Isayev, O.; Walsh,
Joseph A. Pugar − Department of Materials Science and A. Machine Learning for Molecular and Materials Science. Nat. 2018,
Engineering, Carnegie Mellon University, Pittsburgh, 559, 547−555.
Pennsylvania 15213, United States (14) Bicerano, J. Prediction of Polymer Properties; CRC Press: 2002.
Calvin Gang − Department of Chemistry, Carnegie Mellon (15) Bicerano, J. Computational Modeling of Polymers; CRC Press:
University, Pittsburgh, Pennsylvania 15213, United States 1992.




