You are on page 1of 13

Concept

www.small-journal.com

Role of Artificial Intelligence and Machine Learning


in Nanosafety
David A. Winkler

experiments and are also the driver of


Robotics and automation provide potentially paradigm shifting improvements the spectacular growth in omics tech-
in the way materials are synthesized and characterized, generating large, nologies (including materiomics). These
complex data sets that are ideal for modeling and analysis by modern developments have not only seen massive
growth in the number of materials that can
machine learning (ML) methods. Nanomaterials have not yet fully captured
be synthesized and studied but also in the
the benefits of automation, so lag behind in the application of ML methods complexity of the information collected, for
of data analysis. Here, some key developments in, and roadblocks to the example, from high content imaging and
application of ML methods are reviewed to model and predict potentially various omics technologies. This has led to
adverse biological and environmental effects of nanomaterials. This work an exponential increase in data being accu-
focuses on the diverse ways a range of ML algorithms are applied to mulated, the formation of “data lakes,” and
urgent need for computational methods for
understand and predict nanomaterials properties, provides examples of the
processing and extracting useful scientific
application of traditional ML and deep learning methods to nanosafety, and meaning from often highly multidimen-
provides context and future perspectives on developments that are likely to sional data sets generated for large libraries
occur, or need to occur in the near future that allow artificial intelligence to of diverse materials. Data driven modeling
make a deeper contribution to nanosafety. techniques based on machine learning
(ML) have concurrently risen in impor-
tance, sophistication and utility, driven
largely by the availability of these massive
data sets.[1] Larger training data sets usually result in models with
1. Introduction
high prediction accuracies and large domains of applicability.
Most areas of science and technology have been impacted These exciting developments of course impact on nanotech-
greatly by developments in automation, robotics, data science, nologies and nanomaterials. Surprisingly, high throughput
and modeling over the past decade, or soon will be. Automa- synthesis and characterization of nanomaterials have been less
tion and robotics have made it possible to synthesize and char- widely adopted than in other areas of materials science, and ML
acterize materials much faster than traditional one-at-a-time methods to interpret nanomaterials data sets have consequently
not been adopted strongly. Undoubtedly these enabling technol-
Prof. D. A. Winkler ogies will be applied more broadly in the next few years, lever-
La Trobe Institute for Molecular Science aging the high throughput synthesis and characterization and
La Trobe University ML modeling methods developed for bulk materials (Figure 1).[2]
Kingsbury Drive, Bundoora 3042, Australia This Concept report provides a limited review of develop-
E-mail: d.winkler@latrobe.edu.au
ments in ML that have been adopted by scientists interested in
Prof. D. A. Winkler
CSIRO Data61
developing safer nanomaterials, or that are particularly useful
1 Technology Court and should be adopted by researchers in this field in the near
Pullenvale 4069, Australia future. Our aim is to provide some contextual background on
Prof. D. A. Winkler the application of ML to nanosafety and to explore current
School of Pharmacy roadblocks, possible solutions, and to introduce artificial intelli-
University of Nottingham gence (AI) methods not yet widely used that allow exploration of
Nottingham NG7 2QL, UK
larger regions of nanomaterials physicochemical, provenance,
Prof. D. A. Winkler
Monash Institute of Pharmaceutical Sciences and biological response spaces. There are recent reviews dis-
Monash University cussing the application of ML to nanosafety to which we refer
392 Royal Parade, Parkville 3052, Australia readers interested in more in-depth summaries of that field.[4]
The ORCID identification number(s) for the author(s) of this article For example, the comprehensive review by Furxhi et  al.[4c]
can be found under https://doi.org/10.1002/smll.202001883. concluded that the use of nonlinear modeling was gaining pop-
© 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, ularity, although linear regression is commonly used. There is a
Weinheim. This is an open access article under the terms of the Creative clear shift away from models trained on theoretical descriptors
Commons Attribution License, which permits use, distribution and re- toward those trained on more physicochemically interpretable
production in any medium, provided the original work is properly cited.
nanospecific features, but little consensus on data preproc-
DOI: 10.1002/smll.202001883 essing techniques and general lack of justification in choice of

Small 2020, 16, 2001883 2001883  (1 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

Figure 1.  The scope of machine learning methods to progress “safe-by-design” nanotechnology. Reproduced with permission.[3] Copyright 2019,
American Chemical Society.

modeling algorithm and model validation method, although in and, ultimately, to generate autonomous experimental systems.
quantitative structure–activity relationships (QSARs) generally Automated or semi-automated synthesis of inorganic nanomate-
such validation methods have converged.[5] Lamon et al.[4a] pro- rials has been achieved heterogeneously on substrates via pulsed
pose model reporting templates for systematic and transparent laser deposition,[9] electrodeposition,[10] chemical vapor deposi-
specification of models used to support regulatory risk assess- tion,[11] and biomolecular templating.[12] Solution-phase syntheses
ments of engineered nanomaterials. The templates included of colloidal nanomaterials have also been automated using flow
a QSAR model reporting format and a model reporting tem- chemistry.[13] We also have not addressed the use of automation,
plate for PBK and environmental exposure models for nano- informatics, or ML modeling for nanomaterials developed for
materials. They demonstrated the utility of these templates for medicine, where they are deliberately introduced into the body
reporting diverse models and for generating an overview of for diagnostic or therapeutic purposes, or the very active research
the computational model landscape for nanomaterials that is area of text mining of nanomaterials data.[14] Our focus is on the
a useful aid for risk assessment. They also identified a lack of use of ML to model nanomaterials datasets relevant to their pos-
analytical techniques hinders model validation, application, and sible adverse biological or ecological impacts, the use of these
acceptance in risk assessment. A recent EU report[4b] reviewed data to progress the “safe-by-design” paradigm,[15] and their use
the state of the art of computational methods for predicting by regulators of nanomaterials.[16]
the properties of engineered nanomaterials and assessed their
applicability in order to provide guidance in REACH regulation.
The review focused on the use of QSAR methods for modeling 2. AI and Machine Learning Methods
and predicting nanomaterials properties and on a range of
and Why They Are Important
compartment-based mathematical models for toxicokinetic, tox-
icodynamic, in vitro and in vivo dosimetry, and environmental Modeling of biological properties of nanomaterials (indeed any
fate models. While this paper was being reviewed, Shatkin pub- materials) involves several unit operations. Materials are syn-
lished another brief perspective on the future of nanosafety.[6] thesized and tested in relevant biological assays to generate a
She identified that interdisciplinary collaborations as one clear data set for training ML models. The physicochemical proper-
key benefit of international investments in nanosafety. She ties of nanomaterials must be encoded as mathematical entities
described the current pressing issues: evaluating health/and called descriptors, suitable relevant features selected from the
environmental risks across the product life cycle; a need for pool of descriptors, and ML methods used to generate a predic-
future safety evaluation of more advanced (e.g., active rather tive model of the desired property. The models must be vali-
than passive) materials; development of reliable and relevant dated by prediction of an independent set of materials not used
new methods for evaluating safety with reduced mammalian to build the models, or by cross-validation methods that involve
testing. She advocated development of faster and more efficient withholding one or more materials from the model, with the
screening methods to advance concepts of safety by design. withheld material being predicted by the model derived from
Here, we do not address developments in partial or full auto- the remaining materials (Figure 2).
mation that are relevant to nanotechnology as these have been Here we distinguish between AI and ML in the following way.
reviewed recently elsewhere (e.g., Jensen et al.,[7] Li et al.,[2] Chan AI is the superset of tasks that demonstrate characteristics of
et al.[8]) and by other authors in this special issue of Small. These human intelligence, while ML is a subset of AI, which accesses
technologies aim to automate experimental aspects of nanoscience data, analyzes trends, and generates intelligent, actionable

Small 2020, 16, 2001883 2001883  (2 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

Figure 2.  The process of developing a nanomaterials neural network-based ML model. Reproduced with permission.[17] Copyright 2013, Elsevier.

insights. The excitement about ML methods is their generality for each ML method listed above provide in depth description
and platform technology functionality. In the past, when soft- of these ML methods.
ware was required to perform a particular function, each step Almost all of the published computational models map-
required to calculate the outcome needed to explicitly and pre- ping nanomaterials properties to biological endpoints have
cisely encoded into the program. Machine learning methods used simple statistical methods like regression, and conven-
removed this restriction as the same algorithm and software tional ML methods, principally neural networks with simple
code can be applied to a wide variety of modeling applica- architectures.[23] Given the increased awareness of neural net-
tions, encompassing a vast array of materials and endpoints. works in science and technology generally, the use of ANNs in
ML algorithms learn patterns in data in a similar way to that nanosafety and other areas such as drug discovery, is under-
in which humans learn, by repetition and example. However, going a renaissance.[24]
ML methods are vastly quicker and can handle much higher
dimensional data than human researchers can.
A comprehensive description of the different types of 2.2. Deep Learning Methods
machine learning and neural network methods is beyond the
scope of this Concept report. Recent reviews provide this infor- Deep learning methods consist of neural networks with a larger
mation for interested readers, e.g.,[18] and only a brief summary number of hidden layers and complex architectures. They have
is provided here. revolutionized some fields of science and technology by their
ability the recognize features in images, in speech recognition,
and complex decision making.[18c] An important advantage that
2.1. Traditional Machine Learning Methods the newer deep learning methods have over earlier “shallow”
methods is not their ability to generate superior models, but
Traditional ML methods include linear and nonlinear regres- their ability to automatically generate useful descriptors without
sion, artificial neural networks, various types of decision the need for expert input to the modeling process (Figure  3).
trees,[19] Bayesian networks,[20] support and relevance vector Indeed, the universal approximation theorem states that both
machines,[21] and genetic algorithms.[22] They have been used to deep and shallow neural networks, for example, will generate
generate the majority of useful models of the biological prop- models of similar quality given the same training data, shown
erties of nanomaterials in the literature, selected examples of to be the case in several published studies.[25] Three of the most
which are discussed below. These methods have been exten- commonly used deep learning algorithms are convolutional
sively used and are widely known so are not described again neural networks (CNNs), autoencoders, and generative adver-
here. The abovementioned recent reviews and references cited sarial networks (GANs).

Small 2020, 16, 2001883 2001883  (3 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

Figure 3.  Conventional “shallow” ML methods require a skilled researcher to define relevant descriptors for the materials (top) while deep learning
methods automatically generate complex features from simple representations of materials and use these to generate quantitative models of endpoints
(bottom). Reproduced with permission.[26] Copyright 2019, Royal Society of Chemistry.

CNNs are supervised ML methods that are particularly Technology)-sponsored exploratory workshop on Quantitative
useful in identify image features resulting from spatial corre- Nanostructure Toxicity Relationships held at Vaeshartelt Castle,
lations. These ANNs learn primarily local correlations within Maastricht in 2011.[17] The milestones needed to be met for 2-,
the data, and models are invariant to small translations.[3] 5-, and 10-year timeframes to progress the use of ML methods
Cascading deep neural networks (DNNs) consist of two net- in nanosafety research and for the design of effective but safe
works, one that maps each material to an outcome and one that nanomaterials for commercial applications. This was driven by
maps the outcome to one or more material. This architecture is concern that many nanomaterials were being incorporated into
also known as an autoencoder, also used to reduce the dimen- products currently on the market and complete safety data were
sionality of datasets and to predict materials that have a specific not available for many of them. The two-year timeframe listed
property based on the trained ML model. This inverse mapping the establishment of well characterized materials for experi-
(materials design) problem has also been tackled by GANs, an ments, development of specific nanoparticle descriptors, and
unsupervised learning method. GAN consists of a generator development of high throughput in vitro assays for toxicologi-
that generates trial structure–property models and a discrimi- cally relevant endpoints. In vivo studies in laboratory animals
nator that evaluates the quality of trial models by comparing that are inconsistently predictive of human biology and patho-
it to existing unlabeled data. GANs were initially developed physiology and are increasingly difficult because of animal eth-
as a way to designed structures without input from an expert ical concerns. The use of in vitro assays that correlate with in
scientist. Another approach, active learning, uses ML to select vivo outcomes are required to provide reliable information of
experiments that most efficiently achieve a goal. This approach immediate use by regulators and allow construction of in vivo
is similar to directed evolution, where candidate structures are ML models. For example, the US National Institutes of Health
selected, modified, and tested in sequential generations.[3] and Environmental Protection Agency developed a suite of
cell-based assays to profile the toxicity of chemical compounds
in a variety of cell types. A similar approach, coupled with
3. Computational Nanosafety Roadblocks, ML modeling methods, could yield useful in vivo predictions
for nanomaterials. The ambitious two-year milestones also
Milestones, and Context
included development of high throughput methods for meas-
The first comprehensive list of milestones was developed by the uring the interactions of commercially relevant nanoparticles
participants of the COST (European Cooperation in Science and with plasma proteins (so that the protein corona can be better

Small 2020, 16, 2001883 2001883  (4 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

understood and predicted), and better methods for tracking image processing neural networks that have appeared over
nanoparticles in the body. It was also important to develop the past 5 years in particular. A new set of nanoinformatics
mechanisms for information sharing and close collaboration milestones for 2030 has recently been defined,[29] essentially
between experimentalists and computational scientists, with echoing, extending, and elaborating the important milestones
input from regulators and industry. Tools for data storage and identified previously. The Nanoinformatics Roadmap 2030, to
sharing, toxicity mark-up languages and informatics tools (like which the author was a contributor, summarizes the state-of-
ToxML, a toxicology database language; DSSTox, a distributed the-art in diverse research areas relevant to NM risk assessment
toxicity database network; and ACToR, an online warehouse of and governance. The new Roadmap identified additional sig-
all publicly available chemical toxicity data), and development nificant challenges: limited access to data; the need to validate
of ontologies (formal representations of knowledge as a set of computational models in a way that is acceptable to regulatory
concepts, and the relationships between those concepts) for agencies; a need to connect and harmonize data sets, e.g., by
nanomaterials)[27] needed to be developed. employing read across and other methods of filling data gaps.
The five-year timeframe specified a significant increase in Machine learning methods are critically dependent on suffi-
the volume of in vivo data on the effects of nanoparticles, devel- cient data for training and validation, the generation of relevant
opment of data storage and sharing methods, development of descriptors that represent properties of nanomaterials, context-
reliable in vitro models of in vivo endpoints , generation of the dependent selection of the most useful subsets of descriptors,
first models of nanoparticle corona in different environments, robust training of models, validation of the predictive power of
elucidation of mechanisms of entry of nanoparticles into cells models, and use of the models to predict properties of new and
and toxicity such as free-radical production, genotoxicity, and improved materials often not yet synthesized.
apoptosis. One of the most important aspects of this process is
The ten-year plan specified milestones that should have been descriptor generation. With good descriptors, almost any ML
completed by 2020: creation of ML models of in vitro and in algorithm will generate a useful model, whereas descriptors
vivo effects of nanoparticles sufficiently reliable for regula- that are poor representatives of materials will always generate
tory purposes; development of models that can reliably predict very poor models.
nanoparticle corona in diverse environments, and development
of nanomaterials classification fingerprints (physicochemical,
genomic, and/or biological profiles of nanomaterials that group 4.1. Paucity of Data Sets to Train ML Models
materials with similar in vivo effects) that allow regulators to
classify nanomaterials into hazard classes in a similar way to As machine learning methods are highly dependent on the
that currently adopted for industrial chemicals. quantity and quality of the data used to train them, the larger
Achieving these milestones in this roadmap required the and more diverse the data set used to train the models, the
maintenance of a network of experimental and computational more reliably they can predict the properties of new mate-
researchers, regulators, and policymakers. This was done rials not used to train the models. The current focus on high
through a series of EU Projects and Actions (e.g., MODENA, throughput methods for synthesis and characterization of
MARINA, NANOSOLUTIONS) and more recently by the nanomaterials and increased use of toxicogenomic data will sig-
funding of at least three EU Horizon 2020 projects, Nano- nificantly address this deficiency.
SolveIT (https://nanosolveit.eu/),[28] SABYDOMA (https:// Unfortunately, most of the ML studies in the nanomaterial lit-
www.bnn.at/projects/sabydoma), and NanoCommons (https:// erature have been trained on small data sets with limited diver-
www.nanocommons.eu/). The milestones also assumed that sity. Models derived from small data sets can more easily be over-
high throughput experimentation methods for synthesis and fitted, as the number of descriptors that can be used is limited by
characterization of nanomaterials would arise to provide data the size of the data set. This limited set of descriptors may not
to train ML models, Achieving this ambitious set of milestones contain sufficient information on the molecular, physicochem-
would provide the tools for the development of nanomaterials ical, and structural characteristics of the nanomaterial to allow
that were both functional and safe using rational “safe-by- a robust and predictive model to be generated. Models derived
design” principles. from such small data sets also necessarily have small domains
of applicability so are not very useful for predicting properties
of new nanomaterials more broadly. Nanosafety researchers are
4. Unresolved Roadblocks Impeding the Progress now employing a number of methods to address these issues
of data set size including experimental design and “read-across”
of ML in Nanosafety
methods. Read-across methods are a nonexperimental method for
In the ensuing seven years these ambitious milestones have filling data gaps based on properties of close analogues or similar
only partially been achieved, largely because the automated chemical category.[30] Design of experiments is a technique for
nanomaterials synthesis and characterization methods have not designing a minimal number of experiments that covers as much
been adopted as quickly as expected, limiting the availability of parameter space as possible, albeit sparsely. The expectation is
of data for training models. High throughput nanomaterials that ML methods can model this activity landscape and provide
synthesis and characterization methods are now being adopted accurate interpolation or imputation of data gaps.
more widely to address this deficiency. Additionally, the par- Sisochenko et  al. recently reported a multi-nano-read-
ticipants of the Maastricht conference could not have foreseen across modeling technique that also employed self-organizing
the spectacular developments in ML such as deep learning and maps.[31] They predicted toxicity of 184 metal oxide and silica

Small 2020, 16, 2001883 2001883  (5 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

(30 unique chemical types) nanoparticles from 15 datasets to such as molecular weight, charge, elemental composition, etc.
bacteria, algae, protozoa, and mammalian cell lines. They com- They also discussed the development of periodic table descrip-
bined interspecies correlation analysis with a self-organizing tors for metal and metal oxide nanomaterials.[35] These include
map to identify the factors underlying toxicity. Nanoparti- electronegativity, charge, valence, ionic radius, etc. Significant
cles were categorized into four classes of action. Using these progress in this important area has been reported recently.[36]
classes, they estimated the cytotoxicity for untested nanosized They described Sizochenko et  al.’s development of simplex
metal oxides, providing both qualitative and quantitative predic- representation of molecular structure, liquid drop models, and
tion of effects of nanoparticles on different cells and bacteria, metal ligand binding descriptors, some of which can be derived
and revealing different mechanisms of metal oxide nanopar- using quantum chemical methods.[37] They also discussed the
ticle toxicity for prokaryotes and eukaryotes. In a related study, utility of descriptors derived from images of nanoparticles,
Gajewicz also sought to overcome the paucity of data roadblock such as sphericity. Image-based approaches (TEM and struc-
to ML modeling of adverse effects of nanomaterials.[32] She tural representations such as SMILES) are likely to become
employed read-across methods in models to backfill data voids. increasingly important because of the ease of descriptor gen-
Similarity of a target nanomaterial to a reference material in eration using convolutional neural networks. Interpretability,
N-dimensional chemical property space can be used to predict removal of human bias in descriptor selection and exploiting
the activity of target nanomaterial based on the activity of its deep ML objectivity for descriptor generation were seen to be
nearest neighbors. increasingly important. For example, very recently Varsou
et  al. described how image-based descriptors can be used to
predict the zeta potential of nanoparticles.[38] They developed
4.2. Inadequacy of Nanospecific Descriptors to Represent NanoXtract, an automated online tool for the extraction of NM
Nanomaterials image descriptors from TEM images of nanoparticles. These
descriptors encode the geometric properties of distributions
Generation of mathematical entities to represent properties of of nanoparticles and zeta potential models trained on them
nanomaterials in a context dependent way (descriptors or fea- could predict zeta potential with r2 values > 0.9. A very similar
tures) is critical to creation of robust, predictive ML models image processing approach to calculate geometric descriptors
of nanomaterials properties. It has been shown in a number was reported by Odziomek et  al.[39] Mac Fhionnlaoich and
of studies that descriptors have a much larger impact on the Guldin used information entropy to characterize distributions
quality and predictivity of ML models than does the specific of nanoparticles, providing a better estimate of the properties
ML algorithm used to build the model.[33] Nanomaterials have of distributions.[40]
particular issues that make finding useful descriptors more dif- Novel “universal” descriptors for nanoparticles were also
ficult than for single molecules or bulk materials. Largely these reported by Yan et al. and used to generate ML models of gold
revolve around the distribution of sizes and shapes adopted nanoparticle properties using random forest and k-nearest
by nanoparticles, their propensity to agglomerate, and their neighbor (kNN) algorithms.[41] The descriptors were generated
interaction with biological macromolecules that create a sur- by Delaunay tessellation of the surface of the nanoparticles
face coating (corona) that modulates the biological properties (a particular way of joining a set of points to make a triangular
of these materials. Commonly used nanoparticle descriptors mesh) and summing Pauling electronegativity of atoms in each
are obtained from the whole particle and include diameter, tessellation cell to represent the nanostructures (to simulate
surface area, aspect ratio, constitutional properties number of the nanomaterial’s surface properties). The efficacy of these
atoms, number of metal atoms, number of surface atoms, etc.), nanodescriptors was verified by modeling six gold nanoparticle
energy-related properties such as potential energy of surface datasets. The properties modeled were both physicochemical
atoms and of metal atoms, descriptors for any surface coat- (e.g., logP, zeta potential) and biological (e.g., enzyme binding,
ings, zeta potential, aqueous solubility, etc. Earlier published ROS, cellular uptake). A consensus of prediction of the two ML
nano-QSAR studies also employed 1-hot descriptors (indicator methods could predict properties of external test sets with r2
variables) that distinguished between different types of nano- values between 0.76 and 0.95.
particle cores, dopants, coatings, etc. Although very simple
and often effective descriptors, they do not provide mecha-
nistic insight, but are useful for finding an optimal combina- 4.3. In Vitro Models, Model Systems, Translation to In Vivo
tion of complex nanoparticle attributes.[23a] The state-of-the-art Systems
in nanospecific descriptor generation was reviewed recently by
Wyrzykowska and Jagiello.[34] They described the difficulties Ultimately, ML models need to predict potential adverse biolog-
in defining effective descriptors for distributions of nanoma- ical effects on humans, animals, and the environment. Clearly,
terials and in accounting for environmental dynamic changes ethical and cost considerations mean that the amount of in vivo
to the surface of nanomaterials due to corona formation. Sur- data in higher animals is very limited, thus most in vivo data
face modified nanomaterials have traditionally been encoded relates to fish and other aquatic organisms. Consequently, it
using descriptors developed for small organic molecules is essential that model systems capable of generating substan-
(DRAGON and SMILES) but SMILES-based descriptors have tial data for training ML models have a relationship to in vivo
been recently improved (SMILES-based optimal descriptors) effects of nanomaterials.
that consider correlations between SMILES attributes. This The use of in vitro data combined with nanomaterials
concept was extended to include physicochemical properties descriptors has been shown to model in vivo responses better

Small 2020, 16, 2001883 2001883  (6 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

than descriptors alone.[17,42] Extending this concept, this sug- summarize selected literature applications of ML to nanoma-
gests that predicted in vitro responses may be used in this way terials hazard prediction and “safe-by-design” project outcomes
to better predict in vivo properties. and highlight technologies that are likely to lead to achieving
unmet milestones defined above.
One of the first examples of the use of ML or statistical mod-
4.4. Identifying and Modeling the Biologically Relevant Entity eling to predict the adverse properties of nanomaterials was pub-
lished by Puzyn et al.[48] These researchers discovered a simple,
Nanomaterials almost always undergo significant modification one parameter linear regression model that predicted the cyto-
in biological or environmentally relevant fluids such as serum, toxicity of 17 different metal oxide nanoparticles to Escherichia
plasma, and waterways. Initially, the most abundant macro- coli using descriptors derived from quantum chemical calcu-
molecules in the fluid (protein, humic substances, etc.) bind lations. At a similar time Epa et  al. reported the use of linear
to the material in processes modulated by the surface chem- regression and Bayesian regularized neural networks to predict
istry and shape of the particles. Over time, these macromole- the biological effects of 51 metal oxide nanoparticles with diverse
cules are replaced by less abundant macromolecules that bind metal cores and 109 metal oxide nanoparticles with similar cores
more strongly to the nanomaterial. A hard corona is formed but diverse surface modifiers.[23a] The models could make quan-
from these proteins that are very tightly adsorbed on nanopar- titative predictions of the smooth muscle cell apoptosis and
ticles. It is the affinity of the proteins toward the nanoparticles uptake of nanoparticles by human umbilical vein epithelial cells
that determines the hard corona composition. Subsequently, and pancreatic cancer cells from in vitro assays. These models
there is exchange of population of proteins, which constitute could predict apoptosis in an independent test set of nanomate-
the dynamic structure called a soft corona. Apart from affinity, rials with a standard error of <3 and uptake in the two cells lines
the nanoparticle curvature has a significant effect on corona within a factor of 2. Fourches et al. used support vector machine-
composition, with larger particles generally binding a more based classification and kNNs-based regression to model the
diverse population of proteins.[43] Thus, the nanomaterial plus same data set. The external prediction accuracy of these models
the corona defines the “biologically relevant entity” that inter- was up to 73% for classification modeling and regression models
acts with biology. Fortunately, QSAR-based ML models act as had an r2 of 0.72 for predicting the same properties.[49]
“top down” rather than bottom up approaches. That is, models Liu et al. subsequently reported classification ML models of
of in vivo (or more complex in vitro) systems are still ame- the effect of 44 iron oxide core nanoparticles (used in molecular
nable to ML modeling because these universal approximation imaging) on four cell types (aortic endothelial, vascular smooth
methods can encapsulate a large number of complex receptor muscle, hepatocyte, and monocyte/macrophage), using four
interactions, signalling, and downstream processes triggered different biological assays and four concentrations of nanoparti-
by exposure to nanomaterials into a complex nonlinear func- cles.[50] They employed self-organizing map-based clustering to
tion inside the model. The surface chemistry of nanoparticles define the classes and a Bayesian classifier, logistic regression,
determines the composition of the corona, and the corona linear discriminate analysis, and nearest neighbor classifiers
determines how the particles interact with cells. Essentially, trained on size, spin-lattice, and spin–spin relaxivity, and zeta
ML models can accommodate these transformations of nano- potential descriptors. Their two-class models had relatively high
materials on exposure to biological fluids within the model, accuracies >78%.
capturing the emergent biological responses due to a large Gernand and Casman reported the application of classifica-
number of interacting smaller biological processes.[44] Findlay tion and regression trees and random forest methods to pre-
et  al. tackled this important problem of predicting the protein dicting the pulmonary toxicity of 17 types carbon nanotubes.[51]
corona around silver nanoparticles using an ML approach.[45] Descriptors used to train the model related to the types and
They training a random forest model using biophysicochemical dimensions of the nanotubes, the concentration of metallic
characteristics of proteins, ENMs, and solution conditions. The impurities in the materials, the exposure time and dose, and
area under the receiver operating characteristic curve was 0.83 characteristics of the exposed rodents. The pulmonary toxicity
indicating strong predictive performance. Their model pro- endpoints were the number of polymorphonuclear neutrophils,
vided insight into how the corona is influenced by particle sizes number of macrophages, and lactate dehydrogenase and total
and surface curvature and coatings, and into mechanisms of protein concentrations. Their models could predict the four
protein enrichment. Very recently, Ban et al. also used machine pulmonary endpoints with r2 values between 0.88 and 0.96.
learning to predict the corona composition surrounding CNT attributes that contributed most to carbon nanotube pul-
nanoparticles.[46] monary toxicity were amount and identities of metallic impuri-
ties, nanotube length and diameter, surface area, and aggregate
size. Nanoparticle agglomeration is very important modulator
5. Examples of the Application of AI and ML of the biological effects of nanomaterials. It strongly depends
on the surface charge that stabilizes dispersed nanoparticles,
to Nanosafety
preventing them cohering. Mikolajczyk et  al. reported ML
There have been a number of reviews of the use of ML in models of zeta potential, a measure of surface charge on nano-
nanotoxicology in the past decade,[4d,17,23e,47] and we provide materials, for 15 metal oxide nanoparticles characterized by 11
some key examples of published research below. These were image-based and 17 computed descriptors.[52] They used only
chosen to highlight the diversity of ML approaches in modeling linear regression methods but could predict zeta potentials with
nanomaterials properties across a range of applications. We an RMSE error in a test set of 1.25 mV and an r2 value of 0.87.

Small 2020, 16, 2001883 2001883  (7 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

Papa et  al. reported linear and nonlinear modeling of assays for cell viability, membrane integrity, and oxidative stress
the cytotoxicity of TiO2 and ZnO nanoparticles by empirical were performed that reflected cellular damage by ZnO nano-
descriptors.[53] The nanoparticles were tested at different con- particles to human umbilical vein endothelial cells or human
centrations for their ability to disrupt the lipid membrane hepatocellular liver carcinoma cells (HepG2). The nonlinear
in cells, assessed using lactate dehydrogenase levels. Data ML models were superior to the linear models, with predictions
were measured for 42 different nanoparticle sizes and shapes for an external test set of nanoparticles having an r2 value of
(24 nanoforms of TiO2 and 18 of ZnO). The models were devel- 0.89 and a standard error of prediction of 12% for cell viability,
oped using multiple linear regression, two types of neural r2 of 0.86 and standard error of 80 RFU for LDH level (mem-
network, and support vector machines. The models could pre- brane integrity), and r2 of 0.67 and standard error of 2.4-fold for
dict the LDH levels for nanoparticles in the test set with errors the luciferase assay characterizing oxidative stress. Apart from
of between 8% and 17% relative to intreated control cells. They being one of the first studies where the physicochemical prop-
found the nonlinear ML methods clearly outperformed the erties of inorganic nanoparticles were varied systematically, the
linear regression models. study also showed that nonlinear ML methods were capable of
As mentioned above, machine learning models of nanosafety- modeling the entire nanoparticle dose–response curve.
relevant properties of nanomaterials are still severely limited While ML methods with computed descriptors usually gen-
by the paucity of available toxicity data, due to cost, time, and erate relatively robust and predictive models of nanoparticle
ethical issues. To address this paucity, Chen et  al. developed structure–property relationships, molecular or mechanistic
ML models for classifying the ecotoxicity of nanomaterials.[54] interpretation of these models is often difficult or impossible.
They used read-across properties derived from multisource Oksel et al. described the use of a genetic programming-based
ecotoxicity data to build individual species-specific models and decision tree (GPTree) in modeling nanomaterials properties.[57]
a single model predicting toxicity for multiple species. They GPTree is a robust, automatic method for generation of accu-
employed four types of tree algorithms to generate across- rate nanoSAR models that works with small datasets, automati-
species and species-specific models with significant predictive cally selects descriptors, and provides significantly improved
power. For LC50 global models the functional tree, C4.5 deci- interpretability of models. The generality of the method was
sion tree, and random tree models all correctly classified more exemplified by the training of accurate nanoSAR models for
than 70% of the samples in training (320 nanomaterials) and four diverse exemplar datasets. The models were quite sparse,
test sets (80 nanomaterials). The functional tree predicted the employing only 13 predictors selected from a large pool of
toxicity of metallic nanoparticles to Danio rerio with accura- descriptors and exhibited accuracies between 98–100% and
cies of 93% and 100% on training (76 materials) and test sets 86–100% on training and test data, respectively. The decision
(18 materials), respectively. trees provided a clear graphical representation of the decision
Fourches et  al. published a study that reduced to practice thresholds for each descriptor, providing much needed clarity
the concepts of “safe-by-design” using ML models of biological of interpretation of NP structure–activity models.
endpoints.[23b] They studied a small library of 83 surface modi- Concu et  al. conducted a large study of set of 260 metal,
fied carbon nanotubes with relatively constant dimensions. metal oxide, and silica nanoparticles with 31 chemical composi-
They assessed their bovine serum albumin, carbonic anhydrase, tions from literature sources.[58] They also measured ecotoxicity
chymotrypsin, and hemoglobin activity together with acute tox- and cytotoxicity in algae, bacteria, fungi, crustaceans, plants,
icity and immune toxicity in vitro assays. Traditional k-nearest fishes, other species, and mammalian cell lines. These 260 nan-
neighbors, random forest, and support vector machine ML oparticles were uniformly randomly combined to produce 54371
models could predict the properties of an external test set with pairs (≈80% of all possible pairs). For each pair one nanopar-
accuracies up to 75% and 77% for protein binding and acute ticle, along with its data, was used as reference state while the
toxicity endpoints, respectively. They discovered chemical sur- other was used as a new state. The 54371 pairs were randomly
face features that correlated with particular biological activities. split into training (40804 pairs, 26131 nontoxic, and 14673 toxic)
The models were used to virtually screen a library of 240  000 and test sets (13567 pairs, 8613 nontoxic, and 4954 toxic). The
potential surface ligands for carbon nanotubes. Experimental models were generated by a linear neural network, radial basis
measurement of nanotubes whose biological properties had function, multilayer perceptron, and probabilistic neural net-
been predicted by ML models showed good agreement. work. These models generated prediction accuracies scored by
Systematic modification of physicochemical properties of the area under the receiver operator characteristic (ROC) curve
nanoparticles, combined with comprehensive biological evalu- of 0.999 for the training set and 0.998 and a two-class accu-
ation and computational analysis, is an important goal for racy of 98% for the test set. The model also provided insight
predictive nanotoxicology and the “safe-by-design” paradigm. into which descriptors influenced the nanoparticle adverse
This provides better mechanistic understanding of nano–bio biological responses. The authors used Y-scrambling to check
interactions and facilitates generation of quantitatively predic- for overfitting of the model, but the unrealistically high ROC
tive and robust models of the properties of nanomaterials that and accuracy values coupled with the large number of neural
have useful domains of applicability.[55] Le et al. reported such a network weights, and repeated trials to optimize the network
study of 45 ZnO nanoparticles in which the particle size, aspect architecture, suggests chance correlations or other potential
ratio, doping type, doping concentration, and surface coating of methodological issues with the models.
the nanoparticles was varied systematically, and the resulting Wang et al. reported a combinatorial study of gold nanoparticles
biological response data modeled using linear regression and of differing sizes, surface modifications, and surface coverage.[59]
Bayesian regularized neural network ML methods.[56] Biological They generated a small library of 34 nanoparticles and, using 29

Small 2020, 16, 2001883 2001883  (8 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

descriptors and the kNN algorithm, developed models for cellular mass spectrometry protein data as inputs, and blood clearance
uptake in human lung and kidney cells, ability to induce oxidative and organ accumulation as outputs to train a supervised deep
stress, and hydrophobicity (logP values). Each model employed neural network that predicted nanoparticle spleen and liver
11 or fewer descriptors and the model performance was assessed accumulation ≥94% accuracy. This showed that the complex
using tenfold cross-validation. The four models had high cross pattern or fingerprint of nanoparticle surface adsorbed pro-
validated predictivity, with r2 values of 1.0, 0.99, 0.97, and 0.99, teins modulates liver and spleen uptake. Using these models,
respectively. they designed nanoparticles with 50% and 70% lower liver and
Kovalishyn et al. collected data from 128 literature sources to spleen uptake, respectively.
assemble a set of 964 data points for inorganic nanomaterials.[60]
They analysed both toxicological/ecotoxicological properties
(EC50, LC50, MIC, and mortality rate) and physicochemical 6. Perspective
properties for metal and metal oxide nanoparticles ranging
from 1 to 90  000  nm. They generated models for these four Clearly the remaining roadblocks to the application of ML to
endpoints across multiple species using kNN, random forest, nanosafety need to be removed or reduced. Increased automa-
and neural network methods. Cross-validation and test sets tion of synthesis and characterization of nanomaterials, and
were used to assess model predictive power, giving q2 values improved descriptors from deep learning methods should help
between 0.58 and 0.80 for cross-validated regression models resolve some of these roadblocks. By extending the developments
and r2 between 0.49 and 0.78 for test sets. It is not clear how in ML methods in other areas of science and technology to
they encoded the species into the models. nanomaterials and identifying where these overlap with current
Recently, Hataminia et  al. used a neural network to model areas of need, we can postulate areas where ML methods will
the toxicity of iron oxide nanoparticles to kidney cells.[61] The generate impact in the short to medium term.
model was trained using particle size, concentration, incuba-
tion time, and the surface charge of the nanoparticles to pre-
dict the percent kidney cell viability. Their model appeared to 6.1. Multiobjective ML Models and Evolutionary Methods
be highly predictive (no statistics given in graphs but extensive
analysis of the effect of the four input parameters to the model Most ML models reported in the literature can predict a single
on kidney cell viability was presented). biological property for nanomaterials. Clearly, “safe-by-design”
Deep learning algorithms are having enormous impact in implies that nanomaterials should meet a number of key
many areas of science and technology,[18b,c,62] including mate- design criteria, some taking advantage of important new and
rials science. As stated previously, the main advantage of deep useful properties shown by nanoscale forms of materials, some
neural networks is their ability to automatically generate useful eliminating or at least reducing adverse human and ecological
higher order feature for ML models. It is therefore surprising properties so that the products incorporating nanomaterials
that they have not yet been widely applied to modeling nanoma- can be manufactured, used, and disposed of safely. Some ML
terials properties relevant to nanosafety, given that the dearth of methods are general, not restricted to a single dependent vari-
nanospecific descriptors is one of the factors limiting progress able. For example, neural networks can have multiple outputs,
in predictive nanotoxicology. each representing a different property. Although multiobjective
Most applications have employed DNNs image recognition ML models are becoming more widely used in the pharmaceu-
capabilities to extract useful data from microscopy images of tical field for example, there is only very limited proof-of-con-
nanoparticles and cells. For example, Coquelin et  al. reported cept studies reported for nanomaterials. For example, Ambure
the use of CNNs to estimation of particle size distribution on et  al. reported the development of a specific software package,
SEM images of aggregated TiO2 particles.[63] Their aim was QSAR-Co, to tackle the modeling of nanomaterials properties
automation of SEM measurements to estimate individual par- simultaneously.[67]
ticle diameters. Aggregated nanoparticles are often partially Evolutionary methods are the usual approach to find the
or fully omitted from estimates of particle size distribution or best balance between multiple requirements for nanomaterials,
from ML models of biological responses to nanoparticles. They e.g., functional performance, lack of toxicity, cost. ML models,
reported a method called “context encoding” to predict missing particularly those than can predict more than one property
parts of aggregated nanoparticles. Likewise, Horwath et al. used simultaneously, can be used as surrogate fitness functions in
CNNs to segment TEM images of nanoparticles to more reli- such evolutionary optimization methods, reducing the need for
ably calculate size distributions.[62] Image profiling using open experimental evaluation of fitness of materials. Encoding rele-
source CellProfiler and a CNN-based algorithm ilastik,[64] were vant physicochemical, structural, and processing parameters of
employed by Ilet et al. to study nanoparticle distributions.[65] nanomaterials as a vector defines essentially their “genome.”[22]
A particularly impressive use of ML in nanosafety was pub- By generating a relatively small number of different nano-
lished recently by Lazerovits et  al.[66] They conducted experi- materials, ideally using design of experiments to systemi-
ments aimed at understanding the adsorption of blood proteins cally explore nano–bio interactions,[55] these materials can
on nanoparticles immediately after intravenous injection, how be assessed against one or more utility and safety endpoints
this interface changes during circulation, and how it alters to that represent fitness functions in an evolutionary algorithm.
distribution of nanoparticles in vivo. They showed that the Selecting the nanomaterials that best match the desired proper-
evolution of proteins on nanoparticle surfaces predicts the ties and mutating their nanomaterials genome generates a new
biological fate of nanoparticles in vivo. They employed protein population of materials for synthesis and testing. By iterating

Small 2020, 16, 2001883 2001883  (9 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

around this evolutionary cycle a number of times, new mate- materials to map encoded molecular descriptors into a mate-
rials with superior properties can be discovered in a similar rials structure.[70] Given the potentially large advantages of these
way to how niche species evolve in natural selection. Single inverse design approaches it is likely that the nanoscience com-
point mutations (changing one element in the genome) usually munity will adopt these novel learning approaches to reduce to
explore local regions of nanomaterials space, while crossover practice inverse design of “safe-by-design” nanomaterials.
operations (splitting two genomes and combining the pieces
in new ways) allow jumps into new areas of nanomaterials
space.[22] As the evolutionary cycle progresses, it is possible to 6.3. Autonomous Methods
use the experimental data on nanomaterials and their proper-
ties to generate ML models as surrogate fitness functions later Inverse design and evolutionary methods make possible a new
in the cycle, reducing the number of experiments required. paradigm for the design, synthesis, and optimization of mate-
Multiple objective fitness functions, e.g., a favorable strength rials that is clearly applicable to nanomaterials. Fully autono-
coupled with low adverse biological effects, can be defined by mous researchers or robot scientists that select and carry out
a Pareto surface, the curve of solutions of equal fitness but experiments without a human in the loop are an exciting and
that involve different tradeoffs between the multiple objectives. potentially accessible development in chemical and materials,
To our knowledge, evolutionary algorithms have not yet been science.[7,71] Such systems can be achieved in several ways.
used to generate new optimal nanomaterials, although there is Active learning can be combined with automated experimenta-
ample evidence in other areas of materials science of the value tion to generate a closed loop system, or ML models of nano-
of this approach.[22] materials fitness landscapes can be used with evolutionary
algorithms to select the fittest materials and “mutate” them for
input into the next set of optimization experiments.[22] The first
6.2. Inverse Design proof of concept in nanoscience was exemplified by a system
for automated growth and characterization of carbon nano-
One of the important aims of ML models is the ability to have to tubes on the surface of micropillars.[72] An automated experi-
model predict new materials with better properties than those mental system was combined with logistic regression analysis
in the training set, essentially inverting the model to generate to autonomously select new experimental conditions to achieve
new materials structures. Given the nature of the descriptors an experimental goal. This fully autonomous system was able
used in these models and the complexity of the response sur- to generate information and useful materials faster than nonau-
faces that constitute the nonlinear models, it has been almost tonomous methods by automating water-assisted CVD growth
impossible until recently to achieve this highly desirable out- experiments, using in situ spectroscopy, that conducted over
come. Consequently, the main way models achieve this is by 100 experiments per day. Regression modeling mapped regions
predicting the properties of a large number of real or virtual of selectivity toward single-wall and multiwall carbon nanotube
materials in databases. However, care must be taken to ensure growth in a complex parameter space.
that the materials lie within or close to the domains of applica-
bility of the models (the multidimensional space defined by the
ranges of the descriptor variables and the dependent variable 6.4. Use of Web Cloud Services
that is modeled).
Recent advances in machine learning methods have changed Data and model disposition and sharing is being recognized
this landscape allowing for the first time, de novo predictions as essential to improve efficiency and transparency of research
of new structures that are predicted to have improved proper- into nanosafety. Making these accessible using the FAIR princi-
ties. Autoencoders and GANs convert nanomaterials structural, ples (Findable, Accessible, Interoperable, and Reusable) that is
physicochemical, and processing variables into latent variable/ a cornerstone of the Open Science.[73] Making data and models
descriptors that can be used to generate ML models of proper- available to all researchers increases the robustness of the sci-
ties relevant to nanomaterials utility and safety. The advantage ence, allows reuse of models, and expands the pool of data on
of these methods is that they allow the latent descriptions to be nanoparticles that can be used to train new models. Expanding
“inverted” to generate structures and processing conditions for the use of automation to accelerate synthesis and testing, the
new nanomaterials with improved properties. There are so far application of omics technologies to probe the biological effects
no clear examples of the application of this approach to nano- of nanomaterials, and collecting, curating, and making widely
materials. In related fields, Kim et al. recently reported the use accessible the data collected so far are the best ways to over-
of GANs in the inverse design of porous materials[68] and So come the paucity of data identified earlier in this paper. Most
and Rho used deep convolutional GANs to generate new nano- database activities are now cloud based and this trend will
photonic structures by inverse design.[69] Gomez-Bombarelli surely increase. Nanosafety has established a number of initia-
et  al. showed how a DNN and a recurrent neural network tives that aim to make data and models accessible via cloud ser-
(RNN) can be used as an encoder and the decoder, respectively, vices. For example, Afantitis et  al. reported the Enalos Cloud
for inverse design. The DNN models structure–property rela- Platform (http://enalos.insilicotox.com/NanoProteinCorona/)
tionships between molecule structures and material proper- developed under an EU 7th Framework project NanoMILE, that
ties and encodes the materials into latent molecule descriptors can be used for regulatory or NP safe-by-design projects.[15d,74]
encoding molecule information. The RNN allows decoding of It allows the virtual screening of NPs, based on a list of the
the latent molecule information corresponding to improved significant NP descriptors, identifying those NPs that would

Small 2020, 16, 2001883 2001883  (10 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

warrant further toxicity testing on the basis of predicted NP cel- Conflict of Interest
lular association. This work is being extended by the develop-
The author declares no conflict of interest.
ment of cloud services under a new EU Horizon 2020 project,
NanoSolveIT, in which the author is a member.[28]

Keywords
6.5. Toxicogenomics adverse biological effects, artificial intelligence, deep learning, machine
learning, nanomaterials, safe-by-design
The use of genomic data to predict and understand the mech-
anisms by which nanomaterials generate adverse biological Received: March 22, 2020
effects has been growing rapidly over the past decade and now Revised: May 7, 2020
represents a substantial field of research in its own right. Charac- Published online: June 15, 2020
teristic gene expression profiles (or fingerprints) of toxicological
responses to exposure to nanoparticles can identify biomarkers
that are predictive of nanomaterials toxicity.[75] Gene expression
[1] T.  Le, V. C.  Epa, F. R.  Burden, D. A.  Winkler, Chem. Rev. 2012, 112,
profiles provide a rich data set that can be linked to specific path- 2889.
ways being modulated by exposure to nanomaterials. Such data [2] Y.  Li, J.  Wang, F.  Zhao, B.  Bai, G.  Nie, A. E.  Nel,
are amenable to analysis by sparse feature selection methods and Y. Zhao, Natl. Sci. Rev. 2018, 5, 385.
machine learning. Fortunately, this large research area has been [3] K. A.  Brown, S.  Brittman, N.  Maccaferri, D.  Jariwala, U.  Celano,
comprehensively reviewed by the Greco group in Finland.[76] In Nano Lett. 2020, 20, 1.
particular they discussed in detail the synergistic combination [4] a) L. Lamon, D. Asturiol, A. Vilchez, R. Ruperez-Illescas, J. Cabellos,
of transcriptomic data and machine learning for understanding A.  Richarz, A.  Worth, Comput. Toxicol. 2019, 9, 143; b) A.  Worth,
and predicting adverse biological effects of nanomaterials for K. Aschberger, D. Asturiol Bofill, J. Bessems, K. Gerloff, R. Graepel,
regulatory purposes and potential SbD applications.[76b] The E.  Joossens, L.  Lamon, T.  Palosaari, A.  Richarz, JRC Technical
Reports, Joint Research Centre, Italy, European Union, Luxembourg
question still remains as to how well changes in the geno-
2017; c) I. Furxhi, F. Murphy, M. Mullins, A. Arvanitis, C. A. Poland,
type translate to the observed phenotype. Microarray data have Nanomaterials 2020, 10, 116; d) C.  Oksel, C. Y.  Ma, J. J.  Liu,
serious limitations when it comes to showing associations using T. Wilkins, X. Z. Wang, Particuology 2015, 21, 1.
biological inference because these data rarely indicate cause and [5] D. L. Alexander, A. Tropsha, D. A. Winkler, J. Chem. Inf. Model. 2015,
effect. One of the greatest limitations of microarray data is that 55, 1316.
expression of mRNAs does not always translate into proteins [6] J. A. Shatkin, Nano Lett. 2020, 20, 1479.
because siRNAs and other mechanisms can block the transla- [7] K. F.  Jensen, C. W.  Coley, N. S.  Eyke, Angew. Chem., Int. Ed. 2020,
tion process.[77] Nonetheless gene expression fingerprints cou- https://doi.org/10.1002/anie.201909987.
pled with modern ML methods can provide valuable insight into [8] E. M. Chan, C. X. Xu, A. W. Mao, G. Han, J. S. Owen, B. E. Cohen,
the mechanisms of nanoparticle interactions with biology, as has D. J. Milliron, Nano Lett. 2010, 10, 1874.
[9] S. Senkan, M. Kahn, S. Duan, A. Ly, C. Leidhom, Catal. Today 2006,
been found in other areas of science.[78]
117, 291.
[10] S. H.  Baeck, T. F.  Jaramillo, A.  Kleiman-Shwarsctein,
E. W. McFarland, Meas. Sci. Technol. 2005, 16, 54.
7. Conclusions [11] T. Kuykendall, P. Ulrich, S. Aloni, P. Yang, Nat. Mater. 2007, 6, 951.
[12] J. M. Whitling, G. Spreitzer, D. W. Wright, Adv. Mater. 2000, 12, 1377.
Machine learning has enormous potential to accelerate the [13] a) H. Nakamura, Y. Yamaguchi, M. Miyazaki, H. Maeda, M. Uehara,
development and use of safer nanomaterials for industrial P.  Mulvaney, Chem. Commun. 2002, 2844; b) A. L.  Washington,
applications. The main issues holding the field back are still the G. F.  Strouse, J. Am. Chem. Soc. 2008, 130, 8916; c) B. K. H.  Yen,
relative paucity of high-quality data that can be used to train and A.  Gunther, M. A.  Schmidt, K. F.  Jensen, M. G.  Bawendi, Angew.
validate models, the need for better mathematical descriptors to Chem., Int. Ed. 2005, 44, 5447.
[14] a) D. A.  Winkler, Curr. Med. Chem. 2017, 24, 483; b) Y.  Li, Q.  Pu,
encode nanomaterials properties, and ways of incorporating the
S.  Li, H.  Zhang, X.  Wang, H.  Yao, L.  Zhao, Patt. Recog. Lett. 2019,
heterogeneity and dynamic nature of the “biologically relevant 117, 111.
entity” when nanomaterials transit particular biological envi- [15] a) N. G.  Bastus, V.  Puntes, Curr. Med. Chem. 2018, 25, 4587;
ronments and compartments. Capturing this complexity with b) A.  Kraegeloh, B.  Suarez-Merino, T.  Sluijters, C.  Micheletti,
robust mathematical descriptors is of prime importance, as Nanomaterials 2018, 8, 239; c) I. van de Poel, Z. Robaey, NanoEthics
descriptor quality is the most important element for robust and 2017, 11, 297; d) D.-D.  Varsou, A.  Afantitis, A.  Tsoumanis,
predictive ML model generation. Advances in automated syn- G.  Melagraki, H.  Sarimveis, E.  Valsami-Jones, I.  Lynch, Nanoscale
thesis and characterization, high content screening, predictive Adv. 2019, 1, 706; e) D. A. Winkler, in Computational Nanotoxicology
modeling of the nanoparticle corona for a given environment, (Eds: A. Gajewicz, T. Puzyn), Jenny Stanford, Singapore 2020, p. 507.
understanding of how to mathematically encode the biophys- [16] R. B. Altman, N. Khuri, M. Salit, K. M. Giacomini, Sci. Transl. Med.
2015, 7, 315ps22.
icochemical surface properties of nanoparticle, and advances in
[17] D. A.  Winkler, E.  Mombelli, A.  Pietroiusti, L.  Tran, A.  Worth,
deep and shallow ML methods on the horizon should remove B. Fadeel, M. J. McCall, Toxicology 2013, 313, 15.
these roadblocks and catalyze a rapid expansion in the power [18] a) D. C. Elton, Z. Boukouvalas, M. D. Fuge, P. W. Chung, Mol. Syst.
and usefulness of these computational methods in design of Des. Eng. 2019, 4, 828; b) N.  Rusk, Nat. Methods 2016, 13, 35;
safer nanomaterials. c) Y. LeCun, Y. Bengio, G. Hinton, Nature 2015, 521, 436.

Small 2020, 16, 2001883 2001883  (11 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

[19] V.  Svetnik, A.  Liaw, C.  Tong, J. C.  Culberson, R. P.  Sheridan, [40] N. Mac Fhionnlaoich, S. Guldin, Chem. Mater. 2020, 32, 3701.
B. P. Feuston, J. Chem. Inf. Comput. Sci. 2003, 43, 1947. [41] X.  Yan, A.  Sedykh, W.  Wang, X.  Zhao, B.  Yan, H.  Zhu, Nanoscale
[20] a) P. A. Aguilera, A. Fernández, R. Fernández, R. Rumí, A. Salmerón, 2019, 11, 8352.
Environ. Mod. Softw. 2011, 26, 1376; b) B.  Sheehan, F.  Murphy, [42] S.  Lee, K.  Park, H. S.  Ahn, D.  Kim, Toxicol. Appl. Pharmacol. 2010,
M.  Mullins, I.  Furxhi, A. L.  Costa, F. C.  Simeone, P.  Mantecca, 246, 38.
Int. J. Mol. Sci. 2018, 19, 649. [43] M. Nath, G. Roy, Mol. Pharmaceutics 2020, 17, 725.
[21] F. R. Burden, D. A. Winkler, J. Chem. Inf. Model. 2015, 55, 1529. [44] J. D. Halley, D. A. Winkler, Complexity 2008, 13, 10.
[22] T. C. Le, D. A. Winkler, Chem. Rev. 2016, 116, 6107. [45] M. R.  Findlay, D. N.  Freitas, M.  Mobed-Miremadi, K. E.  Wheeler,
[23] a) V. C.  Epa, F. R.  Burden, C.  Tassa, R.  Weissleder, S.  Shaw, Environ. Sci.: Nano 2018, 5, 64.
D. A.  Winkler, Nano Lett. 2012, 12, 5808; b) D.  Fourches, D.  Pu, [46] Z. Ban, P. Yuan, F. Yu, T. Peng, Q. Zhou, X. Hu, Proc. Natl. Acad. Sci.
L.  Li, H.  Zhou, Q.  Mu, G.  Su, B.  Yan, A.  Tropsha, Nanotoxicology USA 2020, 117, 10492.
2016, 10, 374; c) D. A. Winkler, F. R. Burden, B. Yan, R. Weissleder, [47] a) B. Sun, R. Li, X. Wang, T. Xia, Int. J. Biomed. Nanosci. Nanotechnol.
C.  Tassa, S.  Shaw, V. C.  Epa, SAR QSAR Environ. Res. 2014, 25, 2013, 3, 4; b) J. Ehret, M. Vijver, W. Peijnenburg, Altern. Lab. Anim.
161; d) T.  Puzyn, D.  Leszczynska, J.  Leszczynski, Small 2009, 5, 2014, 42, 43; c) J. M. Gernand, E. A. Casman, IEEE Intell Syst. 2014,
2494; e) R.  Rajeswari, R.  Jothilakshmi, Mater. Focus 2016, 5, 335; 29, 84; d) D. Fourches, R. Lougee, in Bioactivity of Engineered Nano-
f) B. Rasulev, A. Gajewicz, T. Puzyn, D. Leszczynska, J. Leszczynski, particles (Eds: B.  Yan, H.  Zhou, J. L.  Gardea-Torresdey), Springer,
in Towards Efficient Designing of Safe Nanomaterials: Innovative Singapore 2017, p. 361; e) V. V. Kleandrova, F. Luan, A. Speck-Planche,
Merge of Computational Approaches and Experimental Techniques M. N. D.  Cordeiro, in Materials Science and Engineering: Concepts,
(Eds: T. Puzyn, J. Leszczynski), Royal Society of Chemistry, Cambridge Methodologies, Tools, and Applications, IGI Global, Hershey, PA, USA
2012, Ch. 10, p. 220; g) B.  Saini, S.  Srivastava, presented at IOP 2017, p. 1504; f) C.  Oksel, C. Y.  Ma, J. J.  Liu, T.  Wilkins,
Conference Series: Materials Science and Engineering, Bali, Indonesia, X. Z.  Wang, in Modelling the Toxicity of Nanoparticles (Eds: L.  Tran,
January 2018. M. A.  Bañares, R.  Rallo), Springer International Publishing 2017,
[24] a) Baskin II, D. Winkler, I. V. Tetko, Exp. Opin. Drug Discovery 2016, p. 103; g) A. Nel, T. Xia, H. Meng, X. Wang, S. Lin, Z. Ji, H. Zhang,
11, 785; b) K. G. Reyes, B. Maruyama, MRS Bull. 2019, 44, 530. Acc. Chem. Res. 2013, 46, 607.
[25] a) D. A.  Winkler, T. C.  Le, Mol. Inf. 2017, 36, 1600118; b) R.  Liu, [48] T. Puzyn, B. Rasulev, A. Gajewicz, X. Hu, T. P. Dasari, A. Michalkova,
M.  Madore, K. P.  Glover, M. G.  Feasel, A.  Wallqvist, Toxicol. Sci. H. M.  Hwang, A.  Toropov, D.  Leszczynska, J.  Leszczynski,
2018, 164, 512. Nat. Nanotechnol. 2011, 6, 175.
[26] A. S.  Barnard, B.  Motevalli, A. J.  Parker, J. M.  Fischer, C. A.  Feigl, [49] D.  Fourches, D.  Pu, C.  Tassa, R.  Weissleder, S. Y.  Shaw,
G. Opletal, Nanoscale 2019, 11, 19190. R. J. Mumper, A. Tropsha, ACS Nano 2010, 4, 5703.
[27] A. M. Richard, C. Yang, R. S. Judson, Toxicol. Mech. Methods 2008, [50] R.  Liu, R.  Rallo, R.  Weissleder, C.  Tassa, S.  Shaw, Y.  Cohen, Small
18, 103. 2013, 9, 1842.
[28] A. Afantitis, G. Melagraki, P. Isigonis, A. Tsoumanis, D. D. Varsou, [51] J. M. Gernand, E. A. Casman, Risk Anal. 2014, 34, 583.
E.  Valsami-Jones, A.  Papadiamantis, L. A.  Ellis, H.  Sarimveis, [52] A.  Mikolajczyk, A.  Gajewicz, B.  Rasulev, N.  Schaeublin, E.  Maurer-
P.  Doganis, P.  Karatzas, P.  Tsiros, I.  Liampa, V.  Lobaskin, Gardner, S.  Hussain, J.  Leszczynski, T.  Puzyn, Chem. Mater. 2015,
D.  Greco, A.  Serra, P. A. S.  Kinaret, L. A.  Saarimaki, R.  Grafstrom, 27, 2400.
P.  Kohonen, P.  Nymark, E.  Willighagen, T.  Puzyn, A.  Rybinska- [53] E.  Papa, J. P.  Doucet, A.  Doucet-Panaye, SAR QSAR Environ. Res.
Fryca, A.  Lyubartsev, K.  Alstrup Jensen, J. G.  Brandenburg, 2015, 26, 647.
S.  Lofts, C.  Svendsen, S.  Harrison, D.  Maier, K.  Tamm, J.  Janes, [54] G.  Chen, W. J.  Peijnenburg, V.  Kovalishyn, M. G.  Vijver, RSC Adv.
L.  Sikk, M.  Dusinska, E.  Longhin, E.  Runden-Pran, E.  Mariussen, 2016, 6, 52227.
N.  El Yamani, W.  Unger, J.  Radnik, A.  Tropsha, Y.  Cohen, [55] X.  Bai, F.  Liu, Y.  Liu, C.  Li, S.  Wang, H.  Zhou, W.  Wang, H.  Zhu,
J.  Leszczynski, C.  Ogilvie Hendren, M.  Wiesner, D.  Winkler, D. A. Winkler, B. Yan, Toxicol. Appl. Pharmacol. 2017, 323, 66.
N.  Suzuki, T. H.  Yoon, J. S.  Choi, N.  Sanabria, M.  Gulumian, [56] T. C.  Le, H.  Yin, R.  Chen, Y.  Chen, L.  Zhao, P. S.  Casey, C.  Chen,
I. Lynch, Comput. Struct. Biotechnol. J. 2020, 18, 583. D. A. Winkler, Small 2016, 12, 3568.
[29] A.  Haase, F.  Klaessig, EU Nanosafety Cluster, https://doi. [57] C.  Oksel, D. A.  Winkler, C. Y.  Ma, T.  Wilkins, X. Z.  Wang,
org/10.5281/zenodo.1486012 (accessed: November 2018). Nanotoxicology 2016, 10, 1001.
[30] K.  van  Leeuwen, T. W.  Schultz, T.  Henry, B.  Diderich, G. D.  Veith, [58] R. Concu, V. V. Kleandrova, A. Speck-Planche, M. Cordeiro, Nanotoxicology
SAR QSAR Environ. Res. 2009, 20, 207. 2017, 11, 891.
[31] N. Sizochenko, A. Mikolajczyk, K. Jagiello, T. Puzyn, J. Leszczynski, [59] W. Wang, A. Sedykh, H. Sun, L. Zhao, D. P. Russo, H. Zhou, B. Yan,
B. Rasulev, Nanoscale 2018, 10, 582. H. Zhu, ACS Nano 2017, 11, 12641.
[32] A. Gajewicz, Nanoscale 2017, 9, 8435. [60] V.  Kovalishyn, N.  Abramenko, I.  Kopernyk, L.  Charochkina,
[33] a) S. S.  Young, F.  Yuan, M.  Zhu, Mol. Inf. 2012, 31, 707; L.  Metelytsia, I. V.  Tetko, W.  Peijnenburg, L.  Kustov, Food Chem.
b) P.  Mikulskis, M. R.  Alexander, D. A.  Winkler, Adv. Intell. Syst. Toxicol. 2018, 112, 507.
2019, 1, 1900045. [61] F.  Hataminia, Z.  Noroozi, H.  Mobaleghol Eslam, Toxicol. In Vitro
[34] E.  Wyrzykowska, K.  Jagiello, in Computational Nanotoxicology 2019, 59, 197.
(Eds: /Gajewicz, T. Puzyn), Jenny Stanford, Singapore 2020, p. 245. [62] J. P.  Horwath, D. N.  Zakharov, R.  Megret, E. A.  Stach, 2019,
[35] S.  Kar, A.  Gajewicz, T.  Puzyn, K.  Roy, J.  Leszczynski, Ecotoxicol. arXiv:1912.06077 [eess.IV].
Environ. Saf. 2014, 107, 162. [63] L.  Coquelin, N.  Fischer, N.  Feltin, L.  Devoille, G.  Felhi, Mater. Res.
[36] P.  De, S.  Kar, K.  Roy, J.  Leszczynski, Environ. Sci.: Nano 2018, 5, Express 2019, 6, 085001.
2742. [64] S. Berg, D. Kutra, T. Kroeger, C. N. Straehle, B. X. Kausler, C. Haubold,
[37] N.  Sizochenko, B.  Rasulev, A.  Gajewicz, V.  Kuz’min, T.  Puzyn, M. Schiegg, J. Ales, T. Beier, M. Rudy, K. Eren, J. I. Cervantes, B. Xu,
J. Leszczynski, Nanoscale 2014, 6, 13986. F. Beuttenmueller, A. Wolny, C. Zhang, U. Koethe, F. A. Hamprecht,
[38] D. D.  Varsou, A.  Afantitis, A.  Tsoumanis, A.  Papadiamantis, A. Kreshuk, Nat. Methods 2019, 16, 1226.
E. Valsami-Jones, I. Lynch, G. Melagraki, Small 2020, 1906588. [65] M.  Ilett, J.  Wills, P.  Rees, S.  Sharma, S.  Micklethwaite, A.  Brown,
[39] K. Odziomek, D. Ushizima, P. Oberbek, K. J. Kurzydlowski, T. Puzyn, R.  Brydson, N.  Hondow, J. Microsc. 2019, https://doi.org/10.1111/
M. Haranczyk, J. Microsc. 2017, 265, 34. jmi.12853.

Small 2020, 16, 2001883 2001883  (12 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.small-journal.com

[66] J. Lazarovits, S. Sindhwani, A. J. Tavares, Y. Zhang, F. Song, J. Audet, [74] A. Afantitis, G. Melagraki, A. Tsoumanis, E. Valsami-Jones, I. Lynch,
J. R. Krieger, A. M. Syed, B. Stordy, W. C. W. Chan, ACS Nano 2019, Nanotoxicology 2018, 12, 1148.
13, 8023. [75] A. Poma, M. L. Di Giorgio, Curr. Genomics 2008, 9, 571.
[67] P.  Ambure, A. K.  Halder, H.  González-Díaz, M. N. D.  Dias Soeiro [76] a) P. A. S.  Kinaret, A.  Serra, A.  Federico, P.  Kohonen, P.  Nymark,
Cordeiro, J. Chem. Inf. Model. 2019, 59, 2538. I.  Liampa, M. K.  Ha, J. S.  Choi, K.  Jagiello, N.  Sanabria,
[68] B. Kim, S. Lee, J. Kim, Sci. Adv. 2020, 6, eaax9324. G.  Melagraki, L.  Cattelani, M.  Fratello, H.  Sarimveis,
[69] S. So, J. Rho, Nanophotonics 2019, 8, 1255. A.  Afantitis, T. H.  Yoon, M.  Gulumian, R.  Grafstrom, T.  Puzyn,
[70] R.  Gomez-Bombarelli, J. N.  Wei, D.  Duvenaud, J. M.  Hernandez- D.  Greco, Nanomaterials 2020, 10, 750; b) A.  Serra, M.  Fratello,
Lobato, B.  Sanchez-Lengeling, D.  Sheberla, J.  Aguilera-Iparraguirre, L.  Cattelani, I.  Liampa, G.  Melagraki, P.  Kohonen, P.  Nymark,
T. D. Hirzel, R. P. Adams, A. Aspuru-Guzik, ACS Cent. Sci. 2018, 4, 268. A.  Federico, P. A. S.  Kinaret, K.  Jagiello, M. K.  Ha, J. S.  Choi,
[71] a) C. W.  Coley, N. S.  Eyke, K. F.  Jensen, Angew. Chem., Int. Ed. N.  Sanabria, M.  Gulumian, T.  Puzyn, T. H.  Yoon, H.  Sarimveis,
2019, https://doi.org/10.1002/anie.201909989; b) H. S.  Stein, R.  Grafstrom, A.  Afantitis, D.  Greco, Nanomaterials 2020, 10,
J. M. Gregoire, Chem. Sci. 2019, 10, 9640. 708.
[72] P. Nikolaev, D. Hooper, N. Perea-Lopez, M. Terrones, B. Maruyama, [77] K. Freeman, Environ. Health Persp. 2005, 113, A388.
ACS Nano 2014, 8, 10214. [78] H.  Autefage, E.  Gentleman, E.  Littmann, M. A.  Hedegaard,
[73] EC Expert Group on Turning FAIR Data into Reality, https://codata. T.  Von Erlach, M.  O’Donnell, F. R.  Burden, D. A.  Winkler,
org/initiatives/working-groups/fair-data-expert-group/ (accessed: M. M.  Stevens, Proc. Natl. Acad. Sci. USA 2015, 112,
May 2020). 4280.

Small 2020, 16, 2001883 2001883  (13 of 13) © 2020 The Authors. Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

You might also like