
Journal of Hazardous Materials 441 (2023) 129904

Contents lists available at ScienceDirect

Journal of Hazardous Materials


journal homepage: www.elsevier.com/locate/jhazmat

Modeling phytoremediation of heavy metal contaminated soils through machine learning
Liang Shi a,b,1, Jie Li c,d,1, Kumuduni Niroshika Palansooriya a,e, Yahua Chen b, Deyi Hou f, Erik Meers g, Daniel C.W. Tsang h, Xiaonan Wang i,*, Yong Sik Ok a,*

a Korea Biochar Research Center, APRU Sustainable Waste Management Program & Division of Environmental Science and Ecological Engineering, Korea University, Seoul 02841, South Korea
b College of Life Sciences, Nanjing Agricultural University, Nanjing 210095, China
c Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117585, Singapore
d CAS Key Laboratory of Urban Pollutant Conversion, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China
e State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou 311300, China
f School of Environment, Tsinghua University, Beijing 100084, China
g Department of Green Chemistry & Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
h Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
i Department of Chemical Engineering, Tsinghua University, Beijing 100084, China

HIGHLIGHTS

• Machine learning was used to predict factors that affect phytoremediation.
• Metal ion radius was the most important factor affecting HM accumulation in shoots.
• Plant family was the most important factor affecting BCF and MER.
• The Crassulaceae family had the highest potential as hyperaccumulators.

ARTICLE INFO

Editor: Lingxin Chen

Keywords: Heavy metal; Hyperaccumulator; Machine learning; Phytoextraction; Soil remediation

ABSTRACT

As an important subtopic within phytoremediation, hyperaccumulators have garnered significant attention due to their ability to super-enrich heavy metals. Identifying the factors that affect phytoextraction efficiency has important application value in guiding the efficient remediation of heavy metal contaminated soil. However, it is challenging to identify the critical factors that affect the phytoextraction of heavy metals in soil–hyperaccumulator ecosystems because current projections on phytoremediation rely on simple linear models and are rudimentary at best. Here, machine learning (ML) approaches were used to predict the important factors affecting the phytoextraction efficiency of hyperaccumulators. The ML analysis was based on 173 data points with consideration of soil properties, experimental conditions, plant families, low-molecular-weight organic acids from plants, plant genes, and heavy metal properties. Heavy metal properties, especially the metal ion radius, were the most important factors affecting heavy metal accumulation in shoots, and the plant family was the most important factor affecting the bioconcentration factor, metal extraction ratio, and remediation time. Furthermore, the Crassulaceae family had the highest potential as hyperaccumulators for phytoremediation, which was related to the expression of genes encoding heavy metal transporting ATPase (HMA), metallothioneins (MTL), and natural resistance associated macrophage protein (NRAMP), as well as the secretion of malate and threonine. New insights into the effects of plant characteristics, experimental conditions, soil characteristics, and heavy metal properties on phytoextraction efficiency from ML model interpretation could guide efficient phytoremediation by identifying the best hyperaccumulators and resolving their efficient remediation mechanisms.

* Corresponding authors.
E-mail addresses: wangxiaonan@tsinghua.edu.cn (X. Wang), yongsikok@korea.ac.kr (Y.S. Ok).
1 These authors contributed equally to this work.

https://doi.org/10.1016/j.jhazmat.2022.129904
Received 12 May 2022; Received in revised form 24 August 2022; Accepted 1 September 2022
Available online 5 September 2022
0304-3894/© 2022 Published by Elsevier B.V.

1. Introduction

Heavy metals and metalloids are harmful pollutants owing to their toxicity, ubiquity, non-biodegradability, and bioavailability for crop uptake. Hence, they pose significant threats to worldwide food safety and security (Hou et al., 2020). The enrichment of heavy metals reduces biodiversity and productivity, thereby altering the structure and function of ecosystems (Montoya-Mayor et al., 2013; Niemeyer et al., 2012). Some heavy metals are toxic and carcinogenic to humans. For example, Hg, Pb, and As can affect the growth and function of the central nervous system; Hg, Pb, and Cd can affect the kidney and liver; and Ni, Cd, and Cu can affect the skin, bone, and teeth (Bertin et al., 2017). In China, the National Soil Pollution Survey Report jointly issued by the Chinese Ministry of Environmental Protection and the Chinese Ministry of Land and Resources shows that the point exceedance rates of Cd, Ni, As, Cu, and Zn in Chinese soil are 7.0%, 4.8%, 2.7%, 2.1%, and 0.9%, respectively, threatening farmland production and food quality and safety. Toxic heavy metals in ecosystems are therefore detrimental to the environment and human health, and we focus on these five heavy metals here.

Phytoremediation is an environmentally friendly method compared with physical, chemical, and other biological methods (Jin et al., 2021; Wang et al., 2021). Plants with accumulation or super-accumulation ability can transfer or immobilize heavy metals in soil to reduce their damage to ecosystems (Wood et al., 2016). Phytoextraction is promising for the in situ remediation of large areas of contaminated land because it directly employs plants with large biomass, rapid growth, heavy metal tolerance, and high accumulation ability to repair heavy metal-contaminated soil (Robinson et al., 2002). Plants are identified as hyperaccumulators when the heavy metal content of their leaves or aboveground parts exceeds the following thresholds: 100 mg kg⁻¹ for Cd; 1000 mg kg⁻¹ for Co, Cr, Cu, Ni, and Pb; 3000 mg kg⁻¹ for Zn; and 10,000 mg kg⁻¹ for Mn (Li et al., 2018a). Additionally, the ratio of the heavy metal content in the aboveground parts of a hyperaccumulator to that in its underground parts is always greater than 1 (Wood et al., 2016). More than 450 species of plants belonging to at least 34 families have been identified as hyperaccumulators (Verbruggen et al., 2009). For example, Pteris cretica, Pteris fauriei, Pteris oshimensis, and Pteris vittata are As hyperaccumulators; Sedum alfredii, Sedum plumbizincicola, Arabis paniculata, and Centella asiatica are Cd hyperaccumulators; Arabis paniculata, Isachne globosa, and Pogonatherum crinitum are Pb hyperaccumulators; Celosia argentea, Phytolacca americana, Polygonum lapathifolium, Polygonum pubescens, and Schima superba are Mn hyperaccumulators; and Arabis paniculata, Corydalis davidii, Picris divaricata, S. alfredii, S. plumbizincicola, and Viola baoshanensis are Zn hyperaccumulators (Li et al., 2018a).

Phytoremediation is a complicated process in which the transformation and accumulation of heavy metals from soil to plants are affected by the species, uptake mechanisms, and physiological characteristics of plants; soil physicochemical properties and agronomic practices; the physical and chemical properties of the heavy metals; and environmental and experimental factors (Hu et al., 2020; Wang et al., 2019a). For example, Sheoran et al. (2016) discovered that the success of phytoextraction depended on the plant biomass and metal bioavailability, and that the phytoavailability of metals was affected by the pH, redox potential, cation exchange capacity, type and texture of soil, root exudates, and rhizosphere processes of plants. Liang et al. (2009) compared the remediation efficiency of Cd/Zn hyper- and non-hyperaccumulator plants and discovered that although some hyperaccumulators had a high bioconcentration factor (BCF), phytoextraction efficiency remained limited because of their small biomass. However, these studies considered only one or a few quantitative variables, such as plant or soil variables, and they could not be applied to quantitatively analyze the effect of qualitative variables that affect phytoextraction efficiency, such as plant species, genotypes, and organic acids. Few studies simultaneously considered various factors (such as plants, soil, metals, experiment, and environment) and quantitatively evaluated the effects of each potential factor and the spatial characteristics of bioaccumulation with respect to different heavy metals and plants (Hanandeh et al., 2021; Li et al., 2021a; Cipullo et al., 2019). Furthermore, it is difficult to construct a comprehensive model that can predict the bioaccumulation factors of heavy metals in soil–hyperaccumulator systems. The methods used in previous studies could not identify the importance of different variables in the modeling process, which is vital for decision-makers in identifying the essential factors for heavy metal pollution control and phytoremediation in soils.

Machine learning (ML), a subcategory of artificial intelligence, can adapt and learn from large, complex, and multidimensional data to develop predictive models. Different ML algorithms, such as artificial neural networks (ANNs), random forests, and gradient boosted machines, have been widely used to model heavy metal adsorption in water systems. They have also been applied to, for example, Cd accumulation in farmland soils, the availability and toxicity of heavy metals in soils amended with compost or biochar, and heavy metal accumulation in soil–crop systems (Hu et al., 2020; Hanandeh et al., 2021; Li et al., 2021a; Cipullo et al., 2019). However, the use of ML to identify the factors that affect the accumulation of heavy metals in hyperaccumulators has not been established and is worth investigating to elucidate the hidden and intertwined relations.

Hence, plant family, environmental conditions, soil properties, and metal properties are considered as input data in this study to predict the factors affecting phytoextraction using an ML approach. This study provides new insights into the phytoremediation of heavy metal contamination in soil–hyperaccumulator ecosystems and can help safeguard the safety of agricultural products (Fig. 1).

2. Methodology

2.1. Dataset preparation

We performed a systematic literature search and review to obtain a comprehensive dataset for the ML model. Keywords associated with the phytoremediation of heavy metals in soils were used to obtain relevant literature from Google Scholar (https://scholar.google.com) and Web of Science (https://www.webofscience.com). We used ‘Hyperaccumulator’, ‘Heavy metal’, and ‘pH’ as keywords to search for papers on Web of Science. In addition, we used ‘Hyperaccumulator’, ‘Heavy metal’, ‘pH’, ‘Cation exchange capacity’, ‘Organic carbon’, ‘Temperature’, ‘Organic acid’, ‘Gene’, ‘Time’, ‘Pot’, ‘Field’, ‘Soil weight’, ‘Ni’, ‘Cd’, ‘Zn’, ‘As’, and ‘Cu’ as keywords to search for papers on Google Scholar. Data pertaining to soil, heavy metals, plants, and experiments were extracted as inputs, and the shoot heavy metal concentration (HMshoot, µg/g), shoot yield (yield, mg/plant), BCF, metal extraction ratio (MER, %), and remediation time (RT, per kg soil [yr]) were identified as outputs to compile the original datasets (Fig. 1).

Fig. 1. Schematic diagram of machine learning as applied in this work. Driven by the ML model, 173 available input and output data were used to evaluate the heavy metal enrichment effect of hyperaccumulators.

The soil properties considered were pH, cation exchange capacity (CEC, cmol/kg), and organic carbon (OC, g/kg). The heavy metal properties considered were the electronegativity (HM_x), the ion radius (HM_r, nm), and the total heavy metal concentration (μg/g). Information regarding the different plants obtained from the literature was categorized at the family level. The experimental conditions, including the temperature difference during planting, planting time, soil mass used for pot experiments, and pot depth, were also considered for modeling. A dataset containing 173 data points was obtained, covering five heavy metals (As, Cd, Cu, Ni, and Zn), seven plant families, and seven types of soil. All of the data came from the only 20 papers, identified by searching across major databases, that provided all the necessary information in common, including experimental conditions, hyperaccumulator information, heavy metal properties, and soil properties.

To interpret the phytoremediation mechanism, eight types of low molecular weight organic acids (LMWOAs) generated by plants and 21 types of gene expression in plants were considered as additional outputs to develop individual classification models that could aid in understanding the mechanism (Fig. 1). Since organic acids can chelate heavy metals, they may change the available content of heavy metals in the soil. Under the regulatory action of heavy metal-related transport genes, the accumulation of heavy metals by plants increases. Therefore, we also targeted these two aspects (LMWOAs and gene expression) to investigate their impacts on the phytoremediation efficiency of hyperaccumulators.

2.2. Data pre-processing and ML model development

For the preliminary datasets, the units of each variable were unified before model development. Descriptive statistics of the compiled datasets showed that 2.9% and 4.6% of the temperature difference and soil weight data, respectively, were missing. The missing values were filled in using the K-Nearest Neighbor method to complete the dataset for model training. After this data filling, we obtained three datasets, for phytoremediation properties, plant acid generation, and plant gene expression, respectively. Each dataset contained 173 data points without missing values. Moreover, one-hot encoding was performed to transform the categorical features (heavy metal type, plant family, LMWOA, and plant genes) into a one-hot numeric array (Pant et al., 2021). To ease downstream model training and testing, the entire dataset was partitioned into two parts: 80% for hyperparameter tuning based on 5-fold cross-validation (CV) and ML model training, and the remaining 20% as a test set for validating the generalization ability of the trained models.

Based on our previous work, extreme gradient boosting (XGBoost), an effective ML algorithm for modeling the application of carbon-related materials (Yuan et al., 2021), was employed to develop a multilabel prediction model that could predict the five above-mentioned outcomes (i.e., HMshoot, yield, BCF, MER, and RT) at the same time. XGBoost was selected owing to the success of tree-based ensemble algorithms and their good trade-off between bias and variance, which helps avoid overfitting in regression models. Building on the gradient boosting models demonstrated in our previous study (Li et al., 2021a), XGBoost uses more accurate approximations to identify the best tree model, i.e., it computes second partial derivatives (second-order gradients) of the loss function to obtain more information about the gradient direction (Chen and Guestrin, 2016). Moreover, regularization terms were integrated as penalties to limit overfitting and improve model generalization. In XGBoost, four important hyperparameters, namely n_estimators, learning_rate, the subsample ratio, and max_depth, were adjusted to adapt to our dataset. For tree-based models, data normalization is unnecessary because tree splits do not depend on the absolute scale of the feature values (Li et al., 2021b). Therefore, the original input values were used directly to develop the XGBoost model.

To identify the types of LMWOA generated and the effect of plant genes on the mechanism of heavy metal phytoremediation, an ANN with ‘sigmoid’ as the activation function in the output layer was applied to develop classification models. ANN was selected for classification because it has been shown to handle gene expression profiling and is also popular in the phytoremediation domain. A detailed description of the ANN algorithm is given in our previous paper (Li et al., 2021b). Here, ANNs with two hidden layers were developed, and the number of neurons in each hidden layer was optimized by searching from 2 to 128. Moreover, the activation function in the hidden layers was ‘relu’, and ‘adam’ was selected as the optimizer with a learning rate of 0.001 during model training. To train the ANN classification models, in addition to the inputs considered in the regression model, the heavy metal type was included by creating a binary column for each heavy metal. Subsequently, all inputs in the entire dataset were normalized, before splitting into training and test data, by removing the mean and scaling to unit variance to improve convergence during model training (Li et al., 2021c).

2.3. ML model performance evaluation and model-based feature analysis

Once the ML models were developed, the remaining 20% of the data points were used to evaluate the prediction performance. For the regression model, the coefficient of determination (R²) and root mean squared error (RMSE) were used to quantify the prediction accuracy (Yuan et al., 2021); an R² value closer to one indicates a better prediction, whereas a smaller RMSE represents a higher accuracy. For the classification models, performance was evaluated with the accuracy score and the F1 score (Anon, 2022a).

$$R^2(y, \hat{y}) = 1 - \frac{\sum_{n=1}^{N} (\hat{y}_n - y_n)^2}{\sum_{n=1}^{N} (y_n - \bar{y})^2} \tag{1}$$

$$\mathrm{RMSE}(y, \hat{y}) = \sqrt{\frac{\sum_{n=1}^{N} (\hat{y}_n - y_n)^2}{N}} \tag{2}$$
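As a minimal sketch of Eqs. (1) and (2), the two regression metrics can be computed in plain Python as follows. The function names and the sample values are hypothetical and chosen only for illustration; they are not taken from the study's code or data.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination, following Eq. (1)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, following Eq. (2)."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical measured and predicted values (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print(f"R2 = {r2_score(y_true, y_pred):.3f}, RMSE = {rmse(y_true, y_pred):.3f}")
```

A perfect prediction gives R² = 1 and RMSE = 0. R² can become negative when the model predicts worse than the mean of the measured values.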


$$\text{accuracy score}(y, \hat{y}) = \frac{1}{N} \sum_{i=0}^{N-1} I(\hat{y}_i = y_i), \tag{3}$$

where $y$, $\bar{y}$, and $\hat{y}$ are the true value, the average of the true values, and the predicted value among the total number of data points ($N$) for each prediction target, respectively. $I(\hat{y}_i = y_i)$ represents the indicator function, as follows:

$$I(\hat{y}_i = y_i) = \begin{cases} 1 & \text{if } \hat{y}_i = y_i, \\ 0 & \text{if } \hat{y}_i \neq y_i. \end{cases} \tag{4}$$

$$\text{F1 score} = 2 \times \frac{Pr \times Re}{Pr + Re} \tag{5}$$

$$Pr = \frac{TP}{TP + FP} \tag{6}$$

$$Re = \frac{TP}{TP + FN} \tag{7}$$

where $Pr$ and $Re$ are the precision and recall of the classification model, and $TP$, $FN$, and $FP$ are the numbers of true positives, false negatives, and false positives, respectively. The “micro” averaging option was selected when calculating the F1 score, so that the metrics were computed globally by counting the total true positives, false negatives, and false positives.

Feature analysis, including feature importance and feature correlation, was interpreted based on the developed models with the best prediction performance. For the regression model, the feature importance was obtained automatically from the Gini importance by developing a tree-based XGBoost model (Rosa, 2022). This model provided a score to indicate the value of each feature during the construction of trees within the model. The importance was calculated based on the improvement in the performance measured by the Gini index during the selection of the split points used to develop the trees. The final feature importance was averaged over all constructed trees in the model. It should be noted that although the Gini importance tends to be biased towards numerical values, such feature importance could be acceptable since it has been cross-validated by another feature analysis method (Palansooriya et al., 2022). For the feature correlation, XGBoost was integrated with the partial dependence plot (PDP) Python tool (PDPbox) to visualize the marginal effect of a specified input variable with different values on the model outcome (Anon, 2022b). In the ANN classification models, feature importance was identified using the SHAP method, which has proved useful for explaining black box models. The SHAP method determines the importance of features based on the SHAP value, a concept from cooperative game theory. More details regarding the SHAP method are available in our previous paper (Li et al., 2020).

3. Results and discussion

3.1. Statistical analysis of data acquired

Table S1 lists the basic descriptive statistics of the heavy metal concentrations in the soil samples. The average, minimum, and maximum concentrations of heavy metals in soil were in the following order: Ni (1014 mg kg⁻¹; min: 553 mg kg⁻¹, max: 1780 mg kg⁻¹) > Zn (631 mg kg⁻¹; min: 109 mg kg⁻¹, max: 3170 mg kg⁻¹) > Cu (294 mg kg⁻¹; min: 196 mg kg⁻¹, max: 801 mg kg⁻¹) > As (131 mg kg⁻¹; min: 126 mg kg⁻¹, max: 135 mg kg⁻¹) > Cd (7.1 mg kg⁻¹; min: 0.34 mg kg⁻¹, max: 48 mg kg⁻¹). The concentrations of all heavy metals in the soil were higher than the typical background concentrations and ecological soil screening levels. The coefficient of variation (CV) represented the degree of variability in heavy metal concentrations. The high CV of heavy metals in soils from the survey region indicated that the accumulation of heavy metals in these soils was likely due to anthropogenic activities (Manta et al., 2002).

Table S2 presents the basic statistical characteristics of the heavy metal concentrations in the hyperaccumulators. The average, minimum, and maximum concentrations of heavy metals in the hyperaccumulators were in the following order: Ni (8586 mg kg⁻¹; min: 1303, max: 23387) > Zn (3430 mg kg⁻¹; min: 79.7, max: 15469) > As (3109 mg kg⁻¹; min: 1080, max: 5138) > Cd (96.8 mg kg⁻¹; min: 0.45, max: 1310) > Cu (15.6 mg kg⁻¹; min: 5.7, max: 35.6). The CVs of the Ni, Zn, As, Cd, and Cu concentrations were 72.07%, 114.63%, 92.29%, 190.48%, and 62.32%, respectively. The CVs of all heavy metal concentrations in the hyperaccumulators indicated high or exceptionally high variabilities (Ridgeway, 2020). The maximum concentrations of Ni, Zn, As, Cd, and Cu in the hyperaccumulators here were significantly higher than the defined heavy metal concentrations in typical hyperaccumulators.

Figure S3 shows the BCFs of the heavy metals in the different hyperaccumulators. The mean BCFs for each heavy metal, based on average values from different hyperaccumulators, were as follows: As (24.39) > Cd (21.96) > Ni (7.71) > Zn (7.21) > Cu (0.05). Crassulaceae exhibited the highest BCFs for Cd (37.33) and Zn (9.50); Brassicaceae for Cu (0.06) and Ni (7.71); and Pteridaceae for As (24.39).

3.2. Development of regression predictive model

Based on our preliminary regression model development trials, it was difficult to fit the ML models to the five identified targets (HMshoot [µg/g], yield [mg/plant], BCF, MER [%], and RT per kg soil [yr]). A data distribution analysis showed that the data did not follow a normal distribution owing to the dispersed data points (see Figure S1). To solve this issue and improve the model performance, a logarithmic transformation was employed to redistribute the data (see Figure S2).

Apart from the data preprocessing, several critical hyperparameters in XGBoost were adjusted using the training dataset based on five-fold cross-validation to obtain a predictive model with good prediction performance. The average RMSE for all targets from validation was used to determine the optimal hyperparameters. Based on the hyperparameter tuning results of XGBoost (see Figure S4), the average RMSE decreased as the number of trees (n_estimators) increased from 10 to 30, and then decreased further as the learning rate increased to 0.1. Moreover, based on an optimal n_estimators of 30 and a learning rate of 0.1, the subsample rate and maximum depth (max_depth) of the trees were further optimized. The results indicated that a smaller average RMSE was achieved with a subsample rate of 0.7 and a maximum depth of 5 (see Figure S4). Based on the optimal hyperparameters, two XGBoost models were trained with the output data before and after logarithmic transformation (Sigmund et al., 2020). The prediction performance improved after the original output data were logarithmically transformed, particularly for the testing performance. The XGBoost model performed better with the transformed data, which were closer to a normal distribution, since the value ranges of the transformed data were more compact and balanced and therefore more suitable for a regression model (Table 1, Figures S1 and S2).

The original experimental data versus the predicted data for the shoot heavy metal concentration (HMshoot), shoot yield (yield), BCF, MER, and RT are shown as scatter plots (see Fig. 2). The line y = x indicates where the predicted values equal the measured values; the closer the dots are to the y = x line, the better the prediction. Almost all the training and test points were concentrated around the y = x line; however, the test points were more scattered than the training points because of the slightly lower accuracy on unseen data. The R² values of the XGBoost model for the HMshoot, yield, BCF, MER, and RT in the test dataset were 0.93, 0.79, 0.89, 0.92, and 0.91, respectively, which were higher than the average 5-fold cross-validation R². However, these values still fell within the range of the average 5-fold cross-validation R² plus its standard deviation (see Figure S5). This result indicated that our developed model did not have serious overfitting issues.
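The effect of the logarithmic transformation described above can be illustrated with a short sketch. The base of the logarithm is not stated in the text, so base 10 is an assumption here, as are the function names; the sample values are simply picked from the shoot concentration ranges reported in Section 3.1 and do not reproduce the actual dataset.

```python
import math


def log_transform(values):
    """Apply a base-10 logarithm (assumed base); requires strictly positive values."""
    return [math.log10(v) for v in values]


def inverse_log_transform(values):
    """Map log10-scale predictions back to the original units."""
    return [10.0 ** v for v in values]


# Illustrative shoot concentrations (mg/kg) spanning the reported range.
hm_shoot = [0.45, 15.6, 96.8, 3430.0, 23387.0]

logged = log_transform(hm_shoot)
restored = inverse_log_transform(logged)

spread_raw = max(hm_shoot) / min(hm_shoot)  # ratio between largest and smallest raw value
spread_log = max(logged) - min(logged)      # the same spread expressed in log10 units

print(f"raw max/min ratio: {spread_raw:.0f}, log10 range: {spread_log:.2f}")
```

Values that differ by more than four orders of magnitude are compressed into a range of fewer than five log units, which is the sense in which the transformed targets are "more compact". One consequence worth keeping in mind when reading Table 1 is that an RMSE computed on log-transformed targets is expressed in log units and is not directly comparable with an RMSE in the original units.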

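The data partitioning behind the tuning procedure (an 80/20 train/test split followed by five-fold cross-validation on the training part) can be sketched with the standard library alone. The helper names, the random seed, and the rounding of the test-set size are assumptions made for illustration; the study's own implementation is not given in the text.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train_indices, k=5):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


# 173 data points, as in the compiled dataset.
train_idx, test_idx = train_test_split_indices(173)
splits = list(k_fold_indices(train_idx, k=5))

for fold_number, (fit, validation) in enumerate(splits, start=1):
    print(f"fold {fold_number}: fit on {len(fit)}, validate on {len(validation)}")
```

In a tuning loop, each candidate combination of hyperparameters (for example n_estimators, learning_rate, subsample, and max_depth) would be fitted on `fit` and scored on `validation`, and the average validation RMSE over the five folds would be used to pick the best combination. The held-out test indices are touched only once, after tuning.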

Table 1
Training and testing performance of the developed XGBoost models based on original and log-transformed output data.

Items             Output parameter       Training R²   Training RMSE   Testing R²   Testing RMSE
Original          HMshoot (µg/g)         0.97          629             0.77         1371
                  Yield (mg/plant)       0.98          346             -0.15        1804
                  BCF                    0.97          2.99            0.59         13.19
                  MER (%)                0.97          0.80            0.52         2.11
                  RT per kg soil (yr)    0.91          8420            0.71         9460
Log-transformed   HMshoot (µg/g)         0.96          0.57            0.93         0.66
                  Yield (mg/plant)       0.93          0.55            0.79         0.77
                  BCF                    0.95          0.44            0.89         0.69
                  MER (%)                0.95          0.60            0.92         0.77
                  RT per kg soil (yr)    0.96          0.59            0.91         0.83

Note: HMshoot: heavy metal concentration in shoots (µg/g); yield: shoot yield (mg/plant); BCF: bioconcentration factor; MER: metal extraction ratio (%); RT per kg soil (yr): remediation time (1 year, 1 kg contaminated soil).

Fig. 2. Multi-task predicted data vs original experimental data of (a) heavy metal concentration in shoot (HMshoot), (b) shoot yield (Yield), (c) BCF, (d) MER, and (e) RT based on the optimized XGBoost model with training and testing datasets. 173 available input and output data were used to develop the predictive model.

3.3. Model-based interpretation

Further, a feature analysis of each input with respect to each output was performed to understand the phytoremediation process (see Fig. 3). The input variables were categorized into four types to determine the importance of each feature type: plant family, experimental conditions, soil properties, and metal properties. Fig. 3a shows that the metal properties contributed 71% to the HMshoot in terms of importance, whereas the soil properties contributed only 4%. The ionic radius of the metal was the most important feature for the HMshoot, followed by the pot depth and the total heavy metal concentration in the soil (see Fig. 3a). Although these were the top three features, their importance for the HMshoot differed markedly: the importance of the pot depth and of the total heavy metal concentration was less than 25% of that of the metal ion radius. This showed that the heavy metal concentration in the shoots was primarily determined by the metal ion radius. Based on the correlation analysis of the top four features (see Fig. 4a), the metal ion radius and pot depth were negatively associated with the HMshoot, and they exhibited a linear relationship with a steep slope in the metal ion radius range of 0.07–0.10 nm and the pot depth range of 20–100 cm. In addition, the total heavy metal concentration in soil and the soil mass (0.6–1 kg) contributed positively to the HMshoot (see Fig. 4a). The results showed that a high HMshoot was primarily associated with a small metal ion radius, shallow pot depth, high total heavy metal concentration in soil, and a soil mass of 0.6–1 kg. Gu and Lan (2021) discovered that the adsorption capacity of Neochloris oleoabundans biomass for the divalent metal ions investigated in their study was proportional to the electronegativity and inversely proportional to the radius of the metal ions; however, the exact reason has not been clarified.

Fig. 3. Prediction plots (training and testing) based on feature importance of XGBoost for (a) heavy metal concentration in shoot (HMshoot), (b) shoot yield (Yield), (c) BCF, (d) MER, and (e) RT. Plant family: Amaranthaceae, Asteraceae, Brassicaceae, Crassulaceae, Fabaceae, Pteridaceae, and Solanaceae; Experimental conditions: T difference (°C), planting time (months), pot depth (cm), soil mass (kg); Soil properties: soil pH, soil CEC (cmol/kg), and soil OC (g/kg); Metal properties: HM_x, HM_r (nm), and total heavy metal concentration (µg/g). Note: HM_r (nm): ion radius of heavy metals; Total HM conc (µg/g): total heavy metal concentration in soil before planting; Soil mass (kg): fresh weight of soil used in each treatment; HM_x: electronegativity of heavy metals; T difference (°C): temperature difference; Soil CEC (cmol/kg): soil cation exchange capacity; Soil OC (g/kg): soil organic carbon.

Fig. 3b shows that the experimental conditions contributed 48% to the yield prediction, whereas the metal properties contributed only 4%. Soil mass was the most important feature for yield, followed by soil OC, Brassicaceae, and planting time. Based on the correlation analysis of the top four experimental condition features (see Fig. 4b), the soil mass was positively associated with the yield in the range of 0.6–1.5 kg and negatively associated in the range of 4–6 kg. The soil OC was negatively associated with the yield at concentrations from 24.6 to 81.7 g/kg. The planting time was positively associated with the yield from 0.83 to 2.84 months. Meanwhile, the yield increased and reached a maximum when the temperature difference was 4 °C; this increase was associated with an improvement in soil enzyme activity. The yield of buckwheat (Polygonaceae family) increased owing to improvements in the accumulated temperature, temperature and water use efficiency,
and soil organic carbon. Qu and Feng (2020) discovered that buckwheat yield increased owing to improvements in soil OC, accumulated temperature, and temperature and water use efficiency. In addition, the increase in soil OC was associated with an improvement in soil enzyme activity.

For the BCF (see Fig. 3c), plant family was the most important feature type, accounting for 41%, whereas Crassulaceae, soil mass, and pot depth were the top three individual features. Employing appropriate plants is the key to the success of phytoremediation. The high BCF and MER of Crassulaceae may be due to its high biomass, high growth rate, and strong ability to absorb and accumulate heavy metals compared with other hyperaccumulators (Shen et al., 2022). The soil mass (0.6–1 kg) was positively associated with the BCF. However, pot depth, HM_x, and temperature difference were negatively associated with the BCF (see Fig. 4c). Although we could not find a reported relationship between soil mass and BCF in the literature, we speculate that, within a certain mass range, the total heavy metal content in the culture system increases with soil mass and the enrichment capacity of plants is gradually enhanced; once the threshold at which plants can absorb heavy metals is reached, the BCF remains unchanged. For pot depth, Tőzsér et al. (2017) found that element concentrations increased toward deeper layers, which could explain the relatively low BCF in deep-soil experiments. For HM_x, the low accumulation efficiency of heavy metals in plants might be related to high electronegativity (Fan et al., 2016). In addition, a higher temperature difference may increase the growth rate of plants, thereby diluting the heavy metal content in plants and decreasing the BCF (Venzhik et al., 2015).

Fig. 3d shows that the plant family type accounted for 58% of the importance for the MER, with Crassulaceae, pot depth, and Pteridaceae being the most notable features. Pot depth was negatively correlated with the MER (Fig. 4d), which might also be due to the higher heavy metal concentration in the deeper soil layer (Tőzsér et al., 2017). However, more in-depth research may be needed to explain why the contribution of the ‘Plant family’ feature type to BCF and MER was significantly higher than that of ‘Experimental condition’, ‘Soil property’, and ‘Metal property’. For RT, plant family was the most important feature type, accounting for 51%, followed by experimental conditions, metal properties, and soil properties (see Fig. 3e). These feature importance results were similar to those for the MER. Pot depth, temperature difference, total heavy metal concentration, and HM_x were all positively associated with RT (see Fig. 4e).

3.4. LMWOA and gene identification for in-depth interpretation

Figures S5 and S6 show the accuracy obtained from five-fold cross-validation during hyperparameter tuning of the ANN model, based on 80% of the data points in the dataset. The accuracy for organic acids (a) and genes (b) increased with the number of neurons in the first hidden layer from 2 to 16. In addition, as the number of neurons increased in the second hidden layer, the accuracy increased. However, no further improvement for organic acid identification was observed as


Fig. 4. Correlation of top-four continuous input features with log-based (a) heavy metal concentration in shoot (HMshoot), (b) shoot yield (yield), (c) bioconcentration
factor (BCF), (d) metal extraction factors (MER), and (e) RT. Note: HM_r (nm): Ion radius of heavy metals; Total HM conc (μg/g): Total heavy metal concentration in
soil before planting; Soil mass (kg): Fresh weight of soil used in each treatment; Soil OC (g/kg): Soil organic carbon; T difference (℃): Temperature difference; HM_X:
Electronegativity of heavy metals.
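
The sign of each association reported for Fig. 4 (for example, pot depth being negatively associated with BCF) can be checked by correlating a continuous input feature with the log-transformed target. The sketch below assumes a Pearson coefficient and a base-10 logarithm; neither choice is stated in this section, and the data values and function names are hypothetical placeholders rather than values from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_with_log_target(feature, target):
    """Correlate a feature with log10(target); target values must be > 0."""
    log_target = [math.log10(t) for t in target]
    return pearson(feature, log_target)


# Placeholder observations (NOT data from the study).
pot_depth_cm = [10, 15, 20, 25, 30]
bcf = [10.0, 5.0, 2.0, 1.0, 0.5]
r_depth = correlation_with_log_target(pot_depth_cm, bcf)
print(f"pot depth vs log10(BCF): r = {r_depth:.2f}")
```

A negative coefficient here corresponds to the kind of negative association the text describes; a rank-based statistic could be substituted if the relationship is monotonic but not linear.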

the number of neurons increased continuously in the first hidden layer from 16 to 128 with the number of neurons in the second hidden layer over 64. Therefore, the final optimized hyperparameters of the ANN for organic acid identification were 16 and 46 neurons for the first and second hidden layers, respectively. Similarly, the optimized numbers of neurons in the first and second layers of the ANN for gene identification were 32 and 128. Once the optimal hyperparameters were determined, the ANN was retrained on all the training data for LMWOA and gene identification. As shown in Fig. 5, the classification model for LMWOAs performed slightly, though not significantly, worse on the test data than on the training data, with a test accuracy and F1 score of approximately 0.8 and 0.85 for identifying LMWOAs. For the genes, the training and test data matched well, and the test accuracy and F1 score exceeded 0.9.

Fig. 5. Test performance including prediction accuracy and F1 score based on optimal hyperparameters with testing data points for LMWOA and gene identification. 173 available input and output data were used to develop the predictive model.

The LMWOAs secreted by plants can change the available heavy metal concentration in soil through chelation, thereby affecting the absorption of heavy metals by plants (Meier et al., 2012). This principle is also applied in engineering practice to increase uptake efficiency. For example, LMWOAs and some chelators are commonly added to the soil to increase metal uptake by hyperaccumulators or crops (Duquène et al., 2009; Wang et al., 2022). The heavy metal transporter genes in the roots, stems, and leaves of hyperaccumulators can transport metal elements from roots to stems or leaves to enhance phytoremediation efficiency, or transport metal elements to vacuoles and cell walls to participate in detoxification (Liu et al., 2017a; Ye et al., 2020). For plant features, Brassicaceae exhibited the highest shoot yield compared with other hyperaccumulators, which might be due to its high secretion of various LMWOAs (Fig. 6a); these can chelate heavy metals in soil and reduce their toxicity to plants, thereby promoting plant growth (Yang et al., 2019). In addition, the expression of some genes in Brassicaceae was beneficial to plant growth. For example, the HMA4 gene in Brassica juncea was involved in Cd2+ binding in the cytosol under low heavy metal concentrations (Wang et al., 2019b). Li et al. (2018b) reported that Nramp1;4, Nramp3;5, Nramp3;10, Nramp4;4, HMA3, and HMA4 were vital for the uptake and accumulation of Cd in the roots of B. juncea. However, compared with Brassicaceae and other plants, Crassulaceae played the most important role in BCF, MER, and RT (see Fig. 3c-e), which might be due to the high secretion of malate and threonine (see Fig. 6a) and the high expression of HMA, MTL, and NRAMP (see Fig. 6b). This indicated that these three genes had a significant impact on BCF, MER, and RT. As reported in the literature, the elevated expression of tonoplast-localized HMA3 in the shoots of S. plumbizincicola was vital to Cd detoxification, which contributed to the maintenance of the normal growth of young leaves of S. plumbizincicola in Cd-contaminated soils (Liu et al., 2017b). Peng et al. (2017) discovered a positive correlation between transcript levels of MTL in roots and Cd accumulation in leaves of S. plumbizincicola, and suggested that elevated transcript levels and heterotypic variation in protein sequences of SpMTL might contribute to Cd hyperaccumulation and hypertolerance in S. plumbizincicola. In addition, the lower biomass of Crassulaceae resulted in the longest RT compared with other hyperaccumulators.

Apart from the plant features, HM_r, soil mass, and pot depth were the most important features for HMshoot, yield, BCF, MER, and RT (see Fig. 3). However, in addition to the plant features, soil pH, pot depth, and soil mass were the most important factors contributing to the secretion of the eight types of LMWOA. In addition, pot depth was the most important factor contributing to the expression of the 21 genes. The pot depth primarily contributed to malate, citrate, and fumarate secretion, as well as the expression of the HMA gene. The protein encoded by the HMA (heavy metal ATPase) family gene loads Cd from the symplasm into the xylem, affecting the migration rate of Cd from the root to the aboveground segment as well as the accumulation of Cd in the aboveground segment (Uraguchi and Fujiwara, 2012; Wu et al., 2021). The HMA1 gene in the root of the high-Cd-accumulation pepper X55 was upregulated 52-fold under the Cd 10 level compared with the Cd 0 level (Hu et al., 2021). This showed that the expression of HMA1 was regulated by exogenous Cd, and that the expression of HMA in plants planted in deeper soil might be lower than that of plants growing in shallower soil, owing to the lower heavy metal concentrations in deeper soil layers. However, a direct relationship between plant organic acid secretion and pot depth has not been reported; hence, further confirmation is required. For example, experiments and genetic analysis could be conducted to investigate in detail the mechanism by which pot depth affects gene expression.

Here, the XGBoost and ANN models were applied to model the complicated process of hyperaccumulator remediation for heavy metal-

Fig. 6. Feature importance analysis with respect to (a) LMWOA and (b) genes of plants based on the explanation of ANN model using SHAP values. Note: Soil mass
(kg): Fresh weight of soil used in each treatment; Soil CEC (cmol/kg): Cation exchange capacity; T difference (℃): Temperature difference; Total HM conc (μg/g):
Total heavy metal concentration in soil before planting; HM_X: Electronegativity of heavy metals; HM_r (nm): Ion radius of heavy metals; Soil OC (g/kg): Soil organic
carbon; Plant family: Amaranthaceae, Asteraceae, Brassicaceae, Crassulaceae, Fabaceae, and Solanaceae.
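
Fig. 6 ranks input features by their SHAP-based importance for the ANN outputs. One common convention, assumed here because this section does not state the exact aggregation used, is to rank features by the mean absolute SHAP value across samples. The sketch below illustrates only that aggregation step; the SHAP matrix, feature subset, and function name are hypothetical placeholders, and computing the SHAP values themselves would require an explainer applied to the trained model.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across samples.

    shap_rows: list of per-sample lists, one SHAP value per feature.
    Returns a list of (feature_name, mean_abs_value), largest first.
    """
    if not shap_rows:
        raise ValueError("need at least one sample")
    n_features = len(feature_names)
    totals = [0.0] * n_features
    for row in shap_rows:
        if len(row) != n_features:
            raise ValueError("row length must match number of features")
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign is ignored; only magnitude counts
    means = [total / len(shap_rows) for total in totals]
    return sorted(zip(feature_names, means), key=lambda kv: kv[1], reverse=True)


# Placeholder SHAP values for three samples (NOT results from the study).
feature_names = ["Soil pH", "Pot depth (cm)", "Soil mass (kg)"]
shap_rows = [
    [0.10, -0.40, 0.20],
    [-0.05, 0.30, -0.10],
    [0.15, -0.50, 0.30],
]
ranking = rank_by_mean_abs_shap(shap_rows, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because the absolute value is taken before averaging, a feature that pushes predictions strongly in both directions still ranks as important, which is why this aggregation says nothing about the direction of an effect.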


containing soils. The developed ML models predicted the heavy metal concentration in shoots, yield, BCF, MER, and RT for different heavy metals, and identified organic acid secretion and gene expression for a deeper interpretation of soil–hyperaccumulator ecosystems. They were used to quantify the importance of variables and identify potential control factors affecting phytoremediation efficiency in soil–hyperaccumulator systems, as well as to suggest suitable hyperaccumulators for specific heavy metal-contaminated soils to accelerate the remediation process. However, this study has some limitations that could be addressed in the future. First, extreme gradient boosting (XGBoost) should be compared with other ML algorithms to evaluate their suitability for phytoremediation models. Second, more data are needed to improve the model’s accuracy. For example, factors that might affect phytoremediation efficiency, such as the geographical location of the soil, climate factors, and soil texture, were not comprehensively considered because of the lack of data. Moreover, model-guided phytoremediation experiments, based on suitable plants and on experimental and soil conditions, could be another direction for continuing the present work.

4. Conclusions

In summary, ‘plant family’ was the most important feature for phytoremediation, followed by experimental conditions, soil properties, and heavy metal properties. In addition, the plant family dominated the BCF, MER, and RT. The Crassulaceae family had the highest potential as hyperaccumulators for phytoremediation, which was related to the expression of HMA, MTL, and NRAMP genes. The metal ion radius was the most important factor affecting heavy metal concentration in shoots. In addition to providing a comprehensive interpretation of phytoremediation, the developed ML model can be adapted to other phytoremediation systems to evaluate soil remediation performance by predicting final heavy metal distributions in plants and plant growth. Moreover, the ML model can be used to design new phytoremediation experiments and guide field phytoremediation for a specific heavy metal-contaminated soil.

CRediT authorship contribution statement

L.S. and J.L. contributed equally to this work. L.S.: data collection, writing (review and editing), and visualization; J.L.: modeling, writing (review and editing), and visualization; K.N.P.: review and editing; X.N.W.: conceptualization, writing (review and editing), and supervision; and Y.S.O.: conceptualization, writing (review and editing), and supervision.

Statement of environmental implication

Machine learning can guide the phytoremediation of heavy metal-contaminated soils to improve remediation efficiency.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgements

This work was carried out with the support of the Cooperative Research Program for Agriculture Science and Technology Development (Project No. PJ01475801) from Rural Development Administration, the Republic of Korea. This work was also supported by the International Postdoctoral Exchange Program Fellowship (PC2020041). This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C2011734). This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2021R1A6A1A10045235).

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.jhazmat.2022.129904.

References

Anon, 2022b. https://github.com/SauceCat/PDPbox.
Anon, 2022a. https://scikit-learn.org/stable/modules/model_evaluation.html#accuracy-score.
Bertin, V., Allemon, J., Sajet, P., Dieu, S., Papin, A., Collet, S., Gaucher, R., Chalot, M., Michiels, B., Raventos, C., 2017. Torrefaction and pyrolysis of metal-enriched poplars from phytotechnologies: effect of temperature and biomass chlorine content on metal distribution in end-products and valorization options. Biomass Bioenergy 96, 1–11.
Chen, T., Guestrin, C., 2016. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794.
Cipullo, S., Snapir, B., Prpich, G., Campo, P., Coulon, F., 2019. Prediction of bioavailability and toxicity of complex chemical mixtures through machine learning models. Chemosphere 215, 388–395.
Duquène, L., Vandenhove, H., Tack, F., Meers, E., Baeten, J., Wannijn, J., 2009. Enhanced phytoextraction of uranium and selected heavy metals by Indian mustard and ryegrass using biodegradable soil amendments. Sci. Total Environ. 407, 1496–1505.
Fan, C.H., Bo, D.U., Zhang, Y.C., Gao, Y.L., Chang, M., 2016. Determination of lead and cadmium in Calendula officinalis seedlings for phytoremediation of multi-contaminated loess by using flame atomic absorption spectrometry with wet digestion. Spectrosc. Spectr. Anal. 36, 2625–2628.
Gu, S.W., Lan, C.Q., 2021. Biosorption of heavy metal ions by green alga Neochloris oleoabundans: effects of metal ion properties and cell wall structure. J. Hazard. Mater. 418, 126336.
Hanandeh, I.E., Mahdi, Z., Imtiaz, M.S., 2021. Modelling of the adsorption of Pb, Cu and Ni ions from single and multi-component aqueous solutions by date seed derived biochar: comparison of six machine learning approaches. Environ. Res. 192, 110338.
Hou, D., O’Connor, D., Igalavithana, A.D., Alessi, D.S., Luo, J., Tsang, D.C.W., Sparks, D.L., Yamauchi, Y., Rinklebe, J., Ok, Y.S., 2020. Metal contamination and bioremediation of agricultural soils for food safety and sustainability. Nat. Rev. Earth Environ. 1 (7), 366–381.
Hu, B.F., Xue, J., Zhou, Y., Shao, S., Fu, Z.Y., Li, Y., Chen, S.C., Qi, L., Shi, Z., 2020. Modelling bioaccumulation of heavy metals in soil-crop ecosystems and identifying its controlling factors using machine learning. Environ. Pollut. 262, 114308.
Hu, X.T., Li, T., Xu, W.H., Chai, Y.R., 2021. Distribution of cadmium in subcellular fraction and expression difference of its transport genes among three cultivars of pepper. Ecotoxicol. Environ. Saf. 216 (15), 112182.
Jin, Y.L., Wang, L.W., Song, Y.N., Zhu, J., Qin, M.H., Wu, L.H., Hu, P.J., Li, F.B., Fang, L.P., Chen, C., Hou, D.Y., 2021. Integrated life cycle assessment for sustainable remediation of contaminated agricultural soil in China. Environ. Sci. Technol. 55 (17), 12032.
Li, J., Pan, L., Suvarna, M., Tong, Y.W., Wang, T., 2020. Fuel properties of hydrochar and pyrochar: prediction and exploration with machine learning. Appl. Energy 269, 115166.
Li, J., Pan, L., Suvarna, M., Wang, X., 2021a. Machine learning aided supercritical water gasification for H2-rich syngas production with process optimization and catalyst screening. Chem. Eng. J. 426, 131285.
Li, J., Zhang, W.J., Liu, T.G., Yang, L.H., Li, H.L., Peng, H.Y., Jiang, S.J., Wang, X.N., 2021b. Machine learning aided bio-oil production with high energy recovery and low nitrogen content from hydrothermal liquefaction of biomass with experiment verification. Chem. Eng. J. 425, 130649.
Li, J., Zhu, X.Z., Li, Y.N., Tong, Y.W., Wang, X.N., 2021c. Multi-task prediction and optimization of hydrochar properties from high-moisture municipal solid waste: application of machine learning on waste-to-resource. J. Clean. Prod. 278, 123928.
Li, J.T., Gurajala, H.K., Wu, L.H., Ent, A.V.D., Qiu, R.L., Baker, A.J.M., Tang, Y.T., Yang, X.E., Shu, W.S., 2018a. Hyperaccumulator plants from China: a synthesis of the current state of knowledge. Environ. Sci. Technol. 52, 11980–11994.
Li, N.N., Li, S.T., Wang, S.F., Xie, D.T., Luo, F., 2018b. How exogenous cadmium affects micronutrients accumulation and the related gene expression regulation in Brassica juncea. Int. J. Agric. Biol. 20, 2074–2082.
Li, X.Y., Geng, T., Shen, W.J., Zhang, J.R., Zhou, Y.Z., 2021a. Quantifying the influencing factors and multi-factor interactions affecting cadmium accumulation in limestone-derived agricultural soil using random forest (RF) approach. Ecotoxicol. Environ. Saf. 209, 111773.


Liang, H.M., Lin, T.H., Chiou, J.M., Yeh, K.C., 2009. Model evaluation of the phytoextraction potential of heavy metal hyperaccumulators and non-hyperaccumulators. Environ. Pollut. 157 (6), 1945–1952.
Liu, H., Zhao, H.X., Wu, L.H., Liu, A.N., Zhao, F.J., Xu, W.Z., 2017a. Heavy metal ATPase 3 (HMA3) confers cadmium hypertolerance on the cadmium/zinc hyperaccumulator Sedum plumbizincicola. N. Phytol. 215, 687–698.
Liu, H., Zhao, H.X., Wu, L.H., Liu, A.N., Zhao, F.J., 2017b. Heavy metal ATPase 3 (HMA3) confers cadmium hypertolerance on the cadmium/zinc hyperaccumulator Sedum plumbizincicola. N. Phytol. 215, 687–698.
Manta, D.S., Angelone, M., Bellanca, A., Neri, R., Sprovieri, M., 2002. Heavy metals in urban soils: a case study from the city of Palermo (Sicily), Italy. Sci. Total Environ. 300, 229–243.
Meier, S., Alvear, M., Borie, F., Aguilera, P., Ginocchio, R., Cornejo, P., 2012. Influence of copper on root exudate patterns in some metallophytes and agricultural plants. Ecotoxicol. Environ. Saf. 75, 8–15.
Montoya-Mayor, M., Fernandez-Espinosa, A.J., Seijo-Delgado, I., Ternero-Rodríguez, M., 2013. Determination of soluble ultra-trace metals and metalloids in rainwater and atmospheric deposition fluxes: a 2-year survey and assessment. Chemosphere 92, 882–891.
Niemeyer, J.C., Lolata, G.B., Carvalho, G.M.D., Da Silva, E.M., Sousa, J.P., Nogueira, M.A., 2012. Microbial indicators of soil health as tools for ecological risk assessment of a metal contaminated site in Brazil. Appl. Soil Ecol. 59, 96–105.
Palansooriya, K.N., Li, J., Dissanayake, P.D., Suvarna, M., Li, L.Y., Yuan, X.Z., Sarkar, B., Tsang, D.C.W., Rinklebe, J., Wang, X.N., Ok, Y.S., 2022. Prediction of soil heavy metal immobilization by biochar using machine learning. Environ. Sci. Technol. 56, 4187–4198.
Pant, J., Pant, R.P., Singh, M.K., Singh, D.P., Pant, H., 2021. Analysis of agricultural crop yield prediction using statistical techniques of machine learning. Mater. Today: Proc. 3, 34.
Peng, J.S., Ding, G., Meng, S., Yi, H.Y., Gong, J.M., 2017. Enhanced metal tolerance correlates with heterotypic variation in SpMTL, a metallothionein-like protein from the hyperaccumulator Sedum plumbizincicola. Plant Cell Environ. 40, 1368–1378.
Qu, Y., Feng, B.L., 2020. Straw mulching improved yield of field buckwheat (Fagopyrum) by increasing water-temperature use and soil carbon in rain-fed farmland. Acta Ecol. Sin. https://doi.org/10.1016/j.chnaes.2020.11.008.
Ridgeway, G., 2020. Generalized Boosted Models: A guide to the gbm package.
Robinson, B., Fernandez, J.E., Madejon, P., Maranon, T., Murillo, J.M., Green, S., Clothier, B., 2002. Phytoextraction: an assessment of biogeochemical and economic viability. Plant Soil 249, 117–125.
Rosa, G.J.M.; Blackwell., 2022. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Shen, X., Dai, M., Yang, J.W., Sun, L., Tan, X., Peng, C.S., Ali, I., Naz, I., 2022. A critical review on the phytoremediation of heavy metals from environment: performance and challenges. Chemosphere 291, 132979.
Sheoran, V., Sheoran, A.S., Poonia, P., 2016. Factors affecting phytoextraction: a review. Pedosphere 26 (2), 148–166.
Sigmund, G., Gharasoo, M., Hüffer, T., Hofmann, T., 2020. Deep learning neural network approach for predicting the sorption of ionizable and polar organic pollutants to a wide range of carbonaceous materials. Environ. Sci. Technol. 54 (7), 4583.
Tőzsér, D., Harangi, S., Baranyai, E., Lakatos, G., Fülöp, Z., Tóthmérész, B., Simon, E., 2017. Phytoextraction with Salix viminalis in a moderately to strongly contaminated area. Environ. Sci. Pollut. Res. 25, 3275–3290.
Uraguchi, S., Fujiwara, T., 2012. Cadmium transport and tolerance in rice: perspectives for reducing grain cadmium accumulation. Rice 5, 5.
Venzhik, Y.V., Talanova, V.V., Titov, A.F., Kholoptseva, E.S., 2015. Similarities and differences in wheat plant responses to low temperature and cadmium. Plant Physiol. 42, 508–514.
Verbruggen, N., Hermans, C., Schat, H., 2009. Molecular mechanisms of metal hyperaccumulation in plants. N. Phytol. 181 (4), 759–776.
Wang, J.W., Liang, S., Xiang, W.W., Dai, H.P., Duan, Y.Z., Kang, F.R., Chai, T.Y., 2019b. A repeat region from the Brassica juncea HMA4 gene BjHMA4R is specifically involved in Cd2+ binding in the cytosol under low heavy metal concentrations. BMC Plant Biol. 19, 89.
Wang, L.W., Hou, D.Y., Shen, Z.T., Zhu, J., Jia, X.Y., Ok, Y.S., Tack, F.M.G., Rinklebe, J., 2019a. Field trials of phytomining and phytoremediation: a critical review of influencing factors and effects of additives. Crit. Rev. Environ. Sci. Technol. 50 (24), 2724–2774.
Wang, L.W., Rinklebe, J., Tack, F.M.G., Hou, D.Y., 2021. A review of green remediation strategies for heavy metal contaminated soil. Soil Use Manag. 37 (4), 936–963.
Wang, X.L., Souza, M.F.D., Li, H.C., Qiu, J., Ok, Y.S., Meers, E., 2022. Biodegradation and effects of EDDS and NTA on Zn in soil solutions during phytoextraction by alfalfa in soils with three Zn levels. Chemosphere 292, 133519.
Wood, J.L., Tang, C., Franks, A.E., 2016. Microbial associated plant growth and heavy metal accumulation to improve phytoextraction of contaminated soils. Soil Biol. Biochem. 103, 131–137.
Wu, X., Su, N., Yue, X., Fang, B., Zou, J., Chen, Y., Shen, Z.G., Cui, J., 2021. IRT1 and ZIP2 were involved in exogenous hydrogen-rich water-reduced cadmium accumulation in Brassica chinensis and Arabidopsis thaliana. J. Hazard. Mater. 407, 124599.
Yang, P., Chen, H.J., Fan, H.Y., Li, Q.S., Gao, Q., Wang, D.S., Wang, L.L., Zhou, C., Zeng, E.Y., 2019. Phosphorus supply alters the root metabolism of Chinese flowering cabbage (Brassica campestris L. ssp. chinensis var. utilis Tsenet Lee) and the mobilization of Cd bound to lepidocrocite in soil. Environ. Exp. Bot. 167, 103827.
Ye, P., Wang, M.H., Zhang, T., Liu, X.Y., Jiang, H., Sun, Y.P., Cheng, X.Y., Yan, Q., 2020. Enhanced cadmium accumulation and tolerance in transgenic hairy roots of Solanum nigrum L. expressing iron-regulated transporter gene IRT1. Life 10, 324.
Yuan, X.Z., Suvarna, M.N., Low, S., Dissanayake, D., Ok, Y.S., 2021. Applied machine learning for prediction of CO2 adsorption on biomass waste-derived porous carbons. Environ. Sci. Technol. 55, 11925.
