
Journal of Hydrology 625 (2023) 129985

Contents lists available at ScienceDirect

Journal of Hydrology
journal homepage: www.elsevier.com/locate/jhydrol

Research papers

Modeling potential arsenic enrichment and distribution using stacking ensemble learning in the lower Yellow River Plain, China
Wengeng Cao a,1,*, Yu Fu b,1, Yanpei Cheng a,*, Wenhua Zhai b, Xiaoyue Sun a,b, Yu Ren a, Deng Pan c

a The Institute of Hydrogeology and Environmental Geology, Chinese Academy of Geological Sciences (CAGS), Shijiazhuang 050061, China
b North China University of Water Resources and Electric Power, Zhengzhou 450011, China
c Institute of Natural Resource Monitoring of Henan Province, Zhengzhou 450016, China

ARTICLE INFO

Keywords:
Groundwater
Arsenic contamination
Stacking ensemble learning model
Risk distribution
Lower Yellow River

ABSTRACT

In the high-arsenic area of the northern Henan Plain in the lower Yellow River, surface water and groundwater interact frequently. This interaction is driven by measures to control groundwater over-exploitation and by the ecological replenishment of rivers. The causes of arsenic pollution in shallow groundwater remain unclear, as do the mechanisms behind its dynamic changes. Using machine learning algorithms to model the risk of high arsenic occurrence in groundwater can help analyze the mechanisms of arsenic enrichment. In this study, a stacking ensemble learning model was constructed with multivariate parameters. It was used to predict the risk distribution of shallow high-arsenic groundwater in the lower reaches of the Yellow River in northern Henan Province. Furthermore, we analyzed the potential areas at risk of high arsenic enrichment under different controlled groundwater levels.

The results show that the groundwater arsenic exceedance rate was 16.76%. High-arsenic groundwater was distributed from northeast to southwest. In the middle and south of the study area, it occurred mainly in the alluvial pre-fan depressions and the Yellow River crevasse splay. Compared with the single models, the stacking ensemble learning model had the best overall performance: an Area Under the Curve (AUC) of 0.87, accuracy of 0.82, specificity of 0.88 and sensitivity of 0.77. The predicted risk distribution of groundwater arsenic is highly consistent with the observed results. The potential high-arsenic-risk groundwater area accounts for 19.67% of the total area. The most significant factors affecting groundwater arsenic enrichment in the study area are:
- the impact of historical Yellow River bursts;
- mean annual temperature;
- annual precipitation;
- ground elevation;
- hydraulic gradient.

By category, the sedimentary environment accounts for 27.10% of the contribution to arsenic enrichment. Groundwater also plays an important role, with a relative share of 13.69%. In the simulation of groundwater-level rebound, raising the groundwater level by 1 m to 3 m reduced the high-arsenic area by up to 19.52%. The methods and findings of this work can support the management and control of arsenic enrichment and distribution in shallow groundwater systems around the world.

* Corresponding authors.
E-mail addresses: caowengeng@mail.cgs.gov.cn (W. Cao), chengyanpei@mail.cgs.gov.cn (Y. Cheng).
1 These authors contributed equally to the manuscript.

https://doi.org/10.1016/j.jhydrol.2023.129985
Available online 22 July 2023
0022-1694/© 2023 Elsevier B.V. All rights reserved.

1. Introduction

Arsenic is a toxic carcinogen that can cause serious health problems, including skin cancer and lung cancer (Oremland and Stolz, 2003). According to the World Health Organization guideline, the arsenic content of drinking water should not exceed 10 μg/L. Rising arsenic concentrations in groundwater have become a serious global environmental issue (Ravenscroft, 2007). Many countries are affected by high arsenic in groundwater, including Bangladesh, India, Myanmar, Vietnam, Argentina, the United States, and China (Cao et al., 2018; Kumar et al., 2016; Pi et al., 2015; Wen et al., 2013). The highest arsenic concentrations found in drinking water in Bangladesh and India exceed 2 mg/L (Janardhana, 2022). In the Datong and Hetao basins of China, they exceed 1 mg/L (Guo et al., 2013). The Tarim Basin, the Ejin Jinan Basin, the Heihe Basin, the Qaidam Basin, the Northeast Plain, and the North China Plain (Rodríguez-Lado
et al., 2013) are areas with notably high arsenic levels in China. The lower reaches of the Yellow River in northern Henan Province lie in the southern part of the North China Plain. In this region, the overall quality of shallow groundwater is notoriously poor and arsenic levels are high. For example, in Cao Gang Township, Fengqiu County, local people are exposed to high levels of arsenic. Many suffer from symptoms including skin pigmentation or pigment loss.

In regions with high levels of naturally occurring arsenic, two measures can help ensure the safety of drinking water:
- identifying low-arsenic water sources;
- mitigating adverse impacts through pollution control.

Field research has shown that even in areas originally low in arsenic, concentrations can increase after a period of abstraction (Erban et al., 2013; Smith et al., 2018). Meanwhile, high arsenic concentrations in groundwater and the associated remediation technologies have received wide attention (Song et al., 2006; Rahman et al., 2014). Current research on managing high arsenic in groundwater mostly starts from the specific hydrochemical characteristics of the contaminant. It focuses on efficient and cost-effective management technologies.

The enrichment of arsenic in groundwater is affected by a combination of provenance and hydrogeochemical conditions. Naturally occurring high-arsenic groundwater is predominantly of geogenic origin (Fendorf et al., 2010). Several natural hydrogeochemical processes dominate the release of arsenic from aquifers into groundwater:
- reductive dissolution (Fendorf et al., 2010; Saunders et al., 2008);
- desorption (Manning et al., 1998; Masue et al., 2007);
- oxidation of arsenic-bearing iron sulfides (Das et al., 1996; Welch et al., 2000);
- geothermal processes (Bundschuh and Maity, 2015; Sharifi et al., 2016).

Although high-arsenic groundwater is natural in most cases, the impact of anthropogenic exploitation on arsenic enrichment cannot be ignored. Because groundwater flows slowly, it is vulnerable to pumping (Michael and Voss, 2008). For example, researchers modeled groundwater flow in the Bengal Basin and the Red River Delta in Vietnam. They found that deep groundwater exploitation increased the hydraulic gradient and induced infiltration of shallow high-arsenic groundwater. This in turn raised arsenic contamination in deep groundwater (Norrman et al., 2008; Erban et al., 2014; Knappett et al., 2016). Likewise, shallow groundwater pumping can cause upward recharge of semi-confined high-arsenic water. Through field experiments in the Bengal Basin, Neidhardt et al. (2013) showed that shallow pumping induced an overall upward migration of low-salinity, high-arsenic water at the interface. This in turn increased arsenic concentrations in shallow groundwater.

Harvey et al. (2002) presented a groundbreaking view of how groundwater pumping affects the distribution of high-arsenic groundwater. According to this view, massive human extraction of groundwater altered the natural flow field. As a result, surface water such as irrigation water infiltrated and recharged the aquifer. Irrigation greatly changed the location, timing and chemical composition of aquifer recharge. It also flushed water through the system more quickly and circulated large fluxes of water through rice fields during the dry season. This could mobilize arsenic from oxides in near-surface sediments, because arsenic and the products of organic carbon oxidation are strongly correlated (Harvey et al., 2006).

Groundwater pumping also introduces surface oxidants such as O₂, NO₃⁻ and SO₄²⁻, which affect arsenic enrichment (Senn and Hemond, 2002; Kim et al., 2009; Liu et al., 2014; Huang et al., 2018). NO₃⁻-rich surface water infiltrates downward and oxidizes Fe²⁺ and As(III) in groundwater. This forms hydrous ferric oxide (HFO) and arsenic complexes and decreases the groundwater arsenic concentration (Senn and Hemond, 2002). In the Jianghan Plain, O₂-rich surface water recharges the aquifer during the irrigation season and oxidizes Fe²⁺ and As(III) in the groundwater. In the non-irrigation season, the aquifer returns to anaerobic conditions and the arsenic concentration increases (Schaefer et al., 2016; Schaefer et al., 2017; Huang et al., 2018).

In China, groundwater pumping is intensive in the North Henan Plain. In recent years, the government has restricted excessive groundwater exploitation and carried out ecological replenishment of several rivers. Together, exploitation and replenishment cause frequent interaction between surface water and groundwater. However, it remains unclear how arsenic risk evolves under the combined control of several sets of conditions in the region:
- climatic conditions;
- hyporheic zone conditions;
- aquifer hydrogeological conditions;
- groundwater environmental conditions.

The factors controlling the occurrence and distribution of arsenic under groundwater pumping also need further study. To this end, this work examines the regulation of arsenic-related risks under groundwater extraction-reduction conditions. The aim is to provide safe drinking water for residents of the North Henan Plain.

Research on modeling the risk and occurrence of high arsenic concentrations in groundwater is mainly based on machine learning algorithms. These algorithms have strong nonlinear processing capabilities and can learn the complex relationships among the factors controlling high arsenic in groundwater. Researchers have developed statistical models of groundwater arsenic occurrence based on different algorithms:
- Zhang et al. (2012) used a logistic regression model to analyze the correlation between groundwater arsenic levels and environmental variables in Shanxi (Datong-Xinzhou-Taiyuan-Yuncheng basins).
- Erickson et al. (2021) used a boosted regression tree model to predict the spatial distribution of arsenic in groundwater in the northern United States.
- Podgorski et al. (2020) used a random forest model to predict the spatial distribution of high arsenic concentrations in groundwater in India. They estimated that about 180 thousand to 30 million people in India live in areas with arsenic levels above 10 μg/L.
- Liang et al. (2021) used a back-propagation neural network to predict arsenic concentrations in groundwater in the Lanyang Plain, Taiwan, China. They found that it predicted more accurately than Ordinary Kriging.

Most of these approaches apply a single machine learning model to predict the risk distribution of high-arsenic groundwater. Each such model carries drawbacks arising from its inherent and systematic limitations:
- Linear classifiers perform poorly on nonlinear data.
- Neural network methods tend to fall into local minima and converge slowly.
- Tree-structured algorithms rely heavily on the data. They tend to overfit if low-arsenic interference signals are present in the high-arsenic feature data.

Therefore, it is difficult for a single traditional machine learning model to accurately portray the arsenic risk distribution in groundwater.

To determine the potential high-arsenic risk in groundwater north of the lower Yellow River, this study applied a stacking ensemble learning algorithm to data from 1,081 shallow groundwater sampling sites. We modeled the risk and distribution of high-arsenic groundwater based on the effects of several groups of factors on groundwater arsenic levels: human activities, climate change, depositional environment, soil physicochemical characteristics, and hydrogeology. Through this research, we identified the areas at high risk of arsenic. We also analyzed how the main controlling variables affect the spatial distribution of arsenic. In addition, we simulated risk regulation of high arsenic enrichment during groundwater extraction. These results provide a basis for the safe and effective use and management of groundwater resources in the area.

2. Overview of the study area

2.1. Hydro-meteorology

The study area lies in the temperate continental monsoon climate zone, with four distinct seasons and a mean annual temperature of 13.3–15.6 ℃. Annual precipitation is 496.7–751.3 mm, and most of the rainfall is concentrated
from July to September. Annual evaporation is about 988.0–1,023.9 mm, with the highest evaporation rates in May and June. The study area has a dense river network running from southwest to northeast. With a radial pattern of diversions, the network forms two major water systems: the Haihe River Basin and the Yellow River Basin. The main tributaries of the Haihe River Basin include the Weihe River, Anyang River, Communist Canal and Qi River. They cover a watershed area of 1.53 × 10⁴ km², about 9.20% of the total area of the province. The tributaries of the Yellow River Basin include the Yiluo River, Qin River, Natural Wenyan Canal and Jindi River. Their watershed area is 3.62 × 10⁴ km², about 21.70% of the total area of the province.

2.2. Geomorphology

The North Henan Plain (i.e., the Henan part of the North China Plain) is located in the northern part of Henan Province. It lies at the eastern foot of the Taihang Mountains and north of the Yellow River. It covers the administrative jurisdictions of Jiaozuo, Xinxiang, Hebi, Anyang and Puyang cities, with a total area of about 19,733.75 km². The topography of the study area is strongly controlled by geological structure. The Taihang Mountains and remnant hills and mounds occupy the west, and a vast plain occupies the east, so the terrain descends from west to east. Altitude ranges from 203 m to 40 m. The highest point is in the northwestern part of Jun County, and the lowest land is in the eastern part of Taiqian County, which borders Shandong Province. The ground slope is 1/500 to 1/2000, and the terrain is relatively flat. The area consists mainly of two large geomorphic units: the Taihang Mountain Front Alluvial Plain and the Yellow River Alluvial Plain (Fig. 1).

Fig. 1. Location of the study area and distribution of sampling points.

2.3. Hydrogeological conditions

All stratigraphic systems are present in the area except the Upper Ordovician, Silurian, Devonian and Lower Carboniferous. In particular, the Quaternary is widely distributed and exposed at the surface. Its lithology mainly consists of fluvial and lacustrine deposits: clay, silty clay, silt, sand, and gravel. The base of the Quaternary is generally at a depth of 40–180 m in the western tilted plain and 160–400 m in the plain area.

The shallow aquifer system in this area comprises the water-bearing media within 160 m of the surface, together with the phreatic and semi-confined groundwater they contain. The water-bearing medium is mainly loose medium-fine sand. The lower boundary is a relatively stable regional layer of clay and silty clay. Groundwater in this area is recharged by atmospheric precipitation, Yellow River irrigation water, seepage from rivers and canals, and lateral inflow. It is discharged by evaporation, pumping and outflow downstream.

3. Materials and methods

3.1. Sample collection and processing

Detailed hydrogeological and environmental investigations of groundwater arsenic were carried out in the North Henan Plain in 2010, 2019 and 2020. A total of 1,081 shallow groundwater samples were collected (sampling locations are shown in Fig. 1). The depth of the sampling wells ranges from 5 m to 100 m. A full analysis was conducted on each sample, including trace elements, As speciation and Fe speciation. Before sampling, the hydrogeological structure of each well was investigated and the groundwater level was measured. Each well was then purged by pumping for 15–20 min.

The field parameters mainly included water temperature, pH, Oxidation Reduction Potential (ORP) and Dissolved Oxygen (DO). The pH, ORP and DO were determined in situ using a Hach Sension 2 benchtop ion meter and a Hach Quanta portable water quality meter (U.S.). Ammonia nitrogen, sulfide and ferrous iron were determined using a Hach DR2800 portable spectrophotometer (U.S.).

The sample bottles were rinsed with well water 3 or 4 times. Arsenic samples were collected in pre-cleaned 25 mL brown high-density polyethylene bottles. During collection, 1 mL of concentrated hydrochloric acid was added dropwise to acidify each sample to a pH below 2. The samples were then labeled, refrigerated at 4 ℃, and sent to the laboratory for analysis. The laboratory analyses were as follows:
- Alkalinity was measured by acid-base neutralization titration within 24 h.
- K⁺, Na⁺, Ca²⁺, Mg²⁺ and other cations were measured with a U.S. IRIS Intrepid II XSP inductively coupled plasma atomic emission spectrometer (ICP-AES).
- Cl⁻, SO₄²⁻ and other anions were measured with a U.S. Dionex ICS-1500 ion chromatograph.
- HCO₃⁻ was measured by titration.
- Total arsenic was measured with a Beijing Haiguang AFS-3100 atomic fluorescence spectrometer.

The analyses were carried out by the Institute of Hydrogeology and Environmental Geology, Chinese Academy of Geological Sciences. The laboratory temperature was 23 ℃ and the humidity was 50%. Furthermore,
while analyzing the groundwater samples, 5% duplicate samples were added, and the errors of all duplicate samples were less than 5%.

3.2. Environmental variables affecting arsenic content

The transport and movement of arsenic in aquifers are influenced by geomorphology, geology, hydrogeology, biogeochemistry, and human activities (Chakraborty et al., 2020; Bhattacharya et al., 1997; Van Geen et al., 2003; Ravenscroft et al., 2005; Shamsudduha et al., 2015; Harvey et al., 2002; Smith et al., 2018). Researchers have used many environmental variables as independent variables to predict the distribution of arsenic in groundwater. These include topography, geomorphology, sediment characteristics, soil properties, land-use types, groundwater flow, and vegetation. For example, Podgorski et al. (2020) used a random forest model to predict the distribution of high-arsenic groundwater in India based on 26 variables, such as evapotranspiration, precipitation, soil physicochemical characteristics, land-use types, and water table depth. Tan et al. (2020) used a boosted regression tree model to predict the distribution of high-arsenic groundwater in Bangladesh. Their model used 90 environmental factors, such as ground elevation, slope, air temperature, precipitation, evapotranspiration, geomorphology type, and groundwater table.

Our study comprehensively analyzed the environmental variables that may affect the spatial distribution of high-arsenic groundwater. We established a total of 30 initial environmental variables in six categories: climate, human activities, sedimentary environment, hydrogeology, soil physicochemical characteristics, and others. The sources and descriptions of these variables are shown in Table 1. The environmental variables were rasterized into grid cells at a 500 m spatial resolution. Because there are many environmental variables, the dataset may contain redundant information and interference signals. Useful variables therefore need to be screened before modeling. We used a recursive feature elimination algorithm (Cao et al., 2021), with random forest as the iterative classifier. The algorithm scores the importance of each environmental variable in the model and removes variables of low importance. This process yielded the optimal subset of 18 environmental variables shown in Fig. 2.

Table 1
Model predictor variables and descriptions.

| Class | Variable | Description |
|---|---|---|
| Climate | Temperature | Mean annual temperature (°C) |
| | Precipitation | Total rainfall per year (mm) |
| | Actual evapotranspiration (AET) | Annual total actual evapotranspiration (mm) |
| Human activities | Distance to river | Distance to all rivers flowing through the study area (m) |
| | Cumulative change | Cumulative change in groundwater level from 1959 to 2020 (m) |
| | Interannual variation | Change in groundwater level between the sampling year and the previous year (m) |
| | Groundwater depth | Sampling well water level (m) |
| | Hydraulic gradient | I = ΔH/L, where ΔH is the difference in water-level elevation between two points on the water-level contour lines and L is the horizontal distance between these two points |
| Sedimentary environment | Elevation | 30 m resolution digital elevation model (DEM) (m) |
| | Impact of the Yellow River burst | Normalized number of historical Yellow River outbursts |
| | Clay layer | Layers of clayey soil |
| | Ratio of clay sand thickness | Percentage of clayey soil |
| | Quaternary landforms | Including alluvial fans and alluvial floodplains, floodplains, marine floodplains, lakes, depressions, riverine zones, loess-like soils, bedrock |
| Hydrogeology | Water yield property | L/s |
| | Infiltration coefficient of precipitation | |
| | Permeability coefficient | |
| | Specific yield | |
| Soil | Physical and chemical characteristics of shallow and deep soils | Including sand fraction, silt fraction, clay fraction, soil organic carbon, soil pH |
| Others | Land-use types | Arable land, buildings, forest land, water systems |
| | Normalized difference vegetation index (NDVI) | NDVI = (NIR − R)/(NIR + R), where NIR is the reflectance in the near-infrared band and R is the reflectance in the red band |
| | Slope | Slope = elevation difference / horizontal distance (°) |

Fig. 2. The optimal subset of environmental variables.

3.3. Risk assessment model construction and validation

Various machine learning models have shown good predictive performance in groundwater quality modeling (Hanoon et al., 2021; Nguyen et al., 2020; Ghobadi et al., 2022; Chauhan et al., 2019). These include linear models (logistic regression, support vector machines, linear discriminant analysis, etc.), boosting (extreme gradient boosting), bagging (random forest), and neural networks. The strengths of the models used in this study are as follows:
- The Random Forest (RF) model handles high-dimensional data, outliers, noise, over-fitting and multicollinearity well (Cutler et al., 2007).
- The eXtreme Gradient Boosting (XGBoost) model can handle high-dimensional data and is less prone to over-fitting. It also improves computational efficiency through multi-threaded parallel computing (Desdhanty and Rustam, 2021).
- A Support Vector Machine (SVM) can transform a nonlinear problem into a linear problem in a higher-dimensional space (Hosseini and Mahjouri, 2014).
- Linear Discriminant Analysis (LDA) models are simple and require no parameter tuning. They can also handle classes with widely different training sample sizes (Chauhan et al., 2019).

A stacking model is a better choice than any of these methods alone, because it can combine different types of machine learning models. Such advantages of stacking ensemble learning have been demonstrated in several studies of geoengineering problems (Sun and Trevor, 2018; Hu et al., 2020; Taghizadeh-Mehrjardi et al., 2020).

Based on the above discussion, the ensemble model (Fig. 3) fuses XGBoost, RF, and SVM as the base learners and uses LDA as the meta-learner. This configuration follows the principle that "base learners should be as diverse as possible while ensuring good performance, while the meta-learner should perform well and have a simple structure" (Chatzimparmpas et al., 2021). The resulting model is used to predict the spatial distribution of high-arsenic groundwater.

Fig. 3. Stacking model modeling process. X-train is the independent variable of the training set, and Y-train is the dependent variable of the training set. Modeling process: First, 5-fold cross-validation is performed on the training set using the RF, XGBoost and SVM models. After training, each model yields a set of (out-of-fold) predictions of the same size as the training set. The three sets are combined to form the feature data of the second-layer meta-learner. In addition to the 5-fold cross-validation on the training set, the test set is predicted five times. The test set results are calculated differently: the five predictions of each model are averaged, and the averaged results of the three models are combined to form the test set for the second-layer meta-learner. The second-layer feature data are then used to train an LDA model, and the test set is used to assess the performance of the stacking model.

Converting the prediction target to a binary variable can mitigate some errors and thus improve the accuracy and validity of the model. Consequently, a threshold of 10 μg/L was used to categorize arsenic levels. Arsenic mass concentrations at or below 10 μg/L were coded as 0, and concentrations above 10 μg/L were coded as 1. A modeling dataset was constructed with the binary arsenic data as the dependent variable and the environmental variables as the independent variables. The dataset was randomly divided into training and test sets in an 8:2 ratio. Stratified sampling ensured that both sets have the same arsenic exceedance rate. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, specificity and recall. The AUC typically ranges from 0.5 to 1, and the larger the AUC value, the better the model performance. Accuracy represents the proportion of correctly
predicted samples among all samples and ranges from 0 to 1. Recall measures the ability to correctly classify samples with arsenic mass concentrations above 10 μg/L. Specificity measures the ability to correctly classify samples with arsenic mass concentrations at or below 10 μg/L. Finally, the constructed stacking model was used to predict the spatial distribution of high-arsenic groundwater in the lower reaches of the Yellow River in northern Henan Province. It was also used to map the probability of high-arsenic groundwater across the study area.

3.4. Groundwater level control simulation

To estimate the potential risk areas of groundwater arsenic under varying water-level conditions, groundwater level must be incorporated into the risk assessment model as a dynamic variable. Therefore, a numerical groundwater flow model is needed to simulate the rise in groundwater level in a way that closely resembles the actual flow field. To mitigate the risk of high arsenic, the areas identified as high-arsenic zones by the machine learning model are treated as priority areas for water-level regulation. Three targets for water-level rise were set: 1 m, 3 m and 5 m. These targets were achieved by iteratively adjusting the amount of groundwater extraction in the existing numerical model. In some areas, the groundwater is shallow and a water-level rise would lift the groundwater level above the land surface. In those areas, the groundwater depth was set to 0 m.

The numerical model of groundwater flow was built as follows:
1. The basin was discretized with a 1 km × 1 km grid, based on the aquifer structure, boundary conditions and groundwater flow field characteristics of the study area.
2. A numerical groundwater flow model was constructed using the Groundwater Modeling System software.
3. The simulation period was determined, and the initial and boundary conditions were processed before model construction.
4. The model was calibrated by repeatedly adjusting the hydrogeological parameters, boundary conditions and water exchange volumes by hand. In each round, the calculated water levels were compared with measured values to minimize the differences.

The hydrogeological parameters fall into two groups:
- parameters used to calculate the various source and sink terms, such as the precipitation infiltration coefficient, irrigation infiltration coefficient, river infiltration coefficient and evaporation intensity;
- parameters of the aquifers themselves, namely the permeability coefficient and specific yield of the phreatic aquifer, and the permeability coefficient and storage coefficient of the confined aquifer.

According to the layered water-balance analysis, the shallow aquifer in the study area is recharged mainly by precipitation infiltration. The deep aquifer is recharged by leakage from the shallow aquifer. Pumping is the main discharge. Both the shallow and deep aquifers have a negative water balance, which causes a continuous decline in groundwater levels.

4. Results and analysis

4.1. Statistics and distribution characteristics of arsenic mass concentration in groundwater

Table 2 presents the statistics of the key groundwater chemical components in the study area. The minimum, maximum and mean arsenic mass concentrations were below 0.1, 190 and 7.06 μg/L, respectively. The median and standard deviation were 1.20 and 16.52 μg/L, respectively. The coefficient of variation was 2.4,

Table 2
Statistical characteristics of major chemical composition of groundwater in the study area.

| Samples (n = 1081) | pH | TDS (mg/L) | ORP (mV) | NO₃⁻ (mg/L) | NH₄⁺ (mg/L) | Fe²⁺ (mg/L) | As (μg/L) |
|---|---|---|---|---|---|---|---|
| Minimum | 6.10 | 271.30 | −281.00 | <0.01 | <0.016 | <0.04 | <0.1 |
| Maximum | 8.40 | 7825.00 | 337.00 | 239.40 | 2.82 | 14.80 | 190.00 |
| Mean | 7.40 | 1012.20 | −9.00 | 3.80 | 0.18 | 1.40 | 7.40 |
| Median | 7.40 | 797.00 | −23.00 | 0.10 | 0.03 | 0.70 | 2.20 |
| SD | 0.30 | 782.00 | 122.90 | 13.40 | 0.69 | 1.80 | 17.50 |
| CV | 0.04 | 0.80 | −13.70 | 3.60 | 3.80 | 1.30 | 2.40 |

Note: SD is standard deviation; CV is coefficient of variation; CV = SD/mean.
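The Table 2 note defines CV = SD/mean. Section 3.3 codes arsenic as 0 (at or below 10 μg/L) or 1 (above 10 μg/L). The Python sketch below is only an illustration of these definitions. It computes Table 2-style statistics for one constituent, applies the binary coding, and derives the exceedance rate. Please note the following assumptions:
- The function names and toy concentrations are hypothetical.
- The sketch uses the sample standard deviation (n − 1); the paper does not state which estimator was used.
- Censored values such as "<0.1" would need a substitution rule, which is not shown.

```python
import statistics

# Threshold used in the paper to binarize arsenic (WHO guideline value, in ug/L).
AS_THRESHOLD_UG_L = 10.0


def summarize(values):
    """Return Table 2-style statistics for one constituent.

    Assumes `values` holds plain numeric concentrations (no censored
    "<DL" entries). The sample standard deviation (n - 1) is an
    assumption; the paper does not state which SD estimator it used.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "cv": sd / mean,  # CV = SD / mean, as in the Table 2 note
    }


def binarize_arsenic(values, threshold=AS_THRESHOLD_UG_L):
    """Code As <= threshold as 0 and As > threshold as 1 (Section 3.3 rule)."""
    return [1 if v > threshold else 0 for v in values]


def exceedance_rate(values, threshold=AS_THRESHOLD_UG_L):
    """Fraction of samples strictly above the threshold."""
    labels = binarize_arsenic(values, threshold)
    return sum(labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical toy concentrations (ug/L); these are NOT the study's data.
    arsenic = [0.5, 1.2, 2.0, 3.5, 8.0, 10.0, 12.5, 40.0]
    stats = summarize(arsenic)
    print({name: round(value, 2) for name, value in stats.items()})
    print(binarize_arsenic(arsenic))
    print(exceedance_rate(arsenic))
```

A value of exactly 10.0 μg/L is coded 0, which matches the "at or below 10 μg/L" rule. As a consistency check on the arsenic column of Table 2, 17.50/7.40 ≈ 2.36, which rounds to the reported CV of 2.4.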

limestone or marl, so groundwater leaching increases the Ca and Mg contents among the cations. Na-Mg type water forms through long-term water–rock interaction. It is generally found in the groundwater runoff stagnation zone, mainly in the floodplain of the main Yellow River channel. Because of lateral recharge from the Yellow River, groundwater along the river has a mix of Na-Mg, Na-Ca and Na-Ca-Mg types as the primary cation types.

Groundwater with different arsenic contents also differs in Na⁺. The mean and median Na⁺ values in low-arsenic groundwater (186 mg/L and 134 mg/L, respectively) were slightly higher than those in high-arsenic groundwater (174 mg/L and 112 mg/L). However, the other major anions did not vary significantly with arsenic level. Meanwhile, the mean and median TDS and NO₃⁻ in high-arsenic groundwater were lower than those in low-arsenic groundwater (Table 3). Notably, high-arsenic groundwater shows a smaller range between maximum and minimum ion concentrations than low-arsenic groundwater. In particular, the extreme-value intervals of SO₄²⁻ and NO₃⁻ were generally narrower in high-arsenic groundwater (Fig. 5). The relatively higher Na⁺ and TDS contents in low-arsenic groundwater suggest a comparatively stronger evaporative concentration effect. In terms of hydrochemical types, high-arsenic water (As > 10 μg/L) across the region
Fig. 4. Piper diagrams of shallow groundwater. exhibits mainly Na-Mg-HCO3 and Ca-Mg-Na-HCO3-Cl types. As evi­
denced from the distribution of sampling sites (Fig. 1), the distribution
indicating a substantial spatial variability in the groundwater quality of high arsenic groundwater appears to follow a northeast-southwest
concentration in the study area. trend, with high arsenic groundwater in the central and southern part
Out of the 1,081 sampling points in the study area where high arsenic of the study area mainly distributed in the former depressions of the
groundwater was detected, the exceedance rate was 16.76%. The pH of alluvial floodplain fan and in the eastern part of the Yellow River
the groundwater ranged from 6.1 to 8.4, with a mean value of 7.4, declination fan area centered around Puyang, and high arsenic
indicating a neutral to weakly alkaline environment. Total Dissolved groundwater is mostly unevenly distributed.
Solids (TDS) concentrations varied significantly, ranging from 271.3 to
7825 mg/L. Groundwater Eh values spanned from − 281 to 337 mV,
4.2. Groundwater arsenic risk distribution
with a mean value of − 9 mV, indicating a reducing environment in the
study area. The dominant cations in the groundwater were primarily
Table 4 presents the evaluation outcomes of the XGBoost, RF, SVM,
Ca2+ and Na+, with average concentrations arranged in descending
and stacking models on the test set. Notably, the stacking model ach­
order as Ca2+>Na+>Mg2+>K+. Notably, Na+ exhibited significant
ieved the highest value for AUC, Accuracy, Specificity, and Recall
spatial variability, indicated by its large coefficient of variation (1.1);
values. In terms of the model evaluation metrics, the stacking model
HCO–3 and SO2- 4 were the prominent anions, with average concentrations
exhibited the highest prediction accuracy, with model Recall and
ordered as HCO–3>SO2- -
4 >Cl >NO3. The coefficients of variation for

2- - Specificity of 0.88 and 0.75, respectively. These results indicate that the
NO3, SO4 , and Cl were larger at 3.6, 1.4, and 1.4, respectively, sug­

stacking model can accurately predict areas within the study area where
gesting greater spatial variability for these ions. The Piper trilinear map
arsenic mass concentrations are at or below 10 μg/L and above 10 μg/L.
(Fig. 4), illustrate that the diverse hydrochemical characteristics of
Fig. 6 displays the calculation results of XGBoost, RF, SVM, and
groundwater in the study area. Na+ was the dominant cation, followed
stacking models for the probability distribution of arsenic mass con­
by Ca2+ and Mg2+, while neither Ca2+ nor Mg2+ alone constituted the
centration exceeding 10 μg/L in the study area. The overall trends of the
main ions in groundwater in the area. HCO–3 was the main anion, fol­
spatial distribution of high arsenic groundwater, as estimated by these
lowed by SO2- -
4 and Cl , both of which have no classification advantage.
models, were generally similar. The arsenic contamination in ground­
Ca-Mg-Na type water was mainly distributed in the Taihang Piedmont,
water in the study area was prominently concentrated in the central,
modern Yellow River crevasse splay and around the Yellow River
southern, and eastern parts of the study area. However, notable differ­
diversion canal. In these regions, the aquifers predominantly consist of
ences in the local areas were observed in localized regions among

various models. According to the XGBoost and RF results, the probability of high arsenic groundwater is higher in Fengqiu County of Xinxiang City and the northern part of Hua County of Anyang City. However, this outcome does not accurately reflect the distribution of high arsenic groundwater, and the SVM results do not provide detailed information on changes in the distribution of high arsenic groundwater in localized areas.

Table 3
Content of major ions in groundwater with different arsenic content (concentrations in mg/L; pH dimensionless).
Indicator   As ≤ 0.01 mg/L                              As > 0.01 mg/L
            Max        Min        Mean       Median     Max        Min        Mean       Median
K+          6.68E+01   2.50E-01   3.29E+00   2.09E+00   2.76E+02   3.60E-01   7.12E+00   2.55E+00
Na+         1.20E+03   8.56E+00   1.86E+02   1.34E+02   7.43E+02   1.96E+01   1.74E+02   1.12E+02
Ca2+        5.15E+02   5.55E+00   8.31E+01   7.23E+01   4.69E+02   2.28E+00   7.98E+01   7.25E+01
Mg2+        3.57E+02   8.87E+00   7.44E+01   6.61E+01   3.02E+02   3.97E+00   7.47E+01   6.44E+01
HCO3−       1.21E+03   1.25E+02   5.82E+02   5.88E+02   1.05E+03   2.26E+02   5.83E+02   5.94E+02
SO42−       1.95E+03   2.30E+00   1.78E+02   1.16E+02   5.77E+02   9.61E+00   1.56E+02   1.26E+02
Cl−         1.33E+03   2.32E+00   1.47E+02   1.02E+02   1.62E+03   1.74E+01   1.34E+02   7.82E+01
NO3−        3.87E+02   1.00E-02   1.38E+01   1.41E+00   9.88E+01   1.00E-01   4.63E+00   8.88E-01
TDS         6.20E+03   1.76E+02   1.08E+03   8.89E+02   3.53E+03   3.42E+02   9.21E+02   7.64E+02
pH          8.71E+00   6.90E+00   7.69E+00   7.78E+00   1.02E+01   7.00E+00   7.71E+00   7.75E+00

Fig. 5. Boxplots of major ions in groundwater with different arsenic contents.

Table 4
Performance comparison of different models.
Evaluation indicators   XGBoost   RF      SVM    Stacking
AUC                     0.83      0.869   0.71   0.87
Accuracy                0.75      0.79    0.70   0.82
Specificity             0.82      0.86    0.86   0.88
Recall                  0.67      0.71    0.52   0.75
Nevertheless, the stacking model outperforms the others by providing more detailed insight into local changes in high arsenic areas, even in regions without sampling points. This capability stems from the stacking model's ability to combine the strengths of different machine learning algorithms and thereby reduce errors. The probability of high arsenic simulated by the stacking model ranged from 0.09 to 0.92. Using a probability threshold of 0.5, the high arsenic area covered 3,882.75 km2, about 19.67% of the total study area. High arsenic groundwater was concentrated in two regions: the pre-Taihang Mountains depression and the Yellow River vent fan area. Specifically, the highest probabilities were observed in Yanjin County of Xinxiang City, the southern part of Weihui City, the northern part of Yuanyang County and the northwest of Fengqiu County, the western part of Hua County of Anyang City, and the western and northern parts of Puyang City. Areas with a high arsenic probability greater than 0.80 are mainly located in the southern part of Yanjin County, Xinxiang City, covering 432.29 km2, or 2.19% of the study area. These areas require enhanced monitoring and management of groundwater quality.

Fig. 6. Probability distribution of arsenic mass concentration in groundwater exceeding 10 μg/L.

4.3. Analysis of the main controlling variables of As

To identify the factors influencing the spatial pattern of high arsenic groundwater, impact scores were calculated for all features involved in the modeling. The Gini impurity was calculated for each feature in each decision tree, expressed as the regression variance; a larger Gini impurity indicates a smaller influence of the feature on the dependent variable. The relative importance of each explanatory variable is illustrated in Fig. 7. Among the environmental variables, the Yellow River burst, average annual temperature, annual precipitation, ground elevation and hydraulic gradient have the most significant influence on the prediction of high arsenic groundwater.

Fig. 7. Importance of environmental variables chart.

A comprehensive ranking of the environmental factors shows that the depositional environment (ratio of clay to sand, impact of the Yellow River burst, Quaternary landforms and elevation) exerts the greatest influence on groundwater arsenic enrichment, with a relative impact of 27.1%. Specifically, the impact of the Yellow River burst contributes most to the distribution of arsenic risk in groundwater. Field investigations indicate that high arsenic groundwater in the study area is primarily distributed in the former depression of the Taihang Mountains and the Yellow River crevasse splay. The former depression is located at the junction of northern Xinxiang City and Hua County of Anyang City and represents a former depression in the alluvial fan. Its aquifer medium consists of fine sand, silt and silty clay, with a significant silt layer. The area has good sealing properties and poor runoff conditions and is rich in organic matter and clay minerals; consequently, the groundwater environment has evolved into a hypoxic-anoxic condition. In the Yellow River crevasse splay area, frequent flooding of the Yellow River supplies abundant organic matter and creates an interbedded sand and clay sedimentary environment, leading to low-oxygen to hypoxic conditions. Under anaerobic conditions, a series of reduction reactions occurs in both areas, whereby microorganisms decompose organic matter and initiate the reductive dissolution of arsenic-bearing iron oxides/hydroxides. Large amounts of As are thus released into the groundwater, forming high arsenic groundwater.

Climatic factors, such as average annual temperature, annual precipitation and actual evapotranspiration, also exert a large impact on the distribution of groundwater arsenic. Precipitation and temperature regulate groundwater arsenic concentration by influencing surface runoff infiltration. When precipitation increases, surface water levels in rivers and lakes rise, enhancing surface water recharge to groundwater and diluting arsenic in groundwater. Additionally, surface water contains dissolved oxygen, which brings oxygen and other oxidants into groundwater, hindering the reductive release of arsenic and thereby lowering groundwater arsenic concentrations. Conversely, rising temperature and evaporation cause surface water levels to fall, so that groundwater instead discharges laterally to the rivers and dilution by surface water is reduced. Moreover, with little oxygen, NO3− or other oxidants imported into the aquifer from outside, and abundant organic matter in the sediment, microorganisms continuously consume dissolved oxygen in the aquifer during oxidation. As a result, the groundwater becomes more reducing, facilitating the reductive dissolution of iron and manganese (hydr)oxides and releasing arsenic adsorbed on their surfaces into the groundwater. Consequently, the arsenic concentration in the groundwater increases, which is consistent with the findings of Fendorf et al. (2010).

Ground elevation and hydraulic gradient also influence the distribution of high arsenic groundwater, primarily through the groundwater flow rate. In plain areas with low elevation and low hydraulic gradient, characterized by fine sediment particles and slow groundwater flow, the duration of water–rock interaction is prolonged. Similarly, in the absence of O2, NO3− and other oxidants, iron oxides in the sediment act as electron acceptors and are reduced, releasing adsorbed arsenic and increasing the arsenic mass concentration in the water. Conversely, in the piedmont recharge area with high altitude and high hydraulic gradient, the sediment particles are coarser, the groundwater


flow rates are faster and groundwater recharge occurs more rapidly, which brings O2 and other oxidants into the aquifer and is not conducive to arsenic enrichment.

5. Discussion

5.1. Analysis of the relationship between high arsenic probability and groundwater variables

Partial dependence plots (PDPs) are useful tools for showing the marginal effect of one or two features on the predicted outcome of a machine learning model. The final model shows that the sedimentary environment, climate and human activities have the most significant impact on groundwater arsenic concentration (Fig. 7). To study how groundwater arsenic risk could be regulated, controllable human-activity factors were selected as the main research objects. Here, the predicted outcome is the probability of As exceeding 10 μg/L, and the features are the groundwater predictors: water table depth, interannual variation of the water table and cumulative variation of the water table. By comparing the PDPs of the model, the hydrochemical and sedimentation processes that control the spatial pattern of arsenic in groundwater can be inferred.

Fig. 7 presents the importance and ranking of the effects of the predictor variables on the probability of arsenic risk in groundwater in the final model. The groundwater factors, namely cumulative groundwater change, interannual groundwater variation and groundwater depth, rank relatively high in their impact on model predictions, together accounting for a relative importance of 13.69%. This indicates that groundwater variability contributes significantly to the accuracy of the simulated groundwater arsenic risk distribution.

To explain the effects of groundwater variability on the probability of high arsenic, we separately analyzed the relationships between the probability of high arsenic and groundwater depth, cumulative groundwater change and interannual groundwater variability, as well as the joint marginal effect of these three groundwater factors (Fig. 8). Clear patterns emerge from this analysis. The probability of high arsenic increases with groundwater depth. The cumulative change in water level, which represents the total magnitude of groundwater level decline over the past 60 years, shows an overall increasing trend in the probability of high arsenic as the cumulative change grows (Fig. 8).

Fig. 8. Marginal distribution of groundwater influence factors. (a) Relationship between groundwater depth and high arsenic probability. (b) Relationship between cumulative groundwater change and high arsenic probability. (c) Relationship between interannual groundwater variation and high arsenic probability. (d) Influence of the three groundwater factors acting together on high arsenic groundwater probability.
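A partial dependence curve averages a model's predictions over the observed data while holding one or more features at fixed values. The pure-Python sketch below shows only that computation and is not the authors' implementation. The function toy_model, its coefficients, the feature names depth_m, cum_change_m and interannual_m, their sign conventions, and the three rows are all hypothetical stand-ins for the fitted stacking model and the study data. Passing grid points that fix two features at once yields a two-way surface, similar in spirit to Fig. 8-d.

```python
import math
from itertools import product


def partial_dependence(predict, rows, grid):
    """For each grid point (a dict of feature -> fixed value), overwrite those
    features in every row, predict, and average: one-way or multi-way PDP."""
    curve = []
    for point in grid:
        preds = [predict({**row, **point}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical stand-in for the fitted model: a logistic function with made-up
# coefficients (deeper water table and larger cumulative decline raise the
# probability; a positive interannual change, assumed to mean a rise, lowers it).
def toy_model(x):
    z = (-3.0 + 0.08 * x["depth_m"] + 0.05 * x["cum_change_m"]
         - 0.2 * x["interannual_m"])
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical observations; feature names are illustrative only.
rows = [
    {"depth_m": 5.0, "cum_change_m": 2.0, "interannual_m": 0.5},
    {"depth_m": 12.0, "cum_change_m": 10.0, "interannual_m": -1.0},
    {"depth_m": 25.0, "cum_change_m": 18.0, "interannual_m": -2.0},
]

depths = [0, 10, 20, 30, 40]
one_way = partial_dependence(toy_model, rows, [{"depth_m": d} for d in depths])

# Two-way grid over depth and cumulative change.
pairs = list(product(depths, [0, 10, 20]))
two_way = partial_dependence(
    toy_model, rows, [{"depth_m": d, "cum_change_m": c} for d, c in pairs])

for d, p in zip(depths, one_way):
    print(f"depth {d:>2} m -> mean P(As > 10 ug/L) = {p:.3f}")
```

Because the PDP averages over the observed rows rather than fixing the other features at a single value, it shows the model's average response to one variable. Interactions can therefore be hidden, which is one reason to inspect a multi-factor plot as well.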


Fluctuations in the probability of high arsenic can be observed when the cumulative variation of water level is between 8–13 m and 16–20 m. The interannual variation of water level reflects the change in groundwater level between adjacent years, and its PDP reveals a gradual decrease in the probability of high arsenic as the groundwater level rises. When the water level remains unchanged, the high arsenic probability reaches its maximum, and when the water level decreases by 0–5 m, the high arsenic probability decreases somewhat. In the multi-factor marginal distribution (Fig. 8-d), the high arsenic probability increases with groundwater depth and cumulative groundwater change and with decreasing interannual water level variation. These dynamic variables affect the arsenic risk probability differently: increased groundwater depth and cumulative groundwater change provide favorable conditions for the release of arsenic in the aquifer, whereas a rise in groundwater level inhibits the release of arsenic from aquifers. In the north Henan plain, high arsenic groundwater is mainly found in aquifers characterized by a reducing environment. Changes in groundwater level are mainly attributed to anthropogenic groundwater extraction. The local flow field created by extraction can accelerate groundwater flow and disturb the original sedimentary environment of the aquifer. Such disruption can release organic carbon that is stored in the sediment under natural conditions, thus promoting the allochthonous reduction of arsenic-bearing iron oxides or hydroxides (Neumann et al., 2014). In addition, ground subsidence caused by groundwater extraction compacts the clay interbeds in the aquifer, releasing pore water rich in arsenic, organic matter and competing ions (Smith et al., 2018). Furthermore, extraction disturbance triggers the release of highly bioavailable organic matter from the sediments, contributing to the evolution of the reducing environment and the enrichment of arsenic in groundwater. Groundwater extraction also facilitates the entry of bioactive organic matter from surface waters into groundwater, accelerating the evolution of the reducing environment and promoting arsenic release.

Fig. 9. The control area of groundwater exploitation.

5.2. Analysis of arsenic risk under different water levels

The foregoing analysis shows that the groundwater level is an important factor affecting the spatial pattern of arsenic in groundwater. Groundwater extraction in northern Henan Province is also one of the major causes of groundwater level fluctuations and of the resulting changes in hydrogeological and hydrochemical conditions. Therefore, taking a comprehensive view of regional water use and treating the high arsenic area of the north Henan plain as the key area, regulating groundwater extraction can change the environmental conditions that govern arsenic distribution in groundwater to varying degrees.

According to Fig. 8-a, the partial dependence indicates that the probability of high arsenic increases with water table depth. When the water table depth exceeds

30 m, the probability of groundwater arsenic risk reaches its highest level and then remains essentially stable. At burial depths within 30 m, increasing depth of shallow groundwater promotes conditions favorable to arsenic release in the aquifer. Based on the groundwater arsenic risk prediction results, the high arsenic areas within the study area are mainly located in Fengqiu, Yanjin and Yuanyang in Xinxiang and in Hua County in Anyang. Taking the administrative divisions of the high arsenic distribution area as units, the groundwater extraction adjustment area was delineated through a comprehensive analysis of the high arsenic risk distribution together with the topography, stratigraphic structure and groundwater flow field of the high arsenic risk area (Fig. 9). By simulating the water level rebound produced by different extraction reductions within the adjustment area, the distribution of high arsenic groundwater risk was assessed under several rebound scenarios: no rebound (Figs. 10-a, 11-a, 12-a), 1 m rebound (Figs. 10-b, 11-b, 12-b), 3 m rebound (Figs. 10-c, 11-c, 12-c) and 5 m rebound (Figs. 10-d, 11-d, 12-d). The groundwater arsenic risk prediction model (Fig. 13) was then used to analyze the potential impact areas of high arsenic groundwater under these scenarios.

Without groundwater level rebound, the high arsenic risk area was mainly concentrated in the southwest plain area of Xinxiang in northern Henan and in Puyang, decreasing gradually from the southeast toward the northwestern mountainous regions. The potential impact area of high arsenic groundwater covered 3882.75 km2, accounting for 19.67% of the study area. With a 1 m rebound, the main distribution pattern of the high arsenic risk area remained unchanged, but the overall area with a high probability of arsenic in groundwater decreased. The probability of high arsenic risk decreased in Yanjin and Fengqiu in Xinxiang and in Hua County in Anyang, and the potential impact area shrank to 3511 km2, or 17.78% of the study area. With a 3 m rebound, the probability of arsenic in groundwater fluctuated to some extent: the risk appeared to increase in Yuanyang and Hua County, while it decreased in the eastern part of Yuanyang. Overall, the potential impact area continued to decrease, to 3125 km2, or 15.83% of the study area. With a 5 m rebound, groundwater arsenic risk showed an increasing trend. The main change area was still in the southeast of the study area, including the plain areas of Yanjin, Yuanyang and Fengqiu and the eastern part of Hua County, and the potential impact area rose to 3617 km2, or 18.32% of the study area.

Based on the simulated effects of arsenic risk regulation under different water level changes in the key area, the area of high arsenic distribution in the study area showed an overall decreasing trend during groundwater rebound. When the water level rose by 1 m and 3 m, the area of the potential high arsenic zone decreased by 9.57% and 19.52%, respectively; that is, as the regulating effect strengthened, both the area at high arsenic risk and the risk itself decreased continuously.

Fig. 10. Change of groundwater depth: (a) no change; (b) water level raised 1 m; (c) water level raised 3 m; (d) water level raised 5 m.

Fig. 11. Interannual variation of groundwater level: (a) no change; (b) water level raised 1 m; (c) water level raised 3 m; (d) water level raised 5 m.

Fig. 12. Cumulative variation of groundwater level: (a) no change; (b) water level raised 1 m; (c) water level raised 3 m; (d) water level raised 5 m.

Fig. 13. Distribution of high arsenic groundwater under different groundwater regulation scenarios: (a) no change in groundwater level; (b) 1 m rise in water level; (c) 3 m rise in water level; (d) 5 m rise in water level.

The high arsenic water in the study area is mainly stored in chemically reducing environments. Controlling the volume of groundwater extraction can alter the groundwater flow field, weakening infiltration and recharge from surface water bodies, such as mountain-front recharge and lateral inflow from the Yellow River. This reduces the input of reactive organic matter and hence the reductive dissolution of iron and manganese oxides, resulting in a significant decrease in groundwater arsenic concentration. When the groundwater level was raised by 5 m, the area of the high arsenic zone was reduced by 6.84% compared with the no-rise case. This reduction is smaller than those for the 1 m and 3 m rises, indicating a weakening of the regulation effect. The same behavior is evident in the marginal effect of groundwater depth on the probability of high arsenic (Fig. 8). As the groundwater level rises (Fig. 8-a), the probability of high arsenic

gradually decreases. However, when the water table depth decreases to around 16 m, near 10 m, and below 5 m, the probability of high arsenic increases to some extent. This indicates that the redox state of groundwater in the study area fluctuates during water table rebound, which in turn affects the release of arsenic. Based on these results, the groundwater level rebound can be kept within 1–3 m by controlling groundwater extraction. This approach ensures effective regulation of groundwater arsenic risk and enhances the rationality and feasibility of the high arsenic risk regulation scheme in the identified key area.

6. Conclusions

In this study, a stacking ensemble learning model was developed to predict the potential risk of high arsenic in shallow groundwater of the lower reaches of the Yellow River in northern Henan Province using geospatial parameters. The results were used to identify the major spatial parameters controlling the risk of, and susceptibility to, high arsenic groundwater. Furthermore, we analyzed how regulating groundwater withdrawals affects the risk of arsenic enrichment in the study area.

The results revealed that arsenic mass concentrations in groundwater in the study area ranged from <0.1 to 190 μg/L, and the exceedance rate (arsenic concentration greater than 10 μg/L) was 16.76%, indicating a potential risk to the health of local residents. High arsenic groundwater was predominantly distributed along the former depressions of the Taihang Mountains alluvial fan and the Yellow River vent fan areas, particularly in Yanjin and the northern parts of Yuanyang and Fengqiu counties of Xinxiang City, the southern parts of Hua and Neihuang counties of Anyang City, and Fan and Puyang counties of Puyang City.

The stacking ensemble model outperformed the individual XGBoost, RF and SVM models in terms of AUC (0.87), accuracy (0.82), specificity (0.88) and recall (0.75). Its predictions were highly consistent with the observed distribution of arsenic in groundwater in the area. The predicted high arsenic groundwater covered 19.67% of the entire study area. The ranking of influencing factors indicated that the impact of the Yellow River burst, average annual temperature, annual precipitation, elevation and hydraulic gradient were the most significant contributors to groundwater arsenic enrichment.

Overall, the depositional environment is the main condition affecting the release of arsenic to groundwater, with a relative importance of 27.1%, the highest among all influencing factors. Groundwater factors, including cumulative groundwater change, interannual groundwater variation and groundwater depth, were also significant predictors of arsenic enrichment, with a relative importance of 13.69%. Furthermore, simulation of the groundwater rebound process showed that raising the groundwater level by 1–3 m decreased the area at risk of high arsenic groundwater, and the high arsenic zone could be reduced by up to 19.52%. These findings support the management and control of arsenic enrichment and distribution in shallow groundwater systems around the world.

Funding

This research was funded by the National Key R&D Program of China (2022YFC3703701), the National Natural Science Foundation of China (41972262), Hebei Natural Science Foundation for Excellent Young

the work reported in this paper.

Data availability

The data that has been used is confidential.

References

Bhattacharya, P., Chatterjee, D., Jacks, G., 1997. Occurrence of Arsenic-Contaminated Groundwater in Alluvial Aquifers from Delta Plains, Eastern India: Options for Safe Drinking Water Supply. International Journal of Water Resources Development. 13 (1), 79–92. https://doi.org/10.1080/07900629749944.
Bundschuh, J., Maity, J.P., 2015. Geothermal arsenic: occurrence, mobility and environmental implications. Renewable and Sustainable Energy Reviews. 42, 1214–1222. https://doi.org/10.1016/j.rser.2014.10.092.
Cao, W.G., Guo, H.M., Zhang, Y.L., Ma, R., Li, Y.S., Dong, Q.Y., Li, Y.J., Zhao, R.K., 2018. Controls of paleochannels on groundwater arsenic distribution in shallow aquifers of alluvial plain in the Hetao Basin, China. The Science of the Total Environment. 613–614 (1), 958–968. https://doi.org/10.1016/j.scitotenv.2017.09.182.
Cao, H., Xie, X., Wang, Y., Deng, Y.M., 2021. The Interactive Natural Drivers of Global Geogenic Arsenic Contamination of Groundwater. Journal of Hydrology. 597, 126214. https://doi.org/10.1016/j.jhydrol.2021.126214.
Chakraborty, M., Sarkar, S., Mukherjee, A., Shamsudduha, M., Ahmed, K.M., Bhattacharya, A., Mitra, A., 2020. Modeling Regional-Scale Groundwater Arsenic Hazard in the Transboundary Ganges River Delta, India and Bangladesh: Infusing Physically-Based Model with Machine Learning. Science of the Total Environment. 748, 141107. https://doi.org/10.1016/j.scitotenv.2020.141107.
Chatzimparmpas, A., Martins, R.M., Kucher, K., Kerren, A., 2021. StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics. IEEE Transactions on Visualization and Computer Graphics. 27 (2), 1547–1557. https://doi.org/10.48550/arXiv.2005.01575.
Chauhan, V.K., Dahiya, K., Sharma, A., 2019. Problem Formulations and Solvers in Linear SVM: a Review. Artificial Intelligence Review. 52 (2), 803–855. https://doi.org/10.1007/s10462-018-9614-6.
Cutler, D.R., Edwards Jr., T.C., Beard, K.H., Cutler, A., Hess, K.T., Gibson, J., Lawler, J.L., 2007. Random forests for classification in ecology. Ecology. 88 (11), 2783–2792. https://doi.org/10.1890/07-0539.1.
Das, D., Samanta, G., Mandal, B.K., Chowdhury, T.R., Chanda, C.R., Chowdhury, P.P., Basu, G.K., Chakraborti, D., 1996. Arsenic in groundwater in six districts of West Bengal, India. Environmental Geochemistry and Health. 18, 5–15. https://doi.org/10.1039/AN9952000917.
Desdhanty, V.S., Rustam, Z., 2021. Liver Cancer Classification Using Random Forest and Extreme Gradient Boosting (XGBoost) with Genetic Algorithm as Feature Selection, in: 2021 International Conference on Decision Aid Sciences and Application (DASA). Presented at the 2021 International Conference on Decision Aid Sciences and Application (DASA), pp. 716–719. https://doi.org/10.1109/DASA53625.2021.9682311.
Erban, L.E., Gorelick, S.M., Zebker, H.A., Fendorf, S., 2013. Release of arsenic to deep groundwater in the Mekong Delta, Vietnam, linked to pumping-induced land subsidence. Proceedings of the National Academy of Sciences of the United States of America. 110 (34), 13751–13756. https://doi.org/10.1073/pnas.1300503110.
Erban, L.E., Gorelick, S.M., Fendorf, S., 2014. Arsenic in the multi-aquifer system of the Mekong Delta, Vietnam: analysis of large-scale spatial trends and controlling factors. Environmental Science & Technology. 48 (11), 6081–6088. https://doi.org/10.1021/es403932t.
Erickson, M.L., Elliott, S.M., Brown, C.J., Stackelberg, P.E., Ransom, K.M., Reddy, J.E., Cravotta, C.A., 2021. Machine-learning predictions of high arsenic and high manganese at drinking water depths of the glacial aquifer system, northern continental United States. Environmental Science & Technology. 55 (9), 5791–5805. https://doi.org/10.1021/acs.est.0c06740.
Fendorf, S., Michael, H.A., van Geen, A., 2010. Spatial and Temporal Variations of Groundwater Arsenic in South and Southeast Asia. Science. 328 (5982), 123–127. https://doi.org/10.1126/science.1172974.
Ghobadi, A., Cheraghi, M., Sobhanardakani, S., Lorestani, B., Merrikhpour, H., 2022. Groundwater Quality Modeling Using a Novel Hybrid Data-Intelligence Model Based on Gray Wolf Optimization Algorithm and Multi-Layer Perceptron Artificial Neural Network: a Case Study in Asadabad Plain, Hamedan, Iran. Environmental Science and Pollution Research. 29 (6), 8716–8730. https://doi.org/10.1007/s11356-021-16300-4.
Guo, H.M., Guo, Q., Jia, Y.F., Liu, Z.Y., Jiang, Y.X., 2013. Chemical characteristics and geochemical processes of high arsenic groundwater in different regions of China. Journal of Earth Sciences and Environment. 35 (3), 83–96. https://doi.org/10.3969/j.issn.1672-6561.2013.03.008.
Hanoon, M.S., Ahmed, A.N., Fai, C.M., Birima, A.H., Razzaq, A., Sherif, M., Sefelnasr, A., El-Shafie, A., 2021. Application of Artificial Intelligence Models for Modeling Water Quality in Groundwater: Comprehensive Review, Evaluation and Future Trends. Water, Air, & Soil Pollution. 232 (10), 1–41. https://doi.org/10.1007/s11270-021-
Scholars (D2020504032). 05311-z.
Harvey, C.F., Swartz, C.H., Badruzzaman, A.B.M., Keon-Blute, N., Yu, W., Ali, M.A.,
Declaration of Competing Interest Jay, J., Beckie, R., Niedan, V., Brabander, D., Oates, P.M., Ashfaque, K.N., Islam, S.,
Hemond, H.F., Ahmed, M.F., 2002. Arsenic mobility and groundwater extraction in
bangladesh. Science. 298 (5598), 1602–1606. https://doi.org/10.1126/
The authors declare that they have no known competing financial SCIENCE.1076978.
interests or personal relationships that could have appeared to influence

Harvey, C.F., Ashfaque, K.N., Yu, W., Badruzzaman, A.B.M.M., Ali, A., Oates, P.M., Michael, H.A., Neumann, R.B., Beckie, R., Islam, S., Ahmed, M.F., 2006. Groundwater dynamics and arsenic contamination in Bangladesh. Chemical Geology. 228 (1–3), 112–136. https://doi.org/10.1016/j.chemgeo.2005.11.025.
Hosseini, S.M., Mahjouri, N., 2014. Developing a fuzzy neural network-based support vector regression (FNN-SVR) for regionalizing nitrate concentration in groundwater. Environmental Monitoring and Assessment. 186, 3685–3699. https://doi.org/10.1007/s10661-014-3650-8.
Hu, X.D., Zhang, H., Mei, H.B., Xiao, D.H., Li, Y.Y., Li, M.D., 2020. Landslide Susceptibility Mapping Using the Stacking Ensemble Machine Learning Method in Lushui, Southwest China. Applied Sciences. 10 (11), 4016. https://doi.org/10.3390/app10114016.
Huang, K., Liu, Y., Yang, C., Duan, Y.H., Yang, X.F., Liu, C.X., 2018. Identification of Hydro-Biogeochemical Processes Controlling Seasonal Variations in Arsenic Concentrations within a Riverbank Aquifer at Jianghan Plain, China. Water Resources Research. 54 (7), 4294–4308. https://doi.org/10.1029/2017WR022170.
Janardhana, R.N., 2022. Arsenic in the geo-environment: A review of sources, geochemical processes, toxicity and removal technologies. Environmental Research. 203, 111782. https://doi.org/10.1016/j.envres.2021.111782.
Kim, K., Moon, J.T., Kim, S.H., Ko, K., 2009. Importance of surface geologic condition in regulating As concentration of groundwater in the alluvial plain. Chemosphere. 77, 478–484. https://doi.org/10.1016/j.chemosphere.2009.07.053.
Knappett, P.S.K., Mailloux, B.J., Choudhury, I., Khan, M.R., Michael, H.A., Barua, S., Mondal, D.R., Steckler, M.S., Akhter, S.H., Ahmed, K.M., Bostick, B., Harvey, C.F., Shamsudduha, M., Shuai, P., Mihajlov, I., Mozumder, R., van Geen, A., 2016. Vulnerability of low-arsenic aquifers to municipal pumping in Bangladesh. Journal of Hydrology. 539, 674–686. https://doi.org/10.1016/j.jhydrol.2016.05.035.
Kumar, M., Das, A., Das, N., Goswami, R., Singh, U.K., 2016. Co-occurrence perspective of arsenic and fluoride in the groundwater of Diphu, Assam, northeastern India. Chemosphere. 150, 227–238. https://doi.org/10.1016/j.jhydrol.2016.05.035.
Liang, C.P., Sun, C.C., Suk, H., Wang, S.W., Chen, J.S., 2021. A Machine Learning Approach for Spatial Mapping of the Health Risk Associated with Arsenic-Contaminated Groundwater in Taiwan’s Lanyang Plain. International Journal of Environmental Research and Public Health. 18 (21), 11385. https://doi.org/10.3390/ijerph182111385.
Liu, F., Huang, G.X., Sun, J.C., Jing, J.H., Zhang, Y., 2014. Distribution of arsenic in shallow aquifers of Guangzhou region, China: natural and anthropogenic impacts. Water Quality Research Journal. 49 (4), 354–371. https://doi.org/10.2166/WQRJC.2014.014.
Manning, B.A., Fendorf, S.E., Goldberg, S., 1998. Surface Structures and Stability of Arsenic(III) on Goethite: Spectroscopic Evidence for Inner-Sphere Complexes. Environmental Science & Technology. 32 (16), 2383–2388. https://doi.org/10.1021/es9802201.
Masue, Y., Loeppert, R.H., Kramer, T.A., 2007. Arsenate and arsenite adsorption and desorption behavior on coprecipitated aluminum: iron hydroxides. Environmental Science & Technology. 41 (3), 837–842. https://doi.org/10.1021/es061160z.
Michael, H.A., Voss, C.I., 2008. Evaluation of the sustainability of deep groundwater as an arsenic-safe resource in the Bengal Basin. Proceedings of the National Academy of Sciences of the United States of America. 105 (25), 8531–8536.
Neidhardt, H., Berner, Z., Freikowski, D., Biswas, A., Winter, J., Chatterjee, D., Norra, S., 2013. Influences of groundwater extraction on the distribution of dissolved As in shallow aquifers of West Bengal, India. Journal of Hazardous Materials. 262, 941–950. https://doi.org/10.1016/j.jhazmat.2013.01.044.
Neumann, R.B., Pracht, L.E., Polizzotto, M.L., Badruzzaman, A.B.M., Ali, M.A., 2014. Biodegradable organic carbon in sediments of an arsenic-contaminated aquifer in Bangladesh. Environmental Science & Technology Letters. 1 (4), 221–225. https://doi.org/10.1021/ez5000644.
Nguyen, P.T., Ha, D.H., Nguyen, H.D., Phong, T.V., Trinh, P.T., Al-Ansari, N., Le, H.V., Pham, B.T., Ho, L.S., Prakash, I., 2020. Improvement of Credal Decision Trees Using Ensemble Frameworks for Groundwater Potential Modeling. Sustainability. 12 (7), 2622. https://doi.org/10.3390/su12072622.
Norrman, J., Sparrenbom, C.J., Berg, M., Nhan, D.D., Nhan, P.Q., Rosqvist, H., Jacks, G., Sigvardsson, E., Baric, D., Moreskog, J., Harms-Ringdahl, P., Hoan, N.V., 2008. Arsenic mobilisation in a new well field for drinking water production along the Red River, Nam Du, Hanoi. Applied Geochemistry. 23 (11), 3127–3142.
Oremland, R.S., Stolz, J.F., 2003. The Ecology of Arsenic. ChemInform. 34 (35), 939–944. https://doi.org/10.1126/SCIENCE.1081903.
Pi, K., Wang, Y., Xie, X., Su, C., Ma, T., Li, J., Liu, Y., 2015. Hydrogeochemistry of co-occurring geogenic arsenic, fluoride and iodine in groundwater at Datong Basin, northern China. Journal of Hazardous Materials. 300, 652–661.
Podgorski, J., Wu, R., Chakravorty, B., Polya, D.A., 2020. Groundwater Arsenic Distribution in India by Machine Learning Geospatial Modeling. International Journal of Environmental Research and Public Health. 17 (19), 7119. https://doi.org/10.3390/ijerph17197119.
Rahman, S., Kim, K.H., Saha, S.K., Swaraz, A.M., Paul, D.K., 2014. Review of remediation techniques for arsenic (As) contamination: A novel approach utilizing bioorganisms. Journal of Environmental Management. 134, 175–185. https://doi.org/10.1016/j.jenvman.2013.12.027.
Ravenscroft, P., 2007. Predicting the global extent of arsenic pollution of groundwater and its potential impact on human health. UNICEF Rep. 1–35. https://www.researchgate.net/publication/313628997_Predicting_the_global_extent_of_arsenic_pollution_of_groundwater_and_its_potential_impact_on_human_health.
Ravenscroft, P., Burgess, W.G., Ahmed, K.M., Burren, M., Perrin, J., 2005. Arsenic in Groundwater of the Bengal Basin, Bangladesh: Distribution, Field Relations, and Hydrogeological Setting. Hydrogeology Journal. 13 (5), 727–751. https://doi.org/10.1007/s10040-003-0314-0.
Rodríguez-Lado, L., Sun, G., Berg, M., Xue, H., Zhang, Q., Zheng, Q., Johnson, C.A., 2013. Groundwater Arsenic Contamination Throughout China. Science. 341 (6148), 866–868. https://doi.org/10.1126/science.1237484.
Saunders, J.A., Lee, M.-K., Shamsudduha, M., Dhakal, P., Uddin, A., Chowdury, M.T., Ahmed, K.M., 2008. Geochemistry and mineralogy of arsenic in (natural) anaerobic groundwaters. Applied Geochemistry. 23 (11), 3205–3214.
Schaefer, M.V., Ying, S.C., Benner, S.G., Duan, Y.H., Wang, Y.X., Fendorf, S., 2016. Aquifer arsenic cycling induced by seasonal hydrologic changes within the Yangtze River basin. Environmental Science & Technology. 50 (7), 3521–3529. https://doi.org/10.1021/acs.est.5b04986.
Schaefer, M.V., Guo, X., Gan, Y., Benner, S.G., Griffin, A.M., Gorski, C.A., Wang, Y.X., Fendorf, S., 2017. Redox controls on arsenic enrichment and release from aquifer sediments in central Yangtze River Basin. Geochimica et Cosmochimica Acta. 204, 104–119. https://doi.org/10.1016/j.gca.2017.01.035.
Senn, D.B., Hemond, H.F., 2002. Nitrate Controls on Iron and Arsenic in an Urban Lake. Science. 296, 2373–2376. https://doi.org/10.1126/SCIENCE.1072402.
Shamsudduha, M., Taylor, R.G., Chandler, R.E., 2015. A Generalized Regression Model of Arsenic Variations in the Shallow Groundwater of Bangladesh. Water Resources Research. 51 (1), 685–703. https://doi.org/10.1002/2013wr014572.
Sharifi, R., Moore, F., Keshavarzi, B., 2016. Mobility and chemical fate of arsenic and antimony in water and sediments of Sarouq River catchment, Takab geothermal field, northwest Iran. Journal of Environmental Management. 170, 136–144. https://doi.org/10.1016/j.jenvman.2016.01.018.
Smith, R., Knight, R., Fendorf, S., 2018. Overpumping Leads to California Groundwater Arsenic Threat. Nature Communications. 9 (1), 1–6. https://doi.org/10.1038/s41467-018-04475-3.
Song, S.X., López-Valdivieso, A., Hernandez-Campos, D.J., Peng, C.S., Monroy-Fernández, M.G., Razo-Soto, I., 2006. Arsenic removal from high-arsenic water by enhanced coagulation with ferric ions and coarse calcite. Water Research. 40 (2), 364–372. https://doi.org/10.1016/j.watres.2005.09.046.
Sun, W., Trevor, B., 2018. A stacking ensemble learning framework for annual river ice breakup dates. Journal of Hydrology. 561, 636–650. https://doi.org/10.1016/j.jhydrol.2018.04.008.
Taghizadeh-Mehrjardi, R., Schmidt, K., Chakan, A.A., Rentschler, T., Scholten, T., Scholten, T., 2020. Improving the Spatial Prediction of Soil Organic Carbon Content in Two Contrasting Climatic Regions by Stacking Machine Learning Models and Rescanning Covariate Space. Remote Sensing. 12, 1095. https://doi.org/10.1016/j.jhydrol.2018.04.008.
Tan, Z., Yang, Q., Zheng, Y., 2020. Machine Learning Models of Groundwater Arsenic Spatial Distribution in Bangladesh: Influence of Holocene Sediment Depositional History. Environmental Science & Technology. 54 (15), 9454–9463. https://doi.org/10.1021/acs.est.0c03617.
van Geen, A., Zheng, Y., Versteeg, R., Stute, M., Horneman, A., Dhar, R., Steckler, M., Gelman, A., Small, C., Ahsan, H., Graziano, J.H., Hussain, I., Ahmed, K.M., 2003. Spatial Variability of Arsenic in 6000 Tube Wells in a 25 km² Area of Bangladesh. Water Resources Research. 39 (5). https://doi.org/10.1029/2002WR001617.
Welch, A.H., Westjohn, D., Helsel, D.R., Wanty, R.B., 2000. Arsenic in groundwater of the United States: occurrence and geochemistry. Groundwater. 38, 589–604. https://doi.org/10.1111/j.1745-6584.2000.tb00251.x.
Wen, D., Zhang, F., Zhang, E., Wang, C., Han, S., Zheng, Y., 2013. Arsenic, fluoride and iodine in groundwater of China. Journal of Geochemical Exploration. 135, 1–21.
Zhang, Q., Rodríguez-Lado, L., Johnson, C.A., Xue, H., Shi, J., Zheng, Q., Sun, G., 2012. Predicting the risk of arsenic contaminated groundwater in Shanxi Province, Northern China. Environmental Pollution. 165, 118–123.
