
Environmental Science and Pollution Research

https://doi.org/10.1007/s11356-024-31963-5

REVIEW ARTICLE

Deep learning in water protection of resources, environment, and ecology: achievement and challenges
Xiaohua Fu1 · Jie Jiang1,2 · Xie Wu3 · Lei Huang4 · Rui Han5 · Kun Li6,7 · Chang Liu2 · Kallol Roy8 · Jianyu Chen2 ·
Nesma Talaat Abbas Mahmoud8 · Zhenxing Wang2

Received: 24 August 2023 / Accepted: 6 January 2024


© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024

Abstract
Rapid economic development has taken a heavy toll on ecology, especially through water pollution. Efficient water resource management has a long-term influence on the sustainable development of the economy and society. Economic development and ecological preservation are intertwined, and the growth of one is not possible without the other. Deep learning (DL) is ubiquitous in autonomous driving, medical imaging, speech recognition, etc. The spectacular success of deep learning comes from its ability to learn richer representations of data. In view of the bright prospects of DL, this review comprehensively covers the development of DL applications in water resources management, water environment protection, and water ecology. First, the concept and modeling steps of DL are briefly introduced, including data preparation, algorithm selection, and model evaluation. Finally, the advantages and disadvantages of commonly used algorithms are analyzed according to their structures and mechanisms, and recommendations on the selection of DL algorithms for different studies, as well as prospects for the application and development of DL in water science, are proposed. This review provides references for solving a wider range of water-related problems and brings further insights into the intelligent development of water science.

Keywords Deep learning · Machine learning · Water resources management · Water environment protection · Water
ecology

Responsible Editor: Xianliang Yi

Xiaohua Fu and Jie Jiang contributed equally to this work.

* Zhenxing Wang
wangzhenxing@scies.org

1 Ecological Environment Management and Assessment Center, Central South University of Forestry and Technology, Changsha 410004, People’s Republic of China
2 State Environmental Protection Key Laboratory of Water Environmental Simulation and Pollution Control, Ministry of Ecology and Environment, South China Institute of Environmental Sciences, Guangzhou 510655, People’s Republic of China
3 China Railway Water Information Technology Co, LTD, Nanchang 330000, People’s Republic of China
4 School of Environmental Science and Engineering, Guangzhou University, Guangzhou 510006, People’s Republic of China
5 China Environment Publishing Group, Beijing 100062, People’s Republic of China
6 Freeman Business School, Tulane University, New Orleans, LA 70118, USA
7 Guangzhou Huacai Environmental Protection Technology Co., Ltd, Guangzhou 511480, People’s Republic of China
8 Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia

Introduction

Water is not only an important resource for humans but also a basic component of ecological sustenance. Sustainable water resources, water environment, and water ecology are invaluable for human health and sustainable economic growth. But over the past half-century, factors such as population growth, use of fossil fuel, deforestation, and human


activity have caused serious environmental issues. Therefore, to protect ecological safety and human health, (pan-national) policies have been formulated and enacted. These policies aim at reducing major pollutants, promoting “human-water harmony,” and building a better index to evaluate the integrated performance of water resources, the water environment, and water ecology.

Recently, machine learning (ML) has been increasingly applied to the field of water science (Buonocore et al. 2018; Mauricio-Iglesias et al. 2015). Fuzzy logic, random forest, and decision trees are most commonly used to monitor rivers, lakes, and groundwater (Chau 2006; Sagan et al. 2020), drinking water treatment (Li et al. 2021a), water quality management (Yaseen 2021), wastewater treatment (Wang et al. 2022), etc. These studies help researchers understand and expand the application of artificial intelligence in water science. ML algorithms can use their internal mechanisms to select the most important features from the original data, rather than relying on experience to make the best choice (Du et al. 2016). Although artificial intelligence technology plays a crucial role in the modeling of water environment treatment processes, it also has drawbacks such as large data requirements, weak adaptability, and poor scalability. Because such models are difficult to transfer, they can perform well in modeling and prediction and still fall short of ideal (Fulcher et al. 2013). Compared with traditional artificial intelligence models, DL not only has powerful feature extraction and recognition capabilities but also has self-learning and self-completion functions, which makes it an effective way to solve the modeling problem of nonlinear water quality dynamic systems (Niu et al. 2020; Yang et al. 2019). We note that many review papers on DL have been published in water-related fields, such as water quality management (Wai et al. 2022), wastewater treatment (Alvi et al. 2023a), water resource management (Fu et al. 2022; Shen 2018), aquatic animal visual recognition and detection (Shen 2018), and sewage pipe detection (Sun et al. 2023). These reviews help researchers understand and expand the application of deep learning in the field of water science. However, so far, these reviews have been limited to one domain or to the application of a particular algorithm and lack a comprehensive scope. Sit et al. (2020) conducted a comprehensive analysis on this topic, but there is still no in-depth analysis of, or set of suggestions on, the application of different algorithms to different water problems.

Therefore, the motivation of our research is to further expand the scope of the review based on the previous literature. Firstly, the process of establishing a DL model is briefly introduced, including data preparation, algorithm selection, and the modeling process, and its application is summarized into three key tasks: water resources, water environment, and water ecological management. In our statistical meta-analysis of publications related to DL and water science, we can see a steady exponential increase in the average number of DL applications per year in the water industry. We group the applications of DL into seven major application areas in Fig. 1 and provide a detailed review of the applications in each area. More importantly, the advantages and disadvantages of representative algorithms are discussed, and their applicability in different scenarios and studies is analyzed by comparing their algorithm characteristics. Finally, the future research potential, research directions, challenges, and prospects in the field of water science are discussed.

An overview of deep learning

Recently, deep learning has seen stellar growth. This comes from a combination of many factors: availability of large datasets, compute power from GPUs and TPUs, and automatic feature engineering from data. The neural layers are deeply stacked, and each layer learns a particular feature and passes it to the next layer, which builds on it. The generalization capability on unseen test data distributions comes from the deep architectures. The following sections give a brief overview of how a DL model is built and of the specifics of the DL algorithms.

Data preprocessing pipeline

We gather data from open-source hydrological research databases, from experiments, and from simulations. Data preprocessing takes a significant amount of time and effort. Data is cleaned to remove the underlying noise and curated before training the model. Hydrological data also has missing values, acquisition errors, and redundancies. The data is read into a Pandas data frame and then converted to NumPy arrays. The input columns are chosen as training inputs and the output column as the training output. The data is split into training, evaluation, and test sets using standard TensorFlow/PyTorch utilities.

Algorithm selection

Artificial neural networks (ANNs) have been under development for over 70 years since they were first proposed in 1943. The period from 1943 to 1969 was the first wave of ANN research and development, during which the multi-layer perceptron (MLP) was developed (an MLP with three hidden layers is shown in Fig. 3a). During the second wave of development (after 1986), more layers were added and backpropagation was introduced to train the models. Deep learning has snowballed since 2006. Explosive growth occurred after 2012, and the speed of innovative research has accelerated significantly. The deep learning development process is illustrated in Fig. 2. The large circle in Fig. 2

Fig. 1  Applications of deep learning in the field of water protection of resources, environment, and ecology

Fig. 2  Timeline of deep learning development



Fig. 3  Typical structure and operation principles of MLP, DRL, CNN, RNN, LSTM, and GRU
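To make the gating mechanism in panel e of Fig. 3 more concrete, the following is a minimal sketch of a single LSTM time step in NumPy. It assumes the standard formulation with input, forget, and output gates. The weight shapes, the random values, the sizes `D`, `H`, and `T`, and the names `lstm_step`, `W`, `U`, and `b` are illustrative assumptions and are not taken from any study cited in this review.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (standard formulation, assumed here).

    W has shape (4*H, D), U has shape (4*H, H), b has shape (4*H,).
    Rows are stacked in the order: input, forget, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: what to write to the cell
    f = sigmoid(z[H:2 * H])        # forget gate: what to keep from c_prev
    o = sigmoid(z[2 * H:3 * H])    # output gate: what to expose as h_t
    g = np.tanh(z[3 * H:4 * H])    # candidate cell content
    c_t = f * c_prev + i * g       # updated cell state
    h_t = o * np.tanh(c_t)         # updated hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
D, H, T = 3, 4, 12                 # hypothetical: 3 features, 4 hidden units, 12 steps
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
series = rng.normal(size=(T, D))   # synthetic input sequence

h, c = np.zeros(H), np.zeros(H)
for x_t in series:
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

Because the hidden state is the product of a sigmoid gate and a tanh, every entry of `h` stays strictly between -1 and 1 regardless of the sequence length.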

represents critical transitions in deep learning development. The size of each small circle indicates the scale of the deep learning breakout in the corresponding year. An upward-sloping line indicates that the popularity of deep learning is rising, whereas a downward-sloping line indicates that DL popularity is declining. As can be observed, DL has developed very well over the last decades.

Supervised learning

Supervised learning is a subfield of machine learning (ML), itself part of artificial intelligence (AI), that involves training a model on labeled data. In supervised learning, the algorithm learns from a given dataset, which consists of input features and corresponding target labels. The goal is to teach the model to map the input features to the correct output labels based on the provided examples.

Convolutional Neural Network (CNN) The CNN is a type of DL algorithm specifically designed to process structured grid-like data, such as images or sequences of data. The CNN is inspired by the organization of the visual cortex in animals, where neurons are arranged in receptive fields that respond to specific regions of the visual field. Similarly, the CNN has layers of interconnected neurons, known as convolutional layers, that process local regions of the input data (Fukushima 2013). The main components of the CNN are shown in Fig. 3d.

Convolutional layers: These layers perform the core operation of convolution, where small filters (also known as kernels) slide across the input data and perform element-wise multiplication and summation. The purpose is to extract local features, such as edges, corners, or textures, from the input data. Activation functions: After each convolutional layer, an activation function, such as the rectified linear unit (ReLU), is typically applied element-wise to introduce non-linearities into the network. This

helps the network learn complex patterns and representations. Pooling layers: Pooling layers reduce the spatial dimensions of the feature maps generated by the convolutional layers. Common pooling operations include max pooling, which selects the maximum value in each local region, and average pooling, which calculates the average value. Fully connected layers: After several convolutional and pooling layers, CNNs often end with one or more fully connected layers. These layers connect every neuron from the previous layer to every neuron in the subsequent layer. Fully connected layers help to learn high-level representations and make predictions based on the extracted features. Loss function: CNNs are typically trained in a supervised manner, and a loss function is used to quantify the error between the predicted output and the true labels. Common loss functions include cross-entropy loss for classification tasks and mean squared error for regression tasks. During the training process, the CNN learns to automatically extract meaningful features from the input data, allowing it to generalize to unseen examples. The weights and biases of the network are adjusted iteratively using optimization techniques like stochastic gradient descent (SGD) or its variants. Dropout is a regularization technique commonly used in CNNs to prevent overfitting and improve the generalization ability of the network. Overfitting occurs when the model learns to memorize the training data too well, leading to poor performance on unseen data. The idea behind dropout is to randomly deactivate (or “drop out”) a portion of the neurons in a layer during training (LeCun et al. 2015). This means that for each training example, a fraction of the neurons are temporarily ignored with a certain probability (Hinton et al. 2012). The CNN has been incredibly successful in various computer vision tasks, including water image classification, water object detection, and image segmentation (this will be reviewed in detail in the “Application of deep learning in water environment” section). In addition, the CNN can also be used for water quality time series prediction/forecasting. Just as 2D-CNNs capture spatial dependence by extracting features from 2D patches of local input, their one-dimensional counterparts, called 1D-CNNs, can identify local patterns in time series or sequential problems (Pyo et al. 2021).

Sequence neural networks, also known as sequence models or sequential neural networks, are a class of neural networks specifically designed to handle sequential or temporal data. They are widely used in natural language processing (NLP), speech recognition, time series analysis, and other tasks that involve sequences. Sequence neural networks are capable of capturing dependencies and patterns in sequential data by considering the order and context of the input elements. Here are some commonly used types of sequence neural networks:

Recurrent neural network (RNN) The RNN is the most fundamental type of sequence model. As shown in Fig. 3c, the RNN structure generally consists of the input layer, hidden layer, and output layer. At each time step, the input of the RNN includes the external input of the current step and the hidden layer state of the previous step. These inputs are processed through the weights and activation functions of the network layer, which not only generate the hidden layer state and output of the current step but are also cyclically sent back to the hidden layer as part of the input for the next time step. RNNs thus process sequences by maintaining a hidden state that acts as memory, allowing information to be propagated from previous elements to future elements. The training algorithm, backpropagation through time, is similar to traditional backpropagation: it unfolds the network along the timeline and uses the gradients generated at each time step to update the weights in the network, minimizing the error between the prediction and the actual results. The RNN can handle sequences of arbitrary length, making it suitable for tasks like language modeling, machine translation, and sentiment analysis (Elman 1990). Currently, RNNs have relatively few direct applications in the field of water science, in part because they do not perform well in learning and maintaining long-term dependencies due to vanishing or exploding gradients (Riedmiller 1994). Most studies tend to cite traditional RNNs and compare newly developed models with them to highlight the significant improvements that have been made (Chen et al. 2018). In addition, some researchers have tried to combine the RNN with other ML algorithms in the field of water science to compare the performance of individual algorithms and explore the performance advantages of integrated methods over single methods (Kumar et al. 2022; Ren et al. 2020; Xu et al. 2019).

Long short-term memory (LSTM) The LSTM is a specialized type of RNN that addresses the vanishing gradient problem, which can hinder the training of traditional RNNs. As shown in Fig. 3e, the LSTM uses a more complex architecture with gating mechanisms to control the flow of information and preserve long-term dependencies. These “gates” are neurons with learnable weights that surround the cell state to control the flow of information. After training, the input gate is responsible for determining which inputs are important enough to be remembered by the network. The forget gate determines which past states should be preserved, and for how long. Finally, the output gate determines how much information is extracted from the cell state to produce the final output. These gates work together to enable the LSTM network to retain information for a long time while discarding information that is no longer needed. In this way, the LSTM can effectively deal with long-term dependence problems without being affected by vanishing or exploding gradients (Hochreiter and Schmidhuber 1997). The LSTM achieves prediction performance that was difficult to obtain with earlier model structures or data. In groundwater level prediction, Zhang et al. (2018) used the

dropout method in the LSTM layer to predict the monthly-scale groundwater level in an irrigation district, improving the prediction and fitting ability of the model and pointing out its application value in areas without groundwater level observations. In terms of rainfall and runoff forecasting, on standard data sets such as the large-sample catchment attributes and meteorology data set (CAMELS), the LSTM model outperforms the traditional physical model in terms of BiasRMSE, NSE, R², and peak discharge deviation. As for the relative deviation of low flow, different studies have put forward inconsistent conclusions (Addor et al. 2017). In the field of water quality and other related fields, Zhi et al. (2021) established an LSTM-based river dissolved oxygen (DO) prediction model on 236 river basins in CAMELS and found that the model could reveal the trend of dissolved oxygen decreasing with increasing water temperature, indicating the potential of predicting dissolved oxygen in river basins without water quality measurements.

Gated recurrent unit (GRU) The GRU is similar to the LSTM and also addresses the vanishing gradient problem. As shown in Fig. 3f, the GRU has two gates: an update gate and a reset gate. The role of the update gate is to determine how much past information is retained and how much new information is received from the input layer. The function of the reset gate is similar to that of the forget gate in the LSTM, which is used to control the flow of long-term information. GRUs have a simplified architecture with fewer gating mechanisms than the LSTM, making them computationally more efficient (Cho et al. 2014). The GRU is commonly used in scenarios where a simpler recurrent model is sufficient. For example, Gao et al. (2020) showed that the GRU may be preferred for short-term runoff predictions because it requires less training time. In contrast, Loc et al. (2020) indicated that GRUs might not perform well when presented with extreme or outlier water quality data; instead, the LSTM is more adept at processing time series data that exhibit strong regularity and smooth fluctuations. It is important to note that both the LSTM and the GRU incorporate “gating mechanisms,” which generally make them superior to the RNN and other ML models in terms of storing and managing information.

Transformer The Transformer has gained significant attention in recent years due to its exceptional performance in NLP tasks. Unlike RNN-based models, the Transformer operates in parallel and does not rely on sequential processing. It employs a self-attention mechanism that allows it to capture long-range dependencies efficiently. The Transformer has been successful in machine translation, language understanding, and text generation. The Transformer can completely replace the recurrent layer with a self-attention mechanism, which allows longer input sequences to be modeled. However, the application of Transformers in water science research is relatively rare. Castangia et al. (2023) demonstrated that a Transformer performs better than an RNN in flood prediction because its attention mechanism can focus on a specific part of the input, allowing the model to receive a larger feature set and become more accurate in a shorter period. In addition, researchers have applied the Transformer to remote sensing of the water environment, where its self-attention mechanism can also help enhance the feature representation of boundary information and improve the accuracy of semantic segmentation (Yan et al. 2023; Zhong et al. 2022).

Unsupervised learning

Unsupervised machine learning is a branch of machine learning in which the algorithm learns from unlabeled data without any explicit target or output variable. Unlike supervised learning, where the model is trained on labeled examples to make predictions or classifications, unsupervised learning focuses on discovering patterns, relationships, and structures within the data itself. The primary goal of unsupervised learning is to explore and understand the inherent structure or hidden information in the input data. It can be used for tasks such as clustering, dimensionality reduction, anomaly detection, and data visualization. (i) Clustering algorithms, such as k-means clustering, hierarchical clustering, and DBSCAN (density-based spatial clustering of applications with noise), group similar data points together based on their inherent similarities or distances. The goal is to identify natural clusters or subgroups within the data. (ii) Dimensionality reduction techniques, such as principal component analysis (PCA), t-SNE, and the autoencoder (AE), aim to reduce the number of variables or features in the dataset while preserving its essential structure, and they are very useful when dealing with high-dimensional data. (iii) Generative models, such as the Gaussian mixture model (GMM), variational autoencoders (VAEs), and the generative adversarial network (GAN), are used to model the underlying distribution of the data. They can learn the patterns and generate new instances that resemble the original data.

Unsupervised learning algorithms often require careful preprocessing and feature engineering to handle missing values, outliers, and other data quality issues. Additionally, the evaluation of unsupervised learning results can be more subjective than in supervised learning, as there are no predefined targets to measure against. Evaluation metrics for unsupervised learning depend on the specific task, such as the silhouette score for clustering or the reconstruction error for dimensionality reduction. Unsupervised methods are also commonly used for anomaly detection, that is, to detect samples that are sparsely distributed and far from most of the data. In contrast to supervised learning algorithms, which predict targets directly, this paper focuses here on two popular unsupervised models: GAN and AE.
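The two evaluation metrics named above for unsupervised learning can be computed directly. The following sketch, which assumes synthetic data and the hypothetical helper names `silhouette_score` and `pca_reconstruction_error`, computes the mean silhouette coefficient for a given cluster assignment and the mean squared reconstruction error of a PCA projection. It is written in plain NumPy for illustration only and is not the procedure used in any of the cited studies.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # Mean distance to the other members of the same cluster
        # (the zero self-distance is excluded through n_same - 1).
        a = dists[i, same].sum() / (n_same - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(dists[i, labels == k].mean()
                for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def pca_reconstruction_error(X, n_components):
    """Mean squared error after projecting onto the leading components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    X_hat = Xc @ V @ V.T
    return float(np.mean((Xc - X_hat) ** 2))

rng = np.random.default_rng(0)
# Two synthetic, well-separated groups of 3-feature samples (assumed data).
group_a = rng.normal(loc=0.0, scale=0.3, size=(30, 3))
group_b = rng.normal(loc=5.0, scale=0.3, size=(30, 3))
X = np.vstack([group_a, group_b])
labels = np.array([0] * 30 + [1] * 30)

sil = silhouette_score(X, labels)
err_1 = pca_reconstruction_error(X, 1)
err_3 = pca_reconstruction_error(X, 3)
print(f"silhouette={sil:.3f}, error(1 PC)={err_1:.4f}, error(3 PCs)={err_3:.2e}")
```

A silhouette close to 1 indicates compact, well-separated clusters, and the reconstruction error drops to numerical zero once all three components are kept, which is the expected behavior for three-dimensional data.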

Generative Adversarial Network (GAN) The GAN consists of two parts: a generator and a discriminator. The goal of the generator is to create data that is realistic enough that it is difficult for the discriminator to tell whether it is real or not (Wang et al. 2017). The discriminator, on the other hand, strives to be a better “detective,” distinguishing between real data and fake data generated by the generator, so that training takes the form of an adversarial game (Goodfellow et al. 2014). In this way, a large amount of sample data can be obtained from a small number of labeled samples, as GANs learn to generate high-quality data from a small number of instances. This is particularly valuable in applications such as aquatic ecological image recognition enhancement and restoration, which will be reviewed in detail in the “Application of deep learning in water ecosystem” section. In particular, Wang (2021b) used a GAN to optimize the fusion of passive microwave and infrared data to estimate precipitation. After receiving both inputs, the GAN generator generated fusion samples, which were then input together with the real samples observed by ground radar. The GAN discriminator calculated cross-entropy and other losses to train the model until it became difficult for the discriminator to distinguish whether the input was a fusion sample or a real sample. The trained generator can be used to obtain the fused precipitation, and the analysis of precipitation intensity and development showed that the method is closer to the ground radar observations than the Integrated Multi-satellitE Retrievals for GPM (IMERG) satellite precipitation products.

Autoencoder (AE) The AE is a typical unsupervised network used in the water domain for feature extraction and data dimensionality reduction (Jiang 2018). It is composed of an encoder, a decoder, and a hidden layer. The encoder is responsible for mapping the input data into the encoding space, and the decoder’s goal is to reconstruct the input data as accurately as possible. The weights and parameters of the encoder and decoder are adjusted with the backpropagation (BP) algorithm to make the reconstruction as similar as possible to the original data. Compared with PCA, autoencoders can capture both linear and nonlinear patterns present in training data, which is very useful in dealing with complex nonlinear hydrological systems. Depending on the modeling task, the denoising autoencoder (DAE) can be used for wastewater plant applications such as data reconstruction or denoising.

Reinforcement learning

Reinforcement learning (RL) is the process by which an intelligent system learns a mapping from the environment to behavior so as to maximize the value of the reinforcement signal function. Since little information is given from outside, the system must rely on its own experience to learn and to improve its action plan in order to adapt to the environment. In addition, breakthroughs have been made in deep reinforcement learning (DRL), which refers to a set of methods that approximate the value function (deep Q-learning) or the policy function (policy gradient methods) through deep neural networks (Zhang et al. 2019). The principle of RL is based on the Markov decision process (MDP). In this process, agents obtain rewards by observing and acting on the environment and adjust their strategies based on the size of the rewards to maximize future returns. We illustrate this principle in Fig. 3b. The practical application of RL and DRL in the field of water science is still in its infancy and is mostly limited to urban wastewater systems. Hernández-del-Olmo et al. (2016) showed that RL can better control the dissolved oxygen (DO) set points of proportional-integral (PI) controllers compared with manually operated and ammonia-based PI controllers, taking system state variables (i.e., ammonium and DO concentrations) into account to reduce operating costs.

Comparison of the advantages and disadvantages of the different DL models

Table 1 summarizes the advantages and disadvantages of each of the DL models. This could serve as a quick reference for future scholars who wish to have a fresh start in choosing a suitable DL model in water science.

Modeling process

The general modeling process of a DL model includes steps such as processing of the input variables, data set partitioning, model selection, and parameter optimization (Fig. 4).

1. Preprocessing. Data sets containing the input features are collected to build a feature variable database. The data is preprocessed and cleaned to ensure that its format is suitable for the learning algorithm. This step may involve removing outliers, handling missing values, normalizing data, or encoding categorical variables.
2. Data set partitioning. The data set is partitioned into a training set and a test set. A validation set is split off from the training set to tune the model’s structure and parameters, and the final evaluation of the model performance is performed on the test set.
3. Model training. The training data is used to train the model, during which the algorithm learns basic patterns and relationships between input features and target labels. During training, the model adjusts its internal parameters to
Table 1  Comparison of the advantages and disadvantages of different DL models

Model | Advantages | Disadvantages | References
CNN | • Shared convolution kernel, no pressure on high-dimensional data processing • No need to select features manually, the weight training is good, and the feature classification effect is good | • Need to adjust parameters, need a large sample size, training is best done on GPU • Overfitting risk, unclear physical meaning | (LeCun et al. 2015; Li et al. 2019a)
RNN | • Dependencies in the sequence can be captured, and a weight-sharing strategy is adopted | • Gradient vanishing and gradient explosion • The training time is long; it is sensitive to hyperparameters, and its interpretability is poor | (Hochreiter 1998; Riedmiller 1994)
LSTM | • Solves the problem of gradient vanishing • Captures long-term dependencies | • The computational complexity of LSTM is higher • It takes a lot of data to train | (Hochreiter and Schmidhuber 1997; Hu et al. 2018)
GRU | • Better than traditional RNNs at capturing long-term dependencies, training is faster, and it is less prone to overfitting | • The interpretation of gating mechanisms and the flow of information within the network can be more difficult than with traditional RNNs | (Cho et al. 2014; Salloom et al. 2021)
Transformer | • The correlation between each pair of words is calculated without passing through hidden layers • It can be calculated in parallel and can make full use of GPU resources | • Local information acquisition is not as strong as with RNN and CNN • There are problems with position information encoding | (Castangia et al. 2023; Yosinski et al. 2014)
GAN | • The model is trained using only backpropagation and does not require Markov chains • GANs can produce clearer, more realistic samples | • The interpretability is poor, and the distribution Pg(G) of the generative model is not explicitly expressed • GANs are not suitable for processing discrete forms of data | (Goodfellow et al. 2014; Wang et al. 2017)
AE | • Strong generalization; being unsupervised, it does not require data annotation | • For anomaly detection scenarios, the training data must be normal | (Ballard 1987; Rumelhart et al. 1986)
DRL | • Handles high-dimensional, non-linear state and action spaces and is suitable for complex decision problems • Learns the optimal strategy adaptively, without the need for manually designed features or rules | • Requires a lot of data and computing resources to train, and the training time is long • Problems like overfitting can occur • For complex tasks, there are still problems of low learning efficiency and unstable performance | (Van Hasselt et al. 2015; Zhang et al. 2019)

Fig. 4  Deep learning modeling process
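The workflow summarized in Fig. 4 can be sketched end to end on synthetic data. The example below is only a sketch under stated assumptions: a linear model trained by gradient descent stands in for a deep network, the 70/15/15 split and the candidate learning rates are arbitrary choices, and all variable and function names are hypothetical. It normalizes the inputs, partitions the data, trains the model, picks a learning rate by grid search on the validation set, and reports RMSE and NSE on the test set.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for a cleaned feature table (200 samples, 3 features).
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Data set partitioning: shuffle, then split 70/15/15 (assumed ratios).
idx = rng.permutation(len(X))
train, val, test = idx[:140], idx[140:170], idx[170:]

# Preprocessing: normalize with statistics from the training set only.
mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
Xn = (X - mu) / sigma

def fit(lr, epochs=300):
    """Train a linear model by full-batch gradient descent on the MSE."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        err = Xn[train] @ w + b - y[train]
        w -= lr * 2.0 * Xn[train].T @ err / len(train)
        b -= lr * 2.0 * err.mean()
    return w, b

def rmse(obs, pred):
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency; 1 corresponds to a perfect fit."""
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Hyperparameter optimization: grid search over the learning rate,
# scored on the validation set.
best_lr, best_val = None, float("inf")
for lr in (0.001, 0.01, 0.1):
    w, b = fit(lr)
    score = rmse(y[val], Xn[val] @ w + b)
    if score < best_val:
        best_lr, best_val = lr, score

# Model evaluation: refit with the chosen rate and score on the test set.
w, b = fit(best_lr)
test_pred = Xn[test] @ w + b
test_rmse, test_nse = rmse(y[test], test_pred), nse(y[test], test_pred)
print(f"lr={best_lr}, val RMSE={best_val:.3f}, "
      f"test RMSE={test_rmse:.3f}, test NSE={test_nse:.3f}")
```

Keeping the test indices out of both the normalization statistics and the learning-rate search is the design choice that matters here: the test score then reflects data the model has never influenced.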

minimize the error or difference between its predicted output and the true label.
4. Nonlinear activation and optimization. Appropriate activation functions and optimization algorithms are selected according to actual requirements. The activation function increases the nonlinear expression ability of the neural network, and different optimization algorithms lead to different training effects. Commonly used optimization algorithms include Adam, AdaGrad, and RMSProp.
5. Hyperparameter optimization. Hyperparameters include the number of neurons, learning rate, batch size, regularization coefficient, etc. Tuning them is a combinatorial optimization problem, so it cannot be solved by gradient-based methods. Generally, evolutionary algorithms and other search methods are used to find the optimal solution, and a common choice is grid search. A suitable hyperparameter optimization method should be selected according to the characteristics of the model.
6. Model evaluation. The performance of the trained model is evaluated with classification measures (e.g., accuracy, precision, recall, and F1 score) and regression measures (e.g., RMSE, MSE, MAE, and R2).

Application of deep learning in water environment

To gain insight into the application of DL in water science research, on September 30, 2022, we searched the ISI Web of Science (WoS) database for publications from 2000 to 2022 on the topics (“deep learning”) and (“water science” or “water ecology” or “ecological resources”). A total of 4928 publications and their keywords on deep learning in the water environment were analyzed using bibliometric statistics and visualization methods. The findings suggest that DL, as a new analytical tool, is playing an increasingly important role in the big data era of ecological resource research. All search results were analyzed with the "analyze results" function of WoS according to bibliometric principles, and a keyword co-occurrence network diagram was drawn in VOSviewer. In a keyword network diagram, a node represents a keyword. The size of the node circle indicates the frequency of occurrence in the graph. The strength of the relationship between keywords is indicated by the thickness of the connections between nodes. As can be seen from Fig. 5, the water-related research that involves deep learning is mainly focused on deep learning, machine learning, water quality, remote sensing, and ecosystem.
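The evaluation measures named in the model evaluation step (accuracy, precision, recall, F1 score, MSE, RMSE, MAE, and R2) can be written out in a few lines. The following is a minimal sketch using their standard definitions; the function names and the dictionary return format are illustrative choices, not part of the reviewed studies.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R2 for two equal-length lists of numbers."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R2 is undefined when y_true is constant (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "F1": f1}
```

For example, `regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])` gives an MAE of 1/3 and an R2 of 0.5, since the residual sum of squares is 1 and the total sum of squares is 2.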

Fig. 5  Trends of deep learning papers based on WOS database analysis in water science research: a number of papers by year and b co-occurrence network diagram of research keywords

Application of deep learning in water resource management

Water resource management roughly refers to the activities of adjusting and regulating water-related behaviors such as development, utilization, and protection of water resources, and prevention and control of water disasters, based on the natural circulation law and comprehensive carrying capacity of water resources. These tasks can be divided into two categories: water drainage distribution systems and hydrology. In what follows, we review the work in these directions, which is first summarized and compared in Tables 2 and 3.

Water drainage distribution system

The rational allocation of water resources requires a combination of water supply management and water demand management. The application of deep learning in water resource allocation mainly includes two aspects: water demand forecasting and anomaly detection in water distribution networks.
Short-term water demand forecasting helps with optimal control of water supply systems, and accurate prediction helps to reduce operating costs and save energy. Recent studies have shown that DL achieves high performance in predicting daily water demand (Perea et al. 2023). Kavya et al.

Table 2  Past research on water resource allocation using DL algorithms

Reference | Parameter | Model | Key contribution
Kavya et al. (2023) | 10-min interval data (temperature and humidity) | LSTM | The DL performs better than the traditional ML model
Salloom et al. (2021) | Historical water demand data (updated every 24 h in 2016) | GRU | The proposed method reduces the complexity of the model by six times and maintains the same accuracy
Guo et al. (2018) | 15-min prediction and a 24-h prediction interval | GRUN | The GRUN outperforms the ANN and SARIMA models for both 15-min and 24-h forecasts
Liu et al. (2023) | Daily water demand data from two water plants | STL-Ada-LSTM | The STL-Ada-LSTM model has the highest accuracy in predicting SWDF while balancing the stability and simplicity of the model
Du et al. (2021) | 1660 daily water demand data (1 Jan 2016 to 11 Sep 2020) | DWT-PCA-LSTM | Both data pre-processing and optimal parameter selection techniques are used
Pu et al. (2023) | 15-min interval, total of 70,909 observations (1 Oct 2019 to 30 Oct 2021) | Wavelet-CNN-LSTM | The wavelet-CNN-LSTM outperforms the other models both in single-step and multi-step prediction
Meijer et al. (2019) | Over 2 million CCTV images | CNN | To eliminate data leakage bias, “leave two-inspections-out” cross-validation was introduced
Wang and Cheng (2018) | 3000 images | Faster R-CNN | This model was suitable for the detection of sewer defects with high mAP and low miss rate
Yin et al. (2020) | 3664 images and a data set of 4056 samples | CNN, YOLOv3 | The YOLOv3 network enables efficient and accurate defect detection of urban drainage systems
Oh et al. (2022) | 4456 sets | YOLOv5 | The YOLOv5 was superior to other standard models YOLO and SSD in real-time sewer defect detection
Song et al. (2021) | 513 images with a size of 256 × 256 pixels | FCN | They propose a subsurface drainage pipe detection approach based on DL with optical images
Zanfei et al. (2022) | 3,000,000 sets (temperature, evapotranspiration) | GCRNN | The ability of the GCRNN to produce accurate and reliable forecasting
Shao et al. (2021) | 64 pipe segments | Bayesian-SWMM | Reconstruct the complete profile of an unknown discharging incidence in sewer networks
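Table 2 lists wavelet-based preprocessing (DWT-PCA-LSTM, Wavelet-CNN-LSTM) ahead of the recurrent model. A three-level decomposition into one approximation band (cA3) and three detail bands (cD1, cD2, cD3) can be sketched as follows. The Haar wavelet is used here purely as an assumption because it is the simplest orthonormal wavelet; the wavelet family used in the cited studies is not stated in this review, and the function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels=3):
    """Split a series into cA<levels> and cD1..cD<levels>.

    The series length must be divisible by 2**levels; padding strategies
    are deliberately left out of this sketch.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    approx = list(signal)
    details = {}
    for level in range(1, levels + 1):
        # Each pass halves the approximation and emits one detail band.
        approx, detail = haar_step(approx)
        details[f"cD{level}"] = detail
    return {f"cA{levels}": approx, **details}
```

The approximation band carries the slow, periodic part of the demand signal, while the detail bands carry progressively coarser fluctuations; each band can then be fed to the downstream model as a separate input channel.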

(2023) compared 9 ML and DL models in their study, and the results showed that the LSTM model can predict hourly or subhourly demand by capturing features from previous time-step demand, outperforming traditional ML models in water demand prediction. To further improve prediction accuracy, some studies have proposed using a GRU-based RNN to handle sequential relationships in historical demand data, and introducing the unsupervised classification method K-means to create new features, which can achieve more accurate and reliable water demand prediction (Guo et al. 2018; Salloom et al. 2021). In addition, most existing work uses a single DL method, and there is relatively little research on coupled DL methods. To improve the accuracy of daily water demand prediction, Liu et al. (2023) introduced the STL method to decompose the sequence data into three groups, extracted the demand feature data, and used AdaBoost ensemble learning to integrate the LSTM model into AdaBoost-LSTM, which achieved the highest prediction accuracy while taking into account the stability and simplicity of the model. To decompose complex information in time series, Du et al. (2021) proposed a hybrid LSTM model for water demand prediction whose preprocessing combines the discrete wavelet transform (DWT) and principal component analysis (PCA). Further, Pu et al. (2023) proposed the wavelet-CNN-LSTM model. To separate low-frequency periodic components from high-frequency random components, the original time series is decomposed by a third-order wavelet transform into the approximation term cA3 and the detail terms cD1, cD2, and cD3. The subsequent CNN-LSTM model can then capture the characteristic changes in short-term water demand data more effectively.
In general, DL (e.g., LSTM and GRU) shows good performance in water demand prediction. At present, there is room
Table 3  Past research on hydrology using DL algorithms

Reference | Data | Parameter | Model | Key contribution
Gao et al. (2020) | 8022 hourly rainfall and runoff 153 sets (2000 to 2014) | Flow, precipitation | LSTM, GRU | The GRU model performed just as well as the LSTM model, but the GRU required less model training time
Han and Morrison (2022) | 11,464,000 sets | PVC, HLD, ROD, and HBG | LSTM-s2s | The LSTM-s2s model obtained desirable results for error forecasting
Sadeghi et al. (2019) | 869,665 sets | TP, FP, MS | CNN | It has higher precision and detection ability than traditional precipitation based on remote sensing data
Yang et al. (2023a) | 2960 indicators | NH4+-N, COD, TP | MLR, BPNN, GA-BPNN, LSTM | The moving average method was adopted to smooth the raw data. The LSTM water quality prediction method has the best effect
Zhu et al. (2023) | BB, WS, LDX, and XHB hydrological stations (2001/1/1 to 2015/12/31) | Temperature and streamflow | Spatiotemporal deep learning rainfall-runoff (SDLRR) | The improved accuracy of runoff forecasting using IMERG or TMPA spatial information
Herbert et al. (2021) | SWE and reservoir inflow | T, 𝑚𝑖, 𝑘𝑖, 𝜆, α, β, θ | LSTM-LSTM, ResCNN-LSTM | Long-term water supply forecasts of the optimal deep learning algorithm proved superior to the statistical method
Xu et al. (2022) | Rainfall, evaporation, and discharge data at monthly interval (1976 to 1997) | Discharge | CNN-LSTM, CNN-GRU | The combination of CNN and LSTM or GRU realizes the deep mining of data characteristics, which was superior to the solution of a single model
Li et al. (2016) | Two daily inflow series of the Three Gorges Reservoir and the Gezhouba Reservoir | Reservoir inflow | SAE, DRBM, FFNN, ARIMA | The two DNN models proposed are superior to the traditional models in statistical criteria
Castangia et al. (2023) | A set of 13 hydrological stations (1 Jan 2014 to 31 Dec 2014) | River depth | Transformer | The transformer was superior to RNN in terms of RMSE and MAE
Zhang et al. (2018) | Monthly water diversion, evaporation, precipitation, temperature, and time (2000 to 2013) | Water table depth | Improved LSTM | The LSTM layer applied the dropout method to improve the learning and fitting ability of the model
Kabir et al. (2020) | 2005 and 2015 flood dataset | Water depth and streamflow | CNN | The CNN model trained the 2 hydraulic model and verified its performance in flood simulation better than the traditional SVR
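Table 3 notes that the moving average method was used to smooth raw data before model training (Yang et al. 2023a). The window length and edge handling of that study are not given in this review, so the sketch below makes its own assumptions: a trailing window of configurable length, with the first points averaged over however many values are available. The function name `moving_average` is illustrative.

```python
def moving_average(series, window=3):
    """Trailing moving average of a numeric series.

    Assumption for illustration: the first window-1 outputs are averaged
    over the shorter prefix instead of being dropped or padded.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            # Drop the value that just left the trailing window.
            running -= series[i - window]
        count = min(i + 1, window)
        smoothed.append(running / count)
    return smoothed
```

For example, `moving_average([1, 2, 3, 4, 5], window=3)` returns `[1.0, 1.5, 2.0, 3.0, 4.0]`. A trailing window is chosen so that each smoothed value uses only past observations, which avoids leaking future information into a forecasting model.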

for improvement in the methods of daily water supply prediction, and future studies can further explore coupled DL algorithms and other data processing methods to improve accuracy and stability.
DL methods have been widely used in anomaly detection of water distribution networks. They can map the pressure data of the water distribution network to the abnormal state of the network and thereby identify that state. For instance, CNN was used to solve the problem of multi-station leakage detection in the water distribution network: the historical pressure data of the network are used as input to predict the sensor pressure and finally determine whether there is leakage at the pipe network site (Meijer et al. 2019). Wang and Cheng (2018) proposed a faster region-based convolutional neural network (Faster R-CNN), which reduces the time and labor cost of image detection and interpretation and increases the accuracy of defect detection. The mAP reached 83%, facilitating subsequent status assessments. However, surface features (i.e., gullies and depressions) result in limited data availability. To solve this challenge, Song et al. (2021) proposed a detection method for underground drainage systems based on a fully convolutional network (FCN), which uses optical images to detect drainage problems effectively. Kumar et al. (2020) used a fine-tuned FCN algorithm to segment and measure the damage of five types of sewage pipes. Compared with the traditional mask region-based CNN method, the proposed method showed effective performance in the damage detection of multiple similar sewer pipes under a complex background. In addition, deep learning can be combined with smart camera sensors on the Internet of Things (IoT) to analyze and classify images based on the severity of sewer congestion. For instance, the YOLOv3 network detects aging devices in municipal drainage systems through data-driven supervised CNN image learning to provide technical support for estimation technicians (Yin et al. 2020). More recently, Oh et al. (2022) proposed an improved YOLO approach that improves the monitoring of drainage systems via CCTV video text by integrating convolutional block attention modules (CBAMs). When IoT sensors fail, spatial and temporal correlations between water demand time series can be captured based on the concept of graphs (Zanfei et al. 2022). In practice, because it is difficult to determine the location of the emission source, the duration of the emission, and the amount of pollutants emitted, a combination of a black-box model and a hydrodynamic model can provide a more accurate simulation. For example, a stochastic source identification model combining Bayesian inference and SWMM can reconstruct profiles of instantaneous discharge incidents in sewer networks (Shao et al. 2021).
Some new methods and models have been proposed for short-term water demand prediction and anomaly detection of water supply networks. The detailed contents of the related literature are shown in Table 2. DL models (e.g., CNN, GCRNN, and FCN) have shown good performance and application potential in these fields. Future research could further explore how sensors can be combined with fluid dynamics and DL models to build digital twin online management platforms to better understand and control the dynamics of water supply systems (Bartos and Kerkez 2021).

Water information management

Common applications of DL in hydrology include runoff forecasting, flood forecasting, and flood warning. From the summary of previous studies listed in Table 3, it can be found that water flow is the most frequently studied parameter, followed by the analysis of large volumes of hydrometeorological data, with the aim of accurately predicting and monitoring different hydrological processes and achieving sustainable management of water resources.
Research on runoff forecasting mainly includes rainfall-runoff simulation and reservoir inflow simulation. In rainfall-runoff simulation, most scholars focus on a single watershed or a single station, and a few have carried out regional-scale rainfall-runoff simulation research (Hu et al. 2018; Jehanzaib et al. 2022). The results show that both ANN and LSTM network models perform well in rainfall-runoff modeling and are superior to traditional conceptual and physical models. In particular, the LSTM model outperforms the ANN model (Rahimzad et al. 2021). To solve the problem of lacking spatial rainfall information, Zhu et al. (2023) used SPP data provided by satellites, with improved rainfall estimation accuracy, to establish a spatiotemporal deep learning rainfall-runoff forecast model for hydrological stations in the upper reaches of the Yangtze River, thus improving the accuracy of runoff forecasting. Although remote sensing data provide high-resolution satellite information, there are still limitations in estimating precipitation. Therefore, several studies have explored the effectiveness of combining CNNs with the infrared (IR) and water vapor (WV) channels of geostationary satellites to estimate precipitation rates (Sadeghi et al. 2019). In addition, different input preprocessing and hyperparameter optimization methods have been proposed, such as an encoder-decoder model combined with different input preprocessing methods and a model that uses CNN, SVM, and GPR jointly to predict rainfall data, to improve the efficiency and accuracy of the prediction model (Ehteram et al. 2023; Jamei et al. 2023). In the long-term prediction of reservoir inflow, DL has the problems of

uncertainty and low accuracy. An earlier attempt was made by Muluye and Coulibaly (2007) but failed to accurately predict long-term inflow peaks. To improve the accuracy of long-term prediction, multi-time hydrological grid data can be connected to form video-like information. By combining CNN and LSTM, complex spatiotemporal features can be constructed and extracted to predict reservoir inflow at future time steps, which provides a new idea for improving the accuracy of flood prediction (Herbert et al. 2021; Xu et al. 2022). In the hydrological and water quality forecasting of practical projects, the fluctuation of flow under meteorological and actual conditions is often ignored, so the accuracy of the forecast model is not ideal. To solve this problem, Yang et al. (2023a) proposed a method based on multi-source data fusion that takes multiple influencing factors into account. The moving average method is used to process the original data, and the accuracy of each model is improved, with GA-BPNN showing the most remarkable improvement. Compared to other prediction models based on deep neural networks or linear methods, LSTM prediction models perform well on RMSE and R2. However, in existing studies, the DL model is usually either optimized in its structure or coupled with other models (e.g., physical hydrological models) to improve forecast performance, and there are few other upgrading methods. DL models also lack consideration of the mechanistic factors of rainfall and runoff; one response is to couple LSTM with the WRF-Hydro model to improve runoff prediction (Cho and Kim 2022).
The urban flood control model based on hydrodynamics is relatively mature, with high simulation accuracy but high dependence on basic data. Even if the model uses parallel computing technology such as GPU/CPU, it still cannot avoid complex hydrologic and hydraulic calculations, and it is difficult to achieve a fundamental breakthrough in simulation speed. In contrast, DL has been widely used to generate susceptibility maps (Dodangeh et al. 2020; Gharakhanlou and Perez 2023; Karim et al. 2023); in particular, spatial sequence LSTM is more reliable in flood warning and forecasting (Fang et al. 2021; Luppichini et al. 2022). To provide forecast uncertainty information, some studies have integrated the feature and temporal dual attention (DA) mechanism and a recursive encoder-decoder (RED) structure into the LSTM to establish a DA-LSTM-RED model, which encourages the model to extract key input information for different types and moments of input variables and so improves the accuracy of multi-step-ahead flood prediction (Cui et al. 2023). However, most studies ignore the importance of developing appropriate feature engineering methods. In contrast, Transformer is an attention-based neural network that processes all positions in a sequence simultaneously through a self-attention mechanism without relying on recurrent connections. It is found that Transformer is superior to recurrent neural networks in terms of RMSE and MAE, and has lower requirements for recurrent networks. According to the domain criteria, the forecast error obtained is considered acceptable, which supports the applicability of the Transformer in flood forecasting tasks to help reduce disasters and losses caused by floods (Castangia et al. 2023).
In terms of flooding, semantic segmentation models can annotate every pixel of an object detected in an image and quickly predict observed rainfall events and flood depths at known and unknown spatial locations not included in the training dataset. Land use/land cover (LULC) is one of the important indicators of urbanization and plays a key role in flood modeling. For example, Gong et al. (2022) proposed an end-to-end data processing method using U-Net and DeepLabV3+ and evaluated its applicability in different watershed modeling systems through pixel-level LULC semantic segmentation. Additionally, in flood modeling, most existing ML algorithms are not suitable for multi-output scenarios, i.e., predicting flood variables (e.g., water depth) for multiple cells in a single model. DL models, especially CNN, have achieved breakthrough results in the field of computer vision, with the ability to extract unknown features and learn compact representations (Chen et al. 2021a). Kabir et al. (2020) trained a CNN model with a 2D hydraulic model and verified through benchmark testing that it performs better in flood simulation than the traditional support vector regression (SVR) method, which provides great potential for real-time flood modeling and prediction. Post-processing frameworks have also achieved a certain effect in improving the prediction performance of hydrological models (Han and Morrison 2022).

Application of deep learning in water environmental protection

Water quality modeling and digital planning

Water quality forecasting refers to judging the pollution level of collected water samples from historical measurements or relevant data, to provide the basis for pollution prevention and source water protection. It is closely related to water quality indicators such as pH value, conductivity, temperature, biological oxygen demand, chemical oxygen demand, total nitrogen, total phosphorus, and chlorophyll-a (Chen et al. 2020; Uddin et al. 2021). However, due to random and systematic errors, noisy signals may contaminate data from water quality monitoring stations and experiments. DL has excellent processing power compared to other ML methods and has been reported to be the most realistic and accurate method for water quality prediction (Singha et al. 2021). Water quality prediction needs to consider the time sequence of input data, and RNN can

store and transform previous input information to support decision reference and contribute to water quality system management (Antanasijevic et al. 2013). RNNs perform relatively poorly in long-sequence data processing. As an evolution of RNN, LSTM can handle longer time series problems (Liu et al. 2019). GRU is a variant of LSTM, which can handle long-term sequences, shorten training time, and obtain better results (Jiang et al. 2021). Due to the dynamic nature and complexity of water quality time series data, it is often difficult for a single model to cope, but a combination of models can make up for individual shortcomings. To this end, researchers can combine different types of models, for instance, using a CNN model and an RNN-related variant to predict image time series, which exploits the fusion of spatial and temporal information from the CNN and RNN modules, extracts relevant variable information, and adapts to complex nonlinear systems in water quality (WQ) prediction (Baek et al. 2020; Tu et al. 2019). In addition, recent studies have proposed new hybrid models such as prediction models based on wavelet decomposition (Xiao et al. 2017), variational mode decomposition (Huang et al. 2021), and empirical mode decomposition (Ma et al. 2020). These models also have some limitations, such as reduced model reliability, complex model structure, long training time, and high computational costs. Yu et al. (2022) decomposed the original water quality data into multiple subsequences by using the empirical wavelet transform (EWT), then reorganized the decomposed subsequences by fuzzy C-means clustering, and finally fed the subsequences into the BiGRU prediction model and summed the results to obtain the water quality prediction. In practical applications, it is necessary to select appropriate models and algorithms according to different data characteristics and conduct adequate data processing and preprocessing, to improve the stability and accuracy of prediction results.
Dealing with missing water quality data, which often results from human errors or facility issues, is a major challenge. Researchers have explored various approaches, such as transfer learning, which utilizes existing knowledge to transition to another similar domain with less data. Instead of having to train an entirely new model from scratch, one could use the weights of an already-trained model as a starting point and fine-tune the model with data from the new region (Zhuang et al. 2021). Peng et al. (2022) designed a deep transfer learning model (TLT) based on the Transformer architecture, introducing a recursive fine-tuning method. This approach not only transfers knowledge from the source monitoring station to the target station but also avoids overfitting the source data during the pretraining stage. When applied to real datasets, TLT can effectively enhance long-term prediction accuracy (Tian et al. 2019). However, its performance may be affected by differences between the source domain and the target domain. If there are large differences between the two domains, the effect of transfer learning may be limited, resulting in less accurate predictions. Chen et al. (2021c) went further and filled large-scale continuous missing data by using the TrAdaBoost-LSTM algorithm. However, data imbalance or label skew in the filling process of this method may lead to biased or inaccurate results.
As shown in Table 4, time series models (e.g., LSTM and GRU) are widely used in water quality prediction and require a large amount of input data. In the future, further study is needed for areas where water quality data are lacking, to improve the accuracy and reliability of water quality prediction.

Detection of risk substances and toxicity assessment

Water environmental toxicology is a multi-disciplinary field that studies the behavior, effects, and risks of pollutants in the water environment. It involves ecology, chemistry, molecular biology, and environmental science. Toxic pollutants discharged into the natural water environment will inevitably cause harm to the aquatic ecosystem and eventually poison human beings through the food chain. A thorough understanding of water environmental toxicology is essential to protect ecosystems and safeguard public health. Traditional toxicity assessment methods are often costly and associated with uncertainty, but nowadays, high-throughput computing techniques are increasingly favored because of their efficient data processing capabilities. A QSAR model can predict the toxicological properties of unknown substances by associating the structure of chemical substances with toxic effects. As the amount of data increases, traditional experience-based QSAR methods find it increasingly difficult to solve complex problems. Therefore, the ability of DL-QSAR to predict the toxicity of pollutants becomes particularly important (Heo et al. 2019). However, the structure description of the DL-QSAR models reported so far is still limited to the two-dimensional level. Wang et al. (2021b) established the SepPCNET model, classified 1317 chemicals from ToxCast, and screened environmental estrogens. By introducing a novel three-dimensional molecular surface point cloud with electrostatic potential to describe the chemical structure, the model was able to identify active and inactive chemicals with an accuracy of 82.8 and 88.9%, respectively.
In addition to organic chemicals, nanoparticles are also of great interest in environmental toxicology. Due to their specific surface area or photocatalytic activity, they can cause harmful
Table 4  Past research on water quality modeling using DL algorithms

Reference | Parameter | Data | Model | Key contribution
Singha et al. (2021) | pH, TDS, TH, EWQI, etc | 226 groundwater samples | DL, RF, XGBoost | DL has excellent processing power compared to other ML methods
Liu et al. (2019) | pH, DO, CODMn, conductivity, turbidity, and NH3-N | 912 sets (1 Jan 2016 to 30 Jun 2018) | LSTM | The LSTM can handle longer time series problems
Zhi et al. (2021) | DO | 236 sets CAMELS-chem | LSTM | The LSTM showed potential for predicting DO in watersheds where water quality measurements are not available
Lee and Lee (2018) | Algae | Temperature, pH, DO, BOD, COD, chlorophyll-α, and cyanobacteria at weekly interval. Water level and bondage at a daily interval | LSTM, RNN | The LSTM was able to solve the problem of information loss and was superior to MLP and RNN in predicting chlorophyll-α
Jiang et al. (2021) | pH, temperature, conductivity, and water quantity indicator | 16,800 sets (Oct 21, 2019, to Nov 25, 2019) | GRU | The GRU was faster at predicting water quality and the learning curve. The results show that the R2 of GRU is higher than RNN and LSTM, and R2 is higher than the ML linear method
Baek et al. (2020) | TN, TP, and TOC | Water level, TN, TP, TOC, and weather data | CNN-LSTM | Combined CNNs and LSTMs to predict water level and water quality
Tu et al. (2019) | COD | 3240 sets (2013 to 2019) Interval | CNN-GRU, GRU, RNN | The CNN-GRU was superior to GRU, RNN, and SVR in terms of error and training time
Huang et al. (2021) | DO | DO at 10-min intervals | DeepAR | The VMD technique was used to extract the frequency features of the original data
Yu et al. (2022) | DO, NH3-N | Water quality data of 3 monitoring stations (Oct 1, 2017, to Apr 30, 2020) | EWT-FCM BiGRU | Compared with a single BiGRU model, it has better prediction performance
Peng et al. (2022) | pH, DO, NH3-N, and CODMn | 52,800 sets | Transfer Learning based on Transformer (TLT) model | The TLT model can significantly improve the water quality prediction accuracy of the Transformer model by introducing the TL framework
Zhu et al. (2021b) | DO | 23,000 sets (May 1, 2017, to May 31, 2019). Temperature, pH, turbidity, water conductivity, chlorophyll-c, blue-green algae, and DO | BiLSTM-Attention | Used a large available data set for an aquatic system to predict DO concentration trends in the target system
Chen et al. (2021c) | DO | DO at 4-h intervals (Aug 2018 to Aug 2020) | TrAdaBoost-LSTM | Captured long-term dependencies between time series to fill in large-scale continuous missing data
Tian et al. (2019) | Algae | 25,000 sets. Temperature, pH, ORP, EC, turbidity, DO, and chlorophyll-α at 5-min interval | TL-LSTM, TL-RNN, TL-ANN | The TL has better prediction effect than parameter norm penalties and dropout
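The recurrent models in Table 4 (LSTM, GRU, and their hybrids) are trained on fixed-length windows of past observations, each paired with a future value to predict. A minimal sketch of this framing step follows. The function name `make_windows` and the parameters `lookback` and `horizon` are illustrative assumptions; the cited studies do not specify their windowing in this review.

```python
def make_windows(series, lookback, horizon=1):
    """Frame a time series as (inputs, target) pairs for supervised training.

    Each sample uses `lookback` consecutive observations as inputs and the
    value `horizon` steps after the window as the target. Both parameter
    names are hypothetical choices for this sketch.
    """
    samples = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        samples.append((inputs, target))
    return samples
```

For example, `make_windows([1, 2, 3, 4, 5], lookback=3)` yields `([1, 2, 3], 4)` and `([2, 3, 4], 5)`. A series of N points produces only N - lookback - horizon + 1 samples, which is one reason these models need long monitoring records.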

effects on aquatic organisms such as oxidative stress and lipid peroxidation through the production of reactive oxygen species. Mill et al. (2021) used U-Net with semi-automatically synthesized data to segment nanoparticles in microscopic images, achieving segmentation accuracy comparable to manual annotation for toxicologically relevant metal oxide nanoparticle ensembles. Emerging contaminants (ECs) include a wide range of man-made chemicals, which adversely affect the normal functions of homeostasis, reproduction, and metabolism by disrupting the endocrine system of animals and humans. Mukherjee et al. (2021) combined CNN and LSTM models within an active learning framework. The class activation map (CAM) generated from the feature extraction layer is used to predict the binding affinity and agonistic activity of a given chemical to the estrogen receptor (ER), and Grad-CAM is used to visualize and map the structural alerts and determine the chemical environment specific to each structural alert.
In aquatic environmental toxicology, toxicity to aquatic animals and plants also needs special attention. The presence of pollutants can be detected through photochemical reactions of microalgae species. Pyo et al. (2022) used hyperspectral images and a 1D-CNN regression model to analyze cyanophyte concentration. Pyo et al. (2021) used hydrodynamic simulations to predict in situ cyanobacteria cell concentrations. Due to the optical complexity of inland waters, these studies are limited to monitoring the vertical distribution of harmful algal blooms. Rodrigues et al. (2021) used a model marine diatom (Phaeodactylum tricornutum) as a test organism to assess exposure to different ECs at different doses. The results showed that the 2D convolutional neural network predicted the type of EC to which the culture was exposed with 97.65% accuracy, while Rocket performed best in predicting the exposure concentration, with 100% accuracy. In addition to algae, aquatic in vivo models provide information related to the organism’s complex physiological processes and metabolic pathways (Dong et al. 2023). Deep learning morphometric analysis (DLMA) was used to quantitatively identify 8 phenotypic abnormalities of zebrafish larvae, to achieve classification and segmentation of phenotypic characteristics, and realize efficient hazard identification of chemical

Table 5  Past research on water environmental toxicology using DL algorithms

Reference | Model | Input data | Key contribution
Heo et al. (2019) | DL-QSAR | Experimental datasets collected from the available literature assessing qualitative responses to EDCs | The accuracy of DL-QSAR in quantitative analysis is above 90%
Wang et al. (2021b) | SepPCNET | 1317 chemicals | The difference in isomer activity was successfully recognized
Kumar et al. (2021) | DNN | 3039 compounds | DNN-based classifiers have the highest accuracy in predicting the mutagenicity of compounds
Shen et al. (2021) | DNN, RF | 1056 pesticides | The accuracy of both the DNN and RF methods in screening potential ER agonists exceeded 63%
Rodrigues et al. (2021) | 2D CNN, Rocket | Phaeodactylum tricornutum | The 2D CNN was 97.65% accurate in predicting the type of EC to which the culture was exposed, and Rocket performed best in predicting the concentration the culture was subjected to, with 100% accuracy
Mukherjee et al. (2021) | CNN-LSTM | 7812 sets; chemicals for the estrogen receptor | In combination with CNN and LSTM, the Grad-CAM generated from the feature extraction layer visualizes key structures
Hu et al. (2020) | CNN | QSAR prediction | Proposed QSAR prediction by the concatenation of an end-to-end encoder-decoder model and a CNN architecture
Mill et al. (2021) | U-Net | Two different diameters and food grade TiO2 nanoparticles | CNNs were trained to accurately segment nanoparticles in microscope images
Pyo et al. (2021) | CNN, EFDC-NIER | Investigate specific algal (from 2015 to 2018) | The EFDC-NIER model was used to calibrate the water quality synthesis data
Pyo et al. (2022) | 1D-CNN | Chlorophyll-a, phycocyanin, lutein, fucoxanthin, zeaxanthin (from 2019 to 2020) | The 1D-CNN model used quantitative and evaluative analysis of algal phenomena
Dong et al. (2023) | TensorMask, Mask R-CNN | 870 zebrafish larvae images (open-source image data) | The DL model was used to quantitatively identify in vivo toxicity screening of zebrafish larvae
Green et al. (2021) | GAN-ZT, Go-ZT | 1003 unique ToxCast chemicals | The combination of Go-ZT and GAN-ZT was best for predicting toxicity of untested compounds

substances and environmental pollutants. By combining high-throughput screening data with the DL method, Green et al. (2021) combined Go-ZT and GAN-ZT to predict the toxicity of untested chemicals, providing an efficient way to screen 85,000 environmental chemicals.

As shown in Table 5, the application of DL (e.g., CNN) to the detection of water-characteristic risk substances and toxicity assessment has been widely successful. However, this method also has some limitations, such as high consumption of computing resources, poor interpretability, and limited adaptability. To overcome these limitations, we anticipate additional efforts in the future, including improving network structure and training strategies, providing better interpretability, and improving model robustness and generalization to adapt to different imaging conditions and complex backgrounds. In addition, other DL methods, such as GAN and Transformer, can be explored to further improve the performance of water-characteristic risk substance detection and toxicity assessment.

Water pollution treatment

Neural networks (NNs) have been used in modern wastewater treatment facilities to help simulate water quality and equipment operating conditions. However, previous studies have mainly focused on the characterization of sludge bulking behavior and evolution during the activated sludge process, and have not provided a definitive solution. Time series DL algorithms (e.g., LSTM) have since proved effective (Farhi et al. 2021). As shown in Table 6, COD is the parameter studied most frequently in sewage monitoring, and it is also a key indicator for evaluating the effect of sewage treatment. Secondly, SS, BOD, and other parameters have a close relationship with COD (Guang et al. 2019); they are not only affected by time series but also related to spatial distribution. To better understand the data and the model, select the best input features, and discover the key factors in this specific domain, Alvi et al. (2022b) investigated the relative importance of each input feature to predictive modeling problems through feature importance via input suppression, but this approach failed to adequately consider the interactions between variables. Zhang et al. (2023) took a more advanced perspective by using explainable artificial intelligence (XAI) methods such as SHAP to take into account the interactions between the 12 input metrics when quantifying their contributions. To make models converge faster and better, researchers have tried to build hybrid neural network prediction models to improve prediction performance (Wang et al. 2021a). For instance, Wang et al. (2019b) used a hybrid CNN-LSTM model to dynamically predict the COD of wastewater treatment effluent, and found that the error of the direct prediction method of COD mass flow rate was smaller than that of the indirect prediction method, which predicted the COD amount according to the sewage flow rate and COD concentration. Li et al. (2021b) also showed that the model combining CNN and LSTM was superior to either network used alone in terms of accuracy and prediction ability.

In wastewater plant inspection, the main challenge of influent measurements in fault detection is the nonlinearity of the interference they are subjected to. Among the available models, the autoencoder (AE) has received extensive attention because of its data reconciliation ability to adapt to the nonlinear relationships and dynamics of the data. Researchers have improved on the AE, and these models were particularly effective in dealing with noisy data. For instance, Ba-Alawi et al. (2021) established a stacked denoising autoencoder (SDAE) as a denoising device, although the denoising method is mainly aimed at extracting good and useful features from input data. It naturally extends to the imputation of missing data, and Abiri et al. (2019) investigated the use of denoising autoencoders for single imputation of missing data to overcome limited data challenges. Due to the high cost of sensors and the frequency and delay of sampling and laboratory analysis, wastewater treatment process data may be sparse, and transfer learning can be used as an alternative solution to deal with data scarcity. For instance, Alvi et al. (2022a) used autoencoders to systematically enhance the target domain data so that the synthetic samples generated by them closely track the target domain distribution to improve the transfer learning performance of the task. However, data-driven multi-layer models take too long to train due to heavy computation. To overcome this problem, Safder et al. (2022) used extreme learning machine (ELM) technology to adjust the weights of the SAE in the training plan. The deep belief network technology is applied to the feature space extracted by ELM to classify the current state of the secondary clarifier and improve the operation and performance of the wastewater treatment plant through fast training time, generalization ability, and classification ability.
Table 6  Past research on WWTP WQ prediction using DL algorithms

Reference | Parameter | Model | Input data | Key contribution
Farhi et al. (2021) | Ammonia and nutrients | LSTM | Climate data, ammonia, nitrate, flow rates, rotors' water level depth, oxygen, and turbidity | The effluent concentration of ammonia NH4+ and nitrate NO3- can be predicted several hours in advance
Alvi et al. (2022b) | Ammonium and nitrate | GRUconv | 25,812 samples; DO, pH, temperature, TSS turbidity, and nutrient concentrations (Jul 25, 2020, to Jul 7, 2021) | The GRUconv model enabled performance gain up to 37% in RMSE over the closest competitor
Safder et al. (2022) | SVI | ELM-SAE-DBN | 1040 samples; COD, BOD, DO, influent flow, MLSS, cyanide | The model outperformed existing state-of-the-art methods in the SVI modeling task by 38 to 78%
Alvi et al. (2022a) | Nitrite and ammonia | AETL | 58,464 samples; pH, temperature, and DO at 15-min intervals | The proposed method used AE to systematically enhance the target domain data to improve the TL performance of the task
Wang et al. (2019b) | COD | CNN-LSTM | Wastewater temperature, pH, AN, sewage inflow, influent COD, and effluent COD at 1-min intervals | The hybrid CNN-LSTM prediction model had higher accuracy and better prediction performance than the stand-alone CNN or LSTM model
Li et al. (2021b) | COD and SS | CNN-LSTM-Attention | Influent COD and SS, flowrate, pH, temperature, DO, effluent COD, and SS | A hybrid CNN-LSTM-Attention model performed the best in COD and SS forecasting
Zhang et al. (2023) | COD, TN, and TP | LSTM | Ten WQIs (e.g., COD, TN, TP, pH, NH3-N, SS, T, σ, TU, DO, HACH, USA) and five meteorological indicators | The GSA based on SHAP was performed to identify the input variables contributing significantly to detection targets
Guang et al. (2019) | COD | Bi-LSTM, LSTM | Feed rate, aeration, effluent BOD, nitrate, SS, and COD | The LSTM has good performance in solving the soft measurement problem of COD
Niu et al. (2020) | COD and SS | GA-DBN | Feed rate, pH, temperature, DO, influent COD, and SS | The GA-DBN demonstrated lower errors with higher fitting than DBN and BPNN
Wang et al. (2021a) | Nutrients | DBN-EL | ORP, DO, influent TP, effluent TP, NH4, pH, TSS, nitrate, and temperature | The parameters update the weights only when positive events are triggered, and a DBNEL convergence analysis based on Markov process optimization is proposed
Ba-Alawi et al. (2021) | COD, TP, TN, and SS | SDAE | 720 samples; COD, TP, TN, pH, temperature, turbidity, and SS | The SDAE-based fault detection performance was superior to conventional methods with a detection rate of 98%
Dairi et al. (2019) | Influent conditions | RNN-RBM | Influent conditions data (September 1, 2010 to May 14, 2011) | The area under the curve of OCSVM based on RNN-RBM was as high as 0.98, which was superior to all other scenarios
Pang et al. (2019) | COD, TP, and NH4+-N | ASM2d-QL | COD, TP, NH4+-N, and MLSS | The algorithm provided successful intelligent modeling and stable optimal control strategies under fluctuating influent loads
Chen et al. (2021b) | DO and chemical dosage | MADRL | 10,000 sample data | The model has a high Q value for influents with appropriate control variables, learning abstract features from high-dimensional states
Niu et al. (2022) | Energy consumption (EC) and effluent quality (EQ) | DBN-DMOALO-PI | SO5, SNO2, TSSinf, BOD5, CODinf, TNinf, SNHinf | The problem that the objective function can overcome the dynamic characteristics of process data was solved
Ma et al. (2020) | BOD | DMF-DNN | 32,323 sets; temperature, pH, and DO | The DMF-DNN can solve sparse matrix problems more intelligently
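
The input-suppression idea attributed above to Alvi et al. (2022b) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy linear predictor standing in for a trained network, and the choice of replacing a suppressed input by its mean are all assumptions made for illustration. Each input is suppressed in turn, and the resulting increase in RMSE is taken as its importance.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def suppression_importance(predict, X, y):
    """Increase in RMSE when each input column is replaced by its mean."""
    base = rmse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        mean_j = sum(row[j] for row in X) / len(X)
        # Suppress feature j by holding it constant at its mean value.
        X_sup = [row[:j] + [mean_j] + row[j + 1:] for row in X]
        scores.append(rmse(y, [predict(row) for row in X_sup]) - base)
    return scores


def toy_model(row):
    """Hypothetical stand-in for a trained network (not a real WWTP model)."""
    return 3.0 * row[0] + 0.5 * row[1]  # the third input is ignored


random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]
scores = suppression_importance(toy_model, X, y)
print(scores)  # largest score for input 0, zero for the unused input 2
```

Because inputs are suppressed one at a time, a score of this kind cannot reflect interactions between variables, which is the limitation that motivates SHAP-style analyses.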

Overall, the core paradigm of water pollution control is shifting from pollutant removal to resource and energy utilization. Regarding the calculation and control of energy consumption in sewage treatment plants, Long et al. (2016) showed that a single key performance indicator (KPI) could not be applied universally. Therefore, Oulebsir et al. (2018) proposed a method based on different KPIs for multiple wastewater treatment plants at daily time steps. Hernandez-del-Olmo et al. (2016) pointed out that reinforcement learning (RL) can better control the DO set points of proportional-integral (PI) controllers that consider system state variables such as ammonium and DO concentrations, compared with manually operated and ammonium-based PI controllers. Other studies have tested and applied Q-learning-based RL methods. For example, Syafiie et al. (2011) used the RL method to optimize the use of Fenton reagent in controlling the advanced oxidation process of phenolic substances in a laboratory plant, thus improving the efficiency of the advanced oxidation process. Pang et al. (2019) optimized the hydraulic residence time of anaerobic and aerobic reactors with the RL method to achieve a more efficient wastewater treatment process. In addition to traditional RL, deep reinforcement learning (DRL) is obtained by combining DL and RL to deal with complex control problems. Chen et al. (2021b) used the multi-agent deep deterministic policy gradient (DDPG) method to control DO and chemical doses in wastewater treatment plants in continuous action and state spaces. In this approach, they tested various reward functions to develop sustainable control strategies. The DRL approach is considered to be better able to deal with the complexity and uncertainty of the system. To better deal with changes in wastewater composition during wastewater treatment, Dairi et al. (2019) proposed an RNN-RBM-based OCSVM model for data-driven anomaly detection. In addition, Niu et al. (2022) proposed a dynamic optimization control method based on multi-objective Antlion optimization, which can achieve simultaneous optimization of energy consumption and effluent quality, thus effectively reducing the cost of the wastewater treatment process and achieving carbon neutrality. At the same time, digital twin technologies such as virtual reality and augmented reality are also applied in wastewater treatment, using cutting-edge deep learning technology to achieve real-time simulation and synchronous control, to break through the technical problems of directional transfer and transformation of pollutants in water bodies. NNs are widely used in wastewater treatment, and emerging technologies and the combination of DL and bioinformatics offer more possibilities and efficiencies for wastewater treatment systems. However, how to improve the accuracy and interpretability of information mining and analysis is still the main problem at present.

Application of deep learning in water ecosystem

Water body detection and aquatic species identification, as well as wetland monitoring and protection, are essential to ensure the health of watershed ecosystems. Deep learning is gaining increasing attention in this area, both from a group and individual-level perspective. In this section, therefore, we discuss research using deep learning in this area, which we summarize in Table 7.

Identification of aquatic ecological species

The underwater environment is complicated, so it takes a lot of manpower and material resources to obtain paired underwater images. With the development of IoT devices, e.g., cameras and drones, large amounts of underwater image data can be collected for the identification and classification of animals and aquatic plants (Gray et al. 2019; Mittal et al. 2022). Traditional methods such as PCA and SVM are often used to extract features from aquatic animal images for target recognition and detection (Olden et al. 2008). However, due to the high heterogeneity of the water environment and the similarity of aquatic species, it is difficult to distinguish different aquatic organisms by external characteristics, which poses certain challenges for the body type recognition and detection of aquatic objects. Researchers have begun to use DL for high-resolution remote sensing image feature classification and change detection (Krizhevsky et al. 2017; Song et al. 2020), e.g., VGGNet (Muhammad et al. 2018), FCN (Schuegraf and Bittner 2019), ResNet (Zhu et al. 2021a), and U-Net (Sharma et al. 2018). Compared with the ML classification method, it can more effectively and adaptively learn recognition features from images through supervised learning (Shi et al. 2021). Water bodies are habitats for many biological species, and water extraction helps identify sites where bathymetric species may exist. Li et al. (2019a) used the FCN model to extract water bodies from high-resolution

remote sensing images of GF-2 under limited training samples. The results are significantly better than NDWI, SVM, and SM models. Due to the changing interference caused by multiple imaging conditions and complex land backgrounds, Zhang et al. (2021) proposed the cascaded fully convolutional network (CFCN) to improve the performance of water body detection in high-resolution SAR images. Furthermore, Wang et al. (2020b) used ResNet as an encoder to obtain advanced feature information of the input image. The features are then abstracted by residual convolution, and different levels of features are fused. Finally, a higher precision lake water body extraction map is obtained. DL improved network performance, but at the cost of greater consumption of computing resources and more complex neural network structures. Therefore, researchers built lightweight DNNs by using inverted residual structures to enhance gradient propagation, which first increased the dimension and then decreased the dimension to reduce the number of parameters and computational complexity while maintaining high image classification accuracy (Wang et al. 2021d).

To address the poor quality of underwater images, DL enhances the generalization ability through data enhancement and achieves higher target detection accuracy (Capinha et al. 2021; Christin et al. 2019; Deep et al. 2019). Zhang et al. (2020) proposed a stochastic gradient descent (SGD) preprocessing algorithm to improve the pixel quality and detection accuracy of images. Not all images can be preprocessed to improve their detection quality, so some researchers try to get better detection results by improving the model. For instance, Yang et al. (2023b) found that TL combined with multiple CNNs could effectively identify algal cell images and harmful phytoplankton. The results showed that the recall rate reached 98.0%, which confirmed the significant role of TL in improving the identification performance of harmful phytoplankton. Wang et al. (2021c) proposed an underwater image enhancement network (UIE-Net) based on CNN. The UIE-Net network is mainly composed of two subnetworks, namely the color correction network (CC-Net) and the contrast enhancement network (CE-Net). The convergence speed of the network is significantly improved by using the pixel interrupt strategy, and the color correction and fog removal of underwater images are realized by unified training. In addition, a region-based CNN is proposed to detect fish moving freely in an unconstrained underwater environment and to solve the difficult problem of fish identification through the auxiliary recognition method. Because underwater images are affected by wavelength-dependent light absorption and scattering, there will be serious color distortion and detail loss, which will seriously affect the detection and recognition of underwater objects. The emergence of GAN can solve this problem to some extent. The existing extended GAN-based image dataset not only expands the number of images but also improves the accuracy of recognition (Li et al. 2018; Wang et al. 2023). The continuous optimization of the DL algorithm improves the stability of the algorithm, and to a certain extent improves the search for real images from underwater. However, these methods ignore the diversity of underwater conditions. Wang et al. (2020a) proposed an attention generative adversarial network (CA-GAN) model to enhance underwater images by creating a many-to-one mapping function based on the attention mechanism. For problems such as complex numbers of organisms and biomass estimation, the experimental verification by Fabbri et al. (2018) proved that Cycle GAN is effective in underwater image enhancement tasks and solves the problem that the model needs to be trained on paired data. In recent years, the Cycle GAN has been widely used in underwater image preprocessing tasks. Based on Cycle GAN, Han et al. (2020) proposed an end-to-end spiral generative adversarial network (Spiral-GAN) for underwater image enhancement tasks to restore underwater images. The model has several convolution-deconvolution block generators that can retain more meaningful detail in the original underwater image. At the same time, a pixel-level loss function composed of mean square error and angle error is used to train the model stably to overcome the problem of overexposure and avoid color distortion.

This section mainly summarizes the research status of deep learning in the field of aquatic animal identification and detection. Although DL has achieved good results, in practice the adaptability and robustness of these methods are still severely lacking. A prior-driven underwater image enhancement model requires specific domain knowledge and can be invalidated when the assumptions do not match the actual scenario. On the other hand, for purely data-driven deep learning methods, the performance of the network is closely related to the quality and quantity of training data. Due to the complete reliance on data, when the domain gap between the training image and the test image is large, the model will fail on images with different color skewness and turbidity. Most of these studies have not considered these problems, which are the challenges faced by DL methods in the identification and detection of aquatic animals and are also the future research direction.

Overall management of wetland ecology

In the late 1970s, the concept of "ecosystem health" emerged internationally, which is a new goal of environmental management and ecosystem management. Human activity affects all ecosystems and their changes, and we believe that DL tools are an appropriate way to achieve these goals. For instance, DL tools can be applied to monitoring and assessment of ecosystem restoration processes to understand the ecological functions and effects of restoration of
Table 7  Past research on water ecosystem using DL algorithms

Reference | Application | Model | Input data | Key contribution
Li et al. (2019b) | Water body extraction | FCN | 10,000 VHR images | Introduced a flexible one-to-one convolution layer
Wang et al. (2020b) | Water body extraction | MSLWENet | 6774 images | By using residual convolution to extract features, the overall accuracy of the model is up to 98.53%
Wang et al. (2021d) | Water body extraction | MobileNetV2 | GF-2, WorldView-2, and UAV orthoimages | The water extraction accuracy of the MobileNetV2 model is the highest
Zhang et al. (2021) | Water body detection | CFCN | 12,796 image patches | A new variable focal loss (VFL) function was proposed and a frequency-dependent factor was used to replace the constant weight factor of focal loss
Zhang et al. (2021) | Sea cucumber detection | SGD | 120 underwater images of sea cucumbers | SGD effectively acquired image features to extend the accuracy of sea cucumber detection
Chen et al. (2023a) | Aquatic monitoring | YOLOv5s6 + DA | 3307 images | The combined F1 of the YOLOv5s6 model and data augmentation (DA) can reach 96.84%
Yang et al. (2023b) | Aquatic ecological monitoring | CNN-TL | 7859 phytoplankton images | Compared with the model without fine-tuning, the average accuracy of the CNN-TL model for automatic recognition of harmful phytoplankton images is improved by 11.9%
Wang et al. (2021c) | Underwater image enhancement | UIE-Net | 200 images from the NUS-8 data | Proposed two parallel subnetworks CC-Net and CE-Net to generate pixel-wise color cast and transmission map to enhance underwater images
Li et al. (2018) | Underwater image color correction | GAN | 3800 underwater images and 3800 air images | The first attempt to correct the color casts of underwater images by weakly supervised learning
Wang et al. (2023) | Underwater image enhancement | SA-GAN | 1000 real underwater images | Enhance underwater images by referencing paired raw and high-quality natural images
Wang et al. (2020a) | Underwater image enhancement | CA-GAN | 70,000 pairs images | The concurrent channel and spatial attention feature fusion module were introduced to recalibrate the front-end feature map and the back end feature map generated by the decoder layer
Fabbri et al. (2018) | Underwater image enhancement | Cycle GAN | 1000 collected real underwater images | Improved the quality of visual underwater scenes and restored underwater images to generate data sets
Han et al. (2020) | Underwater image enhancement | Spiral-GAN | URPC Dataset, EUVP Dataset, RUIE Dataset, Underwater-Mot Dataset | Proposed spiral training strategy can implicitly increase the training data, allowing the models to learn a more complex mapping
DeLancey et al. (2020a) | Wetland classification | CNN | Sentinel-1, Sentinel-2, ALOS, and DEM (2017 to 2018) | The CNN-generated wetland product proved to be more accurate than the XGBoost wetland product by 5%
Hosseiny et al. (2022) | Wetland classification | WetNet | Sentinel-1 and Sentinel-2 (GEE) | WetNet achieved better accuracy and processing time than other DL approaches
Du et al. (2020) | Forested wetland classification | U-Net | WorldView-3 (WV3), lidar, DEM, TWI | The integration of topographic metrics in DL model can improve the classification accuracy for depressional wetlands
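
NDWI, one of the baselines against which DL-based water body extraction is compared in this section, is commonly computed per pixel from green and near-infrared reflectance as (Green − NIR) / (Green + NIR), with positive values suggesting open water. The sketch below is illustrative only: the function names and reflectance values are hypothetical, and the threshold of 0 is a common starting point that usually needs tuning for each scene.

```python
def ndwi(green, nir):
    """Normalized difference water index for one pixel."""
    total = green + nir
    return (green - nir) / total if total else 0.0


def water_mask(green_band, nir_band, threshold=0.0):
    """Label a pixel as water when its NDWI exceeds the threshold."""
    return [
        [ndwi(g, n) > threshold for g, n in zip(g_row, n_row)]
        for g_row, n_row in zip(green_band, nir_band)
    ]


# Hypothetical 2 x 2 reflectance patches: left column water, right column land.
green = [[0.30, 0.10], [0.28, 0.08]]
nir = [[0.05, 0.40], [0.06, 0.35]]
mask = water_mask(green, nir)
print(mask)  # [[True, False], [True, False]]
```

A fixed index threshold of this kind has no notion of spatial context, which is one reason learned models can outperform it under complex backgrounds.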

restored ecosystems. As shown in Table 7, we can use the

• Predictions, which recall rate of 91% and accuracy


combination of Sentinel-1, and Sentinel-2, radar data, aerial

• The Nadam optimizer improved accuracy to more

• The Adam optimizer was used to train the model,


than 96% and reduced data loss to less than 0.12

and the feature extractor was combined with the


images, and satellite images to carry out large-scale monitor-
ing and assessment of wetlands, including wetland delinea-
tion, wetland vegetation types, and surface water dynamics.
DeLancey et al. (2020b) evaluated the effectiveness of CNNS
times that of the benchmark model
in wetland area classification. Compared with traditional ML,
• DeepNets performed wetland

CNN improves the discriminant ability of wetland classifi-

Swin Transformer classifier


cation by more than 10%. Hosseiny et al. (2022) proposed
the WetNet model, which was composed of three different
submodels, including several cyclic layers and convolution
Key contribution

layers, to improve the monitoring classification and perfor-


rate of 57%

mance of wetlands. Faced with the low availability of wetland


reference data for large-scale wetland monitoring, Jamali et al.
(2022) proposed to combine GAN and CNN into a new model
3DUNetGSFormer, using 3D GAN networks to generate syn-
thetic Sentinel-1/2 data for classes with limited training data.
Then, the real and synthetic data are passed to a CNN, and the results show that the accuracy and F1 score of the model are greatly improved compared with using real data alone.
The semantic segmentation method divides the coastal and wetland land cover areas of each superimposed feature on remote sensing images into separate layers. O'Neil et al. (2020) applied the deep learning architecture DeepNets to remote sensing data, with an input dataset composed of a high-resolution terrain index and the normalized difference vegetation index, for wetland semantic segmentation. The results show that the model can achieve high precision when trained and evaluated on a single site, and models trained across multiple sites can achieve similar accuracy. In addition, in forests with thousands of slight undulations, wetlands often occur where the forest canopy is submerged. Du et al. (2020) used the state-of-the-art U-Net semantic segmentation deep neural network to map the inundated area of forested wetlands, using LiDAR intensity images to infer wetland inundation. By adding terrain information to the pixel-based random forest output, the overall accuracy was slightly improved. Pham et al. (2022) developed BiSeNet, an advanced U-structured neural network that analyzes spatial and contextual information in two stages, allowing it to interpret 13 ecosystems in the estuarine region from images in less than 1 min. In addition to mapping species and areas of high value to ecosystems and conservation, DL can also be used to track the impact of human activities on ecosystems. For instance, YOLO combined with data augmentation techniques can monitor the effects of indirect factors such as temperature changes and wastewater discharge on river ecosystems with greater efficiency and accuracy, and such integrated algorithms in turn track changes in plankton populations more effectively (Chen et al. 2023b). DL can also use tracking information from industrial fishing vessels to map the fishery's footprint. This classification is essential for measuring the health and quality of water bodies and for protecting endangered species (Mittal et al. 2022).

Table 7  (continued)
Application | Model | Input data | Reference
Wetland identification | DeepNets | LiDAR DEM, NAIP | O'Neil et al. (2020)
Wetland mapping | 3DUNetGSFormer | Sentinel-1/2 (GEE) | Jamali et al. (2022)
Ecological succession | BiSeNet | Sentinel-2, ALOS-DEM, and NOAA-DEM images | Pham et al. (2022)
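The wetland studies above are compared through overall accuracy and F1. As a minimal sketch of how these two metrics can be computed pixel by pixel for a segmentation map, the example below uses only the Python standard library. The function names, the class coding (0 = upland, 1 = wetland, 2 = open water), and the tiny label maps are hypothetical illustrations and are not taken from any of the cited studies.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted class equals the reference class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), counted over pixels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the reference says otherwise
            fn[t] += 1  # reference class t was missed
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical flattened 3x3 label maps (assumed class coding, see lead-in).
reference = [0, 0, 1, 1, 1, 2, 2, 2, 2]
predicted = [0, 1, 1, 1, 0, 2, 2, 2, 1]

print(overall_accuracy(reference, predicted))
print(f1_per_class(reference, predicted))
print(macro_f1(reference, predicted))
```

In this toy case six of nine pixels agree, yet the per-class F1 scores differ markedly between classes. This is one reason studies with classes that have limited training data tend to report F1 alongside overall accuracy.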

Going one step further, we envision automated sensors that use deep learning to manage water ecosystems continuously and without much human intervention. Such systems can also help decision-makers make policy or management decisions about assessment services, which is expected to bring breakthroughs in the construction and overall management technology system of green watersheds.

Discussion and suggestions

The above review lists representative applications of DL in the field of water science. Most importantly, the scope of this review is extended to as many application types as possible, including water environmental resource management, water environmental protection, and optimized management of water ecological basins. To better use DL in water-related research, we compare and analyze the advantages and disadvantages of these DL algorithms in different research directions, and put forward suggestions for algorithm selection according to the applicability of the algorithms to different scenarios. The results are briefly shown in Table 8.

Table 8  Recommendations on the selection of DL algorithm in different research directions of water science
Means | Applications | Algorithm recommended | Data requirement | Algorithm characteristics | Applicable conditions
Supervised learning | Water demand forecasting; predicting water quality; flood forecasting; runoff forecasting | LSTM, RNN, GRU | Historical demand data, rainfall data, observed flow, and water depth data | Long-time memory; complex structure, black box, calculation burden | Data of time series
Supervised or semi-supervised learning | Flood prediction and monitoring; leakage detection and location; detection of risk substances and toxicity assessment | CNN, U-NET (e.g., R-CNNs and YOLO) | Urban catchment data, flow, CCTV video | Automatic feature extraction | The sample is presented in the form of images
Reinforcement learning | Water pollution treatment technology | DRL | Demand data, system state variables | Better generalization performance | As mentioned above
Unsupervised learning | Aquatic identification and detection | GAN | LiDAR data, satellite data, and radar | Adversarial training, generator and discriminator network structure, zero-sum game, loss function | Data with a small volume

In general, supervised learning usually requires a large amount of labeled data to train the model. RNN-related models are the most widely used algorithms for prediction tasks, but they still suffer from problems such as long training time and slow convergence. Because the data in many fields of water-related research are numerical time series, LSTM and GRU make up for these shortcomings of RNN and can better handle tasks with a large amount of data. For image recognition and classification tasks, CNN is better suited to visual processing. In these tasks, there may be some unlabeled data that can be used as part of semi-supervised learning to improve the performance of the model. However, a common problem with NN algorithms is that they are often black boxes and lack interpretability. In contrast, GANs do not require explicitly labeled data. They generate and discriminate data by training a generator model and a recognizer (discriminator) model in adversarial learning. Therefore, GANs are very useful when dealing with image distortion tasks that have well-defined features and need to generate understandable rules. In addition, deep reinforcement learning (DRL) can address issues such as long-term planning, maintenance, and management by seeking optimal decisions. In the field of water science, there are other algorithms and models such as Transformer and DBN, which offer different ways to solve various problems. It is very important to choose the appropriate algorithm and model according to the specific research task and data characteristics.

Future research challenges

DL algorithm application optimization

DL has gradually penetrated the field of water pollution treatment research, and many papers have accomplished the same task with almost the same methods and different
data. Scholars should focus on optimizing existing deep learning models to reduce training parameters, improve running speed, improve classification performance, etc., and develop more general algorithms and models according to water treatment and management requirements. For example, replacing the standard convolution with separable convolution can reduce the number of model parameters and improve the calculation speed of the model (Hu et al. 2019). Adding additional features to the model can improve classification and recognition performance (Cruz et al. 2017), while transfer learning and the capsule network (CapsNet) make up for the loss of spatial information (Sabour et al. 2017) and for CNN's low recognition ability between objects, better solving these two defects (Wang et al. 2019a).

Black box effect and algorithm interpretability

DL technology can sense, learn, act, and even make decisions autonomously. However, the effectiveness of the technology is mainly limited by the inability to explain the rationality of its analysis and decisions to users, to evaluate the advantages and disadvantages of its model, and to predict its universality in new tasks. It may not even be secure for future applications.
Interpretability of deep neural networks, especially in the form of "black box" models, refers to the ability to understand and explain the reasoning behind their predictions or decisions. Deep neural networks are often considered black box models because their internal operations and complex architectures make it challenging to interpret their decision-making process. While interpretability remains a significant challenge for deep neural networks, researchers have developed several techniques to shed light on their behavior. Here are some approaches used to enhance interpretability:
Feature importance: By analyzing the importance of input features in the network's decision-making process, it is possible to gain insights into which features are most influential. Techniques such as feature visualization and saliency mapping can help highlight the regions or attributes of the input that contribute most significantly to the output.
Layer visualization: Deep neural networks consist of multiple layers, and visualizing the activations of each layer can provide insights into the representations learned at different levels of abstraction. Techniques like activation maximization and deconvolutional networks allow researchers to generate visualizations that reveal the features learned by the network.
Grad-CAM: Gradient-weighted class activation mapping (Grad-CAM) is a technique that highlights the regions of an input image that contribute the most to a specific class prediction. By analyzing the gradients flowing into the final convolutional layer, Grad-CAM generates a heatmap indicating the importance of each spatial location. On this basis, Jalwana et al. (2021) introduced CAMERAS to calculate accurate saliency maps using gradient backpropagation strategies, avoiding problems with external factors such as heuristics, priors, and thresholds, and allowing for better interpretation of deep visual model predictions.
Model distillation: Distillation is a process of training a smaller, more interpretable model to mimic the behavior of a larger, complex model. By transferring knowledge from the black box model to the smaller model, the resulting model can be more interpretable while maintaining similar performance.
Rule extraction: Rule extraction methods aim to extract human-understandable rules or decision trees that mimic the behavior of the deep neural network. These rules provide explanations for the model's predictions by explicitly stating the conditions under which specific outcomes are predicted.
Layer-wise relevance propagation (LRP): LRP is a technique that aims to attribute the prediction of a deep neural network to the input features by redistributing the prediction score back to the input. LRP assigns relevance values to each input feature, highlighting their contribution to the final prediction.
Network dissection: Network dissection involves analyzing the responses of intermediate network layers to identify whether they correspond to specific semantic concepts or object classes. This technique provides insights into the interpretability of deep neural networks by revealing the types of concepts the network has learned.
SHapley Additive exPlanations (SHAP): SHAP is a game theory-based method that assigns a "fair" value to each feature, and it supports both global analysis of estimates and local analysis of detailed instances. Wang et al. (2022) used the SHAP method to explain the output of a DL model in order to understand the effect of the upstream reaches of a river on the estuary. Their explainable AI (XAI) analysis showed that SHAP helped to understand the direction and extent of the influence of input covariates on estuarine water quality. It can also be used to quantify the contribution of various characteristics (e.g., rainfall, soil moisture, and temperature) to the prediction of water flow or level.
Local interpretable model-agnostic explanations (LIME): LIME is an algorithm that provides local explanations for complex models by locally approximating them with interpretable models. It modifies individual data samples by adjusting their feature values and analyzes the impact on the output. It acts as an "interpreter," interpreting the prediction for each data sample.
It is suggested that research on model interpretability be accelerated in the future, including breakthroughs in Sobol sensitivity analysis, SHAP value calculation, and other new technologies, and that the
development of AI system construction theory and evaluation methods for water and environment should be accelerated.

Data validity and standardization

For most countries and regions, basic data such as the water quantity and quality of urban water treatment systems generally rely on manual recording, and the immediacy and effectiveness of the data are poor. However, the migration and transformation of pollutants in the water system change rapidly, and it is difficult to reflect the immediate situation of the water system by relying only on manually recorded data. If a DL algorithm is trained on this basis, the result will inevitably deviate greatly from the real situation, resulting in poor prediction performance.
At the algorithmic level, the traditional DL model is limited by the insufficient amount of data and cannot effectively capture the dynamic changes of pollutants in the water system. Transfer learning provides a feasible solution to the problem of limited data. We can adopt transfer learning to alleviate the problem of insufficient samples, although this approach is still in its infancy. Recently, Alvi et al. (2022a) proposed the autoencoder transfer learning (AETL) method, which used a deep model to expand the limited target domain data. Further, by using Markov chain processes and random walks to augment the actual data of the task, simulated data generation can produce training samples that are close to the actual situation, and the model can then be fine-tuned on the limited actual data at a later stage to improve the accuracy and generalization ability of its predictions (Alvi et al. 2023b). In terms of hardware facilities, the long-term development needs of urban water treatment systems have promoted the development of online monitoring and sensing technology. In the future, the construction and development of online data monitoring and sensing technologies in the field of water and environment should be further strengthened, and the standardization of data quality, interfaces, and protocols should be implemented. At the same time, at the data management level, standard datasets similar to ImageNet (http://www.image-net.org/) in the field of machine learning should be further developed, and existing datasets such as CAMELS should be extended. Building a database of water quality, hydrological, and meteorological data with consistent data sources, uniform format, high quality, and coverage of different regions and periods can provide researchers with adequate data support. In addition, strengthening the culture and institutions of data sharing can encourage all participants to contribute and use these valuable resources, thus forming a mutually beneficial data ecosystem.
In short, only through the comprehensive improvement of algorithm innovation, hardware improvement, and data management can we break through the constraints of the relatively small amount of available data in water systems and develop and apply DL algorithms based on small data, given that there are still differences in the development level and in the human and material input of different countries and regions. In this way, more accurate and reliable forecasting and management of urban water treatment systems can finally be achieved.

Digital twins and autonomous systems

A digital twin is a virtual copy of a real-world physical entity created through a digital model, which is used to understand, learn, and predict the behavior of the physical system. By integrating multi-disciplinary, multi-physical, and multi-scale information, digital twins can simulate complex problems such as river basins, hydrological cycles, and water quality transmission. They present engineering health data and predict changes in physical systems. Digital twins can predict floods and droughts in advance, develop effective flood control and drought relief plans, and optimize water resource regulation. Through an intelligent application platform, digital twin technology can simulate and preview flood control and regulation schemes; realize the coordinated scheduling of hydrology, water conservancy, and environment; and improve the management and operation capacity of river basin water resources. However, digital twins face challenges such as data acquisition and quality, complexity and multi-scale issues, model accuracy and predictive power, system security and privacy, and technology integration and application implementation. Most of the existing research is based on physical models, and few studies focus on integrating DL components. To fully leverage the potential of digital twins, future advances in deep learning applications will likely combine CNN and LSTM networks to understand and predict system behavior, and use DRL to determine the best interventions. In this way, an analysis and decision-making system can be constructed for the whole process of the water cycle, from meteorological rainfall to surface runoff production to confluence to river course evolution. Only by overcoming these challenges can digital twins truly realize their value in the field of water science, providing innovative solutions for water management and conservation.

The interdisciplinarity and human dynamic challenge

In addition, researchers, engineers, and managers with backgrounds in disciplines related to water science often do not have relevant knowledge and technical experience in the field of artificial intelligence, so that the practical value of artificial intelligence technology is not fully
utilized. To give full play to the advantages of interdisciplinary work, it is necessary to cultivate interdisciplinary talents with knowledge in different fields. These talents will combine water science and deep learning techniques to develop more advanced technologies and apply them to real engineering practices. This will help solve some problems that cannot be solved by existing methods, provide more effective solutions, and create more efficient and sustainable management methods in the water environment sector.

Emerging AI tools

With the introduction of ChatGPT, the amount of artificial intelligence-generated content (AIGC) exploded, especially in academia, where ChatGPT became a useful tool for researchers (Vaishya et al. 2023). Sandra Mitrović (2023) performed text prediction based on the Transformer model and used SHAP to explain the model. Although the text generated by ChatGPT is polite, lacks specific details, and is more objective, the reported results still show a high accuracy rate. ChatGPT can complete 80–90% of code writing tasks, and it can retrieve multiple data sources well; for example, in the ecological field, plant traits, species distribution areas, and meteorological data can be obtained simultaneously (Merow et al. 2023). However, we must be aware of some limitations and challenges of ChatGPT. It may inadvertently generate false or misleading information, leading to misdirection and misunderstanding. ChatGPT lacks common sense and background knowledge and can only generate responses based on the information provided, which can result in inaccurate or incomplete responses. Therefore, careful verification and calibration are still required when using content generated by ChatGPT. Especially in complex fields (e.g., hydrology), the involvement of professionals is very important.

Technical and socioeconomic

There are certain limitations in the structure, operational model, and possibilities of traditional urban water supply systems. The water supply sector is moving in the direction of alternative solutions, and this progressive transformation of traditional systems faces realistic challenges. Firstly, it is necessary to consider cost-effectiveness. DL algorithms need a large amount of data for training to achieve the expected accuracy, the verification process may be very time-consuming, the computational cost of the training process may be high, and there is a great deal of uncertain information in engineering practice. It is unknown whether the practical application value and effect of DL can make up for these costs. This is an open challenge for many research institutions and enterprises. The design, construction, and operation of DL systems require advanced technologies and equipment, such as wireless communication, automation, remote sensing, monitoring, and control. The promotion and popularization of these technologies may face problems such as high investment costs and inconsistent technical standards. In addition, the sustainability of use and maintenance needs to be taken into account when implementing a DL system. For instance, DL systems depend on the operation and maintenance of equipment, which can introduce additional cost and management challenges if equipment fails or needs to be replaced due to technological updates. At the same time, the transition from a traditional centralized system to a distributed system may involve conflicts of interest and the loss of vested interests, which need to be properly handled.
From a technical and socioeconomic perspective, these aspects are intertwined and interdependent, and all three need to be transformed and co-developed over time to form new and stable integrated simulations of how the natural and socioeconomic systems feed each other, so as to continue to provide safe and reliable services while addressing emerging challenges.

Conclusion

This review covers the importance of DL models in numerous instances related to water resources, water environment, and water ecology. Firstly, this paper introduces the development history of DL networks, and then introduces the popular models in the field of water science, such as CNN, LSTM, GRU, and GAN, to compare and contrast the characteristics, advantages, and disadvantages of different deep learning models. With respect to the specific applications, the literature review indicates that RNN is more suitable for solving time series problems, and that its evolved variant, LSTM, shows more satisfactory performance in long-term prediction. RNN is widely used in the prediction of water quality, runoff, hydrology, and other parameters. CNN is better at image processing tasks, such as water environment remote sensing and image classification. The unsupervised learning algorithm GAN, which has emerged in recent years, has good applications in underwater biological image recognition because of its excellent generative ability. AE is more suitable for handling nonlinearity and noise problems in capturing training wastewater data. Many studies have shown that various hybridization techniques can improve the performance of DL models. These hybridization techniques come in various forms, including combinations of different foundational DL models (e.g., CNN-LSTM models), attention-based hybridization (e.g., attention-GRU model), statistics-based hybridization (e.g., SARIMA-LSTM model), and TL hybridization
(e.g., TL-LSTM model). According to the results of the existing literature, hybrid DL models were superior to traditional DL models in terms of accuracy and fitting. This may be because hybrid DL models perform well in a variety of specific tasks, including feature extraction, trend learning, and data manipulation. Further, we also hope to combine DL with physical models such as CFD to simulate the hydrodynamic migration of water quality pollutants. Most importantly, the scope of this review is extended to as many areas of water science as possible, including water quality prediction, water environmental toxicology, water drainage distribution systems, hydrology and water resources, wetland classification, aquatic image recognition, water pollution treatment, and other tasks, which distinguishes our research from previous reviews on DL models and water science. For each subfield, the DL model is discussed in specific areas, focusing on promoting the transformation of water ecological environment protection from pollution prevention to systematic management and the overall promotion of water resources, water environment, water ecology, and other elements.
In addition, it is recommended to further study the seven major challenges, namely, algorithmic optimization development, interpretability and credibility, data validity and standardization, interdisciplinarity and human dynamics, the rise of AI tools, technical and socioeconomic factors, and digital twins, to shape water-environment integration innovation and collaborative development driven by deep learning technologies. We hope that this review will inspire people to think and act on future research and applications, harness the power of deep learning to help digitize water environmental systems, and inspire more researchers to join the water-smart community and vigorously advance the conservation and construction of beautiful rivers and lakes to revolutionize water research and practice.

Acknowledgements The authors thank all members of for their friendly cooperation in completing this study.

Author contribution Xiaohua Fu: initial idea and conceptualization, investigation, and visualization. Jie Jiang: methodology, writing – original draft, and writing – review and editing. Xie Wu: methodology and formal analysis. Lei Huang: resources, formal analysis, writing – review and editing, and conceptualization. Rui Han: reviewing and writing – review and editing. Kun Li: reviewing and writing – review and editing. Chang Liu: reviewing and writing – review and editing. Nesma Talaat Abbas Mahmoud: data curation, software, reviewing, and writing – literature review. Jianyu Chen: reviewing and writing – review and editing. Zhenxing Wang: supervision, funding acquisition, project administration, and review and editing. Kallol Roy: conceptualization, validation, review and editing, and supervision.

Funding This work was financially supported by the Major Science and Technology Program for Water Pollution Control and Treatment (2017ZX07101003), National Key Research and Development Project (2019YFC1804800), the Science and Technology Program of Guangdong Forestry Administration (2020-KYXM-08), Pearl River S&T Nova Program of Guangzhou, China (No. 201710010065), Youth Foundation of SCIES (PM-zx097-202304-147), and European Social Fund via IT Academy Program.

Data availability No data, material, or code is available associated with the manuscript.

Declarations

Ethics approval Not applicable.

Consent to participate Not applicable.

Consent to publication This article has not been published before and its publication has been approved by all co-authors.

Competing interests The authors declare no competing interests.

References

Abiri N, Linse B, Edén P, Ohlsson M (2019) Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems. Neurocomputing 365:137–146. https://doi.org/10.1016/j.neucom.2019.07.065
Addor N, Newman AJ, Mizukami N, Clark MP (2017) The CAMELS data set: catchment attributes and meteorology for large-sample studies. Hydrol Earth Syst Sci 21(10):5293–5313. https://doi.org/10.5194/hess-21-5293-2017
Alvi M, Cardell-Oliver R, French T (2022a) Utilizing autoencoders to improve transfer learning when sensor data is sparse, 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (BuildSys). Assoc Computing Machinery, Boston, MA, pp 500–503. https://doi.org/10.1145/3563357.3567407
Alvi M, French T, Cardell-Oliver R, Keymer P, Ward A (2022b) Cost effective soft sensing for wastewater treatment facilities. IEEE Access 10:55694–55708. https://doi.org/10.1109/access.2022.3177201
Alvi M, Batstone D, Mbamba CK, Keymer P, French T, Ward A, Dwyer J, Cardell-Oliver R (2023a) Deep learning in wastewater treatment: a critical review. Water Res 245. https://doi.org/10.1016/j.watres.2023.120518
Alvi M, French T, Cardell-Oliver R, Batstone D, Akhtar N (2023b) Enhanced deep predictive modelling of wastewater plants with limited data. IEEE Trans Industr Inform, pp 1–11. https://doi.org/10.1109/tii.2023.3281835
Antanasijevic D, Pocajt V, Povrenovic D, Peric-Grujic A, Ristic M (2013) Modelling of dissolved oxygen content using artificial neural networks: Danube River, North Serbia, case study. Environ Sci Pollut Res 20(12):9006–9013. https://doi.org/10.1007/s11356-013-1876-6
Ba-Alawi AH, Vilela P, Loy-Benitez J, Heo S, Yoo C (2021) Intelligent sensor validation for sustainable influent quality monitoring in wastewater treatment plants using stacked denoising autoencoders. J Water Process Eng 43:16. https://doi.org/10.1016/j.jwpe.2021.102206
Baek SS, Pyo J, Chun JA (2020) Prediction of water level and water quality using a CNN-LSTM combined deep learning approach. Water 12(12):13. https://doi.org/10.3390/w12123399
Ballard DH (1987) Modular learning in neural networks. In: Proceedings of the sixth national conference on artificial intelligence-vol 1, AAAI'87. AAAI Press, Seattle, Washington, pp 279–284
Bartos M, Kerkez B (2021) Pipedream: an interactive digital twin model for natural and urban drainage systems. Environ Modell Softw 144:11. https://doi.org/10.1016/j.envsoft.2021.105120
Buonocore E, Mellino S, De Angelis G, Liu GY, Ulgiati S (2018) Life cycle assessment indicators of urban wastewater and sewage sludge treatment. Ecol Indic 94:13–23. https://doi.org/10.1016/j.ecolind.2016.04.047
Capinha C, Ceia-Hasse A, Kramer AM, Meijer C (2021) Deep learning for supervised classification of temporal data in ecology. Ecol Inform 61:9. https://doi.org/10.1016/j.ecoinf.2021.101252
Castangia M, Grajales LMM, Aliberti A, Rossi C, Macii A, Macii E, Patti E (2023) Transformer neural networks for interpretable flood forecasting. Environ Modell Softw 160:9. https://doi.org/10.1016/j.envsoft.2022.105581
Chau KW (2006) A review on integration of artificial intelligence into water quality modelling. Mar Pollut Bull 52(7):726–733. https://doi.org/10.1016/j.marpolbul.2006.04.003
Chen YY, Cheng QQ, Cheng YJ, Yang H, Yu HH (2018) Applications of recurrent neural networks in environmental factor forecasting: a review. Neural Comput 30(11):2855–2881. https://doi.org/10.1162/neco_a_01134
Chen KY, Chen HX, Zhou CL, Huang YC, Qi XY (2020) Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data. Water Res 171:10. https://doi.org/10.1016/j.watres.2019.115454
Chen C, Hui Q, Xie WX, Wan SH (2021a) Convolutional neural networks for forecasting flood process in internet-of-things enabled smart city. Comput Netw 186:12. https://doi.org/10.1016/j.comnet.2020.107744
Chen KH, Wang HC, Valverde-Perez B, Zhai SY (2021b) Optimal control towards sustainable wastewater treatment plants based on multi-agent reinforcement learning. Chemosphere 279:12. https://doi.org/10.1016/j.chemosphere.2021.130498
Chen Z, Xu H, Jiang P, Yu SN, Lin G (2021c) A transfer learning-based LSTM strategy for imputing large-scale consecutive missing data and its application in a water quality prediction system. J Hydrol 602:16. https://doi.org/10.1016/j.jhydrol.2021.126573
Chen Z, Du M, Yang XD, Chen W, Li YS (2023a) Deep-learning-based automated tracking and counting of living plankton in natural aquatic environments. Environ Sci Technol 10. https://doi.org/10.1021/acs.est.3c00253
Chen Z, Du M, Yang XD, Chen W, Li YS, Qian C, Yu HQ (2023b) Deep-learning-based automated tracking and counting of living plankton in natural aquatic environments. Environ Sci Technol 10. https://doi.org/10.1021/acs.est.3c00253
Cho K, Van Merriënboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259
Cho K, Kim Y (2022) Improving streamflow prediction in the WRF-Hydro model with LSTM networks. J Hydrol 605:12. https://doi.org/10.1016/j.jhydrol.2021.127297
Deep BV, Dash R, IEEE (2019) Underwater fish species recognition using deep learning techniques, 6th International Conference on Signal Processing and Integrated Networks (SPIN). IEEE, Noida, India, pp 665–669
DeLancey ER, Simms JF, Mahdianpari M, Brisco B (2020a) Comparing deep learning and shallow learning for large-scale wetland classification in Alberta, Canada. Remote Sens 12(1):20. https://doi.org/10.3390/rs12010002
DeLancey ER, Simms JF, Mahdianpari M, Brisco B, Mahoney C, Kariyeva J (2020b) Comparing deep learning and shallow learning for large-scale wetland classification in Alberta, Canada. Remote Sens 12(1):20. https://doi.org/10.3390/rs12010002
Dodangeh E, Choubin B, Eigdir AN, Nabipour N (2020) Integrated machine learning methods with resampling algorithms for flood susceptibility prediction. Sci Total Environ 705:13. https://doi.org/10.1016/j.scitotenv.2019.135983
Dong GQ, Wang N, Xu T, Liang JY, Qiao RX, Yin DQ, Lin SJ (2023) Deep learning-enabled morphometric analysis for toxicity screening using zebrafish larvae. Environ Sci Technol 12. https://doi.org/10.1021/acs.est.3c00593
Du L, McCarty GW, Zhang X, Lang MW, Vanderhoof MK, Li X, Huang CQ, Lee S, Zou ZH (2020) Mapping forested wetland inundation in the Delmarva Peninsula, USA using deep convolutional neural networks. Remote Sens 12(4):19. https://doi.org/10.3390/rs12040644
Du XD, Cai YH, Wang S, Zhang LJ, IEEE (2016) Overview of deep learning, 31st Youth Academic Annual Conference of Chinese-Association-of-Automation (YAC). IEEE, Wuhan, Peoples R China, pp 159–164
Du BG, Zhou QL, Guo J, Guo SS, Wang L (2021) Deep learning with long short-term memory neural networks combining wavelet transform and principal component analysis for daily urban water demand forecasting. Expert Syst Appl 171. https://doi.org/10.1016/j.eswa.2021.114571
Ehteram M, Ahmed AN, Khozani ZS, El-Shafie A (2023) Convolutional neural network-support vector machine model-gaussian process regression: a new machine model for predicting monthly and daily rainfall. Water Resour Manag 25. https://doi.org/10.1007/s11269-023-03519-8
Elman JL (1990) Finding structure in time. Cogn Sci 14(2):179–211. https://doi.org/10.1207/s15516709cog1402_1
Fabbri C, Islam MJ, Sattar J, IEEE (2018) Enhancing underwater imagery using generative adversarial networks, IEEE International Conference on Robotics and Automation (ICRA). IEEE Computer Soc, Brisbane, Australia, pp 7159–7165
Fang ZC, Wang Y, Peng L, Hong HY (2021) Predicting flood susceptibility using LSTM neural networks. J Hydrol 594:20. https://doi.org/10.1016/j.jhydrol.2020.125734
Farhi N, Kohen E, Mamane H, Shavitt Y (2021) Prediction of wastewater treatment quality using LSTM neural network. Environ Technol Innov 23:12. https://doi.org/10.1016/j.eti.2021.101632
Christin S, Hervet E, Lecomte N (2019) Applications for deep learning Fu G, Jin Y, Sun S, Yuan Z, Butler D (2022) The role of deep learning
in ecology. Methods Ecol Evol 10(10):1632–1644. https://​doi.​ in urban water management: a critical review. Water Res 223.
org/​10.​1111/​2041-​210x.​13256 https://​doi.​org/​10.​1016/j.​watres.​2022.​118973
Cruz AC, Luvisi A, De Bellis L, Ampatzidis Y (2017) X-FIDO: an Fukushima K (2013) Artificial vision by multi-layered neural networks:
effective application for detecting olive quick decline syndrome neocognitron and its advances. Neural Netw 37:103–119. https://​
with deep learning and data fusion. Front Plant Sci 8:12. https://​ doi.​org/​10.​1016/j.​neunet.​2012.​09.​016
doi.​org/​10.​3389/​fpls.​2017.​01741 Fulcher BD, Little MA, Jones NS (2013) Highly comparative time-
Cui Z, Guo SL, Zhou YL, Wang J (2023) Exploration of dual-attention series analysis: the empirical structure of time series and their
mechanism-based deep learning for multi-step-ahead flood prob- methods. J R Soc Interface 10(83):12. https://​doi.​org/​10.​1098/​
abilistic forecasting. J Hydrol 622:15. https://​doi.​org/​10.​1016/j.​ rsif.​2013.​0048
jhydr​ol.​2023.​129688 Gao S, Huang YF, Zhang S, Han JC, Wang GQ, Zhang MX, Lin QS
Dairi A, Cheng TY, Harrou F, Sun Y, Leiknes T (2019) Deep learn- (2020) Short-term runoff prediction with GRU and LSTM net-
ing approach for sustainable WWTP operation: a case study on works without requiring time step optimization during sample
data-driven influent conditions monitoring. Sust Cities Soc 50:9. generation. J Hydrol 589:11. https://​doi.​org/​10.​1016/j.​jhydr​ol.​
https://​doi.​org/​10.​1016/j.​scs.​2019.​101670 2020.​125188
Environmental Science and Pollution Research

Gharakhanlou NM, Perez L (2023) Flood susceptible prediction Hu CH, Wu Q, Li H, Jian SQ, Li N, Lou ZZ (2018) Deep learning with
through the use of geospatial variables and machine learning a long short-term memory networks approach for rainfall-runoff
methods. J Hydrol 617:20. https://​doi.​org/​10.​1016/j.​jhydr​ol.​ simulation. Water 10(11):16. https://d​ oi.o​ rg/1​ 0.3​ 390/w
​ 10111​ 543
2023.​129121 Hu GS, Yang XW, Zhang Y, Wan MZ (2019) Identification of tea
Gong SM, Ball J, Surawski N (2022) Urban land-use land-cover extrac- leaf diseases by using an improved deep convolutional neural
tion for catchment modelling using deep learning techniques. network. Sust Comput 24:8. https://​doi.​org/​10.​1016/j.​suscom.​
J Hydroinform 24(2):388–405. https://​doi.​org/​10.​2166/​hydro.​ 2019.​100353
2022.​124 Hu SS, Chen P, Gu PY, Wang B (2020) A deep learning-based chemi-
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, cal system for QSAR prediction. IEEE J Biomed Health Inform
Ozair S, Courville A, Bengio Y (2014) Generative adversarial 24(10):3020–3028. https://​doi.​org/​10.​1109/​jbhi.​2020.​29770​09
nets. Proceedings of the 27th International Conference on Neural Huang JD, Huang Y, Hassan SG, Xu LQ, Liu SY (2021) Dissolved oxy-
Information Processing Systems - Volume 2. MIT Press, Mon- gen content interval prediction based on auto regression recurrent
treal, Canada, pp 2672–2680 neural network. J Ambient Intell Humaniz Comput 10. https://​
Gray PC, Fleishman AB, Klein DJ, McKown MW, Bezy VS (2019) A doi.​org/​10.​1007/​s12652-​021-​03579-x
convolutional neural network for detecting sea turtles in drone Jalwana M, Akhtar N, Bennamoun M, Mian A (2021) CAMERAS:
imagery. Methods Ecol Evol 10(3):345–355. https://​doi.​org/​10.​ enhanced resolution and sanity preserving class activation map-
1111/​2041-​210x.​13132 ping for image saliency, IEEE/CVF Conference on Computer
Green AJ, Mohlenkamp MJ, Das J, Chaudhari M, Truong L, Tanguay Vision and Pattern Recognition (CVPR). IEEE Conference on
RL, Reif DM (2021) Leveraging high-throughput screening data, Computer Vision and Pattern Recognition. Ieee Computer Soc,
deep neural networks, and conditional generative adversarial Electr Network, pp 16322–16331. https://​doi.​org/​10.​1109/​cvpr4​
networks to advance predictive toxicology. PLoS Comput Biol 6437.​2021.​01606
17(7):16. https://​doi.​org/​10.​1371/​journ​al.​pcbi.​10091​35 Jamali A, Mahdianpari M, Brisco B, Mao DH, Salehi B, Mohammadi-
Guang H, Tong B, Li L, Sun XY (2019) Chemical Oxygen Demand manesh F (2022) 3DUNetGSFormer: a deep learning pipeline
Soft-Measurement Method via Long Short-Term Memory Net- for complex wetland mapping using generative adversarial net-
work, Chinese Automation Congress (CAC). Chinese Automa- works and Swin transformer. Ecol Inform 72:11. https://​doi.​org/​
tion Congress. Ieee, Hangzhou, PEOPLES R CHINA, pp 4668– 10.​1016/j.​ecoinf.​2022.​101904
4672. https://​doi.​org/​10.​1109/​cac48​633.​2019.​89974​63 Jamei M, Ali M, Malik A, Karbasi M, Rai P (2023) Development of
Guo GC, Liu SM, Wu YP, Li JY (2018) Short-term water demand a TVF-EMD-based multi-decomposition technique integrated
forecast based on deep learning method. J Water Resour Plann with Encoder-Decoder-Bidirectional-LSTM for monthly rain-
Manage 144(12). https://​doi.​org/​10.​1061/​(asce)​wr.​1943-​5452.​ fall forecasting. J Hydrol 617:21. https://​doi.​org/​10.​1016/j.​jhydr​
00009​92 ol.​2023.​129105
Han H, Morrison RR (2022) Improved runoff forecasting performance Jehanzaib M, Ajmal M, Achite M, Kim TW (2022) Comprehensive
through error predictions using a deep-learning approach. J review: advancements in rainfall-runoff modelling for flood miti-
Hydrol 608:13. https://​doi.​org/​10.​1016/j.​jhydr​ol.​2022.​127653 gation. Climate 10(10):17. https://​doi.​org/​10.​3390/​cli10​100147
Han RY, Guan Y, Yu ZB, Liu P, Zheng HY (2020) Underwater image Jiang W (2018) Object-based deep convolutional autoencoders for
enhancement based on a spiral generative adversarial framework. high-resolution remote sensing image classification. J Appl
IEEE Access 8:218838–218852. https://​doi.​org/​10.​1109/​access.​ Remote Sens 12(3):1
2020.​30412​80 Jiang YQ, Li CL, Sun L, Guo D (2021) A deep learning algorithm for
Heo S, Safder U, Yoo C (2019) Deep learning driven QSAR model for multi-source data fusion to predict water quality of urban sewer
environmental toxicology: effects of endocrine disrupting chemi- networks. J Clean Prod 318:10. https://​doi.​org/​10.​1016/j.​jclep​
cals on human health. Environ Pollut 253:29–38. https://​doi.​org/​ ro.​2021.​128533
10.​1016/j.​envpol.​2019.​06.​081 Kabir S, Patidar S, Xia XL, Liang QH (2020) A deep convolutional neu-
Herbert ZC, Asghar Z, Oroza CA (2021) Long-term reservoir inflow ral network model for rapid prediction of fluvial flood inundation.
forecasts: enhanced water supply and inflow volume accuracy J Hydrol 590:16. https://​doi.​org/​10.​1016/j.​jhydr​ol.​2020.​125481
using deep learning. J Hydrol 601:16. https://​doi.​org/​10.​1016/j.​ Karim F, Armin MA, Ahmedt-Aristizabal D, Tychsen-Smith L, Peters-
jhydr​ol.​2021.​126676 son L (2023) A review of hydrodynamic and machine learn-
Hernández-del-Olmo F, Gaudioso E, Dormido R, Duro N (2016) ing approaches for flood inundation modeling. Water 15(3):21.
Energy and environmental efficiency for the N-ammonia removal https://​doi.​org/​10.​3390/​w1503​0566
process in wastewater treatment plants by means of reinforce- Kavya M, Mathew A, Shekar PR, Sarwesh P (2023) Short term water
ment learning. Energies 9(9):17. https://​doi.​org/​10.​3390/​en909​ demand forecast modelling using artificial intelligence for smart
0755 water management. Sust Cities Soc 95:22. https://​doi.​org/​10.​
Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov 1016/j.​scs.​2023.​104610
RR (2012) Improving neural networks by preventing co-adapta- Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classifica-
tion of feature detectors, pp 212–223. https://​doi.​org/​10.​48550/​ tion with deep convolutional neural networks. Commun ACM
arXiv.​1207.​0580 60(6):84–90. https://​doi.​org/​10.​1145/​30653​86
Hochreiter S (1998) The vanishing gradient problem during learning Kumar SS, Wang MZ, Abraham DM, Jahanshahi MR, Iseley T, Cheng
recurrent neural nets and problem solutions. Int J Uncertain Fuzz JCP (2020) Deep learning-based automated detection of sewer
Knowl-Based Syst 06(2) defects in CCTV videos. J Comput Civil Eng 34(1):13. https://​
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural doi.​org/​10.​1061/​(asce)​cp.​1943-​5487.​00008​66
Comput 9(8):1735–1780. https://​doi.​org/​10.​1162/​neco.​1997.9.​ Kumar R, Khan FU, Sharma A, Siddiqui MH, Aziz IBA, Kamal MA,
8.​1735 Ashraf GM, Alghamdi BS, Uddin MS (2021) A deep neural
Hosseiny B, Mahdianpari M, Brisco B, Mohammadimanesh F, Salehi network-based approach for prediction of mutagenicity of com-
B (2022) WetNet: a spatial-temporal ensemble deep learning pounds. Environ Sci Pollut Res 28(34):47641–47650. https://d​ oi.​
model for wetland classification using Sentinel-1 and Sentinel-2. org/​10.​1007/​s11356-​021-​14028-9
IEEE Trans Geosci Remote Sensing 60:14. https://​doi.​org/​10.​ Kumar L, Afzal MS, Ahmad A (2022) Prediction of water turbidity in
1109/​tgrs.​2021.​31138​56 a marine environment using machine learning: a case study of
Environmental Science and Pollution Research

Hong Kong. Reg Stud Mar Sci 52:14. https://​doi.​org/​10.​1016/j.​ networks. Autom Constr 104:281–298. https://d​ oi.o​ rg/1​ 0.1​ 016/j.​
rsma.​2022.​102260 autcon.​2019.​04.​013
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature Merow C, Serra-Diaz JM, Enquist BJ, Wilson AM (2023) AI chatbots
521(7553):436–444. https://​doi.​org/​10.​1038/​natur​e14539 can boost scientific coding. Nat Ecol Evol 3. https://​doi.​org/​10.​
Lee S, Lee D (2018) Improved prediction of harmful algal blooms 1038/​s41559-​023-​02063-3
in four major south korea’s rivers using deep learning models. Mill L, Wolff D, Gerrits N, Philipp P, Kling L, Vollnhals F, Ignatenko
Int J Environ Res Public Health 15(7):15. https://​doi.​org/​10.​ A, Jaremenko C, Huang YX, De Castro O, Audinot JN, Nelissen
3390/​ijerp​h1507​1322 I, Wirtz T, Maier A, Christiansen S (2021) Synthetic image ren-
Li C, Bai Y, Zeng B (2016) Deep feature learning architectures dering solves annotation problem in deep learning nanoparticle
for daily reservoir inflow forecasting. Water Resour Manag segmentation. Small Methods 5(7):13. https://​doi.​org/​10.​1002/​
30(14):5145–5161. https://​doi.​org/​10.​1007/​s11269-​016-​1474-8 smtd.​20210​0223
Li CY, Guo JC, Guo CL (2018) Emerging from water: underwater Mittal S, Srivastava S, Jayanth JP (2022) A survey of deep learning
image color correction based on weakly supervised color trans- techniques for underwater image classification. IEEE Trans
fer. IEEE Signal Process Lett 25(3):323–327. https://​doi.​org/​ Neural Netw Learn Syst 15. https://​doi.​org/​10.​1109/​tnnls.​2022.​
10.​1109/​lsp.​2018.​27920​50 31438​87
Li LW, Yan Z, Shen Q (2019a) Water body extraction from very high Muhammad U, Wang WQ, Chattha SP, Ali S, Ieee (2018) Pre-trained
spatial resolution remote sensing data based on fully convo- VGGNet architecture for remote-sensing image scene classifi-
lutional networks. Remote Sens 11(10):19. https://​doi.​org/​10.​ cation. 24th International Conference on Pattern Recognition
3390/​rs111​01162 (ICPR). International Conference on Pattern Recognition. Ieee,
Li LW, Yan Z, Shen Q, Cheng G, Gao LR, Zhang B (2019b) Water Chinese Acad Sci, Inst Automat, Beijing, Peoples R China, pp
body extraction from very high spatial resolution remote sens- 1622–1627
ing data based on fully convolutional networks. Remote Sens Mukherjee A, Su A, Rajan K (2021) Deep learning model for identify-
11(10):19. https://​doi.​org/​10.​3390/​rs111​01162 ing critical structural motifs in potential endocrine disruptors. J
Li L, Rong SM, Wang R, Yu SL (2021a) Recent advances in artificial Chem Inf Model 61(5):2187–2197. https://​doi.​org/​10.​1021/​acs.​
intelligence and machine learning for nonlinear relationship jcim.​0c014​09
analysis and process control in drinking water treatment: a Muluye GV, Coulibaly P (2007) Seasonal reservoir inflow forecasting
review. Chem Eng J 405:17. https://​doi.​org/​10.​1016/j.​cej.​2020.​ with low-frequency climatic indices: a comparison of data-driven
126673 methods. Hydrol Sci J-J Sci Hydrol 52(3):508–522. https://​doi.​
Li XY, Yi XH, Liu ZH, Liu HB, Chen T, Niu GQ, Yan B, Chen C, org/​10.​1623/​hysj.​52.3.​508
Huang MZ, Ying GG (2021b) Application of novel hybrid deep Niu GQ, Yi XH, Chen C (2020) A novel effluent quality predict-
leaning model for cleaner production in a paper industrial waste- ing model based on genetic-deep belief network algorithm for
water treatment system. J Clean Prod 294:12. https://​doi.​org/​10.​ cleaner production in a full-scale paper-making wastewater
1016/j.​jclep​ro.​2021.​126343 treatment. J Clean Prod 265:10. https://​doi.​org/​10.​1016/j.​jclep​
Liu P, Wang J, Sangaiah AK, Xie Y, Yin XC (2019) Analysis and ro.​2020.​121787
prediction of water quality using LSTM deep neural networks Niu GQ, Li XY, Wan X (2022) Dynamic optimization of wastewater
in IoT environment. Sustainability 11(7):14. https://​doi.​org/​10.​ treatment process based on novel multi-objective ant lion optimi-
3390/​su110​72058 zation and deep learning algorithm. J Clean Prod 345:9. https://​
Liu J, Zhou XL, Zhang LQ, Xu YP (2023) Forecasting short-term water doi.​org/​10.​1016/j.​jclep​ro.​2022.​131140
demands with an ensemble deep learning model for a water sup- Oh C, Dang LM, Han D, Moon H (2022) Robust sewer defect detec-
ply system. Water Resour Manag 37(8):2991–3012. https://​doi.​ tion with text analysis based on deep learning. IEEE Access
org/​10.​1007/​s11269-​023-​03471-7 10:46224–46237. https://​doi.​org/​10.​1109/​access.​2022.​31686​60
Loc HH, Do QH, Cokro AA, Irvine KN (2020) Deep neural network Olden JD, Lawler JJ, Poff NL (2008) Machine learning methods with-
analyses of water quality time series associated with water sen- out tears: a primer for ecologists. Q Rev Biol 83(2):171–193.
sitive urban design (WSUD) features. J Appl Water Eng Res https://​doi.​org/​10.​1086/​587826
8(4):313–332. https://​doi.​org/​10.​1080/​23249​676.​2020.​18319​76 O’Neil GL, Goodall JL, Behl M, Saby L (2020) Deep learning using
Long Y, Xu G, Ma C, Chen L (2016) Emergency control system based physically-informed input data for wetland identification. Envi-
on the analytical hierarchy process and coordinated development ron Modell Softw 126:15. https://d​ oi.o​ rg/1​ 0.1​ 016/j.e​ nvsof​ t.2​ 020.​
degree model for sudden water pollution accidents in the Middle 104665
Route of the South-to-North Water Transfer Project in China. Oulebsir R, Lefkir A, Bermad A, Safri A (2018) Optimization of
Environ Sci Pollut Res Int 23(12):12332–12342. https://​doi.​org/​ energy consumption in activated sludge process using deep learn-
10.​1007/​s11356-​016-​6448-0 ing selective modeling, 2nd WaterEnergyNEXUS Conference.
Luppichini M, Barsanti M, Giannecchini R, Bini M (2022) Deep learn- Advances in Science Technology & Innovation. Springer Inter-
ing models to predict flood events in fast-flowing watersheds. Sci national Publishing Ag, Salerno, ITALY, pp 223–225. https://d​ oi.​
Total Environ 813:10. https://​doi.​org/​10.​1016/j.​scito​tenv.​2021.​ org/​10.​1007/​978-3-​030-​13068-8_​55
151885 Pang J, Yang S, He L, Chen Y, Ren N (2019) Intelligent control/opera-
Ma J, Ding YX, Cheng JCP, Jiang FF, Xu ZR (2020) Soft detection of tional strategies in WWTPs through an integrated Q-learning
5-day BOD with sparse matrix in city harbor water using deep algorithm with ASM2d-Guided Reward. Water 11(5). https://​
learning techniques. Water Res 170:12. https://d​ oi.o​ rg/1​ 0.1​ 016/j.​ doi.​org/​10.​3390/​w1105​0927
watres.​2019.​115350 Peng L, Wu H, Gao M (2022) TLT: Recurrent fine-tuning transfer
Mauricio-Iglesias M, Montero-Castro I, Mollerup AL, Sin G (2015) A learning for water quality long-term prediction. Water Res
generic methodology for the optimisation of sewer systems using 225:12. https://​doi.​org/​10.​1016/j.​watres.​2022.​119171
stochastic programming and self-optimizing control. J Environ Man- Perea RG, Garcia IF, Poyato EC, Diaz JAR (2023) New memory-based
age 155:193–203. https://​doi.​org/​10.​1016/j.​jenvm​an.​2015.​03.​034 hybrid model for middle-term water demand forecasting in irri-
Meijer D, Scholten L, Clemens F, Knobbe A (2019) A defect classifica- gated areas. Agric Water Manage 284:13. https://​doi.​org/​10.​
tion methodology for sewer image sets with convolutional neural 1016/j.​agwat.​2023.​108367
Environmental Science and Pollution Research

Pham HN, Dang KB, Nguyen TV, Tran NC, Ngo XQ, Nguyen DA, Shao ZY, Xu L, Chai HX, Yost SA, Zheng ZL (2021) A Bayesian-
Phan TTH, Nguyen TT, Guo WS, Ngo HH (2022) A new deep SWMM coupled stochastic model developed to reconstruct the
learning approach based on bilateral semantic segmentation complete profile of an unknown discharging incidence in sewer
models for sustainable estuarine wetland ecosystem manage- networks. J Environ Manage 297:11. https://​doi.​org/​10.​1016/j.​
ment. Sci Total Environ 838:13. https://​doi.​org/​10.​1016/j.​scito​ jenvm​an.​2021.​113211
tenv.​2022.​155826 Sharma A, Liu XW, Yang XJ (2018) Land cover classification from
Pu ZH, Yan JR, Chen L (2023) A hybrid Wavelet-CNN-LSTM deep multi-temporal, multi-spectral remotely sensed imagery using
learning model for short- term urban water demand forecast- patch-based recurrent neural networks. Neural Netw 105:346–
ing. Front Env Sci Eng 17(2):14. https://​d oi.​o rg/​1 0.​1 007/​ 355. https://​doi.​org/​10.​1016/j.​neunet.​2018.​05.​019
s11783-​023-​1622-3 Shen CP (2018) A transdisciplinary review of deep learning research
Pyo J, Cho KH, Kim K, Baek SS, Nam G, Park S (2021) Cyanobacte- and its relevance for water resources scientists. Water Resour
ria cell prediction using interpretable deep learning model with Res 54(11):8558–8593. https://​doi.​org/​10.​1029/​2018w​r0226​43
observed, numerical, and sensing data assemblage. Water Res Shen C, Zhu KY, Ruan JP, Li JL, Wang Y, Zhao MR, He CY, Zuo ZH
203:12. https://​doi.​org/​10.​1016/j.​watres.​2021.​117483 (2021) Screening of potential oestrogen receptor a agonists in
Pyo J, Hong SM, Jang J, Park S, Park J (2022) Drone-borne sensing pesticides via in silico, in vitro and in vivo methods. Environ
of major and accessory pigments in algae using deep learning Pollut 270:10. https://​doi.​org/​10.​1016/j.​envpol.​2020.​116015
modeling. Gisci Remote Sens 59(1):310–332. https://​doi.​org/​10.​ Shi XY, Lv FS, Seng DW, Zhang JM (2021) Visualizing and under-
1080/​15481​603.​2022.​20271​20 standing graph convolutional network. Multimed Tools Appl
Rahimzad M, Nia AM, Zolfonoon H, Soltani J, Mehr AD (2021) Per- 80(6):8355–8375. https://​doi.​org/​10.​1007/​s11042-​020-​09885-4
formance comparison of an LSTM-based deep learning model Singha S, Pasupuleti S, Singha SS, Singh R, Kumar S (2021) Pre-
versus conventional machine learning algorithms for streamflow diction of groundwater quality using efficient machine learning
forecasting. Water Resour Manag 35(12):4167–4187. https://d​ oi.​ technique. Chemosphere 276:13. https://d​ oi.o​ rg/1​ 0.1​ 016/j.c​ hemo​
org/​10.​1007/​s11269-​021-​02937-w sphere.​2021.​130265
Ren T, Liu XF, Niu JW, Lei XH, Zhang Z (2020) Real-time water level Sit M, Demiray BZ, Xiang ZR, Ewing GJ, Sermet Y (2020) A compre-
prediction of cascaded channels based on multilayer perception hensive review of deep learning applications in hydrology and
and recurrent neural network. J Hydrol 585:14. https://​doi.​org/​ water resources. Water Sci Technol 82(12):2635–2670. https://​
10.​1016/j.​jhydr​ol.​2020.​124783 doi.​org/​10.​2166/​wst.​2020.​369
Riedmiller M (1994) Advanced supervised learning in multi-layer per- Song SR, Liu JH, Liu Y, Feng GQ, Han H (2020) Intelligent object rec-
ceptrons—from backpropagation to adaptive learning algorithms. ognition of urban water bodies based on deep learning for multi-
Int J Comput Stand Interfaces 16(3):265–278 source and multi-temporal high spatial resolution remote sensing
Rodrigues NM, Batista JE, Mariano P, Fonseca V, Duarte B, Silva imagery. Sensors 20(2):25. https://​doi.​org/​10.​3390/​s2002​0397
S (2021) Artificial Intelligence meets marine ecotoxicology: Song HM, Woo DK, Yan Q (2021) Detecting subsurface drainage pipes
applying deep learning to bio-optical data from marine diatoms using a fully convolutional network with optical images. Agric
exposed to legacy and emerging contaminants. Biology-Basel Water Manage 249:9. https://​doi.​org/​10.​1016/j.​agwat.​2021.​
10(9):21. https://​doi.​org/​10.​3390/​biolo​gy100​90932 106791
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning represen- Sun LP, Zhu JJ, Tan JX, Li XF, Li RH, Deng HZ, Zhang XY, Liu BY,
tations by back propagating errors. Nature 323(6088):533–536 Zhu XZ (2023) Deep learning-assisted automated sewage pipe
Sabour S, Frosst N, Hinton GE (2017) Dynamic Routing Between Cap- defect detection for urban water environment management. Sci
sules, 31st Annual Conference on Neural Information Process- Total Environ 882:12. https://​doi.​org/​10.​1016/j.​scito​tenv.​2023.​
ing Systems (NIPS). Advances in Neural Information Processing 163562
Systems. Neural Information Processing Systems (Nips), Long Syafiie S, Tadeo F, Martinez E, Alvarez T (2011) Model-free control
Beach, CA based on reinforcement learning for a wastewater treatment prob-
Sadeghi M, Asanjan AA, Faridzad M, Nguyen P, Hsu K (2019) PER- lem. Appl Soft Comput 11(1):73–82. https://​doi.​org/​10.​1016/j.​
SIANN-CNN: precipitation estimation from remotely sensed asoc.​2009.​10.​018
information using artificial neural networks-convolutional neural Tian WC, Liao ZL, Wang X (2019) Transfer learning for neural net-
networks. J Hydrometeorol 20(12):2273–2289. https://​doi.​org/​ work model in chlorophyll-a dynamics prediction. Environ
10.​1175/​jhm-d-​19-​0110.1 Sci Pollut Res 26(29):29857–29871. https://​doi.​org/​10.​1007/​
Safder U, Loy-Benitez J, Nguyen HT, Yoo C (2022) A hybrid extreme learn- s11356-​019-​06156-0
ing machine and deep belief network framework for sludge bulking Tu JC, Yang XQ, Chen CB, Gao S, Wang JC, Sun C, Ieee (2019) Water
monitoring in a dynamic wastewater treatment process. J Water Pro- quality prediction model based on CNN-GRU hybrid network,
cess Eng 46:13. https://​doi.​org/​10.​1016/j.​jwpe.​2022.​102580 chinese automation congress (CAC). Chinese automation con-
Sagan V, Peterson KT, Maimaitijiang M, Sidike P, Sloan J (2020) Mon- gress. Ieee, Hangzhou, peoples r china, pp 1893–1898
itoring inland water quality using remote sensing: potential and Uddin MG, Nash S, Olbert AI (2021) A review of water quality index
limitations of spectral indices, bio-optical simulations, machine models and their use for assessing surface water quality. Ecol
learning, and cloud computing. Earth-Sci Rev 205:31. https://​ Indic 122:21. https://​doi.​org/​10.​1016/j.​ecoli​nd.​2020.​107218
doi.​org/​10.​1016/j.​earsc​irev.​2020.​103187 Vaishya R, Misra A, Vaish A (2023) ChatGPT: is this version good for
Salloom T, Kaynak O, He W (2021) A novel deep neural network healthcare and research? Diabetes Metab Syndr-Clin Res Rev
architecture for real-time water demand forecasting. J Hydrol 17(4):6. https://​doi.​org/​10.​1016/j.​dsx.​2023.​102744
599:12. https://​doi.​org/​10.​1016/j.​jhydr​ol.​2021.​126353 Van Hasselt H, Guez A, Silver D (2015) Deep reinforcement learning
Mitrović S, Andreoletti D, Ayoub O (2023) ChatGPT or Human? with double q-learning. Proceedings of the AAAI Conference
Detect and Explain. Explaining Decisions of Machine Learning on Artificial Intelligence 30. https://​doi.​org/​10.​1609/​aaai.​v30i1.​
Model for Detecting Short ChatGPT-generated Text 10295
Schuegraf P, Bittner K (2019) Automatic building footprint extrac- Wai KP, Chia MY, Koo CH, Huang YF, Chong WC (2022) Applica-
tion from multi-resolution remote sensing images using a hybrid tions of deep learning in water quality management: a state-of-
FCN. ISPRS Int J Geo-Inf 8(4):16. https://​doi.​org/​10.​3390/​ijgi8​ the-art review. J Hydrol 613. https://​doi.​org/​10.​1016/j.​jhydr​ol.​
040191 2022.​128332
Environmental Science and Pollution Research

Wang KF, Gou C, Duan YJ, Lin YL, Zheng XH, Wang FY (2017) Gen- Yan XB, Song J, Liu YXY, Lu SL, Xu YY, Ma CY, Zhu YQ (2023)
erative adversarial networks: introduction and outlook. IEEE- A Transformer-based method to reduce cloud shadow interfer-
CAA J Automatica Sin 4(4):588–598. https://​doi.​org/​10.​1109/​ ence in automatic lake water surface extraction from Sentinel-2
jas.​2017.​75105​83 imagery. J Hydrol 620:18. https://d​ oi.o​ rg/1​ 0.1​ 016/j.j​ hydro​ l.2​ 023.​
Wang GM, Qiao JF, Bi J, Li WJ, Zhou MC (2019a) TL-GDBN: grow- 129561
ing deep belief network with transfer learning. IEEE Trans Yang F, Xie H, Li HX (2019) RETRACTED ARTICLE: video asso-
Autom Sci Eng 16(2):874–885. https://​doi.​org/​10.​1109/​t ase.​ ciated cross-modal recommendation algorithm based on deep
2018.​28656​63 learning. Appl Soft Comput 82:9. https://​doi.​org/​10.​1016/j.​asoc.​
Wang ZF, Man Y, Hu YS, Li JG, Hong MN, Cui PZ (2019b) A deep 2019.​105597
learning based dynamic COD prediction model for urban sewage. Yang BW, Xiao ZJ, Meng QJ, Yuan Y, Wang WQ (2023a) Deep
Environ Sci-Wat Res Technol 5(12):2210–2218. https://​doi.​org/​ learning-based prediction of effluent quality of a constructed
10.​1039/​c9ew0​0505f wetland. Env Sci Ecotechnol 13:11. https://​doi.​org/​10.​1016/j.​
Wang J, Li P, Deng JH, Du YZ (2020a) CA-GAN: class-condition ese.​2022.​100207
attention GAN for underwater image enhancement. IEEE Access Yang MY, Wang WS, Gao Q, Zhao C, Li CL, Yang XF, Li JX, Li XG,
8:130719–130728. https://d​ oi.o​ rg/1​ 0.1​ 109/a​ ccess.2​ 020.3​ 00335​ 1 Cui JL, Zhang LT, Ji YP, Geng SQ (2023b) Automatic identifica-
Wang ZB, Gao X, Zhang YN, Zhao GH (2020b) MSLWENet: A Novel tion of harmful algae based on multiple convolutional neural net-
Deep Learning Network for Lake Water Body Extraction of works and transfer learning. Environ Sci Pollut Res 30(6):15311–
Google Remote Sensing Images. Remote Sens 12(24):19. https://​ 15324. https://​doi.​org/​10.​1007/​s11356-​022-​23280-6
doi.​org/​10.​3390/​rs122​44140 Yaseen ZM (2021) An insight into machine learning models era in sim-
Wang GM, Jia QS, Zhou MC, Bi J, Qiao JF (2021a) Soft-sensing of ulating soil, water bodies and adsorption heavy metals: review,
wastewater treatment process via deep belief network with event- challenges and solutions. Chemosphere 277:22. https://​doi.​org/​
triggered learning. Neurocomputing 436:103–113. https://​doi.​ 10.​1016/j.​chemo​sphere.​2021.​130126
org/​10.​1016/j.​neucom.​2020.​12.​108 Yin XF, Chen Y, Bouferguene A, Zaman H, Al-Hussein M, Kurach L
Wang LG, Zhao L, Liu X, Fu JJ, Zhang AQ (2021b) SepPCNET: (2020) A deep learning-based framework for an automated defect
deeping learning on a 3D surface electrostatic potential point detection system for sewer pipes. Autom Constr 109:17. https://​
cloud for enhanced toxicity classification and its application doi.​org/​10.​1016/j.​autcon.​2019.​102967
to suspected environmental estrogens. Environ Sci Technol Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are
55(14):9958–9967. https://​doi.​org/​10.​1021/​acs.​est.​1c012​28 features in deep neural networks ?, 28th Conference on Neural
Wang Y, Cao Y, Zhang J, Wu F, Zha ZJ (2021c) Leveraging deep Information Processing Systems (NIPS). Advances in Neural
statistics for underwater image enhancement. ACM Trans Mul- Information Processing Systems. Neural Information Process-
timed Comput Commun Appl 17(3):20. https://​doi.​org/​10.​1145/​ ing Systems (Nips), Montreal, Canada
34895​20 Yu JW, Kim JS, Li X, Jong YC (2022) Water quality forecasting based
Wang YJ, Li SC, Lin YH, Wang MJ (2021d) Lightweight deep neural on data decomposition, fuzzy clustering and deep learning neural
network method for water body extraction from high-resolution network. Environ Pollut 303:10. https://d​ oi.o​ rg/1​ 0.1​ 016/j.e​ nvpol.​
remote sensing images with multisensors. Sensors 21(21):21. 2022.​119136
https://doi.org/10.3390/s21217397
Wang GM, Jia QS, Zhou MC, Bi J, Qiao JF (2022) Artificial neural networks for water quality soft-sensing in wastewater treatment: a review. Artif Intell Rev 55(1):565–587. https://doi.org/10.1007/s10462-021-10038-8
Wang MZ, Cheng JCP (2018) Development and improvement of deep learning based automated defect detection for sewer pipe inspection using faster R-CNN, 25th Workshop of the European-Group-for-Intelligent-Computing-in-Engineering (EG-ICE). Lecture Notes in Computer Science. Springer International Publishing AG, Lausanne, Switzerland, pp 171–192. https://doi.org/10.1007/978-3-319-91638-5_9
Wang HW, Yang M, Yin G, Dong JN (2023) Self-adversarial generative adversarial network for underwater image enhancement. IEEE J Ocean Eng 12. https://doi.org/10.1109/joe.2023.3297731
Xiao X, He JY, Huang HM, Miller TR, Christakos G, Reichwaldt ES, Ghadouani A, Lin SP, Xu XH, Shi JY (2017) A novel single-parameter approach for forecasting algal blooms. Water Res 108:222–231. https://doi.org/10.1016/j.watres.2016.10.076
Xu CW, Wang YZ, Fu H, Yang JS (2022) Comprehensive analysis for long-term hydrological simulation by deep learning techniques and remote sensing. Front Earth Sci 10:16. https://doi.org/10.3389/feart.2022.875145
Xu GY, Cheng Y, Liu F, Ping P, Sun J, Ieee (2019) A water level prediction model based on ARIMA-RNN, 5th IEEE International Conference on Big Data Computing Service and Applications (IEEE BigDataService) / Workshop on Big Data in Water Resources, Environment, and Hydraulic Engineering / Workshop on Medical, Healthcare, Using Big Data Technologies. IEEE Computer Soc, San Francisco, CA, pp 221–226. https://doi.org/10.1109/BigDataService.2019.00038
Zanfei A, Brentan BM, Menapace A, Righetti M, Herrera M (2022) Graph convolutional recurrent neural networks for water demand forecasting. Water Resour Res 58(7):14. https://doi.org/10.1029/2022wr032299
Zhang JF, Zhu Y, Zhang XP, Ye M, Yang JZ (2018) Developing a long short-term memory (LSTM) based model for predicting water table depth in agricultural areas. J Hydrol 561:918–929. https://doi.org/10.1016/j.jhydrol.2018.04.065
Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a survey. IEEE Commun Surv Tutor 21(3):2224–2287. https://doi.org/10.1109/comst.2019.2904897
Zhang HQ, Yu FS, Sun JC, Shen XQ, Li K (2020) Deep learning for sea cucumber detection using stochastic gradient descent algorithm. Eur J Remote Sens 53:53–62. https://doi.org/10.1080/22797254.2020.1715265
Zhang JS, Xing MD, Sun GC, Chen JL, Li MY (2021) Water body detection in high-resolution SAR images with cascaded fully-convolutional network and variable focal loss. IEEE Trans Geosci Remote Sensing 59(1):316–332. https://doi.org/10.1109/tgrs.2020.2999405
Zhang YT, Li CL, Duan HP, Yan KF (2023) Deep learning based data-driven model for detecting time-delay water quality indicators of wastewater treatment plant influent. Chem Eng J 467:11. https://doi.org/10.1016/j.cej.2023.143483
Zhi W, Feng D, Tsai W-P, Sterle G, Harpold A, Shen C, Li L (2021) From hydrometeorology to river water quality: can a deep learning model predict dissolved oxygen at the continental scale? Environ Sci Technol 55(4):2357–2368. https://doi.org/10.1021/acs.est.0c06783
Zhong HF, Sun Q, Sun HM, Jia RS (2022) NT-Net: a semantic segmentation network for extracting lake water bodies from optical remote sensing images based on transformer. IEEE Trans Geosci Remote Sensing 60:13. https://doi.org/10.1109/tgrs.2022.3197402
Zhu H, Ma MR, Ma WP (2021a) A spatial-channel progressive fusion ResNet for remote sensing classification. Inf Fusion 70:72–87. https://doi.org/10.1016/j.inffus.2020.12.008
Zhu NY, Ji X, Tan JL, Jiang YN, Guo Y (2021b) Prediction of dissolved oxygen concentration in aquatic systems based on transfer learning. Comput Electron Agric 180:8. https://doi.org/10.1016/j.compag.2020.105888
Zhu S, Wei JA, Zhang HR, Xu Y, Qin H (2023) Spatiotemporal deep learning rainfall-runoff forecasting combined with remote sensing precipitation products in large scale basins. J Hydrol 616:13. https://doi.org/10.1016/j.jhydrol.2022.128727
Zhuang FZ, Qi ZY, Duan KY, Xi DB, Zhu YC, Zhu HS, Xiong H, He Q (2021) A comprehensive survey on transfer learning. Proc IEEE 109(1):43–76. https://doi.org/10.1109/jproc.2020.3004555

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
