
Process Safety and Environmental Protection 172 (2023) 501–512



journal homepage: www.journals.elsevier.com/process-safety-and-environmental-protection

Transforming data into actionable knowledge for fault detection, diagnosis and prognosis in urban wastewater systems with AI techniques: A mini-review
Yiqi Liu a,b,*, Pedram Ramin b, Xavier Flores-Alsina b, Krist V. Gernaey b
a The Key Laboratory of Autonomous Systems and Networked Control, Ministry of Education, School of Automation Science & Engineering, South China University of Technology, 510641 Guangdong, China
b Process and Systems Engineering Center (PROSYS), Department of Chemical and Biochemical Engineering, Technical University of Denmark, Building 228 A, 2800 Kgs. Lyngby, Denmark

ARTICLE INFO

Keywords: Fault detection; Fault diagnosis; Fault prognosis; Data analytics; Artificial intelligence

ABSTRACT

Recent advances in artificial intelligence (AI) and data analytics (DA) could provide opportunities for fault management and decision-making in the operation of urban wastewater treatment systems (UWS). The UWS is typically a large system, including sewer networks (SNs) and wastewater treatment plants (WWTPs), and also considering the receiving media (RM). However, applications of AI and DA in the UWS can be challenging due to the complexity and size of the systems, the large variation in the level of UWS instrumentation, and the relatively poor data quality. This review goes beyond the state of the art by critically analyzing previous work on AI-based data-driven methodologies for system-wide fault detection, life cycle fault management and transformation of big and small data into analytics. In particular, it considers two different points of view: process faults (such as bulking sludge, sewer corrosion and technology specifics) and instrumentation faults (such as sensors and actuators), thereby offering more opportunities to distinguish complex patterns and dynamics. Our analysis reveals the relative strengths and weaknesses of the different approaches to designing fault diagnosis tools and applying them in the UWS. Finally, the opportunities and challenges regarding the interplay among UWS, data and AI are discussed.

1. Introduction

In the wake of Industry 4.0, artificial intelligence (AI) and data analytics (DA) over the past decades, wastewater treatment processes have been redefined as smart facilities that are operated partially or fully automatically (Olsson, 2012a). To maintain a desired and safe performance, a large number of sensors and automation systems are installed to collect process data as well as to monitor and discover abnormalities with respect to process or instrumentation states in an urban wastewater system (UWS) (Russo et al., 2021, 2020; Kazemi et al., 2020; Rosen et al., 2004). Depending on the collected data and monitoring system states, intelligent fault management, which is viewed as a set of activities (fault detection, diagnosis, prognosis and maintenance), is able to provide better operational safety of the UWS.

Even though many fault detection algorithms were developed and implemented in other fields such as process engineering, they are not easily adoptable for the UWS. This is mainly due to the stochasticity and uncertainty of the combination of biological, physical and chemical reactions in the UWS, such as the variation of temperatures across each year and the occurrence of extreme weather conditions (Nor et al., 2020; Park et al., 2020; Teh et al., 2020). A wastewater treatment system (WWTS) is composed of interconnected sewer networks (SNs) and wastewater treatment plants (WWTPs). SNs are used to collect and convey wastewater from households and industry. The collected wastewater is then delivered to the WWTPs, and finally the cleaned wastewater enters the receiving media (RM) (Fig. 1) (Olsson, 2012b). Although WWTSs have already reached a significant degree of maturity, faults in SNs or WWTPs will lead to undesirable performance in specific parts of the process, which can propagate through their entire structure. Development and implementation of automatic and intelligent detection of faults (fault detection), identification of root causes (fault diagnosis) and prediction of fault evolution (fault prognosis) are urgently needed for safe and efficient operations of

* Corresponding author at: The Key Laboratory of Autonomous Systems and Networked Control, Ministry of Education, School of Automation Science & Engineering, South China University of Technology, 510641 Guangdong, China.
E-mail addresses: aulyq@scut.edu.cn, yiqli@kt.dtu.dk (Y. Liu), pear@kt.dtu.dk (P. Ramin), xfa@kt.dtu.dk (X. Flores-Alsina), kvg@kt.dtu.dk (K.V. Gernaey).

https://doi.org/10.1016/j.psep.2023.02.043
Received 16 November 2022; Received in revised form 24 January 2023; Accepted 14 February 2023
Available online 17 February 2023
0957-5820/© 2023 Institution of Chemical Engineers. Published by Elsevier Ltd. All rights reserved.

Nomenclature

AD Anaerobic Digestion.
AI Artificial Intelligence.
ASP Activated Sludge Process.
BSM2 Benchmark Simulation Model No.2.
CC Contribution Charts.
CCA Canonical Correlation Analysis.
DA Data Analytics.
DCNN Deep Convolutional Neural Network.
DO Dissolved Oxygen.
ELM Extreme Learning Machine.
ENN Ensemble of Neural Networks.
EPR Evolutionary Polynomial Regression.
FA Factor Analysis.
F/M Food-to-microorganism Ratio.
FDDP Fault detection, Diagnosis and Prognosis.
GA Genetic Algorithm.
GCA Granger Causality Analysis.
GP Gaussian Processes Model.
HGPR Hybrid Gaussian Processes Regression.
ICA Independent Component Analysis.
KICA Kernel ICA.
KL Kernel Learning.
KPCA Kernel PCA.
MBR Membrane Bioreactors.
MLR Multiple Linear Regression.
MGPR Multi-Output Gaussian Processes Regression.
MSPC Multivariate Statistical Process Control.
NO2-N Nitrite Nitrogen.
NN Neural Networks.
PCA Principal Component Analysis.
RM Receiving Media.
RNN Recurrent Neural Networks.
RPLS Recursive Partial Least Squares.
RSVM Recursive Support Vector Machines.
RUL Remaining Useful Life.
RVM Relevance Vector Machine.
SBR Sequencing Batch Reactors.
SID Subspace Identification.
SNs Sewer Networks.
SPE Squared Prediction Error.
SRT Sludge Retention Time.
SVI Sludge Volume Index.
SVM Support Vector Machine.
T² Hotelling’s Statistic.
UWS Urban Wastewater Treatment Systems.
VAR Vector Auto-regression.
VFA Volatile Fatty Acids.
WWTPs Wastewater Treatment Plants.
MW Moving Window.
MW-PCA Moving Window PCA.

UWS.

To manage abnormal behaviors, such as process faults and equipment faults, fault management (detection, diagnosis and prognosis) approaches are usually categorized into three types: (1) mechanistic-based, (2) knowledge-based and (3) data-driven methods. Mechanistic-based methods are developed with the help of accurate mathematical models, typically represented by differential equations, which require a deep understanding of the exact physical, chemical and biological behaviors of the system (Abid et al., 2021). In contrast, knowledge-based methods, such as expert systems and ontology-based models, make full use of symbolic representations of human knowledge to solve problems (Wilhelm et al., 2021). The successful use of knowledge-based approaches depends heavily on expert knowledge, which is usually difficult to digitalize fully. Despite the wide application of mechanistic-based and knowledge-based methods, both have difficulties in dealing with overly complex systems and overly complicated data sets, including challenges related to high dimensionality, significant nonlinearity and so on (Chi et al., 2018). In addition, mechanistic-based and knowledge-based methods are mainly limited to a specific application and are difficult to generalize to other areas. However, as more and more data become available with the help of advanced instrumentation, problems for which mathematical models are unavailable or overly complex can be solved with data-driven approaches. In the past decades, due to the widespread use of sensors and automatic devices in the wastewater system, a large quantity of data has been collected for analytics (Newhart et al., 2019). By analyzing the data patterns and the relationship between variables in the UWS with data-driven approaches, the data can be transformed into actionable knowledge, then to facilitate data-driven fault detection, diagnosis and

Fig. 1. Overview of a typical integrated urban wastewater system.


prognosis, even to improve understanding and decision-making in the UWS (Abid et al., 2021; Li et al., 2020a, 2020b).

To effectively transform data into actionable knowledge and support data-driven fault detection, diagnosis and prognosis, four types of AI algorithms are usually used: supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning. To the best of the authors’ knowledge, only a limited number of papers on reinforcement learning report on data-driven fault detection and diagnosis in the wastewater system (Jordan and Mitchell, 2015; Wang et al., 2021; Hernandez-del-Olmo and Gaudioso, 2011). Additionally, due to the perpetual operations, the large scale and the multiple sources of data when composing data sets of a system, the resulting instrumentation and data complexities usually make standard supervised learning, unsupervised learning or semi-supervised learning inadequate for fault detection and diagnosis (Aguado and Rosen, 2008; Haimi et al., 2016):

• System complexities: (i) Due to the large number of pumping stations with on-off behavior, especially in pressurized sewer networks, and the continuous treatment processes in the treatment plants, a wastewater system usually exhibits hybrid behavior (continuous and discrete); (ii) The WWTS is always exposed to high uncertainties, such as extreme weather (rain and storm events), influent variations (pollution concentrations), and so on; (iii) The management and maintenance of the UWS are challenging tasks due to the large scale and limited accessibility, and the system exhibits large delays; for example, toxic pollution propagating from SNs to the WWTPs takes several hours or even days to reach the treatment plant; (iv) Interactions between micro-scale systems (biological behavior) and macro-scale systems (operations of SNs and WWTPs) add more complexity, and also include processes such as pH neutralization, flocculation, coagulation, ion exchange, and oxidation.
• Instrumentation complexities: (i) Sensors, actuators, controllers and other mechanical devices in both SNs and WWTPs are always exposed to harsh environments, such as stormy weather, humidity, debris and a corrosive atmosphere; (ii) Due to the frequent operation of devices (pumps in SNs and blowers in WWTPs), instrumentation is often poorly maintained, as the equipment requires unacceptably frequent cleaning and calibration. The malfunction of various technological components adds further complexity to these events.
• Data complexities: (i) Imperfect data: an overwhelming fraction of missing data and outliers due to instrumentation failure or lack of hardware sensors; (ii) Nonlinearity: multi-valley, multi-peak and plateau sections among the dependent and independent variables in the collected data; (iii) Dynamic and non-stationary features: pump events, extreme weather or influent variations always result in asynchronous changes over time as new information becomes available; (iv) Co-correlation: the data from the macroscale (instrumentation) and the microscale (microbiology) are correlated and collinear with each other; (v) Multiple scales: the sample collection time, date and locations result in multiple sample rates and multiple data scales.

Even though many fault detection and diagnosis algorithms were reviewed in process engineering, applications in the UWS are still very limited (Sweetapple et al., 2018; Prochaska and Zouboulis, 2020). Moreover, the system, instrumentation and data complexities in the UWS render standard reviews unsuitable. To sum up, this review is different from existing reviews (Kazemi et al., 2020; Newhart et al., 2019; Kazor et al., 2016) because:

• Most of the reviews focused on a task with a specific AI algorithm in the UWS, while none of them investigated and discussed the models in more detail by coordinating different models for a digital twin and by organizing the models according to the model type fitting the digital twin concept.
• Systematic data transformation into actionable and interpretable knowledge to support fault detection, diagnosis and prognosis of process faults and instrumentation faults has not been investigated rigorously. Here we considered multiple sources of data, including big data (e.g. on-line nitrogen sensor, dissolved oxygen sensor) and small data (e.g. off-line laboratory analysis in the effluent, COD) analytics, with consideration of nonlinear data and dynamic data.
• The studies on how data-driven models can work together with mechanistic models and knowledge-based models to enhance the interpretability of uncertainties are also included in this review.
• System-wide aspects and the life cycle of fault management for the UWS have been investigated and analyzed in this paper. The inspection of a fault from the earliest stage to emerging conditions (fault detection-diagnosis-prognosis) and the analysis of a fault from a systems point of view (sewers-wastewater treatment-anaerobic digestion) offer a useful, cost-effective alternative for refining the hidden relationships and information across the entire UWS.

The primary purpose of this manuscript is to understand and to identify the opportunities related to the implementation of a fault management system in the UWS.

2. Data transformation in fault detection, diagnosis and prognosis

2.1. Data-driven approaches

The essential behavior of data-driven methods is to analyze historic datasets collected from the system and discover patterns hidden in them by means of advanced statistical models or machine learning. A complete fault management platform can be built through AI techniques ranging from data analysis to fault assessment levels. As profiled in Fig. 2, a data-driven fault management platform usually consists of fault detection, diagnosis and prognosis (Zhou et al., 2016; Sun et al., 2020). The purpose of fault detection is to determine whether a fault is present, whereas fault diagnosis usually answers the question of where the fault is located in the system. Then, once a fault has been diagnosed, fault prognosis can be performed to predict when an element of a system will fail or how much Remaining Useful Life (RUL) is left for maintenance (Wen et al., 2022).

Depending on the input-output (X-Y) data sets that are available, data-driven solutions can be categorized into: (i) supervised methods, which learn from a set of X-Y pairs and look for differences (e.g., classification and regression methods); (ii) unsupervised methods, which learn only from a set of X without Y and look for similarity (e.g., dimensionality reduction and clustering); (iii) semi-supervised learning methods, which learn from a set of X and a partial set of Y (e.g., classification, regression and clustering) (Liu and Xie, 2020). Methods for dimensionality reduction, such as PCA (Principal Component Analysis), FA (Factor Analysis), ICA (Independent Component Analysis) (Zhou et al., 2016), or their kernel variants (Liu et al., 2021), are usually applied for fault detection. They reduce the dimensionality of the monitored data, which makes it possible to maximize the differences between data groups and to discriminate faulty samples. Another type of machine learning algorithm for fault diagnosis is classification, which is the task of assigning a label to each class and then recognizing whether a new sample belongs to the normal classes or one of the abnormal classes (Newhart et al., 2019). The most commonly used approaches include SVM (Support Vector Machine), NN (Neural Networks), RVM (Relevance Vector Machine), and ELM (Extreme Learning Machine). Among the classifiers, one-class classifiers are generally used for single fault detection, while multi-class classifiers can identify multiple faults simultaneously. Meanwhile, classification or clustering is also able to recognize the root causes of the detected faults (Zhang et al., 2018). Regression is mainly used for prediction. In particular, multi-step-ahead prediction based on regression can predict the fault evolution.
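The use of dimensionality reduction for fault detection described in Section 2.1 can be illustrated with a short sketch. It is not the method of any cited study: it assumes that fault-free training data are available, uses NumPy only, and sets the control limits as empirical 99th percentiles of the training statistics (an assumption; theoretical limits are also common in practice). The function names `fit_pca_monitor`, `monitoring_statistics` and `detect_faults` are hypothetical.

```python
import numpy as np


def fit_pca_monitor(X_normal, n_components, quantile=99.0):
    """Fit a PCA monitoring model on fault-free data (rows are samples)."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mean) / std                       # autoscale each variable
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    model = {
        "mean": mean,
        "std": std,
        "P": Vt[:n_components].T,                     # loadings (variables x components)
        "eigvals": s[:n_components] ** 2 / (Z.shape[0] - 1),
    }
    # Assumption: empirical percentiles of the training statistics act as limits.
    t2, spe = monitoring_statistics(model, X_normal)
    model["t2_limit"] = np.percentile(t2, quantile)
    model["spe_limit"] = np.percentile(spe, quantile)
    return model


def monitoring_statistics(model, X):
    """Return Hotelling's T2 and the squared prediction error (SPE) per sample."""
    Z = (np.atleast_2d(X) - model["mean"]) / model["std"]
    T = Z @ model["P"]                                # scores in the PCA subspace
    t2 = np.sum(T ** 2 / model["eigvals"], axis=1)
    residual = Z - T @ model["P"].T                   # part not explained by the PCs
    spe = np.sum(residual ** 2, axis=1)
    return t2, spe


def detect_faults(model, X):
    """Flag a sample as faulty when either statistic exceeds its limit."""
    t2, spe = monitoring_statistics(model, X)
    return (t2 > model["t2_limit"]) | (spe > model["spe_limit"])
```

In this sketch, T2 responds to unusual variation inside the retained subspace, while SPE responds to a break in the correlation structure, such as a single biased sensor.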


Fig. 2. Interactions of data-driven methodologies, fault management and the UWS in a fault management platform.
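To make the step from detection to diagnosis concrete, the sketch below assigns a new sample to the normal class or to one of several fault classes. A nearest-centroid rule is used purely as a simple stand-in for the classifiers named in Section 2.1 (SVM, NN, RVM, ELM); the class labels, feature values and function names are hypothetical.

```python
import numpy as np


def fit_centroids(X, labels):
    """Return the sorted class names and the mean feature vector of each class."""
    classes = sorted(set(labels))
    labels = np.asarray(labels)
    centroids = np.vstack([X[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def diagnose(classes, centroids, x_new):
    """Assign a new sample to the class whose centroid is closest."""
    distances = np.linalg.norm(centroids - np.asarray(x_new, dtype=float), axis=1)
    return classes[int(np.argmin(distances))]


# Hypothetical labelled history: two features per sample, three operating classes.
X_hist = np.array([[2.0, 1.0], [2.1, 0.9],
                   [0.5, 1.0], [0.4, 1.1],
                   [2.0, 3.0], [2.2, 3.1]])
y_hist = ["normal", "normal",
          "sensor_drift", "sensor_drift",
          "blower_fault", "blower_fault"]
classes, centroids = fit_centroids(X_hist, y_hist)
```

Calling `diagnose(classes, centroids, [0.45, 1.05])` then returns the label of the closest class. With only a "normal" class and a distance threshold, the same idea reduces to one-class detection.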

Even though a generic decision on the best algorithm for fault detection, diagnosis and prognosis is difficult, experimenting to discover which algorithm and configuration will result in the best performance for a specific task is necessary and expected.

2.2. Hybrid of mechanistic or knowledge approaches with data-driven methods

Even though data-driven approaches can achieve better predictive power, they always lack interpretability and extrapolation power. Mechanistic or knowledge-based approaches have been widely favored by wastewater communities, mainly due to their straightforward interpretability in terms of the incorporated process knowledge. However, these approaches usually must simplify the complex processes that they should represent, at the cost of laborious experiments. To balance the pros and cons of mechanistic and data-driven models, serial and parallel hybrid structures are usually used to combine data-driven and mechanistic or knowledge-based approaches, with great potential to improve the predictive or classification power for FDDP (Wilhelm et al., 2021). In general, it is difficult to decide on a universal FDDP strategy that outperforms all other methods under diverse requirements. In this light, combining individual FDDP methods can enhance effectiveness in terms of diagnosis reliability and accuracy. Picabea et al. proposed a serial hybrid of a mechanistic approach and a neural network to model different types of faults (Picabea et al., 2021). The proposed hybrid model is then able to identify unobservable abnormalities in the process industries (Picabea et al., 2021). Also, a data-knowledge hybrid-driven method for gas path fault diagnosis was proposed by integrating a physical model-based gas path analysis method with a fault diagnosis ontology model (Chen et al., 2022). Generally, serial methods allow relatively suitable methods to be selected at each step and allow for a sequential re-evaluation of results in each procedure. Additionally, a parallel combination of a physical model with a data-driven approach for fault diagnosis and prognosis of rotating machinery was proposed by Leturiondo (2016). Slimani et al. (2018) implemented a parallel combination of observer-based and parity-equation-based fault diagnosis methods, then fed the combined residual set into a threshold test function for fault isolation. It is interesting to note that, by paralleling mechanistic or knowledge-based FDDP approaches, domain knowledge can be incorporated into the data-driven models. Indeed, we are convinced that combining data-driven approaches and mechanistic or knowledge-based approaches may enhance the overall performance of the FDDP system.

2.3. Transition of models for digital twin

Even though the aforementioned algorithms can be used for fault management separately, it is difficult to achieve sufficient consensus across the entire system if many approaches are combined. A digital twin (DT) provides a cost-effective alternative to combine the individual algorithms with each other and to integrate all their basic functions fully (Torfs et al., 2022). A digital twin is an environment that links data and models (regressions, classifications and clustering) to reproduce a virtual representation of a real-world entity or process. For example, to fully transition the process models to a DT, McLamore et al. (2020) combined the predictive power of mechanistic models (e.g. ASM, BSM) with data-driven techniques to obtain a digital proxy of a bioreactor. In general, the application of DTs can support the existing algorithms towards decision-making, optimization, failure analysis, predictive maintenance, policy making and investment planning (Torfs et al., 2022; Udugama et al., 2021). Moretta et al. (2021) used an AD model as a digital twin of an anaerobic co-digestion process and applied it to support energy saving and greenhouse gas emission reduction control. However, data-driven fault management, particularly in the UWS (Darvishi et al., 2021). Matheri et al. (2022) proposed a digital twin in decentralized water and wastewater treatment to identify and recognize the spread of COVID-19. Investigations from the UWS show that a digital twin combined with fault management can act more intelligently, but it is still in its infancy in terms of considering all approaches together for UWS operations (Torfs et al., 2022). A data-driven methodology and fault management will accelerate the transition of models to digital twin applications, such as effluent quality control, greenhouse gas emission control and equipment malfunctions.

3. Data-driven fault detection, diagnosis and prognosis in the UWS

3.1. Investigation of applications in the UWS

In summary, the use of data-driven methods for fault management in the UWS has been very limited in the past two decades (2001–2022). By searching for the terms “Fault” or “Abnormality” together with the keywords ‘Wastewater’ or ‘Activated sludge’ or ‘Sewers’ or ‘Sewage’ or ‘Anaerobic digestion’ or ‘Water resource recovery’ and ‘Detection’ or ‘Diagnosis’ or ‘Prognosis’ or ‘Monitoring’ in Scopus, 255 articles were found by limiting the search to articles written in English. In these 255 articles, fault detection accounts for 238 articles, whereas fault diagnosis accounts for 78 articles. The reason why the total number of articles

Fig. 3. Contribution percentage of different data-driven methods to the published literature of the past two decades: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Canonical Correlation Analysis (CCA), Neural Networks (NN), Kernel learning, Bayesian learning Gaussian Processes Model (GP), Partial Least Square (PLS), Sampling methods, Factor Analysis (FA), Fuzzy system, Combination/mixture, Expert system, Deep learning, Decision tree.
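A parallel hybrid structure of the kind discussed in Section 2.2 can be sketched as follows. This is one common reading of "parallel" and is stated here as an assumption: a data-driven model is fitted to the residual of a mechanistic model, and the two predictions are added. The first-order decay used as the mechanistic part, the linear residual model and all parameter values are hypothetical placeholders rather than models from the cited studies.

```python
import numpy as np


def mechanistic_model(x, k=0.5):
    """Hypothetical first-principles prediction (first-order decay)."""
    return np.exp(-k * x)


def fit_residual_model(x, y):
    """Fit a linear least-squares model to the mechanistic model's residual."""
    residual = y - mechanistic_model(x)
    A = np.column_stack([np.ones_like(x), x])         # intercept and slope terms
    coef, *_ = np.linalg.lstsq(A, residual, rcond=None)
    return coef


def hybrid_predict(x, coef):
    """Parallel hybrid: mechanistic prediction plus data-driven correction."""
    return mechanistic_model(x) + coef[0] + coef[1] * x


def residual_alarm(x, y_measured, coef, threshold):
    """Flag samples whose deviation from the hybrid prediction is too large."""
    return np.abs(y_measured - hybrid_predict(x, coef)) > threshold
```

The mechanistic part keeps the prediction interpretable, while the fitted correction absorbs systematic mismatch. The alarm threshold is left to the user and would need to be tuned on fault-free data.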


about fault detection and diagnosis is greater than 255 is because fault detection and diagnosis are used together in some articles. As profiled in Fig. 3(a), surveyed from Scopus, the number of published papers is distributed unevenly over the years, but still increased from 2001 to 2022. It is important to notice that fault detection accounts for most applications (Fig. 3(a)).

To further evaluate the contributions and applications of each method, a closer analysis of the major fault detection methods (224 articles) and fault diagnosis methods (69 articles) is shown in Fig. 3(a) and Fig. 3(b). In fault detection, PCA accounts for 17.4% of the published studies (39), whereas other methods exhibit similar percentages. It is interesting to notice that combination or mixture methods account for a relatively high percentage. These methods combine or mix two or more methods to achieve fault detection or diagnosis. Also, the number of fault diagnosis and prognosis studies in the wastewater area is much lower than the number of applications of fault detection methods. By inspecting the key database in Scopus, a general schematic of all applications of data-driven methods in the UWS between 2001 and 2021 is displayed in Fig. 3(a). It is obvious that even though data-driven methods have found application in fault management, data-driven fault detection, diagnosis and prognosis are still in their infancy in the UWS. In particular, only four articles about fault prognosis in the UWS were found. This is mainly because fault degradation data are difficult to collect, thus rendering fault prognosis methods unavailable.

3.2. Process faults

In the UWS, the process faults mainly consist of sewer corrosion, sludge bulking and performance failures of wastewater treatment technologies, as shown in Fig. 4. Faults of wastewater treatment technologies mainly relate to treatment efficiency in the WWTPs, such as abnormally high energy consumption, excessive consumption of expensive chemicals, inefficient operations, and the risk of compliance violations.

(1) Sludge bulking

Sludge bulking is usually considered one of the most serious problems in the activated sludge process (ASP) for more than 50% of the WWTPs, mainly resulting from the excessive growth of filamentous bacteria in the secondary clarifier. Sludge bulking causes poor activated sludge separation and results in a loss of solids with the final effluent. This leads to higher cost as well as poorer operation (Liu et al., 2016).

To recognize a sludge bulking event, Han et al. (2019) proposed a self-organizing type-2 fuzzy-neural-network together with a smart identification method to identify and categorize different types of sludge bulking. In this paper, the sludge volume index (SVI) was predicted and used as the indicator for sludge bulking, thus supporting decision-making. Unfortunately, this method mainly relies on univariate fault detection, which is not suitable for describing the multivariate relationships of sludge bulking and is unable to reflect the true causality. To recognize and predict sludge bulking by monitoring SVI, Liu et al. enhanced canonical correlation analysis with variational Bayesian mixture learning to support fault detection design. In this method, the Student’s t-distribution was incorporated into the variational Bayesian mixture of canonical correlation analysis (VBMCCA) model to ensure that the fault detection method can account for uncertainties, such as noise and disturbances. Both sensor faults and sludge bulking were recognized accurately and in a timely manner with this approach (Liu et al., 2018). Moreover, to correctly predict sludge bulking beforehand, a Multi-Output Gaussian Processes Regression (MGPR) together with a Vector auto-regression (VAR) was used to track the evolution of sludge bulking several days ahead. The associated confidence levels in the GPR model were also able to quantify uncertainty about the model discrepancy (Liu et al., 2017a). However, the aforementioned methods are limited to fault detection, diagnosis or prognosis alone. This makes them insufficient for fault life cycle management. Therefore, a novel maintenance

Fig. 4. Potential process faults in the UWS, including pipe corrosion in SNs, influent overloading, low treatment efficiency, undesired effluent, sludge bulking and undesired emissions in WWTPs, and low treatment efficiency in AD.
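The multi-step-ahead idea behind sludge bulking prognosis can be illustrated with a first-order vector auto-regression fitted by least squares. This is only a sketch under stated assumptions: the cited work combines VAR with MGPR, which is omitted here, and the variable ordering (the monitored indicator, e.g. SVI, in column 0), the alarm threshold and the function names are hypothetical.

```python
import numpy as np


def fit_var1(series):
    """Least-squares fit of x[t+1] = c + A @ x[t]; series has shape (time, variables)."""
    past = np.column_stack([np.ones(len(series) - 1), series[:-1]])
    coef, *_ = np.linalg.lstsq(past, series[1:], rcond=None)
    return coef[0], coef[1:].T                        # intercept c, transition matrix A


def forecast(c, A, x_last, steps):
    """Recursive multi-step-ahead forecast starting from the last observation."""
    path = []
    x = np.asarray(x_last, dtype=float)
    for _ in range(steps):
        x = c + A @ x                                 # feed each prediction back in
        path.append(x)
    return np.array(path)


def steps_until_exceedance(path, threshold, column=0):
    """First forecast step (1-based) at which the indicator exceeds the threshold.

    Returns None when the threshold is not exceeded within the horizon.
    """
    above = np.nonzero(path[:, column] > threshold)[0]
    return int(above[0]) + 1 if above.size else None
```

The number of steps until the forecast crosses the threshold gives a crude lead time for maintenance planning. Because each step reuses the previous prediction, forecast errors accumulate with the horizon, which is one reason uncertainty estimates such as those from Gaussian process models are useful.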


framework was proposed able to implement collective behaviors involving fault detection, causality analysis, remaining useful life (RUL) and maintenance, then to support sludge bulking management (Liu et al., 2020).

Even though several AI-based data-driven frameworks were proposed to detect, diagnose and predict sludge bulking, the specific type of sludge bulking that occurs is difficult to determine due to the complexity of the interactions between wastewater composition and the bacteria population in the sludge. Therefore, the best way could be to integrate the fault detection, diagnosis, and prognosis methodology with image analysis. By resorting to image analysis, the type of sludge bulking, such as filamentous or non-filamentous sludge bulking, can be identified and proper maintenance strategies can be developed. Zhao et al. (2019) trained a deep U-Net with data augmentation, then performed PCM image analysis to automate floc and filament segmentation identification. By analyzing the image segmentation of flocs and filaments, a sludge volume index (SVI) sensor can be derived and filamentous bulking can be detected earlier (Zhao et al., 2019). However, there is a fundamental issue at stake, since detection and diagnosis failures could happen due to a large number of sludge bulking events. Diverse factors could affect sludge bulking from the macroscopic or microscopic point of view, such as dissolved oxygen (DO) levels, food-to-microorganism ratio (F/M), presence of nitrite nitrogen (NO2-N), carbon to nitrogen (C/N) ratio, pH, and sludge retention time (SRT) (Comas et al., 2008). Understanding the causality of sludge bulking by combining data and prior knowledge will potentially facilitate sludge bulking management.

(2) Sewer corrosion

Repairing and maintaining sewer networks cost trillions of US dollars worldwide every year (Pikaar et al., 2014). The emission of hydrogen sulfide (H2S) is usually considered the major cause of corrosion and odor problems in sewer networks (Jiang et al., 2016). If the emission of hydrogen sulfide (H2S) or sulfide can be predicted well in the SNs, proper decision-making can be achieved to assist engineers and asset managers in sewer corrosion management (Jiang et al., 2015). Alani et al. proposed an evolutionary polynomial regression (EPR) based data-driven method to describe the sulphide build-up in the sewer pipes. In EPR, a genetic algorithm (GA) worked together with a least squares algorithm to identify the parameters, thereby building a predictive model to track the sulphide build-up to support corrosion maintenance and prevention (Alani et al., 2014). Different from the EPR model, a hybrid of first-principles and data-driven models was built. In the first layer, a hybrid Gaussian Processes Regression (HGPR) was also constructed to predict the corrosion rate and corrosion initiation time. Then, the service life of sewers was predicted in the second layer by using a first-principles model depending on the predicted corrosion rate and corrosion initiation time. It is

which provides a data set able to act as the target values. The results suggested that the online ELM outperformed the other investigated models by predicting the concrete mass loss with respect to different types of concrete (Zounemat-Kermani et al., 2021). It is of great importance to note that, rather than big data analytics, the prognosis of sewer corrosion is definitely a small-data problem due to the slow corrosion rate. Therefore, linear models, SVM and RVM present more suitable behaviors and demonstrate better performance. In fact, to deal with the small data issue, the small dataset can be augmented by transfer learning of domain knowledge and other tasks, such as generative adversarial networks (Ferguson et al., 2014). This limitation can be compensated by first-principles knowledge, which, in turn, can impose constraints on purely data-driven methods and then can ensure that data-driven algorithms converge faster to reasonable optima.

(3) Wastewater treatment technologies

The main purpose of wastewater treatment is to remove C, N and P efficiently with the combination of biological, chemical and physical reactions in the treatment plant. The treatment has to ensure that the effluent of WWTPs lives up to a certain quality standard. The most typical methods are the activated sludge process (ASP), membrane bioreactor (MBR) processes and sequencing batch reactor (SBR) processes. Following wastewater treatment, anaerobic digestion (AD) is also used to reduce and transform organic waste (sludge) from industrial and municipal wastewater treatment plants into a gas mixture mainly of methane and carbon dioxide. It is imperative to monitor both wastewater treatment and AD systems to ensure efficient treatment in the entire process (Poh et al., 2016).

Treating wastewater is often combined with complementary technologies such as MBR. This technology is widely applied with the advantage of achieving high effluent quality and offering a high rate of degradation. However, membrane fouling is usually taken as the most serious problem lowering treatment efficiency, especially when treating industrial wastewater. To control membrane fouling, Santos et al. used PCA and other MSPC methods to capture the correlations between operating variables and critical analytic variables of a full-scale MBR treating oil refinery wastewater. In this article, several critical variables, such as sludge filterability, temperature, and so on, are identified as the most influential variables for membrane fouling. The selected variables were used for MBR performance prediction, enabling the identification of operating faults. It is interesting to notice that T² and SPE control charts were able to recognize and declare the low efficiency of membrane permeability operation. Therefore, the MSPC could be utilized for decision-making with respect to fouling control and be set up as a guideline with respect to the timing to implement chemical cleanings or dose membrane permeability improvers (Santos et al., 2021).

Another important concern is to monitor the quality variables in the effluent of WWTPs. Colomer et al. proposed a novel methodology by combining both Unfold PCA and a classification algorithm, being able to
important to notice that, since the collected data was limited, an monitor the quality variables in the effluent for an SBR process. In the
interpolation technique was used to extend the data size to ensure proposed methodology, the Unfold PCA model was built and identified
that the data-driven model can be well trained. The proposed by using a normal operational data set, then to discriminate abnormal
model was validated in an Australian SN (Liu et al., 2017b). Even batch events and to acquire a useful fault signature by resorting to the
though the collected data was interpolated by an algorithm, in­ contribution plots. Furthermore, the fault signature was generated and
dependence and Gaussian distribution assumptions among the classified by a classification algorithm, in such a way that the abnormal
data needed to be guaranteed. Also, due to the data limitation, events of the effluent quality can be recognized and estimated. In this
both auto-correlation and partial auto-correlation among data article, nine different classification algorithms are compared and used
were not taken into account, therefore leading to the use of a for monitoring and classifications of the organic matter, ammonium,
steady model for dynamic prediction. nitrate and phosphate removal (Colomer et al., 2013).
Zounemat-Kermani et al. (2021). compared the standard and With the ever-increasing awareness about energy and environment
online version of kernel ELM, NN and MLR, then took full use of security, the AD process has received significant attention from both
the sulfuric acid corrosive factor in SNs to calculate and estimate academic and industrial communities, mainly due to the ability to pro­
concrete mass loss of pipes. To ensure the pipe diversity, six duce bio-energy (biogas) from waste and wastewater. To prevent low
different kinds of concrete were accessed in terms of mass loss efficiency and failures of the AD system, the essential task is to make

sure the four steps of the digestion process (hydrolysis, acidogenesis,
acetogenesis, and methanogenesis) proceed smoothly and to
prevent accumulation of inhibitory or toxic compounds, such as ammonia
or sulfide in the system. In the AD system, the VFA concentration, as an
intermediate compound formed in the acidogenesis step, is commonly
selected as the quality control variable, because it is not only able to
monitor the process health with respect to gas production, but it is also
capable of reflecting the imbalance in the incoming feed. An SVM-based,
an ELM-based and an ensemble of NNs (ENN) based data-driven
framework were proposed to predict total volatile fatty acids (VFA)
inside the AD. The residual between actual VFA and predicted VFA was
generated and combined with univariate statistical control charts for
fault detection in the benchmark simulation model No.2 (BSM2)
(Kazemi et al., 2021). The purpose of efficiency control is to monitor the
inefficiency and even abnormality, then to locate the root cause of
inefficiency. Such information is needed by a suitable
controller or an optimizer to enhance the operational efficiency. It
is important to notice that sewer networks, wastewater treatment plants
and anaerobic digestion are designed for different purposes. How to
coordinate different purposes and to optimize the treatment efficiency are
still open issues for wastewater engineers.

3.3. Instrumentation faults

In the UWS, the instrumentation faults mainly come from sensors,
controllers and actuators, as shown in Fig. 5. The effect of such faults could
be more severe if the sensors or controllers or actuators are incorporated
in a closed loop.

(1) Sensor faults
Sensors are widely used to acquire data and information in the
UWS. Small WWTPs always install standard on-line sensors, such
as DO, Flow, Level, oxidation-reduction potential and suspended
solids. COD, BOD, TP and TN are analyzed off-line, whereas larger WWTPs
typically include both off-line and on-line sensors (Olsson,
2012a). In the UWS, most sensors are usually exposed to harsh
environments, such as high temperature (e.g. in thermal
hydrolysis), humidity and a corrosive environment, thus leading to fouling
or damage, which further affect sensor accuracy and
reliability (Anter et al., 2020). A sensor failure will degrade the
system performance and potentially cause significant damage,
particularly if the sensing signal is used in control systems or if
the sensor, e.g. NH4+, is used for decision making (manually or
automatically). The involvement of sensors in a closed loop will
add further potential for cascading error propagation. Therefore,
the early detection, diagnosis and prognosis of faults in sensors
are essential to ensure safe and reliable operations of the
UWS. Following Li et al. (2020b), the types of sensor faults are
classified into degradation failure (bias, drift, gain) and sudden
failure (abrupt, noise, random).
Safe and reliable operations of the WWTPs are highly
dependent on the correct function of the DO sensor, which is
an important component of the aeration control system (Åmand
et al., 2013). Aeration accounts for the majority of the energy
consumption in the WWTP, and therefore there are significant
incentives to properly monitor and control the aeration process.
Luca et al. used PCA to detect different typical DO sensor faults
(Luca et al., 2021). To test bias, drifting, loss of accuracy and
even complete failure, all aforementioned faults were simulated
in a system consisting of Anaerobic-Anoxic-Oxic (A2O) reactors,
but were not validated on real data (Henze et al., 1999). The results
demonstrated that all the DO sensor faults can be recognized and
the performance of the proposed methods was well assessed in
terms of promptness, effectiveness, and accuracy of fault
detection. Deep learning methods, such as variational residual

Fig. 5. The general overview of integration of sensors, actuators, controllers, and cyber-physical structure in the UWS.
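
As a minimal illustration of the degradation-type sensor faults named in this section (bias, drift and gain), the sketch below injects each fault into a simulated DO signal and flags it with a simple residual threshold. It is not the PCA-based method of Luca et al. (2021); the function names, the 2 mg/L set-point, the noise level, the fault magnitudes and the 3-sigma rule are all assumptions chosen only for illustration.

```python
import random
import statistics


def inject_fault(signal, kind, start, magnitude):
    """Return a copy of `signal` with a synthetic fault from index `start` on."""
    faulty = list(signal)
    for i in range(start, len(faulty)):
        if kind == "bias":      # constant additive offset
            faulty[i] += magnitude
        elif kind == "drift":   # offset growing linearly with time
            faulty[i] += magnitude * (i - start)
        elif kind == "gain":    # multiplicative scaling error
            faulty[i] *= magnitude
        else:
            raise ValueError(f"unknown fault kind: {kind}")
    return faulty


def detect(measured, reference, n_train, k=3.0):
    """Indices whose residual leaves the mean +/- k*sigma band.

    The band is estimated on the first `n_train` samples, which are
    assumed to be fault-free.
    """
    residual = [m - r for m, r in zip(measured, reference)]
    mu = statistics.mean(residual[:n_train])
    sigma = statistics.stdev(residual[:n_train])
    return [i for i, e in enumerate(residual) if abs(e - mu) > k * sigma]


random.seed(0)
reference = [2.0] * 200  # hypothetical DO set-point in mg/L (assumption)
healthy = [r + random.gauss(0.0, 0.05) for r in reference]

first_alarm = {}
for kind, magnitude in [("bias", 0.5), ("drift", 0.01), ("gain", 1.3)]:
    faulty = inject_fault(healthy, kind, start=100, magnitude=magnitude)
    alarms = detect(faulty, reference, n_train=100)
    # Only alarms raised after the fault onset at sample 100 are of interest.
    first_alarm[kind] = next((i for i in alarms if i >= 100), None)
    print(kind, "first alarm at sample", first_alarm[kind])
```

With these settings, a bias or gain fault crosses the band at once, whereas a slow drift is typically flagged only after the offset has accumulated, which hints at why drift is harder to catch early.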

autoencoders and deep dropout neural networks, are discussed and
implemented in other studies for sensor fault detection in WWTPs
(Mali and Laskar, 2020; Ba-Alawi et al., 2022).
Besides DO, flow sensors and ammonium sensors are also
commonly used in the UWS. Samuelsson et al. enhanced the GP
model with the sequential Monte Carlo estimation methodology
(GP-SMC), to predict the drift faults in flow rate and ammonium
sensors. The results demonstrated that GP-SMC achieved the best
performance for sensor fault prediction and was able to
monitor full-scale WWTPs (Samuelsson et al., 2017). To minimize
the negative influence of sensor faults on the quality of the
control system performance, Dovžan et al. proposed a fault-detection
system using the evolving fuzzy method together with an
adaptation mechanism to manipulate and identify the models’
parameters on-line (Dovžan et al., 2015). Incorrect manual
calibrations as well as faults in oxygen, air-flow sensors and
influent ammonia concentration sensors were validated by using
the proposed fault-detection system (Dovžan et al., 2015). In
addition, Cheng et al. (2020) proposed a novel Bayesian transfer
learning methodology enhanced by ensemble adaptive sparse
learning, then further implemented it for nonlinear fault
diagnosis. In this article, to monitor the operating conditions, the
proposed method made full use of the probabilistic relevance
vector machine and a Bayesian framework to approach the fault
evolutions. Transfer learning can be used to transfer historical
data or learned models to update the diagnosis model in the new
scenarios. The proposed framework was validated for sensor fault
detection in a full-scale WWTP (Cheng et al., 2020). Due to the
importance of the effluent quality in view of stringent
environmental standards, most papers focus on oxygen, nitrate and
nitrite sensors which all measure variables with a significant effect
on the effluent quality. Despite these studies, in general, it can be
concluded that sensor fault diagnosis and prognosis in the UWS
still requires significant research and development to include a
more comprehensive evaluation of faults and their consequences
related to different sensors, especially for the ones connected to
control systems.
(2) Actuator faults

Actuators are the essential elements in the UWS able to take proper
actions in response to a control system, for example, to manipulate KLa
to ensure proper DO in the aerobic tanks. KLa is the oxygen mass transfer
coefficient, equivalent to airflow in aerated reactors. The typical
actuators in the UWS mainly involve flow valves, flow splitters, gates,
pumps, moveable weirs as well as inflatable dams. Moreover, a large
number of actuators have been developed and used with a significant
effect on treatment quality, such as aeration devices and chemical
dosing devices. Properly manipulating the actuators ensures optimal
opening and closing times, thus optimizing the number of displacements
and reducing the failure frequency.
To deal with the multi-period fault diagnosis of actuators in an SBR,
Zhou et al. proposed an enhanced multi-way principal component
analysis (MPCA). In this method, based on the sub-period division and
using the similarity measurement strategies, the interference and
interactions among faults in multiple periods can be minimized, so as to
ensure accurate sequential fault recognition and identification. The
proposed methodology was used to monitor a large number of variables
in the SBR reactor of a paper mill, for example, the blower current, the
blower valve opening, the wastewater level as well as the dissolved
oxygen. The results show the feasibility and reliability of the proposed
MPCA method (Zhou et al., 2021). In addition, Purbowaskito et al.
proposed a data-driven subspace identification (SID) algorithm to detect
and diagnose the abnormal events in an induction motor (Purbowaskito
et al., 2021). In this article, a state-space model related to the voltage
and current signals was built and then used for residual signal
generation in a quasi-steady-state condition. Depending on the comparison
between the residual signals and a designed threshold, the decision for a
fault identification and corresponding fault signatures can be derived
and applied for a wastewater pump. The article imposed both known
and unknown faulty signatures on a wastewater pump, demonstrating
that the proposed method is more sensitive than standard methods and
able to achieve more accurate fault diagnosis for both known and
unknown faults (Purbowaskito et al., 2021). It is interesting to notice that
the former article is based on a purely data-driven model, whereas the
latter article focuses on the mechanistic model by resorting to a
state-space model. Also, both articles used a divide and conquer
strategy, aiming to explore sub-models to achieve more accurate
performance. From an engineering point of view, developing an accurate fault
detection method requires sufficient faulty data in the UWS. However,
deriving enough data for actuator faults is unrealistic. The alternative
solutions are to leverage expert knowledge to work together with
data-driven fault detection and diagnosis methods, similar to the small
data learning methodologies. Furthermore, combining data from
actuators and sensors is useful to enhance a fault analysis. However, the
imbalance of actuator samples and sensor samples could lead to
inaccurate learning for fault classifications. Semi-supervised learning could
provide an alternative.

4. Discussion, challenges and opportunities

4.1. Discussion

So far, the focus of this review has been on approaches and
applications showing the potential of AI and DA to substantially promote the
fault detection, diagnosis and prognosis of the UWS. These will enable a
switch from a pure infrastructure to a smart system and result in
upgrading existing infrastructures for active management of process
faults (sludge bulking, sewer corrosion, performance degradation) and
instrument faults (sensor and actuator faults). This is especially relevant
in view of the ongoing development of information science, artificial
intelligence and automation. In general, nonlinear methods always
require big data for training, particularly when these nonlinear methods
are extended to a deep learning structure or broad learning structure
(fast and accurate learning without deep structure). Otherwise,
nonlinear methods could converge into local minima and result in
under-fitting. On the contrary, linear methods can be applied to both big
and small data. The main reason why SVM, RVM and other kernel
methods can perform well is that they can choose a linear kernel
function for classifications. It is interesting to notice that all the listed
methods resort to mapping manipulation which is able to
represent high dimensional data in another subspace of lower
dimensionality. By discriminating the residual between the original and
the reconstructed raw data, abnormalities can be recognized properly.
Most of the surveyed statistical learning methods, such as PCA, ICA, CCA
and others, must be combined with contribution plots if the intention is
to perform fault diagnosis. The reason why the methodologies are able
to deal with multiple-fault detection mainly results from the fact that
the corresponding nonlinear methods can be used for multiple
classifications simultaneously. Meanwhile, prognosis mainly depends on
whether the methods can perform regression or even multiple-step
regression. It is obvious that fault diagnosis methods are not present.
This is mainly based on the fact that, to properly diagnose the root
cause, contribution plots and their variants have to work together
with the linear dimensionality reduction methods. Another alternative is
to resort to graph theory and network analysis methods, such as Granger
Causality Analysis and Bayesian Networks. Through graph theory
methods, the inter- and intra-relationships among variables can be
analyzed and visualized.

4.2. Challenges

The availability of more data also raises concerns, even though

advances in technologies and methods are catching up. We suggest
seven challenges which need special consideration:

(1) How to quantify and qualify the fault propagations systematically
and to combine them with optimization techniques to assist
operators to optimize fault management of the current and future
scenarios. Even though the fault management of SNs, WWTPs and
ADs is considered separately, few research efforts have been
devoted to taking into account the relationships among the
different processes and taking a systematic view on how to safely
operate the entire urban wastewater system. The super large-scale
UWS adds further complexity to the fault propagations,
which typically take several days to cross the entire system.
Plant-wide or system-wide fault detection, diagnosis and prognosis
methods could act as the alternatives to deal with these problems.
Also, it is interesting to notice that systematic management of
faults can visualize the fault propagations across the UWS (Ge,
2017).
(2) How to analyze fault evolutions from the early stage of a fault and
then to support decision-making in the UWS. Most studies are
mainly focusing on a single fault management step, such as fault
detection or diagnosis or prognosis (Newhart et al., 2019). Slow
evolution rates of undesired bacterial behaviors in the sewers or
WWTPs provide more potential to combine fault detection,
diagnosis and prognosis. However, a single step is definitely not
sufficient to support the decision making properly. The
development of a life cycle method for fault management is imperative,
which will integrate fault detection, diagnosis and prognosis
simultaneously and enable better maintenance actions to be
taken earlier. These will add further potential for reducing failure
and potential cost-savings.
(3) How to design a mixed optimization algorithm to coordinate a
hybrid of discrete and continuous behaviors in the UWS to
achieve more energy saving, better effluent quality and lower greenhouse
gas emission. Due to the on-off working patterns in pumping
stations, which is different from other process industries, the
UWS will exhibit a hybrid of discrete and continuous behaviors.
These hybrid behaviors will add further complexity to fault
detection, diagnosis and prognosis methodologies (Xiao et al.,
2021). Therefore, coordinating discrete and continuous
behaviors is necessary for optimal fault management.
(4) How to assimilate data sets with different sample scales for fault
detection, diagnosis and prognosis. In the UWS, the data could be
collected at fast (such as flow rate, DO) or slow sample rates (such
as COD, VFA). Therefore, it is difficult to decide whether the
data-driven methodologies are purely dependent on big data or
small data analytics for fault detection, diagnosis and prognosis.
Transformations of big, medium and small data to actionable
knowledge render standard methods unsuitable. Semi-supervised
learning could be the option for solutions that ensure full use of
all the data (slow sample rate and fast sample rate).
(5) How to make data-driven, knowledge-based and mechanistic
fault management methodologies collaborate deeply or iteratively to
achieve accurate fault management, rather than simply combine
them in series or in parallel, due to the combination of known and
unknown physical, chemical and biological behaviors in the
UWS. Since data-driven, knowledge-based and mechanistic fault
management methodologies could complement each other in
UWS applications, a hybrid of data-driven and knowledge-driven
methodologies could potentially be better. Even though artificial
intelligence has received significant attention, most methods, such as
NN, are not easily interpretable. This adds a large amount of
uncertainty and unreliability to the data-driven methodologies
and increases the unreliability of fault detection, diagnosis and
prognosis methods (Loquercio et al., 2020). Prior knowledge
needs to be assimilated into data-driven methods to ensure that the
fault detection, diagnosis and prognosis results are interpretable
and explainable.
(6) How to deal with the occurrences and interactions of multiple
faults in the UWS. Because a large number of sensors and
instruments are installed and work under complex conditions
(such as stormy weather, corrosion due to H2S, over-sensitive
behaviors of bacteria, and so on) in the UWS, there will be a
high potential that coexistence of multiple faults could happen.
However, until now, limited research effort has been devoted to
the multiple faults problem in the UWS (Fragkoulis et al., 2011).
If such a situation would occur – i.e. occurrence of multiple faults
simultaneously – combined with the hybrid of discrete and
continuous behaviors, the coexistence of multiple faults could
add further complexity and render faults impossible to be isolated
and diagnosed correctly.
(7) How to involve frontline staff behaviors in a closed loop to form
the cyber-physical system to improve UWS management. The
wastewater system and cybernetic system (Human-Hardware
devices, i.e., sensors, actuators and controllers-information) are
increasingly complex and interrelated, particularly the often
unknown interplay between the macroscale (instrumentation,
human, reaction tanks) and microscale (microbial ecosystem)
systems. A partial digital twin, rather than a complete digital
twin, could be more practical. Thus, it is important to take into
account the needs of frontline staff to deliver the right actionable
knowledge in the correct format to make the appropriate decision
with more trust and less time consumption. Understanding how a
cybernetic system interacts with the UWS and frontline staff and
then performing uncertainty analysis, reliability and resilience
analysis of fault management methodologies and the UWS are of
great importance.

4.3. Opportunities

The purpose of transforming data into actionable knowledge in fault
detection, diagnosis and prognosis of UWS is to improve maintenance,
prevent system failure and finally ensure optimal operation and
cost-saving. To achieve this purpose, the UWS needs to transform raw data
to informative knowledge for fault management and to automate their
current safe operation paradigms. Despite a large number of literature
references reporting on AI-based data-driven fault management
methods in the UWS, guidance for method selection is not well
formulated, leading to difficulties in selecting the ‘best’ method to
address domain issues. Therefore, there is now an ever-increasing need
to develop standardized protocols rather than new techniques. It is of
great importance that the existence of tools like the Benchmark
Simulation Model (BSM) platform provides a baseline for method
comparisons, even to twin or partially twin the true system (Flores-Alsina et al.,
2012).
The concept of Digital Twins (DT) was proposed for product lifecycle
management (Udugama et al., 2021). By digitalizing the UWS, the
behaviors of the system, such as the macroscale system (e.g. sewer
networks, pump stations, WWTP) and the microscale system (for example,
aerobic bacteria and sulphate reducing bacteria), can be simulated and
predicted, which can be used for fault detection, diagnosis and
prognosis. In addition, due to the wide spreading of sewer networks and
WWTPs around cities, it is inevitable that the UWS exhibits a super large
scale management structure. Sending data to the cloud for analysis
is, however, unlikely to be the dominant trend in the UWS in the near
future due to the computational burden from the increasing number of
devices and the extensive data traffic in the Internet-of-Things. New
computing paradigms, such as edge computing and transparent
computing, have emerged to reshape the UWS cyber system such that
heavy data-driven models and DTs or partial DTs will become closer and
closer to reality. These models can simulate detailed activities of
biological behaviors from the microscale perspective, which usually implies

intensive computation. This can be solved by quantum computing in the
future (Andersson et al., 2022). By approaching the microscale and
macroscale behaviors of the UWS with digital twins, both
instrumentation and bacteria-related faults can be detected, diagnosed and
predicted properly.
In summary, the most promising prospect is that data-driven fault
detection, diagnosis and prognosis of the UWS can offer novel ways to
facilitate safe and optimal operations, and provide a higher level of
maintenance with fewer failures and lower investment.

5. Conclusions

This review attempted to investigate and analyze the methods and
applications to turn data into actionable knowledge for fault detection,
diagnosis and prognosis of process faults and instrumentation faults in
a UWS. In this article, several potential research aspects were found
important and should be further developed in the future:

• Qualifications of a fault from a systematic, life-cycle and
multiple-scale view;
• Hybrid of data-driven, mechanistic and knowledge-driven models for
fault management;
• Coordination of continuous and discrete behaviors across the UWS
for fault management;
• Detection, diagnosis and prognosis of multiple faults simultaneously;
• Involvement of human behaviors in a closed loop for fault
management;

By analyzing the above perspectives, challenges and opportunities
are massive and could rapidly reshape the UWS facilities. One can
imagine the UWS benefiting from artificial intelligence and data
analytics. The UWS will be upgraded into a smart and sophisticated system
with reduced failure, better maintenance and even more focus on
cost-saving.

CRediT authorship contribution statement

Yiqi Liu: data curation, investigation, formal analysis, validation,
and writing – original draft. Pedram Ramin: conceptualization,
methodology, writing – review and editing. Xavier Flores-Alsina:
writing – review and editing, supervision, and funding acquisition. Krist V.
Gernaey: writing – review and editing, supervision, and funding
acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.

Acknowledgements

Yiqi Liu acknowledges the support of the Horizon 2020 Framework
Programme-Marie Skłodowska-Curie Individual Fellowships (891627). This
work was partially supported by the National Natural Science
Foundation of China (62273151, 61873096, 62073145), the Basic and Applied
Basic Research Foundation of Guangdong Province
(2020A1515011057, 2021B1515420003), and the Guangdong International
Scientific Cooperation Research Foundation (2020A0505100024,
2021A0505060001).

References

A. Udugama, I., Öner, M., Lopez, P.C., Beenfeldt, C., Bayer, C., Huusom, J.K., et al., 2021.
Towards digitalization in bio-manufacturing operations: a survey on application of
big data and digital twin concepts in Denmark. Front. Chem. Eng. 3.
Abid, A., Khan, M.T., Iqbal, J., 2021. A review on fault detection and diagnosis
techniques: basics and beyond. Artif. Intell. Rev. 54, 3639–3664.
Aguado, D., Rosen, C., 2008. Multivariate statistical monitoring of continuous
wastewater treatment plants. Eng. Appl. Artif. Intell. 21, 1080–1091.
Alani, A.M., Faramarzi, A., Mahmoodian, M., Tee, K.F., 2014. Prediction of sulphide
build-up in filled sewer pipes. Environ. Technol. 35, 1721–1728.
Åmand, L., Olsson, G., Carlsson, B., 2013. Aeration control – a review. Water Sci.
Technol. 67, 2374–2398.
Andersson, M.P., Jones, M.N., Mikkelsen, K.V., You, F., Mansouri, S.S., 2022. Quantum
computing for chemical and biomolecular product design. Curr. Opin. Chem. Eng.
36, 100754.
Anter, A.M., Gupta, D., Castillo, O., 2020. A novel parameter estimation in dynamic
model via fuzzy swarm intelligence and chaos theory for faults in wastewater
treatment plant. Soft Comput. 24, 111–129.
Ba-Alawi, A.H., Loy-Benitez, J., Kim, S., Yoo, C., 2022. Missing data imputation and
sensor self-validation towards a sustainable operation of wastewater treatment
plants via deep variational residual autoencoders. Chemosphere 288.
Chen, J., Hu, Z., Lu, J., Zheng, X., Zhang, H., Kiritsis, D., 2022. A data-knowledge hybrid
driven method for gas turbine gas path diagnosis. Appl. Sci.
Cheng, H., Liu, Y., Huang, D., Xu, C., Wu, J., 2020. A novel ensemble adaptive sparse
bayesian transfer learning machine for nonlinear large-scale process monitoring.
Sensors 20, 1–17.
Chi, H., Pitter, S., Li, N., Tian, H., 2018. Big data solutions to interpreting complex
systems in the environment. In: Srinivasan, S. (Ed.), Guide to Big Data Applications.
Springer International Publishing, Cham, pp. 107–124.
Colomer, J., Wong, A., Coma, M., Puig, S., Colprim, J., 2013. Qualitative estimation of
SBR biological nutrient removal performance for wastewater treatment. J. Chem.
Technol. Biotechnol. 88, 1305–1313.
Comas, J., Rodríguez-Roda, I., Gernaey, K.V., Rosen, C., Jeppsson, U., Poch, M., 2008.
Risk assessment modelling of microbiology-related solids separation problems in
activated sludge systems. Environ. Model. Softw. 23, 1250–1261.
Darvishi, H., Ciuonzo, D., Eide, E.R., Rossi, P.S., 2021. Sensor-fault detection, isolation
and accommodation for digital twins via modular data-driven architecture. IEEE
Sens. J. 21, 4827–4838.
Dovžan, D., Logar, V., Škrjanc, I., 2015. Implementation of an Evolving Fuzzy Model
(eFuMo) in a Monitoring System for a Waste-Water Treatment Process. IEEE Trans.
Fuzzy Syst. 23, 1761–1776.
Ferguson, A.R., Nielson, J.L., Cragin, M.H., Bandrowski, A.E., Martone, M.E., 2014. Big
data from small data: data-sharing in the ’long tail’ of neuroscience. Nat. Neurosci.
17, 1442–1447.
Flores-Alsina, X., Gernaey, K.V., Jeppsson, U., 2012. Global sensitivity analysis of the
BSM2 dynamic influent disturbance scenario generator. Water Sci. Technol. 65,
1912–1922.
Fragkoulis, D., Roux, G., Dahhou, B., 2011. Detection, isolation and identification of
multiple actuator and sensor faults in nonlinear dynamic systems: Application to a
waste water treatment process. Appl. Math. Model. 35, 522–543.
Ge, Z., 2017. Review on data-driven modeling and monitoring for plant-wide industrial
processes. Chemom. Intell. Lab. Syst. 171, 16–25.
Haimi, H., Mulas, M., Corona, F., Marsili-Libelli, S., Lindell, P., Heinonen, M., et al.,
2016. Adaptive data-derived anomaly detection in the activated sludge process of a
large-scale wastewater treatment plant. Eng. Appl. Artif. Intell. 52, 65–80.
Han, H.G., Liu, H.X., Liu, Z., Qiao, J.F., 2019. Fault detection of sludge bulking using a
self-organizing type-2 fuzzy-neural-network. Control Eng. Pract. 90, 27–37.
Henze, M., Gujer, W., Mino, T., Matsuo, T., Wentzel, M.C., Marais, Gv.R., et al., 1999.
Activated sludge model No.2d, ASM2D. Water Sci. Technol. 39, 165–182.
Hernandez-del-Olmo, F., Gaudioso, E., 2011. Reinforcement Learning Techniques for the
Control of WasteWater Treatment Plants. Springer Berlin Heidelberg, Berlin,
Heidelberg, pp. 215–222.
Jiang, G., Sun, J., Sharma, K.R., Yuan, Z., 2015. Corrosion and odor management in
sewer systems. Curr. Opin. Biotechnol. 33, 192–197.
Jiang, G., Keller, J., Bond, P.L., Yuan, Z., 2016. Predicting concrete corrosion of sewers
using artificial neural network. Water Res. 92, 52–60.
Jordan, M.I., Mitchell, T.M., 2015. Machine learning: trends, perspectives, and prospects.
Science 349, 255–260.
Kazemi, P., Giralt, J., Bengoa, C., Masoumian, A., Steyer, J.-P., 2020. Fault detection and
diagnosis in water resource recovery facilities using incremental PCA. Water Sci.
Technol. 82, 2711–2724.
Kazemi, P., Bengoa, C., Steyer, J.P., Giralt, J., 2021. Data-driven techniques for fault
detection in anaerobic digestion process. Process Saf. Environ. Prot. 146, 905–915.
Kazor, K., Holloway, R.W., Cath, T.Y., Hering, A.S., 2016. Comparison of linear and
nonlinear dimension reduction techniques for automated process monitoring of a
decentralized wastewater treatment facility. Stoch. Environ. Res. Risk Assess. 30,
1527–1544.
Leturiondo, U., 2016. Hybrid Modelling in Condition Monitoring ([Doctoral thesis,
comprehensive summary]). University of Technology, Luleå: Luleå.
Li, D., Wang, Y., Wang, J., Wang, C., Duan, Y., 2020a. Recent advances in sensor fault
diagnosis: a review. Sens. Actuators A Phys. 309, 111990.
Li, D., Wang, Y., Wang, J., Wang, C., Duan, Y., 2020b. Recent advances in sensor fault
diagnosis: a review. Sens. Actuators A Phys. 309.
Liu, H., Yang, J., Zhang, Y., Yang, C., 2021. Monitoring of wastewater treatment
processes using dynamic concurrent kernel partial least squares. Process Saf.
Environ. Prot. 147, 274–282.
Liu, Y., Xie, M., 2020. Rebooting data-driven soft-sensors in process industries: a review
of kernel methods. J. Process Control 89, 58–73.
Liu, Y., Guo, J., Wang, Q., Huang, D., 2016. Prediction of filamentous sludge bulking
using a state-based gaussian processes regression model. Sci. Rep. 6, 31303.


Liu, Y., Pan, Y., Huang, D., Wang, Q., 2017a. Fault prognosis of filamentous sludge bulking using an enhanced multi-output gaussian processes regression. Control Eng. Pract. 62, 46–54.
Liu, Y., Liu, B., Zhao, X., Xie, M., 2018. A mixture of variational canonical correlation analysis for nonlinear and quality-relevant process monitoring. IEEE Trans. Ind. Electron. 65, 6478–6486.
Liu, Y., Yuan, L., Huang, S., Huang, D., Liu, B., 2020. Integrated design of monitoring, analysis and maintenance for filamentous sludge bulking in wastewater treatment. Measurement 155, 107548.
Liu, Y., Song, Y., Keller, J., Bond, P., Jiang, G., 2017b. Prediction of concrete corrosion in sewers with hybrid Gaussian processes regression model. RSC Adv. 7, 30894–30903.
Loquercio, A., Segu, M., Scaramuzza, D., 2020. A general framework for uncertainty estimation in deep learning. IEEE Robot. Autom. Lett. 5, 3153–3160.
Luca, A.-V., Simon-Várhelyi, M., Mihály, N.-B., Cristea, V.-M., 2021. Data driven detection of different dissolved oxygen sensor faults for improving operation of the WWTP control system. Processes 9, 1633.
Mali, B., Laskar, S.H., 2020. Incipient fault detection of sensors used in wastewater treatment plants based on deep dropout neural network. SN Appl. Sci. 2.
Matheri, A.N., Belaid, M., Njenga, C.K., Ngila, J.C., 2022. Water and wastewater digital surveillance for monitoring and early detection of the COVID-19 hotspot: industry 4.0. Int. J. Environ. Sci. Technol.
McLamore, E.S., Huffaker, R., Shupler, M., Ward, K., Datta, S.P.A., Katherine Banks, M., et al., 2020. Digital Proxy of a Bio-Reactor (DIYBOT) combines sensor data and data analytics to improve greywater treatment and wastewater management systems. Sci. Rep. 10, 8015.
Moretta, F., Rizzo, E., Manenti, F., Bozzano, G., 2021. Enhancement of anaerobic digestion digital twin through aerobic simulation and kinetic optimization for co-digestion scenarios. Bioresour. Technol. 341, 125845.
Newhart, K.B., Holloway, R.W., Hering, A.S., Cath, T.Y., 2019. Data-driven performance analyses of wastewater treatment plants: A review. Water Res. 157, 498–513.
Nor, N.M., Hassan, C.R.C., Hussain, M.A., 2020. A review of data-driven fault detection and diagnosis methods: applications in chemical process systems. Rev. Chem. Eng. 36, 513–553.
Olsson, G., 2012a. Water and Wastewater Operation: Instrumentation, Monitoring, Control and Automation. In: Meyers, R.A. (Ed.), Encyclopedia of Sustainability Science and Technology. Springer New York, New York, NY, pp. 11946–11960.
Olsson, G., 2012b. ICA and me – A subjective review. Water Res. 46, 1585–1624.
Park, Y.J., Fan, S.K.S., Hsu, C.Y., 2020. A review on fault detection and process diagnostics in industrial processes. Processes 8, 1123.
Picabea, J., Maestri, M., Cassanello, M., Horowitz, G., 2021. Hybrid model for fault detection and diagnosis in an industrial distillation column. 16, 169–180.
Pikaar, I., Sharma, K.R., Hu, S., Gernjak, W., Keller, J., Yuan, Z., 2014. Reducing sewer corrosion through integrated urban water management. 345, 812–814.
Poh, P.E., Gouwanda, D., Mohan, Y., Gopalai, A.A., Tan, H.M., 2016. Optimization of wastewater anaerobic digestion using mechanistic and meta-heuristic methods: current limitations and future opportunities. Water Conserv. Sci. Eng. 1, 1–20.
Prochaska, C., Zouboulis, A., 2020. A mini-review of urban wastewater treatment in Greece: history, development and future challenges. Sustainability 12, 6133.
Purbowaskito, W., Lan, C.Y., Fuh, K., 2021. A novel fault detection and identification framework for rotating machinery using residual current spectrum. Sensors 21.
Rosen, C., Jeppsson, U., Vanrolleghem, P., 2004. Towards a common benchmark for long-term process control and monitoring performance evaluation. Water Sci. Technol. 50, 41–49.
Russo, S., Lürig, M., Hao, W., Matthews, B., Villez, K., 2020. Active learning for anomaly detection in environmental data. Environ. Model. Softw. 134, 104869.
Russo, S., Besmer, M.D., Blumensaat, F., Bouffard, D., Disch, A., Hammes, F., et al., 2021. The value of human data annotation for machine learning based anomaly detection in environmental systems. Water Res. 206, 117695.
Samuelsson, O., Björk, A., Zambrano, J., Carlsson, B., 2017. Gaussian process regression for monitoring and fault detection of wastewater treatment processes. Water Sci. Technol. 75, 2952–2963.
Santos, A.V., Lin, A.R.A., Amaral, M.C.S., Oliveira, S.M.A.C., 2021. Improving control of membrane fouling on membrane bioreactors: a data-driven approach. Chem. Eng. J. 426.
Slimani, A., Ribot, P., Chanthery, E., Rachedi, N., 2018. Fusion of model-based and data-based fault diagnosis approaches. IFAC Pap. 51, 1205–1211.
Sun, W., Paiva, A.R.C., Xu, P., Sundaram, A., Braatz, R.D., 2020. Fault detection and identification using Bayesian recurrent neural networks. Comput. Chem. Eng. 141, 106991.
Sweetapple, C., Astaraie-Imani, M., Butler, D., 2018. Design and operation of urban wastewater systems considering reliability, risk and resilience. Water Res. 147, 1–12.
Teh, H.Y., Kempa-Liehr, A.W., Wang, K.I.K., 2020. Sensor data quality: a systematic review. J. Big Data 7, 11.
Torfs, E., Nicolaï, N., Daneshgar, S., Copp, J.B., Haimi, H., Ikumi, D., et al., 2022. The transition of WRRF models to digital twin applications. Water Sci. Technol. 85, 2840–2853.
Wang, D., Thunéll, S., Lindberg, U., Jiang, L., Trygg, J., Tysklind, M., et al., 2021. A machine learning framework to improve effluent quality control in wastewater treatment plants. Sci. Total Environ. 784, 147138.
Wen, Y., Fashiar Rahman, M., Xu, H., Tseng, T.-L.B., 2022. Recent advances and trends of predictive maintenance from data-driven machine prognostics perspective. Measurement 187, 110276.
Wilhelm, Y., Reimann, P., Gauchel, W., Mitschang, B., 2021. Overview on hybrid approaches to fault detection and diagnosis: combining data-driven, physics-based and knowledge-based models. Procedia CIRP 99, 278–283.
Xiao, C., Yu, M., Zhang, B., Wang, H., Jiang, C., 2021. Discrete component prognosis for hybrid systems under intermittent faults. IEEE Trans. Autom. Sci. Eng. 18, 1766–1777.
Zhang, L., Lin, J., Karim, R., 2018. Adaptive kernel density-based anomaly detection for nonlinear systems. Knowl.-Based Syst. 139, 50–63.
Zhao, L.J., Zou, S.D., Zhang, Y.H., Huang, M.Z., Zuo, Y., Wang, J., et al., 2019. Segmentation of activated sludge phase contrast microscopy images using U-net deep learning model. Sens. Mater. 31, 2013–2028.
Zhou, F., Park, J.H., Liu, Y., 2016. Differential feature based hierarchical PCA fault detection method for dynamic fault. Neurocomputing 202, 27–35.
Zhou, J., Huang, F., Shen, W., Liu, Z., Corriou, J.P., Seferlis, P., 2021. Sub-period division strategies combined with multiway principle component analysis for fault diagnosis on sequence batch reactor of wastewater treatment process in paper mill. Process Saf. Environ. Prot. 146, 9–19.
Zounemat-Kermani, M., Alizamir, M., Yaseen, Z.M., Hinkelmann, R., 2021. Concrete corrosion in wastewater systems: prediction and sensitivity analysis using advanced extreme learning machine. Front. Struct. Civ. Eng. 15, 444–460.
