
Bioresource Technology 343 (2022) 126099

Contents lists available at ScienceDirect

Bioresource Technology
journal homepage: www.elsevier.com/locate/biortech

Review

The role of machine learning to boost the bioenergy and biofuels conversion
Zhengxin Wang a, b, 1, Xinggan Peng c, 1, Ao Xia a, b, *, Akeel A. Shah a, b, Yun Huang a, b,
Xianqing Zhu a, b, Xun Zhu a, b, Qiang Liao a, b
a Key Laboratory of Low-grade Energy Utilization Technologies and Systems, Chongqing University, Ministry of Education, Chongqing 400044, PR China
b Institute of Engineering Thermophysics, School of Energy and Power Engineering, Chongqing University, Chongqing 400044, PR China
c School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore

Highlights

• Complexity of bioenergy system limits accurate prediction by experience or theory.
• Machine learning (ML) opens a new opportunity in prediction of bioenergy system.
• Classification, regression, and optimization are involved in bioenergy problems.
• ML boosts the research in lignocellulosic biofuels and microalgae cultivation.
• Novel ML algorithms (such as deep learning) with large databases are expected.

Keywords: Bioenergy; Biofuels; Machine learning; Lignocellulosic biomass; Algae

Abstract

The development and application of bioenergy and biofuels conversion technology can play a significant role in the production of renewable and sustainable energy sources in the future. However, the complexity of bioenergy systems and the limitations of human understanding make it difficult to build models based on experience or theory for accurate predictions. Recent developments in data science and machine learning (ML) can provide new opportunities. Accordingly, this critical review provides a deep insight into the application of ML in the bioenergy context. The latest advances in ML-assisted bioenergy technology, including energy utilization of lignocellulosic biomass, microalgae cultivation, biofuels conversion and application, are reviewed in detail. The strengths and limitations of ML in bioenergy systems are comprehensively analysed. Moreover, we highlight the capabilities and potential of advanced ML methods when encountering multifarious tasks in the future prospects to advance a new generation of bioenergy and biofuels conversion technologies.

1. Introduction

In 2020, nearly 83.1% of global energy consumption came from fossil fuels, the burning of which released over 32 billion tons of carbon dioxide, causing serious environmental pollution (BP, 2021). In contrast, the abundant and diverse biomass waste resources, such as

* Corresponding author.
E-mail address: aoxia@cqu.edu.cn (A. Xia).
1 These authors contributed equally to this work.

https://doi.org/10.1016/j.biortech.2021.126099
Received 9 August 2021; Received in revised form 4 October 2021; Accepted 5 October 2021
Available online 7 October 2021
0960-8524/© 2021 Elsevier Ltd. All rights reserved.

lignocellulose, algae, food wastes, and livestock and poultry manure (Deng et al., 2019; Xia & Murphy, 2016), which are widely distributed on the earth’s surface, can offer an alternative solution for a sustainable and carbon-neutral energy supply. For example, the annual production of terrestrial biomass (dominated by lignocellulose) is estimated at 130 billion metric tons on a dry-weight basis (Potter et al., 2012), which is equivalent to approximately 65 billion metric tons of standard coal. These biomass resources have the potential to be converted into fuels and value-added chemicals in bioenergy systems (Leong et al., 2020; Liao et al., 2020), which may play a significant role in alleviating the energy crisis and environmental issues.

Nevertheless, bioenergy systems involve complex, high-dimensional, and multi-scale conversion processes that vary in time and space and result from the interaction of many factors. For example, anaerobic digestion (AD), a relatively successful case, is governed by raw materials (components, structure, characteristics, pretreatment, etc.), operational parameters (temperature, hydraulic retention time, volume loading, pH, stirring, reactor configuration, nutrients, inhibitors, etc.), and inoculum (inoculum ratio, microbial community and activity, etc.) (Feng et al., 2021). Therefore, traditional research relies on repeated trial-and-error, and a series of experience-based hypotheses or conjectures are derived by empirical judgment, contributing to the establishment of theory-driven (also known as model-driven or hypothesis-driven) models. With these models, the rationality and feasibility of a task can be predicted first, thereby greatly reducing the workload. The Anaerobic Digestion Model No. 1 (ADM1) was developed to simulate the AD process (Batstone et al., 2002; Weinrich & Nelles, 2021; Zhao et al., 2019). Although ADM1 performed well in laboratory-scale AD, it had limited applications on the industrial scale owing to the complexity of the AD system and a lack of detailed substrate characterization (Derbal et al., 2009; Maharaj et al., 2019; Weinrich et al., 2021). Therefore, on the one hand, the construction of a successful theory-driven model for a bioenergy system is a challenge. On the other hand, when data are collected from complex phenomena, it is difficult to develop comprehensive knowledge based on physical principles, especially when processing the big data sets from bioenergy systems. The potential of such data can only be exploited with high-level data analysis methods.

In the 21st century, artificial intelligence (AI) has been hailed as the fourth industrial revolution (Diez-Olivan et al., 2019). As a vital data analysis technique, machine learning (ML) has grown explosively in the past ten years, greatly extending the problem-solving scope and ability of AI. ML is a collection of advanced data analysis methods based on the principles of mathematics and probability/statistics, identifying the patterns between features in a data set and allowing for predictions in the case of new inputs. ML has been successfully applied to business, medicine (Stokes et al., 2020), biology (Kim et al., 2018), physical sciences (Mehta et al., 2019), chemistry (Dral, 2020) and geoscience (Reichstein et al., 2019), with new application areas emerging continually. The complex mechanisms of biological systems can be captured through multi-modal learning from different views (Li et al., 2018). Advances in optics and imaging technology have enabled ML to analyse and monitor the trace of cells (Moen et al., 2019), and even achieve automatic control of atomic force microscopes at the nanometre scale (DePristo, 2018). With regard to human tasks, ML can serve as an auxiliary tool to achieve more efficient or reliable outcomes. Furthermore, ML can be theoretically competent for tasks involving the attribution of causality or discovering connections between variables, which is advantageous for solving tasks that cannot be solved through traditional modelling or experiment, and for making new discoveries.

Notably, ML has attracted increasing attention in the bioenergy field. Del Rio-Chanona et al. (2019b) proposed a deep learning (DL) model based on a convolutional neural network (CNN) to optimise a photobioreactor and microalgal production, which greatly reduced the time spent compared with traditional simulation calculations. Mujtaba et al. (2020) combined an extreme learning machine (ELM) and a cuckoo search algorithm to predict optimal P50S50 biodiesel yield, and the error between the predicted and experimental values was less than 0.26%. Phromphithak et al. (2021) used a random forest (RF) to predict the cellulose enrichment factor and solid recovery of lignocellulosic biomass pre-treated with ionic liquid solvents, which included 23 variables. The regression coefficients (R²) were 0.94 and 0.84, respectively. More specifically, for a microbial metabolic production process, the focus is on how to maximise the yield of the desired products, and ML models can provide guidance on different scales, whether in gene-annotated strain planning to change metabolic pathways (Hannigan et al., 2019; Kumar et al., 2019; Oyetunde et al., 2018) or in the optimisation of fermentation operating parameters to improve the production of metabolites (Gopakumar et al., 2018; Zhang et al., 2020).

As a consequence, the intersection between ML and bioenergy has covered a wide range of topics in the last five years, including the cultivation and monitoring of microalgae, the extraction and optimisation of energy materials, the interaction analysis of microbial communities, and the prediction of industrial production. To the best of our knowledge, a comprehensive review of ML in bioenergy is still lacking. Therefore, this work reviews the recent applications of ML in bioenergy for the first time, to provide some useful guidelines for bioenergy research. The review aims to:

• Analyse the adaptability and flexibility of ML in the bioenergy context;
• Assess the recent applications of ML in bioenergy and biofuel conversion;
• Identify the future direction of ML in bioenergy and biofuel conversion.

2. Machine learning flexibility in bioenergy research

Although ML algorithms have been applied to bioenergy utilization with increasing regularity, ML-assisted predictions are still in the initial stage of development. The main current barrier to employing ML techniques in bioenergy research is insufficient knowledge of how the subject can be combined with ML methods to accomplish specialized tasks. In this section, we discuss the differences between traditional theory-driven models and data-driven models, and analyze the available data sources and their characteristics in the bioenergy field.

2.1. Data-driven and theory-driven modelling

A great deal of effort has been directed towards making theoretical predictions in order to facilitate the deployment of bioenergy technology and to decrease the time and energy costs. Theory-driven (or hypothesis-based) models have been built to understand the natural mechanisms using known natural laws, specialized knowledge and sometimes intuition (Carleo et al., 2019). The construction of a theory-driven model requires an adequate understanding of the behaviour of a system. With the consideration of specific goals, a mathematical (or empirical) model can be derived from first principles or hypotheses, and the parameters of the model will have physical interpretations. In contrast, data-driven methods extract patterns from the given data without the need for any insight into the physical mechanisms. Theory-driven methods impose pre-existing knowledge of the system, presupposing that it is correct, whereas data-driven methods extract such knowledge from data using ML algorithms, but face the issue of interpretability.

The differences between theory-driven methods and data-driven methods are listed in Table 1. Generally, theory-driven methods require enough domain knowledge along with cogent hypotheses. Nevertheless, many theory-driven models are underdeveloped due to the incomplete knowledge of certain phenomena, which restricts their accuracy and applicability. An example is ADM1, which was not


Table 1
Differences between theory-driven methods and data-driven methods.
Class | Specialized knowledge | Hypothesis | Process characteristic | Origin | Interpretability | Inductive learning ability from limited samples | Complexity | Resource cost [e]
Theory-driven methods | Much [a,b] | Yes [a,b] | Data fit model | Experience, first principles [b] | High | Strong [d] | Decided by the brain [b] | Experts and time [e]
Data-driven methods | Little [a,b] | No [a,b] | Model fit data | Mathematics and probability/statistics | Low [c] | Weak [d] | Decided by data and algorithms [b] | Data [e]

[a] (Toyao et al., 2020)
[b] (Reichstein et al., 2019)
[c] (NOE et al., 2020)
[d] (Lake et al., 2015)
[e] (Feffer, 2017)

adopted for large-scale industrial applications (De Clercq et al., 2020). In contrast, data-driven methods require only a sufficient number of data samples covering the full variation of phenomena (in terms of the parameters, operating conditions or evolutionary stages associated with the problem).

Supervised ML models take input–output pairs as the data set and use the data set to ‘train’ the model for making predictions at new inputs, for either classification or regression tasks. Such models use explicitly (parametric) or implicitly (non-parametric) defined functions to learn relationships between variables in the data. Training is the process by which the parameters of the models are estimated from the data, either by minimising a loss function (error between predictions and data) or maximising a likelihood (probability of generating the data from the assumed ML model). In parametric approaches, the parameters are those appearing in the explicitly defined function, which takes the form of a basis expansion with a specified basis and undetermined coefficients (the parameters, also called weights). In non-parametric approaches the function does not have an explicit form (e.g., a neural network or Gaussian process (GP) model) and is characterised by certain ‘hyperparameters’ (learned during training) that control the specification of the model, e.g., parameters appearing in the mean and covariance functions in a GP model. One of the main criticisms of ML approaches is their lack of interpretability, i.e., the ability to relate the results to the underlying physical mechanisms in order to gain further insights. The specific and often complicated equations given by theory can be difficult and time-consuming to solve (NOE et al., 2020), whereas ML methods are able to bypass the direct solution altogether (with experimental data) or use only a limited number of full numerical solutions as the data set (Brockherde et al., 2017).

2.2. Data sources

Particular emphasis is placed on the quality and quantity of data, which are paramount considerations for developing robust and accurate ML prediction models (Toyao et al., 2020). A clear understanding of data sources is necessary for the selection of suitable approaches. Data in the bioenergy conversion context can be generated from laboratory-scale experiments (Yew et al., 2020), simulations (Del Rio-Chanona et al., 2019b), and industrial operations (De Clercq et al., 2020) (Fig. 1). According to the types of problems, existing studies at the intersection between bioenergy conversion and ML include tabular data (the most common type in current papers), images (usually associated with DL approaches, and applied to the analysis of microalgae), structure data (a handful of relevant reports, such as molecular descriptors for chemical formulae (Kessler et al., 2017)), sensor signals (Cramer et al., 2018), and texts (Benites-Lazaro et al., 2018). Although raw data are available, they usually need to be transformed into a format that the computer is able to process. With a more appropriate input–output format after data pre-processing and bias-variance trade-off (such as data cleaning and feature engineering), the final model would perform better.

Of particular note is that many phenomena (such as biological processes) are dynamic rather than static as in most applications of ML in physical science (Mehta et al., 2019) or molecular and materials science

Fig. 1. Application of machine learning in bioenergy.
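
As a purely illustrative sketch of the parametric case described in Section 2.1 (it is not taken from any of the reviewed studies), the snippet below writes the model as a basis expansion with undetermined weights and estimates those weights by minimising a squared-error loss. The quadratic basis, the noise-free synthetic data and all function names are assumptions made only for this example.

```python
def design_matrix(xs, basis):
    """Evaluate every basis function at every input."""
    return [[phi(x) for phi in basis] for x in xs]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_least_squares(xs, ys, basis):
    """Minimise the squared-error loss via the normal equations."""
    phi = design_matrix(xs, basis)
    m = len(basis)
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(m)]
            for i in range(m)]
    rhs = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(m)]
    return solve_linear_system(gram, rhs)


def predict(x, weights, basis):
    """Prediction at a new input: weighted sum of basis functions."""
    return sum(w * phi(x) for w, phi in zip(weights, basis))


# Hypothetical basis {1, x, x^2} and synthetic input-output pairs.
BASIS = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [2.0 + 3.0 * x - x * x for x in xs]

weights = fit_least_squares(xs, ys, BASIS)
```

Here `weights` plays the role of the undetermined coefficients, and `predict(3.0, weights, BASIS)` evaluates the trained model at an input that was not in the training set. With noisy experimental data the same procedure applies, but the recovered weights would only approximate the underlying relationship.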


(Butler et al., 2018). Data on inherent structure–property relationships with suitable descriptors can be used for the discovery of new materials, such as catalysts (Toyao et al., 2020) and organic photovoltaic materials (Sun et al., 2019), or even for antibiotic discovery (Stokes et al., 2020). In the bioenergy context, ML models may perform poorly due to lack of data, poor coverage of the range of operation or the presence of random noise in the data, but these limitations can be overcome with promising and advanced ML methods (Reichstein et al., 2019).

Table 2 assesses the different data sources for ML applications in bioenergy technology. Whether the data is designed or recycled will result in different levels of data quality and quantity, especially for biotechnology. Another way to obtain data is from published papers. These data seem to be highly correlated, but a lot of random noise will be generated from the different experimental equipment, researchers, and operation protocols in time-consuming experiments. The performance of models learning from simulated data will not surpass the accuracy of fully simulated results. Errors in industrial data have a different source from those in simulated data, arising from measurement errors and missing data. Additionally, industrial data have more redundant features. As a typical linear method, principal component analysis (PCA) performed better than linear discriminant analysis (LDA) in all measures; ML algorithms with PCA performed better on high-dimensional datasets, whereas ML algorithms without dimensionality reduction performed better on low-dimensional datasets (Reddy et al., 2020).

3. Machine learning in bioenergy conversion technology and system

Many experiments are long-term trials and may need highly sophisticated equipment for monitoring. To avoid trial-and-error or expensive operations and to summarize the inner mechanism for prediction or optimisation, data-driven models have been tried where theory-driven models meet with major challenges on account of the complexity of bioenergy conversion processes. Furthermore, ML models can complete tasks that theory-driven models cannot accomplish, such as image-activated cell sorting (Isozaki et al., 2019; Nitta et al., 2018). In this section, specific examples of ML predictive models are discussed mainly in bioproduction processes and bioenergy conversion and applications.

3.1. Machine learning in physicochemical characteristics of bioresources

In fact, ML can not only make good predictions, but also allows for attribution of phenomena. Bioresources have various correlated physicochemical attributes. Some important attributes are difficult to measure, and ML can capture the correlation between these attributes and make predictions, such as the photo-property estimation (Yew et al., 2020).

Among various scientific tasks that use ML methods (see Table 3), some predictions are based on correlated attributes. Xing et al. (2019b) predicted higher heating values of biomass from proximate or ultimate analysis, and the R² (>0.90) of the ML models was better than that of empirical correlations (less than 0.70). Uzun et al. (2017) reported similar results (R² = 0.963) when using an ANN to forecast higher heating values. Further improvement in the model accuracy may depend on the descriptions of the biomass composition and structure. Daassi-Gnaba et al. (2017) predicted wood moisture content from reflection coefficient measurements, and the SVM models performed slightly better than the ANN models, possibly due to the regularization mechanism of SVM. Compared with the frequent overfitting of ANN, the SVM method was able to reduce the structural risk in the training process. The study of Cao et al. (2016) also showed that an SVM model had a more robust performance compared to an ANN model. Kessler et al. (2017) predicted the cetane number of furan biofuel additives from compound structures (molecular descriptors), and the predicted values were the average of the output of five ANN models using ensemble learning.

3.2. Machine learning in biofuels production and application

3.2.1. Biological process

Biofuels can be produced via biological conversion approaches under mild conditions. The principles underlying the predictions from AD based on operational parameters appear to be correct, but there are other entrenched factors that will seriously affect the performance of the model. In the long-term AD process, the evolution of the microbial community is elusive, bringing significant changes in microbial metabolism. Even with the same operation, equipment, and environment, the final results have a certain variability, such as the microbial structure, which may change seasonally (Gruszka Vendruscolo et al., 2020).

As shown in Table 4, the generalization ability of models was poor when data was generated from biological processes (Antwi et al., 2017; Lesnik et al., 2020; Sydney et al., 2020; Wang et al., 2020), which was possibly caused by the complexity of biological systems and the limitations and variation in the data variables. Wang et al. (2020) predicted methane production values from operational parameters, and the dataset was collected from published papers. These data were generated from the same configuration of AD reactor. However, the limited number of data points may reduce the accuracy of such ML models. Antwi et al. (2017) estimated methane yields in an upflow anaerobic sludge blanket reactor, and the performance of an ANN model (R² = 0.9793) was better than that of a nonlinear regression model (R² = 0.9262). Meanwhile, the data remained stable at some stages, because the experiment was continuous. The results showed the potential of simulating wastewater treatment systems through ML.

Hosseinzadeh et al. (2020) combined ANN and fuzzy logic (FL) to model electrochemical performance in a microbial electrolysis cell, and the adaptive network-based fuzzy inference system performed better than the ANN, which was consistent with the report of Rego et al. (2018). Calero et al. (2018) also indicated the potential of neural fuzzy models for breakthrough curve prediction. Lesnik et al. (2020) evaluated the functional resistance and resilience of microbial fuel cells. Though the performance of the model was limited by the quantity of data, it provided an insight into the biological conversion process based on genomic data. In contrast, Gruszka Vendruscolo et al. (2020) investigated the importance of each functional group of a microbial community to the electrochemical performance of a microbial electrolysis cell using PCA

Table 2
Comparison of different data sources in bioenergy.
Data source | Size | Accessibility | Reliability | Significance of preprocessing | Complexity of robust models | Accuracy of model | Applied range
Experimental data | - | - | ++ | high | + | ++ | -
Simulated data | ++ | ++ | ++ | low | ++ | ++ | +
Collected from published papers | + | ++ | + | very high | + | + | +
Industrial data | + | ++ | + | very high | ++ | + | ++

-: Weak performance for different data sources; +: Medium performance for different data sources; ++: High performance for different data sources.
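
To make the dimensionality-reduction step discussed for redundant industrial features more concrete, the following minimal sketch computes the first principal component of a small data set by power iteration on the covariance matrix. It is an assumption-laden illustration rather than the procedure of Reddy et al. (2020): the three "sensor" features are hypothetical, and the second feature is deliberately constructed as twice the first to mimic redundancy.

```python
import math


def center(data):
    """Subtract the column means so each feature has zero mean."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return v, eigenvalue


# Hypothetical readings: feature 2 duplicates feature 1 (scaled by 2),
# feature 3 is a small independent fluctuation.
readings = [
    [1.0, 2.0, 0.1],
    [2.0, 4.0, -0.1],
    [3.0, 6.0, 0.1],
    [4.0, 8.0, -0.1],
    [5.0, 10.0, 0.0],
]

centered = center(readings)
cov = covariance(centered)
component, variance = first_component(cov)
explained_ratio = variance / sum(cov[i][i] for i in range(len(cov)))
# One-dimensional representation of each sample.
scores = [sum(x * c for x, c in zip(row, component)) for row in centered]
```

In this constructed case a single component captures almost all of the variance, so the three original features could be replaced by the one-dimensional `scores` before training a downstream model. Real industrial data would rarely be this clean, and the number of components to keep is a modelling choice.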


Table 3
Machine learning for attribute connection of biomass resources.
Scientific task | Methods | Number/Type of features | Samples | Accuracy | Remark | Reference
Estimating higher heating values of biomass | ANN, SVM, RF | 7/Continuous | 495 and 190 | + | Slightly better performance of RF; Optimisation of hyper-parameter was vital | (Xing et al., 2019b)
Predicting wood moisture content | SVM, ANN | ≤8/Continuous | 32 | ++ | Inaccessible directed variables; Finding out associated factors | (Daassi-Gnaba et al., 2017)
Predicting cetane number for furanic biofuel additives | ANN | 15/Molecular descriptors | 291 | + | Reflecting the ideology of ensemble learning | (Kessler et al., 2017)
Predicting compost maturity | CNN, ResNet | 300 × 300 pixels | ~30000 | ++ | Just for classification; Data augmentation could be useful | (Xue et al., 2019)
Estimating biomass major chemical constituents | RF | 3/Continuous | 144 | + | Better than empirical correlation | (Xing et al., 2019a)

ANN: Artificial Neural Network; CNN: Convolutional Neural Network; RF: Random Forest; SVM: Support Vector Machine.
-: Low accuracy; +: Medium accuracy; ++: High accuracy.
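
The accuracy figures quoted in this section are coefficients of determination, and Section 3.1 mentions averaging the outputs of five ANN models as a form of ensemble learning. The short sketch below shows both calculations on made-up numbers. The "true" values and the five sets of member predictions are hypothetical stand-ins, not data from Kessler et al. (2017), and no neural network is actually trained here.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def ensemble_mean(member_predictions):
    """Average the predictions of several models, sample by sample."""
    n_members = len(member_predictions)
    return [sum(column) / n_members for column in zip(*member_predictions)]


# Hypothetical measured values (e.g. a fuel property for five compounds).
y_true = [40.0, 55.0, 62.0, 71.0, 90.0]

# Hypothetical outputs of five independently trained models.
member_predictions = [
    [42.0, 53.0, 64.0, 69.0, 92.0],
    [38.0, 57.0, 60.0, 73.0, 88.0],
    [41.0, 56.0, 61.0, 72.0, 89.0],
    [39.0, 54.0, 63.0, 70.0, 91.0],
    [41.0, 54.0, 63.0, 72.0, 89.0],
]

member_r2 = [r_squared(y_true, pred) for pred in member_predictions]
ensemble_prediction = ensemble_mean(member_predictions)
ensemble_r2 = r_squared(y_true, ensemble_prediction)
```

In this constructed example the member errors partly cancel, so `ensemble_r2` exceeds every entry of `member_r2`. In general, averaging only guarantees that the squared error of the ensemble is no worse than the average squared error of its members, so the benefit depends on how uncorrelated the individual models' errors are.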

and RF algorithms.

Besides the problem of insufficient data caused by experimental difficulties, there are challenges in obtaining data in a valid domain. More efforts should be focused on adaptability, complementarity, and constraints between them, which would improve the performance of ML models. Last but not least, more rigorous experimental design would be conducive to the generation of high-quality data and coverage of the valid domain.

3.2.2. Thermochemical process

The thermochemical conversion rate is much faster than biological conversion, though it requires a significant energy input. With the same configuration and no interference in the thermochemical reaction process, the numerical result is relatively stable and depends only on the reaction conditions. As a consequence, the trained ML models can achieve high-accuracy predictions and guide the design of new plans (Table 4). The combination of ML models and optimisation algorithms has shown great potential. Mujtaba et al. (2020) designed a P50S50 with improved cold flow and lower friction (reduced by 12.37% compared to B10 commercial diesel) when they combined an extreme learning machine (ELM) model with a cuckoo search algorithm. Li et al. (2020) used waste peanut shells to produce methyl levulinate and achieved efficient catalytic conversion when they combined an ANN model with a genetic algorithm (GA). Both studies found that the performance of such hybrid models was superior to a response surface methodology (RSM). Analogously, Patil et al. (2017) found that an ANN performed better than RSM and SVM in the release of arginine deiminase through ultrasonic disruption.

Zhu et al. (2019) modeled the yield and carbon contents of biochar from published data, which indicated that structural information led to a higher R² (by 5%) compared to elemental information. On the other hand, Xing et al. (2019a) used RF to reveal a strong correlation (R² > 0.93) between chemical constituents and elemental analysis. Ullah et al. (2021) used RF with the help of genetic algorithm-based feature selection to predict bio-oil yield (R² > 0.98). These authors suggested that biomass characteristics and pyrolysis conditions were important features.

Engines show different levels of adaptability to biofuels under different operating conditions, and it is possible to achieve the best engine performance by tuning the ratio of biofuel blends. ML can automatically search for the optimal solution to help overcome the shortcomings of traditional trial-and-error based calibration. The optimised engine tends to achieve a higher biofuel utilization efficiency as well as lower exhaust emission parameters (Silitonga et al., 2018). An air–fuel ratio controller based on an ELM model outperformed the engine built-in controller in the work of Wong and Wong (2018). Moreover, a kernel-based ELM model could further improve prediction accuracy (Wong & Wong, 2017; Zhao et al., 2021).

As a consequence, biological conversion approaches are time-consuming, and the action of microorganisms or enzymes is highly nonlinear and complex, making the performance of ML in biological conversion weaker than in thermochemical conversion (as shown in Table 4). ML led to ideal results in thermochemical optimisation processes, overshadowing its predictions for biological conversion.

3.3. Machine learning in microalgae cultivation

The cultivation and bioproduction of microalgae are affected by many factors such as light, carbon source and nutrient composition, pH, mixing conditions, etc. More subtle control and insight into microalgae production can be acquired through ML. While traditional methods depend on extensive characterization analysis and expert knowledge, ML methods provide a more effective approach to accommodate the intricacies of the metabolism and production of microalgae. Various ML approaches have been applied to predict, control, and design a variety of scientific tasks (Table 5). Despite some drawbacks or limitations, these studies provided notable approaches for applying ML to the respective topics and have opened up new ideas.

One obvious application of ML is to predict the occurrence of results based on important factors (so-called causality), such as the evaluation of the toxicity of micrometre-scale plastics on marine microalgae (Guo et al., 2020) and the combination of variables contributing to high algal biomass productivity and lipid content (Cosgun et al., 2021). Yew et al. (2020) performed an interesting study to find correlations between images of microalgae cultivated in molasses wastewater and experimental outcomes, predicting the changes in biomass concentration, nitrogen concentration, and pH. However, the model was not entirely successful, which may be due to the inability of the KNN method to identify image features well. As an alternative to extracting features from images, Reimann et al. (2020) applied bioimage informatics (FastFluoScan software) to obtain the features of microalgae (such as size) for the identification of dead and living microalgae cells or their distribution. Such image feature extraction methods are quite different from ML approaches.

In comparison to advanced image analysis methods, the major reason why a different approach has to be taken is the scarcity of datasets. Biological science is benefitting from developments in ML, such as DL techniques (Camacho et al., 2018). DL has enabled the application of computer vision to process problems, and progress in optics has enabled dynamic measurements of single molecules or even living systems (Moen et al., 2019). Recent advances, such as image classification (Isozaki et al., 2019; Nitta et al., 2018), image segmentation (Bai et al., 2017), object tracking (Newby et al., 2018), and augmented microscopy (Ounkomol et al., 2018) may provide innovative approaches.

Simulations can be used to generate large amounts of data if the time–cost is reasonable. Established ML models greatly reduce the cost of time-consuming simulations (Del Rio-Chanona et al., 2019b), as well as time-consuming experiments (Lopez-Exposito et al., 2019), especially


Table 4
Machine learning for biofuels production and application.
Analytical task | Scientific task | Methods | Number/Type of features | Samples | Accuracy | Remark | Reference
Regression | Predicting methane production from in-situ biogas upgrading | PCA, ANN, DT, SVM, MLR, GPR, Ensemble | 8/Continuous | 3 datasets | + | Effective PCA; Microbes might become important variable | (Xiao et al., 2021)
Regression | Predicting pyrolytic gas yield and compositions | RF, SVM | 11/Continuous | 120 and 194 | + | Biomass characteristics contributed more; Other ML models can be tried | (Tang et al., 2021)
Regression | Predicting bio-oil yield | DT, MLR, RF, SVM | 11/Continuous | 263 | ++ | Features selection based on a genetic algorithm; Better performance of RF | (Ullah et al., 2021)
Regression | Evaluating electrochemical performance | ANN, FL | 3/Continuous | 24 | ++ | ANN combined with FL performed better | (Hosseinzadeh et al., 2020)
Regression | Predicting methane yield from operational parameters | KNN, RF, LR, SVM | 8/Continuous | 15 | - | Insufficient data points | (Wang et al., 2020)
Regression | Predicting the yield and content of biochar | RF | 12/Continuous | 128 and 245 | + | Structural information performed better; More regression algorithms (ANN) could be tried | (Zhu et al., 2019)
Regression | Understanding the pyrolytic behaviour | SVR, PCA | 2/Continuous | NA | ++ | Few inputs and strong pertinence | (Shahbeig & Nosrati, 2020)
Regression | Predicting the production of biohydrogen | ANN | 5/Continuous | NA | ++ | Only one continuous experiment; Application limitation | (Sydney et al., 2020)
Regression | Predicting the engine performance | ELM | 2/Continuous | 60 | ++ | Reliable kernel-based ELM | (Silitonga et al., 2018)
Regression | Fitting delignification process | ANN, FL | 2/Continuous | 12 | ++ | No test set | (Rego et al., 2018)
Regression | Predicting product distribution and bio-oil heating value of pyrolysis | ANN, SVM | 9/Continuous | 175 and 71 | + | Some search conditions remained consistent; Data pre-processing could be attempted | (Chen et al., 2018)
Regression | Estimating methane yield in an upflow anaerobic sludge blanket | ANN | 8/Continuous | 112 | ++ | Continuous experiments | (Antwi et al., 2017)
Classification | Analysing the interaction of microbial community | PCA, RF | 22/Discrete | 22 | NA | No validation data point; Reasonable hypothesis | (Gruszka Vendruscolo et al., 2020)
Classification | Predicting resistance and resilience of microbial fuel cell | PLR, RF, KNN, ANN | 1810/Continuous | 17 | - | Positive idea; Results were at the mercy of data size | (Lesnik et al., 2020)
State optimisation | Improving cold flow and lubricity of biodiesel | ELM | 4/Continuous | 30 | ++ | Better performance of ELM than RSM | (Mujtaba et al., 2020)
State optimisation | Optimising methyl levulinate yield | ANN | 4/Continuous | 29 | ++ | ANN could capture more complex process; More accurate of ANN than RSM | (Li et al., 2020)
State optimisation | Enhancing the production of hydrogen and syngas | FL | 4/Continuous | 46 | ++ | Randomness and fuzziness of process parameters | (Rezk et al., 2019)
State optimisation | Optimal engine calibration for dual-fuel engine | ELM | 5/Continuous | 130 | ++ | Aiming to adjust fuel mixture ratio dynamically | (Wong & Wong, 2017)
State optimisation | Optimising alkaline-catalysed transesterification | ELM, ANN | 5/Continuous | 46 | ++ | Kernel-based ELM model was more reliable | (Kusumo et al., 2017)
State optimisation | Optimising ethanol production | ANN | 3/Continuous | 20 | ++ | Data imbalanced | (Althuri et al., 2017)

ANN: Artificial Neural Network; DT: Decision Tree; ELM: Extreme Learning Machine; FL: Fuzzy Logic; GPR: Gaussian Process Regression; KNN: K-Nearest Neighbour; LR: Logistic Regression; MLR: Multiple Linear Regression; NA: Not Available; PCA: Principal Component Analysis; PLR: Partial Least-squares; RF: Random Forest; RSM: Response Surface Methodology; SVM: Support Vector Machine; SVR: Support Vector Regression.
-: Low accuracy; +: Medium accuracy; ++: High accuracy.

in applications such as optimisation, sensitivity analysis and control. The approach consists of replacing the full simulations with an ML model (called a surrogate model) trained on a selected number of data points (inputs and outputs) obtained from the full model. It should be emphasized that the errors in simulated results due to uncertain assumptions and the numerical approximations of the equations, domain and initial-boundary conditions (collectively called modelling error) will also be integrated into the predictions of ML models. Although such errors can be accounted for, doing so leads to more complex ML approaches with more hyperparameters and higher predictive variances. ML surrogate models can replace full simulations as objective functions in state optimisation (Nayak et al., 2018) and process optimisation (online optimisation).

Optimising the bioproduction process to gain effective metabolite production is a good example of process optimisation using ML. Two recent studies addressed how to optimise microalgae lutein production with innovative ideas. In the first study, Del Rio-Chanona et al. (2019a) achieved a >40% increase in total lutein production, and the results of an ANN were better than those of a kinetic model. In the second study, Zhang et al. (2020) combined physical knowledge (a known dynamic model) with an automatic structure identification framework that consisted of a data-driven approach to identify the mismatch between the dynamic model and data. This approach adopted a simple polynomial model rather than an ANN in order to avoid overfitting.


Table 5
Machine learning for microalgae cultivation and bioproduction.

| Analytical task | Scientific task | Methods | Number/Type of features | Samples | Accuracy | Remark | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Regression | Photo-to-property estimation for microalgae growth | KNN | Image was converted to a 198 × 146 RGB array | 2100 | – | Plenty of experiments; Noticing the quality of pictures | (Yew et al., 2020) |
| | Predicting the average fractal dimension of floc suspensions | RF | 200/Continuous | 3240 | – | Virtual flocs; Synthetic virtual chord length distribution | (Lopez-Exposito et al., 2019) |
| | Correlation between growth and metabolism of Chlorella sp. | FL | 3/Continuous | 3 datasets | ++ | Multiobjective tasks | (Bhola et al., 2017) |
| Classification | Determining critical factors for algal biomass production | DT, AR | 17/Discrete and continuous | 4670 | ++ | Low accuracy in regression task; Some variables were discretized | (Cosgun et al., 2021) |
| | Classifying pellet spectra from near-infrared spectrophotometer | SVM, DT, RF, MLP, NB, KNN | 261/Continuous | 90 | + | Different algorithms were tried out; Dimension reduction technology might be useful | (Mancini et al., 2020) |
| | Classifying the living/dead microalgae population | NB, KNN, QDA, ANN, DT, AB, RF | 14 features were extracted from image | NA | + | Not binary but actually gradual viability status | (Reimann et al., 2020) |
| | Extracting functional dependencies in blackwater treatment | DT | 4/Discrete | 32 | – | Applying few relevant features | (Zitnik et al., 2019) |
| State optimisation | Improving cell biomass and phycobiliproteins yield | ANN | 4/Continuous | 30 | ++ | Multi-objective optimisation; Strict experimental design and high-precision model | (Saini et al., 2021) |
| | Optimising microalgae production | CNN | 7/Continuous | 360,000 | ++ | Different outputs showed different deviations; Learning from a physical model; Accuracy subjected to physical model | (Del Rio-Chanona et al., 2019b) |
| | Online optimisation for fed-batch microalgae lutein production | ANN | 4/Continuous | 3 datasets | + | Data was pre-processed by kinetic model; Artificial datasets were created; No predicted process in optimisation result | (Zhang et al., 2019) |
| | Dynamic optimisation of the efficiency for open-loop control | ANN | 5/Continuous | 5 datasets | + | Data-driven model outperformed the kinetic model | (Del Rio-Chanona et al., 2019a) |
| | Developing a hybrid model for microalgae lutein production | PR | NA/Continuous | 3 datasets | + | Combining with physical model; Simple ML model was more suitable | (Zhang et al., 2020) |
| | Enhancing microalgae production | ANN | 4/Continuous | 30 | ++ | Combining with genetic algorithm | (Nayak et al., 2018) |

AB: Adaptive Boosting; ANN: Artificial Neural Network; AR: Association Rule; CNN: Convolutional Neural Network; DT: Decision Tree; FL: Fuzzy Logic; KNN: K-Nearest Neighbour; MLP: Multilayer Perceptron; NA: Not Available; NB: Naive Bayes; PR: Polynomial Regression; QDA: Quadratic Discriminant Analysis; RF: Random Forest; SVM: Support Vector Machine.
–: Low accuracy; +: Medium accuracy; ++: High accuracy.
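
The hybrid-model entry in Table 5 (Zhang et al., 2020) pairs a known physical model with a simple polynomial that captures the mismatch between model and data. The Python sketch below illustrates that general idea only. Everything in it is assumed for illustration: `kinetic_model` is a hypothetical first-order model, the "measurements" are synthetic, and a first-degree polynomial fitted to the residuals plays the role of the data-driven correction. It does not reproduce the structure identification framework of the cited work.

```python
import math

def kinetic_model(t):
    """Hypothetical first-order kinetic model standing in for the known dynamic model."""
    return 10.0 * (1.0 - math.exp(-0.5 * t))

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic measurements: the kinetic model plus a drift it does not describe
# and a small fluctuation (both invented for this example).
times = [0.5 * i for i in range(1, 13)]
observed = [kinetic_model(t) - 0.3 * t + 0.05 * math.sin(3.0 * t) for t in times]

# Data-driven part: fit a simple polynomial to the model/data mismatch.
residuals = [y - kinetic_model(t) for t, y in zip(times, observed)]
intercept, slope = fit_line(times, residuals)

def hybrid_model(t):
    """Physical prediction plus the learned low-order correction."""
    return kinetic_model(t) + intercept + slope * t

def rmse(model):
    """Root-mean-square error of a model against the synthetic measurements."""
    return math.sqrt(sum((model(t) - y) ** 2 for t, y in zip(times, observed)) / len(times))

error_physical = rmse(kinetic_model)
error_hybrid = rmse(hybrid_model)
```

Keeping the correction low-order means only two numbers are learned from the twelve data points, which is the overfitting argument the text gives for preferring a polynomial over an ANN.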

3.4. Full-scale applications of machine learning

By concentrating on specific experimental situations, ML methods have been used successfully to predict a restricted number of variables. There is significant room for more ambitious applications of ML models to full-scale processes.

Full-scale studies can provide holistic and comprehensive information, insight and guidance. Prediction can be employed for the timely adjustment of related industrial operations, helping in detection, optimisation and automatic control. In this regard (see Table 6), De Clercq et al. (2019) designed a graphical user interface to assist the enhancement of biogas production, using ML models trained on data from two biogas facilities. The authors predicted daily biomethane production in subsequent research (De Clercq et al., 2020). Based on a combination of an ANN and a particle swarm optimisation (PSO) algorithm, Pereira et al. (2020) achieved an increase of 10% in industrial bioethanol production and concentration.

Furthermore, the fusion of ML methods and traditional life cycle assessment was used to evaluate the feasibility of hydrothermal treatment aimed at different raw materials (Cheng et al., 2020). Vondra et al. (2019) developed a techno-economic model based on ML to evaluate the economics of the evaporation system in biogas plants, and the authors provided a heuristic method for stakeholders and decision makers.

Occasionally, a trained ML model may not satisfactorily achieve the desired goal. In such cases, an examination of both the data and the algorithm is required to identify the source of the problem. The choice of ML algorithm, the quality and volume of data, as well as expert knowledge together contribute to the final performance of the model. Simple models such as linear regression are often more accurate (Camacho et al., 2018). High model complexity (involving more hyperparameters) leads to overfitting if insufficient data is available to learn the hyperparameters accurately (such models are said to have a low bias and a high variance in the bias-variance trade-off).
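
The overfitting argument in Section 3.4 can be made concrete with a small numerical experiment. The sketch below is illustrative only and uses invented data: six training points follow an assumed linear relation plus fixed perturbations, a straight line is the "simple" model, and a degree-5 polynomial that passes through every training point is the "complex" model.

```python
import math

def true_relation(x):
    """Assumed underlying relation for this synthetic example."""
    return 2.0 * x + 1.0

# Six noisy training points (fixed perturbations keep the example reproducible).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.5, -0.5, 0.4, -0.3]
train_y = [true_relation(x) + e for x, e in zip(train_x, noise)]

# Held-out points between the training inputs, compared against the true relation.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [true_relation(x) for x in test_x]

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line(train_x, train_y)

def simple_model(x):
    """Two-parameter linear model (higher bias, lower variance)."""
    return intercept + slope * x

def complex_model(x):
    """Degree-5 polynomial through every training point, in Lagrange form
    (lower bias, higher variance)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        weight = 1.0
        for j, xj in enumerate(train_x):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += weight * yi
    return total

def rmse(model, xs, ys):
    """Root-mean-square error of a model on the given points."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

train_error_simple = rmse(simple_model, train_x, train_y)
train_error_complex = rmse(complex_model, train_x, train_y)
test_error_simple = rmse(simple_model, test_x, test_y)
test_error_complex = rmse(complex_model, test_x, test_y)
```

The complex model reproduces the training data exactly because it has as many coefficients as there are points, yet it oscillates between them and does worse on the held-out inputs. The straight line leaves some training error but follows the underlying trend more closely.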


Table 6
Machine learning for industrial forecast and system assessment.

| Object | Scientific task | Methods | Number/Type of features | Samples | Accuracy | Remark | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Industrial forecast | Predicting daily biomethane production | RF, EN, XGBoost | 17/Discrete and continuous | ≤1398 | + | No need for detailed substrate characterization | (De Clercq et al., 2020) |
| | Predicting biogas output | SVM, RF, LR, KNN, XGBoost | 14 and 20/Discrete and continuous | NA | + | Noise data | (De Clercq et al., 2019) |
| | Optimising industrial bioethanol production | ANN | 7/Continuous | 3400 | ++ | Seven vital inputs were selected by cross-correlation analysis | (Pereira et al., 2020) |
| | Predicting the concentration of toxic algae | RF | 10/Continuous | 280 | – | Suitable for algae bloom event prediction | (Asnaghi et al., 2017) |
| System assessment | Analysing the techno-economics of evaporation technology | MC, ANN, DT | 23/Continuous and combination of variables | ≥20000 | ++ | A heuristic method; ANN selected vital variables | (Vondra et al., 2019) |
| | Predicting yields and characteristics of hydrothermal treatment | MLR, DT, RF | 6/Continuous | 800 | + | Fusion of ML and conventional LCA; Other nonlinear regression can be tried | (Cheng et al., 2020) |

ANN: Artificial Neural Network; DT: Decision Tree; EN: Elastic Net; KNN: K-Nearest Neighbour; LCA: Life-Cycle Assessment; LR: Logistic Regression; MC: Monte Carlo; MLR: Multiple Linear Regression; NA: Not Available; RF: Random Forest; SVM: Support Vector Machine; XGBoost: Extreme Gradient Boosting.
–: Low accuracy; +: Medium accuracy; ++: High accuracy.
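
Section 3.4 mentions combining an ANN with particle swarm optimisation (PSO), where the optimiser searches the inputs of a trained model for the best predicted output. The following Python sketch shows a basic PSO loop of that kind. It is a generic textbook variant under stated assumptions, not the implementation of Pereira et al. (2020): `predicted_yield` is a hypothetical stand-in for a trained model, the two inputs are assumed to be scaled to [0, 1], and the inertia and acceleration constants are common default choices.

```python
import random

def predicted_yield(x):
    """Hypothetical stand-in for a trained model's prediction (assumed for illustration)."""
    a, b = x
    return 1.0 - (a - 0.6) ** 2 - (b - 0.3) ** 2

def particle_swarm_maximise(objective, bounds, n_particles=20, n_iter=100, seed=0):
    """Basic PSO: each particle is pulled towards its own best and the swarm's best."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = max(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5                    # inertia, cognitive and social weights
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep each input inside its allowed operating range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val

best_inputs, best_output = particle_swarm_maximise(predicted_yield, [(0.0, 1.0), (0.0, 1.0)])
```

PSO needs only objective values, not gradients, so it can be wrapped around any trained predictor. The result is only as good as the model being searched, so a recommended operating point should still be checked against plant data.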

4. Future perspectives with advanced machine learning

With the continuous development in ML and the explosion in interest and applications that it has generated, there are currently a plethora of approaches. The differences in learners can be exploited for ensemble learning. The difference in views leads to the development of multi-view learning or multi-modal learning. The intersection of ML and bioenergy still involves some grand challenges. Overcoming these challenges can boost the progress of bioenergy technology. In this section, we make some recommendations.

4.1. State-of-the-art deep learning methods

Although classical ML methods have been applied to problems in bioenergy conversion, there is room to expand the scope of the models and improve their performance. The success of DL in the complex game of Go underlines its potential to solve complex scientific problems. DL has had notable success in other fields with remarkable results when facing complex tasks. DL is based on expanding the size (model capacity) of traditional 3-layer neural networks by incorporating more layers and more complex interactions between nodes in layers. Different types of layers (beyond fully connected layers) can be introduced, notably convolutional and recurrent layers, together with additional pooling operations for pruning the network size. Such layers can be interpreted in the classical node-activation framework but result in greater flexibility in terms of the tasks that can be performed (regression, detection, classification) and the types of data that can be processed. For example, a convolutional layer (defined by kernels or windows) can extract features to find a more meaningful or compact interpretation of inputs. Recurrent layer networks are powerful for tasks involving sequential data, such as natural language processing or time series analysis. There is a vast number of possibilities for the architecture of a DL network and one of the greatest challenges is in finding a systematic method for their design. Currently, it is largely done through trial-and-error as well as prior knowledge. Transfer learning allows for networks to be pre-trained on similar data before being applied to a specific task, and can greatly enhance the performance of DL networks.

The diversity of DL networks makes them suitable for many complex bioenergy conversion processes compared to classical ML methods. Many DL algorithms have been applied to characterize morphological changes or object traces at the cellular level (Isozaki et al., 2020; Moen et al., 2019; Nitta et al., 2018). The method feeds imaging data to a DL model when using biotechnology to intelligently identify microalgae cells or produce biofuel, as in image segmentation for dead or living microalgae cell analysis and augmented microscopy for more distinct image information. The modelling of time series in bioenergy conversion processes is also an important topic.

Notwithstanding that DL can accomplish more complex tasks across temporal and spatial scales, it usually requires an enormous amount of data, which is frequently unavailable or difficult (expensive and time-consuming) to obtain. More cooperation to obtain and complement large datasets is required between researchers in the community, creating a database for testing new algorithms. In addition, attention should be paid to including more variables in datasets, in order to complete more tasks. When the variables directly related to the problem at hand are difficult to measure, alternative variables that are easily available and implicitly relevant to the outputs could be sought.

4.2. Smaller datasets

ML models usually perform better with increasing data, but in the actual bioenergy context, the cost of acquiring data tends to be high in terms of time and finances. In the case of limited datasets, it will be a wise choice to improve the quality of the data. Data pre-processing is promising, but it is more effective to control the overall quality of the data from the source. More systematic experimental design and data collection operation specification should be followed. Data augmentation can artificially expand datasets based on the principle of known invariances, so that the trained model can learn these invariances with improved generalization. It needs to be emphasized that unreasonable data augmentation can lead to poor predictions. Other solutions to the small-data problem are the adoption of expert knowledge to design the algorithm structure (such as the addition of physical constraints), whereby the search space size is reduced, or more advanced ML methods such as imitation learning (Bonardi et al., 2020). Moreover, based on a Bayesian criterion, a report stated that the realization of human-level performance becomes possible in sparse data (Lake et al., 2015).

The selection of learners tends to be a prominent problem when dealing with small datasets. ELM can avoid the problems caused by gradient descent, and has a short learning time with excellent generalization performance. Notably, ELM models showed higher prediction accuracy for biofuels applications (Kusumo et al., 2017; Mujtaba et al., 2020; Silitonga et al., 2018; Wong & Wong, 2017). The adjustment of algorithms, comparison between different models, and construction of ensemble learning models may help to find a robust model suitable for some scientific tasks. More importantly, the context of the data and the relationships between the variables need to be adequately recognized.

4.3. Integration with theory-driven model

Theory-driven models are simpler to interpret but involve fixed assumptions and structures related to the physical process. ML models can


implicitly discover hidden patterns that have eluded consideration. The two approaches are generally considered to have no intersection, but in fact they can complement each other. For example, some theory-driven models have parameters that cannot be inferred from first principles. Recent work has predicted and optimised the photo-production process by using a hybrid theory- and data-driven model, learning the parameters of a kinetic model (Zhang et al., 2020). Interestingly, a simple dynamic model can effectively handle the noise in datasets (Zhang et al., 2019). In particular, for an integrated model formed by many theory-driven sub-models, the ML models can replace the weak theoretical sub-models. Also, the deviation between the ML model and the theory-driven model can help correct the inadequate theory of the theory-driven approach. In systems biology, the identification of key gene transcripts and fluxes by ML tools can furnish critical mechanistic details and improve genome-scale metabolic models to fit the metabolism of cyanobacteria (Vijayakumar et al., 2020).

4.4. Machine learning for multi-view data

In bioinformatics, the integration of different omics data offers a more detailed understanding and better predictions for complex phenomena (Joshi et al., 2020; Kristensen et al., 2014; Ritchie et al., 2015). On the one hand, the performance of the data-driven model is limited by the data, and a model learned from single-view data has a certain degree of one-sidedness. On the other hand, the diversification of characterization methods allows the same observation object to have different characteristic morphological representations, and the collected data comes from multiple sources and may have different types. Multi-view learning can realize the complementation and integration of information from different views and levels, avoid the limitations of a single-view model, and obtain more comprehensive information. An ML model that combines gene expression with flux data provided a more complete picture of cyanobacteria metabolism than single-view data (Vijayakumar et al., 2020). In the context of bioenergy, the expansion of task scale and the improvement of accuracy are also likely to provide ground-breaking progress via the contribution of multi-view learning, such as the combination of imaging data and sensor data to achieve target monitoring (cultivation of microalgae, biological and thermochemical conversion of biomass, etc.).

5. Conclusion

ML has the potential to predict and optimise highly complicated nonlinear bioenergy systems, for which it is usually difficult to build models based on experience or theory for accurate predictions. However, the prediction accuracy and the coverage of the data valid domain of bioenergy systems by using ML are still not satisfactory. More research work should be carried out to create high quality databases to test and apply new algorithms, to promote the application of advanced ML algorithms (such as deep learning and multi-view learning), and to integrate theory-driven models into ML models for better interpretability and predictability.

CRediT authorship contribution statement

Zhengxin Wang: Methodology, Writing – original draft, Writing – review & editing. Xinggan Peng: Writing – original draft, Writing – review & editing. Ao Xia: Conceptualization, Writing – review & editing, Project administration. Akeel A. Shah: Writing – review & editing. Yun Huang: Writing – review & editing. Xianqing Zhu: Methodology. Xun Zhu: Project administration. Qiang Liao: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 52022015, 51876016), the Fundamental Research Funds for Central Universities (2020CDJQY-A054), the State Key Program of National Natural Science of China (No. 51836001), and the Creative Research Groups of the National Natural Science Foundation of China (52021004).

References

Althuri, A., Gujjala, L.K.S., Banerjee, R., 2017. Partially consolidated bioprocessing of mixed lignocellulosic feedstocks for ethanol production. Bioresour. Technol. 245, 530–539.
Antwi, P., Li, J., Boadi, P.O., Meng, J., Shi, E., et al., 2017. Estimation of biogas and methane yields in an UASB treating potato starch processing wastewater with backpropagation artificial neural network. Bioresour. Technol. 228, 106–115.
Asnaghi, V., Pecorino, D., Ottaviani, E., Pedroncini, A., Bertolotto, R.M., et al., 2017. A novel application of an adaptable modeling approach to the management of toxic microalgal bloom events in coastal areas. Harmful Algae 63, 184–192.
Bai, M., Urtasun, R., 2017. Deep watershed transform for instance segmentation. In: 30th IEEE Conference on Computer Vision and Pattern Recognition, pp. 2858–2866.
Batstone, D.J., Keller, J., Angelidaki, I., Kalyuzhnyi, S.V., Pavlostathis, S.G., et al., 2002. The IWA Anaerobic Digestion Model No 1 (ADM1). Water Sci. Technol. 45 (10), 65–73.
Benites-Lazaro, L.L., Giatti, L., Giarolla, A., 2018. Sustainability and governance of sugarcane ethanol companies in Brazil: Topic modeling analysis of CSR reporting. J. Clean Prod. 197, 583–591.
Bhola, V., Swalaha, F.M., Nasr, M., Bux, F., 2017. Fuzzy intelligence for investigating the correlation between growth performance and metabolic yields of a Chlorella sp exposed to various flue gas schemes. Bioresour. Technol. 243, 1078–1086.
Bonardi, A., James, S., Davison, A.J., 2020. Learning one-shot imitation from humans without humans. IEEE Robot. Autom. Lett. 5 (2), 3533–3539.
BP, 2021. BP statistical review of world energy [accessed July 26, 2021]. Available from https://www.bp.com/en/global/corporate/aboutp/energyconomics/statistical-eviewf-orldnergy/statistical-eviewownloads.html.
Brockherde, F., Vogt, L., Li, L., Tuckerman, M.E., Burke, K., et al., 2017. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 8, 872.
Butler, K.T., Davies, D.W., Cartwright, H., Isayev, O., Walsh, A., 2018. Machine learning for molecular and materials science. Nature 559 (7715), 547–555.
Calero, M., Iáñez-Rodríguez, I., Pérez, A., Martín-Lara, M.A., Blázquez, G., 2018. Neural fuzzy modelization of copper removal from water by biosorption in fixed-bed columns using olive stone and pinion shell. Bioresour. Technol. 252, 100–109.
Camacho, D.M., Collins, K.M., Powers, R.K., Costello, J.C., Collins, J.J., 2018. Next-generation machine learning for biological networks. Cell 173 (7), 1581–1592.
Cao, H., Xin, Y., Yuan, Q., 2016. Prediction of biochar yield from cattle manure pyrolysis via least squares support vector machine intelligent approach. Bioresour. Technol. 202, 158–164.
Carleo, G., Cirac, I., Cranmer, K., Daudet, L., Schuld, M., et al., 2019. Machine learning and the physical sciences. Rev. Mod. Phys. 91 (4), 045002.
Chen, X., Zhang, H., Song, Y., Xiao, R., 2018. Prediction of product distribution and bio-oil heating value of biomass fast pyrolysis. Chem. Eng. Process. 130, 36–42.
Cheng, F., Porter, M.D., Colosi, L.M., 2020. Is hydrothermal treatment coupled with carbon capture and storage an energy-producing negative emissions technology? Energy Conv. Manag. 203, 112252.
Cosgun, A., Gunay, M.E., Yildirim, R., 2021. Exploring the critical factors of algal biomass and lipid production for renewable fuel production by machine learning. Renew. Energy 163, 1299–1317.
Cramer, J.A., Hammond, M.H., Loegel, T.N., Morris, R.E., 2018. Evolving window factor analysis-multivariate curve resolution with automated library matching for enhanced peak deconvolution in gas chromatography-mass spectrometry fuel data. J. Chromatogr. A 1581, 125–134.
Daassi-Gnaba, H., Oussar, Y., Merlan, M., Ditchi, T., Geron, E., et al., 2017. Wood moisture content prediction using feature selection techniques and a kernel method. Neurocomputing 237, 79–91.
De Clercq, D., Jalota, D., Shang, R., Ni, K., Zhang, Z., Khan, A., Wen, Z., Caicedo, L., Yuan, K., 2019. Machine learning powered software for accurate prediction of biogas production: A case study on industrial-scale Chinese production data. J. Clean Prod. 218, 390–399.
De Clercq, D., Wen, Z., Fei, F., Caicedo, L., Yuan, K., et al., 2020. Interpretable machine learning for predicting biomethane production in industrial-scale anaerobic co-digestion. Sci. Total Environ. 712, 134574.
Del Rio-Chanona, E.A., Ahmed, N.R., Wagner, J., Lu, Y., Zhang, D., et al., 2019a. Comparison of physics-based and data-driven modelling techniques for dynamic optimisation of fed-batch bioprocesses. Biotechnol. Bioeng. 116 (11), 2971–2982.
Del Rio-Chanona, E.A., Wagner, J.L., Ali, H., Fiorelli, F., Zhang, D., et al., 2019b. Deep learning-based surrogate modeling and optimization for microalgal biofuel production and photobioreactor design. Aiche J. 65 (3), 915–923.


Deng, Z., Xia, A., Liao, Q., Zhu, X., Huang, Y., et al., 2019. Laccase pretreatment of wheat straw: effects of the physicochemical characteristics and the kinetics of enzymatic hydrolysis. Biotechnol. Biofuels 12, 159.
DePristo, M., 2018. Deep learning for biology. Nature 555 (7697), 547.
Derbal, K., Bencheikh-lehocine, M., Cecchi, F., Meniai, A.-H., Pavan, P., 2009. Application of the IWA ADM1 model to simulate anaerobic co-digestion of organic waste with waste activated sludge in mesophilic condition. Bioresour. Technol. 100 (4), 1539–1543.
Diez-Olivan, A., Del Ser, J., Galar, D., Sierra, B., 2019. Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0. Inf. Fusion 50, 92–111.
Dral, P.O., 2020. Quantum chemistry in the age of machine learning. J. Phys. Chem. Lett. 11 (6), 2336–2347.
Feffer, S., 2017. Model-driven and data driven methods for working sensors and signals. Available from https://www.iottechexpo.com/2017/07/iot/model-driven-vs-data-driven-methods-for-working-with-sensors-and-signals/.
Feng, D., Guo, X., Lin, R., Xia, A., Huang, Y., et al., 2021. How can ethanol enhance direct interspecies electron transfer in anaerobic digestion? Biotechnol. Adv., In Press: https://doi.org/10.1016/j.biotechadv.2021.107812.
Gopakumar, V., Tiwari, S., Rahman, I., 2018. A deep learning based data driven soft sensor for bioprocesses. Biochem. Eng. J. 136, 28–39.
Gruszka Vendruscolo, E.C., Mesa, D., Rissi, D.V., Meyer, B.H., Pedrosa, F.d.O., et al., 2020. Microbial communities network analysis of anaerobic reactors fed with bovine and swine slurry. Sci. Total Environ. 742, 140314.
Guo, Y., Ma, W., Li, J., Liu, W., Qi, P., et al., 2020. Effects of microplastics on growth, phenanthrene stress, and lipid accumulation in a diatom, Phaeodactylum tricornutum. Environ. Pollut. 257, 113628.
Hannigan, G.D., Prihoda, D., Palicka, A., Soukup, J., Klempir, O., et al., 2019. A deep learning genome-mining strategy for biosynthetic gene cluster prediction. Nucleic Acids Res 47 (18), e110.
Hosseinzadeh, A., Zhou, J.L., Altaee, A., Baziar, M., Li, D., 2020. Effective modelling of hydrogen and energy recovery in microbial electrolysis cell by artificial neural network and adaptive network-based fuzzy inference system. Bioresour. Technol. 316, 123967.
Isozaki, A., Mikami, H., Hiramatsu, K., Sakuma, S., Kasai, Y., et al., 2019. A practical guide to intelligent image-activated cell sorting. Nat. Protoc. 14 (8), 2370–2415.
Isozaki, A., Mikami, H., Tezuka, H., Matsumura, H., Huang, K., et al., 2020. Intelligent image-activated cell sorting 2.0. Lab Chip 20 (13), 2263–2273.
Joshi, A., Rienks, M., Theofilatos, K., Mayr, M., 2020. Systems biology in cardiovascular disease: a multiomics approach. Nat. Rev. Cardiol. 313–330.
Kessler, T., Sacia, E.R., Bell, A.T., Mack, J.H., 2017. Artificial neural network based predictions of cetane number for furanic biofuel additives. Fuel 206, 171–179.
Kim, H.K., Min, S., Song, M., Jung, S., Choi, J.W., et al., 2018. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat. Biotechnol. 36 (3), 239–241.
Kristensen, V.N., Lingjoerde, O.C., Russnes, H.G., Vollan, H.K.M., Frigessi, A., et al., 2014. Principles and methods of integrative genomic analyses in cancer. Nat. Rev. Cancer 14 (5), 299–313.
Kumar, S., Dangi, A.K., Shukla, P., Baishya, D., Khare, S.K., 2019. Thermozymes: Adaptive strategies and tools for their biotechnological applications. Bioresour. Technol. 278, 372–382.
Kusumo, F., Silitonga, A.S., Masjuki, H.H., Ong, H.C., Siswantoro, J., Mahlia, T.M.I., 2017. Optimization of transesterification process for Ceiba pentandra oil: A comparative study between kernel-based extreme learning machine and artificial neural networks. Energy 134, 24–34.
Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B., 2015. Human-level concept learning through probabilistic program induction. Science 350 (6266), 1332–1338.
Leong, Y.K., Chew, K.W., Chen, W.-H., Chang, J.-S., Show, P.L., 2020. Reuniting the Biogeochemistry of Algae for a Low-Carbon Circular Bioeconomy. Trends Plant Sci. 26 (7), 729–740.
Lesnik, K.L., Cai, W., Liu, H., 2020. Microbial community predicts functional stability of microbial fuel cells. Environ. Sci. Technol. 54 (1), 427–436.
Li, P., Du, Z., Chang, C., Zhao, S., Xu, G., et al., 2020. Efficient catalytic conversion of waste peanut shells into liquid biofuel: an artificial intelligence approach. Energy Fuels 34 (2), 1791–1801.
from palm-sesame oil via response surface methodology and extreme learning machine - Cuckoo search. Renew. Energy 158, 202–214.
Nayak, M., Dhanarajan, G., Dineshkumar, R., Sen, R., 2018. Artificial intelligence driven process optimization for cleaner production of biomass with co-valorization of wastewater and flue gas in an algal biorefinery. J. Clean Prod. 201, 1092–1100.
Newby, J.M., Schaefer, A.M., Lee, P.T., Forest, M.G., Lai, S.K., 2018. Convolutional neural networks automate detection for tracking of submicron-scale particles in 2D and 3D. Proc. Natl. Acad. Sci. U. S. A. 115 (36), 9026–9031.
Nitta, N., Sugimura, T., Isozaki, A., Mikami, H., Hiraki, K., et al., 2018. Intelligent image-activated cell sorting. Cell 175 (1), 266–276.
Noe, F., Tkatchenko, A., Mueller, K.-R., Clementi, C., 2020. Machine learning for molecular simulation. In: Annual Review of Physical Chemistry, (Eds.) M.A. Johnson, T.J. Martinez, Vol. 71, pp. 361–390.
Ounkomol, C., Seshamani, S., Maleckar, M.M., Collman, F., Johnson, G.R., 2018. Label-free prediction of three-dimensional fluorescence images from transmitted-light microscopy. Nat. Methods 15 (11), 917–920.
Oyetunde, T., Bao, F.S., Chen, J.-W., Martin, H.G., Tang, Y.J., 2018. Leveraging knowledge engineering and machine learning for microbial bio-manufacturing. Biotechnol. Adv. 36 (4), 1308–1315.
Patil, M.D., Dev, M.J., Tangadpalliwar, S., Patel, G., Garg, P., et al., 2017. Ultrasonic disruption of Pseudomonas putida for the release of arginine deiminase: Kinetics and predictive models. Bioresour. Technol. 233, 74–83.
Pereira, R.D., Badino, A.C., Cruz, A.J.G., 2020. Framework based on artificial intelligence to increase industrial bioethanol production. Energy Fuels 34 (4), 4670–4677.
Phromphithak, S., Onsree, T., Tippayawong, N., 2021. Machine learning prediction of cellulose-rich materials from biomass pretreatment with ionic liquid solvents. Bioresour. Technol. 323, 124642.
Potter, C., Klooster, S., Genovese, V., 2012. Net primary production of terrestrial ecosystems from 2000 to 2009. Clim. Change 115 (2), 365–378.
Reddy, G.T., Reddy, M.P.K., Lakshmanna, K., Kaluri, R., Rajput, D.S., et al., 2020. Analysis of dimensionality reduction techniques on big data. IEEE Access 8, 54776–54788.
Rego, A.S.C., Valim, I.C., Vieira, A.A.S., Vilani, C., Santos, B.F., 2018. Optimization of sugarcane bagasse pretreatment using alkaline hydrogen peroxide through ANN and ANFIS modelling. Bioresour. Technol. 267, 634–641.
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., et al., 2019. Deep learning and process understanding for data-driven Earth system science. Nature 566 (7743), 195–204.
Reimann, R., Zeng, B., Jakopec, M., Burdukiewicz, M., Petrick, I., et al., 2020. Classification of dead and living microalgae Chlorella vulgaris by bioimage informatics and machine learning. Algal Res. 48, 101908.
Rezk, H., Nassef, A.M., Inayat, A., Sayed, E.T., Shahbaz, M., Olabi, A.G., 2019. Improving the environmental impact of palm kernel shell through maximizing its production of hydrogen and syngas using advanced artificial intelligence. Sci. Total Environ. 658, 1150–1160.
Ritchie, M.D., Holzinger, E.R., Li, R., Pendergrass, S.A., Kim, D., 2015. Methods of integrating data to uncover genotype-phenotype interactions. Nat. Rev. Genet. 16 (2), 85–97.
Saini, D.K., Rai, A., Devi, A., Pabbi, S., Chhabra, D., et al., 2021. A multi-objective hybrid machine learning approach-based optimization for enhanced biomass and bioactive phycobiliproteins production in Nostoc sp. CCC-403. Bioresour. Technol. 329, 124908.
Shahbeig, H., Nosrati, M., 2020. Pyrolysis of biological wastes for bioenergy production: Thermo-kinetic studies with machine-learning method and Py-GC/MS analysis. Fuel 269, 117238. https://doi.org/10.1016/j.fuel.2020.117238.
Silitonga, A.S., Masjuki, H.H., Ong, H.C., Sebayang, A.H., Dharma, S., et al., 2018. Evaluation of the engine performance and exhaust emissions of biodiesel-bioethanol-diesel blends using kernel-based extreme learning machine. Energy 159, 1075–1087.
Stokes, J.M., Yang, K., Swanson, K., Jin, W., Cubillos-Ruiz, A., et al., 2020. A deep learning approach to antibiotic discovery. Cell 181 (2), 475–483.
Sun, W., Zheng, Y., Yang, K., Zhang, Q., Shah, A.A., et al., 2019. Machine learning-assisted molecular design and efficiency prediction for high-performance organic
Li, Y., Wu, F.-X., Ngom, A., 2018. A review on machine learning principles for multi-view photovoltaic materials. Sci. Adv. 5 (11), eaay4275.
biological data integration. Brief. Bioinform. 19 (2), 325–340. Sydney, E.B., Duarte, E.R., Martinez Burgos, W.J., de Carvalho, J.C., Larroche, C., et al.,
Liao, Y., Koelewijn, S.-F., Van den Bossche, G., Van Aelst, J., Van den Bosch, S., et al., 2020. Development of short chain fatty acid-based artificial neuron network tools
2020. A sustainable wood biorefinery for low-carbon footprint chemicals applied to biohydrogen production. Int. J. Hydrog. Energy 45 (8), 5175–5181.
production. Science 367 (6484), 1385–1390. Tang, Q., Chen, Y., Yang, H., Liu, M., Xiao, H., et al., 2021. Machine learning prediction
Lopez-Exposito, P., Negro, C., Blanco, A., 2019. Direct estimation of microalgal flocs of pyrolytic gas yield and compositions with feature reduction methods: Effects of
fractal dimension through laser reflectance and machine learning. Algal Res. 37, pyrolysis conditions and biomass characteristics. Bioresour. Technol. 339, 125581.
240–247. Toyao, T., Maeno, Z., Takakusagi, S., Kamachi, T., Takigawa, I., et al., 2020. Machine
Maharaj, B.C., Mattei, M.R., Frunzo, L., van Hullebusch, E.D., Esposito, G., 2019. ADM1 learning for catalysis informatics: recent applications and prospects. ACS Catal. 10
based mathematical model of trace element complexation in anaerobic digestion (3), 2260–2297.
processes. Bioresour. Technol. 276, 253–259. Ullah, Z., Khan, M., Naqvi, S.R., Farooq, W., Yang, H., et al., 2021. A comparative study
Mancini, M., Mircoli, A., Potena, D., Diamantini, C., Duca, D., et al., 2020. Prediction of of machine learning methods for bio-oil yield prediction-A genetic algorithm-based
pellet quality through machine learning techniques and near-infrared spectroscopy. features selection. Bioresour. Technol. 335, 125292.
Comput. Ind. Eng. 147, 106566. Uzun, H., Yildiz, Z., Goldfarb, J.L., Ceylan, S., 2017. Improved prediction of higher
Mehta, P., Bukov, M., Wang, C.-H., Day, A.G.R., Richardson, C., et al., 2019. A high-bias, heating value of biomass using an artificial neural network model based on
low-variance introduction to machine Learning for physicists. Phys. Rep.-Rev. Sec. proximate analysis. Bioresour. Technol. 234, 122–130.
Phys. Lett. 810, 1–124. Vijayakumar, S., Rahman, P.K.S.M., Angione, C., 2020. A hybrid flux balance analysis
Moen, E., Bannon, D., Kudo, T., Graf, W., Covert, M., et al., 2019. Deep learning for and machine learning pipeline elucidates metabolic adaptation in cyanobacteria.
cellular image analysis. Nat. Methods 16 (12), 1233–1246. iScience 23 (12), 101818.
Mujtaba, M.A., Masjuki, H.H., Kalam, M.A., Ong, Hwai Chyuan, Gul, M., Farooq, M., Vondra, Marek, Touš, Michal, Teng, Sin Yong, 2019. Digestate evaporation treatment in
Soudagar, Manzoore Elahi M., Ahmed, Waqar, Harith, M.H., Yusoff, M.N.A.M., 2020. biogas plants: A techno-economic assessment by Monte Carlo, neural networks and
Ultrasound-assisted process optimization and tribological characteristics of biodiesel

decision trees. J. Clean Prod. 238, 117870. https://doi.org/10.1016/j.jclepro.2019.117870.
Wang, L., Long, F., Liao, W., Liu, H., 2020. Prediction of anaerobic digestion performance and identification of critical operational parameters using machine learning algorithms. Bioresour. Technol. 298, 122495.
Weinrich, S., Mauky, E., Schmidt, T., Krebs, C., Liebetrau, J., et al., 2021. Systematic simplification of the Anaerobic Digestion Model No. 1 (ADM1) - Laboratory experiments and model application. Bioresour. Technol. 333, 125104.
Weinrich, S., Nelles, M., 2021. Systematic simplification of the Anaerobic Digestion Model No. 1 (ADM1) - Model development and stoichiometric analysis. Bioresour. Technol. 333, 125124.
Wong, K.I., Wong, P.K., 2018. Adaptive air-fuel ratio control of dual-injection engines under biofuel blends using extreme learning machine. Energy Conv. Manag. 165, 66–75.
Wong, K.I., Wong, P.K., 2017. Optimal calibration of variable biofuel blend dual-injection engines using sparse Bayesian extreme learning machine and metaheuristic optimization. Energy Conv. Manag. 148, 1170–1178.
Xia, A., Murphy, J.D., 2016. Microalgal Cultivation in Treating Liquid Digestate from Biogas Systems. Trends Biotechnol. 34 (4), 264–275.
Xiao, J., Liu, C., Ju, B., Xu, H., Sun, D., et al., 2021. Estimation of in-situ biogas upgrading in microbial electrolysis cells via direct electron transfer: Two-stage machine learning modeling based on a NARX-BP hybrid neural network. Bioresour. Technol. 330, 124965.
Xing, J., Luo, K., Wang, H., Fan, J., 2019a. Estimating biomass major chemical constituents from ultimate analysis using a random forest model. Bioresour. Technol. 288, 121541.
Xing, Jiangkuan, Luo, Kun, Wang, Haiou, Gao, Zhengwei, Fan, Jianren, 2019b. A comprehensive study on estimating higher heating value of biomass from proximate and ultimate analysis with machine learning approaches. Energy 188, 116077. https://doi.org/10.1016/j.energy.2019.116077.
Xue, W., Hu, X., Wei, Z., Mei, X., Chen, X., et al., 2019. A fast and easy method for predicting agricultural waste compost maturity by image-based deep learning. Bioresour. Technol. 290, 121761.
Yew, Guo Yong, Puah, Boon Keat, Chew, Kit Wayne, Teng, Sin Yong, Show, Pau Loke, Nguyen, The Hong Phong, 2020. Chlorella vulgaris FSP-E cultivation in waste molasses: Photo-to-property estimation by artificial intelligence. Chem. Eng. J. 402, 126230. https://doi.org/10.1016/j.cej.2020.126230.
Zhang, D., Del Rio-Chanona, E.A., Petsagkourakis, P., Wagner, J., 2019. Hybrid physics-based and data-driven modeling for bioprocess online simulation and optimization. Biotechnol. Bioeng. 116 (11), 2919–2930.
Zhang, D., Savage, T.R., Cho, B.A., 2020. Combining model structure identification and hybrid modelling for photo-production process predictive simulation and optimisation. Biotechnol. Bioeng. 117 (11), 3356–3367.
Zhao, Xiaofei, Li, Lei, Wu, Di, Xiao, Taihui, Ma, Yao, Peng, Xuya, 2019. Modified Anaerobic Digestion Model No. 1 for modeling methane production from food waste in batch and semi-continuous anaerobic digestions. Bioresour. Technol. 271, 109–117.
Zhao, Y., Li, Y., Fan, D., Song, J., Yang, F., 2021. Application of kernel extreme learning machine and Kriging model in prediction of heavy metals removal by biochar. Bioresour. Technol. 329, 124876.
Zhu, X., Li, Y., Wang, X., 2019. Machine learning prediction of biochar yield and carbon contents in biochar based on biomass characteristics and pyrolysis conditions. Bioresour. Technol. 288, 121527.
Zitnik, M., Sunta, U., Torkar, K.G., Klemencic, A.K., Atanasova, N., et al., 2019. The study of interactions and removal efficiency of Escherichia coli in raw blackwater treated by microalgae Chlorella vulgaris. J. Clean Prod. 238, 117865.
