Bioresource Technology 288 (2019) 121541
Contents lists available at ScienceDirect
Bioresource Technology
journal homepage: www.elsevier.com/locate/biortech
Estimating biomass major chemical constituents from ultimate analysis using a random forest model
Jiangkuan Xing, Kun Luo, Haiou Wang, Jianren Fan⁎
State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, China
ARTICLE INFO
Keywords: Biomass; Chemical constituents; Random forest; Ultimate analysis

ABSTRACT
Chemical constituents are important properties for the utilization of biomass, but the experimental approaches used to determine them are expensive and time-consuming. Here, a novel random forest (RF) model is developed to accurately predict the major chemical constituents of biomass from the more readily available ultimate analysis, and is compared with a previous correlation as well as with experimental data. Two databases are constructed from the available literature for training and application of the RF model. The training results show that the determination coefficients (R²) of the RF model predictions are 0.954, 0.933 and 0.968 for cellulose, hemicellulose and lignin, respectively. The application results show that the present RF model gives accurate predictions of chemical constituents for various biomasses with MAPE < 20%, and the R² values are 0.862, 0.904 and 0.962 for cellulose, hemicellulose and lignin, respectively. In contrast, the previous correlation only works within the narrow range used to develop it, and gives unrealistic negative predictions with MAPE > 500% for samples outside that range.
⁎ Corresponding author.
E-mail address: fanjr@zju.edu.cn (J. Fan).
https://doi.org/10.1016/j.biortech.2019.121541
Received 4 April 2019; Received in revised form 20 May 2019; Accepted 21 May 2019
Available online 25 May 2019
0960-8524/ © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Biomass, accounting for 13% of the world energy structure, has drawn growing interest with the increasing demand for renewable and environmentally friendly energy sources (World Bioenergy Association, 2018). Biomass can be directly combusted for heat or converted into biofuels and valuable hydrocarbons via thermochemical or biochemical routes (Xing et al., 2019). The utilization of biomass requires an extensive study of its physical and chemical properties. Generally, biomass has three major chemical constituents, namely lignin, cellulose and hemicellulose (Wang et al., 2017). Many previous biomass thermochemical conversion models were developed based on these three chemical components (Sheng and Azevedo, 2002; Wang et al., 2019; Chen et al., 2016; Cai et al., 2013). The differences in the chemical constituents of biomass directly influence its thermochemical behaviour, and therefore the efficiency of biomass energy conversion systems (Toscan et al., 2017). Thus, it is of great significance to accurately determine the mass fractions of these three major chemical constituents.

The major chemical constituents of biomass are usually determined through experimental approaches, such as near-infrared (NIR) spectroscopy (Jin et al., 2017; Liu et al., 2010), wet chemical analysis (Sharma, 1996; Sluiter et al., 2010) and thermogravimetric analysis (TGA) (Carrier et al., 2011; Saldarriaga et al., 2015; Cozzani et al., 1997). In the NIR method, the chemical components cannot be directly measured and are always obtained by establishing partial least squares regression models between the measured biomass chemical compositions and the absorbance at the characteristic wavelengths in the NIR results (Li et al., 2015). In the wet chemical analysis method, the fraction of each component is usually determined by extracting a certain component with a series of acidic or alkaline solvents, and calculating each component fraction by weighing (Sharma, 1996). In the TGA-based method, chemical compositions are obtained by deconvolution of TG-DTG curves. This method has proved efficient for measuring cellulose and hemicellulose fractions, but gives large deviations in the determination of the lignin fraction (Carrier et al., 2011). Those experimental approaches are always expensive and time-consuming, and the experimental devices might not be available to everyone, especially engineers (Yang et al., 2016). Thus, it is necessary to develop economical and convenient models to determine the chemical constituents of biomass without expensive and time-consuming experiments. Unlike the complex procedures of chemical composition measurement, ultimate and proximate analysis data are much easier to obtain (Sheng and Azevedo, 2002), and there are many available databases, such as Phyllis and BIODAT, constructed by the Energy Research Centre of The Netherlands Organisation for Applied Scientific Research (ECN.TNO, 2012), and BIOBIB, constructed by Vienna University of Technology (Reisinger et al., 2012). In these databases, ultimate and proximate analyses of a wide variety of biomass feedstocks are available, while the
chemical compositions data are often absent. To the best of our knowledge, there is very little research on developing accurate and efficient models for determining biomass chemical compositions from the more readily available ultimate and proximate analysis data. Sheng and Azevedo (2002) proposed a nonlinear correlation based on the hydrogen-carbon ratio (H/C), oxygen-carbon ratio (O/C) and volatile matter (VM) to predict the mass fractions of cellulose, lignin and hemicellulose for the biomass chemical percolation devolatilization (Bio-CPD) model. They reported determination coefficients of 0.9 and 0.81 for lignin and cellulose, respectively. However, the samples used to develop the correlation cover a narrow range, and the correlation performance is acceptable but leaves room for improvement; thus more accurate and robust models, built with more advanced approaches and a wider sample range, are desirable.

With the development of artificial intelligence, machine learning approaches, such as random forest (RF), artificial neural network (ANN), and support vector machine (SVM), have been developed and shown to handle complex nonlinear problems well (Ho, 1995; Nair et al., 2016). Those methods have been successfully employed in biomass thermochemical fields, and perform much better than conventional linear and nonlinear correlations (Xing et al., 2019; Uzun et al., 2017; Vani et al., 2015). Thus the objective of the present study is to make a first attempt to estimate biomass chemical constituents from the more readily available ultimate analysis data with an advanced machine learning approach, random forest. The training and application databases are constructed from experimental data in the available literature. The performance of the proposed RF model is compared with that of the empirical correlation proposed by Sheng and Azevedo (2002) as well as with the experimental data.

The rest of this paper is organized as follows. The data preparation, details about the random forest method, and the evaluation indicators are introduced in Section 2. Results and discussions, including the hyper-parameter optimization and the training and application performances of the random forest model, are presented in Section 3. The final section provides some conclusions.

2. Materials and methods

2.1. Data preparation

In the present study, a biomass chemical composition database, including 144 samples, is constructed from the available literature. We have tried our best to collect available data, and will follow the relevant literature and update the database in future studies. Those samples are divided into training and application sets with proportions of 70% and 30%, respectively, as in previous machine learning studies (Xing et al., 2019; Sunphorka et al., 2017). It is worth noting that proximate analysis data for most samples in the database are not available from the literature, while the ultimate analysis is available for all samples; thus the following modelling process is based on ultimate analysis. To compare with the correlation proposed by Sheng and Azevedo (2002), the 44 application samples are randomly selected from those samples whose volatile matter fraction is accessible, and the remaining 100 samples are used as the training samples. In the database, the mass fractions of lignin, hemicellulose and cellulose are within the ranges of 0–56.5%, 0–77.8% and 6.7–92%, respectively. The carbon, hydrogen and oxygen fractions are within the ranges of 33.4–62%, 2.78–7.5% and 26.8–56.8%, respectively, and the hydrogen-carbon ratio and oxygen-carbon ratio are within the ranges of 0.6224–1.9567 and 0.3387–1.1900, respectively, consistent with the ranges quoted in Section 3.3 (as shown in Fig. 1).

Fig. 1. Data distribution of the constructed database, including the fractions of carbon, hydrogen and oxygen, the oxygen-carbon ratio and the hydrogen-carbon ratio. For clearer observation, the hydrogen fraction, oxygen-carbon ratio and hydrogen-carbon ratio are multiplied by 5, 30 and 30, respectively (x-axis categories: XC, 5XH, XO, 30O/C, 30H/C; y-axis: element fractions, H/C and O/C).

Here, the carbon fraction, hydrogen-carbon ratio and oxygen-carbon ratio are selected as the inputs of the random forest model, and the outputs are the biomass major chemical constituents. These inputs are chosen for the following two reasons. The first reason is that biomass is mainly composed of carbon, hydrogen and oxygen, thus the chemical constituents are closely related to these three elements. The carbon fraction usually has the largest percentage and is used here to indicate the fuel rank (Chelgani et al., 2010). The second reason is that the hydrogen-carbon and oxygen-carbon ratios are often used as indicators of the functional group properties of solid fuels, and have proved to be efficient parameters for representing the chemical structures of solid fuels (Sheng and Azevedo, 2002; Solomon et al., 1998; Jupudi et al., 2009).

Before the training process, those input and output parameters are normalized with a min-max scaler for better performance, as expressed below (Despange and Massart, 1998),

$$x_{i,\mathrm{nor}} = \frac{x_{i,\mathrm{exp}} - x_{i,\mathrm{exp}}^{\min}}{x_{i,\mathrm{exp}}^{\max} - x_{i,\mathrm{exp}}^{\min}} \qquad (1)$$

where $x_{i,\mathrm{nor}}$ is the ith normalized input or output parameter, and $x_{i,\mathrm{exp}}$ is the actual value of the ith input or output parameter. $x_{i,\mathrm{exp}}^{\min}$ and $x_{i,\mathrm{exp}}^{\max}$ are the minimum and maximum values of the ith actual input or output parameter, respectively. After normalization, a value of 0/1 means that this sample has the minimum/maximum actual value. It is worth noting that for model application, all input and output parameters need to undergo the same normalization and inverse-normalization treatments as expressed in Eqs. (1) and (2), respectively.

$$y_{i,\mathrm{pred}} = y_{i,\mathrm{exp}}^{\min} + y_{i,\mathrm{RF}} \times \left(y_{i,\mathrm{exp}}^{\max} - y_{i,\mathrm{exp}}^{\min}\right) \qquad (2)$$

where $y_{i,\mathrm{RF}}$ is the output predicted with the RF model, and $y_{i,\mathrm{pred}}$ is the final prediction of interest. $y_{i,\mathrm{exp}}^{\min}$ and $y_{i,\mathrm{exp}}^{\max}$ are the minimum and maximum values of the actual targets in the training samples, respectively.

2.2. Random forest

Random forest is an ensemble machine learning approach, which gives its prediction as the majority vote (mode) of the decision trees in the forest for classification problems, and as the average of the tree predictions for regression problems. Fig. 2 shows the schematic topological structure of the random forest. Starting with N samples with M features, a bootstrap sampling method is employed to randomly generate n sub-samples, and then each sub-sample is randomly divided into in-bag (IB) and out-of-bag (OOB) data. The IB data are split into two groups based on different selected features, and the split process is repeated until there are no data to split. The OOB data are not involved in the training process but are used to determine the optimal number of decision trees in the forest via a trial-and-error test. The normalized mean sum error (NMSE) of the OOB data, as expressed in Eqs. (3) and (4), is employed to choose the optimal tree number.

Fig. 2. Schematic topological structure of the random forest model.

$$\mathrm{MSE}_{\mathrm{OOB}}^{N_{\mathrm{tree}}} = \frac{\sum_{i=1}^{N_{\mathrm{tree}}} \left(y_{i,\mathrm{pred}}^{\mathrm{OOB}} - y_{i,\mathrm{exp}}^{\mathrm{OOB}}\right)^2}{N_{\mathrm{tree}}} \qquad (3)$$

$$\mathrm{NMSE}_{\mathrm{OOB}}^{N_{\mathrm{tree}}} = \frac{\mathrm{MSE}_{\mathrm{OOB}}^{N_{\mathrm{tree}}} - \mathrm{MSE}_{\mathrm{OOB},\min}^{N_{\mathrm{tree}}}}{\mathrm{MSE}_{\mathrm{OOB},\max}^{N_{\mathrm{tree}}} - \mathrm{MSE}_{\mathrm{OOB},\min}^{N_{\mathrm{tree}}}} \qquad (4)$$

where $\mathrm{NMSE}_{\mathrm{OOB}}^{N_{\mathrm{tree}}}$ and $\mathrm{MSE}_{\mathrm{OOB}}^{N_{\mathrm{tree}}}$ are the normalized and calculated mean sum errors for the OOB data when the tree number is $N_{\mathrm{tree}}$, respectively. $y_{i,\mathrm{pred}}^{\mathrm{OOB}}$ and $y_{i,\mathrm{exp}}^{\mathrm{OOB}}$ are the predicted and real targets for the OOB data, respectively. $\mathrm{MSE}_{\mathrm{OOB},\min}^{N_{\mathrm{tree}}}$ and $\mathrm{MSE}_{\mathrm{OOB},\max}^{N_{\mathrm{tree}}}$ are the minimum and maximum values of MSE for the OOB data over the tested range of the number of decision trees, respectively. The test results will be presented in Section 3. Finally, the random forest gives predictions as the majority vote or the average value for classification or regression problems, respectively. Details about this approach can be found in Ho (1995).

2.3. Evaluation indicators

To quantitatively evaluate the performances of the proposed RF model and the previous nonlinear correlation, two common evaluation indicators, the determination coefficient (R²) and the mean absolute percentage error (MAPE), are introduced and can be expressed as Eqs. (5) and (6), respectively.

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(y_{i,\mathrm{pred}} - y_{i,\mathrm{exp}}\right)^2}{\sum_{i=1}^{N} \left(y_{i,\mathrm{exp}} - \bar{y}_{\mathrm{exp}}\right)^2} \qquad (5)$$

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left|\frac{y_{i,\mathrm{pred}} - y_{i,\mathrm{exp}}}{y_{i,\mathrm{exp}}}\right| \times 100\% \qquad (6)$$

where $\bar{y}_{\mathrm{exp}}$ is the average of the experimental values over all the training samples, and N is the total number of training samples.

3. Results and discussions

3.1. Hyper-parameter optimization

In the random forest model, all three features are selected for each decision tree, so the only parameter that needs to be determined is the number of trees in the forest. In the present study, this parameter, $N_{\mathrm{tree}}$, is determined by a trial-and-error test. Fig. 3 shows the NMSE values of the OOB data for the three major chemical constituents with different numbers of decision trees. It can be found that the NMSE first decreases sharply and then stabilizes as the tree number increases, for both the IB and OOB data, as previous studies found (Xing et al., 2019; Genuer et al., 2017). This is because when the tree number is small, the complex nonlinear correlations cannot be well characterized by the limited randomly generated bootstrap sample datasets, which results in underfitting of the trained RF model. There is a critical value of the tree number, above which the testing and training performances of the RF model no longer improve with the number of trees. This indicates that beyond the critical tree number, increasing the tree number does not significantly improve the model accuracy but increases the computational time. Thus, the test with the lowest NMSE and the smallest tree number provides the optimal number of decision trees in the random forest, as a good compromise between accuracy and computational time. The optimized tree numbers are 68, 51 and 66 for the RF modelling of cellulose, hemicellulose and lignin fractions, respectively, and the training performance of the RF model is presented in the next section.
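The selection rule described in Section 3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the OOB error values are invented for the example and are not data from this study, and the tolerance argument is an added convenience rather than part of the described procedure. The sketch also shows that Eq. (4) has the same min-max form as Eq. (1), and includes the inverse normalization of Eq. (2).

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using the min-max form shared by Eqs. (1) and (4)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


def inverse_normalize(y_rf, y_min, y_max):
    """Map a normalized model output back to physical units, as in Eq. (2)."""
    return y_min + y_rf * (y_max - y_min)


def optimal_tree_number(mse_by_ntree, tol=0.0):
    """Return the smallest tree number whose NMSE is within tol of the minimum."""
    tree_numbers = sorted(mse_by_ntree)
    nmse = min_max_normalize([mse_by_ntree[n] for n in tree_numbers])
    best = min(nmse)
    for n, value in zip(tree_numbers, nmse):
        if value <= best + tol:
            return n


# Hypothetical OOB errors for a few tested tree numbers (not from the paper).
oob_mse = {10: 0.050, 50: 0.021, 100: 0.020, 200: 0.020, 300: 0.020}
chosen = optimal_tree_number(oob_mse)
```

With a non-zero tolerance the rule trades a small loss in accuracy for fewer trees, which is one possible way to encode the compromise between accuracy and computational time mentioned above.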
Fig. 3. Hyper-parameter optimization results of the random forest model for (a) cellulose, (b) hemicellulose and (c) lignin, shown as the normalized mean sum error of the training and OOB data against the tree number (0 to 300). The black dashed lines represent the optimal tree number.
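To make the in-bag/out-of-bag idea of Section 2.2 concrete, the following Python sketch draws bootstrap samples and collects the indices that were never drawn. It assumes the standard bootstrap convention, in which the samples not drawn for a tree form its OOB set; the exact splitting procedure used in this study is not spelled out, so the function name and the number of repetitions are illustrative assumptions.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; undrawn indices form the OOB set."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_training = 100        # number of training samples stated in Section 2.1
oob_fractions = []
for _ in range(200):    # 200 hypothetical trees
    in_bag, out_of_bag = bootstrap_split(n_training, rng)
    oob_fractions.append(len(out_of_bag) / n_training)

mean_oob_fraction = sum(oob_fractions) / len(oob_fractions)
```

Under this convention the probability that a given sample is left out of one bootstrap draw of size N is (1 − 1/N)^N, which is close to 0.37 for N = 100, so each tree has roughly a third of the training data available for OOB error estimation.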
3.2. Training performances

This section presents the training performances of the random forest model with the optimal hyper-parameters determined above for the three major chemical constituents. Fig. 4 compares the three major chemical constituents predicted with the random forest model against those measured through experimental approaches. The solid red line in each sub-graph represents a perfect fit (R² = 1), and the dashed lines mark relative errors of ±20%. It can be found that for almost all training data, except for some extremely low values, the relative errors between the RF predictions and the experimental data are within 20%, and the determination coefficients (R²) and mean absolute percentage errors (MAPE), defined in Eqs. (5) and (6) respectively, are 0.954 and 7.595% for cellulose, 0.933 and 7.754% for hemicellulose, and 0.968 and 6.428% for lignin. The determination coefficients are clearly improved compared with the previous correlation proposed by Sheng and Azevedo (2002), which indicates that the present RF model better characterizes the complex nonlinear relations between ultimate analysis and chemical constituents. The high errors in the low-value region result from two causes. First, there are limited samples in the low-value region, thus the nonlinear relation cannot be accurately characterized there. Second, for low values, a small absolute deviation results in a larger relative error than for high values. Overall, the present RF model gives accurate predictions of the chemical constituents for various biomasses, and the determination coefficients are much improved compared with those of the previous correlation.

Fig. 4. Training performances of the random forest models for predictions of (a) cellulose fraction, (b) hemicellulose fraction and (c) lignin fraction. Each panel plots the predicted fraction (%) against the measured fraction (%), together with the best-fit line and the ±20% relative-error lines.
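The two evaluation indicators of Eqs. (5) and (6), and the point that a fixed absolute deviation inflates the relative error at low values, can be illustrated with a short Python sketch. The function names and the three measured/predicted values are hypothetical and are not taken from the databases of this study.

```python
def r_squared(y_pred, y_exp):
    """Determination coefficient, Eq. (5)."""
    mean_exp = sum(y_exp) / len(y_exp)
    ss_res = sum((p - e) ** 2 for p, e in zip(y_pred, y_exp))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def mape(y_pred, y_exp):
    """Mean absolute percentage error in %, Eq. (6).

    Undefined when a measured value is zero; how such samples were
    handled in the study is not stated, so none are used here.
    """
    return 100.0 * sum(abs((p - e) / e) for p, e in zip(y_pred, y_exp)) / len(y_exp)


# Hypothetical measured fractions (%) and predictions that are all 1 point too high.
measured = [2.0, 20.0, 40.0]
predicted = [3.0, 21.0, 41.0]
relative_errors = [100.0 * abs(p - m) / m for p, m in zip(predicted, measured)]
r2_value = r_squared(predicted, measured)
mape_value = mape(predicted, measured)
```

In this toy example the same 1-point deviation gives relative errors of 50%, 5% and 2.5% for the three samples, so the MAPE is dominated by the low-value sample while R² remains close to 1.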
3.3. Application performance

To validate the developed RF model more comprehensively, it is used to predict the chemical constituents for the application database, in which all samples are outside the training database. Since the mass fraction of volatile matter is available in the application database, a direct comparison between the present RF model and the previous correlation proposed by Sheng and Azevedo (2002), which is expressed in Eqs. (7) and (8), is achievable.
Fig. 5. Comparisons of the chemical constituents predicted using the correlation proposed in Sheng and Azevedo (2002) and the present RF model with the experimental data for the application database: (a) cellulose, (b) hemicellulose and (c) lignin. Each panel shows the predicted and measured fractions (%) and the relative error (%) against the biomass sample number (0 to 45).
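As a rough illustration of how the correlation of Sheng and Azevedo (2002) behaves inside and outside its fitted range, the Python sketch below evaluates Eqs. (7) and (8) of this section, with the coefficients transcribed from those equations. The function names and the two test inputs are hypothetical; they are not samples from the application database.

```python
def sheng_azevedo_correlation(oc, hc, vm):
    """Return (X_cel, X_lig, X_hem) in wt% from O/C, H/C and VM (wt%, daf)."""
    x_cel = (-1019.07 + 293.810 * oc - 187.639 * oc ** 2
             + 65.1426 * hc - 19.3025 * hc ** 2
             + 21.7448 * vm - 0.132123 * vm ** 2)   # Eq. (7)
    x_lig = (612.099 + 195.366 * oc - 156.535 * oc ** 2
             + 511.357 * hc - 177.025 * hc ** 2
             - 24.3224 * vm + 0.145306 * vm ** 2)   # Eq. (8)
    x_hem = vm - x_cel - x_lig                      # closure on volatile matter
    return x_cel, x_lig, x_hem


def in_fitted_range(oc, hc, vm):
    """Check the ranges reported for the correlation's development data."""
    return 0.56 <= oc <= 0.83 and 1.26 <= hc <= 1.69 and 73.0 <= vm <= 86.0


inside = sheng_azevedo_correlation(0.70, 1.45, 80.0)   # hypothetical in-range sample
outside = sheng_azevedo_correlation(0.70, 1.45, 60.0)  # hypothetical VM below 73%
```

For the in-range input the three fractions are positive and sum to VM by construction, whereas lowering VM to 60% with the same ratios already drives the predicted cellulose fraction negative, which is the kind of unphysical output discussed in this section.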
$$X_{\mathrm{cel}} = -1019.07 + 293.810\,(O/C) - 187.639\,(O/C)^2 + 65.1426\,(H/C) - 19.3025\,(H/C)^2 + 21.7448\,(VM) - 0.132123\,(VM)^2 \qquad (7)$$

$$X_{\mathrm{lig}} = 612.099 + 195.366\,(O/C) - 156.535\,(O/C)^2 + 511.357\,(H/C) - 177.025\,(H/C)^2 - 24.3224\,(VM) + 0.145306\,(VM)^2 \qquad (8)$$

where VM is the volatile matter fraction in weight percent (daf). $X_{\mathrm{cel}}$, $X_{\mathrm{lig}}$ and $X_{\mathrm{hem}}$ are the mass fractions of cellulose, lignin and hemicellulose, respectively, and the hemicellulose fraction is calculated through $X_{\mathrm{hem}} = VM - X_{\mathrm{cel}} - X_{\mathrm{lig}}$. Sheng and Azevedo (2002) stated that this correlation was developed from samples with O/C from 0.56 to 0.83, H/C from 1.26 to 1.69, and VM from 73% to 86%.

Fig. 5 shows the direct comparisons of the chemical constituents predicted with the present RF model and the previous correlation by Sheng and Azevedo (2002) as well as the experimental data. The green dashed lines mark relative errors of ±20%, and the Y axis is broken for clearer observation because of the extreme values predicted by the correlation of Sheng and Azevedo (2002). To explain the agreements and discrepancies of the predictions more comprehensively, we compare the data distribution of the validation samples with those of the present training database and the previous database used by Sheng and Azevedo (2002). Fig. 6 shows the comparison results. The dashed lines with arrowheads denote the data range of samples in the training database, and the blue balls represent samples outside the training database. Biomass samples No. 32 to No. 44 are some of the biomass samples used in the study of Sheng and Azevedo (2002). It can be found that most samples in the validation database are outside the previous database used by Sheng and Azevedo (2002), while all samples are inside the present training database. The previous correlation shows acceptable performance in predicting cellulose and lignin fractions for biomass samples No. 32 to No. 44, but obvious errors in the predicted hemicellulose fraction can still be found for those samples. For other samples outside their training database, some unrealistic negative values are predicted for the three major chemical constituents, such as for biomass samples Nos. 2, 3, 6 and 8, and the relative errors are even larger than 500%, as seen in the bottom plot of each sub-graph in Fig. 5. Those deviations can be attributed to the fact that the volatile matter fraction, O/C or H/C of these samples is outside the data range used to develop the correlation, as seen in Fig. 6. This also indicates that the previous correlation has a narrow application scope, and that going slightly outside the data range brings an obvious deviation, as seen in a previous study (Estiati et al., 2019). In contrast, the present RF model reproduces the experimental data well, with relative errors below 20% for all samples in the application database, and its performance is much better than that of the correlation even for biomass samples No. 32 to No. 44. This can be attributed to two reasons. First, the present training samples are distributed over a wider range, such as the hydrogen-carbon ratio (0.6224–1.9567) and oxygen-carbon ratio (0.3387–1.1900), and all validation samples are within the present training database. Thus the RF model gives good predictions based on the well-learned complex nonlinear relations from the training database. The second reason is the strong ability of random forest to handle nonlinear problems. Randomly selecting samples and features and predicting by averaging all tree predictions reduces the risk of overfitting, thus the RF model has better robustness and accuracy than the previous correlation.

Fig. 6. Data distribution of the validation samples compared with the present training database (left) and the previous database used by Sheng and Azevedo (2002) (right). The recoverable axis labels are C (%), VM (%), H/C and O/C.

4. Conclusions

A novel RF model is developed for predicting biomass chemical constituents from ultimate analysis, and compared with the previous correlation and the experimental data. The training and application databases are constructed from experimental data in the available literature. The training results show that the R² values of the RF model predictions are 0.954, 0.933 and 0.968 for cellulose, hemicellulose and lignin, respectively, which is a clear improvement over the previous correlation. The application results show that the RF model gives accurate predictions for various biomasses with MAPE < 20%. In contrast, the previous correlation shows a narrow application scope, and even gives unrealistic negative predictions with MAPE > 500%.

Acknowledgement

The authors are grateful for the support from the National Natural Science Foundation (Grant No: 91741203) and the National Key Research and Development Program of China (Grant: 2017YFB0601805). JX especially thanks Miss Yuehan Xu for her constant support and companionship during his doctoral degree. JX also
thanks Mr. Shiling Yang for his helpful discussions.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.biortech.2019.121541.

References

Cai, J.M., Xu, W.W., Liu, R.H., 2013. Sensitivity analysis of three-parallel-DAEM-reaction model for describing rice straw pyrolysis. Bioresour. Technol. 132, 423–426.
Carrier, M., Loppinet-Serani, A., Denux, D., Lasnier, J.M., Ham-Pichavant, F., Cansell, F., Aymonier, C., 2011. Thermogravimetric analysis as a new method to determine the lignocellulosic composition of biomass. Biomass. Bioenergy 35, 298–307.
Chelgani, S.C., Mesroghli, S., Hower, J.C., 2010. Simultaneous prediction of coal rank parameters based on ultimate analysis using regression and artificial neural network. Int. J. Coal Geol. 83, 31–34.
Chen, T.J., Zhang, J.Z., Wu, J.H., 2016. Kinetic and energy production analysis of pyrolysis of lignocellulosic biomass using a three-parallel Gaussian reaction model. Bioresour. Technol. 211, 502–508.
Cozzani, V., Lucchesi, A., Stoppato, G., Maschio, G., 1997. A new method to determine the composition of biomass by thermogravimetric analysis. Can. J. Chem. Eng. 75, 127–133.
Despange, F., Massart, D.L., 1998. Neural networks in multivariate calibration. Analyst 123, 157–178.
ECN.TNO, 2012. Phyllis2, database for biomass and waste. Available at: https://phyllis.nl/ (accessed: 12 May 2019).
Estiati, I., Tellabide, M., Saldarriaga, J.F., et al., 2019. Comparison of artificial neural networks with empirical correlations for estimating the average cycle time in conical spouted beds. Particuology 42, 48–57.
Genuer, R., Poggi, J., Tuleau-Malot, C., Vialaneix, N., 2017. Random forests for big data. Big Data Res. 9, 28–46.
Ho, T.K., 1995. Random decision forests. In: Proceedings of the 3rd ICDAR, pp. 278–282.
Jin, X.L., Chen, X.L., Shi, C.H., Li, M., Guan, Y.J., Yu, C.Y., Yamada, T., Sacks, E., Peng, J.H., 2017. Determination of hemicellulose, cellulose and lignin content using visible and near infrared spectroscopy in Miscanthus sinensis. Bioresour. Technol. 241, 603–609.
Jupudi, R.S., Zamansky, V., Fletcher, T.H., 2009. Prediction of light gas composition in coal devolatilization. Energy Fuel. 23, 3063–3067.
Li, X.L., Sun, C.J., Zhou, B.X., He, Y., 2015. Determination of hemicellulose, cellulose and lignin in moso bamboo by near infrared spectroscopy. Sci. Rep. 5, 17210.
Liu, L., Ye, X.P., Womac, A.R., Sokhansanj, S., 2010. Variability of biomass chemical composition and rapid analysis using FT-NIR techniques. Carbohydr. Polym. 81, 820–829.
Nair, V.V., Dhar, H., Kumar, S., Thalla, A.K., Mukherjee, S., Wong, J.W.C., 2016. Artificial neural network based modeling to evaluate methane yield from biogas in a laboratory-scale anaerobic bioreactor. Bioresour. Technol. 217, 90–99.
Reisinger, K., Haslinger, C., Herger, M., 2012. BIOBIB - a database for biofuels. Available at: http://cdmaster2.vt.tuwien.ac.at/biobib/ (accessed: 12 May 2019).
Saldarriaga, J.F., Aguado, R., Pablos, A., 2015. Fast characterization of biomass fuels by thermogravimetric analysis (TGA). Fuel 140, 744–751.
Sharma, H.S.S., 1996. Compositional analysis of neutral detergent, acid detergent, lignin and humus fractions of mushroom compost. Thermochim. Acta 285, 211–220.
Sheng, C.D., Azevedo, J.L.T., 2002. Modelling biomass devolatilization using the chemical percolation devolatilization model for the main components. Proc. Combust. Inst. 29, 407–414.
Sluiter, J.B., Ruiz, R.O., Scarlata, C.J., Sluiter, A.D., Templeton, D.W., 2010. Compositional analysis of lignocellulosic feedstocks. 1. Review and description of methods. J. Agric. Food. Chem. 58, 9043–9053.
Solomon, P.R., Hamblen, D.G., Carangelo, R.M., Serio, M.A., Deshpande, G.V., 1998. General model of coal devolatilization. Energy Fuel 2, 405–422.
Sunphorka, S., Chalermsinsuwan, B., Piumsomboon, P., 2017. Artificial neural network model for the prediction of kinetic parameters of biomass pyrolysis from its constituents. Fuel 193, 142–158.
Toscan, A., Morais, A.R.C., Paixao, S.M., Alves, L., Andreaus, J., Camassola, M., Dillon, A.J.P., Lukasik, R.M., 2017. High-pressure carbon dioxide/water pre-treatment of sugarcane bagasse and elephant grass: assessment of the effect of biomass composition on process efficiency. Bioresour. Technol. 224, 639–647.
Uzun, H., Yıldız, Z., Goldfarb, J.L., Ceylan, S., 2017. Improved prediction of higher heating value of biomass using an artificial neural network model based on proximate analysis. Bioresour. Technol. 234, 122–130.
Vani, S., Sukumaran, R.K., Savithri, S., 2015. Prediction of sugar yields during hydrolysis of lignocellulosic biomass using artificial neural network modeling. Bioresour. Technol. 188, 128–135.
Wang, C.H., Li, L.Q., Zeng, Z., Xu, X., Ma, X.M., Chen, R.F., Su, C.Q., 2019. Catalytic performance of potassium in lignocellulosic biomass pyrolysis based on an optimized three-parallel distributed activation energy model. Bioresour. Technol. 281, 412–420.
Wang, S.R., Dai, G.X., Yang, H.P., Luo, Z.Y., 2017. Lignocellulosic biomass pyrolysis mechanism: A state-of-the-art review. Prog. Energy Combust. Sci. 62, 33–86.
World Bioenergy Association, 2018. Global Bioenergy Statistics 2018. Available at: https://worldbioenergy.org/global-bioenergy-statistics (accessed: 18 March 2019).
Xing, J.K., Luo, K., Wang, H.O., Wang, S., Bai, Y., Fan, J.R., 2019. Predictive single-step kinetic model of biomass devolatilization for CFD applications: a comparison study of empirical correlations (EC), artificial neural networks (ANN) and random forest (RF). Renew. Energy 136, 104–114.
Xing, J.K., Luo, K., Heinz, P., Wang, H.O., Zhao, C.G., Fan, J.R., 2019. Predicting kinetic parameters for coal devolatilization by means of Artificial Neural Networks. Proc. Combust. Inst. 37, 2943–2950.
Yang, Z., Li, K., Zhang, M.M., Xin, D.L., Zhang, J.H., 2016. Rapid determination of chemical composition and classification of bamboo fractions using visible-near infrared spectroscopy coupled with multivariate data analysis. Biotechnol. Biofuels 9, 35.
