
Industrial Crops & Products 117 (2018) 224–234


journal homepage: www.elsevier.com/locate/indcrop

Modeling the seed yield of Ajowan (Trachyspermum ammi L.) using artificial
neural network and multiple linear regression models

Mohsen Niazian a,b,⁎, Seyed Ahmad Sadat-Noori a,⁎, Moslem Abdipour c

a Department of Agronomy and Plant Breeding Science, College of Aburaihan, University of Tehran, Tehran-Pakdasht, Iran
b Department of Tissue and Cell Culture, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), 3135933151 Karaj, Iran
c Kohgiluyeh and Boyerahmad Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Yasuj, Iran

ARTICLE INFO

Keywords: Ajowan; Artificial intelligence; Medicinal plant; Multi-layer perceptron; Selection criteria

ABSTRACT

Ajowan is a medicinal plant with useful pharmaceutical compounds in its seeds. Improving the seed yield of ajowan through a better understanding of the relationship between seed yield and its components is one of the most important goals of any breeding program. In the present study, an artificial neural network (ANN) and a multiple linear regression (MLR) model were applied to predict the seed yield of ajowan from its yield components. Based on simple correlation analysis, four characters (number of secondary branches, shoot dry weight, number of umbellets in an inflorescence, and biological yield) were selected as input variables for both the ANN and MLR models. The network with the SigmoidAxon transfer function, the Levenberg-Marquardt learning algorithm, one hidden layer with four neurons and 1000 training epochs, which gave a root mean square error (RMSE) of 0.147, a mean absolute error (MAE) of 0.127 and a determination coefficient (R²) of 0.932, was selected as the final ANN model. The ANN performed better than the MLR model, which had an RMSE of 0.210 and an r² of 0.792. Biological yield and shoot dry weight were the yield component traits that most affected the seed yield of ajowan, and both the ANN and MLR models identified them as selection criteria.

1. Introduction

Ajowan (Carum copticum L.) is an industrial medicinal plant that belongs to the Apiaceae family and is endemic to Egypt (Boskabady et al., 2014; Niazian et al., 2017a). It grows mainly in the arid and semi-arid regions of Egypt, the east of India, and the northwest, central and eastern parts of Iran (Ashraf and Orooj, 2006; Joshi, 2000; Moosavi et al., 2015; Niazian et al., 2017a; Noori et al., 2017). The active substances in its seeds make this plant valuable for medicinal purposes (Dalkani et al., 2011). Ajowan seeds contain an essential oil with about 50% thymol, which has strong germicidal, anti-spasmodic and fungicidal effects (Ashraf and Orooj, 2006). Carvacrol, γ-terpinene, and p-cymene are the main components of Iranian and African ajowan, whereas thymol is the main component of south Indian ajowan (Boskabady et al., 2014). Many medicinal and aromatic plants do not have stable production in their growing areas and are usually gathered by conventional methods to meet demand (Dalkani et al., 2012). Hence, attention to the stable quality and quantity of medicinal plant production is important to meet the growing need for pharmaceutical products. Seed yield is the most important trait of ajowan, but it is a quantitative and complex trait that is controlled by many genes and strongly influenced by environmental conditions (Dalkani et al., 2011), and it therefore has low heritability (Ghanshyam et al., 2015). In this situation, plant breeders prefer indirect selection through yield component traits that have high heritability and a high correlation with seed yield. Morphological evaluation of a plant's agronomic characteristics is easier and cheaper than other evaluation methods (Dalkani et al., 2012). Several methods are available for the analysis of yield components, and researchers can choose among them according to the objective of the project. Techniques such as analysis of variance, simple correlation coefficients, multiple regression and path analysis are commonly used to analyze yield components (Fraser and Eaton, 1983). One of the simplest

Abbreviations: ANN, Artificial neural network; BY, Biological yield; IL, Average internodal length; LL, Leaf length; MAE, Mean absolute error; MLP, Multi-layer perceptron; MLR, Multiple linear regressions; MSE, Mean square error; NB, Number of branches; NSB, Number of secondary branches; NU, Number of umbels; NUI, Number of umbellets in an inflorescence; PH, Plant height; RMSE, Root mean square error; SDW, Shoot dry weight; SY, Seed yield

Corresponding authors at: Department of Agronomy and Plant Breeding Science, College of Aburaihan, University of Tehran, Pakdasht-Tehran, Iran.
E-mail addresses: mniazian@ut.ac.ir (M. Niazian), noori@ut.ac.ir (S.A. Sadat-Noori).

https://doi.org/10.1016/j.indcrop.2018.03.013
Received 14 November 2017; Received in revised form 3 March 2018; Accepted 5 March 2018
Available online 22 March 2018
0926-6690/ © 2018 Elsevier B.V. All rights reserved.
M. Niazian et al. Industrial Crops & Products 117 (2018) 224–234

Table 1
Geographic profile of the origins of the 23 Iranian ajowan ecotypes.

No. | Ecotype code | Province | City | Geographical region
1 | 943 | Tehran | Tehran | Center
2 | 906 | Alborz | Karaj | Center
3 | 13299 | Ardabil | Ardabil | North West
4 | 31831 | Yazd | Yazd | East
5 | 37483 | South Khorasan | Birjand | East
6 | 23011 | Kerman | Rafsanjan | East
7 | 22079 | Sistan & Baluchestan | Iranshahr | South East
8 | 17902 | Fars | Marvdasht | South
9 | 10563 | Ardabil | Ardabil | North West
10 | 37492 | South Khorasan | Birjand | East
11 | 14492 | Markazi | Arak | Center
12 | 23023 | Kerman | Rafsanjan | East
13 | 7893 | Ghoom | Ghoom | Center
14 | 15484 | Yazd | Shahedie | East
15 | 15864 | Yazd | Sadugh | East
16 | 12313 | Fars | Shiraz | South
17 | – | Khorasan Razavi | Sabzevar | North East
18 | 1085 | Ardabil | Ardabil | North West
19 | 14322 | Hamedan | Hamedan | Center
20 | 10583 | Ardabil | Ardabil | North West
21 | 33683 | Yazd | Sadugh | East
22 | 4077 | Esfahan | Felaverjan | Center
23 | 37529 | South Khorasan | Ghaen | East

methods that can help to better understand yield components and assist in effective selection is correlation coefficient analysis (Mishra et al., 2015). Path coefficient analysis is also widely used to determine the nature of the relationships between yield and yield components (Dalkani et al., 2011). This method has been used frequently, even in medicinal plants, to test cause/effect relationships among correlated variables (Bhandari and Gupta, 1991; Cosge et al., 2009; Dalkani et al., 2011; Lal, 2007). Multiple linear regression (MLR) models the linear relationship of a dependent variable as a function of multiple independent variables (Quirk, 2016). One of the standard MLR procedures for variable selection is stepwise regression, in which predictors are introduced into the model sequentially, one at a time (Chong and Jun, 2005).
The main problem of regression-based models is that they cannot explain the highly nonlinear and complex relationship between seed yield and its components (Emamgholizadeh et al., 2015). To overcome this problem, agricultural scientists have in recent years turned to artificial intelligence (AI) models such as artificial neural networks (ANN), gene expression programming (GEP), and adaptive neuro-fuzzy inference systems (ANFIS) (Azamathulla and Ghani, 2011; Emamgholizadeh et al., 2013a,b; Iquebal et al., 2014; Mansouri et al., 2016; Samadianfard et al., 2014; Shahinfar et al., 2012; Silva et al., 2014). An artificial neural network is an artificial intelligence model that mimics the working of the human brain (Tufail et al., 2008). Artificial neural network models are classified according to their structure, neuron type and other characteristics. Furthermore, different training algorithms can be used depending on the training convergence of an ANN (Govindaraju, 2000a,b). One of the most popular types of ANN used in biological research is the multi-layer perceptron (MLP) (Emamgholizadeh et al., 2015; Naroui Rad et al., 2015; Safa et al., 2015). An MLP is a feed-forward ANN model that consists of an input layer, one or more hidden layers and an output layer. In an MLP, each layer of nodes in a directed graph is fully connected to the next one, and each node (except for the input nodes) is a neuron with a nonlinear activation function (Rosenblatt, 1961).
Because most herbal medicines are free of side effects, interest in plant products has increased considerably all over the world (Ashraf and Orooj, 2006), but this increasing demand requires stable production of medicinal plants in both quantity and quality (Niazian et al., 2017b).
Direct selection to improve a quantitative trait such as seed yield is not effective, so indirect selection via simple morphological traits can be more helpful. Conventional statistical approaches such as correlation coefficients, multiple linear regression analysis and path analysis can be used for the indirect improvement of quantitative and complex traits, but these methods have shortcomings, and more sophisticated methods such as artificial neural networks can help to better understand the complex and unpredictable developmental phenomena of biological systems. The objectives of the present study were to (a) predict the seed yield of the medicinal plant ajowan using the artificial neural network method, (b) compare the predicted results with those of the conventional regression-based method, and (c) find the most important selection criteria for seed yield of Iranian ajowan ecotypes using ANN and MLR models.

Table 2
Neuron activation functions.

Activation function | Formula
SigmoidAxon | $f(x_i, w_i) = \dfrac{1}{1 + \exp[-x_i^{lin}]}$
LinearSigmoidAxon | $f(x_i, w_i) = \begin{cases} 0 & x_i^{lin} < 0 \\ 1 & x_i^{lin} > 1 \\ x_i^{lin} & \text{else} \end{cases}$
TainhAxon | $f(x_i, w_i) = \tanh[x_i^{lin}]$
LinerTanhAxon | $f(x_i, w_i) = \begin{cases} -1 & x_i^{lin} < -1 \\ 1 & x_i^{lin} > 1 \\ x_i^{lin} & \text{else} \end{cases}$

Fig. 1. Applied structure of the multi-layer perceptron model to predict seed yield of ajowan. (BY = Biological yield; NSB = Number of secondary branches; NUI = Number of umbellets in an inflorescence; SDW = Shoot dry weight; SY = Seed yield).
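As an illustration of the transfer functions in Table 2 and the input, hidden and output layers shown in Fig. 1, the following minimal Python sketch evaluates each function and a single forward pass through a network with one hidden layer. It is not the Neuro-Solutions implementation. The function names, the use of the same activation in the output neuron, and the zero weights in the usage note are assumptions made only for this example.

```python
import math

def sigmoid_axon(x):
    """Logistic function from Table 2 (SigmoidAxon); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear_sigmoid_axon(x):
    """Piecewise-linear function from Table 2 (LinearSigmoidAxon), clipped to [0, 1]."""
    return min(1.0, max(0.0, x))

def tanh_axon(x):
    """Hyperbolic tangent (listed as TainhAxon in Table 2); output lies in (-1, 1)."""
    return math.tanh(x)

def linear_tanh_axon(x):
    """Piecewise-linear function (listed as LinerTanhAxon in Table 2), clipped to [-1, 1]."""
    return min(1.0, max(-1.0, x))

def mlp_forward(inputs, hidden_weights, hidden_biases,
                output_weights, output_bias, activation=sigmoid_axon):
    """Forward pass: inputs -> one hidden layer -> one output neuron.

    Assumption: the output neuron uses the same activation as the hidden layer.
    """
    hidden = [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return activation(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)
```

For a 4-4-1 structure, `hidden_weights` would be a 4 x 4 list of lists and `output_weights` a list of four values. With all weights and biases set to zero, `mlp_forward` returns `sigmoid_axon(0.0)`, which is a quick sanity check of the wiring.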


Table 3
Summary of the input and output variables in the ANN and MLR models.

Parameter | Mean ± SE^a | Unit | Method of measurement
Inputs
1 Number of secondary branches | 5.14 ± 0.12 | Numerical | Counting the secondary branches arising from the main branch
2 Shoot dry weight | 2.7 ± 0.02 | gram |
3 Number of umbellets in an inflorescence | 5.05 ± 0.26 | Numerical | Counting the umbellets in an umbel inflorescence
4 Biological yield | 119.42 ± 13.29 | gram | Total shoot weight at the end of the growing season
Output
1 Seed yield | 51.14 ± 6.97 | gram | Weighing all the seeds of a single plant

a Standard error.

Fig. 2. The convergence of the average of mean square error (MSE) value in final artificial neural network (ANN) structure.

2. Materials and methods

2.1. Plant material and field experiments

Field experiments were conducted in the agricultural research field of the College of Aburaihan, University of Tehran, in the 2014–2015 and 2015–2016 growing seasons. Twenty-three Iranian ecotypes of ajowan (Table 1) were cultivated in a randomized complete block design (RCBD) with three replicates. Each plot consisted of four rows 7 m long, with 60 cm between rows and 10 cm between plants within a row. At the end of each growing season, seed yield (SY) and important yield component traits, namely plant height (PH), number of branches (NB), number of secondary branches (NSB), shoot dry weight (SDW), number of umbels (NU), number of umbellets in an inflorescence (NUI), leaf length (LL), average internodal length (IL), and biological yield (BY), were recorded separately. These field data were used for training and testing the ANN and MLR models.

2.2. Statistical analysis

The simple correlation coefficients of seed yield with the other investigated traits were calculated from the combined means matrix (the integrated means of each investigated trait over 2014 and 2015). The simple Pearson correlation analysis was conducted using SAS software (Cary, 2004).
The ANN analysis was conducted using Neuro-Solutions V5.07 for Excel (www.Neurosolutions.com) (Anonymous, 2011), with seed yield as the dependent variable and the remaining traits as independent variables. Training and testing of the ANN were performed using 552 samples collected from four accessions of each ecotype in both 2014 and 2015.

2.3. Artificial neural network

Normalization or scaling is not strictly a functional requirement for ANNs to learn, but it can help significantly because it maps the input variables into the range over which the Sigmoid [0, 1] and/or Tanh [-1, 1] activation functions operate. Moreover, rather than adding another layer to scale the inputs, it is better to preprocess and normalize the data before feeding them to the neural network. Normalization gives all variables the same significance (importance) during the learning process (Chegini et al., 2008). In addition, to allow a fair comparison between the ANN and MLR models (the latter should be run with normalized data), the same normalized data were used for both methods. For this purpose, the following equation (Eq. (1)) was used for normalization:

$$x_{norm} = 0.5\left(\frac{x_0 - \bar{x}}{x_{max} - x_{min}}\right) + 0.5 \qquad (1)$$

where $x_{norm}$ is the normalized value of the input $x_0$, $\bar{x}$ is the mean of the data, and $x_{max}$ and $x_{min}$ are the maximum and minimum values of the data, respectively.
The multi-layer perceptron ANN with three learning algorithms (Momentum, Conjugate Gradient, and Levenberg-Marquardt) and four different activation functions (SigmoidAxon, LinearSigmoidAxon, TainhAxon, and LinerTanhAxon; Table 2) was


Fig. 3. The Pearson correlation coefficient of input variables with seed yield of ajowan in both ANN and MLR models.

Table 4
The performance of the artificial neural network model with different transfer functions to predict seed yield of ajowan.

Transfer function | Training R²^a | Training RMSE^b | Training MAE^c | Testing R²^a | Testing RMSE^b | Testing MAE^c
SigmoidAxon | 0.932 | 0.147 | 0.127 | 0.911 | 0.154 | 0.135
LinearSigmoidAxon | 0.120 | 0.205 | 0.153 | 0.118 | 0.215 | 0.164
TainhAxon | 0.861 | 0.167 | 0.145 | 0.843 | 0.173 | 0.146
LinerTanhAxon | 0.357 | 0.152 | 0.132 | 0.335 | 0.159 | 0.139

a Determination coefficient.
b Root mean square error.
c Mean absolute error.

Table 5
The performance of the artificial neural network model with different hidden layers to predict seed yield of ajowan.

Number of hidden layer(s) | Training R²^a | Training RMSE^b | Training MAE^c | Testing R²^a | Testing RMSE^b | Testing MAE^c
1 | 0.932 | 0.147 | 0.127 | 0.911 | 0.154 | 0.135
2 | 0.905 | 0.161 | 0.140 | 0.892 | 0.179 | 0.151
3 | 0.894 | 0.179 | 0.151 | 0.874 | 0.209 | 0.163
4 | 0.876 | 0.205 | 0.166 | 0.860 | 0.221 | 0.179
5 | 0.863 | 0.219 | 0.173 | 0.841 | 0.237 | 0.190

a Determination coefficient.
b Root mean square error.
c Mean absolute error.

used in the present study.
The data set of the four independent variables NSB, SDW, NUI, and BY (which were significantly correlated with seed yield) was fed directly to the input layer of the ANN, and the expected SY was produced in the output layer (Fig. 1). The summary of the input and output variables of the ANN is presented in Table 3. To find the best topology, different numbers of hidden layers (1–5) and of neurons in each hidden layer (1–20) were tested by trial and error.
The bootstrap resampling technique was used, and all 552 field data points were randomly divided into three sections: 60% (331 data points) for training, 15% (83 data points) for cross-validation, and 25% (138 data points) for network testing. In addition, to validate the performance of the ANN models, the results of the ANN model were compared with results obtained from another panel collected in a separate experiment in the same years. Just as over-training can cause memorization and over-fitting of an ANN, inadequate training can result in under-fitting. Strategies for stopping training at the optimal point and avoiding excessive training include limiting the number of epochs, monitoring the mean square error (MSE) of the training data, and using cross-validation data (Baharlouei et al., 2010). An insufficient number of epochs can decrease ANN performance, whereas too many epochs increase the risk of overtraining and memorization (Aghbashlo et al., 2012). To reduce this risk, a pretest with one hidden layer (four neurons) and various numbers of epochs (50–2000) was carried out, and the convergence point between training and validation (epoch 1000) was taken as the end of training to avoid over-training (Fig. 2).
Different transfer functions and numbers of hidden layers were tested to optimize the performance of the final model. The performance of the ANN models was assessed through RMSE (Eq. (2)), MAE (Eq. (3)) and R² (Eq. (4)).

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(O_i - P_i)^2}{n}} \qquad (2)$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right| \qquad (3)$$
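The performance criteria named above can be computed in a few lines. The following is a minimal Python sketch rather than the routine used by the authors' software: RMSE and MAE follow their standard definitions, and the determination coefficient is taken here as the squared Pearson correlation between observed and predicted values, which is an assumption about how Eq. (4) should be read. The function and variable names are illustrative.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed (O_i) and predicted (P_i) values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def r_squared(observed, predicted):
    """Squared Pearson correlation of observed and predicted values (assumed form of R2)."""
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(observed, predicted))
    ss_o = sum((o - o_mean) ** 2 for o in observed)
    ss_p = sum((p - p_mean) ** 2 for p in predicted)
    return cov ** 2 / (ss_o * ss_p)
```

Because this form of R² measures only linear association, a model whose predictions are shifted by a constant still scores 1.0, so it is worth reading R² together with RMSE and MAE, as the tables in this paper do.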


Table 6
The best topology of the applied ANN model to predict seed yield of ajowan.

Learning algorithm | Activation function | Number of input layer | Number of hidden layer | Number of neuron in each hidden layer | Number of output layer | Best topology | Training R²^a | Training RMSE^b | Training MAE^c | Testing R²^a | Testing RMSE^b | Testing MAE^c
Momentum (MOM) | SigmoidAxon | 1–4 | 1–5 | 1–20 | 1 | 4-5-1 | 0.721 | 0.278 | 0.224 | 0.702 | 0.288 | 0.241
Momentum (MOM) | LinearSigmoidAxon | 1–4 | 1–5 | 1–20 | 1 | 4-9-7-1 | 0.203 | 0.253 | 0.238 | 0.190 | 0.267 | 0.251
Momentum (MOM) | TainhAxon | 1–4 | 1–5 | 1–20 | 1 | 3-4-1 | 0.684 | 0.211 | 0.187 | 0.651 | 0.213 | 0.188
Momentum (MOM) | LinerTanhAxon | 1–4 | 1–5 | 1–20 | 1 | 4-8-1 | 0.274 | 0.344 | 0.302 | 0.254 | 0.360 | 0.315
Conjugate Gradient (CG) | SigmoidAxon | 1–4 | 1–5 | 1–20 | 1 | 4-4-3-1 | 0.657 | 0.241 | 0.216 | 0.634 | 0.234 | 0.210
Conjugate Gradient (CG) | LinearSigmoidAxon | 1–4 | 1–5 | 1–20 | 1 | 4-5-1 | 0.177 | 0.301 | 0.268 | 0.152 | 0.318 | 0.280
Conjugate Gradient (CG) | TainhAxon | 1–4 | 1–5 | 1–20 | 1 | 4-4-4-1 | 0.602 | 0.321 | 0.288 | 0.584 | 0.345 | 0.299
Conjugate Gradient (CG) | LinerTanhAxon | 1–4 | 1–5 | 1–20 | 1 | 4-7-1 | 0.194 | 0.201 | 0.175 | 0.182 | 0.197 | 0.166
Levenberg-Marquardt (LM) | SigmoidAxon | 1–4 | 1–5 | 1–20 | 1 | 4-4-1 | 0.932 | 0.147 | 0.127 | 0.911 | 0.154 | 0.135
Levenberg-Marquardt (LM) | LinearSigmoidAxon | 1–4 | 1–5 | 1–20 | 1 | 3-3-1 | 0.120 | 0.205 | 0.153 | 0.118 | 0.215 | 0.164
Levenberg-Marquardt (LM) | TainhAxon | 1–4 | 1–5 | 1–20 | 1 | 4-4-3-1 | 0.861 | 0.167 | 0.145 | 0.843 | 0.173 | 0.146
Levenberg-Marquardt (LM) | LinerTanhAxon | 1–4 | 1–5 | 1–20 | 1 | 4-5-6-1 | 0.357 | 0.152 | 0.132 | 0.335 | 0.159 | 0.139

a Determination coefficient.
b Root mean square error.
c Mean absolute error.
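The training and testing columns reported in these tables rest on the preprocessing described in Section 2.3: normalization by Eq. (1) and a 60/15/25 partition of the 552 records. A minimal Python sketch of both steps is given below. It is an illustration only: the paper mentions bootstrap resampling, whereas this sketch uses a plain random shuffle without replacement, and the function names, rounding rule and seed are assumptions.

```python
import random

def normalize(values):
    """Eq. (1): x_norm = 0.5 * (x0 - mean) / (max - min) + 0.5."""
    mean = sum(values) / len(values)
    span = max(values) - min(values)
    if span == 0:
        raise ValueError("cannot normalize a constant variable")
    return [0.5 * (x - mean) / span + 0.5 for x in values]

def split_indices(n_samples, train_frac=0.60, cv_frac=0.15, seed=0):
    """Randomly partition sample indices into training, cross-validation and test sets.

    Assumption: subset sizes are rounded to the nearest integer and the
    test set receives whatever remains.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_frac * n_samples)
    n_cv = round(cv_frac * n_samples)
    train = indices[:n_train]
    cv = indices[n_train:n_train + n_cv]
    test = indices[n_train + n_cv:]
    return train, cv, test
```

With `n_samples = 552` this rounding gives subsets of 331, 83 and 138 records, matching the counts stated in the text. Each trait would be normalized separately, so that all inputs fall inside the [0, 1] interval.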

Fig. 4. Measured and predicted seed yield of ajowan in ANN model. (a) Scatter plot in the training stage of ANN. (b) Box plot in the training stage of ANN.


Fig. 5. Measured and predicted seed yield of ajowan in ANN model. (a) Scatter plot in the testing stage of ANN. (b) Box plot in the testing stage of ANN.

$$R^2 = \left[\frac{\sum_{i=1}^{n}(O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n}(O_i - \bar{O})^2\,\sum_{i=1}^{n}(P_i - \bar{P})^2}}\right]^2 \qquad (4)$$

where n is the number of data points, $O_i$ is the observed value, $P_i$ is the predicted value, and the bar denotes the mean of the variable.

2.4. Multiple linear regressions

To obtain a multiple regression model for seed yield from the independent variables, a stepwise MLR analysis was carried out with SAS software, using the same input and output data as for the MLP-ANN model (Table 3).
The independence of errors was confirmed by the Durbin-Watson test value (1.94). The lack of collinearity between independent variables in the model was tested using the variance inflation factor (VIF). All independent variables had VIF < 10, which indicates a lack of collinearity. The normality of the data was evaluated in SAS using the Kolmogorov-Smirnov test. Because the data were not normally distributed, a logarithm transformation was used to normalize them, and the normalized data were then used for the multiple linear regression analysis.

3. Results and discussion

3.1. Simple correlation coefficient

The simple correlation analysis showed that biological yield and shoot dry weight had the highest positive correlations with seed yield of Iranian ajowan ecotypes (Fig. 3). Number of secondary branches, number of umbels, and number of umbellets in an inflorescence also had significant positive correlations with seed yield at the 1% probability level (Fig. 3). The correlation between number of branches and seed yield was positive and significant at the 5% probability level (Fig. 3). Dalkani et al. (2012) used simple correlation analysis to find the interrelationships between seed yield and some agro-morphological traits in different Iranian ajowan populations and reported a positive and significant correlation between single-plant seed yield and biological yield (r = 0.86), and also between seed yield and aerial parts dry weight (r = 0.67). Ghanshyam et al. (2015) reported positive and significant genotypic and phenotypic correlations between number of


Fig. 6. Validation of the proposed ANN model to predict seed yield of Ajowan. (MAE = Mean absolute error; RMSE = Root mean square error).

Table 7
Stepwise regression analysis of seed yield (dependent variable) and other morphological traits (independent variables) of Iranian ajowan ecotypes.

Step | Variable entered | Variables in model | Partial R-square^a | R-square^b
1 | BY | BY | 0.321 | 0.321
2 | SDW | BY, SDW | 0.215 | 0.537
3 | NSB | BY, SDW, NSB | 0.173 | 0.710
4 | NUI | BY, SDW, NSB, NUI | 0.110 | 0.820
Durbin-Watson value = 1.943 | VIF for all variables (5 < VIF) | Tolerance for all variables (1 > TOL)

NSB: number of secondary branches, SDW: shoot dry weight, NUI: number of umbellets in an inflorescence, BY: biological yield, VIF: variance inflation factor, TOL: tolerance.
a Partial determination coefficient.
b Determination coefficient.

umbellets in an umbel and seed yield in Indian ajowan genotypes. Input variable selection is a critical part of both ANN and MLR. This step is particularly important for finding the optimal function in ANNs, because poor selection can harm the performance of ANNs during training and deployment (May et al., 2011). In this study, input variables were selected based on the MLR stepwise regression and their simple correlation with seed yield. According to the correlation coefficient analysis, seed yield of ajowan was strongly affected by the morphological characteristics number of secondary branches, shoot dry weight, number of umbellets in an inflorescence, and biological yield, so these variables were used to estimate the seed yield of ajowan in both the ANN and MLR models.

3.2. Seed yield prediction using MLP/ANN model

In the present study, Neuro-Solutions software was successfully applied to conduct the ANN analysis. The successful application of Neuro-Solutions software to ANN modeling has also been reported in previous studies (Mansourian et al., 2017; Mohammadi Torkashvand et al., 2017; Niazian et al., 2018). The efficiency of the ANN was assessed for the different learning algorithms and transfer functions. According to the results, the lowest RMSE and MAE and the highest R² values were achieved using the SigmoidAxon function, followed by the TainhAxon function, in both training and testing (Table 4), whereas the lowest accuracy of the ANN model was obtained with the LinearSigmoidAxon transfer function. Although various criteria have been used to assess the performance of ANNs with different hidden layers and transfer functions, MSE, RMSE, MAE and R² are the most widely used (Safa et al., 2015).
In the present study, different numbers of hidden layers were also applied to predict the seed yield of ajowan in the MLP/ANN model. The accuracy of the ANN model, assessed by R², RMSE and MAE for different numbers of hidden layers, showed that one hidden layer with four neurons was the best configuration (Table 5). One hidden layer led to RMSE = 0.147, MAE = 0.127 and R² = 0.932 in the training stage and RMSE = 0.154, MAE = 0.135 and R² = 0.911 in the testing stage of the ANN (Table 5). The number of hidden layers, which reflects the complexity of the MLP, and the type of transfer function between nodes are the two important factors that affect the performance of an MLP (Rosenblatt, 1961). One or two hidden layers are sufficient, and the most helpful, in most cases (Erzin et al., 2008). The complexity of the ANN structure, the total number of input and output units, the number of samples used in training, the extent of noise in the sample set, and the training algorithm are factors that can affect the number of hidden layers and hidden units in an ANN (Erzin et al., 2008; Movagharnejad and Nikzad, 2007; Naroui Rad et al., 2015; Panda et al., 2010; Zhang et al., 2011). The capability of an ANN model to produce high-level statistical data depends on the hidden layer (Naroui Rad et al., 2015).
The SigmoidAxon transfer function and one hidden layer were the best parameters of the ANN model for predicting seed yield of ajowan (Tables 4 and 5). The best topologies of the ANN model for the different learning algorithms and activation functions, in terms of the number of input layers, hidden layers, neurons in each hidden layer, and output layers, are summarized in Table 6. These results may reduce the time required to find the best ANN topology in future studies on medicinal plants of the Apiaceae family. According to the scatter plots and box plots, there was no significant difference between the measured and predicted SY data of the ANN model in either the training (Fig. 4a,b) or the testing datasets (Fig. 5a,b). The box plots showed the same median for measured and predicted data in both the training and testing stages, which indicates the power of the ANN in predicting ajowan seed yield. In the


Fig. 7. Measured and predicted seed yield of ajowan in MLR model. (a) Scatter plot in the training stage of MLR. (b) Box plot in the training stage of MLR.

present study, annual data obtained from another panel collected in a separate experiment (in the same years) were used to evaluate the performance of the proposed model on other data sets (validation of the ANN model). The high values of r² together with the low values of RMSE and MAE for the new data showed the repeatability of the model (Fig. 6). These results indicate a high similarity between the experimental data and the data predicted by the ANN model.

3.3. Seed yield prediction using MLR model

The respective contributions of the input variables (number of secondary branches, shoot dry weight, number of umbellets in an inflorescence, and biological yield) to the total seed yield variation of ajowan were determined using stepwise regression of the MLR model. The inputs and output of the ANN model were also used as the inputs and output of the regression model (Table 3). The regression model computed by SAS software was as follows:

$$SY = -4.01 + 0.28\,SDW + 0.07\,NUI + 0.18\,NSB + 0.36\,BY \qquad (5)$$

where SY is the seed yield, SDW is the shoot dry weight, NUI is the number of umbellets in an inflorescence, NSB is the number of secondary branches, and BY is the biological yield. The MLR formulation (Eq. (5)) shows the importance and effect of the independent variables on the dependent variable and reveals how the seed yield of ajowan changes with different values of SDW, NUI, NSB, and BY. The stepwise regression analysis showed that the highest partial R-square was related to BY, which entered the model in the first step (Table 7). Shoot dry weight, number of secondary branches, and number of umbellets in an inflorescence entered the regression model in the subsequent steps (Table 7). Bahmani et al. (2015) used stepwise regression to find the contribution of independent variables to the total variation of seed yield of fennel (Foeniculum vulgare L.) and reported that 55.41, 12.72, 2.21, and 11.63% of the total variance of seed yield was explained by dry biomass weight, number of inflorescences, days to 50% flowering, and days to 70% seed pasty, respectively.
Scatter and box plots were also used to compare observed and predicted seed yield values from the MLR model in both the training and testing datasets. The ability of the MLR model to predict SY in the training (r² = 0.81) and testing (r² = 0.79) stages is shown in Figs. 7 and 8. The performance of the MLR model (r² = 0.82) was considerably lower than that of the ANN model (R² = 0.93).
According to the scatter plots and box plots in the training (Fig. 7a,b) and


Fig. 8. Measured and predicted seed yield of ajowan in MLR model. (a) Scatter plot in the testing stage of MLR. (b) Box plot in the testing stage of MLR.
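The MLR predictions plotted in Figs. 7 and 8 come from the fitted regression in Eq. (5). As a minimal Python sketch, the equation can be evaluated as shown below. The coefficients are those reported in Eq. (5); the function name and the dictionary layout are assumptions, and the inputs are assumed to be on the same transformed and normalized scale as the data used for fitting, so raw field measurements should not be passed in directly.

```python
# Coefficients reported in Eq. (5); keys are the trait abbreviations used in the paper.
INTERCEPT = -4.01
COEFFICIENTS = {"SDW": 0.28, "NUI": 0.07, "NSB": 0.18, "BY": 0.36}

def predict_seed_yield(traits):
    """Evaluate Eq. (5) for one plant.

    `traits` maps each abbreviation in COEFFICIENTS to its value on the
    scale used when the model was fitted (an assumption of this sketch).
    """
    return INTERCEPT + sum(coef * traits[name] for name, coef in COEFFICIENTS.items())
```

Because BY carries the largest coefficient, a unit change in BY moves the predicted SY more than a unit change in any other input, which is in line with BY entering the stepwise model first (Table 7).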

testing stages (Fig. 8a,b) of the MLR model, the distribution and other statistical parameters of the predicted values differed more from the measured values than they did in the ANN model (Figs. 4 and 5). The graphical representation of measured versus predicted seed yield of ajowan obtained with the ANN and MLR models confirmed that the values predicted by the ANN model were closer to the measured values than those predicted by the MLR model (Fig. 9). The comparison between the experimental data and the data predicted by the ANN model indicated a similar trend in the two data sets (R² = 0.89) (Fig. 6) and a higher accuracy of the ANN model than the MLR model throughout the estimation process (Fig. 9). Nevertheless, other computational methods such as genetic algorithms can help to optimize the ANN model and reduce its uncertainties (Arab et al., 2016). Emamgholizadeh et al. (2015) used ANN and MLR models to predict the seed yield of sesame (Sesamum indicum L.) from five independent input variables and reported that the ANN model performed better than the MLR model. Mokarram and Bijanzadeh (2016) used an MLR model along with two ANN models, MLP and radial basis function (RBF), to predict biological yield (BY) and yield (Y) of barley (Hordeum vulgare L.) and reported that the MLP model led to the highest R² values for the prediction of both BY and Y in barley.

3.4. Sensitivity analysis of the governing variables on the seed yield

To find the most important input variables affecting the seed yield of ajowan, sensitivity tests were conducted with both the ANN and MLR models. The sensitivity tests showed that the highest RMSE (0.225) and MAE (0.182) and the lowest R² (0.543) were obtained with the ANN model without BY (Table 8). In the MLR model, too, the highest RMSE (0.247) and MAE (0.235) and the lowest R² (0.442) corresponded to the model without BY (Table 8). These results were fully consistent with the results of the stepwise regression (Table 7) and indicated that biological yield is the most important independent variable affecting the seed yield of the medicinal plant ajowan. Emamgholizadeh et al. (2015) used the sensitivity test in both ANN and MLR models to determine the importance of each input variable for the seed yield of sesame and reported capsule number per plant as the most important input variable, which significantly affected the RMSE, MAE and R² of the ANN and MLR models. In another study, an ANN was applied to predict the final fruit weight and select the most important variables in an Iranian population of melon (Cucumis melo L.), and flesh diameter was


Fig. 9. The graphical representation of measured versus predicted seed yield of ajowan by the artificial neural network (ANN), and multiple linear regressions (MLR) models.
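The sensitivity analysis of Section 3.4 (Table 8) removes one input at a time, refits the model, and compares the resulting errors. A minimal Python sketch of that leave-one-input-out procedure is shown below. It uses an ordinary least-squares fit as a stand-in for both models, which is an assumption; the paper refits the ANN and the MLR separately in its own software. All function names and the synthetic data in the usage note are hypothetical.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, targets):
    """Least-squares fit with an intercept, via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Intercept plus weighted sum of the inputs."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def sensitivity(rows, targets, names):
    """RMSE of the model refitted without each input in turn, keyed by the dropped input."""
    results = {}
    for drop, name in enumerate(names):
        reduced = [[x for j, x in enumerate(row) if j != drop] for row in rows]
        coefs = fit_ols(reduced, targets)
        results[name] = rmse(targets, [predict(coefs, r) for r in reduced])
    return results
```

The input whose removal gives the largest RMSE is read as the most influential one, which is how BY is identified in Table 8. For example, with synthetic targets built as `1 + 3*a + 0.1*b`, dropping `a` degrades the fit far more than dropping `b`.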

reported as the most important independent variable that can affect the final fruit weight of melon (Naroui Rad et al., 2015).

Table 8
Sensitivity analysis of the governing variables on seed yield.

Method | ANN R2 (a) | ANN RMSE (b) | ANN MAE (c) | MLR R2 (a) | MLR RMSE (b) | MLR MAE (c)
The best ANN/MLR (with NSB, SDW, NUI, BY as input) | 0.911 | 0.154 | 0.135 | 0.792 | 0.210 | 0.179
MLR/ANN without BY | 0.543 | 0.225 | 0.182 | 0.442 | 0.247 | 0.235
MLR/ANN without SDW | 0.605 | 0.208 | 0.174 | 0.510 | 0.225 | 0.219
MLR/ANN without NSB | 0.657 | 0.193 | 0.153 | 0.537 | 0.211 | 0.193
MLR/ANN without NUI | 0.763 | 0.180 | 0.140 | 0.664 | 0.193 | 0.185

ANN: artificial neural network; BY: biological yield; MLR: multiple linear regressions; NSB: number of secondary branches; NUI: number of umbellets in an inflorescence; SDW: shoot dry weight.
(a) Determination coefficient.
(b) Root mean square error.
(c) Mean absolute error.

4. Conclusion

Although there are different multivariate regression-based models to find the most important associated traits that greatly affect desired quantitative traits with low heritability in plants, these models are unable to interpret complex relationships between dependent and independent variables, especially when nonlinear relations exist. One of the powerful methods that can help plant breeders to overcome these shortcomings is ANN. In the present study, for the first time, an ANN model was applied to predict seed yield and also to find the most important yield components that affect the seed yield of the ajowan medicinal plant. According to different analyses, the MLP model with SigmoidAxon transfer function, Levenberg-Marquardt learning algorithm, and one hidden layer with four neurons was the best ANN model to predict the seed yield of ajowan. The prediction of the ANN model was better than that of the MLR model. Based on the results of ANN and stepwise regression, biological yield was the most important variable that greatly affects the seed yield of ajowan. The results of the present study can be used to improve the seed yield of other valuable medicinal plants of the Apiaceae family. Plant breeders can also use the optimized ANN model to model other quantitative and complicated characteristics of medicinal plants, such as essential oil content and secondary metabolites, that are more valuable for the pharmaceutical and food industries.

Acknowledgements

The authors are thankful to the Research Institute of Forests and Rangelands of Iran for procuring the seeds, and also grateful to Dr. M.H. Asare, the secretary of science and technological development staff of medicinal plants and traditional medicine of the Islamic Republic of Iran, for his kind support of the ajowan project.

References

Aghbashlo, M., Mobli, H., Rafiee, S., Madadlou, A., 2012. The use of artificial neural network to predict exergetic performance of spray drying process: a preliminary study. Comput. Electron. Agric. 88, 32–43.
Anonymous, 2011. Neurosolutions V5.07 for Excel, Neurodimension. http://www.neurosolutions.com.
Arab, M.M., Yadollahi, A., Shojaeiyan, A., Ahmadi, H., 2016. Artificial neural network genetic algorithm as powerful tool to predict and optimize in vitro proliferation mineral medium for G × N15 rootstock. Front. Plant Sci. 7, 1526.
Ashraf, M., Orooj, A., 2006. Salt stress effects on growth, ion accumulation and seed oil concentration in an arid zone traditional medicinal plant ajwain (Trachyspermum ammi [L.] Sprague). J. Arid Environ. 64, 209–220.
Azamathulla, H.M., Ghani, A.A., 2011. Genetic programming for predicting longitudinal dispersion coefficients in streams. Water Res. Manage. 25, 1537–1544.
Baharlouei, A., Omid, M., Ahmadi, H., Rafiee, S.H., 2010. Predicting moisture content of Pistachio nuts (Akbari variety) with artificial neural network. Iran. J. Food Sci. Tech. 3 (1), 45–56.
Bahmani, K., Izadi Darbandi, A., Ramshini, H.A., Moradi, N., Akbari, A., 2015. Agro-morphological and phytochemical diversity of various Iranian fennel landraces. Ind. Crops Prod. 77, 282–294.
Bhandari, M., Gupta, A., 1991. Variation and association analysis in coriander. Euphytica 58, 1–4.
Boskabady, M.H., Alitaneh, S., Alavinezhad, A., 2014. Carum copticum L.: a herbal medicine with various pharmacological effects. BioMed Res. Int. 11. http://dx.doi.org/10.1155/2014/569087.
Cary, N.C., 2004. The SAS System for Windows. Release 9. 1. SAS Inst, North Carolina. SAS Institute.
Chegini, G.R., Khazaei, J., Ghobadian, B., Goudarzi, A.M., 2008. Prediction of process and product parameters in an orange juice spray dryer using artificial neural networks. J. Food Eng. 84 (4), 534–543.
Chong, I.G., Jun, C.H., 2005. Performance of some variable selection methods when multicollinearity is present. Chemometr. Intell. Lab. Syst. 78 (1), 103–112.
Cosge, B., Ipek, A., Gorbouz, B., 2009. Some phenotypic selection criteria to improve seed yield and essential oil percentage of sweet fennel (Foeniculum vulgare Mill var.


Dulce). Tarim Bilimleri Dergisi 15 (2), 127–133.
Dalkani, M., Darvishzadeh, R., Hassani, A., 2011. Correlation and sequential path analysis in Ajowan (Carum copticum L.). J. Med. Plant Res. 5 (2), 211–216.
Dalkani, M., Hassani, A., Darvishzadeh, R., 2012. Determination of the genetic variation in Ajowan (Carum Copticum L.) populations using multivariate statistical techniques. Rev. Ciênc Agron. 43 (4), 698–705.
Emamgholizadeh, S., Bateni, S.M., Jeng, D.S., 2013a. Artificial intelligence-based estimation of flushing half-cone geometry. Eng. Appl. Artif. Intell. 26 (10), 2551–2558.
Emamgholizadeh, S., Kashi, H., Marofpoor, I., Zalaghi, E., 2013b. Prediction of water quality parameters of Karoon River (Iran) by artificial intelligence-based models. Int. J. Environ. Sci. Technol. 11 (3), 645–656.
Emamgholizadeh, S., Parsaeian, M., Baradaran, M., 2015. Seed yield prediction of sesame using artificial neural network. Eur. J. Agron. 68, 89–96.
Erzin, Y., Rao, H., Singh, D., 2008. Artificial neural network models for predicting soil thermal resistivity. Int. J. Ther. Sci. Algor. 47, 1347–1358.
Fraser, J., Eaton, G.W., 1983. Application of yield component analysis to crop research. Field Crop Res. 39, 787–797.
Ghanshyam, N., Dodiya, S., Sharma, S.P., Jain, H.K., Dashora, A., 2015. Assessment of genetic variability, correlation and path analysis for yield and its components in ajwain (Trachyspermum ammi L.). J. Spices Aromatic Crop 24 (1), 43–46.
Govindaraju, R.S., 2000a. Artificial neural networks in hydrology. I: preliminary concepts. J. Hydrol. Eng. 5 (2), 115–123.
Govindaraju, R.S., 2000b. Artificial neural networks in hydrology. II: hydrologic applications. J. Hydrol. Eng. 5 (2), 124–137.
Iquebal, M.A., Ansari, M.S., Sarika, S.P., Dixit, N.K., Aggarwal Verma, R.A.K., Jayakumar, S., Rai, A., Kumar, D., 2014. Locus minimization in breed prediction using artificial neural network approach. Stichting Int. Found. Anim. Gen. 45, 898–902.
Joshi, S.G., 2000. Medicinal Plants. Oxford and IBH Publishing Co. Pvt. Ltd., New Delhi, India 491 p.
Lal, R.K., 2007. Associations among agronomic traits and path analysis in fennel (Foeniculum vulgare Miller). J. Sustain. Agric. 30 (1), 21–229.
Mansouri, A., Fadavi, A., Mortazavian, S.M.M., 2016. An artificial intelligence approach for modeling volume and fresh weight of callus-A case study of cumin (Cuminum cyminum L.). J. Theor. Biol. 397, 199–205.
Mansourian, S., Darbandi, E.I., Mohassel, M.H.R., Rastgoo, M., Kanouni, H., 2017. Comparison of artificial neural networks and logistic regression as potential methods for predicting weed populations on dryland chickpea and winter wheat fields of Kurdistan province, Iran. Crop Prot. 93, 43–51.
May, R.J., Dandy, G.C., Maier, H.R., 2011. Review of Input Variable Selection Methods for Artificial Neural Networks. InTech, Rijeka, Croatia 19–44 pp.
Mishra, R., Gupta, A.K., Lal, R.K., Jhang, T., Banerjee, N., 2015. Genetic variability, analysis of genetic parameters, character associations and contribution for agronomical traits in turmeric (Curcuma longa L.). Ind. Crops Prod. 76, 204–208.
Mohammadi Torkashvand, A., Ahmadi, A., Layegh Nikravesh, N., 2017. Prediction of kiwifruit firmness using fruit mineral nutrient concentration by artificial neural network (ANN) and multiple linear regressions (MLR). J. Integr. Agric. 16, 60345–60347.
Mokarram, M., Bijanzadeh, E., 2016. Prediction of biological and grain yield of barley using multiple regression and artificial neural network models. Aust. J. Crop Sci. 10 (6), 895–903.
Moosavi, S. Gh., Seghatoleslami, M.J., Jouyban, Z., Ansarinia, E., Moosavi, S.A., 2015. Response morphological traits and yield of Ajowan (Carum copticum) to water deficit stress and nitrogen fertilizer. Biol. Forum. 7 (1), 293–299.
Movagharnejad, K., Nikzad, M., 2007. Modelling of tomato drying using artificial neural network. Comp. Elect. Agric. 59, 78–85.
Naroui Rad, M.R., Koohkan, S., Fanaei, H.R., Pahlavan Rad, M.R., 2015. Application of Artificial Neural Networks to predict the final fruit weight and random forest to select important variables in native population of melon (Cucumis melo L.). Sci. Hortic. 181, 108–112.
Niazian, M., Sadat Noori, S.A., Galuszka, P., Tohidfar, M., Mortazavian, S.M.M., 2017a. Genetic stability of regenerated plants via indirect somatic embryogenesis and indirect shoot regeneration of Carum copticum L. Ind. Crops Prod. 97, 330–3307.
Niazian, M., Sadat Noori, S.A., Tohidfar, M., Mortazavian, S.M.M., 2017b. Essential oil yield and agro-morphological traits in some Iranian ecotypes of Ajowan (Carum copticum L.). J. Essent Oil Bear Pl 20 (4), 1151–1156.
Niazian, M., Sadat-Noori, S.A., Abdipour, M., Tohidfar, M., Mortazavian, S.M.M., 2018. Image processing and artificial neural network-based models to measure and predict physical properties of embryogenic callus and number of somatic embryos in ajowan (Trachyspermum ammi (L.) Sprague). In Vitro Cell. Dev. Biol. Plant 54, 54–68.
Noori, S.A.S., Norouzi, M., Karimzadeh, G., Shirkool, K., Niazian, M., 2017. Effect of colchicine-induced polyploidy on morphological characteristics and essential oil composition of ajowan (Trachyspermum ammi L.). Plant Cell Tissue Organ Cult (PCTOC) 130, 543–551.
Panda, S., Ames, D., Panigrahi, S., 2010. Application of vegetation indices for agricultural crop yield prediction using neural network techniques. Remote Sens. 2, 673–696.
Quirk, T.J., 2016. Correlation and simple linear regression. Excel 2016 for Engineering Statistics. Springer International Publishing, pp. 111–155.
Rosenblatt, Frank X., 1961. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Cornell Aeronautical Lab Inc Buffalo NY. Spartan Books, Washington DC.
Safa, M., Samarasinghe, S., Nejat, M., 2015. Prediction of wheat production using artificial neural networks and investigating indirect factors affecting it case study in Canterbury Province, New Zealand. J. Agric. Sci. Tech. 17, 791–803.
Samadianfard, S., Nazemi, A.H., Ashraf Sadraddini, A., 2014. M5 model tree and gene expression programming based modeling of sandy soil water movement under surface drip irrigation. Agric. Sci. Dev. 3 (5), 178–190.
Shahinfar, S., Mehrabani-Yeganeh, H., Lucas, C., Kalhor, A., Kazemian, M., Weigel, K.A., 2012. Prediction of Breeding Values for Dairy Cattle Using Artificial Neural Networks and Neuro-Fuzzy Systems. Hindawi Publishing Corporation Comput. Math. Methods Med., pp. 1–9. http://dx.doi.org/10.1155/2012/127130.
Silva, G.N., Tomaz, R.S., Sant’Anna, I.C., Nascimento, M., Bhering, L., Cruz, C.D., 2014. Neural networks for predicting breeding values and genetic gains. Sci. Agricola. 71 (6), 494–498.
Tufail, M., Ormsbee, L.E., Teegavarapu, R., 2008. Artificial intelligence-based inductive models for prediction and classification of fecal coliform in surface waters. J. Environ. Eng. 134 (9), 789–799.
Zhang, H., Hu, H., Zhang, X., Zhu, L., Zheng, K., Jin, Q., Zeng, F., 2011. Estimation of rice neck blasts severity using spectral reflectance based on BP-neural network. Acta Physiol. Plant. 33, 2461–2466.
