Chapter II

REVIEW OF LITERATURE
A review of the available literature relevant to the scope of the study is
presented in this chapter with a view to surveying the various methodologies employed
by researchers. The chapter is divided into the following sections:

2.1 Studies related to Autoregressive Integrated Moving Average (ARIMA) Model

2.2 Studies related to Exponential Smoothing Technique

2.3 Studies related to Artificial Neural Network (ANN) Model

2.4 Studies related to Regression Techniques

2.5 Studies related to Comparison and Selection of Best Forecasting Model

2.6 Studies related to Spatial Interpolation Techniques

2.7 Studies related to Soil Micronutrients

2.1 Studies related to Autoregressive Integrated Moving Average (ARIMA) Model

Leuthold et al. (1970) forecasted daily hog prices and daily quantities
supplied using several alternative techniques. A distinction was made between
econometric and Box-Jenkins models: the former identified and measured both economic
and non-economic variables affecting price and quantity, while the latter identified
the stochastic components. The models were tested using Theil's ‘U’ coefficient, and
the authors concluded that the econometric models yielded slightly superior forecasts.
Finally, it was concluded that although better forecasts would be obtained from
econometric models, stochastic models were less prone to error and were less
expensive.

Nelson (1972) compared econometric (regression) and time-series
Autoregressive Moving Average (ARMA) methods for a longer time horizon. He
concluded that the simple ARMA models were relatively more robust with respect to
post sample predictions than the complex econometric models. If the mean square
error were an appropriate measure of loss, an unweighted assessment clearly
indicated that a decision maker would have been better off relying simply on the
ARIMA predictions in the post sample period.

Chatfield and Prothero (1973) observed that the Box-Jenkins procedure was
not suitable for sales forecasts with a multiplicative seasonal component. In this
analysis, monthly data on the sales of a company were used. The adequacy of the model
was tested using the Box-Pierce test.

Govindan (1974) used Box Jenkins model to analyze wholesale price indices
of rice, wheat, jowar and gram. The short term forecasts were found to give good
results while the same was not true of long term forecasts. Janus quotients of the
forecasts showed that the model gave good results.

Cooper (1975) concluded that autoregressive integrated moving average (ARIMA)
forecasting models could be constructed which predict economic variables about as
well as econometric models.

Prothero and Wallis (1976) examined the extent to which variations in a
series could be explained first by a dynamic econometric model and then by an ARIMA
model. The results clearly indicated that the econometric models provided a closer
estimate of the behaviour of the series during the sample periods.

Chatfield (1977) observed that the Box-Jenkins approach, being a valuable
addition to the forecasting tool bag, gave a deeper understanding of time series
behavior. Even though it was found to be more expensive, its accuracy justified the
cost.

Makridakis and Hibon (1979) revealed that the accuracy of forecasts was
negatively associated with the error term. Several tests to arrive at the accuracy of
forecasts, such as mean square error (MSE), Theil's ‘U’ coefficient and mean absolute
percentage error (MAPE), were suggested.
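
As a rough illustration of the accuracy measures named above, the following Python
sketch computes MSE, MAPE and one common form of Theil's U. The U1 form is an
assumption made here, since the text does not say which variant was used; the
function names and the sample numbers are hypothetical.

```python
import math

def mse(actual, forecast):
    """Mean square error: average of the squared forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def theil_u1(actual, forecast):
    """Theil's U1 (assumed variant): RMSE scaled by the root mean squares of both series."""
    n = len(actual)
    rmse = math.sqrt(mse(actual, forecast))
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    return rmse / (rms_actual + rms_forecast)

# Hypothetical observed values and forecasts, for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [102.0, 108.0, 123.0, 128.0]
print(mse(actual, forecast), mape(actual, forecast), theil_u1(actual, forecast))
```

Under this definition U1 lies between 0 and 1, with 0 indicating a perfect forecast.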

Chengappa (1980) applied the Box Jenkins model to forecast poor sale and
export auction prices of coffee. The ARIMA seasonal model was applied by using
monthly data due to the distinct seasonal variation in prices. The poor sale price
forecasts were found to be accurate when compared to forecast of export prices. This
was attributed to a possible lack of stationarity of the data. Hence adoption of
differencing procedure or a transformation to make the data stationary was found
necessary for a better estimate of export prices.
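
The differencing procedure mentioned above can be sketched in a few lines of Python.
The function name and the sample series are hypothetical and are not taken from the
study; with monthly data, a lag of 12 would give seasonal differencing.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series with a linear trend: its first differences are constant,
# which removes the trend that made the original series non-stationary.
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
print(difference(prices))         # [2.0, 2.0, 2.0, 2.0]
print(difference(prices, lag=2))  # [4.0, 4.0, 4.0]
```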

Hillmer and Tiao (1982) examined three of the ARIMA models commonly
fitted to economic time series and showed that certain restrictions must be placed on
the range of parameter values for a decomposition of the series into seasonal, trend
and noise components to exist.

Achoth (1985) analyzed the supply, price and trade of Indian tea by fitting
ARIMA models to data on prices and production. The moving average models were
found to be most suitable. Among the price series a particular month’s price was not
related to the price of the immediate previous month but significantly related to the
price of same month in previous years. However, the production in a particular month
was related both to production of the previous month as well as to the production of
same month in previous years. The forecasts yielded reasonably good results as
judged from the tests of their efficiency. The forecasts of prices were superior when
compared to the forecasts of quantities, which was attributed to the highly structured
pattern of price behaviour.

Devaiah et al. (1988) attempted forecasting the prices of cocoons at the
market by using ARIMA models. The forecasts were made for 13 months from April
1987 to April 1988. The forecasted values were observed to be close to the actual
prices.

Ray (1988) evaluated the performance of three methods, namely Box-Jenkins,
Bilinear and Threshold Autoregression, on the basis of ten Indian economic
time series and concluded that the Box-Jenkins method had outperformed the Threshold
Autoregression method in 79 cases out of 120.

Dorfman and McIntosh (1990) suggested that ARIMA modeling was a
parsimonious approach which could represent both stationary and non-stationary
stochastic processes. The objective was to build an Autoregressive Integrated Moving
Average (ARIMA) model which adequately represented the data generating process.

Yin-Runsheng and Mins-Rs (1999) forecasted timber prices with univariate
Autoregressive Integrated Moving Average (ARIMA) models by using quarterly
price series from Timber Market. It was suggested that forecasting future prices could
aid timber producers and consumers.

Mastny (2001) demonstrated the possible usage of the Box-Jenkins
methodology for the analysis of time series for agricultural commodities. The paper
illustrated price development forecast for a selected agricultural commodity.

Du Preez and Witt (2003) made an empirical investigation into forecasting
the tourism demand from four European countries. They concluded that, taking all of
the empirical results into account, the moving average ARIMA models seemed to be
the best if a specific choice of forecasting model had to be made.

Yannis (2003) discussed three modeling techniques that apply to multiple
time series data corresponding to different spatial locations (spatial time series).
The first two methods, namely the Space-Time ARIMA (STARIMA) and the Bayesian
Vector Autoregressive (BVAR) model with spatial priors, apply when interest lies in
the spatio-temporal evolution of a single variable. It was found that STARIMA was
better suited to applications of large spatial and temporal dimension, whereas the
BVAR could be realistically performed when the number of locations in the study was
rather small. Next, the author considered models that aim to describe relationships
between variables with a spatio-temporal reference and discussed the general class of
dynamic space-time models in the framework presented by Elhorst (2001). Each model
class was introduced through a motivating application.

Dooley and Lenihan (2005) analyzed the ability of two time series
forecasting techniques to predict future lead and zinc prices. With regard to zinc,
there was no conclusive evidence to suggest that one model was superior to the other
in its forecasting ability. In the case of lead, however, it was found that forecasting
cash prices using ARIMA modeling led to results that had superior forecasting power
in five out of eight cases, and hence it was difficult to offer an outright conclusion
regarding which forecasting method developed in the paper produced a better result.
However, on balance one would have to acknowledge that, out of a total of sixteen
cases, in nine of these ARIMA price forecasts provided a superior result to lagged
forward price models.

Gangadharappa (2005) fitted ARIMA models to study the variation in
arrivals and prices of potato in the Bangalore, Belgaum, Kolar, Hassan and Hubli
markets of Karnataka during 1996-97 to 2003-04. The Box-Jenkins method was applied to
the monthly data for precise forecasting of arrivals and prices of potato in all the
selected markets. Of the ten series, he found only two that yielded a significant
Box-Pierce ‘Q’ statistic and a minimum AIC.

Batchelor et al. (2007) tested the performance of popular time series models
in predicting spot and forward rates on major seaborne freight routes. They
investigated the performance of alternative univariate and bivariate linear
time-series models in generating short-term forecasts of spot freight rates in the
international dry bulk shipping market and found that, in predicting forward rates,
ARIMA and VAR models forecasted better.

Bharathi (2009) used the Box-Jenkins ARIMA model to forecast the monthly
arrivals and prices of mulberry silk cocoons in the two markets.

Chandrakala (2009) analysed spatial and temporal behavior of arrivals and
prices of groundnut in Karnataka. An ARIMA model was employed to forecast the
arrivals and prices of groundnut in selected markets. Among the five markets
(Challakere, Chitradurga, Bellary, Yadagir and Davangere), the Bellary
market yielded the best results.

Valipour (2012) forecasted the inflow to the Dez dam reservoir using ARMA
and ARIMA models, increasing the number of parameters to four in order to improve
forecast accuracy and comparing the resulting models. In the ARMA and ARIMA
models, the polynomial was derived with four and six parameters respectively to
forecast the inflow. By comparing the root mean square errors of the models, it was
determined that the ARIMA model could forecast inflow to the Dez reservoir from 12
months ago with lower error than the ARMA model.

Kumari et al. (2014) studied rice yield prediction in India using the
autoregressive integrated moving average approach. Of eleven different ARIMA
models, the ARIMA(1,1,1) model was found to be the best for their study.

2.2 Studies related to Exponential Smoothing Technique

Belov and Chepurnoi (1985) examined promising chopper mechanisms
for forage harvesters. Design features of the chopping mechanisms of forage
harvesters were briefly evaluated and future manufacturing trends were analysed by
means of time series analysis and exponential smoothing. Harvesters having cylinder
choppers and blowers constituted more than 50 percent of the existing population and
it was predicted that this trend would continue. The proportion of harvesters having
disc chopper-forwarders would reach 10-12 percent in the coming 5 years.

Gardner (1985) critically reviewed the exponential smoothing technique, which
was originally given by Brown (1963). Gardner developed state-of-the-art guidelines
for exponential smoothing methodology applied to data with different behavior.

Deluyker et al. (1987) modelled daily milk yield in Holstein
cows using time series analysis. Time series analysis of milk yields of cows milked 3
times daily was carried out on 513 partial or complete lactation yield records. It was
found that the exponential smoothing function was most appropriate for the modeling
of individual milking and daily yield data. Model parameters were influenced by
parity, stage of lactation, occurrence of missed milkings and treatment for diseases.
An examination of the residual variances showed that the model to forecast daily total
yield performed as well as the model to forecast individual-milking yield.

Sisak (1989) worked on the principle of adaptive models of time series with
regard to short term forecasting and the possibilities for application to cost planning.
An adaptive model for the exponential smoothing of time series data was used to
determine short term forecasts in the development of production costs for a farm
forestry enterprise in Czechoslovakia. The results obtained were compared to those
derived from an extrapolation of regression estimates. It was found that the adaptive
model provided better quality forecasts for farm planning/budgeting than the
regression model.

Manurung et al. (1991) forecasted oil palm hectarage and
the need for seed in the second long term development plan. Forecasts of oil palm
hectarage in Indonesia over the period of the second long term development plan
(1994-2018) were made using the double exponential smoothing method. The
average forecast growth rate was 3.24 percent per year.

Sheldon (1993) examined issues relating to the measurement and forecasting
of international tourist expenditures and arrivals. The study showed that the two
series fluctuated differently, and examined the accuracy of six different forecasting
techniques (time series and econometric causal models) in forecasting tourism
expenditures. The results showed that the accuracy of the forecasts differed depending
on the country being forecast, but that the no-change model and Brown's double
exponential smoothing were, overall, the two most accurate methods for forecasting
international tourism expenditures.
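
For readers unfamiliar with Brown's double exponential smoothing, a minimal Python
sketch of the textbook recursion is given below. It is not taken from the study: the
function name, the initialisation of both smoothed series with the first observation,
and the sample data are assumptions made for illustration.

```python
def brown_double_exponential_forecast(series, alpha, horizon=1):
    """Forecast `horizon` steps ahead with Brown's double exponential smoothing."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Assumed initialisation: both smoothed series start at the first observation.
    s1 = s2 = series[0]
    for y in series[1:]:
        s1 = alpha * y + (1 - alpha) * s1    # singly smoothed series
        s2 = alpha * s1 + (1 - alpha) * s2   # doubly smoothed series
    level = 2 * s1 - s2                      # estimated current level
    trend = alpha / (1 - alpha) * (s1 - s2)  # estimated slope per period
    return level + horizon * trend

# Hypothetical linear series y = 2t + 5: the forecast approaches the true line.
linear = [2.0 * t + 5.0 for t in range(60)]
print(brown_double_exponential_forecast(linear, alpha=0.5, horizon=3))
```

The second smoothing pass is what allows the method to follow a linear trend, which
single exponential smoothing would lag behind.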

Lim and McAleer (2001) worked on forecasting tourist arrivals. Various
exponential smoothing models were estimated over the period 1975-99 to forecast
quarterly tourist arrivals to Australia from Hong Kong, Malaysia, and Singapore. The
root mean squared error criterion was used as a measure of forecast accuracy. Prior to
obtaining the one-quarter-ahead forecasts for the period 1998-2000, the individual
arrival series were tested for unit roots to distinguish between stationary and
non-stationary series. The Holt-Winters Additive and Multiplicative
Seasonal models outperformed the Single, Double, and Holt-Winters Non-Seasonal
Exponential Smoothing models in forecasting. It was also found that forecasting the
first differences of tourist arrivals performed worse than forecasting their levels.

Kumar et al. (2005) forecasted the prices of different classes of teak
by applying an exponential smoothing model. A single-parameter exponential
smoothing model was used to forecast prices of different classes of teak in the
Dandeli timber depot of Karnataka, India. Prices data for the period May 1987-May
2001 were used, and both ex-post and ex-ante forecasts were made. The results of the
ex-post forecast revealed that the predicted prices were close to the actual prices.

Gajendra and Bhogal (2006) analyzed the food security situation in the state
of Gujarat, India. The study considered the exponential smoothing and moving
averages for making projections of food grains. The estimates of food grain
production and requirement indicated that the overall cereals and pulses requirement
would continue to be in deficit in both periods.

Huertas and Rodriguez (2007) studied the forecasting of international
tourist demand using the Holt-Winters method. The study examined forecasts of
international tourism
arrivals to Spain. A survey was conducted on residents from 10 other major origin
countries with respect to their future visits to Spain. The Holt-Winters exponential
smoothing model was used to forecast the residents' demand for tourism in Spain by
2007-08.

Hyndman et al. (2008) discussed the admissible parameter space for some
state space models, including the models that underlie exponential smoothing
methods. They found that the usual parameter restrictions (requiring all smoothing
parameters to lie between 0 and 1) do not always lead to stable models, and that all
seasonal exponential smoothing methods are unstable as the underlying state space
models are neither reachable nor observable. This instability did not affect the
forecasts, but did corrupt the state estimates. The problem could be overcome with a
simple normalizing procedure. They also showed that the admissible parameter
space of a seasonal exponential smoothing model was much larger than that for a
basic structural model, leading to better forecasts from the exponential smoothing
model when there was a rapidly changing seasonal pattern.

Andrawis and Atiya (2009) proposed a Bayesian forecasting approach for
Holt's additive exponential smoothing method. Starting from the state space
formulation, a formula for the forecast was derived and reduced to a two-dimensional
integration that can be computed numerically in a straightforward way. In contrast to
much of the work for exponential smoothing, this method produced the forecast
density and, in addition, it considered the initial level and initial trend as part of the
parameters to be evaluated. They also derived a way to reduce the computation of the
maximum likelihood parameter estimation procedure to that of evaluating a two-
dimensional grid, rather than applying a five-variable optimization procedure.
Simulation experiments confirmed that both proposed methods give favorable
performance compared to other approaches.

Taylor (2011) evaluated a recently proposed seasonal exponential smoothing
method that had previously been considered only for forecasting daily supermarket
sales. He termed this method ‘total and split’ exponential smoothing, and applied it to
monthly sales data from a publishing company. The resulting forecasts were
compared against a variety of methods, including several available in the software
currently used by the company. The results showed total and split exponential
smoothing outperforming the other methods considered. The results were also
impressive for a method that trims outliers and then applies simple exponential
smoothing.

Eva and Oskar (2012) described a relatively simple and versatile technique
for forecasting time series data, i.e. simple exponential smoothing (SES). The
procedure gave the heaviest weight to the most recent observations and smaller
weights to observations in the more distant past. The accuracy of the SES method
strongly depended on the optimal value of the smoothing constant α. To determine the
optimal α value, the paper used a traditional optimization method based on the lowest
mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square
error (RMSE).
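
The idea of choosing the smoothing constant by minimising a forecast error measure can
be sketched as follows. This is an illustrative grid search rather than the procedure
of the paper: the function names, the grid step, the choice of the first observation
as the initial forecast, and the sample series are all assumptions.

```python
import math

def ses_one_step_forecasts(series, alpha):
    """One-step-ahead SES forecasts; forecasts[t] is the prediction of series[t]."""
    forecasts = [series[0]]  # assumed initialisation: first forecast = first value
    for y in series[:-1]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts

def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def best_alpha(series, metric=rmse, step=0.01):
    """Grid-search the smoothing constant (alpha) that minimises the chosen error metric."""
    candidates = [round(k * step, 10) for k in range(1, int(round(1 / step)))]

    def score(alpha):
        forecasts = ses_one_step_forecasts(series, alpha)
        # Skip the first point, whose forecast is only the initialisation.
        return metric(series[1:], forecasts[1:])

    return min(candidates, key=score)

# Hypothetical trending series: a large alpha tracks the trend most closely.
trend = [float(t) for t in range(1, 21)]
print(best_alpha(trend), best_alpha(trend, metric=mae))
```

A grid search is used here only because it is easy to follow; any one-dimensional
optimiser over the interval (0, 1) would serve the same purpose.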

Bermúdez (2013) analysed an extension of the exponential smoothing
formulation that allowed the use of covariates and the joint estimation of all the
unknowns in the model, which improved the forecasting results. The whole procedure
was detailed with a real example on forecasting the daily demand for electricity in
Spain. The time series of daily electricity demand contained two seasonal patterns:
here the within-week seasonal cycle was modelled as usual in exponential smoothing,
while the within-year cycle was modelled using covariates, specifically two harmonic
explanatory variables. Calendar effects, such as national and local holidays and
vacation periods, were also introduced using covariates.

Xiaochen (2013) compared the prediction accuracy of the Brown
exponential smoothing method and the Holt exponential smoothing method for
forecasting transportation demand and found that the Brown exponential smoothing
method gave larger deviations and poorer accuracy of the predicted values than the
Holt exponential smoothing method.

2.3 Studies related to Artificial Neural Network (ANN) Model

McCulloch and Pitts (1943) developed the first computing machines that
were intended to simulate the structure of the biological nervous system and could
perform
logic functions, which were used to transmit information from one neuron to another.
This eventually led to the development of binary probit model. According to this
model, the neural unit could either switch on or off depending on whether the
function was activated or not.
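
The on/off behaviour described above can be illustrated with a simplified threshold
unit in Python. This is a sketch in the spirit of the McCulloch-Pitts neuron rather
than a reproduction of the original formulation; the function names, weights and
thresholds are chosen here for illustration.

```python
def threshold_unit(inputs, weights, threshold):
    """Binary unit: returns 1 (on) if the weighted input sum reaches the threshold, else 0 (off)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def logical_and(a, b):
    # Both inputs must be on for the sum to reach the threshold of 2.
    return threshold_unit([a, b], [1, 1], threshold=2)

def logical_or(a, b):
    # A single active input is enough to reach the threshold of 1.
    return threshold_unit([a, b], [1, 1], threshold=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b))
```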

Werbos (1974) published the back propagation learning method. The back
propagation technique enabled the determination of parameter values for which the
error was minimized. Prior to the introduction of back-propagation method, it was
difficult to determine multiple parameter values.

Kohonen (1982) introduced the Self-Organising Map (SOM). SOM used an
unsupervised learning algorithm for applications specifically in data mining, image
processing and visualization. The same year Hopfield built a bridge between neural
computing and physics. Two years later the Boltzmann machine was invented. This
neural network utilized a stochastic learning algorithm based on properties of the
Boltzmann distribution.

Iebeling and Milton (1996) provided a practical introductory guide to the
design of a neural network for forecasting economic time series data. An eight-step
procedure to design a neural network forecasting model was explained including a
discussion of tradeoffs in parameter selection, some common pitfalls, and points of
disagreement among practitioners.

Anders (1999) discussed the application of statistical procedures in the selection
of neural network models. The application of these methods in neural network models
was discussed, paying attention especially to the identification problems encountered.
Five specification strategies based on different statistical procedures were proposed
and compared in a simulation study. It was suggested that statistical
analysis should become an integral part of neural network modelling.

Qi (2001) examined the relevance of various financial and economic
indicators in predicting US recessions via neural network models. He employed a
novel neural network (NN) to model the relationship between the leading indicators
and the probability of a future recession. The out-of-sample results showed that the
NN model was useful in predicting US recessions.

Rivals and Personnaz (2003) proposed a novel and systematic construction
and selection procedure for neural network modeling and illustrated its efficiency
through large-scale simulation experiments and real-world modeling problems.

Rossi and Conan-Guez (2005) studied a natural extension of multi-layer
perceptrons (MLP) to functional inputs and showed that fundamental results for
classical MLP can be extended to functional MLP. They obtained universal
approximation results showing that the expressive power of functional MLP is
comparable to that of numerical MLP, and also showed that estimation of optimal
parameters for functional MLP is statistically well defined. They finally showed on
simulated and real world data that the proposed model performed in a very
satisfactory way.

Kumar and Walia (2006) carried out work on the application of Artificial Neural
Networks in finance for cash forecasting. They presented two neural network models
for cash forecasting of a bank branch. One was a daily model, which took the parameter
values for a day as input to forecast the cash requirement for the next day, and the
other was a weekly model, which took the withdrawal-affecting input patterns of a week
to predict the cash requirement for the next week. The system performed better than
other cash forecasting systems.

Manfred (2006) viewed network learning as an optimization problem,
reviewed two alternative approaches to network learning, and provided insights into
current best practice for optimizing model complexity so as to perform well on
generalization tasks.

Zhang (2007) aimed to: 1) point out common
pitfalls and misuses in neural network research; 2) draw attention to relevant
literature on important issues; and 3) suggest possible remedies and guidelines for
practical applications. The main message was that great care must be taken in using
ANNs for research and data analysis.

Halbert (2008) reviewed concepts and analytical results from the literatures
of mathematical statistics, econometrics, systems identification, and optimization
theory relevant to the analysis of learning in artificial neural networks. He focused
primarily on learning procedures for feed forward networks and suggested some
potentially useful new training methods for artificial neural networks.

Christos and Anastasios (2009) developed a formula-based method for the
prediction of students’ mood, and tested it using data from experiments
conducted with 153 high school students from three different regions of a European
country. The same set of data was analyzed by developing a neural network method.
Furthermore, the formula-based method was used as an input parameter selection
module for the neural network method. The results indicated that neural networks and
conventional algorithmic methods should not be in competition but should complement
each other in the development of affect recognition systems.

Sang (2009) proposed increasing the number of output nodes for each class
to improve the performance of MLPs. Simulations of 50 isolated-word
recognition showed the effectiveness of the proposed method.

Sreekanth et al. (2009) studied the performance of an artificial neural
network (ANN) model, i.e. a standard feed-forward neural network trained with the
Levenberg–Marquardt algorithm. This technique was examined for forecasting
groundwater level at Maheshwaram watershed, Hyderabad, India. The model
efficiency and accuracy were measured based on the root mean square error (RMSE).
The model provided the best fit and the predicted trend followed the observed data
closely. Thus, for precise and accurate groundwater level forecasting, ANN appeared
to be a promising tool.

Hippert and Taylor (2010) noted that neural networks (NNs) had frequently
been proposed for short-term load forecasting (STLF), because of their capabilities
for nonlinear modeling of large multivariate datasets. The family of NN models
known as multilayer perceptrons (MLPs) were probably the most frequently used,
since they had been shown to be universal approximators of functions and could be
used to model the function that relates the electric load to its exogenous variables.

Sang (2010) discussed the design of Multilayer Perceptrons (MLP), especially
for pattern classification problems. This discussion included how to decide the
number of nodes in each layer, how to initialize the weights of MLPs, how to train
MLPs with various error functions, the imbalanced data problem, and deep
architectures.

Karlaftis and Vlahogianni (2011) discussed differences and similarities
between two approaches: statistical methods and neural networks. They
reviewed the relevant literature and attempted to provide a set of insights for
selecting the appropriate approach.

Hossein (2012) studied the abilities of two prediction models in psychological
research: Logistic Regression (LR) versus Artificial Neural Networks (ANNs).
Four hundred fifty six students were chosen randomly from one of the educational
areas in Tehran (Iran). Eighteen psychological traits and five levels of adjustment
were considered as predictor and predicted variables, respectively. According to the
first assessment, the ANNs were more successful than LR. When the number of
adjustment levels was reduced from five to three, this superiority of ANNs changed in
favour of LR. So there were two definitions of the power of prediction: one refers to
the correctness and the other to the accuracy of prediction.

Sang (2012) proposed a new error function, in order to improve the error
back-propagation algorithm for the classification of imbalanced data sets. This
method was compared with the two-phase, threshold-moving, and target node
methods through simulations in a mammography data set and the proposed method
attained the best results.

Rozman (2012) developed a hybrid model based on image analysis and a
neural network. From the end of fruit thinning in June until harvesting, digital
images of 120 trees of yellow-skin ‘Golden Delicious’ (four times) and 120 trees of
red-skin ‘Braeburn’ (five times) were captured in intensive orchards. Firstly, each
image was processed by an image analysis algorithm to obtain data on the number of
fruits and a yield forecast, for each sampling period separately, which served as the
input information for modeling the yield with the artificial neural network (ANN). The
forecast of the hybrid method showed a higher accuracy than the image analysis for
both varieties, since the new procedure managed to increase the correlation between
the forecasted and weighed yield from 0.73 to 0.83 for ‘Golden Delicious’ and from
0.51 to 0.78 for ‘Braeburn’. The standard deviation/image was decreased from 4.79 to
2.83 kg for ‘Golden Delicious’ and from 3.64 to 2.55 kg for ‘Braeburn’.

Sharma et al. (2012) elaborated on the Artificial Neural Network (ANN), its
various characteristics and business applications. They also addressed the questions
“what are neural networks” and “Why they are so important in today’s Artificial
intelligence?”, noting that numerous advances have been made in developing
intelligent systems, some inspired by biological neural networks. They mentioned the
exciting features and other applications of ANN which can play an important role in
today’s computer science field. Some limitations were also mentioned in the article.

Kumari et al. (2013) developed a model to forecast the productivity and pod
damage by Helicoverpa armigera using an artificial neural network model in pigeonpea
(Cajanus cajan). Sigmoid and linear functions were used as activation functions for
the hidden and output nodes respectively. Using a random data division process, the
data were divided into three mutually exclusive sets (training, validation and
testing), with 70% of the data in the training set, 15% in the validation set and 15%
in the testing set. The performance of the proposed network when trained with the
Levenberg-Marquardt back propagation algorithm was assessed by its Root Mean Squared
Error (RMSE) values along with multiple correlation coefficients (R) between observed
and predicted outputs.
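
The random 70/15/15 data division and the two evaluation measures can be sketched in
Python as below. The function names, the fixed random seed and the sample data are
hypothetical, and the correlation shown is the ordinary Pearson correlation between
observed and predicted values, which is assumed here to correspond to the R reported
in the study.

```python
import math
import random

def split_70_15_15(records, seed=0):
    """Randomly divide records into mutually exclusive training, validation and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, about 15%
    return train, validation, test

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted outputs."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def correlation(observed, predicted):
    """Pearson correlation between observed and predicted outputs."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

train, validation, test = split_70_15_15(range(100))
print(len(train), len(validation), len(test))  # 70 15 15
```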

Rudra (2013) presented an application of Artificial Neural Network (ANN)
to forecast inflation in India during the period 1994-2009. The study presented four
different ANN models on the basis of inflation (WPI), economic growth (IIP), and
money supply (MS). The first model was a univariate model based on past WPI only.
The other three were multivariate models based on WPI and IIP; WPI and MS; and WPI,
IIP and MS. In each case, the forecasting performance was measured by mean
squared errors and mean absolute deviations. The paper finally concluded that the
multivariate models showed better forecasting performance than the univariate model.
In particular, the multivariate ANN model using WPI, IIP, and MS resulted in better
performance than the other models in forecasting inflation in India.

Kumari et al. (2014) presented time series forecasting of losses due to pod
borer, pod fly and productivity of pigeonpea (Cajanus cajan) for North West Plain
Zone (NWPZ) by using artificial neural network (ANN). The performance of the
proposed network when trained with Levenberg-Marquardt back propagation
algorithm was assessed by its Root Mean Squared Error (RMSE) values along with
multiple correlation coefficients (R) between observed and predicted outputs.

2.4 Studies related to Regression Techniques

Fisher (1924) was the first to tackle the pre-harvest forecasting problem and
assumed that the effect of change in weather variables in successive weeks would not
be an abrupt or erratic change but an orderly one that follows some mathematical law.
Fifth-degree polynomials were fitted to the rainfall distribution to obtain the
rainfall distribution constants. A multiple regression equation was developed using crop yield as
dependent variable and rainfall distribution constants as independent variables. It was
found that wheat crop yield was significantly affected by rainfall.

Davis and Harrell (1942) fitted third degree polynomials to study the effect of
rainfall and average maximum temperature on corn yield at various locations from the
Great Plains to the Atlantic coast. It was found that a systematic change occurred in the
pattern of weather-yield relationships from one end of the region to the other.

Sanderson (1943) applied Fisher’s technique in conjunction with other
variables for predicting the yield of a perennial grape crop. It was found that above
average temperature was beneficial to grape yields in the Canton of Geneva throughout
the growing season.

Kokate et al. (2000) developed a statistical model for forecasting the yield of
rice in Ratnagiri district of Maharashtra. They applied correlation analysis to assess
the association of plant characters and climatological parameters with the yield, and
a step down regression analysis technique to identify the dominant variables in the
study. It was found that integrating plant characteristics and climatological factors
together in the model provided better estimates for forecasting rice yield.

Agrawal et al. (2001) developed a forecasting model for wheat in the
Vindhyanchal Plateau zone of Madhya Pradesh. For developing forecasting models
for rice, the two zones viz. Chattisgarh Plain and Baster Plateau were grouped
together. Time series data on weather variables and agricultural inputs were used and
Multiple Linear Regression Analysis was applied for developing models. It was
reported that reliable yield forecasts could be obtained when both the crops were 12
weeks old, i.e. about 2 months before harvest.

Sarmah and Handique (2001) developed Linear Regression Models for the
pre-harvest forecasting of winter rice yield in Jorhat district of Assam, India. It was
found that 69.10 per cent of the total variability in rice yield was due to weather
variables. The deviation of forecasted yield from observed yield ranged
from 0.06 to 21.36 per cent. The best time for rice yield forecasting was observed in
the 43rd standard meteorological week.

Kandiannan et al. (2002) developed a crop-weather model for prediction of
turmeric yield in Coimbatore district of Tamil Nadu. Significance of correlation
coefficient between the monthly climatic variables and turmeric yield was tested. The
developed Multiple Regression Model gave a reliable forecast of the dry turmeric
yield with a coefficient of determination (R2) value of 89 per cent.

Kandiannan et al. (2002) developed a crop-weather model for prediction of
rice yield in Coimbatore, Tamil Nadu. It was found that the model without solar
radiation recorded a lower R2 (0.63) than the model that included solar
radiation (R2 = 0.95). Stepwise Regression Analysis was applied to
seven weather variables, and four weather variables were retained in the final model with an
R2 value of 92 per cent.

Sharma et al. (2004) developed agrometeorological models based on weather
parameters for forecasting wheat yield in 6 major districts of Himachal Pradesh. It
was observed that rainfall significantly affected the wheat yield which was decreased
by 6.8-32.3 per cent compared to previous years, mainly due to moisture stress
caused by delayed and insufficient rainfall.

Nain et al. (2004) developed a methodology for large area yield forecasting using a
crop simulation model and a discrete technology trend, which was applied to the
coherent wheat yield variability zones of eastern Uttar Pradesh. The regression
coefficients were generated using 10 years’ data (1984/85–1994/95) and the
reliability of the approach was tested on a data set of 5 years’ independent data
(1995-96 to 1999-2000). The results showed that this approach could capture year to
year variability in large area wheat yield with reasonable accuracy. The Root Mean
Square Error (RMSE) between observed and predicted yield was reported as 0.098
t/ha for the mean yield of 2.072 t/ha (4.72%). The pre-harvest forecasts were made
using in-season weather data up to the end of February and climatic-normal for the
rest of the wheat-growing season, which showed good agreement with observed
wheat yields.

Kumar and Bhar (2005) used Multiple Linear Regression Model for
forecasting yield of Indian mustard (Brassica juncea L.) at Hisar district of Haryana.
They developed models for each growing phase of mustard. It was reported that the
earliest reliable forecast could be made 6 weeks after sowing and the latest
forecast could be made 4-5 weeks before harvesting.

Lobell (2006) developed weather-based forecasting models for state-wise
yields of twelve major crops in California. It was found that the most successfully
modeled crop was almonds, with 81% of yield variance captured by the forecast. It
was also found that predictions for most crops relied on weather measurements well
before harvest time.

Bhattacharya A. (2006) developed a pre-harvest forecast of sugarcane yield.
The forecast was based on plant biometrical characteristics such as plant height, girth
of cane, number of canes per plot and width of third leaf from the top. An
alternative approach, i.e. the goal programming approach, was proposed. To assess the
quality of forecasts, the variance of residuals obtained from the proposed method was
compared with that obtained from conventional regression analysis. The
study revealed that there was no significant difference (P value = 0.43461) in the
variances of the two residual series. Thus, without compromising the quality of
forecast, the proposed alternative methodology could be adopted to estimate the
sugarcane yield 3 months before harvest in situations where the assumptions of
conventional regression analysis were violated.

Agrawal and Mehta (2007) developed several weather based forecasting
models for crop yield of rice, wheat, sorghum, maize and sugarcane at selected
districts/agro climatic zones/states of India using regression analysis, discriminant
function analysis and water balance technique. It was reported that reliable forecast of
crop yield could be provided before harvest. They also developed models for
forewarning of important pests/diseases in rice, mustard, pigeon pea, sugarcane,
groundnut, mango, potato and cotton using regression analysis and Artificial Neural
Network technique. It was found that reliable forewarnings of important
pests/diseases could be achieved at least one week in advance.

Mallick et al. (2007) developed regression models (linear, exponential and
power regression) to forecast rice yield in Punjab, India. It was found that for the
modified linear, exponential and power models the multiple
correlation coefficients were 0.86, 0.89 and 0.92, respectively, which showed that the power
regression model predicted yield more accurately than the linear and exponential
models.
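
A minimal sketch of how linear, exponential and power regressions can be fitted and compared through the correlation between observed and fitted values is given below. The data are synthetic and the single predictor `x` is an assumption; the predictors and estimation procedure of the cited study are not stated here, so this only illustrates the three functional forms. The exponential and power forms are fitted by ordinary least squares after a log transformation, which requires positive values.

```python
import numpy as np

def fit_linear(x, y):
    """y = a + b*x, fitted by ordinary least squares."""
    b, a = np.polyfit(x, y, 1)
    return lambda t: a + b * np.asarray(t, float)

def fit_exponential(x, y):
    """y = a*exp(b*x), fitted as ln(y) = ln(a) + b*x (needs y > 0)."""
    b, log_a = np.polyfit(x, np.log(y), 1)
    return lambda t: np.exp(log_a) * np.exp(b * np.asarray(t, float))

def fit_power(x, y):
    """y = a*x**b, fitted as ln(y) = ln(a) + b*ln(x) (needs x > 0, y > 0)."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return lambda t: np.exp(log_a) * np.asarray(t, float) ** b

def corr(observed, fitted):
    """Correlation coefficient between observed and fitted values."""
    return float(np.corrcoef(observed, fitted)[0, 1])

# Synthetic, illustrative data only (not taken from the cited study).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2.0 * x ** 1.5

models = {"linear": fit_linear(x, y),
          "exponential": fit_exponential(x, y),
          "power": fit_power(x, y)}
for name, model in models.items():
    print(name, round(corr(y, model(x)), 4))
```

With data generated from a power law, the power form reproduces the observations almost exactly, so its correlation is the highest of the three.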

Singh and Singh (2007) developed Multivariate Regression Model for wheat
yield using soil characteristics and management practices in Udaipur district of
Rajasthan. It was reported that available water capacity strongly affected the crop
yield. It was found that in shallow sandy-loam soils six irrigations were required to
produce the potential yield, while in deep clay-loam soils five irrigations and in deep
clay soils only four irrigations were sufficient to produce the potential yield.

Chattopadhyay et al. (2008) studied the effect of meteorological parameters
on yield of different varieties of cotton (AHH-468, MCU-9 and MCU-10) in Akola
district of Maharashtra. It was observed that minimum temperature at vegetative and
flowering stages was favorable and decrease in maximum temperature at flowering
and boll development stages was conducive for the yield of variety AHH-468. It
was also found that relative humidity was positively correlated with the yield of
varieties AHH-468 and MCU-10. The rainfall at the beginning of the season was
favorable for the yield of the crop.

Yadav and Patil (2008) studied the influences of agroclimatic indices on fruit
yield of cucumber during Kharif in Dapoli, district Ratnagiri, Maharashtra. They
obtained the relationship between fruit yield of cucumber and agroclimatic indices. It
was found that early sowing of cucumber, i.e. immediately after onset of monsoon,
produced a significantly higher fruit yield than late sowing.

Singh et al. (2008) developed Multivariate Regression Model for maize yield
based on edaphic characters in Udaipur district of Rajasthan. It was observed that
depth of soil strongly affected the crop yield. It was also found that under variability
in rainfall and availability of irrigations the yield could be predicted well in advance
in low, medium and high management conditions.

Lobell and Burke (2010) used a perfect model approach to examine the
ability of statistical models to predict yield responses to changes in mean temperature
and precipitation, as simulated by a process-based crop model. The CERES-Maize
model was first used to simulate historical maize yield variability at nearly 200 sites
in Sub-Saharan Africa, as well as the impacts of hypothetical future scenarios of 2 °C
warming and 20% precipitation reduction. Statistical models of three types (time
series, panel, and cross-sectional models) were then trained on the simulated
historical variability and used to predict the responses to the future climate changes.
Results suggested that statistical models, as compared to CERES-Maize, represented
a useful tool for projecting future yield responses, with their usefulness higher at
broader spatial scales.

Garde et al. (2012) explained techniques for development of weather indices
which were used as explanatory variables (predictors) in the multiple regression
model. The technique was further modified by incorporating technical and statistical
indicators along with developed predictors. The study proposed that the modified model
incorporating technical and statistical indicators could be used effectively for early pre-harvest
forecasting of crop yield, particularly up to two and a half months before harvest.

Garde et al. (2012) derived Multiple Linear Regression (MLR) equations for
estimating wheat productivity for the district of Ghazipur in eastern Uttar Pradesh.
Weather indices were computed using varied weather parameters for the years 1982-83
to 2005-06. The accuracy of the developed forecast models was cross-validated
using the year 2006-07. Based on the forecast error percentage, it was
found that the forecasting model produced the most accurate forecast in the 15th week of
the crop growing season. The relationship between actual and forecast wheat yield
was highly significant, with R2 varying from 0.72 to 0.89 for the different weeks.

2.5 Studies related to Comparison and Selection of Best Forecasting Model

Boken (2000) applied time series analysis (linear trend, quadratic trend,
simple exponential smoothing, double exponential smoothing, simple moving
averaging, and double moving averaging) for forecasting wheat yield in
Saskatchewan, Canada. It was found that the developed model produced more accurate
forecasts according to a deterministic measure (i.e. Mean Squared Error, MSE).

Chatfield et al. (1973) reviewed and compared a variety of potential models
for Exponential Smoothing as well as autoregressive integrated moving average and
structural models.

Mathur et al. (2001) developed stock market forecasting models using Neural
Network & Multiple Regression Analysis. The models were based on a company’s
stock price movement data for a complete calendar year and the data were pre-
processed to identify the hidden patterns. It was found that Neural Network model
could provide better forecasts as compared to Multiple Regression Analysis.

Tkacz (2001) employed neural network models to improve the accuracy of
financial and monetary forecasts of Canadian output growth. He found that neural
networks yielded statistically lower forecast errors for the year-over-year growth rate of
real GDP relative to linear and univariate models. However, such forecast
improvements were less notable when forecasting quarterly real GDP growth.

Ho et al. (2002) investigated suitable time series models for repairable system
failure analysis. A comparative study of the Box-Jenkins autoregressive integrated
moving average (ARIMA) model and the artificial neural network model in
predicting failures was carried out. The neural network architectures were the
multilayer feed-forward network and the recurrent neural network. Simulation results
on a set of compressor failures showed that in modeling the stochastic nature of
reliability data, both the ARIMA and the recurrent neural network models outperformed
the feed-forward model in terms of lower predictive errors and higher percentage of
correct reversal detection. However, both models performed better with short term
forecasting.

Heravi et al. (2004) considered 24 series measuring the annual change in
monthly seasonally unadjusted industrial production for important sectors of the
German, French and UK economies. According to root mean-square error (RMSE),
linear models generally produced more accurate post-sample forecasts than neural
network models at horizons of up to a year. This applied overall and also to the sub-
group of series with substantial sample period evidence of nonlinearity. In contrast,
the neural network models dominated linear ones in predicting the direction of
change.

Huang et al. (2004) developed an artificial neural network (ANN) model to
forecast river flow in the Apalachicola River. The model used
a feed-forward, backpropagation network structure with an optimized conjugated
training algorithm. Using long-term observations of rainfall and river flow during
1939-2000, the ANN model was satisfactorily trained and verified. Model
predictions of river flow matched well with the observations. The correlation
coefficients between forecasting and observation for daily, monthly, quarterly and
yearly flow forecasting were 0.98, 0.95, 0.91 and 0.83, respectively. Results of the
forecasted flow rates from the ANN model were compared with those from a
traditional autoregressive integrated moving average (ARIMA) forecasting model.
Results indicated that the ANN model provided better accuracy in forecasting river flow than
the ARIMA model.

Mani et al. (2005) developed flood prediction models by using soft
computing techniques. An efficient flood forecasting model was developed using
autoregressive integrated moving average (ARIMA) and artificial neural networks
(ANNs). The application of this model was illustrated using a case study of the River
Godavari and its tributaries (India). The study areas were selected based on the
availability of historical daily stream flow data over the period of 29 years from 1972
to 2000. The performance of back propagation (BP) networks and autoregressive (AR)
models was compared in terms of the correlation coefficient (r), absolute average
relative error (AARE), root-mean-square error (RMSE) and Nash-Sutcliffe coefficient
of efficiency (CoE). The
results revealed that ANN models provided a better alternative for the
hydrological modelling of floods.

Timm et al. (2006) compared a recurrent neural network model, a standard
state-space model and standard regression models and evaluated the relation between a
time consuming and expensive variable (like soil total nitrogen) and other simpler, easier to
measure variables (for instance, soil organic carbon, pH, etc.). It was found that the
recurrent neural network model and the standard state-space model had a better
predictive performance for soil total nitrogen than the standard regression
models. Among the standard regression models, the Vector Auto-Regression model
had a better predictive performance for soil total nitrogen.

Co and Boosarawongse (2007) compared the performance of artificial neural
networks (ANNs) with exponential smoothing and ARIMA models in forecasting rice
exports from Thailand. The results revealed that while the Holt–Winters and the Box–
Jenkins models showed satisfactory goodness of fit, the models did not perform well
in predicting unseen data during validation. On the other hand, the ANNs performed
relatively well as they were able to track the dynamic non-linear trend and
seasonality, and the interactions between them.

Pal et al. (2007) made an attempt to forecast milk production using statistical
time series modeling techniques such as double exponential smoothing and
Auto-Regressive Integrated Moving Average (ARIMA) for the study period of twenty five years
(1980-81 to 2004-05). On validation of the forecasts from these models, the ARIMA
model performed better than double exponential smoothing.

Liang (2009) presented a hybrid forecasting method that combined the
Seasonal ARIMA (SARIMA) model and neural networks with genetic algorithms.
Analytical results generated by the SARIMA model were used as the input data of
a neural network. Subsequently, the number of neurons in the hidden layer and the
number of learning parameters of the neural network architecture were globally
optimized using genetic algorithms. This model was subsequently adopted to forecast
seasonal time series data of the production value of the mechanical industry in
Taiwan. The results obtained provided a valuable reference for decision makers in
industry.

Shabri et al. (2009) investigated a hybrid methodology that combined
individual forecasts using an artificial neural network (CANN) approach for
modeling rice yields. To assess the effectiveness of these models, they used 38 years
of time series records for rice yield data in Malaysia from 1971 to 2008. The
prediction by CANN gave better results than the conventional Artificial Neural
Network (ANN) model, the autoregressive integrated moving average (ARIMA) and
exponential smoothing (EXPS) models.

Terzi and Onal (2012) used artificial neural networks (ANN) and developed
multiple linear regression (MLR) using the same input parameters to forecast monthly
flow for Kizilirmak River in Turkey. It was found that ANN models showed better
performance than MLR models.

Mishra and Singh (2013) forecasted the price of groundnut oil in Delhi
(India) by using ANN and ARIMA methodologies. They compared forecasting
capabilities of these models with the help of Root Mean Square Error (RMSE), Mean
Square Error (MSE) and Mean Absolute Percentage Error (MAPE).
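
For reference, the three accuracy measures named above can be computed as in the following sketch. The definitions are the standard ones; the function names and the toy numbers are assumptions for illustration and are not taken from the cited study.

```python
import numpy as np

def mse(actual, forecast):
    """Mean Square Error: average of the squared forecast errors."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean((a - f) ** 2))

def rmse(actual, forecast):
    """Root Mean Square Error: square root of the MSE, in the units of the data."""
    return float(np.sqrt(mse(actual, forecast)))

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in per cent (actual values must be non-zero)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((a - f) / a)) * 100.0)

# Toy values for illustration only.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(mse(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

MSE and RMSE depend on the scale of the series, whereas MAPE is scale free, which is why such measures are often reported together when competing models are compared.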

Kumari et al. (2014) developed various exponential smoothing models to
forecast the productivity of rice crop. The forecasted values of these
models were assessed with the help of different selection measures, and the
model having the least value of all these measures was considered the best
model. On the basis of the results obtained, it was found that, out of all the
forecasting models, the Holt Two Parameter Linear Model was the best fitted model
for predicting rice productivity.
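
Holt's two-parameter linear exponential smoothing can be sketched as follows, using the standard level and trend recursions. The initialisation (first observation as the level, first difference as the trend), the smoothing constants and the toy series are assumptions for illustration; they are not the settings of the cited study.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear method.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    forecast for h steps ahead = level_T + h * trend_T
    """
    if len(series) < 2:
        raise ValueError("at least two observations are needed")
    level = float(series[0])                      # assumed initial level
    trend = float(series[1]) - float(series[0])   # assumed initial trend
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Toy series with a constant increase of 2 per period (illustrative only).
forecasts = holt_forecast([10, 12, 14, 16, 18], alpha=0.5, beta=0.3, horizon=3)
print(forecasts)
```

For a series that grows by a constant amount, the recursions keep the trend equal to that amount, so the forecasts simply continue the straight line.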

2.6 Studies related to Spatial Interpolation Techniques

Goovaerts (1998) reviewed the main applications of geostatistics to the
description and modeling of the spatial variability of microbiological and
physico-chemical soil properties. Basic geostatistical tools such as the correlogram and
semivariogram were introduced to characterize the spatial variability of each attribute
separately as well as their spatial interactions.

Marinoni (2003) showed how smoothing effects around zero value zones could
be reduced significantly by the use of a combined ordinary-indicator Kriging
approach. The focus was mainly put on the statistics of the mechanical
parameters of soil and rock masses that go directly into the safety calculations.

Kumar and Remadevi (2006) applied the spatial technique of Kriging to the
spatial analysis of groundwater levels. Using measured elevations of the
water table, experimental semivariograms were constructed that characterised the
spatial variability of the measured groundwater levels.
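
As an illustration of what an experimental semivariogram is, the sketch below computes the classical estimator, gamma(h) = sum of squared differences over pairs separated by about h, divided by twice the number of such pairs. The function name, the lag binning and the three toy points are assumptions; they do not reproduce the data or software of the cited study.

```python
import numpy as np

def experimental_semivariogram(coords, values, lag_width, n_lags):
    """Classical (Matheron) semivariogram estimate for distance bins.

    Returns the mean separation distance and the semivariance of each non-empty bin.
    """
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    i, j = np.triu_indices(len(values), k=1)            # every pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    lags, gammas = [], []
    for k in range(n_lags):
        in_bin = (dist > k * lag_width) & (dist <= (k + 1) * lag_width)
        if in_bin.any():
            lags.append(float(dist[in_bin].mean()))
            gammas.append(float(0.5 * sq_diff[in_bin].mean()))
    return lags, gammas

# Three toy observation points on a line (illustrative values only).
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
levels = [1.0, 2.0, 4.0]
lags, gammas = experimental_semivariogram(coords, levels, lag_width=1.0, n_lags=2)
print(lags, gammas)
```

A theoretical model (for example spherical or exponential) is usually fitted to these points before Kriging.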

Reza et al. (2010) interpreted and analysed the spatial variability of soil
properties using ordinary kriging and inverse distance weighting (IDW)
methods to generate continuous surfaces for site-specific management. A total of 535
soil samples (0-25 cm) were collected on a 2 km grid in Dhalai district of
Tripura, India. The data were interpolated by ordinary kriging and IDW with power 2.
All the selected soil chemical parameters were strongly spatially dependent, but the
range of spatial dependence was found to vary among soil parameters. The study
showed that the spatial variability of the different soil parameters (except
available nitrogen) was predicted better by ordinary Kriging than by the IDW
method.
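
Inverse distance weighting with power 2, as mentioned above, can be sketched in a few lines. Each estimate is a weighted mean of the sampled values with weights proportional to 1/d^2. The function name and the toy sample points are assumptions for illustration only.

```python
import numpy as np

def idw(known_xy, known_values, query_xy, power=2.0):
    """Inverse distance weighted estimates at the query locations."""
    known_xy = np.asarray(known_xy, float)
    known_values = np.asarray(known_values, float)
    query_xy = np.asarray(query_xy, float)
    # Distance from every query point to every sampled point.
    dist = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=2)
    estimates = np.empty(len(query_xy))
    for q, row in enumerate(dist):
        exact = row == 0.0
        if exact.any():
            # A query that coincides with a sample takes the sampled value.
            estimates[q] = known_values[exact][0]
        else:
            weights = 1.0 / row ** power
            estimates[q] = np.sum(weights * known_values) / np.sum(weights)
    return estimates

# Two toy sample points and three query points (illustrative values only).
samples_xy = [(0.0, 0.0), (2.0, 0.0)]
samples_z = [1.0, 3.0]
estimates = idw(samples_xy, samples_z, [(1.0, 0.0), (0.0, 0.0), (0.5, 0.0)])
print(estimates)
```

Unlike Kriging, IDW uses no semivariogram model; the weights depend only on distance and the chosen power.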

Jafari et al. (2011) assessed the spatial variation of chemical and physical soil
properties and then used this information to select an appropriate area to install a
pasture rehabilitation experiment in the Zereshk in region, Iran. They applied
conventional statistical methods and geostatistics to analyse the spatial dependence
of soil properties. Soil properties such as pH, P, SAR and Na were best fitted by
spherical semivariogram models. Kriging was performed to analyze the spatial
variation of chemical and physical soil properties, and the co-Kriging technique was
also used to enhance estimation accuracy and compare results. Comparison of
the results using statistical techniques showed that the Kriging technique had acceptable
accuracy in characterizing the spatial variability.

Yahya et al. (2013) presented an overview of interpolation techniques with a
particular emphasis on the characteristics relevant to their implementation for
determining hazard in mining manufacturing. However, implementations at some sites
differed from those at others, possibly because pollutant elements were
concentrated at these sites.

Almasi (2014) evaluated some interpolation techniques for mapping the spatial
distribution of A horizon depth and OM in Shahrekord, Iran. 15000 hectares of South
West Shahrekord soils were studied, in which a total of 92 soil profiles were excavated
and classified according to USDA. The performance of the methods was evaluated by
RMSE, ME and R2. The calculated RMSE values for depth of A horizon were 0.01074, 0.19670
and 0.19858 for IDW and OK (with Spherical and Exponential models), respectively.
The RMSE values for surface horizon OM were 0.05593, 0.12121 and 0.05078
for IDW and OK (with Spherical and Exponential models), respectively. The results
showed that IDW could estimate the variability of A horizon depth, and OK (with
Exponential semivariogram) could estimate the variability of OM,
better than the other methods.

Shukla et al. (2015) estimated the content of deficient micronutrients namely
Zn, B and Fe in the soil of Kashi Vidyapeeth block of Varanasi District of Uttar
Pradesh (India) at different locations by using test results of sampled soils. The
Kriging interpolation method (Krige, 1951) was used for preparing the maps to show
spatial distribution of deficient micronutrients. The method can be used for
recommending judicious applications of micronutrients for sustainable soil
management.

2.7 Studies related to Soil Micronutrients

Kumar and Babel (2010) evaluated the status of available micronutrients
(Fe, Cu, Zn, Mn and B) and their relationship with soil properties. For
this, seventy surface soil (0-30 cm depth) and plant samples were each
collected from wheat growing fields of Jhunjhunu tehsil. They analyzed soil for
physico-chemical properties and status of available micronutrients. It was found that
the availability of micronutrients was positively and significantly correlated with
silt, clay, organic carbon and CEC of soils, and negatively and significantly
correlated with sand, calcium carbonate and pH of the soils. The availability of
micronutrients in wheat grains and straw was also positively correlated with silt, clay, organic
carbon and CEC and negatively correlated with sand, CaCO3 and pH of soils.

Vijaykumar et al. (2011) studied the relationship between soil properties and
macro and micro-nutrients in the soil. Their study indicated that the soil properties
pH, EC, OC and OM were the main characteristics playing a major role in controlling the
availability of micronutrients.

Singh et al. (2015) collected two thousand three hundred thirty four surface
soil samples from Chandauli, Mirzapur, Sant Ravidas Nagar and Varanasi districts of
eastern Uttar Pradesh during 2011-12 under GPS and GIS based soil fertility mapping
project. Analysis of these soil samples revealed the occurrence of acidic soils (pH
<5.5) in Chandauli and Mirzapur districts. A wide variation in pH (4.5-10.4)
indicated acidic to alkali nature of the soils in this region. These soils had electrical
conductivity (EC) less than 1 dS m-1 and organic carbon (OC) content ranging from
0.6 to 13.3 g kg-1. High OC content was noticed in some Vindhyan low land soils of
Chandauli and Mirzapur districts. Available sulphur (S) content ranged from 0.43 to
165 mg kg-1 with mean values of 16.10, 9.63, 13.05 and 12.36 mg kg-1 in
Chandauli, Mirzapur, Sant Ravidas Nagar and Varanasi districts, respectively. The
corresponding deficiency of S in soils of these districts was 39, 63, 45 and 56 per cent.
Nutrient index (NI) indicated S fertility level of low to medium. Soils were also found
to be highly deficient in available boron (B), with mean contents of 0.55, 0.49, 0.66
and 0.62 mg kg-1 in Chandauli, Mirzapur, Sant Ravidas Nagar and Varanasi districts,
respectively showing deficiency in 55, 61, 30 and 37 per cent soil samples. High
magnitude of B deficiency was noticed in soils of Vikas Khand Nawgarh (94%)
followed by Rajgarh (85%) and Marihan (80%), representing the area of low pH
Vindhyan soils.

Oliver and Gregory (2015) discussed the direct and indirect effects of soil on
human health; the direct effects of soil or its constituents are through its
ingestion, inhalation or absorption.
They focussed on four trace elements (iodine, iron, selenium and zinc) whose
deficiencies have substantial effects on human health. They noted that as the world's
population increases, issues of food security become more pressing, as does the need
to sustain soil fertility and minimize its degradation. Lack of adequate food and food
of poor nutritional quality lead to differing degrees of under-nutrition, which in turn
causes ill health. Soil and land are finite resources and agricultural land is under
severe competition from other uses. Relationships between soil and health are often
difficult to extricate because of the many confounding factors present. Nevertheless,
recent scientific understanding of soil processes and factors that affect human health
is enabling greater insight into the effects of soil on our health. Multidisciplinary
research that includes soil science, agronomy, agricultural sustainability, toxicology,
epidemiology and the medical sciences will facilitate the discovery of new antibiotics,
a greater understanding of how materials added to soil used for food production affect
health and deciphering of the complex relationships between soil and human health.
