Chapter 1: Introduction

“What is a reservoir?” A reservoir is a large natural or artificial lake, storage pond, or
impoundment created by a dam to store water. Reservoirs are essential for households and
contribute significantly to the economy. They are a primary means of electricity
generation, navigation, water supply, and flood control. India has 91 major reservoirs,
including all large-capacity artificial lakes. Over half of the major reservoirs are
constructed on flood-prone rivers to control or regulate river flow. River reservoir
management is therefore a crucial problem. Among the tasks involved in reservoir
management, predicting the water level is important for several reasons, such as
evaluating structural problems in dams (Reference 1), maintaining water supply and
resource availability (Reference 2), and safeguarding against floods.

Forecasting reservoir water levels can be approached with a variety of explanatory
variables and forecasting techniques. Although several algorithms and hydrological models
have been proposed so far, accurate forecasting remains challenging. The water level in a
reservoir is predicted using two major approaches. The first uses a hydrological model of
the reservoir. The second uses pattern recognition based on data mining and machine
learning techniques.

In recent years, the use of Machine Learning (ML) algorithms for this task has increased
considerably, covering a range of learning algorithms such as feedforward artificial
neural networks (ANN), decision tree methods, support vector machines (SVM), and
time-series analysis. ANN and ARIMA are among the most widely used techniques for
forecasting time-series variables.

Statistical approaches such as autoregressive (AR), moving average (MA), autoregressive
moving average (ARMA), autoregressive integrated moving average (ARIMA), and multiple
regression models can be used to forecast the water level in a reservoir. When past values
are significantly correlated with the current value, an autoregressive model is a natural
choice: an AR model regresses the current value on past values of the series. In an MA
model, past forecast errors are used as explanatory variables. AR and MA terms can be
combined to fit a time series, in which case the model is known as an ARMA model. All of
these models assume that the data are stationary; if they are not, we difference the
series to make it stationary. When an ARMA model is fitted after differencing the data to
make them stationary, the model is known as an ARIMA model. Later sections go through the
details of this technique.
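
For reference, the model families described above can be written in one common notation. The symbols used here (c, mu, the coefficients phi_i and theta_j, the white-noise error w_t, and the backshift operator B) are assumed for illustration and follow a common textbook convention; they are not taken from the report itself.

```latex
\begin{align*}
\text{AR}(p):\quad    & x_t = c + \sum_{i=1}^{p} \varphi_i\, x_{t-i} + w_t \\
\text{MA}(q):\quad    & x_t = \mu + w_t + \sum_{j=1}^{q} \theta_j\, w_{t-j} \\
\text{ARMA}(p,q):\quad & x_t = c + \sum_{i=1}^{p} \varphi_i\, x_{t-i} + w_t + \sum_{j=1}^{q} \theta_j\, w_{t-j} \\
\text{ARIMA}(p,d,q):\quad & \text{ARMA}(p,q) \text{ fitted to } y_t = (1 - B)^d x_t, \qquad B\,x_t = x_{t-1}
\end{align*}
```
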

In this study, we present a short-term analysis and prediction of the water level in a
reservoir. We evaluated statistical techniques (including the ARIMA model) together with
an ANN model for the analysis of water levels at the Indravati Reservoir (Godavari,
Odisha, India). Daily reservoir water level data and daily climate data from 2000 to 2020
are used to fit the models. All algorithms are implemented in Python using its libraries.

The main objectives of this report are as follows:

• Visualizing the data as a time series plot.
• Analysing the data to see whether there is a trend or seasonality.
• Finding the hyperparameters for the model.
• Evaluating the model and predicting future values.

The rest of this report is arranged as follows. Section 2 describes the methodology
adopted and provides information on the observational dataset used for the analysis. In
that section, we also select the hyperparameters for our model and predict the water level
at the target reservoir. Section 3 presents the analysis, which evaluates the outputs of
our models and compares them. Section 4 concludes the report with final remarks.

Chapter 2: Datasets & Methodology

This chapter describes the dataset used in this study and the methodology followed. The
analysis of the reservoir water level was carried out on a weekly basis, and different
time-series and machine learning techniques, such as ARIMA and ANN, were evaluated.

2.1 Dataset for Time-Series Forecasting

The Indravati reservoir in Odisha is constructed on the Indravati River, a tributary of the
Godavari. It is linked to the Indravati Dam by a 4.32 km long and 7 m wide headrace tunnel.
With a capacity of 600 MW, it is the largest power-generating dam in eastern India. Its
total storage capacity is 2.3 BCM (billion cubic metres), with a surface area of 42 sq. mi.
Figure 2.1(a) shows the geographical location of the Indravati Dam.

Fig. 2.1(a) Location of Indravati Reservoir in India


Daily data on the reservoir water level (m) and storage (BCM) are available on WRIS-India
from 1 January 2000, when the reservoir began operation, to 31 December 2020. We converted
the daily data into weekly data by averaging the daily water level over weekly cycles,
resulting in a total of 1089 data points. For the time series models, we considered only
the historical time series of water level at Indravati Dam. For the ML model, however,
upstream flow and atmospheric variables are also considered. Figure 2.1(b) shows the time
series of weekly water levels at the Indravati reservoir. The water level data provided by
CWC have missing values that need to be addressed before fitting any model. We used the
forward fill method to fill those values, because each value in our data is strongly
correlated with the previous one (lag 1).
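
The preprocessing described above (forward fill, then weekly averaging) can be sketched in plain Python. This is a minimal sketch, not the report's actual code: the sample readings, the function names `forward_fill` and `weekly_mean`, and the choice of consecutive 7-day blocks as "weeks" are all assumptions made for illustration.

```python
from statistics import mean


def forward_fill(values):
    """Replace each missing reading (None) with the most recent observed one."""
    if values and values[0] is None:
        # Forward fill has nothing to copy from if the series starts with a gap.
        raise ValueError("first reading is missing; cannot forward fill")
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def weekly_mean(values, days_per_week=7):
    """Average consecutive 7-day blocks; a trailing partial block is averaged too."""
    return [
        mean(values[i:i + days_per_week])
        for i in range(0, len(values), days_per_week)
    ]


# Hypothetical daily water levels (m) over two weeks, with gaps marked as None.
daily = [634.0, None, 634.4, 634.6, None, None, 635.0,
         635.2, 635.1, None, 635.5, 635.6, 635.8, 636.0]

filled = forward_fill(daily)
weekly = weekly_mean(filled)
print(weekly)
```

Forward fill is applied before averaging so that each weekly mean is computed from seven values; averaging only the observed days would be an alternative design that weights weeks with gaps differently.
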
Table 2.1 Descriptive statistics of the dependent variable (water level at Indravati Dam).

Statistic             Water level data since 2000 (m)
Mean                  634.857899
Standard deviation    3.935806
1st quartile          631.944286
Median                634.995000
3rd quartile          638.035714

Figure 2.1(b) Time Series plot of weekly averaged water level at Indravati Dam

2.2 Time-Series Forecasting

“What is a time series?” A time series is a sequence of data points recorded at regular
time intervals. Time series are analysed to identify long-term trends and to forecast
future values of a variable.

A time series has four components:

• Trend
• Seasonal Variations
• Cyclic Variations
• Random or Irregular movements

Figure 2.2(a) shows the components of the time series of weekly water level at Indravati
Dam. These components are critical because, before fitting any time series model, we need
to check whether our dataset is stationary. Stationarity means that the statistical
properties of the process that generates the data do not change over time.

Figure 2.2(a) Components of Time-series at Indravati Dam
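
A decomposition like the one in Figure 2.2(a) can be approximated with a classical additive decomposition: a centred moving average estimates the trend, and averaging the detrended values at each position in the cycle estimates the seasonal component. The sketch below is illustrative only. It assumes an additive model, uses a synthetic series with a period of 52 weeks in place of the real data, and the function names are hypothetical.

```python
import math


def centered_moving_average(x, period):
    """Trend estimate; for an even period the two end points get half weight."""
    n, half = len(x), period // 2
    trend = [None] * n  # undefined near the edges of the series
    for t in range(half, n - half):
        window = x[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend


def decompose_additive(x, period):
    """Split x into trend + seasonal + residual (additive model)."""
    trend = centered_moving_average(x, period)
    detrended = [None if tr is None else v - tr for v, tr in zip(x, trend)]
    # Average the detrended values observed at each position in the cycle.
    cycle = []
    for pos in range(period):
        vals = [d for d in detrended[pos::period] if d is not None]
        cycle.append(sum(vals) / len(vals))
    offset = sum(cycle) / period  # centre the seasonal component on zero
    seasonal = [cycle[t % period] - offset for t in range(len(x))]
    residual = [None if tr is None else v - tr - s
                for v, tr, s in zip(x, trend, seasonal)]
    return trend, seasonal, residual


# Synthetic weekly series: constant level plus a yearly cycle (hypothetical values).
period = 52
series = [635.0 + 4.0 * math.sin(2 * math.pi * t / period)
          for t in range(4 * period)]
trend, seasonal, residual = decompose_additive(series, period)
```

For this synthetic input the estimated trend is flat and the seasonal component recovers the sine wave, which is the pattern one would look for when judging whether a series has a trend, seasonality, or both.
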

2.3 Step by step process of Time Series Forecasting

1. Plotting the data - As a first step, we plot the data against time; the result is
called a time series plot, as shown in Figure 2.1(b).
2. Checking whether the time series is stationary - A stationary time series should have
no trend and no seasonality. Referring to Figure 2.2(a), we note a seasonality of one year
in the time series of water level at Indravati.
3. Identifying the best forecasting model - Any of the following models may be used for
forecasting: AR, MA, ARMA, and ARIMA. We use SARIMA (Seasonal Autoregressive Integrated
Moving Average) because our data are not stationary and show strong seasonal behaviour.
4. Fitting a model - This is the most important step, in which we train the model and
choose the best possible hyperparameters. SARIMA has 7 hyperparameters,
(p, d, q)(P, D, Q, s), that need to be determined.
5. Evaluating the model - To measure the accuracy of our model, we must choose an error
metric. In our case, we used the root mean squared error (RMSE).
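
The error metric named in step 5 can be written in a few lines. This is a minimal sketch: the function name `rmse` and the sample values are hypothetical and only illustrate the calculation.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and forecast weekly water levels (m).
observed = [634.2, 634.8, 635.1, 635.6]
forecast = [634.0, 635.0, 635.4, 635.5]
print(rmse(observed, forecast))
```

RMSE is expressed in the same unit as the target (metres here) and penalises large errors more heavily than small ones, so it is usually computed on a held-out part of the series rather than on the data used for fitting.
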

2.4 Seasonal Auto-Regressive Integrated Moving Average (SARIMA)

The AR in ARIMA stands for autoregressive: predictions are based on lagged values of the
target variable. This introduces our first hyperparameter, p, the number of lags to
consider. It is determined using a partial autocorrelation function (PACF) plot. The PACF
plot of water levels at Indravati Dam is shown in Figure 2.4(a). From this plot we choose
AR(4) for our model, since the first 4 lags lie outside the significance bounds.

Figure 2.4(a) Partial Autocorrelation Plot

The letter I stands for "integrated". It means that, rather than modelling the raw target
values, we model their differences. When the data exhibit a trend, we must remove it in
order to make the series stationary. This is done by differencing the data, that is,
subtracting the previous value from each value. Here d denotes the number of times the data
are differenced to make them stationary. In our case d = 0, because our data show no trend.
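
Differencing itself is a one-line operation. The sketch below uses made-up integer series and a hypothetical helper named `difference`. It shows ordinary differencing (lag 1), which removes a trend, and seasonal differencing (lag equal to the period), which removes a repeating cycle.

```python
def difference(x, lag=1):
    """Return x[t] - x[t - lag] for every t where both values exist."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


# A quadratic trend becomes constant after differencing twice (d = 2).
quadratic = [1, 4, 9, 16, 25]
once = difference(quadratic)   # first differences
twice = difference(once)       # second differences

# A pure cycle of period 4 vanishes after one seasonal difference (D = 1, s = 4).
cyclic = [1, 3, 2, 0] * 3
seasonal_diff = difference(cyclic, lag=4)
```

Each round of differencing shortens the series by `lag` observations, which is one reason to use the smallest d and D that make the series stationary.
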

MA stands for moving average. The inputs to a moving average model are the lagged
prediction errors. Unlike the lagged values used by the AR part, these errors are not
directly observed: they depend on the other parameters of the model and are estimated
during fitting. At a high level, feeding the model's errors back into it lets the
forecasts move closer to the actual Y values. This introduces another hyperparameter, q,
the number of lagged prediction errors to consider. The value of q is determined using an
autocorrelation function (ACF) plot. Figure 2.4(b) shows the ACF plot. From this plot we
choose MA(11) for our model, since the autocorrelations remain outside the confidence
interval up to about lag 11.

Figure 2.4(b) Autocorrelation Plot
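
The quantities behind the PACF and ACF plots can be computed directly, which makes the "outside the confidence interval" rule concrete. The sketch below is illustrative and makes several assumptions: it uses a simulated AR(1) series instead of the reservoir data, the function names are hypothetical, the PACF is obtained with the Durbin-Levinson recursion, and the approximate 95% bound of 1.96/sqrt(n) is the usual large-sample bound for white noise.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out, phi_prev = [1.0], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_lags(values, n):
    """Lags (>= 1) whose coefficient lies outside the approximate 95% bound."""
    bound = 1.96 / math.sqrt(n)
    return [k for k, v in enumerate(values) if k >= 1 and abs(v) > bound]


# Simulated AR(1) series with coefficient 0.7 (stand-in for real data).
random.seed(0)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + random.gauss(0.0, 1.0))

print("PACF lags outside the bound:", significant_lags(pacf(series, 10), len(series)))
print("ACF lags outside the bound: ", significant_lags(acf(series, 10), len(series)))
```

For an AR process the PACF cuts off after lag p while the ACF decays gradually; for an MA process the roles are reversed. That is why the PACF is read for p and the ACF for q.
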

S stands for seasonal. Seasonality must be modelled because our data exhibit a strong
seasonal pattern. The parameters P, D, and Q are the seasonal autoregressive order, the
seasonal differencing order, and the seasonal moving average order, respectively.

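The general form of SARIMA(p, d, q)(P, D, Q, s) is

$$\Phi_P(B^s)\,\varphi(B)\,\Delta_s^D\,\Delta^d x_t = \Theta_Q(B^s)\,\theta(B)\,w_t$$

Here $\{x_t\}$ denotes the nonstationary time series (TS), $\{w_t\}$ is the white-noise
error term, $B$ is the backshift operator, and $s$ is the period of the TS. $\varphi(B)$
and $\theta(B)$ are the ordinary autoregressive and moving average components of order $p$
and $q$ respectively; $\Phi_P(B^s)$ and $\Theta_Q(B^s)$ are the seasonal autoregressive and
moving average components of order $P$ and $Q$; $\Delta_s^D$ and $\Delta^d$ are the
seasonal and ordinary differencing terms of order $D$ and $d$.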

In this study, we concentrate on the time series of weekly water level at Indravati
Reservoir. The water level has a period of one year, therefore s = 52.

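
As a worked illustration, the operators of the SARIMA model can be expanded for the orders suggested by the plots above (p = 4, d = 0, q = 11) with s = 52. The seasonal orders P, D, and Q are left symbolic because the text does not state their values. The sign convention (minus signs in the autoregressive polynomials, plus signs in the moving average polynomials) is an assumption; some references use the opposite sign for the moving average terms.

```latex
\begin{align*}
\varphi(B)       &= 1 - \varphi_1 B - \varphi_2 B^2 - \varphi_3 B^3 - \varphi_4 B^4 \\
\theta(B)        &= 1 + \theta_1 B + \theta_2 B^2 + \dots + \theta_{11} B^{11} \\
\Phi_P(B^{52})   &= 1 - \Phi_1 B^{52} - \dots - \Phi_P B^{52P} \\
\Theta_Q(B^{52}) &= 1 + \Theta_1 B^{52} + \dots + \Theta_Q B^{52Q} \\
\Delta^{0}       &= 1, \qquad \Delta_{52}^{D} = (1 - B^{52})^{D}, \qquad B^k x_t = x_{t-k}
\end{align*}
```
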
2.5 Dataset for ANN
