“What is a reservoir?” A reservoir is a large natural or artificial lake, storage pond, or
impoundment created using a dam to store water. Reservoirs are essential for households and
contribute significantly to the economy. They are the primary means of electricity
generation, navigation, water supply, and flood control. There are 91 major reservoirs in
India, including all large-capacity artificial lakes. Over half of these major reservoirs are
constructed on flood-prone rivers to control or regulate river flow. River-water reservoir
management is therefore a crucial problem. Among the tasks in reservoir management,
predicting the water level in the reservoir is important in many respects, such as
evaluating structural problems in dams (Reference 1), maintaining water supply and the
availability of resources (Reference 2), and safeguarding against floods.
The task of forecasting water levels in reservoirs can be approached using a variety of
variables and forecasting techniques. Although several algorithms and hydrological models
have been proposed, accurate forecasting remains challenging. Reservoir water levels are
predicted using two major approaches: the first uses a hydrological model of the reservoir,
and the second uses pattern recognition through data mining and machine learning
techniques.
In recent years, the use of Machine Learning (ML) algorithms for this task has increased
markedly, covering a range of learning algorithms such as artificial feedforward neural
networks (ANN), decision tree methods, support vector machines (SVM), and time-series
analysis. ANN and ARIMA are among the most widely used techniques for forecasting
time-series variables.
The rest of this project report is organised as follows. Section 2 describes the methodology
adopted and the observational dataset used for the analysis; in this section, we select the
hyperparameters for our models and predict the water level at the target reservoir. Section 3
presents the analysis, which evaluates and compares the outputs of our models. Section 4
concludes the report with final remarks.
Chapter 2: Datasets & Methodology
This section describes the dataset used in this study and the methodology followed. The
water level in the reservoir was analysed on a weekly basis, and different time-series and
machine learning techniques, such as ARIMA and ANN, were evaluated.
The Indravati reservoir in Odisha is constructed on the Indravati River, a tributary of the
Godavari. It is linked to the Indravati Dam by a 4.32 km long, 7 m wide headrace tunnel.
With a generating capacity of 600 MW, it is the largest power-generating dam in eastern
India. Its total storage capacity is 2.3 BCM, with a surface area of 42 sq mi. Figure 2.1(a)
shows the geographical location of the Indravati Dam.
Time Series Descriptive Statistics    Water Level Data since 2000 (m)
Mean                                  634.857899
Median                                634.995000
Figure 2.1(b) Time Series plot of weekly averaged water level at Indravati Dam
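Figure 2.1(b) plots weekly averages, but the report does not say how the raw readings were aggregated. The Python sketch below is only an illustration under assumptions: it assumes daily (date, level in m) readings and groups them by ISO calendar week; the helper name `weekly_means` is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_means(readings):
    """Average (date, level_m) pairs into ISO-week buckets.

    Hypothetical helper. Assumption: `readings` holds (datetime.date, float)
    pairs such as daily gauge readings in metres.
    """
    buckets = defaultdict(list)
    for day, level in readings:
        iso_year, iso_week, _ = day.isocalendar()
        buckets[(iso_year, iso_week)].append(level)
    # Sorting by (ISO year, ISO week) yields a chronological weekly series.
    return [(week, mean(levels)) for week, levels in sorted(buckets.items())]


if __name__ == "__main__":
    sample = [(date(2021, 1, 4), 630.0), (date(2021, 1, 5), 632.0),
              (date(2021, 1, 11), 634.0)]
    print(weekly_means(sample))  # [((2021, 1), 631.0), ((2021, 2), 634.0)]
```

ISO years contain either 52 or 53 weeks, so any weekly grouping makes a fixed 52-week season an approximation.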
“What is a time series?” A time series is a collection of data points recorded at regular
time intervals. Time series are analysed to determine long-term trends and to forecast future
values of a variable. A time series is commonly described by the following components:
• Trend
• Seasonal Variations
• Cyclic Variations
• Random or Irregular movements
Figure 2.2(a) shows the components of the time series of weekly water level at Indravati Dam.
These components matter because, before fitting any time-series model, we need to check
whether the dataset is stationary. Stationarity means that the statistical properties of the
process that generates the data, such as its mean and variance, do not change over time.
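To make these components concrete, the sketch below performs a classical additive decomposition (observed = trend + seasonal + residual) in plain Python: the trend is a centred moving average over one period, and the seasonal component is the average detrended value at each position in the cycle. This textbook method is chosen for illustration and is not necessarily how Figure 2.2(a) was produced; for the weekly water level the period would be 52. The function name is hypothetical.

```python
from statistics import mean


def additive_decompose(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Illustrative textbook method, not necessarily the one behind Figure 2.2(a).
    Returns three lists of len(series); trend and residual are None where the
    centred moving average is undefined (the first and last period // 2 points).
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: 2 x period moving average, half weight on both ends.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        else:
            trend[t] = mean(window)
    # Collect detrended values by their position within the cycle.
    by_position = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            by_position[t % period].append(series[t] - trend[t])
    raw = [mean(vals) if vals else 0.0 for vals in by_position]
    offset = mean(raw)
    pattern = [v - offset for v in raw]  # centred so the seasonal effects sum to ~0
    seasonal = [pattern[t % period] for t in range(n)]
    residual = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual
```

A series whose residual looks like noise after removing trend and seasonality is a good candidate for the stationarity check described in the steps below.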
1. Plotting the Data - As a first step, we plot the data against time; this time-series
plot is shown in Figure 2.1(b).
2. Check if Time-series is stationary - A stationary time series shows neither trend nor
seasonality. Figure 2.2(a) shows that the water level at Indravati has a yearly
seasonality.
3. Identifying the best forecasting model - Candidate models for forecasting include AR,
MA, ARMA, and ARIMA. We use SARIMA (Seasonal Autoregressive Integrated Moving
Average) because our data is not stationary: it shows strong seasonal behaviour, which
SARIMA models explicitly.
4. Fitting a model - This is the most important step, in which we train the model and
choose the best possible hyperparameters. SARIMA has seven hyperparameters,
(p, d, q)(P, D, Q, s), that need to be determined.
5. Evaluate your model - To measure the accuracy of our model, we must choose an error
metric. In our case, we used the root mean squared error (RMSE).
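Step 5 relies on the root mean squared error, RMSE = sqrt((1/n) Σ (y_t − ŷ_t)²). A minimal Python version is sketched below; the function name is illustrative, and the choice of which weeks are held out for evaluation is not specified here.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and forecast values.

    Illustrative helper; the evaluation split is not specified in the report.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Errors of 1, 0 and -2 m give sqrt((1 + 0 + 4) / 3), about 1.291 m.
    print(rmse([634.0, 635.0, 636.0], [633.0, 635.0, 638.0]))
```

Because RMSE is expressed in the same unit as the water level (metres), it can be read directly as a typical forecast error.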
AR in ARIMA stands for autoregressive: predictions are based on lagged values of the target
variable. This introduces the first hyperparameter, p, the number of lags to consider. It is
determined using a partial autocorrelation function (PACF) plot. The PACF plot of water
levels at Indravati Dam is shown in Figure 2.4(a). From this plot, we choose AR(4) for our
model, since the first four lags lie outside the significance bounds.
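The PACF used to choose p can be computed from the sample autocorrelations with the Durbin-Levinson recursion. The plain-Python sketch below does this and flags lags outside the common large-sample 95% bounds ±1.96/√N. The function names and the AR(1) simulation used for the demonstration are illustrative assumptions, and plotting libraries may compute the PACF or its bounds differently.

```python
import math
import random


def acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (divides by n, the usual biased estimator)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n / c0
            for k in range(nlags + 1)]


def pacf(x, nlags):
    """Partial autocorrelations at lags 1..nlags via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    phi_prev = []  # coefficients of the best AR(k-1) fit
    result = []
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the AR(k-1) coefficients to AR(k), then append phi_kk.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def significant_lags(values, n, z=1.96):
    """1-based lags whose |value| exceeds z / sqrt(n).

    Assumption: a constant large-sample band; plotting tools may use others.
    """
    bound = z / math.sqrt(n)
    return [lag for lag, v in enumerate(values, start=1) if abs(v) > bound]


def simulate_ar1(phi, n, seed=0):
    """Illustrative AR(1) series x_t = phi * x_{t-1} + w_t with Gaussian w_t."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    series = simulate_ar1(0.7, 2000)
    # For an AR(1) series the PACF is large at lag 1 and near zero afterwards.
    print(significant_lags(pacf(series, 10), len(series)))
```

For an AR(p) process the PACF cuts off after lag p, which is why the number of leading significant PACF lags is used to pick p.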
The letter I stands for “integrated”: rather than modelling the raw target values, we model
their differences. When the data exhibits a trend, we must remove it to make the series
stationary, which can be done by subtracting each previous value from the current one
(differencing). The hyperparameter d denotes the number of times the data is differenced to
make it stationary. In our case d = 0, because our data shows no trend.
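Differencing applies the operator (1 − B^lag) to the series, where B is the backshift operator. The sketch below covers both ordinary differencing (lag 1, applied d times) and seasonal differencing (lag s, applied D times); the function name is illustrative. With d = 0, as in this study, ordinary differencing leaves the series unchanged, while a seasonal lag of 52 would correspond to the yearly period of weekly data.

```python
def difference(x, lag=1, times=1):
    """Apply (1 - B**lag) to x `times` times (illustrative helper).

    Each pass shortens the series by `lag` values; times=0 returns a copy.
    """
    out = list(x)
    for _ in range(times):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


if __name__ == "__main__":
    trend = [2 * t + 1 for t in range(8)]
    print(difference(trend))           # a linear trend becomes constant: [2, 2, 2, 2, 2, 2, 2]
    print(difference(trend, times=2))  # differencing twice removes it entirely
```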
MA stands for moving average: the inputs to a moving-average model are lagged prediction
errors. Unlike the lagged values used by the AR term, these errors are not directly observed;
they depend on the other parameters of the fitted model and change as those parameters
change. At a high level, feeding the model's errors back into it lets it move closer to the
actual values of Y. The MA term introduces another hyperparameter, q, the number of lagged
prediction errors to consider. The value of q is determined using an autocorrelation function
(ACF) plot. Figure 2.4(b) shows the ACF plot; from it, we choose MA(11) for our model, since
the autocorrelations up to lag 11 lie outside the confidence interval.
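Reading q off the ACF plot amounts to counting how many leading autocorrelations fall outside the confidence band. The plain-Python sketch below computes the sample ACF and a band based on Bartlett's approximation, Var(r_k) ≈ (1 + 2 Σ_{j<k} r_j²)/N, which widens with the lag. The helper names are illustrative, and the exact bands behind Figure 2.4(b) are not stated in this report.

```python
import math


def acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (divides by n, the usual biased estimator)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n / c0
            for k in range(nlags + 1)]


def bartlett_bounds(r, n, z=1.96):
    """Half-widths of approximate confidence bands for lags 1..len(r)-1.

    Assumption: Bartlett-style bands; other tools may draw different ones.
    """
    bounds = []
    for k in range(1, len(r)):
        variance = (1 + 2 * sum(rj ** 2 for rj in r[1:k])) / n
        bounds.append(z * math.sqrt(variance))
    return bounds


def estimate_q(r, n, z=1.96):
    """Number of consecutive lags, from lag 1, whose |r_k| lies outside the band."""
    q = 0
    for rk, bound in zip(r[1:], bartlett_bounds(r, n, z)):
        if abs(rk) <= bound:
            break
        q += 1
    return q


if __name__ == "__main__":
    # Hypothetical ACF values (r_0..r_4) for a series of 100 points.
    print(estimate_q([1.0, 0.5, 0.4, 0.01, 0.5], 100))  # 2
```

Counting stops at the first lag inside the band, mirroring the idea that the ACF of an MA(q) process cuts off after lag q; isolated significant lags further out are ignored.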
S stands for seasonality, which must be modelled because our data exhibits strong
seasonality. The parameters P, D, and Q are the orders of the seasonal autoregressive,
seasonal differencing (integration), and seasonal moving-average components, respectively.
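$$\Phi_P(B^s)\,\phi(B)\,\Delta_s^D\,\Delta^d\,x_t = \Theta_Q(B^s)\,\theta(B)\,w_t$$
Here $\{x_t\}$ denotes the nonstationary time series, $\{w_t\}$ is a white-noise error term,
$B$ is the backshift operator, and $s$ is the seasonal period of the series; $\phi(B)$ and
$\theta(B)$ are the ordinary autoregressive and moving-average components of orders p and q;
$\Phi_P(B^s)$ and $\Theta_Q(B^s)$ are the seasonal autoregressive and moving-average
components of orders P and Q; and $\Delta_s^D$ and $\Delta^d$ are the seasonal and ordinary
differencing terms of orders D and d.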
In this study, we model the weekly water-level series at Indravati Reservoir. The water level
has a period of one year, and since a year contains approximately 52 weeks, s = 52.
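Writing out the operators makes the model structure explicit. Assuming the orders chosen in this section (p = 4, d = 0, q = 11) and s = 52, and leaving the seasonal orders P, D, and Q symbolic because they are not stated here, the SARIMA model with observed series $x_t$ and white-noise term $w_t$ reads as follows, using $B^k x_t = x_{t-k}$ and the convention in which the autoregressive polynomials carry minus signs and the moving-average polynomials plus signs:

$$\Phi_P(B^{52})\,(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3 - \phi_4 B^4)\,(1 - B^{52})^D\,x_t = \Theta_Q(B^{52})\,(1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_{11} B^{11})\,w_t$$

$$\Phi_P(B^{52}) = 1 - \Phi_1 B^{52} - \cdots - \Phi_P B^{52P}, \qquad \Theta_Q(B^{52}) = 1 + \Theta_1 B^{52} + \cdots + \Theta_Q B^{52Q}$$

Because d = 0, the ordinary differencing term $(1 - B)^0 = 1$ drops out, and the seasonal differencing term is $(1 - B^{52})^D$.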
2.5 Dataset for ANN