Minor Project
By
Mayank Shahabadee (1803096)
Mayank Dubey (1803131)
April 2021
© KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY KIIT-
DEEMED TO BE UNIVERSITY, BHUBANESWAR ALL RIGHTS
RESERVED
CERTIFICATE
Signature of Supervisor 1
Prof. Banishree Misra
Assistant Professor
School of Electrical Engineering,
KIIT
................................................................................................................................................
The Project was evaluated by us on
EXAMINER 1 EXAMINER 2
EXAMINER 3 EXAMINER 4
ACKNOWLEDGEMENTS
Mayank Shahabadee
Mayank Dubey
Preface
• Paul Saffo
Chapter 1 - Background and Introduction:
Demand Forecasting (Big Data and Predictive Analysis):
Accurate demand forecasting can help solve many day-to-day challenges. In the current scenario, for example, we have struggled to predict the demand for oxygen cylinders and beds in COVID-19 hospitals.
Forecasting Types:
Problems with Traditional Time Series Methods:
• There is no systematic approach for identifying and selecting an appropriate model; the identification process is mainly trial and error.
• It is difficult to verify the validity of the model.
• Most traditional methods were developed from intuitive and practical considerations rather than from a statistical foundation.
Autoregressive (AR) process:
• The series' current value depends on its own previous values.
• AR(p): the current value depends on the p previous values of the series.
• p is the order of the AR process.
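As a minimal, library-free sketch of the AR idea (the coefficients 0.6 and 0.3 and the noise scale are made up for illustration), an AR(2) series can be simulated and its coefficients recovered by least squares:

```python
import numpy as np

# Simulate an AR(2) process: y(t) = 0.6*y(t-1) + 0.3*y(t-2) + noise
rng = np.random.default_rng(0)
n = 500
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal(scale=0.1)

# Fit AR(2) by least squares: regress y(t) on its two previous values
X = np.column_stack([y[1:-1], y[:-2]])   # lag-1 and lag-2 columns
target = y[2:]
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
print(coef)   # estimates land close to the true [0.6, 0.3]
```

With enough data, the least-squares estimates recover the generating coefficients, which is exactly the "current value depends on p previous values" structure.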
Example y(t) data:
ARIMA (p, d, q) modeling:
To build a time series model using ARIMA, we need to study the time series and identify p, d, and q:
1. Ensuring stationarity:
Determine the appropriate value of d.
2. Identification:
Determine the appropriate values of p and q using the ACF, PACF, and unit root tests.
(p is the AR order, d is the integration order, q is the MA order.)
3. Estimation:
Estimate an ARIMA model using the values of p, d, and q you think are appropriate.
4. Diagnostic checking:
Check the residuals of the estimated ARIMA model(s) to see if they are white noise; pick the best model with well-behaved residuals.
5. Forecasting:
Produce out-of-sample forecasts, or set aside the last few data points for in-sample forecasting.
Stationarity:
To model a time series with the Box-Jenkins approach, the series has to be stationary:
In statistical terms, a stationary process is assumed to be in a particular state of statistical equilibrium, i.e., p(x(t)) is the same for all t.
In particular, even when z(t) itself is not stationary, the first difference Δz(t) = z(t) - z(t-1), or a higher-order difference Δ^d z(t), often is; this is what the integration order d captures.
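A tiny numerical illustration of this equilibrium idea, probing it by comparing the two halves of a series: a random walk's mean wanders between halves, while the mean of its first difference stays put:

```python
import numpy as np

rng = np.random.default_rng(2)
z = np.cumsum(rng.normal(size=1000))   # random walk: not stationary
dz = np.diff(z)                        # first difference: stationary

def half_means(x):
    """Mean of the first half vs. the second half of a series."""
    h = len(x) // 2
    return x[:h].mean(), x[h:].mean()

m1, m2 = half_means(z)    # these can differ substantially
d1, d2 = half_means(dz)   # these agree closely
```

Comparing half-sample statistics is only a rough check; formal unit root tests (step 2 of the ARIMA recipe) make the same comparison rigorously.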
Achieving Stationarity:
The power of differencing:
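Differencing can be applied more than once. For instance, a second difference (d = 2) removes a quadratic trend completely, which is why choosing d is the first ARIMA step. A tiny demonstration with a made-up quadratic series:

```python
import numpy as np

t = np.arange(100, dtype=float)
y = 0.5 * t**2 + 3 * t + 7   # quadratic trend: clearly non-stationary

d1 = np.diff(y)              # one difference still leaves a linear trend
d2 = np.diff(y, n=2)         # two differences: constant, trend fully removed
print(d2[:3])                # prints [1. 1. 1.]
```

Every entry of the twice-differenced series equals 1.0 (twice the quadratic coefficient), so no trend remains.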
Other Methods to Achieve Stationarity:
Chapter 2 – Model Selection
Prophet Model (by Facebook):
Prophet is a generalized additive model that treats time series prediction as a curve-fitting exercise with the formula:
y(t) = g(t) + s(t) + h(t) + ε(t)
g(t) - trend, s(t) - seasonality, h(t) - holiday effects, ε(t) - error/residual
Advantages:
Code Sample:
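A minimal, library-free sketch of the additive curve-fitting view, using a synthetic series with a made-up linear trend and weekly pattern (in practice Prophet is fit through its own Python API on a dataframe with ds and y columns; this only illustrates the formula):

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(210)
g = 0.1 * t                                   # trend g(t)
s = np.tile([0, 1, 2, 1, 0, -2, -2], 30)      # weekly seasonality s(t)
y = g + s + rng.normal(scale=0.1, size=210)   # y(t) = g(t) + s(t) + ε(t)

# Recover g(t): least-squares linear trend
A = np.column_stack([np.ones_like(t), t])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)

# Recover s(t): average the detrended values by day-of-week
detrended = y - (b0 + b1 * t)
s_hat = np.array([detrended[d::7].mean() for d in range(7)])
```

The fitted slope lands near the true 0.1 and the per-day averages recover the weekly shape, which is exactly the decomposition the Prophet formula expresses.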
Recurrent Neural Networks (RNNs):
• Designed to handle sequential data.
• Time series are sequential data, which makes RNNs a natural fit.
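A minimal recurrent cell written from scratch shows how the hidden state carries sequence information forward (the dimensions and random weights here are arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
# A minimal recurrent cell: h(t) = tanh(W x(t) + U h(t-1) + b)
n_in, n_hidden = 1, 8
W = rng.normal(scale=0.5, size=(n_hidden, n_in))
U = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

def rnn_forward(xs):
    """Run the cell over a sequence, carrying the hidden state forward."""
    h = np.zeros(n_hidden)
    for x in xs:
        h = np.tanh(W @ np.atleast_1d(x) + U @ h + b)
    return h   # a fixed-size summary of the whole sequence

h = rnn_forward([0.1, 0.2, 0.3, 0.4])
```

Because the same weights are reused at every step, the cell handles sequences of any length, which is what makes the architecture suitable for time series.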
Chapter 3 - Time Series as Regression:
To use time series data as supervised data for a regressor, we use what is called the sliding window method: take t1, t2, t3 as x and t4 as y, so that t1, t2, t3 predict t4; then move the window ahead, so that t2, t3, t4 predict t5. This is how we generate the samples for supervised learning.
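The sliding window described above can be sketched in a few lines:

```python
def sliding_window(series, width):
    """Turn a series into (X, y) pairs: `width` past values predict the next."""
    X, y = [], []
    for i in range(len(series) - width):
        X.append(series[i:i + width])
        y.append(series[i + width])
    return X, y

X, y = sliding_window([10, 20, 30, 40, 50], width=3)
# X = [[10, 20, 30], [20, 30, 40]] and y = [40, 50]
```

Each row of X with its y value is one supervised sample, ready for any ordinary regressor.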
Chapter 4 - Feature Engineering:
Chapter 5 - Common Problems faced in Regression:
2. Lag and Rolling Features: Pandas' rolling-window mean includes the current timestamp by default, which leaks the target directly into the rolling mean. We therefore shift the series by one step before applying the rolling function; this guarantees the target is not leaked.
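A small sketch of the leak and its fix on a toy series:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Leaky: at index 3 this is mean(2, 3, 4), which includes the target s[3] = 4
leaky = s.rolling(3).mean()

# Safe: shift by one first, so index 3 only sees mean(1, 2, 3)
safe = s.shift(1).rolling(3).mean()

print(leaky[3], safe[3])   # prints 3.0 2.0
```

The shifted version uses strictly past values, so the feature at time t never contains the value being predicted at time t.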
4. Direct Forecast: A single multi-output model has advantages over a multi-model direct forecast: forecasting 18 months ahead with one model per horizon step means maintaining 18 models, which is difficult to manage. When the regressor supports multiple targets (chiefly in the neural network world), a single model can take the past data and directly predict t+1, t+2, and t+3.
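A minimal sketch of one multi-target model doing direct forecasting, using plain least squares (which accepts a matrix of targets) in place of a neural network; the series and window sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(5)
series = np.sin(np.arange(200) / 5.0) + rng.normal(scale=0.05, size=200)

# Build samples: 6 past values predict the next 3 values (t+1, t+2, t+3)
width, horizon = 6, 3
n_samples = len(series) - width - horizon + 1
X = np.array([series[i:i + width] for i in range(n_samples)])
Y = np.array([series[i + width:i + width + horizon] for i in range(n_samples)])

# One linear model with 3 outputs: lstsq fits all targets at once
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = series[-width:] @ coef   # direct forecast of the next 3 steps
```

One weight matrix serves all three horizons, which is the managerial advantage over training a separate model per step.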
Chapter 6 - Forecasting at Scale:
Automated Model Selection (AutoML) and Hyperparameter Tuning:
Our default choice for model selection is grid search, on top of which we do hyperparameter tuning. Trying out all possible hyperparameter combinations is computationally expensive for large datasets.
Random Search:
It's basically like throwing darts in a dark room and hoping one hits the target.
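A toy comparison of the two strategies, with a made-up objective function standing in for training and validating a model:

```python
import random

def objective(lr, depth):
    """Toy validation score; stands in for training + evaluating a model."""
    return -(lr - 0.1) ** 2 - 0.01 * (depth - 5) ** 2

# Grid search: every combination (the cost multiplies with each parameter)
grid = [(lr, d) for lr in [0.01, 0.1, 1.0] for d in [3, 5, 7]]
best_grid = max(grid, key=lambda p: objective(*p))

# Random search: a fixed budget of random draws from the same ranges
random.seed(0)
trials = [(random.uniform(0.01, 1.0), random.randint(3, 7))
          for _ in range(20)]
best_rand = max(trials, key=lambda p: objective(*p))
```

Grid search is exhaustive but its cost explodes with each added parameter; random search spends a fixed budget and often lands close to the optimum anyway, which is why it is the common fallback for large datasets.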
- This is the prior belief in the Bayesian world.
3. Now we sample three parameters from these prior distributions and record the value of the metric. Based on that value, we update the distributions of the three parameters; this is called the posterior.
4. We repeat this for N iterations, each iteration sharpening the posterior and moving closer to the optimal solution.
Sample Code:
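A toy sketch of the sample, score, and update loop, assuming a single hyperparameter and a Gaussian "belief" in place of a real surrogate model (a production Bayesian optimizer would use something like a Gaussian process):

```python
import random

random.seed(1)

def metric(x):
    """Toy validation metric to maximize; the optimum sits at x = 2.0."""
    return -(x - 2.0) ** 2

# The prior belief over the hyperparameter: a broad Gaussian
mu, sigma = 0.0, 4.0

for _ in range(25):
    # Sample candidates from the current belief and score them
    candidates = [random.gauss(mu, sigma) for _ in range(3)]
    best = max(candidates, key=metric)
    # Update the belief toward the best candidate (the "posterior"),
    # sharpening it on every iteration
    mu = 0.5 * mu + 0.5 * best
    sigma *= 0.8

# mu has moved close to the optimum at 2.0, and sigma has shrunk
```

Each iteration both re-centers and narrows the belief, which is the "sharpening posterior" behaviour the steps above describe.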
Pooled vs Un-pooled Model:
Chapter 7 - Time Series Segmentation:
Chapter 8 - Unsupervised Clustering for Time Series:
o Euclidean Distance
• Sensitive to time shifts
• Only works for equal-length time series
Autoencoders:
o Use a deep neural network to learn a feature representation of a time series.
o Apply normal clustering algorithms to those representations.
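A dependency-free sketch of the Euclidean case: with equal-length series, each series is just a point in R^n and plain k-means applies directly (the two synthetic groups below are made up; tslearn's TimeSeriesKMeans covers the general case, including DTW):

```python
import numpy as np

rng = np.random.default_rng(6)
# Two groups of equal-length series (Euclidean distance requires equal length)
group_a = rng.normal(loc=0.0, scale=0.2, size=(5, 30))
group_b = rng.normal(loc=3.0, scale=0.2, size=(5, 30))
series = np.vstack([group_a, group_b])

# Plain k-means with k = 2, treating each 30-point series as a 30-dim point
centers = series[[0, 5]].copy()   # seed one center per group for determinism
for _ in range(10):
    dists = np.linalg.norm(series[:, None] - centers[None], axis=2)
    labels = dists.argmin(axis=1)
    centers = np.array([series[labels == k].mean(axis=0) for k in range(2)])
```

This recovers the two groups cleanly, but a time-shifted copy of a series would land far away in Euclidean distance, which is exactly the sensitivity noted above.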
Python Libraries:
tslearn implements the classical time series clustering algorithms. Autoencoders can be coded using any of the popular deep learning frameworks.
From Point Forecast to Probabilistic Forecast:
Chapter 9 - Conclusion:
1.> For large-scale data, neural network models give the best results.
2.> For small datasets, a precisely hyperparameter-tuned model performs best.
3.> Multivariate models can outperform univariate models on larger datasets.
Chapter 10 - Future Challenges and Research Problems:
1.> Self-optimization of the model according to the kind of dataset given or available.
2.> Optimizing pretrained models and customizing them to fit the dynamic nature of the data.
3.> Data analysis still misses insights because access to classified data is restricted. Data should be openly available for analysis without violating personal identity.