We would like to express our deepest gratitude to Mr. Kartik Muskula and Ms. Anavadhya for
their invaluable patience and feedback. Thank you for entrusting us with this opportunity.
We could not have undertaken this journey without your guidance and expertise.
Business Objective
Oil prices can move sharply in response to a single market event; they are rarely driven by real-time data alone and are instead shaped by externalities, which makes forecasting them especially challenging. Because the broader economy is strongly affected by oil prices, our model aims to uncover patterns in historical prices and help customers and businesses make smart decisions.
Introduction
The volatility and complexity of global oil markets make predicting oil prices a
challenging yet crucial task for various stakeholders, including investors, policymakers,
and industry professionals. In recent years, data science has emerged as a powerful tool
to analyze historical trends, identify patterns, and build predictive models to forecast
future oil prices.
This data science project aims to leverage advanced analytics and machine learning
techniques to develop an accurate and reliable model for predicting oil prices. The
project involves collecting and preprocessing a diverse set of data that influence the
energy market.
PROJECT ARCHITECTURE / PROJECT FLOW
Business Understanding → Data Collection → Exploratory Data Analysis → Visualization → Model Deployment
DATA COLLECTION
• We have taken Data from the site www.eia.gov Shape of dataset:- (456, 2)
• Here we have taken date as Independent variable and COSP Data Types:- Date datetime64[ns] COSP float64
as the Dependent variable
mean 46.886338
std 29.567799
min 11.350000
25% 20.085000
50% 38.170000
75% 70.375000
max 133.880000
DATASET
Observations:
1. From 1986 to 1998 there is not much variance, but after that there is a lot of variance with some extreme highs and lows.
2. Post 1998, there is an increasing trend in the crude oil price.
3. The minimum price of $11.35/barrel occurred on 15th December 1998, and the maximum of $133.88/barrel on 15th June 2008.
4. Some peak values are observed in 2014.
5. In April 2020, there are dips in the price of crude oil.
Final Observation: Since 1998 the crude oil price has increased, with peak values observed in 2008, 2013, 2014 and 2022.
LINE PLOT
Observations:
The line plot also confirms that the increase in price is significantly high around the years 2005 and 2006.
Lag Plot (Monthly)
We will do a sequential split because the order of observations must remain intact in a time series dataset used for forecasting.
We have 38 years of data in total, of which the last 2 years are used as the test set and the remaining 36 years as the training set. The final model is trained on the entire dataset before making predictions.
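The sequential split described above can be sketched as follows (a minimal sketch; the toy series and the `sequential_split` helper are illustrative, not the project's actual code):

```python
import pandas as pd

def sequential_split(series: pd.Series, test_months: int = 24):
    """Split a time series into train/test without shuffling,
    keeping the last `test_months` observations for testing."""
    train = series.iloc[:-test_months]
    test = series.iloc[-test_months:]
    return train, test

# Toy monthly series with 456 observations, as in the dataset
idx = pd.date_range("1986-01-01", periods=456, freq="MS")
cosp = pd.Series(range(456), index=idx, name="COSP")

train, test = sequential_split(cosp, test_months=24)  # 432 train, 24 test
```

Because the split preserves order, every training timestamp precedes every test timestamp, which is what forecasting evaluation requires.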
TIME SERIES DECOMPOSITION
• Trend: A trend is said to exist when there is a long-term increase or decrease in the time series data.
• Seasonality: A seasonal pattern is observed when a time series is affected by seasonal factors occurring yearly, monthly, daily, etc. Seasonality refers to a known frequency, e.g. quarterly = 4, monthly = 12.
To extract the trend, seasonality and error, we used the decompose() and forecast::stl() functions to split our time series into seasonality, trend and error components.
Observation:
1. Actual data
2. Trend: there is an increasing trend.
3. Seasonality: there is seasonality associated with the data.
4. Resid: the residual is what remains after the two major components (trend and seasonality) have been removed.
TEST FOR STATIONARITY
Assumptions of the ARIMA model
∙ Data should be stationary: A time series is said to be stationary if its statistical properties such as mean, variance and autocorrelation are constant over time, implying it has no trend or seasonal effect. If a trend is present and stationarity is not evident, many of the computations throughout the process cannot produce the intended results. Statistical modelling methods assume or require the time series to be stationary to be effective.
∙ Data should be univariate: ARIMA works on a single variable (autoregression is regression with past values).
Conclusion: Our time series is Non-stationary. We need to make our non-Stationary time series to
Stationary before we begin model building.
Now we will apply transformations to the non-stationary time series and check with ADF test if the
time series has become stationary.
The stationary time series will help to analyze the data further and understand the variance in the
data.
❖ Square root transformation
❖ Log transformation
❖ Differencing:
Here we compute the differences between consecutive observations in the time series. If Yt denotes the value of the time series Y at period t, then the first-order difference of Y at period t is Y't = Yt - Yt-1. Differencing is done to get rid of the varying mean.
Observation:
We can see that p-value is 6.590164e-18, Since p-value < 0.05, we reject the Null hypothesis.
Conclusion: Our time series is Stationary. It means it does not have any Trend and Seasonality. The
data does not depend on the time when it is captured. We can use this data for Model Building.
MODEL BASED METHODS (MONTHLY DATA)
Observation: The linear model performed well on the raw data, followed by the ordinary least squares model with multiplicative-additive seasonality.
DATA DRIVEN MODELS (MONTHLY DATA)
Triple_Exp_Add_Mul 88.349556
RMSE_ARIMA = 21.84
SARIMA MODEL (MONTHLY DATA)
RMSE_SARIMA = 22.43
PROPHET MODEL (MONTHLY DATA)
m = Prophet()
m.fit(train)
future = m.make_future_dataframe(24, freq='M')  # 'MS' for monthly, 'H' for hourly
forecast = m.predict(future)

Forecast (last five rows):
rec no  ds        yhat       yhat_lower  yhat_upper
451     31/07/23  20.857337   3.272883   38.198375
452     31/08/23  48.247309  31.433898   66.395961
453     30/09/23  57.71588   40.133683   73.89622
454     31/10/23  67.195082  50.459086   82.728286
455     30/11/23  27.835174  10.847369   45.154253
RMSE_Prophet = 47.93
PROPHET MODEL CONTD
LSTM MODEL (MONTHLY DATA)
• The Long Short-Term Memory network, or LSTM network, is a recurrent neural network trained
using Backpropagation Through Time that overcomes the vanishing gradient problem. It can be
applied to time series forecasting.
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.preprocessing.sequence import TimeseriesGenerator

# define generator: input window of 12 months, one feature (price)
n_input = 12
n_features = 1
generator = TimeseriesGenerator(scaled_train, scaled_train,
                                length=n_input, batch_size=1)

# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_input, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
LSTM MODEL (MONTHLY DATA)
RMSE_LSTM = 5.45
SILVERKITE MODEL (MONTHLY DATA)
Silverkite is a forecasting algorithm developed by LinkedIn. It supports different kinds of growth,
interactions, and fitting algorithms.
# specify dataset information
metadata = MetadataParam(
    time_col="ds",   # name of the time column
    value_col="y",   # name of the value column
    freq="M",        # "H" for hourly, "D" for daily, "M" for monthly, etc.
    train_end_date=pd.to_datetime('2021-06-30'))

forecaster = Forecaster()  # creates forecasts and stores the result
result = forecaster.run_forecast_config(  # result is also stored as `forecaster.forecast_result`
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=24,  # forecasts 2 years ahead
        coverage=0.95,        # 95% prediction intervals
        metadata_param=metadata))

RMSE_Silverkite = 9.93
SILVERKITE MODEL (CONTD)
ALL MODELS WITH RMSE VALUES
Error Evaluation: Root Mean Square Error (RMSE)

Sr No  Model               RMSE
1      RMSE_LSTM            5.44
2      RMSE_SILVERKITE      9.93
3      Triple_Exp_Mul      11.76
4      Simple_Exp          13.11
5      Triple_Exp_Mul_Add  13.21
6      Triple_Exp_Add      13.26
7      Double_Exp          13.62
8      RMSE_Linear         14.12
9      RMSE_Mult_Add_Sea   15.91
10     RMSE_Exp            16.07
11     RMSE_Add_Sea_Quad   20.51
12     RMSE_Quad           20.71
13     RMSE_ARIMA          21.84
14     RMSE_SARIMA         22.43
15     RMSE_Add_Sea        43.31
16     RMSE_PROPHET        47.93
17     RMSE_Mult_Sea       51.37
18     Triple_Exp_Add_Mul  88.35

Observation: The LSTM model performed quite well on the raw data, followed by Silverkite and the Holt-Winters triple exponential smoothing model with multiplicative trend and seasonality.
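The RMSE metric used to rank the models above can be computed as follows (a minimal sketch; `y_true` and `y_pred` are placeholders for the test values and a model's forecasts):

```python
import numpy as np

def rmse(y_true, y_pred) -> float:
    """Root Mean Square Error between actuals and forecasts:
    sqrt of the mean of squared errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy usage with three actuals and three forecasts
score = rmse([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Because errors are squared before averaging, RMSE penalizes large misses more heavily than mean absolute error, which suits price forecasting where big errors are costly.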
MODEL DEPLOYMENT
• We used LSTM Model for deployment.
Final Model Graph
MODEL DEPLOYMENT (CONTD)
• Limited Predictive Power: Date alone does not provide sufficient information to predict oil
prices with a high degree of accuracy. While historical price trends may exhibit certain patterns
over time, these patterns may not necessarily persist in the future. The lack of contextual
information limits the model's predictive power and makes it susceptible to random
fluctuations or noise in the data.
• The ARIMA model source code was slow to execute, so we used Google Colaboratory to develop the code.
Thank you