0% found this document useful (0 votes)

48 views11 pages

Champagne Sales Data Analysis

Arima model implemented and practiced

Uploaded by

zainali.x21

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

48 views11 pages

Champagne Sales Data Analysis

Arima model implemented and practiced

Uploaded by

zainali.x21

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

# import the libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# read the dataset

df = pd.read_csv("perrin-freres-monthly-champagne-.csv")

# show the first five rows

df.head()

Month Perrin Freres monthly champagne sales millions ?64-?72

0 1964-01 2815.0
1 1964-02 2672.0
2 1964-03 2755.0
3 1964-04 2721.0
4 1964-05 2946.0

# show the last five rows

df.tail()

Month \
102 1972-07
103 1972-08
104 1972-09
105 NaN
106 Perrin Freres monthly champagne sales millions...

Perrin Freres monthly champagne sales millions ?64-?72

102 4298.0
103 1413.0
104 5877.0
105 NaN
106 NaN

# change the column names

df.columns = ['Month', 'Sales']

df.head()

Month Sales
0 1964-01 2815.0
1 1964-02 2672.0
2 1964-03 2755.0
3 1964-04 2721.0
4 1964-05 2946.0

# drop the missing rows

df.dropna(inplace = True)
# check the data again !
df.tail()

Month Sales
100 1972-05 4618.0
101 1972-06 5312.0
102 1972-07 4298.0
103 1972-08 1413.0
104 1972-09 5877.0

# convert Month into DateTime

df['Month'] = pd.to_datetime(df['Month'])

df.head()

Month Sales
0 1964-01-01 2815.0
1 1964-02-01 2672.0
2 1964-03-01 2755.0
3 1964-04-01 2721.0
4 1964-05-01 2946.0

# remove the index

df.set_index('Month', inplace = True)

df.head()

Sales
Month
1964-01-01 2815.0
1964-02-01 2672.0
1964-03-01 2755.0
1964-04-01 2721.0
1964-05-01 2946.0

# let's discuss some statistical analysis of data

df.describe()

Sales
count 105.000000
mean 4761.152381
std 2553.502601
min 1413.000000
25% 3113.000000
50% 4217.000000
75% 5221.000000
max 13916.000000

# check the null values

df.isnull().sum()
Sales 0
dtype: int64

# to see the shape of data

df.shape

(105, 1)

# to get information of data

df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 105 entries, 1964-01-01 to 1972-09-01
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Sales 105 non-null float64
dtypes: float64(1)
memory usage: 1.6 KB

visualization of Data
df.plot()

<Axes: xlabel='Month'>
# Test wheather the data is stationary or non-stationary !
from statsmodels.tsa.stattools import adfuller

test_result = adfuller(df['Sales'])

def adfuller_test(sales):
result = adfuller(sales)
label = ['ADF test Statistic', 'p2-value', '#Lags used', 'Number
of observation used']
for value, label in zip(result, label):
print(label + " : ", str(value))
if result[1] <= 0.05 :
print('It is Stationary')
else:
print("It is not Stationary")

adfuller_test(df['Sales'])

ADF test Statistic : -1.8335930563276195

p2-value : 0.3639157716602467
#Lags used : 11
Number of observation used : 93
It is not Stationary
Since the data is non-stationary, we must have to make it
stationary. Let's Go !

Differencing ....!
df['Sales First Difference'] = df["Sales"] - df['Sales'].shift(1)

df['Seasonal First Difference'] = df['Sales'] - df['Sales'].shift(12)

df.head(20)

Sales Sales First Difference Seasonal First Difference

Month
1964-01-01 2815.0 NaN NaN
1964-02-01 2672.0 -143.0 NaN
1964-03-01 2755.0 83.0 NaN
1964-04-01 2721.0 -34.0 NaN
1964-05-01 2946.0 225.0 NaN
1964-06-01 3036.0 90.0 NaN
1964-07-01 2282.0 -754.0 NaN
1964-08-01 2212.0 -70.0 NaN
1964-09-01 2922.0 710.0 NaN
1964-10-01 4301.0 1379.0 NaN
1964-11-01 5764.0 1463.0 NaN
1964-12-01 7312.0 1548.0 NaN
1965-01-01 2541.0 -4771.0 -274.0
1965-02-01 2475.0 -66.0 -197.0
1965-03-01 3031.0 556.0 276.0
1965-04-01 3266.0 235.0 545.0
1965-05-01 3776.0 510.0 830.0
1965-06-01 3230.0 -546.0 194.0
1965-07-01 3028.0 -202.0 746.0
1965-08-01 1759.0 -1269.0 -453.0

# Again test the dicky fuller test

adfuller_test(df['Seasonal First Difference'].dropna())

ADF test Statistic : -7.626619157213166

p2-value : 2.0605796968136632e-11
#Lags used : 0
Number of observation used : 92
It is Stationary

df['Seasonal First Difference'].plot()

<Axes: xlabel='Month'>
Auto-Regressive Model
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(df['Sales'])
plt.show()
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

import matplotlib.pyplot as plt

import statsmodels.api as sm

fig = plt.figure(figsize=(12, 8))

# Create the first subplot for ACF

ax1 = fig.add_subplot(211)
sm.graphics.tsa.plot_acf(df['Seasonal First Difference'].dropna(),
lags=40, ax=ax1)
ax1.set_title('ACF Plot')

# Create the second subplot for PACF

ax2 = fig.add_subplot(212)
sm.graphics.tsa.plot_pacf(df['Seasonal First Difference'].dropna(),
lags=40, ax=ax2)
ax2.set_title('PACF Plot')

# Display the plots

plt.tight_layout()
plt.show()
# for non-seasonal data

from statsmodels.tsa.arima_model import ARIMA

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Assuming 'df' is your DataFrame and 'Sales' is the column you're

modeling
# Define the ARIMA model, here (1, 1, 1) refers to (p, d, q)
model = ARIMA(df['Sales'], order=(1, 1, 1))

# Fit the ARIMA model

model_fit = model.fit()

# Print the summary of the model

print(model_fit.summary())

C:\Users\Multan IT\anaconda3\Lib\site-packages\statsmodels\tsa\base\
tsa_model.py:473: ValueWarning: No frequency information was provided,
so inferred frequency MS will be used.
self._init_dates(dates, freq)
C:\Users\Multan IT\anaconda3\Lib\site-packages\statsmodels\tsa\base\
tsa_model.py:473: ValueWarning: No frequency information was provided,
so inferred frequency MS will be used.
self._init_dates(dates, freq)
C:\Users\Multan IT\anaconda3\Lib\site-packages\statsmodels\tsa\base\
tsa_model.py:473: ValueWarning: No frequency information was provided,
so inferred frequency MS will be used.
self._init_dates(dates, freq)

SARIMAX Results

======================================================================
========
Dep. Variable: Sales No. Observations:
105
Model: ARIMA(1, 1, 1) Log Likelihood
-952.814
Date: Sun, 29 Sep 2024 AIC
1911.627
Time: 00:19:15 BIC
1919.560
Sample: 01-01-1964 HQIC
1914.841
- 09-01-1972

Covariance Type: opg

======================================================================
========
coef std err z P>|z| [0.025
0.975]
----------------------------------------------------------------------
--------
ar.L1 0.4545 0.114 3.999 0.000 0.232
0.677
ma.L1 -0.9666 0.056 -17.316 0.000 -1.076
-0.857
sigma2 5.226e+06 6.17e+05 8.473 0.000 4.02e+06
6.43e+06
======================================================================
=============
Ljung-Box (L1) (Q): 0.91 Jarque-Bera (JB):
2.59
Prob(Q): 0.34 Prob(JB):
0.27
Heteroskedasticity (H): 3.40 Skew:
0.05
Prob(H) (two-sided): 0.00 Kurtosis:
3.77
======================================================================
=============

Warnings:
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).

model = sm.tsa.statespace.SARIMAX(df['Sales'], order = (1,1,1),

seasonal_order = (1,1,1,12))
results = model.fit()

df['Forecast'] = results.predict(start = 90, end = 103, dynamic =

True)
df[['Sales', 'Forecast']].plot(figsize = (12,8))

<Axes: xlabel='Month'>
Best Of Luck ...!!!

Time Series Analysis
No ratings yet
Time Series Analysis
5 pages
Time Series
No ratings yet
Time Series
38 pages
Time Series Methods Introduction Guide
No ratings yet
Time Series Methods Introduction Guide
34 pages
Time Series Analysis of Air Passengers
No ratings yet
Time Series Analysis of Air Passengers
9 pages
q3 R Software
No ratings yet
q3 R Software
12 pages
Time Series Forecasting Business Report
No ratings yet
Time Series Forecasting Business Report
42 pages
Liquor Sales Forecasting Model Guide
No ratings yet
Liquor Sales Forecasting Model Guide
1 page
Q3AB
No ratings yet
Q3AB
15 pages
Time - Series - Forecasting Using Teleco Telecom Revenue
No ratings yet
Time - Series - Forecasting Using Teleco Telecom Revenue
27 pages
Q3a Q3B
No ratings yet
Q3a Q3B
13 pages
HDFC Bank Time Series Analysis
No ratings yet
HDFC Bank Time Series Analysis
10 pages
Dev Lab Record
No ratings yet
Dev Lab Record
21 pages
Heathrow Sunshine Time Series Analysis
No ratings yet
Heathrow Sunshine Time Series Analysis
19 pages
Assignment 3 Teleco Telecom Revenue - Copy1
No ratings yet
Assignment 3 Teleco Telecom Revenue - Copy1
33 pages
Apoorva P 17th March TSF
No ratings yet
Apoorva P 17th March TSF
47 pages
Projected Stock Price Analysis 2005-2008
No ratings yet
Projected Stock Price Analysis 2005-2008
17 pages
Sunspot Time Series Data Analysis
No ratings yet
Sunspot Time Series Data Analysis
31 pages
Walmart Store Sales Data Analysis
No ratings yet
Walmart Store Sales Data Analysis
18 pages
Practical No 6
No ratings yet
Practical No 6
5 pages
ARIMA Model for Time Series in Python
No ratings yet
ARIMA Model for Time Series in Python
11 pages
Daily Minimum Temperatures Data Analysis
No ratings yet
Daily Minimum Temperatures Data Analysis
13 pages
Alizing Time Series Data in Python
No ratings yet
Alizing Time Series Data in Python
47 pages
Wine Sales Time Series Forecasting
No ratings yet
Wine Sales Time Series Forecasting
18 pages
Ibd Manual
No ratings yet
Ibd Manual
12 pages
Data Cleaning Techniques in Python
No ratings yet
Data Cleaning Techniques in Python
17 pages
Data Science Unit 2 Second Half Notes
No ratings yet
Data Science Unit 2 Second Half Notes
18 pages
Pandas Guide for Beginners
No ratings yet
Pandas Guide for Beginners
18 pages
Quick Start Guide to Pandas
No ratings yet
Quick Start Guide to Pandas
26 pages
Shoe Sales Time Series Analysis
100% (3)
Shoe Sales Time Series Analysis
105 pages
Modelo Arima
No ratings yet
Modelo Arima
11 pages
Numpy Boolean Indexing: Filter
No ratings yet
Numpy Boolean Indexing: Filter
39 pages
Walmart Weekly Sales Analysis Notebook
No ratings yet
Walmart Weekly Sales Analysis Notebook
16 pages
Pandas
No ratings yet
Pandas
18 pages
Time Series Using Python
No ratings yet
Time Series Using Python
18 pages
Seasonal Adjustment Techniques Explained
No ratings yet
Seasonal Adjustment Techniques Explained
81 pages
Lab Record Dev
No ratings yet
Lab Record Dev
20 pages
Australian Beer Production Forecasting
No ratings yet
Australian Beer Production Forecasting
19 pages
DS (Pandas)
No ratings yet
DS (Pandas)
17 pages
Timeseries Forecasting Assignment - Rose
No ratings yet
Timeseries Forecasting Assignment - Rose
1,329 pages
GDP Forecasting Using ARIMA Model
No ratings yet
GDP Forecasting Using ARIMA Model
10 pages
Acknowledgment and Practical File for Informatics
No ratings yet
Acknowledgment and Practical File for Informatics
46 pages
Granger Causality Analysis in Python
No ratings yet
Granger Causality Analysis in Python
1 page
Pandas Quick Start Guide
No ratings yet
Pandas Quick Start Guide
23 pages
Class 12 Informatics Practices File
No ratings yet
Class 12 Informatics Practices File
23 pages
Gas Prod
100% (3)
Gas Prod
24 pages
Quick Start Guide to Pandas
No ratings yet
Quick Start Guide to Pandas
24 pages
Unit3 - 3) Pandas - Ipynb - Colab
No ratings yet
Unit3 - 3) Pandas - Ipynb - Colab
11 pages
Practical File Ip
No ratings yet
Practical File Ip
27 pages
TSF - Rose Data
No ratings yet
TSF - Rose Data
31 pages
Pandas Library
No ratings yet
Pandas Library
5 pages
Module 3
No ratings yet
Module 3
5 pages
DataFrame Operations with Pandas
No ratings yet
DataFrame Operations with Pandas
18 pages
11.2 Pandas
No ratings yet
11.2 Pandas
24 pages
Time Series Analysis with R: Stationarity
No ratings yet
Time Series Analysis with R: Stationarity
9 pages
Pandas DataFrame Notes - 12pages-Pages-7
No ratings yet
Pandas DataFrame Notes - 12pages-Pages-7
1 page
Time Series Sales Analysis and Trends
No ratings yet
Time Series Sales Analysis and Trends
31 pages
Rainfall Data for Indian Subdivisions
No ratings yet
Rainfall Data for Indian Subdivisions
31 pages
Mit18 443S15 Lec15
No ratings yet
Mit18 443S15 Lec15
30 pages
Understanding Statistics and Data Analysis
0% (1)
Understanding Statistics and Data Analysis
74 pages
MLB Salary Regression Analysis
No ratings yet
MLB Salary Regression Analysis
3 pages
Newbold Chapter 7
No ratings yet
Newbold Chapter 7
62 pages
Time Series Forecasting Basics
No ratings yet
Time Series Forecasting Basics
20 pages
39.01 IA Criteria
No ratings yet
39.01 IA Criteria
3 pages
09 Handout 1
No ratings yet
09 Handout 1
8 pages
Day 2: CRUSH The Statistics Part of The Data Science Interview ?
No ratings yet
Day 2: CRUSH The Statistics Part of The Data Science Interview ?
7 pages
Exact Inference in Bayesian Networks
No ratings yet
Exact Inference in Bayesian Networks
13 pages
Correlation and Its Significance
No ratings yet
Correlation and Its Significance
15 pages
Bayesian Econometrics Overview
No ratings yet
Bayesian Econometrics Overview
12 pages
Understandable Statistics 12th Edition Brase C.H. Ebook Updated File Version
100% (1)
Understandable Statistics 12th Edition Brase C.H. Ebook Updated File Version
94 pages
Lampiran Hasil Analisis Jalur Dengan Lisrel
No ratings yet
Lampiran Hasil Analisis Jalur Dengan Lisrel
7 pages
Bivariate Analysis: Correlation & Regression
No ratings yet
Bivariate Analysis: Correlation & Regression
10 pages
CFAI CFA2 Quant Practice Question M4 25ver
No ratings yet
CFAI CFA2 Quant Practice Question M4 25ver
10 pages
Correlation and Regression in Statistics
No ratings yet
Correlation and Regression in Statistics
23 pages
Sas Sur
No ratings yet
Sas Sur
9 pages
L11-L12 Assessment Regression
No ratings yet
L11-L12 Assessment Regression
5 pages
Unit 5
No ratings yet
Unit 5
18 pages
A Project Report On Forecasting of Rainfall in India
100% (1)
A Project Report On Forecasting of Rainfall in India
35 pages
Statistical Analysis of Smoking and Lung Capacity
No ratings yet
Statistical Analysis of Smoking and Lung Capacity
7 pages
Chapter 15 CRAVEN SALES MODEL - Multiple Regression
No ratings yet
Chapter 15 CRAVEN SALES MODEL - Multiple Regression
19 pages
Categorical Variable Modeling in Stata
No ratings yet
Categorical Variable Modeling in Stata
16 pages
Survival Data Regression Models
No ratings yet
Survival Data Regression Models
21 pages
MACF Dan MPACF (Tiao - Box1981)
No ratings yet
MACF Dan MPACF (Tiao - Box1981)
16 pages
Basic Statistics & Measurement Levels
No ratings yet
Basic Statistics & Measurement Levels
4 pages
Stata Ordered Logistic Regression Guide
No ratings yet
Stata Ordered Logistic Regression Guide
32 pages
W02 Case Study-Math124 - Doc - W02CaseStudyOneSampleZ - Osazee
No ratings yet
W02 Case Study-Math124 - Doc - W02CaseStudyOneSampleZ - Osazee
8 pages
Correlation and Linear Regression Guide
No ratings yet
Correlation and Linear Regression Guide
23 pages
MAT2337 December 2010 Final Exam
No ratings yet
MAT2337 December 2010 Final Exam
11 pages