
Project – Time Series Forecasting

Sparkling Dataset

SAI RAMAKANTH
Problem 1: Time Series Forecasting

For this particular assignment, the data of different types of wine sales in the 20th century is to be analyzed. Both of these data sets are from the same company but for different wines. As an analyst at ABC Estate Wines, you are tasked to analyze and forecast wine sales in the 20th century.

Data set for the Problem: Sparkling.csv

Q 1.1. Read the data as an appropriate Time Series data and plot the data.

Answer 1.1 :

Step a : Import the libraries

Step b : Reading/loading the data as an appropriate time series and checking the head of the data :

Observation : The time series dataset looks well formed, based on the first 5 records.
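
The report does not show the loading code, so the following is only a minimal sketch of Step b using the Python standard library. The column names `YearMonth` and `Sparkling`, the `YYYY-MM` date format, and the three sample values are assumptions made for illustration; they are not taken from Sparkling.csv.

```python
import csv
import io
from datetime import datetime

# Stand-in for Sparkling.csv. Column names and values are made up.
SAMPLE_CSV = """YearMonth,Sparkling
1980-01,1500
1980-02,1600
1980-03,2300
"""

def read_monthly_series(handle, date_col="YearMonth", value_col="Sparkling"):
    """Return a list of (month_start_date, value) pairs sorted by date."""
    rows = []
    for record in csv.DictReader(handle):
        # Parse "YYYY-MM" into a real date so the data is ordered in time.
        month_start = datetime.strptime(record[date_col], "%Y-%m").date()
        rows.append((month_start, float(record[value_col])))
    rows.sort(key=lambda pair: pair[0])
    return rows

series = read_monthly_series(io.StringIO(SAMPLE_CSV))

# Equivalent of checking the head of the data.
for month_start, sales in series[:5]:
    print(month_start.isoformat(), sales)
```

With pandas, a comparable result would come from `pd.read_csv("Sparkling.csv", parse_dates=["YearMonth"], index_col="YearMonth")`, again assuming that column name.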

Step c : Plotting the time series


Q 1.2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.

Answer 1.2 :

Performing EDA to understand the data :

#Describing the data :

There are 187 rows and 1 column in the dataset.

Plotting a boxplot to understand the sales of Sparkling wine across different years and within different months across years :


The above picture shows the trend over the period from 1980 to 1995, with outliers in most of the years.

#Monthly boxplot :

The above picture shows sales by month: December seems to have the highest sales and January the lowest.
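
The monthly comparison can also be checked numerically. The sketch below groups observations by calendar month and takes the median across years, which is the centre line a monthly boxplot would draw. The helper name `median_by_month` and the two-year toy series are hypothetical; the numbers are invented and not from the dataset.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def median_by_month(series):
    """Map calendar month (1-12) to the median of its values across years."""
    buckets = defaultdict(list)
    for month_start, value in series:
        buckets[month_start.month].append(value)
    return {month: median(values) for month, values in sorted(buckets.items())}

# Toy two-year series (made-up numbers): sales rise steadily through the year.
toy = [(date(year, month, 1), 1000 + 100 * month + 10 * (year - 1980))
       for year in (1980, 1981) for month in range(1, 13)]

medians = median_by_month(toy)
highest = max(medians, key=medians.get)  # month with the largest median
lowest = min(medians, key=medians.get)   # month with the smallest median
print(highest, lowest)
```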

#Graphical monthplot of the given Time Series :


The above picture shows the monthly graphical representation.

#Plotting a graph of monthly sales across years :

 1987 had the highest sales of all years in December.

 1995 January had the least sales

Trend and seasonality are present in the multiplicative decomposition, just as in the additive one. However, the residual plot clearly shows the values concentrated around 1. Hence it can be concluded that the series is multiplicative.
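
To make the residual argument concrete, here is a minimal sketch of a classical multiplicative decomposition (value = trend × seasonal × residual) with a centred moving average, written with the standard library only. The report does not show which routine it used; `statsmodels.tsa.seasonal.seasonal_decompose(series, model="multiplicative")` is one library option. The function name, the even-period assumption, and the synthetic series below are all illustrative assumptions.

```python
def multiplicative_decompose(values, period=12):
    """Classical decomposition for an even period; needs at least 2 full periods."""
    n = len(values)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]  # period + 1 points
        # Centred moving average: the two end points get half weight.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    # Average the detrended ratios for each position within the period.
    ratios = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            ratios[t % period].append(values[t] / trend[t])
    raw = [sum(r) / len(r) for r in ratios]
    scale = sum(raw) / period  # normalise so the indices average to 1
    seasonal = [x / scale for x in raw]
    residual = [None if trend[t] is None
                else values[t] / (trend[t] * seasonal[t % period])
                for t in range(n)]
    return trend, seasonal, residual

# Synthetic series (made up): gentle upward trend times a fixed seasonal
# pattern that peaks in the last month of each year.
pattern = [0.7, 0.8, 0.9, 0.9, 0.9, 0.9, 1.0, 1.0, 1.0, 1.1, 1.3, 1.5]
values = [(1000 + 2 * t) * pattern[t % 12] for t in range(48)]

trend, seasonal, residual = multiplicative_decompose(values)
kept = [r for r in residual if r is not None]
print(min(kept), max(kept))  # residuals stay close to 1
```

When a series really is multiplicative, the residuals of this decomposition hover near 1, whereas additive residuals are centred on 0 and tend to grow with the level of the series.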
Q 1.3. Split the data into training and test. The test data should start in 1991.

Answer 1.3 :

The training set contains the data before 1991, and the test set contains the data from 1991 onward.

Train has 132 rows and test has 55 rows.
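
The split can be sketched as a simple filter on the year. The example below assumes 187 monthly observations starting in January 1980, which is consistent with the row counts above but is not stated explicitly in the report; the values are placeholders and the helper names are hypothetical.

```python
from datetime import date

def month_range(start, count):
    """Return `count` consecutive month-start dates beginning at `start`."""
    months = []
    year, month = start.year, start.month
    for _ in range(count):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def split_by_year(series, first_test_year=1991):
    """Train = everything before `first_test_year`; test = the rest."""
    train = [(d, v) for d, v in series if d.year < first_test_year]
    test = [(d, v) for d, v in series if d.year >= first_test_year]
    return train, test

# Assumed start month: January 1980. Values are placeholders (0.0).
dates = month_range(date(1980, 1, 1), 187)
series = [(d, 0.0) for d in dates]

train, test = split_by_year(series)
print(len(train), len(test))
```

Splitting by date rather than at random keeps the test period strictly after the training period, which is what a forecast evaluation needs.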


#Graphical representation of train and test data :
Q 1.4. Build various exponential smoothing models on the training data and evaluate the model using RMSE on the test data.

Answer 1.4 :

#Model 1: Linear Regression


The predicted trend is increasing.
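
A regression of sales on a time index can be sketched as below. This assumes the model regresses the values on the index 1, 2, ..., n of the training observations and extends that index into the test period; the report does not show its exact setup, and the helper names and toy numbers are hypothetical.

```python
def fit_line(y):
    """Least-squares fit of y on the time index 1..n; returns (intercept, slope)."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def predict_line(intercept, slope, start, steps):
    """Predict `steps` values for time indices start, start + 1, ..."""
    return [intercept + slope * t for t in range(start, start + steps)]

train = [10.0, 12.0, 14.0, 16.0]  # made-up values lying exactly on a line
intercept, slope = fit_line(train)

# The test period continues the time index where training stopped.
forecast = predict_line(intercept, slope, start=len(train) + 1, steps=3)
print(intercept, slope, forecast)
```

Because the fitted line depends only on the time index, its forecasts cannot reproduce any within-year pattern.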

Defining the accuracy metrics :

Model Evaluation: Test RMSE

The RMSE on the test data is 1389.135. The value is not very high, but since the model does not account for seasonality, it is not suitable for predictions on the Sparkling time series data.
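
For reference, RMSE is the square root of the mean squared difference between the actual and predicted values. A minimal sketch follows; the function name and the numbers are illustrative and unrelated to the 1389.135 result above.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Errors of 3 and 4 give squared errors 9 and 16, so RMSE = sqrt(12.5).
score = rmse([3.0, 5.0], [0.0, 1.0])
print(round(score, 4))
```

RMSE is expressed in the same units as the series, so a model's score can be read against the typical size of monthly sales.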

#Model 2: Naive Approach:

Head (5 rows) of Test data :
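
The naive approach named above is usually taken to mean that every future value is forecast as the last observed training value. A minimal sketch under that assumption, with a hypothetical function name and made-up numbers:

```python
def naive_forecast(train_values, horizon):
    """Repeat the last training observation for every step of the horizon."""
    if not train_values:
        raise ValueError("train_values must not be empty")
    return [train_values[-1]] * horizon

train_values = [1500.0, 1600.0, 2300.0]  # made-up training values
forecast = naive_forecast(train_values, horizon=5)
print(forecast)
```

The forecast is a flat line, so it serves mainly as a baseline that the smoothing models should beat.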
