Professional Documents
Culture Documents
Complete each section. When you are ready, save your file as a PDF document and submit it
here: https://coco.udacity.com/nanodegrees/nd008/locale/en-
us/versions/1.0.0/parts/7271/project
Considering the K-means report, Adjusted Rand and Calinski-Harabasz indices as depicted
below, the optimal number of store formats is 3.
Both AR and CH indices have highest median value at 3 clusters. So this corresponds to 3 store
formats.
2. How many stores fall into each store format?
This is as below:
3. Based on the results of the clustering model, what is one way that the clusters differ from one
another?::
• Cluster 1 stores:
o Sold more General Merchandise in terms of percentage
o Has highest medial total sales
o Range is also highest in Cluster 1
• Cluster 2
o Sold more Produce in terms of percentage
• Cluster 3:
o Most similar- most compact range
Cluster 1 stores have highest medial total sales when compared to the other 2. Its range of total sales
and most of other categorical sales are also the largest. Cluster 3 stores are the most similar in terms
of sales due to more compact range.
4. Please provide a Tableau visualization (saved as a Tableau Public file) that shows the location
of the stores, uses color to show cluster, and size to show total sales.
This is as below:
The Tableau links are as below:
https://public.tableau.com/profile/siddharth3961#!/vizhome/Task1_15540608764730/Vizualizatio
nbyCluster
I have chosen the Boosted Model since it is highest in both Accuracy and F1 score.
2. What format do each of the 10 new stores fall into? Please fill in the table below.
Basis the score tool, the segments for the new stores have been predicted as below:
1. What type of ETS or ARIMA model did you use for each forecast? Use ETS(a,m,n) or
ARIMA(ar, i, ma) notation. How did you come to that decision?
ARIMA models are displayed in the terms (p,d,q). These are explained as below:
I have re-plotted the ACF and PACF graphs after taking 1 seasonal difference. Even after this,
the ACF still shows high co-relation.
I have re-plotted the ACF and PACF graphs after taking the 1st difference. There is no
significant co-relation now.
The accuracy of ETS model is higher compared to ARIMA model. I have used a holdout sample
of 6 months data.
The in-sample error measure for ETS Time Series are as below:
The forecast table is as below having forecasts from both existing and new stores:
This is for the year 2016.
Existing New
Month Total
Stores Stores
Jan-16 21539936 2587451 24127387
Feb-16 20413771 2477353 22891124
Mar-16 24325953 2913185 27239138
Apr-16 22993466 2775746 25769212
May-16 26691951 3150867 29842818
Jun-16 26989964 3188922 30178886
Jul-16 26948631 3214746 30163376
Aug-16 24091579 2866349 26957928
Sep-16 20523492 2538727 23062219
Oct-16 20011749 2488148 22499897
Nov-16 21177436 2595270 23772706
Dec-16 20855799 2573397 23429196
https://public.tableau.com/profile/siddharth3961#!/vizhome/Task3_15540600989490/Tot
alProduceSalesForecast?publish=yes
Task 1: Workflow
Task 2: Workflow
Task 3 Workflows:
Cont..