PREDICTION
Team 6
Kiran S (2113050)
Viswa M (2113112)
Danel Hilton W (2111011)
Pramod Raja (2111074)
Prasana (2111075)
Problem Statement
• Primary aim of any business is to maximize sale of products.
• But businesses find it difficult to predict sales of products.
• Product properties have to be considered for sale prediction.
• Big Mart has collected data regarding:
• Sale of 1559 products
• Attributes of each product are captured.
• Aim of the project:
• To categorize products according to properties.
• To predict the sale value of products.
Data Collection
• Data Source:
https://datahack.analyticsvidhya.com/contest/practice-problem-big-mart-sales-iii/
• Two sets of data are available:
• Training set (independent variables with sale results)
• Test set (only independent variables)
• Data about 6113 customers
• To build a model using the training set
• To predict the sale value for the test set
Data Collection
• Item_Identifier, Outlet_Identifier
• Nominal variables
• Cannot be used for analysis
• Item_Weight, Item_Visibility, Item_MRP, Item_Outlet_Sales
• Quantitative variables
• Can be used directly for analysis
• Item_Type, Item_Fat_Content, Outlet_Size, Outlet_Location_Type, Outlet_Type
• Categorical variables
• Dummy variables for analysis
• Outlet_Establishment_Year
• Nominal variable
• Can be converted into quantitative variable for analysis
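The two conversions mentioned above (dummy variables for categorical columns, and a numeric value derived from Outlet_Establishment_Year) could be sketched as follows. This is a minimal plain-Python illustration, not the team's actual code: the reference year, the derived column name `Outlet_Years`, the choice to drop the first level as the baseline, and the sample rows are all assumptions made for the example.

```python
# Sketch: dummy (one-hot) encoding and year-to-age conversion in plain Python.
# REFERENCE_YEAR and the sample rows are illustrative assumptions.
REFERENCE_YEAR = 2013


def dummy_encode(rows, column):
    """Replace a categorical column with 0/1 indicator columns.

    The first level (in sorted order) is dropped as the baseline, so a
    column with k levels produces k-1 dummy columns.
    """
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels[1:]:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


def add_outlet_age(rows, reference_year=REFERENCE_YEAR):
    """Turn Outlet_Establishment_Year into a numeric age in years."""
    return [
        {**row, "Outlet_Years": reference_year - row["Outlet_Establishment_Year"]}
        for row in rows
    ]


sample = [
    {"Outlet_Size": "Small", "Outlet_Establishment_Year": 1999},
    {"Outlet_Size": "Medium", "Outlet_Establishment_Year": 2009},
    {"Outlet_Size": "High", "Outlet_Establishment_Year": 1987},
]
prepared = add_outlet_age(dummy_encode(sample, "Outlet_Size"))
```

With "High" as the dropped baseline, a High-size outlet is represented by zeros in both `Outlet_Size_Medium` and `Outlet_Size_Small`.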
Preprocessing the data
• Presence of null values in the Item_Weight column.
• If removed, loss of 1400 datapoints.
• So, interpolation was used to fill the null values.
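The slides do not say which interpolation method was used. As one possible reading, the sketch below assumes linear interpolation along row order, with gaps at either end filled from the nearest known value. The function name and the sample weights are hypothetical.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbours.

    Leading or trailing gaps have only one known neighbour, so they are
    filled with that nearest known value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one known value")
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


# Illustrative Item_Weight column with missing entries (made-up numbers).
weights = [9.3, None, 17.5, None, None, 8.0]
filled_weights = interpolate_missing(weights)
```

Filling the gaps this way keeps every row, which is the stated reason for preferring interpolation over deleting the rows with nulls.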
$\text{Sale Value} = 2146.1966 - 51.63\,(\text{Years}) + 1843.74\,(\text{Supermarket type 1}) + 2738.33\,(\text{Supermarket type 3})$
Cluster 1: Low weight, Low Price products
$\text{Sale Value} = 666.2371 - 26.71\,(\text{Years}) + 447.34\,(\text{Medium}) + 560.29\,(\text{Small}) + 780.57\,(\text{Tier 3}) + 902.94\,(\text{Supermarket type 1}) + 1438.07\,(\text{Supermarket type 3})$
Cluster 2: High weight, Low Price products
$\text{Sale Value} = 540.72 - 490.78\,(\text{Soft drinks}) + 1439.93\,(\text{Tier 3}) + 2135.39\,(\text{Supermarket type 1}) + 2055.7\,(\text{Supermarket type 3})$
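To show how such an equation is applied, the sketch below evaluates the last regression equation above. The coefficients are copied from that equation. Treating each term as a 0/1 dummy indicator, and the function name, are assumptions made for this example.

```python
# Coefficients copied from the last regression equation above.
INTERCEPT = 540.72
COEFFICIENTS = {
    "Soft drinks": -490.78,
    "Tier 3": 1439.93,
    "Supermarket type 1": 2135.39,
    "Supermarket type 3": 2055.70,
}


def predict_sale_value(indicators):
    """Evaluate intercept + sum(coefficient * indicator).

    `indicators` maps a dummy-variable name to 0 or 1; names that are
    left out count as 0.
    """
    unknown = set(indicators) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown variables: {sorted(unknown)}")
    return INTERCEPT + sum(
        coef * indicators.get(name, 0) for name, coef in COEFFICIENTS.items()
    )


# A product that is not a soft drink, sold in a Tier 3, Supermarket type 1 outlet.
example = predict_sale_value({"Tier 3": 1, "Supermarket type 1": 1})
```

For this example the prediction is 540.72 + 1439.93 + 2135.39, which is about 4116.04.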
Next step: Prediction
• When product attributes are given, it is important to identify the cluster before predicting the sale value.
• So, classification is done before regression.
• Use the same train set to predict cluster type (but without including sale value)
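Preparing the classification data as described above could look like the sketch below: the cluster label becomes the target and the sale value is removed from the features. The column name `Cluster` and the sample rows are hypothetical. Only `Item_Outlet_Sales`, `Item_Weight` and `Item_MRP` come from the data set description.

```python
def classification_dataset(rows, target="Cluster", excluded=("Item_Outlet_Sales",)):
    """Split rows into (features, labels) for the cluster classifier.

    The sale value is dropped from the features because it is not
    available when a new product has to be assigned to a cluster.
    """
    features, labels = [], []
    for row in rows:
        labels.append(row[target])
        features.append(
            {k: v for k, v in row.items() if k != target and k not in excluded}
        )
    return features, labels


# Illustrative rows with made-up values and a hypothetical "Cluster" column.
train_rows = [
    {"Item_Weight": 9.3, "Item_MRP": 249.8, "Item_Outlet_Sales": 3735.1, "Cluster": 1},
    {"Item_Weight": 17.5, "Item_MRP": 141.6, "Item_Outlet_Sales": 2097.3, "Cluster": 2},
]
X, y = classification_dataset(train_rows)
```

Once a classifier trained on `X` and `y` assigns a cluster to a new product, the regression equation for that cluster gives the predicted sale value.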
Classification models under consideration
Classification models in consideration:
• Classification and Regression Trees (CART)
• Random Forest
• k-Nearest Neighbour Algorithm (KNN)
• Adaptive Boosting
• Gradient Boosting
• Naïve Bayes classifier
• Support Vector Machine (SVM)
Model validation
• Training set data has been broken into 70% train set & 30% test set.
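The 70% / 30% validation split and the accuracy measure used to compare the models could be sketched in plain Python as below. The function names, the fixed seed and the toy labels are assumptions. The slides do not say how the split was drawn or which library was used.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle the rows reproducibly and split them 70% train / 30% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true cluster labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


rows = list(range(10))           # stand-in for ten training rows
train, test = split_70_30(rows)  # 7 rows for fitting, 3 held out

# Toy comparison: true clusters versus the predictions of one hypothetical model.
score = accuracy([1, 2, 1, 3], [1, 2, 2, 3])
```

Each candidate model would be fitted on `train` and scored on `test` with the same `accuracy` function, so that the accuracies are comparable.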
Comparing accuracy of models
S.No Name of the model Accuracy
Recommendations for maximizing sale value
High weight, High Price Products
• Sell in all locations
• Sell in all outlet sizes
• Try to sell in new outlets
Low weight, Low Price Products
• Sell in Tier 3 location
• Sell in Small size outlets
• Try to sell in new outlets
High weight, Low Price Products Low weight, Low Price Products