
Sales Prediction For Big Mart

Group Members :
Mohammed Mansoor (160318733083)
Abdullah (160318733088)
Mohammed Yousuf Ullah khan (160318733086)
INDEX

INTRODUCTION

LITERATURE SURVEY

PROBLEM STATEMENT

PROPOSED SYSTEM

IMPLEMENTATION

CONCLUSIONS

REFERENCES
Introduction

Huge shopping centers such as big malls and marts are recording data related to sales of items or products.

This can then further be used for forecasting future sales by means of employing machine learning algorithms.

The dataset built with various dependent and independent variables is a composite form of item attributes and data gathered from customers.
Abstract

Machine learning algorithms such as random forests and simple or multiple linear regression models are used.

The aim is to find out what role certain properties of an item play and how they affect its sales by understanding Big Mart sales.

The data is gathered from customers, together with outlet-related data held in a data warehouse.
Applications

• Helps every business make better business decisions. It helps in overall business planning, budgeting, and risk management.
• Helps sales teams achieve their goals by identifying early warning signals in their sales pipeline and course-correcting before it's too late.
• Allows companies to efficiently allocate resources for future growth and manage their cash flow.
Advantages

o Measuring the Company Health
o Building Marketing Plans
o Planning of Supply based on Demand
o Capital for Small Businesses

Disadvantages

o Possibility of High Error
o Data Acquisition
o Time and Space
o Algorithm Selection
Literature Survey

Using Machine Learning to predict sales and anticipate demand

• How much stock should be ordered?


• How much revenue can be expected in a particular month?
• How many staff should be hired?

All these questions show how central sales forecasting is to business planning.
What is sales forecasting?

• A sales forecast is a prediction of how much a company will sell in the future.
• It estimates the company's sales revenue for a specific time period, commonly a month, quarter, or year.
Types of Sales forecasting:

Rule-based forecasting:
• manually developed rules and assumptions
• based on past data and known trends
• Example: sales growing by 5% each year
• Sales Tomorrow = Sales on the Same Day Last Year * 1.05
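The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function name `rule_based_forecast` and the sample figures are hypothetical, and it assumes that "Sales Last Year" means the sales on the same day one year earlier.

```python
def rule_based_forecast(sales_same_day_last_year: float,
                        yearly_growth: float = 0.05) -> float:
    """Apply a fixed, manually chosen growth rate to last year's sales."""
    return sales_same_day_last_year * (1.0 + yearly_growth)


# Hypothetical figure: 1000 units were sold on the same day last year
forecast = rule_based_forecast(1000.0)
print(f"Rule-based forecast: {forecast:.2f}")
```

The growth rate is fixed by hand, so the rule has to be revised manually whenever the trend changes.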

Machine Learning forecasting:


• It learns the rules that would otherwise have to be designed manually
• Supervised learning is the task of learning the relationship
between outputs (sales) and inputs (past sales, economic
indicators, holiday calendar, etc.)
Machine Learning to predict sales

• Sales forecasting is a time-series regression problem
• Time-series regressions are a particular case of regression
Types of time-series regression models:

1. Auto-regressive models: These models predict future sales solely based on past sales values. They generate predictions by finding trends and seasonality patterns.

2. Multivariate models: Multivariate models are based on a variety of inputs, including past sales, holiday calendars, or even economic indicators. These models include Linear Regressions, Neural Networks, Decision Tree-based methods and Support Vector Machines.
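As a rough illustration of the auto-regressive idea, the sketch below fits the simplest such model, y[t] = a + b * y[t-1], by ordinary least squares in plain Python. The helper names (`fit_ar1`, `forecast_next`) and the monthly figures are hypothetical. A one-lag model like this can only pick up a trend; capturing seasonality would need more lags.

```python
def fit_ar1(series):
    """Fit y[t] = intercept + slope * y[t-1] by ordinary least squares."""
    x = series[:-1]   # past values (inputs)
    y = series[1:]    # next values (outputs)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_next(series, intercept, slope):
    """Predict the next value from the most recent observation."""
    return intercept + slope * series[-1]


# Hypothetical monthly sales that grow by 10 units every month
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
intercept, slope = fit_ar1(monthly_sales)
next_month = forecast_next(monthly_sales, intercept, slope)
print(f"Forecast for next month: {next_month:.1f}")
```

A multivariate model follows the same pattern but adds further input columns, such as a holiday flag or an economic indicator, next to the lagged sales.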
Case Study: Parkdean Resorts

• Parkdean Resorts is the largest holiday park operator in the UK
• The goal was to develop a model generating hourly food and beverage sales forecasts in more than 180 venues
• Overstaffing can be a substantial cost driver
• Understaffing can significantly impact customer satisfaction
• A reliable hourly forecast could therefore help fight these issues
• A rule-based forecasting model would never have been manageable
• A rule-based model designed in 2018 might not hold in 2019
Problem statement

• The objective is to predict future sales from the previous years' data using Machine Learning techniques

• Another objective is to determine the best model, one that is efficient and gives fast and accurate results, using the XGBoost Regressor

• To find out key factors that can increase sales and what changes could be made to the product's or store's characteristics.
Proposed System

The steps followed in this work, right from the dataset preparation to
obtaining results are represented in Fig.

Raw Data -> Pre-processing -> Training of Dataset -> Implementing Machine Learning Model -> Result
Dataset and its Preprocessing

• The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities.

• Using all the observations, it is inferred what role certain properties of an item play and how they affect its sales.
Preprocessing

Data may contain null values, redundant values, or various types of ambiguity, which calls for
pre-processing of the data

Missing values: Checking for null values in each column and then replacing or filling them with
appropriate values of the column's data type

Outliers: Any data point that lies far from the rest or doesn't follow the regular pattern. It can be treated by
deletion or by imputing the values
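The missing-value and outlier treatments described above might look as follows with pandas. This is a sketch on made-up data: the column names and values are assumptions for illustration, and the 1.5 * IQR rule is one common outlier criterion that the slides do not confirm was used.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only
df = pd.DataFrame({
    "item_weight": [9.3, None, 17.5, 8.9, None, 10.4],
    "outlet_size": ["Medium", None, "Small", "Medium", "High", "Medium"],
    "item_sales": [3735.0, 443.0, 2097.0, 732.0, 995.0, 90000.0],
})

# Missing values: count nulls per column, then fill numeric gaps with the
# column mean and categorical gaps with the most frequent value (mode)
null_counts = df.isnull().sum()
df["item_weight"] = df["item_weight"].fillna(df["item_weight"].mean())
df["outlet_size"] = df["outlet_size"].fillna(df["outlet_size"].mode()[0])

# Outliers: keep only rows within 1.5 * IQR of the quartiles (deletion)
q1 = df["item_sales"].quantile(0.25)
q3 = df["item_sales"].quantile(0.75)
iqr = q3 - q1
within_bounds = df["item_sales"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = df[within_bounds]

print(null_counts)
print(cleaned)
```

Instead of deleting the flagged rows, their values could be imputed, for example by replacing them with the column median.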

Feature Engineering:
Feature transformation: Used when information is not represented in the best possible way.
Feature creation: Creating new features by combining existing features or deriving them from another feature.
Feature scaling: Used to normalize the range of the independent variables or features of the data.
Feature selection: Reducing the number of input variables.
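Two of the steps above, feature creation and feature scaling, can be sketched in plain Python. The establishment years are hypothetical, the derived "outlet age" feature is an assumed example of feature creation, and min-max scaling is only one of several possible scaling methods.

```python
def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical outlet establishment years; 2013 is the year of the sales data
establishment_years = [1985, 1999, 2004, 2009]

# Feature creation: derive the outlet's age from its establishment year
outlet_age = [2013 - year for year in establishment_years]

# Feature scaling: bring the new feature into a common [0, 1] range
scaled_age = min_max_scale(outlet_age)

print(outlet_age)
print(scaled_age)
```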
Algorithms

Random Forest regression :


Random forest is a generally accurate ensemble algorithm.
It operates by constructing several decision trees
during training and, for regression, outputting the mean
of the predictions of all the trees

Linear Regression Algorithm :


Regression can be termed a parametric technique
which is used to predict a continuous dependent
variable on the basis of a provided set of independent
variables. It finds a linear relationship between x
(input) and y (output). Hence the name Linear
Regression
Algorithms

Decision tree regression:


A decision tree builds regression or classification
models in the form of a tree structure. It breaks down
a dataset into smaller and smaller subsets while at
the same time an associated decision tree is
incrementally developed. The final result is a tree
with decision nodes and leaf nodes
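The way a regression tree breaks a dataset into subsets can be illustrated with a single split: one decision node and two leaf nodes. The sketch below is a simplified illustration with hypothetical data and helper names (`best_split`, `predict`). It picks the threshold that minimizes the squared error of the two leaf means, which is one common splitting criterion, and it assumes the feature has at least two distinct values.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) with the lowest squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict(x, threshold, left_mean, right_mean):
    """Decision node: route x to the left or right leaf and return its mean."""
    return left_mean if x <= threshold else right_mean


# Hypothetical data: item price (input) against units sold (output)
prices = [10.0, 12.0, 14.0, 40.0, 42.0, 44.0]
units = [100.0, 110.0, 105.0, 20.0, 25.0, 15.0]
threshold, left_mean, right_mean = best_split(prices, units)
print(threshold, left_mean, right_mean)
```

A full tree repeats this search recursively inside each subset until a stopping rule, such as a maximum depth, is reached.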

Extremely randomized trees regression :


It is a variant of a random forest. Unlike a random
forest, at each step the entire sample is used and
decision boundaries are picked at random rather
than choosing the best one. In real-world cases,
performance is comparable to an ordinary random
forest, sometimes a bit better
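The four regressors described above can be compared side by side with scikit-learn. This is a sketch under stated assumptions, not the project's code. It uses synthetic stand-in data rather than the BigMart dataset, and the hyperparameters (`n_estimators=50`, the 75/25 split, the random seeds) are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the BigMart dataset): 3 features, noisy target
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Extra Trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

# Train each model and record its mean squared error on the held-out split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_squared_error(y_test, model.predict(X_test))

for name, mse in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MSE = {mse:.4f}")
```

The ranking on synthetic data says nothing about the ranking on real sales data. The point is only that all four models share the same `fit` and `predict` interface, which makes them easy to swap and compare.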
Implementation
Conclusion

Dataset Information

The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities.
Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and
find out the sales of each product at a particular store.
Using this model, BigMart will try to understand the properties of products and stores which play a key role in
increasing sales.
Libraries
o Pandas
o Matplotlib
o Seaborn
o Scikit-learn

Algorithms
o Linear Regression
o Decision Tree
o Random Forest
o Extra Trees

Mean Squared Error: 0.28
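
For reference, mean squared error is the average of the squared differences between actual and predicted values. The sketch below computes it in plain Python on made-up numbers. It does not reproduce the 0.28 reported above, since the slides do not state which model or target scale produced that figure.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, chosen only to illustrate the calculation
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 3.0, 8.0]
mse = mean_squared_error(actual, predicted)
print(f"Mean Squared Error: {mse}")
```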


References

• https://www.kaggle.com/
• https://www.geeksforgeeks.org/python-programming-language/
• https://www.geeksforgeeks.org/machine-learning/
• https://www.geeksforgeeks.org/what-is-data-science/
• https://www.analyticsvidhya.com/blog/2021/05/an-introduction-to-statistics-for-data-science-basic-terminologies-explained/
• https://www.geeksforgeeks.org/exploratory-data-analysis-in-python/
• https://www.javatpoint.com/machine-learning-algorithms
