
Sales Prediction using Machine Learning

R. Praveen a), D. Praveen Kumar b), A. Prince Sam c), and G. Sivakama Sundari d)

Department of Computer Science and Engineering, National Engineering College, Kovilpatti, Tamil Nadu, India.

a) Corresponding author: praveenkrish333@gmail.com
b) gsscse@nec.edu.in

Abstract: This sales prediction project aims to help set sales targets that lead to better results. The primary use of sales
prediction is to establish sales performance goals for the organization and to maintain product inventory. The project aims
to give accurate results and helps in estimating the quantity of raw materials needed in the future. Sales are predicted for
products that are bought in combination. By knowing which products are best kept in inventory, a retailer or shopkeeper
can improve the profitability of the business. Based on a company's dataset, the system can predict the future sales of that
company's products.

Keywords: Forecasting, inventory products, combination, predict, future sales

INTRODUCTION

Sales prediction is the process of estimating future sales so that sales targets can be set to give better results.
The main objective of this work is to establish sales performance goals for the organization and to maintain product
inventory. It is easier for established companies to predict future sales because they have years of past business
data. Newly founded companies have to base their predictions on less-verified information, such as market
research. Sales forecasting is one of the pillars of proper financial planning. It gives insight into how a company
should manage its workforce, cash flow, and resources. Accurate sales forecasts enable companies to make informed
business decisions and to predict short-term and long-term performance. This project aims to give accurate results
and helps in estimating the quantity of raw materials needed in the future. Companies can base their predictions on
past sales data, industry-wide comparisons, and economic trends.

In this project, sales are predicted for products bought in combination, and the system is implemented in Python.
Tkinter, Python's standard GUI toolkit, is used to make the application user-friendly. Based on a company's data,
the model can predict the future sales of that company's products. Association rule mining aims to extract
interesting correlations, frequent patterns, and association structures among sets of items or objects in
transaction databases, relational databases, or other data repositories. Two statistical measures govern
association rule mining: Support and Confidence. Support measures how frequently an itemset occurs in the database.
Confidence measures the strength of a rule. Association rules are considered interesting if they satisfy both a
minimum support threshold and a minimum confidence threshold. This paper aims to implement the Apriori
algorithm, which generates candidate itemsets and retains the frequent ones.

Benjamin Flesch et al. (2015) proposed an alternative approach to big social data analytics, social set analysis
(SSA), based on the sociology of associations, the mathematics of set theory, and advanced visual analytics of
event studies. They discussed this new analytical approach and concluded with a discussion of the benefits of
set-theoretical strategies grounded in the social and philosophical approach of associational sociology.

Giuseppe Nunnari and Valeria Nunnari (2017) present a case study on forecasting monthly retail time series
recorded by the US Census Bureau from 1992 to 2016. The modelling problem is tackled in two steps. First, the
original time series are de-trended using a moving-window averaging approach. The residual time series are
subsequently modelled by Non-linear Auto-Regressive (NAR) models, using both Neuro-Fuzzy and Feed-Forward
Neural Network approaches. The goodness of the forecasting models is assessed objectively by calculating the bias
errors. Finally, the model skill index is calculated, taking the traditional persistent model as a reference.

Daniel Bachar argued that sales forecasting has tremendous value across the entire organization. Benefiting from
it, however, requires a clear understanding of where it is used. In general, any planning function requires some
educated guesses about what will happen in the future, and that is the critical value of a good forecast.

Wenjie Huang et al. (2015) took a new perspective that focuses on choosing an appropriate approach to forecast
sales more effectively and more accurately. Their data were provided by a well-known, competitive Chinese online
shopping company that is part of the B2C e-commerce market for book sales. They explored e-commerce as a new
research field and applied actual sales data to several classical prediction models, aiming to discover a trigger
model that could select the appropriate forecasting model to predict sales of a given product.

Nari Sivanandam Arunraj and Diane Ahrens (2015) developed a seasonal autoregressive integrated moving average
model with external variables (SARIMAX) to forecast daily sales of perishable food. Fitting the SARIMAX model
in their study involves: (i) developing the Seasonal Autoregressive Integrated Moving Average (SARIMA) model
and (ii) combining the SARIMA model with the demand-influencing factors using linear regression. Because the
SARIMAX model using multiple linear regression (SARIMA-MLR) produces only a mean forecast, underestimation
and overestimation are considerable, owing to the high service level and the peak and sparse sales typical of the
retail food industry. Therefore, a hybrid SARIMA and Quantile Regression (SARIMA-QR) model was developed to
construct high and low quantile predictions.

Kayalvizhi Subramanian et al. (2020) proposed a novel hybrid model for automobile sales prediction based on
time series forecasting. Their technique uses automobile sales data of recent years together with the prediction
outcomes of time series models to train the proposed forecasting model. Validation was carried out using real-world
automobile sales data for both training and testing. The authors report that the proposed method outperforms
other neural network forecasting models in automobile sales forecasting.

Quiman Huang and Feng Zhou (2017) used two years of sales data from a supermarket to validate their proposed
clustering algorithm, segment customers, and then analyze the clustering results to help enterprises adopt
different marketing strategies for different customer groups and so improve sales performance.

Ayla Sayli, Isil Ozturk, and Merve Ustunel (2016) implemented a brand loyalty analysis system that discovers
brand loyalty using data mining techniques. As data volumes increase day by day, companies need new technologies
and analyses that automatically and intelligently examine large data repositories to obtain helpful information.
The authors use the K-means clustering algorithm for data analysis. Their approach starts with a data
preparation algorithm and then constructs the sales tables, which contain the sale quantity for each product.

M. Korolev and K. Ruegg (2015) noted that machine learning forecasting techniques have several applications for
businesses. In particular, these techniques can work quite well for large chain stores that can gather the essential
data. Using this data to make informed decisions for the future can have enormous financial consequences when
scaled to many stores. Accurate forecasts can inform businesses about optimal staffing levels, product
shipments, and sales promotions at each branch.

Anies Fuady et al. (2020) described the process of students' reflective abstraction in terms of impulsive and
reflective cognitive styles when solving mathematical problems. Their research is a descriptive study with a
qualitative approach. It seeks a picture of students' reflective abstraction in solving mathematical problems
from the perspective of impulsive and reflective students. The findings are drawn from written tests and
interviews.

METHODOLOGY

In this project, the problem is solved with the Apriori algorithm, which is one of the association rule mining
techniques.

For instance, if items A and B are frequently bought together (for example, A = bread and B = jam), the retailer
can take several steps to increase profit.

The process of identifying associations between products is known as association rule mining. The Apriori
algorithm relies on three major measures:

1. Support

2. Confidence

3. Lift

These measures are explained using an example.

Suppose we have a record of 1,000 customer transactions, and we want to find the Support, Confidence, and Lift
for two items, A and B. Out of the 1,000 transactions, 100 contain B and 150 contain A. Out of the 150
transactions in which A is purchased, 50 also contain B.

Support

Support refers to the default popularity of an item. It is calculated as the number of transactions containing a
specific item divided by the total number of transactions. For item B, the mathematical representation is:

Support(B) = (Transactions containing B) / (Total Transactions) (1)

For instance, if 100 out of 1,000 transactions contain B, then the support for item B is calculated as:

Support(B) = (Transactions containing B) / (Total Transactions)
Support(B) = 100/1000 = 10%
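
As a minimal sketch, support can be computed directly from a list of transactions. The `support` helper and the toy baskets below are illustrative assumptions and are not taken from the project's dataset or code.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical toy baskets (illustrative data only).
baskets = [
    {"bread", "jam"},
    {"bread", "milk"},
    {"jam"},
    {"milk", "eggs"},
    {"bread", "jam", "milk"},
]

print(support(baskets, {"jam"}))           # 3 of 5 baskets contain jam
print(support(baskets, {"bread", "jam"}))  # 2 of 5 baskets contain both
```

The same helper works for a single item or for a combination of items, which is what the later steps of Apriori rely on.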

Confidence

Confidence refers to the likelihood that item B is also bought if item A is bought. It is calculated as the number
of transactions in which A and B are purchased together divided by the total number of transactions in which A is
bought. Mathematically, it is represented as:

Confidence(A→B) = (Transactions containing both A and B) / (Transactions containing A) (2)

In the example, A and B were bought together in 50 transactions, and A was purchased in 150 transactions. The
probability of buying B when A is purchased is the confidence of A→B:

Confidence(A→B) = (Transactions containing both A and B) / (Transactions containing A)
Confidence(A→B) = 50/150 = 33.3%
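
The calculation can be sketched from raw counts as follows. The `confidence` function is a hypothetical helper introduced only for illustration; the counts are those of the worked example.

```python
def confidence(count_both, count_antecedent):
    """Confidence of antecedent -> consequent from raw transaction counts."""
    if count_antecedent == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return count_both / count_antecedent


# Counts from the worked example: 150 transactions contain A,
# 100 contain B, and 50 contain both.
conf_a_to_b = confidence(50, 150)  # rule A -> B
conf_b_to_a = confidence(50, 100)  # rule B -> A (reverse direction)

print(f"Confidence(A->B) = {conf_a_to_b:.1%}")
print(f"Confidence(B->A) = {conf_b_to_a:.1%}")
```

Confidence is directional: the two rules share the same numerator but divide by different counts, so A→B and B→A generally have different confidence values.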

Lift

Lift(A→B) refers to the increase in the ratio of sales of B when A is sold. It is calculated by dividing
Confidence(A→B) by Support(B). Mathematically, it is represented as:

Lift(A→B) = Confidence(A→B) / Support(B) (3)

For the example, Lift(A→B) is calculated as:

Lift(A→B) = Confidence(A→B) / Support(B)
Lift(A→B) = 33.3% / 10% = 3.33

This lift tells us that B is 3.33 times more likely to be bought when A is bought than it is to be bought in
general. A Lift of 1 means there is no association between products A and B. A Lift greater than 1 means products
A and B are more likely to be bought together. Finally, a Lift of less than 1 means the two products are unlikely
to be purchased together.
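
The worked example can be reproduced end to end with a short sketch. The transaction list below is synthetic, built only to match the counts stated in the example; the placeholder item "C" and the `count` helper are assumptions made for illustration.

```python
# Synthetic transactions matching the worked example: 50 contain A and B,
# 100 contain only A, 50 contain only B, and 800 contain neither.
# "C" is a hypothetical placeholder item for the remaining transactions.
transactions = (
    [{"A", "B"}] * 50 + [{"A"}] * 100 + [{"B"}] * 50 + [{"C"}] * 800
)


def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


n = len(transactions)
support_b = count({"B"}) / n
confidence_ab = count({"A", "B"}) / count({"A"})
lift_ab = confidence_ab / support_b

print(f"Support(B)       = {support_b:.1%}")
print(f"Confidence(A->B) = {confidence_ab:.1%}")
print(f"Lift(A->B)       = {lift_ab:.2f}")
```

Unlike confidence, lift is symmetric: Lift(B→A) = Confidence(B→A) / Support(A) = 50% / 15%, which is also 3.33.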

FIGURE 1. Flow Diagram

Steps Involved in Apriori Algorithm

For large datasets, there can be hundreds of items spread across a very large number of transactions. The Apriori
algorithm tries to extract rules for each possible combination of items.

For instance, Lift is calculated for item 1 and item 2, item 1 and item 3, item 1 and item 4, then item 2 and
item 3, item 2 and item 4, and then for larger combinations, e.g., item 1, item 2, and item 3; similarly item 1,
item 2, and item 4, and so on.

As this example shows, the process becomes very slow because of the number of combinations.
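
To give a sense of scale, the number of candidate combinations can be counted with binomial coefficients. This is an illustrative calculation only; the `candidate_itemsets` function and the item counts are assumptions, not figures from the project.

```python
from math import comb


def candidate_itemsets(n_items, max_size):
    """Number of itemsets with between 2 and `max_size` items."""
    return sum(comb(n_items, k) for k in range(2, max_size + 1))


# Pairs only, and pairs plus triples, for catalogues of increasing size.
for n in (4, 10, 100):
    print(n, candidate_itemsets(n, 2), candidate_itemsets(n, 3))
```

With 100 items there are already 4,950 pairs and 166,650 pairs and triples combined, which is why pruning by thresholds matters.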

To speed up the process, we perform the following steps:

1. Fix minimum values for support and confidence. This means we are only interested in finding rules for items
that have a certain default popularity (support) and a minimum level of co-occurrence with other items
(confidence).

2. Extract all the subsets whose support is higher than the minimum threshold.

3. Select all the rules from those subsets whose confidence is higher than the minimum threshold.

4. Order the rules from higher to lower values of Lift.
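
The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function name `apriori_rules`, the thresholds, and the toy baskets are all hypothetical, and the sketch keeps itemsets whose support is at least the threshold.

```python
from itertools import combinations


def apriori_rules(transactions, min_support, min_confidence):
    """Return rules as (antecedent, consequent, support, confidence, lift)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level-wise search for itemsets whose support meets the threshold.
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        kept = {}
        for itemset in level:
            sup = support(itemset)
            if sup >= min_support:
                kept[itemset] = sup
        frequent.update(kept)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates and drop any
        # candidate with an infrequent (k-1)-subset (the Apriori property).
        level = {
            a | b
            for a in kept
            for b in kept
            if len(a | b) == k
            and all(frozenset(s) in kept for s in combinations(a | b, k - 1))
        }

    # Keep the rules whose confidence meets the threshold.
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    lift = conf / frequent[consequent]
                    rules.append(
                        (set(antecedent), set(consequent), sup, conf, lift)
                    )

    # Order the rules from highest to lowest lift.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Hypothetical toy baskets (illustrative data only).
baskets = [
    {"bread", "jam"},
    {"bread", "jam", "milk"},
    {"bread", "milk"},
    {"jam"},
    {"milk", "eggs"},
    {"bread", "jam", "eggs"},
]

for antecedent, consequent, sup, conf, lift in apriori_rules(baskets, 0.3, 0.6):
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

On this toy data the sketch yields three rules: bread→jam and jam→bread (lift 1.125) rank above milk→bread (lift 1.0). Recounting support by scanning every transaction is simple but slow; a production system would more likely use an optimized library.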

RESULTS AND DISCUSSION

Association rule mining algorithms like Apriori are useful for identifying simple associations between data
items. They are easy to implement and highly explainable.

Figure 2 shows the initial step of this work.

FIGURE 2. Initial stage: giving input

After this step, the dataset file is uploaded as input. The application then processes the data in the uploaded
file.

FIGURE 3. Choosing dataset file

Figure 3 shows the process of uploading the input dataset file. Clicking the upload button opens the system file
explorer, where the required data file is selected for upload.
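
A minimal sketch of this upload step is given below. It assumes a CSV file in which each row is one transaction and each non-empty cell is an item name; the actual file format of the project is not stated, and the function names `load_transactions` and `choose_and_load` are hypothetical.

```python
import csv


def load_transactions(path):
    """Read one transaction per row; each non-empty cell is an item name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [
            {cell.strip() for cell in row if cell.strip()}
            for row in csv.reader(handle)
            if any(cell.strip() for cell in row)
        ]


def choose_and_load():
    """Open the system file explorer and load the chosen dataset file."""
    # Imported here so that load_transactions stays usable without a display.
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the empty main window; only the dialog is shown
    path = filedialog.askopenfilename(
        title="Choose dataset file",
        filetypes=[("CSV files", "*.csv"), ("All files", "*.*")],
    )
    root.destroy()
    # An empty path means the user cancelled the dialog.
    return load_transactions(path) if path else []
```

Keeping the parsing separate from the dialog lets the loader be reused and checked without opening a window.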

After the input dataset file is uploaded, the Sales Predictor processes it and gives the output. The output
displays the two-product combinations that meet the minimum thresholds of the measures used.

FIGURE 4. Output: Combinational Products

Figure 4 shows the output corresponding to the input dataset file.

REFERENCES

1. B. Flesch, R. Vatrapu, R. R. Mukkamala, A. Hussain, "Social set visualizer: A set-theoretical approach to big social
data analytics of real-world events," 2015 IEEE International Conference on Big Data (Big Data), pp. 2418-2427,
October 2015.

2. G. Nunnari, V. Nunnari, "Forecasting Monthly Sales Retail Time Series: A Case Study," Proc. of IEEE
Conf. on Business Informatics (CBI), July 2017.

3. [online] Available: https://halobi.com/blog/sales-forecasting-five-uses/.

4. W. Huang, Q. Zhang, W. Xu, H. Fu, M. Wang, X. Liang, "A Novel Trigger Model for Sales Prediction with Data
Mining Techniques," Data Science Journal, vol. 14, 2015.

5. N. S. Arunraj, D. Ahrens, "A hybrid seasonal autoregressive integrated moving average and quantile regression for
daily food sales forecasting," Int. J. Production Economics, vol. 170, pp. 321-335, 2015.

6. K. Subramanian, M. B. Othman, R. Sokkalingam, G. Thangarasu, "A New Approach For Forecast Sales Growth In
Automobile Industry," International Journal of Scientific and Technology Research, vol. 9, no. 1, January 2020.

7. Q. Huang, F. Zhou, "Research on retailer data clustering algorithm based on Spark," AIP Conference
Proceedings, vol. 1820, no. 1, 080022, March 2017.

8. A. Sayli, I. Ozturk, M. Ustunel, "Brand loyalty analysis system using K-Means algorithm," Journal of Engineering
Technology and Applied Sciences, vol. 1, no. 3, 2016.

9. M. Korolev, K. Ruegg, "Gradient Boosted Trees to Predict Store Sales," 2015.

10. Anies Fuady, Purwanto, Susiswo, Swasono Rahardjo, "Student Reflective Abstraction Of Impulsive And Reflective
In Solving Mathematical Problem," International Journal of Scientific and Technology Research, Vol. 9, Issue
2, February 2020 Edition.
