
DATA ANALYTICS FOR MANAGERS

CIA-1: DATA VISUALISATION AND PROBABILITY


TOPIC: ANALYSIS OF SUPERSTORE DATA

SUBMITTED TO:
Dr. RAJASHREE KAMATH K

SUBMITTED BY:
ANAND KRISHNAN G

REGISTER NUMBER:
2227210

CLASS:
1-MBA-C
TABLE NO.   TABLE TITLE

1.          Probability Table

GRAPH NO.   GRAPH TITLE

1.          Bar Chart
2.          Pie Chart
3.          Scatter Plot
4.          Histogram
5.          Normal Distribution

SR. NO.     CHAPTER TITLE

1.          Introduction
2.          Data Visualization
3.          Probability Distribution
4.          Conclusion
5.          References
INTRODUCTION

The dataset covers a superstore operating across the United States of America.
It has many features, such as Ship Mode, Segment, Country, City, State, Postal
Code, Region, Category, Sub-Category, Sales, Quantity, Discount, Purchase, and
Profit. Using this data, a variety of visualizations are used to examine the
operation of the superstore and its future sustainability.

DATA VISUALIZATION

BAR CHART:

A bar chart is a type of graph that displays categorical data using rectangular
bars whose heights or lengths are proportional to the values they represent.
Here it is used to compare different sub-groups in the data. The visualization
below plots profit by product sub-category to show which sub-category brings in
the most profit.

From the visualization, it can be concluded that Copiers, Phones, and
Accessories brought in the most profit, while Tables and Bookcases brought the
largest losses. The Tables sub-category should be discontinued to avoid
incurring further losses.
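
The aggregation behind such a bar chart can be sketched in a few lines of Python. This is only an illustration: the function name `profit_by_subcategory` and the sample rows are assumptions made up for the example, and the figures are not taken from the Superstore dataset.

```python
from collections import defaultdict

def profit_by_subcategory(rows):
    """Sum profit per sub-category and sort from most to least profitable."""
    totals = defaultdict(float)
    for sub_category, profit in rows:
        totals[sub_category] += profit
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical order lines (sub-category, profit); NOT values from the dataset.
sample_rows = [
    ("Copiers", 120.0),
    ("Tables", -45.5),
    ("Phones", 60.0),
    ("Copiers", 80.0),
    ("Tables", -30.0),
    ("Bookcases", -10.0),
]

ranking = profit_by_subcategory(sample_rows)
for name, total in ranking:
    # Each total corresponds to the height of one bar in the chart.
    print(f"{name:<10} {total:>8.2f}")
```

Sorting the totals before plotting makes the most and least profitable sub-categories appear at the two ends of the chart, which is what the conclusion above relies on.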

PIE CHART:

A pie chart is a circular statistical graphic divided into slices to show
numerical proportions. In a pie chart, the arc length of each slice is
proportional to the quantity it represents. This visualization shows each
product category's share of the total quantity sold.

[Pie chart: share of quantity sold by category. Office Supplies 60%, Furniture 21%, Technology 19%]

It can be observed that more units of Office Supplies are sold than of
Furniture and Technology combined. Office Supplies account for more than half
(60%) of the total quantity sold, which shows that demand for them is very high.
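
The slice sizes in a pie chart are percentage shares of a total. A minimal sketch of that arithmetic follows. The unit counts are hypothetical and were chosen only so that the shares match the rounded percentages shown in the chart; they are not the dataset's actual totals, and `category_shares` is an illustrative helper name.

```python
def category_shares(quantities):
    """Return each category's percentage share of the total quantity."""
    total = sum(quantities.values())
    return {name: 100 * qty / total for name, qty in quantities.items()}

# Hypothetical unit counts, picked to reproduce the chart's rounded percentages.
quantities = {"Office Supplies": 600, "Furniture": 210, "Technology": 190}

shares = category_shares(quantities)
combined_others = shares["Furniture"] + shares["Technology"]

for name, share in shares.items():
    print(f"{name:<16} {share:.0f}%")

# Office Supplies outweighs the other two categories combined.
print(shares["Office Supplies"] > combined_others)
```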

SCATTER PLOT:
Scatter plots are graphs that present the relationship between two variables in
a data set.

[Scatter plot: Profit (vertical axis, -8000 to 10000) against Sales (horizontal axis, 0 to 25000)]

From the above visualization, it can be concluded that Sales and Profit are
moderately correlated.

[Scatter plot: Discount (vertical axis, 0 to 0.9) against Sales (horizontal axis, 0 to 25000)]

From the above visualization, it can be concluded that Sales and Discounts are
negatively correlated. This suggests that discounts did not generate enough
additional purchases to increase profit.
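
The strength and direction of a relationship seen in a scatter plot can be summarised with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below shows the calculation. The `sales`, `profit`, and `discount` lists are hypothetical values invented for illustration, so the printed coefficients say nothing about the actual dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical order values; NOT taken from the Superstore dataset.
sales = [100.0, 250.0, 400.0, 800.0, 1200.0]
profit = [10.0, 40.0, 20.0, 150.0, 90.0]
discount = [0.4, 0.2, 0.3, 0.0, 0.1]

r_sales_profit = pearson_r(sales, profit)
r_sales_discount = pearson_r(sales, discount)

print(f"sales vs profit:   {r_sales_profit:.2f}")    # positive sign: move together
print(f"sales vs discount: {r_sales_discount:.2f}")  # negative sign: move oppositely
```

A coefficient only measures linear association. It does not show on its own that discounts cause lower sales or lower profit.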

HISTOGRAM:
A histogram is a graph that uses bars of varying heights to show how often
values fall within each range, or bin. Taller bars indicate that more data
falls within that range. A histogram displays the shape and distribution of
continuous sample data.

[Histogram: Frequency (vertical axis, 0 to 100) of quantity sold per bin (horizontal axis, bins 1 to 14). Bar heights for bins 1 to 14: 33, 87, 84, 44, 38, 31, 28, 14, 8, 2, 1, 1, 2, 2]

For this visualization, the quantity sold was plotted as a histogram. This
illustration clearly shows that the most frequent purchase quantities are 2 and
3, while the least frequent are 11 and 12.
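
This reading can be checked directly from the frequencies reported in this document's quantity table. The short sketch below ranks the bins and prints a rough text version of the histogram. The variable names are illustrative.

```python
# Frequency of each quantity sold, as reported in this report's table.
frequency = {1: 33, 2: 87, 3: 84, 4: 44, 5: 38, 6: 31, 7: 28,
             8: 14, 9: 8, 10: 2, 11: 1, 12: 1, 13: 2, 14: 2}

# Rank quantities from most to least frequent and keep the top two.
ranked = sorted(frequency, key=frequency.get, reverse=True)
most_frequent = sorted(ranked[:2])

# All quantities that share the lowest frequency.
lowest = min(frequency.values())
least_frequent = [q for q, f in frequency.items() if f == lowest]

# Text rendering of the histogram: one '#' per 3 observations (rounded down).
for quantity, count in frequency.items():
    print(f"{quantity:>2} | {'#' * (count // 3)} {count}")

print("most frequent:", most_frequent)
print("least frequent:", least_frequent)
```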
DATA DISTRIBUTION:
The shape of a distribution is a fundamental property of the data and helps
determine which measure of central tendency best represents its centre. In a
skewed distribution, the direction of the skew indicates which way the long
tail extends. The long tail of a right-skewed distribution extends to the
right, while the majority of values cluster on the left, as shown in our
histogram. Overall sales are thus dominated by orders of smaller quantities
rather than bulk orders.
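
The direction of the skew can also be checked numerically from the quantity frequencies reported in this document. In a right-skewed distribution the mean typically lies above the median, and the moment coefficient of skewness is positive. The sketch below uses only Python's standard library. Expanding the frequency table into a list of individual values is a convenience chosen for the example.

```python
from statistics import mean, median, mode

# Frequency of each quantity sold, as reported in this report's table.
frequency = {1: 33, 2: 87, 3: 84, 4: 44, 5: 38, 6: 31, 7: 28,
             8: 14, 9: 8, 10: 2, 11: 1, 12: 1, 13: 2, 14: 2}

# Expand the table into one value per observation (375 values in total).
values = [q for q, f in frequency.items() for _ in range(f)]

avg = mean(values)
mid = median(values)
peak = mode(values)

# Moment coefficient of skewness: third central moment / (variance ** 1.5).
n = len(values)
m2 = sum((v - avg) ** 2 for v in values) / n
m3 = sum((v - avg) ** 3 for v in values) / n
skewness = m3 / m2 ** 1.5

print(f"mean={avg:.3f} median={mid} mode={peak}")
print("right-skewed" if skewness > 0 else "left-skewed or symmetric")
```

Because the mean is pulled towards the long tail, the median or mode is usually the better summary of a typical order quantity here.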
PROBABILITY DISTRIBUTION:
A probability distribution describes how likely each value of a variable is to
occur. For observed data it can be approximated by the frequency distribution:
the more often a value occurs, the higher its estimated probability, and vice
versa.
After calculating the frequency of each quantity sold, the average and standard
deviation of these frequencies were calculated. The normal distribution values
were then computed from them, as shown below.
The normal probability distribution, whose graph is known as the bell curve, is
a common model for how the values in a dataset are distributed. It is fully
determined by the dataset's mean and standard deviation.
Quantity  Frequency  Average   Std Dev   Normal Distribution
1         33         26.78571  29.17897  0.01336568
2         87                             0.001625972
3         84                             0.001999684
4         44                             0.011488482
5         38                             0.012698892
6         31                             0.013530397
7         28                             0.013660422
8         14                             0.012420727
9         8                              0.011113081
10        2                              0.009531446
11        1                              0.009252538
12        1                              0.009252538
13        2                              0.009531446
14        2                              0.009531446
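
The figures in this table can be reproduced with Python's standard library, which helps make clear what each column represents. The Average and Std Dev values match the mean and sample standard deviation of the Frequency column, and each Normal Distribution value matches the normal density evaluated at that row's frequency. The sketch below assumes this reading of the table. The original calculation tool is not stated in the report.

```python
from statistics import NormalDist, mean, stdev

# Frequency of each quantity sold, copied from the table above.
frequency = {1: 33, 2: 87, 3: 84, 4: 44, 5: 38, 6: 31, 7: 28,
             8: 14, 9: 8, 10: 2, 11: 1, 12: 1, 13: 2, 14: 2}

counts = list(frequency.values())
avg = mean(counts)   # mean of the Frequency column
sd = stdev(counts)   # sample standard deviation (n - 1 in the denominator)

# Normal density with that mean and standard deviation,
# evaluated at each row's frequency value.
fitted = NormalDist(mu=avg, sigma=sd)
density = {q: fitted.pdf(f) for q, f in frequency.items()}

print(f"Average={avg:.5f}  Std Dev={sd:.5f}")
for quantity, value in density.items():
    print(f"{quantity:>2}  {frequency[quantity]:>3}  {value:.9f}")
```

To model the quantities themselves rather than their frequencies, the mean and standard deviation would be taken over the quantity values instead. That variant is not what the table shows.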

From the graph below, it can be inferred that quantity is not normally
distributed: the frequencies peak at quantities 2 and 3 and tail off to the
right.

[Graph: Probability Distribution. Two series, Frequency and Normal Distribution, plotted against quantity (horizontal axis, 0 to 16; vertical axis, -20 to 100)]

Conclusion

• Copiers, Phones, and Accessories brought in the most profit. Tables and
Bookcases brought the largest losses. The Tables sub-category should be
discontinued to avoid incurring further losses.
• More units of Office Supplies are sold than of Furniture and Technology
combined. At 60%, they account for more than half of the total quantity sold,
which shows that demand for Office Supplies is very high.
• Sales and Profit are moderately correlated, while Sales and Discounts are
negatively correlated. This suggests that discounts did not generate enough
additional purchases to increase profit.
• The quantity column of this data is not normally distributed but
right-skewed. This implies that customers prefer buying in smaller quantities
rather than in bulk.

References
Dataset

Probability Distribution
