
NATIONAL INSTITUTE OF AGRICULTURAL EXTENSION MANAGEMENT

BUSINESS ANALYTICS
Batch of 2018-20

Retail Sales Project


FINAL REPORT
Date of Submission: 07-02-2019

Submitted to – Prof. (Dr.) Sridhar Vaithianathan.

Submitted by –
Abhishek Sharma (05)
Basina Kiran Kumar (31)
I. Table of Contents
Q. No Topic Covered
1 Basic Questions
2 Probability
3 Conditional Probability
4 Joint Probability
5 Binomial Distribution
6 Normal Distribution
7 Sampling Distribution
8 Sample Size
9 Interval Estimate
10 Sample Size Estimation

II. About the Dataset.


This dataset covers the BIGMART retail store in 2017: 1457 products
across 10 outlets in different cities. It contains both qualitative
and quantitative variables.
From this data, BIGMART wants to find the factors that increase or
decrease sales. What is the key factor for increasing sales?
As a new intern, I need to answer various questions about this
dataset.

III. DATA SET: ANALYSIS & RESULTS.
QUESTION 1.- Solutions

a) 1457 products
As per the data given, if we assume a margin of error of 5% and a
confidence level of 95%, we need to take about 360 samples to analyze
the data.

b) 12 variables in data as listed


i. Item Identifier
ii. Item Weight
iii. Item Fat Content
iv. Item Visibility
v. Item Type
vi. Item MRP
vii. Outlet Identifier
viii. Outlet Establishment Year
ix. Outlet Size
x. Outlet Location Type
xi. Outlet Type
xii. Item Outlet Sales
c) Classification of data- Nominal, Ordinal, Interval, Ratio
i. Item Identifier = Nominal
ii. Item Weight = Ratio
iii. Item Fat Content = Ordinal
iv. Item Visibility = Ratio
v. Item Type = Nominal
vi. Item MRP = Ratio
vii. Outlet Identifier = Nominal
viii. Outlet Establishment Year = Interval
ix. Outlet Size = Ordinal
x. Outlet Location Type = Ordinal
xi. Outlet Type = Ordinal
xii. Item Outlet Sales = Ratio

d)

Qualitative              Quantitative
Item Identifier          Item Weight
Item Fat Content         Item Visibility
Item Type                Item MRP
Outlet Identifier        Outlet Establishment Year
Outlet Size              Item Outlet Sales
Outlet Location Type
Outlet Type

e) This is cross-sectional data, because all the variables were
collected over the same time period.

f) MEAN, MEDIAN AND MODE OF QUANTITATIVE DATA

         ITEM OUTLET    ITEM           ITEM           ITEM          OUTLET ESTABLISHMENT
         SALES          VISIBILITY     WEIGHT         MRP           YEAR
MEAN     2170.014226    0.070877867    12.86293625    140.371766    1997.84475
MEDIAN   1777.686       0.058132006    12.6           142.2996      1999
MODE     958.752        0.058542509    12.15          172.0422      1985

For Item Visibility, the median is preferred because there are many
extreme values.

For Item Weight, the mean is preferred because it gives the average
weight across the different items.

For Item MRP, the median is preferred because there are extreme values,
and a typical customer wants to buy something priced neither too high
nor too low. The mean can also be used.

For Outlet Establishment Year, the mode or the median can be used, but
the mode is best because it shows that the largest number of outlets
opened in 1985.

For Item Outlet Sales, the median is preferred because of extreme
values; the range shown below is very large.

g)
                     ITEM OUTLET    ITEM           ITEM        ITEM        OUTLET ESTABLISHMENT
                     SALES          VISIBILITY     WEIGHT      MRP         YEAR
Mean                 2170.014226    0.070877867    12.86294    140.3718    1997.845
Standard Error       19.06090598    0.000561118    0.051995    0.696453    0.093568
Median               1777.686       0.058132006    12.6        142.2996    1999
Mode                 958.752        0.058542509    12.15       172.0422    1985
Standard Deviation   1704.859259    0.050187881    4.650581    62.29264    8.369014
Sample Variance      2906545.094    0.002518823    21.6279     3880.373    70.0404
Kurtosis             1.72237215     1.731135114    -1.22985    -0.89485    -1.20356
Skewness             1.202300733    1.199493109    0.077047    0.137919    -0.3983
Range                13053.6748     0.32481625     16.795      235.5984    24
Minimum              33.29          0.003574698    4.555       31.29       1985
Maximum              13086.9648     0.328390948    21.35       266.8884    2009
Sum                  17360113.81    567.0229365    102903.5    1122974     15982758
Count                8000           8000           8000        8000        8000
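
Several rows of the descriptive statistics are linked by simple formulas, which gives a quick way to cross-check the Excel output. The sketch below is only an illustration for the Item Outlet Sales column: the variable names are our own, and the input numbers are copied from the table rather than recomputed from the raw data.

```python
import math

# Figures copied from the descriptive-statistics table (Item Outlet Sales).
std_dev = 1704.859259
count = 8000
minimum, maximum = 33.29, 13086.9648

# Standard error of the mean: s / sqrt(n)
standard_error = std_dev / math.sqrt(count)

# Sample variance is the square of the sample standard deviation.
sample_variance = std_dev ** 2

# Range is the gap between the largest and smallest observation.
value_range = maximum - minimum

print(f"Standard error : {standard_error:.4f}")
print(f"Sample variance: {sample_variance:.1f}")
print(f"Range          : {value_range:.4f}")
```

The three printed values should agree with the Standard Error, Sample Variance and Range rows of the table up to rounding.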

h)

                           Tier 1         Tier 2         Tier 3         Grand Total
Sum of Item Outlet Sales   4169175.017    6056090.834    7134847.957    17360113.81

Sales based on location type

Standard Deviation                         1722.331009
Average                                    2130.927738
Coefficient of Variation (SD / Average)    0.808254066
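
The coefficient of variation is simply the standard deviation expressed as a fraction of the average. A minimal sketch, assuming the standard deviation and average reported above (the function name is our own, not part of any library):

```python
def coefficient_of_variation(std_dev: float, average: float) -> float:
    """Return the coefficient of variation, SD / average."""
    if average == 0:
        raise ValueError("average must be non-zero")
    return std_dev / average

# Values copied from the location-type sales summary above.
cv = coefficient_of_variation(1722.331009, 2130.927738)
print(f"Coefficient of variation: {cv:.6f} ({cv:.1%} of the average)")
```

A value near 0.81 means the spread of sales is about four fifths of the average sale, which indicates high relative variability.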

i)

Pivot tables prepared for this part (columns: Low Fat, Regular, Grand Total):
Item Type by Item Fat Content
Outlet Size (High, Medium, Small, blank) by Item Fat Content
Outlet Location Type (Tier 1, Tier 2, Tier 3) by Item Fat Content

Items                    Low Fat   Regular   Grand Total
Baking Goods             329       318       647
Breads                   140       111       251
Breakfast                41        69        110
Canned                   341       308       649
Dairy                    415       261       676
Frozen Foods             453       407       860
Fruits and Vegetables    630       602       1232
Hard Drinks              214                 214
Health and Hygiene       278                 278
Household                637                 637
Meat                     170       255       425
Others                   161                 161
Seafood                  37        27        64
Snack Foods              691       508       1199
Soft Drinks              374       75        449
Starchy Foods            82        66        148
Grand Total              4993      3007      8000

Items                    High   Medium   Small   (blank)   Grand Total
Baking Goods             73     203      186     185       647
Breads                   25     83       71      72        251
Breakfast                13     36       30      31        110
Canned                   65     217      189     178       649
Dairy                    79     215      196     186       676
Frozen Foods             93     275      492               860
Fruits and Vegetables    142    413      328     349       1232
Hard Drinks              23     75       50      66        214
Health and Hygiene       32     91       73      82        278
Household                78     202      172     185       637
Meat                     41     149      119     116       425
Others                   15     49       53      44        161
Seafood                  5      21       20      18        64
Snack Foods              125    407      335     332       1199
Soft Drinks              50     138      128     133       449
Starchy Foods            19     48       38      43        148
Grand Total              878    2622     2480    2020      8000

establishment year Sum of Item_Weight


1985 17655.245
1987 11396.08
1997 11174.47
1998 6752.085
1999 11292.14
2002 10976.61
2004 11290.225
2007 11166.625
2009 11200.01

QUESTION 2.- Solutions

a)

Gender         Low Fat   Regular   Grand Total
1              2442      1558      4000
2              5102      2898      8000
Grand Total    7544      4456      12000

Gender    Low Fat   Regular fat   Total
1         2442      1558          4000
2         2551      1449          4000
Total     4993      3007          8000

b) Simple probability: the probability that a randomly selected
respondent is likely to purchase an item with LOW FAT CONTENT.
Joint probability: the probability that a randomly selected respondent
is likely to purchase an item with REGULAR FAT CONTENT and is male.
c) 0.624125 (4993/8000)
d) 0.19475 (1558/8000)
e) Only approximately independent: P(Regular and male) = 0.19475, while
P(Regular) × P(male) = (3007/8000) × (4000/8000) = 0.1879
f) 0.6105 (2442/4000)
g) 0.48188 (1449/3007)
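
The simple, joint and conditional probabilities in parts c) to g) can all be read off the gender by fat-content count table. The sketch below is an illustration under two assumptions: gender code 1 is male (as part d implies), and the gender 1 row is 2442 Low Fat and 1558 Regular, which is what the row total of 4000 and part f) require. The helper names are hypothetical.

```python
# Counts by (gender code, fat content); gender 1 is assumed to be male.
counts = {
    (1, "Low Fat"): 2442,
    (1, "Regular"): 1558,
    (2, "Low Fat"): 2551,
    (2, "Regular"): 1449,
}
total = sum(counts.values())  # 8000 respondents


def p_fat(fat):
    """Simple (marginal) probability of a fat-content category."""
    return sum(n for (_, f), n in counts.items() if f == fat) / total


def p_gender(gender):
    """Simple (marginal) probability of a gender code."""
    return sum(n for (g, _), n in counts.items() if g == gender) / total


def p_joint(gender, fat):
    """Joint probability of a gender code and a fat-content category."""
    return counts[(gender, fat)] / total


# c) simple probability, d) joint probability
print("P(Low Fat)           =", p_fat("Low Fat"))
print("P(Regular and male)  =", p_joint(1, "Regular"))

# f), g) conditional probability = joint / marginal
p_low_given_male = p_joint(1, "Low Fat") / p_gender(1)
p_g2_given_regular = p_joint(2, "Regular") / p_fat("Regular")
print("P(Low Fat | male)    =", round(p_low_given_male, 4))
print("P(gender 2 | Regular)=", round(p_g2_given_regular, 5))

# e) independence holds only if joint == product of the marginals
product = p_gender(1) * p_fat("Regular")
print("P(male) x P(Regular) =", round(product, 4))
```

Comparing the joint probability with the product of the marginals is the direct check for independence: the closer the two numbers, the weaker the association between gender and fat content.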

QUESTION 3.- Solutions

Item type: Frozen Foods

Row Labels      Sum of Item Outlet Sales    Defective
Cold King       498607.633                  23743.22062
Cool Stone      362217.8372                 13931.45528
Mountain Dew    971772.3848                 88342.94407
Grand Total     1832597.855                 126017.62

i. What is the probability that the item is supplied by Cool Stone?

By Bayes' theorem, with CS = Cool Stone, MD = Mountain Dew,
CK = Cold King and D = defective:

P(CS|D) = P(CS) P(D|CS) / [P(CS) P(D|CS) + P(MD) P(D|MD) + P(CK) P(D|CK)]

P(CS) = 362217.8372/1832597.855 = 0.197653
P(D|CS) = 13931.45528/362217.8372 = 0.038462
P(CS) P(D|CS) = 0.007602

Similarly:
P(MD) P(D|MD) = (971772.3848/1832597.855) × (88342.94407/971772.3848) = 0.048206
P(CK) P(D|CK) = (498607.633/1832597.855) × (23743.22062/498607.633) = 0.012956

P(D) = 0.007602 + 0.048206 + 0.012956 = 0.068764
P(CS|D) = 0.007602/0.068764 = 0.1106

ii. Based on the probability obtained in the above question, calculate
the probability that the item selected by the quality supervisor is
found to be defective and supplied by Cold King (two level).

P(CK|D) = 0.012956/0.068764 = 0.1884
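
The Bayes calculation can be reproduced directly from the Frozen Foods table without rounding intermediate values. This is a minimal sketch: the dictionary names are our own, the priors are taken as each supplier's share of sales, and the likelihoods as each supplier's defective value divided by its sales, which is an assumption about how the table is meant to be read.

```python
# Sales and defective amounts per supplier, copied from the table.
sales = {
    "Cold King": 498607.633,
    "Cool Stone": 362217.8372,
    "Mountain Dew": 971772.3848,
}
defective = {
    "Cold King": 23743.22062,
    "Cool Stone": 13931.45528,
    "Mountain Dew": 88342.94407,
}

total_sales = sum(sales.values())

# Prior P(supplier) and likelihood P(D | supplier).
prior = {s: sales[s] / total_sales for s in sales}
likelihood = {s: defective[s] / sales[s] for s in sales}

# Joint P(supplier and D), then total probability P(D).
joint = {s: prior[s] * likelihood[s] for s in sales}
p_defective = sum(joint.values())

# Posterior P(supplier | D) by Bayes' theorem.
posterior = {s: joint[s] / p_defective for s in sales}

for supplier, value in posterior.items():
    print(f"P({supplier} | defective) = {value:.4f}")
```

Because every posterior shares the same denominator P(D), the three values sum to 1, which is a useful sanity check on hand calculations.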

4. The BIGMART Sales Manager sent out fliers to its loyal customers
indicating that they have already won one of three different prizes: an
automobile valued at $25000, a $100 Mobile recharge card or a $5
BIGMART Shopping card. To claim the prize, a loyal customer needs
to present the flier at the store. The fine print on the back of the flier
indicates the probabilities of winning. The chance of winning the car was
1 out of 8000, the chance of winning the Mobile recharge card was 1
out of 8000, and the chance of winning the BIGMART shopping card
was 7998 out of 8000.

a. How many fliers do you think the Manager sent out?
If we assume there are 8000 customers in total, then 8000 fliers were sent.
b. Using your answer to (a) and the probabilities listed on the flier, what
is the expected value of the prize won by a loyal customer who
has received a flier?

Prize ($)    No.     Amount ($)
25000        1       25000
100          1       100
5            7998    39990
Total        8000    65090

Expected value = 65090/8000 = 8.13625

c. Using your answer to (a) and the probabilities listed on the flier, what
is the standard deviation of the value of the prize won by a loyal
customer who has received a flier?
E(X²) = (25000² × 1 + 100² × 1 + 5² × 7998)/8000 = 78151.24375
Standard deviation = √(E(X²) - [E(X)]²) = √(78151.24375 - 8.13625²) = 279.44
d. Do you think this is an effective promotion? Why or why not?
The promotion is effective if it increases profits by more than the
$65,090 cost of the prizes.
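
The expected value and standard deviation of the prize follow from the probabilities printed on the flier. A minimal sketch, treating the prize as a discrete random variable; the list and variable names are our own.

```python
import math

# (prize value in $, probability of winning it), from the flier's fine print.
prizes = [
    (25000, 1 / 8000),
    (100, 1 / 8000),
    (5, 7998 / 8000),
]

# Expected value: sum of value x probability.
expected_value = sum(value * prob for value, prob in prizes)

# Variance: probability-weighted squared deviation from the expected value.
variance = sum(prob * (value - expected_value) ** 2 for value, prob in prizes)
std_dev = math.sqrt(variance)

print(f"Expected prize value: ${expected_value:.5f}")
print(f"Standard deviation  : ${std_dev:.2f}")
```

The standard deviation is far larger than the expected value because nearly all the variation comes from the single automobile prize.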

5. According to the BIGMART store analysis, 50% of their stores
provide home delivery service if the total value of the items
purchased exceeds $100. Using the binomial distribution, what is the
probability that, of the next 6 stores surveyed:

Using the binomial function in Excel, we get these values:

Stores    Probability
0         0.015625
1         0.09375
2         0.234375
3         0.3125
4         0.234375
5         0.09375
6         0.015625

P(4) + P(5) + P(6) = 0.34375

a. Four stores will provide home delivery services
Probability = 6C4 × (0.5)^4 × (0.5)^2 = 0.234375
b. All six stores will provide home delivery services
Probability = 6C6 × (0.5)^6 × (0.5)^0 = 0.015625
c. At least four stores will provide home delivery services
Probability = P(4) + P(5) + P(6) = 0.34375
e. What are the mean and standard deviation of the number of stores which
provide home delivery services in a survey of six stores?
Mean = np = 6 × 0.5 = 3
Standard deviation = √(np(1 - p)) = √1.5 = 1.2247
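
The binomial probabilities obtained from Excel can be reproduced with the formula nCk × p^k × (1 - p)^(n - k). The sketch below is an illustration with n = 6 and p = 0.5 from the question; `binom_pmf` is our own helper name, not a library function.

```python
from math import comb, sqrt

n, p = 6, 0.5  # six stores surveyed, 50% offer home delivery


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


pmf = {k: binom_pmf(k, n, p) for k in range(n + 1)}

p_exactly_four = pmf[4]
p_all_six = pmf[6]
p_at_least_four = pmf[4] + pmf[5] + pmf[6]

# Mean and standard deviation of a binomial variable.
mean = n * p
std_dev = sqrt(n * p * (1 - p))

for k, prob in pmf.items():
    print(k, prob)
print("P(X = 4)  =", p_exactly_four)
print("P(X = 6)  =", p_all_six)
print("P(X >= 4) =", p_at_least_four)
print("mean =", mean, " sd =", round(std_dev, 4))
```

With p = 0.5 the distribution is symmetric, so P(k) = P(6 - k), which matches the mirrored values in the table.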

6. Calculate the mean and standard deviation for the normally
distributed random variable “Item Outlet Sales” from the
BIGMART sales data.
Mean = 2170.014
Standard Deviation = 1704.75

Mean                  2170.014226
Standard deviation    1704.859259

Sales more than 5000: 584             a) Probability = 0.073
Sales between 1000 and 3000: 3537     b) Probability = 0.442125

a. What is the probability that a randomly selected store has achieved
sales of more than $5000?
P(x > $5000) = number of stores with sales above 5000 divided by the
total = 584/8000 = 0.073
b. What is the probability that a randomly selected store has achieved
sales between $1000 and $3000?
Probability = 3537/8000 = 0.442125
c. Between what two values will the middle 95% of the store sales fall?

z values (0.025 and 0.975) = -1.96 and +1.96

-1.96 = (x - 2170.014)/1704.75 and +1.96 = (x - 2170.014)/1704.75
-1171.30 < x < 5511.32
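
The bounds for the middle 95% can also be obtained from the inverse normal CDF instead of a z table. A minimal sketch using Python's standard `statistics.NormalDist`, with the mean and standard deviation quoted above; it assumes, as the question does, that sales are normally distributed.

```python
from statistics import NormalDist

# Mean and standard deviation of Item Outlet Sales, as used above.
sales = NormalDist(mu=2170.014, sigma=1704.75)

# The middle 95% lies between the 2.5th and 97.5th percentiles.
lower = sales.inv_cdf(0.025)
upper = sales.inv_cdf(0.975)

print(f"Middle 95% of sales: {lower:.2f} to {upper:.2f}")
```

The lower bound is negative even though sales cannot be, which reflects the strong right skew of the data: the normal model is only a rough approximation here.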

7. The weight of the assorted Dairy Products kept in a gift box in a
BIGMART store is approximately normally distributed, with a mean
of 13.38294 pounds (calculated) and a standard deviation of
4.7038 pounds (calculated). You select a random sample of 16
boxes.

Using the random sample generator in the Excel data analytics tool, we
find the random sample:

7.560949    21.2531     14.40703
5.428368    16.56232    10.99534
14.48709    22.71341    18.60025
10.14817    9.77351     9.960842
11.03473    12.23806    11.6364
11.54       12.53984    12.03992
8.510741    13.18551    10.84813
11.14093    18.18251    14.66172
6.819343    15.38313    11.10124
10.50682    9.760462    10.13364
6.331485    20.65215    13.49182
6.956565    18.66692    12.81174
17.43719    16.6474     17.0423
10.50855    9.374413    9.941481
17.42311    20.36109    18.8921
18.7159     7.071754    12.89382
Mean                    13.09111

a. What is the sampling distribution of the mean?
For sample size 16, the sampling distribution of the mean is normal
with mean 13.38294 and standard error 4.7038/√16 = 1.17595. The mean of
the random sample drawn above is 13.0911.
b. What is the probability that the sample mean is less than 12 pounds?
z = (12 - 13.38294)/1.17595 = -1.1760
P(x̄ < 12) = 0.1198
d. Between what two values does the sample mean have a 60% probability of
lying, symmetrically distributed around the population mean?

z values (0.20 and 0.80) = -0.8416 and +0.8416

-0.8416 = (x̄ - 13.38294)/1.17595 and +0.8416 = (x̄ - 13.38294)/1.17595
12.3932 < x̄ < 14.3726
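
Questions about a sample mean use the standard error σ/√n rather than the population standard deviation itself. The sketch below is an illustration under that assumption, using the mean 13.38294, standard deviation 4.7038 and sample size 16 from the question; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 13.38294, 4.7038, 16

# Standard error of the mean for samples of size n.
standard_error = sigma / sqrt(n)

# Sampling distribution of the sample mean.
sample_mean_dist = NormalDist(mu=mu, sigma=standard_error)

# Probability that the sample mean is below 12 pounds.
p_below_12 = sample_mean_dist.cdf(12)

# Symmetric interval holding the middle 60%: 20th to 80th percentile.
lower = sample_mean_dist.inv_cdf(0.20)
upper = sample_mean_dist.inv_cdf(0.80)

print(f"Standard error   : {standard_error:.5f}")
print(f"P(mean < 12)     : {p_below_12:.4f}")
print(f"Middle 60% range : {lower:.4f} to {upper:.4f}")
```

Dividing by √16 = 4 makes the distribution of the sample mean four times narrower than the distribution of individual box weights.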

8. The BIGMART store is estimating annual sales from its business across
10 stores. The standard deviation of the annual sales for the entire
population (8000 stores) is 1704 pounds. How large a sample
should the BIGMART store consider in order to estimate the mean
annual sales of last year to within $1000 at the 95% confidence level?
SD = 1704
α = 0.05
α/2 = 0.025
z = 1.96

n = [1.96 × 1704/1000]^2
n = 11.15
n = 12 (rounded up)

9. The Store Manager of BIGMART wants to estimate the mean amount
(pounds) spent on the vegetable & fruit category. Suppose a random
sample of 350 products under the vegetable & fruit category yielded a
mean of 136.95 pounds (calculated) and a standard deviation of
54.40 pounds (calculated).
a. Construct a 95% confidence interval for the mean spending for all
products in the vegetable & fruit category (population standard
deviation is unknown).
ME = t(0.05/2, 349) × 54.40/√350
= 1.967 × 2.908
= 5.72
C.I. = 136.95 ± 5.72

b. Interpret the interval constructed in (a).
With 95% confidence, the mean spending for all products in the
vegetable & fruit category lies between 131.23 and 142.67.
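
For a sample as large as 350, the t critical value is very close to the normal one, so the interval can be approximated with the standard library alone. The sketch below deliberately swaps in the normal quantile for the t quantile (an approximation, not the exact t interval), using the sample mean 136.95 and standard deviation 54.40 from the question.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, sample_sd, n = 136.95, 54.40, 350
confidence = 0.95

# Standard error of the sample mean.
standard_error = sample_sd / sqrt(n)

# Large-sample approximation: normal quantile used in place of t(n - 1).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin_of_error = z * standard_error
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error

print(f"Standard error : {standard_error:.4f}")
print(f"Margin of error: {margin_of_error:.4f}")
print(f"95% CI (approx): {ci_lower:.2f} to {ci_upper:.2f}")
```

An exact t interval with 349 degrees of freedom is slightly wider than this approximation, but the difference is only a few hundredths of a pound.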

10. The BIGMART Stores Market research wing found that, on
average, a consumer spends around 140.371766 pounds (calculated)
per month on Frozen foods.
a. Assuming a standard deviation of 8 pounds, what sample size is
needed to estimate, with 95% confidence, the mean per-consumer
monthly spend to within ±9 pounds?
9 = 1.96 × 8/√n
√n = 1.742
n = 1.742^2 = 3.03, rounded up to 4
b. Assuming a standard deviation of 10 pounds, what sample size is
needed to estimate, with 95% confidence, the mean per-consumer
monthly spend to within ±9 pounds?
95% confidence
SD = 10, ME = 9, α = 0.05
ME ≤ 9 pounds
1.96 × 10/√n ≤ 9
2.18 ≤ √n
n = 4.75, rounded up to 5
c. Assuming a standard deviation of 12 pounds, what sample size is
needed to estimate, with 95% confidence, the mean per-consumer
monthly spend to within ±9 pounds?
ME ≤ 9
1.96 × 12/√n ≤ 9
2.61 ≤ √n
n = 6.8, rounded up to 7
d. Discuss the effect of variation on sample size.
As the standard deviation of the population increases, the sample size
needed for the same margin of error also increases, and vice versa.
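
The sample-size formula n = (z × σ / E)² makes the effect of variation easy to see. The sketch below is an illustration: `required_sample_size` is a hypothetical helper of our own, z = 1.96 is assumed for 95% confidence, and the result is always rounded up so the margin of error is not exceeded.

```python
from math import ceil


def required_sample_size(sigma: float, margin: float, z: float = 1.96) -> int:
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin."""
    return ceil((z * sigma / margin) ** 2)


# Question 10: margin of error fixed at 9 pounds, standard deviation varied.
sizes = {sigma: required_sample_size(sigma, margin=9) for sigma in (8, 10, 12)}
for sigma, n in sizes.items():
    print(f"sigma = {sigma:>2} pounds -> n = {n}")

# Question 8: sigma = 1704, margin of error = 1000.
n_annual_sales = required_sample_size(1704, margin=1000)
print("Question 8 sample size:", n_annual_sales)
```

Because n grows with the square of σ, doubling the standard deviation roughly quadruples the sample needed for the same margin of error.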
