EXTENSION MANAGEMENT

BUSINESS ANALYTICS

Batch of 2018-20

FINAL REPORT

Date of Submission:07-02-2019

Submitted by –

Abhishek Sharma (05)

Basina Kiran Kumar (31)


I. Table of Contents

Q. No Topic Covered

1 Basic Questions

2 Probability

3 Conditional Probability

4 Joint Probability

5 Binomial Distribution

6 Normal Distribution

7 Sampling Distribution

8 Sample Size

9 Interval Estimate

10 Sample Size Estimation

This data set describes the BIGMART retail store in 2017: 1457 products sold across 10 outlets in different cities. It contains both qualitative and quantitative variables.

From this data, BIGMART wants to find the factors that increase or decrease sales output. What is the key factor for increasing sales?

As a new intern, I need to answer various questions about this data set.


III. DATA SET: ANALYSIS & RESULTS.

QUESTION 1.- Solutions

a) 1457 products

Based on the data given, if we assume a margin of error of 5% and a confidence level of 95%, we need to take 360 samples to analyze the data.

i. Item Identifier

ii. Item Weight

iii. Item Fat Content

iv. Item Visibility

v. Item Type

vi. Item MRP

vii. Outlet Identifier

viii. Outlet Establishment Year

ix. Outlet Size

x. Outlet Location Type

xi. Outlet Type

xii. Item Outlet Sales

c) Classification of data- Nominal, Ordinal, Interval, Ratio

i. Item Identifier = Nominal

ii. Item Weight = Ratio


iii. Item Fat Content = Ordinal

iv. Item Visibility = Ratio

v. Item Type = Nominal

vi. Item MRP = Ratio

vii. Outlet Identifier = Nominal

viii. Outlet Establishment Year = Interval

ix. Outlet Size = Ordinal

x. Outlet Location Type = Ordinal

xi. Outlet Type = Ordinal

xii. Item Outlet Sales = Ratio

d)

Qualitative              Quantitative
Item Identifier          Item Weight
Item Fat Content         Item Visibility
Item Type                Item MRP
Outlet Identifier        Outlet Establishment Year
Outlet Size              Item Outlet Sales
Outlet Location Type
Outlet Type

which are collected at same time period


f) MEAN, MEDIAN AND MODE OF QUANTITATIVE DATA

          Item Outlet    Item           Item           Item          Outlet Establishment
          Sales          Visibility     Weight         MRP           Year
MEAN      2170.014226    0.070877867    12.86293625    140.371766    1997.84475
MEDIAN    1777.686       0.058132006    12.6           142.2996      1999
MODE      958.752        0.058542509    12.15          172.0422      1985

For Item Visibility, the median is preferred because there are many extreme values.

average weights of different items

For Item MRP, the median is preferred because there are extreme values; a typical shopper wants to buy something priced neither too high nor too low.

For Outlet Establishment Year, either the mode or the median can be used, but the mode is the best choice because it tells us that the largest number of outlets opened in 1985.

You can see from the range below that it is too high.


g)
                      Item Outlet    Item           Item        Item        Outlet Establishment
                      Sales          Visibility     Weight      MRP         Year
Mean                  2170.014226    0.070877867    12.86294    140.3718    1997.845
Standard Error        19.06090598    0.000561118    0.051995    0.696453    0.093568
Median                1777.686       0.058132006    12.6        142.2996    1999
Mode                  958.752        0.058542509    12.15       172.0422    1985
Standard Deviation    1704.859259    0.050187881    4.650581    62.29264    8.369014
Sample Variance       2906545.094    0.002518823    21.6279     3880.373    70.0404
Kurtosis              1.72237215     1.731135114    -1.22985    -0.89485    -1.20356
Skewness              1.202300733    1.199493109    0.077047    0.137919    -0.3983
Range                 13053.6748     0.32481625     16.795      235.5984    24
Minimum               33.29          0.003574698    4.555       31.29       1985
Maximum               13086.9648     0.328390948    21.35       266.8884    2009
Sum                   17360113.81    567.0229365    102903.5    1122974     15982758
Count                 8000           8000           8000        8000        8000
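Several rows of this summary can be cross-checked from the others. The sketch below, using only the Item Outlet Sales figures from the table, recomputes the standard error, sample variance and range; the variable names are illustrative and not part of the original workbook.

```python
import math

# Figures copied from the Item Outlet Sales column of the summary table.
count = 8000
std_dev = 1704.859259
minimum, maximum = 33.29, 13086.9648


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


se = standard_error(std_dev, count)   # compare with the Standard Error row
variance = std_dev ** 2               # compare with the Sample Variance row
value_range = maximum - minimum       # compare with the Range row

print(f"standard error = {se:.5f}")
print(f"variance       = {variance:.3f}")
print(f"range          = {value_range:.4f}")
```

The same three checks apply to the other four columns by swapping in their values.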


h)

                            Tier 1         Tier 2         Tier 3         Grand Total
Sum of Item Outlet Sales    4169175.017    6056090.834    7134847.957    17360113.81

Average 2130.927738

COEFFICIENT OF VARIATION (SD/AVERAGE) 0.808254066

i)


Item Type                Low Fat    Regular    Grand Total
Baking Goods             329        318        647
Breads                   140        111        251
Breakfast                41         69         110
Canned                   341        308        649
Dairy                    415        261        676
Frozen Foods             453        407        860
Fruits and Vegetables    630        602        1232
Hard Drinks              214                   214
Health and Hygiene       278                   278
Household                637                   637
Meat                     170        255        425
Others                   161                   161
Seafood                  37         27         64
Snack Foods              691        508        1199
Soft Drinks              374        75         449
Starchy Foods            82         66         148
Grand Total              4993       3007       8000

Items                    High    Medium    Small    (blank)    Grand Total
Baking Goods             73      203       186      185        647
Breads                   25      83        71       72         251
Breakfast                13      36        30       31         110
Canned                   65      217       189      178        649
Dairy                    79      215       196      186        676
Frozen Foods             93      275       492                 860
Fruits and Vegetables    142     413       328      349        1232
Hard Drinks              23      75        50       66         214
Health and Hygiene       32      91        73       82         278
Household                78      202       172      185        637
Meat                     41      149       119      116        425
Others                   15      49        53       44         161
Seafood                  5       21        20       18         64
Snack Foods              125     407       335      332        1199
Soft Drinks              50      138       128      133        449
Starchy Foods            19      48        38       43         148
Grand Total              878     2622      2480     2020       8000

1985 17655.245

1987 11396.08

1997 11174.47

1998 6752.085

1999 11292.14

2002 10976.61

2004 11290.225

2007 11166.625

2009 11200.01


QUESTION 2

a)

Gender         Low Fat    Regular    Grand Total
1              2442       1558       4000
2              2551       1449       4000
Grand Total    4993       3007       8000

likely to purchase item with LOW FAT CONTENT

Joint: the probability that a randomly selected respondent is likely to purchase an item with REGULAR FAT CONTENT and is male.

c) 0.624125 (4993/8000)

d) 0.19475 (1558/8000)

e) Yes, they are independent

f) 0.6105 (2442/4000)

g) 0.48187 (1449/3007)
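The answers to c), d), f) and g) are all ratios of cell counts. The sketch below recomputes them from the counts implied by those answers (2442 and 1558 out of 4000 for gender 1, 2551 and 1449 for gender 2). Reading gender 1 as male follows answer d); the label "female" for gender 2 and all variable names are assumptions made for illustration.

```python
# Cell counts implied by answers c) to g). The gender labels are assumptions:
# gender code 1 is read as male (as answer d) suggests), code 2 as female.
counts = {
    ("male", "low_fat"): 2442,
    ("male", "regular"): 1558,
    ("female", "low_fat"): 2551,
    ("female", "regular"): 1449,
}
total = sum(counts.values())


def prob(gender=None, fat=None):
    """Share of respondents matching the given gender and/or fat content."""
    matched = sum(
        c for (g, f), c in counts.items()
        if (gender is None or g == gender) and (fat is None or f == fat)
    )
    return matched / total


p_low_fat = prob(fat="low_fat")                                            # c)
p_regular_and_male = prob(gender="male", fat="regular")                    # d)
p_low_fat_given_male = prob("male", "low_fat") / prob(gender="male")       # f)
p_female_given_regular = prob("female", "regular") / prob(fat="regular")   # g)

# Under strict independence the joint probability equals the product of the
# marginals, so this gap would be exactly zero.
independence_gap = p_regular_and_male - prob(gender="male") * prob(fat="regular")

print(p_low_fat, p_regular_and_male, p_low_fat_given_male,
      p_female_given_regular, independence_gap)
```

The printed gap lets the reader judge how closely gender and fat-content preference satisfy the independence condition in e).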


QUESTION 3

Supplier        Sales          Defective
Cold King       498607.633     23743.22062
Cool Stone      362217.8372    13931.45528
Mountain Dew    971772.3848    88342.94407
Grand Total     1832597.855    126017.62

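P(CS|D) = P(CS) P(D|CS) / [ P(CS) P(D|CS) + P(MD) P(D|MD) + P(CK) P(D|CK) ]

where CS = Cool Stone, MD = Mountain Dew, CK = Cold King and D = defective.

P(CS) = 362217.8372/1832597.855

P(D|CS) = 13931.45528/362217.8372

P(CS) P(D|CS) = 13931.45528/1832597.855 = 0.0076

Similarly, for the other two suppliers:

P(MD) P(D|MD) = (971772.3848/1832597.855) (88342.94407/971772.3848) = 0.0482

P(CK) P(D|CK) = (498607.633/1832597.855) (23743.22062/498607.633) = 0.0130

P(CS|D) = 0.0076 / (0.0076 + 0.0482 + 0.0130) = 0.0076/0.0688 = 0.11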

ii. Based on the probabilities obtained in the question above, calculate the probability that an item selected by the quality supervisor and found to be defective was supplied by Cold King (two level).
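Both parts of this question apply Bayes' theorem to the sales and defective figures in the supplier table. The sketch below is one way to compute the posterior for any supplier; the function name `posterior` and the dictionary layout are illustrative choices, not part of the original solution.

```python
# Sales and defective amounts copied from the supplier table.
sales = {
    "Cold King": 498607.633,
    "Cool Stone": 362217.8372,
    "Mountain Dew": 971772.3848,
}
defective = {
    "Cold King": 23743.22062,
    "Cool Stone": 13931.45528,
    "Mountain Dew": 88342.94407,
}


def posterior(supplier: str) -> float:
    """P(supplier | defective) by Bayes' theorem."""
    total_sales = sum(sales.values())
    # Prior: share of total sales. Likelihood: defect rate within a supplier.
    prior = {s: sales[s] / total_sales for s in sales}
    likelihood = {s: defective[s] / sales[s] for s in sales}
    evidence = sum(prior[s] * likelihood[s] for s in sales)
    return prior[supplier] * likelihood[supplier] / evidence


for name in sales:
    print(f"P({name} | defective) = {posterior(name):.4f}")
```

Because each prior times likelihood term reduces to that supplier's defective amount over total sales, the posterior is simply the supplier's share of all defectives, which gives a quick sanity check on the result.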

indicating that they have already won one of three different prizes: an automobile valued at $25000, a $100 mobile recharge card or a $5 BIGMART shopping card. To claim the prize, a loyal customer needs to present the flier at the store. The fine print on the back of the flier indicates the probabilities of winning. The chance of winning the car was 1 out of 8000, the chance of winning the mobile recharge card was 1 out of 8000, and the chance of winning the BIGMART shopping card was 7998 out of 8000.

If we assume there are 8000 customers in total, there are 8000 fliers.

b. Using your answer to (a) and the probabilities listed on the flier, what is the expected value of the prize won by a loyal customer who has received a flier?

Prize ($)    No.     Amount ($)
25000        1       25000
100          1       100
5            7998    39990
Total        8000    65090

Expected value = 65090/8000 = 8.13625

c. Using your answer to (a) and the probabilities listed on the flier, what is the standard deviation of the value of the prize won by a loyal customer who has received a flier?

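Variance = (25000^2 + 100^2 + 7998 × 5^2)/8000 - 8.13625^2 = 78151.24 - 66.20 = 78085.05

Standard deviation = √78085.05 = 279.44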
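The expected value and standard deviation of the prize follow directly from the three prize values and their counts out of 8000 fliers. A minimal sketch, treating the 8000 fliers as the whole population (so the variance divides by 8000, not 7999); the variable names are illustrative.

```python
import math

# (prize value in $, number of fliers carrying that prize), from the flier.
prizes = [(25000, 1), (100, 1), (5, 7998)]
total_fliers = sum(count for _, count in prizes)

# E[X]: probability-weighted average of the prize values.
expected_value = sum(value * count for value, count in prizes) / total_fliers

# Var(X) = E[X^2] - (E[X])^2, using the population form.
second_moment = sum(value ** 2 * count for value, count in prizes) / total_fliers
variance = second_moment - expected_value ** 2
std_dev = math.sqrt(variance)

print(f"expected value     = {expected_value:.5f}")
print(f"standard deviation = {std_dev:.2f}")
```

The standard deviation is far larger than the mean because a single $25000 prize dominates the spread.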

d. Do you think this is an effective promotion? Why or why not?

If this promotion increases profits by more than the $65,090 cost of the prizes, then it is effective.

them provide home delivery service if the total value of the items purchased exceeds $100. Using the binomial distribution, what is the probability that, of the next 6 stores surveyed:

No. of stores    Probability
0                0.015625
1                0.09375
2                0.234375
3                0.3125
4                0.234375
5                0.09375
6                0.015625

P(4) + P(5) + P(6) = 0.34375


a. Four stores will provide home delivery services

Probability = 6C4 (0.5)^4 (0.5)^2 = 0.234375

b. All six stores will provide home delivery services

Probability = 6C6 (0.5)^6 (0.5)^0 = 0.015625

c. At least four stores will provide home delivery services

Probability = P(4) + P(5) + P(6) = 0.34375

e. What are the mean and standard deviation of the number of stores that provide home delivery services in a survey of six stores?
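The probability table and the answers to a) to c) all come from the binomial formula with n = 6 and p = 0.5 (p is inferred from the tabulated probabilities, since the start of the question is missing). The sketch below reproduces them and also evaluates the mean np and standard deviation √(np(1-p)) asked for in e); the function name `binom_pmf` is illustrative.

```python
import math

n = 6    # stores surveyed
p = 0.5  # assumed success probability, inferred from the probability table


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial(n, p) random variable."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

p_exactly_four = pmf[4]           # a)
p_all_six = pmf[6]                # b)
p_at_least_four = sum(pmf[4:])    # c)

mean = n * p                          # e) mean number of stores
std_dev = math.sqrt(n * p * (1 - p))  # e) standard deviation

print(pmf)
print(p_exactly_four, p_all_six, p_at_least_four, mean, round(std_dev, 4))
```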

distributed random variable “Item Outlet Sales” from the

BIGMART sales data.

Mean = 2170.014

Standard Deviation = 1704.75

Mean                  2170.014226
Standard deviation    1704.859259


Number of stores with sales between 1000 and 3000 = 3537, so for b) probability = 3537/8000 = 0.442125

a. What is the probability that a randomly selected store has achieved sales of more than $5000?

P(x > $5000) = (stores with sales of more than 5000)/(total stores) = 584/8000 = 0.073

b. What is the probability that a randomly selected store has achieved sales between $1000 and $3000?

Probability = 0.442125

c. Between what two values will the middle 95% of the store sales fall?

1.96 = (x - 2170.014)/1704.75 and -1.96 = (x - 2170.014)/1704.75

-1171.301 < x < 5511.32
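The bounds in c) come from solving z = (x - mean)/SD for z = ±1.96. A small sketch using the standard library's `statistics.NormalDist`, with the mean and standard deviation used in the working above; it relies on the exact 97.5% quantile rather than the rounded 1.96, so the last digits differ slightly.

```python
from statistics import NormalDist

mean = 2170.014
std_dev = 1704.75  # value used in the working above

# z such that 95% of a standard normal lies between -z and +z.
z = NormalDist().inv_cdf(0.975)

lower = mean - z * std_dev
upper = mean + z * std_dev

print(f"z = {z:.4f}")
print(f"{lower:.2f} < x < {upper:.2f}")
```

The lower bound is negative, which sales cannot be. Together with the skewness of about 1.2 reported for Item Outlet Sales, this suggests the normal model is only a rough approximation here.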

BIGMART store is approximately normally distributed, with a mean of 13.38294 pounds and a standard deviation of 4.7038 pounds. If you select a random sample of 16 boxes

Using the random sample generator in the Excel data analysis tool, we find the random sample below (the third column is the average of the first two):

7.560949    21.2531     14.40703
5.428368    16.56232    10.99534
14.48709    22.71341    18.60025
10.14817    9.77351     9.960842
11.03473    12.23806    11.6364
11.54       12.53984    12.03992
8.510741    13.18551    10.84813
11.14093    18.18251    14.66172
6.819343    15.38313    11.10124
10.50682    9.760462    10.13364
6.331485    20.65215    13.49182
6.956565    18.66692    12.81174
17.43719    16.6474     17.0423
10.50855    9.374413    9.941481
17.42311    20.36109    18.8921
18.7159     7.071754    12.89382

Mean of the third column            13.09111

For sample size 16, the mean of the random sample is 13.0911.

b. What is the probability that the sample mean is less than 12 pounds?

Standard error = 4.7038/√16 = 1.176

z = (12 - 13.38294)/1.176 = -1.176

P(x̄ < 12) = 0.1198, or about 0.12

d. Between what two values, symmetrically distributed around the population mean, does the sample mean fall with 60% probability?

Standard error = 4.7038/√16 = 1.176

The middle 60% lies between z = -0.8416 and z = 0.8416, so x̄ = 13.38294 ± 0.8416 × 1.176

12.39 < x̄ < 14.37
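Questions about the sample mean use the standard error σ/√n, not the population standard deviation. The sketch below applies that to this question's figures (mean 13.38294, SD 4.7038, n = 16) with `statistics.NormalDist`; variable names are illustrative, and the normality of the population is taken from the question statement.

```python
import math
from statistics import NormalDist

mu = 13.38294    # population mean (pounds)
sigma = 4.7038   # population standard deviation (pounds)
n = 16           # sample size

# Sampling distribution of the mean: normal with SD equal to sigma / sqrt(n).
std_error = sigma / math.sqrt(n)
sampling_dist = NormalDist(mu, std_error)

# b) probability that the sample mean is below 12 pounds.
p_below_12 = sampling_dist.cdf(12)

# d) symmetric interval holding 60% of sample means: 20% in each tail.
z60 = NormalDist().inv_cdf(0.80)
lower = mu - z60 * std_error
upper = mu + z60 * std_error

print(f"standard error = {std_error:.4f}")
print(f"P(mean < 12)   = {p_below_12:.4f}")
print(f"60% interval   = ({lower:.2f}, {upper:.2f})")
```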


8. BIGMART store is estimating annual sales from its business across 10 stores. The standard deviation of the annual sales for the entire population (8000 stores) is 1704 pounds. How large a sample should the BIGMART store consider in order to estimate the mean annual sales of last year to within $1000 at a 95% confidence level?

SD = 1704

α = 0.05

α/2 = 0.025

z = 1.96

n = (1.96 × 1704/1000)^2 = 11.15

Rounding up so that the margin of error is not exceeded, n = 12

(Pounds) spent on vegetable & fruit category. Suppose a random sample of 350 products under the vegetable & fruit category yielded a mean of 136.95 pounds and a standard deviation of 54.40 pounds.

a. Construct a 95% confidence interval for the mean spending for all products in the vegetable & fruit category (population standard deviation is unknown).

ME = t(0.05/2, 349 d.f.) × 54.40/√350

= 1.967 × 2.908

= 5.72

C.I. = 136.95 ± 5.72

With 95% confidence, the mean spending for all products in the vegetable & fruit category will be between 131.23 and 142.67.
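The interval is the sample mean plus or minus a critical value times the standard error s/√n. Python's standard library has no Student's t quantile, so the sketch below substitutes the normal critical value, which is an approximation: with 349 degrees of freedom the t value is only slightly larger, so the exact t interval is marginally wider than this one.

```python
import math
from statistics import NormalDist

sample_mean = 136.95  # pounds
sample_sd = 54.40     # pounds
n = 350

std_error = sample_sd / math.sqrt(n)

# Normal approximation to the t critical value (assumption: n is large
# enough that the two are close; this is not the exact t quantile).
critical = NormalDist().inv_cdf(0.975)

margin = critical * std_error
ci = (sample_mean - margin, sample_mean + margin)

print(f"standard error  = {std_error:.4f}")
print(f"margin of error = {margin:.2f}")
print(f"95% CI          = ({ci[0]:.2f}, {ci[1]:.2f})")
```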

average consumer spends around 140.371766 pounds per month on frozen foods.

a. Assuming a standard deviation of 8 pounds, what sample size is needed to estimate, with 95% confidence, the mean monthly spend per consumer to within ±9 pounds?

9 = 1.96 × 8/√n

√n = 1.74

n = 1.74^2 = 3.04, rounded up to 4

b. Assuming a standard deviation of 10 pounds, what sample size is needed to estimate, with 95% confidence, the mean monthly spend per consumer to within ±9 pounds?

95% confidence

SD = 10, ME = 9, alpha = 0.05

ME ≤ 9 pounds

1.96 × 10/√n < 9

2.18 < √n

n = 4.75, rounded up to 5

c. Assuming a standard deviation of 12 pounds, what sample size is needed to estimate, with 95% confidence, the mean monthly spend per consumer to within ±9 pounds?

ME < 9

1.96 × 12/√n < 9

2.61 < √n

n = 6.8, rounded up to 7

d)

Discuss the effect of variation on sample size

As the standard deviation increases, the sample size needed to achieve the same margin of error also increases, and vice versa.
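The sample sizes in a) to c) all come from solving ME = z × SD/√n for n and rounding up to the next whole number, so that the margin of error is not exceeded. A minimal sketch with z fixed at 1.96; the function name `required_sample_size` is illustrative.

```python
import math


def required_sample_size(sd: float, margin: float, z: float = 1.96) -> int:
    """Smallest n for which z * sd / sqrt(n) does not exceed the margin."""
    return math.ceil((z * sd / margin) ** 2)


# Standard deviations from parts a), b) and c), each with a margin of 9 pounds.
sizes = {sd: required_sample_size(sd, margin=9) for sd in (8, 10, 12)}

for sd, size in sizes.items():
    print(f"SD = {sd:>2} pounds -> n = {size}")
```

The required n grows with the square of the standard deviation, which is the effect discussed in d).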



