
Statistics for Business Analytics

(SBA)

Dr. Sridhar Vaithianathan


Associate Professor (Analytics & IT)
Institute of Management Technology,
Hyderabad

sridhar.v@imthyderabad.edu.in
sridhar.vaithianathan@gmail.com Mobile: 99899 04245
Recap
 Population and Sample,
 Datasets: Elements, Variables and Observations.
 Scales of Measurements
(Nominal, Ordinal, Interval and Ratio)

 Qualitative and Quantitative Data.


 Cross sectional and Time series Data.
 Descriptive and Inferential Statistics
Retail Case Dataset
Topic Outline

Understanding DATA

 Tabular Summary

 Graphical Representation
Graphical Summaries
Categorical Variables: Bar Chart, Pie Chart
Numerical Variables: Histogram, Box Plot
Retail Case Dataset
Graphical Summaries
Frequency of ITEM Types

Item Type                Count of Item_Type
Baking Goods                            647
Breads                                  251
Breakfast                               110
Canned                                  649
Dairy                                   676
Frozen Foods                            860
Fruits and Vegetables                  1232
Hard Drinks                             214
Health and Hygiene                      278
Household                               637
Meat                                    425
Others                                  161
Seafood                                  64
Snack Foods                            1199
Soft Drinks                             449
Starchy Foods                           148
Grand Total                            8000
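As an illustration of how the frequency table feeds a bar chart, the sketch below ranks the item types by count and prints a rough text bar for each. The counts are copied from the table; the variable names and the scale of one `#` per 50 items are assumptions made only for this example.

```python
# Item-type counts copied from the frequency table in this deck.
item_counts = {
    "Baking Goods": 647, "Breads": 251, "Breakfast": 110, "Canned": 649,
    "Dairy": 676, "Frozen Foods": 860, "Fruits and Vegetables": 1232,
    "Hard Drinks": 214, "Health and Hygiene": 278, "Household": 637,
    "Meat": 425, "Others": 161, "Seafood": 64, "Snack Foods": 1199,
    "Soft Drinks": 449, "Starchy Foods": 148,
}

total = sum(item_counts.values())

# Rank categories from most to least frequent.
ranked = sorted(item_counts.items(), key=lambda pair: pair[1], reverse=True)

for name, count in ranked:
    share = 100 * count / total          # relative frequency in percent
    bar = "#" * round(count / 50)        # assumed scale: one '#' per 50 items
    print(f"{name:<22} {count:>5} {share:5.1f}%  {bar}")
```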
BAR Chart - Frequency of ITEM Types
[Figure: bar chart of Count of Item_Type for each of the 16 item types, with data labels from 1232 (Fruits and Vegetables) down to 64 (Seafood).]
PIE CHART - Item FAT Content Type
[Figure: pie chart with two slices, Regular and Low Fat, each labelled with its percentage.]
Low Fat 4993
Regular 3007
Grand Total 8000
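The percentage labels on the pie chart did not survive in the slide text, but they can be recovered from the two counts above. A minimal sketch, with variable names chosen only for this example:

```python
# Counts copied from the fat-content table in this deck.
fat_counts = {"Low Fat": 4993, "Regular": 3007}

total = sum(fat_counts.values())

# Share of each category as a percentage of all items, to one decimal place.
shares = {label: round(100 * count / total, 1) for label, count in fat_counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```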
BINS (10)                          Frequency
From (Rounded)    To (Rounded)
$     33          $  1,339              3104
$  1,339          $  2,644              2301
$  2,644          $  3,949              1418
$  3,949          $  5,255               677
$  5,255          $  6,560               334
$  6,560          $  7,865               111
$  7,865          $  9,171                35
$  9,171          $ 10,476                16
$ 10,476          $ 11,782                 2
$ 11,782          $ 13,087                 2
Total                                   8000
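The bin limits in this table look like ten equal-width intervals between the smallest and largest value. The sketch below rebuilds the edges under that assumption, using the rounded end points $33 and $13,087 from the table; because the true minimum and maximum are not shown, an edge can differ from the table by a dollar. The function name is hypothetical.

```python
def equal_width_edges(low, high, bins):
    """Return bins + 1 edges that split [low, high] into equal-width intervals."""
    width = (high - low) / bins
    return [low + i * width for i in range(bins + 1)]

# Rounded end points taken from the table; the unrounded values are not given.
edges = equal_width_edges(33, 13087, 10)

for left, right in zip(edges, edges[1:]):
    print(f"$ {round(left):>6,}  to  $ {round(right):>6,}")
```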
Box Plot - Outlet Sales ($)
Salaries ($) of twelve senior managers in an IT firm.
Find the outliers using the box plot method.

3310 3355 3450 3480 3480 3490
3520 3540 3550 3925 3650 3730

[Figure: box plot of Salary ($)]
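One way to work through the box plot method on the twelve salaries is sketched below. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data and that the fences sit 1.5 interquartile ranges beyond the quartiles; other quartile conventions (for example the defaults in spreadsheet or library functions) can give slightly different fences. The function name is hypothetical.

```python
from statistics import median

def box_plot_outliers(values, k=1.5):
    """Return (q1, q3, lower_fence, upper_fence, outliers).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves of
    the sorted data (the overall median is left out when n is odd).
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return q1, q3, lower_fence, upper_fence, outliers

salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3925, 3650, 3730]

q1, q3, lower_fence, upper_fence, outliers = box_plot_outliers(salaries)
print("Q1 =", q1, "Q3 =", q3)
print("fences =", lower_fence, upper_fence)
print("outliers =", outliers)
```

Under these assumptions only values outside the two fences are flagged; changing `k` or the quartile rule changes which salaries count as outliers.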
Descriptive Statistics

 Measure of Central Tendency (3M) :

 Mean (Arithmetic and Weighted), Median & Mode.

 Measures of Dispersion :

 Range, Variance and Standard Deviation,


 Co-efficient of Variation (CV).
 Outliers - Box Plot.

Quiz

Summary Measures:
Population Parameters / Sample Statistics

Measures of Central Tendency: Mean, Median, Mode
Measures of Variability: Range, Variance, Standard Deviation, Coefficient of Variation
Other summary measures: Skewness, Kurtosis
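The summary measures listed above can be computed with Python's standard `statistics` module. The sketch below applies them to the twelve manager salaries from the box plot exercise and treats those values as a sample, so the variance divides by n - 1; the variable names are chosen only for this example.

```python
from statistics import mean, median, mode, stdev, variance

# Salaries ($) from the box plot exercise in this deck, treated as a sample.
salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3925, 3650, 3730]

# Measures of central tendency
avg = mean(salaries)
mid = median(salaries)
most_common = mode(salaries)

# Measures of variability
value_range = max(salaries) - min(salaries)
sample_var = variance(salaries)      # sample variance, divides by n - 1
sample_sd = stdev(salaries)          # square root of the sample variance
cv_percent = 100 * sample_sd / avg   # coefficient of variation in percent

print("mean:", avg, "median:", mid, "mode:", most_common)
print("range:", value_range)
print("variance:", round(sample_var, 2), "std dev:", round(sample_sd, 2))
print("CV (%):", round(cv_percent, 2))
```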
Quiz Time…
Central Tendency & Dispersion
In computing the mean of a sample, the sum of the x_i values is divided by
  a. n
  b. n-1
  c. n+1
  d. n-2
Central Tendency & Dispersion
 The most frequently occurring value of a data
set is called the
 a. range
 b. mode
 c. mean
 d. median
Central Tendency & Dispersion
 The standard deviation of a sample of 100
observations equals 64. The variance of the
sample equals
 a. 8
 b. 10
 c. 6400
 d. 4,096
Central Tendency & Dispersion
 The variance of a sample of 81 observations
equals 64. The standard deviation of the
sample equals
 a. 9
 b. 4096
 c. 8
 d. 6561
Central Tendency & Dispersion
 The descriptive measure of dispersion that is
based on the concept of a deviation about the
mean is
 a. the range
 b. the interquartile range
 c. the absolute value of the range
 d. the standard deviation
Central Tendency & Dispersion
 The measure of location which is the most
likely to be influenced by extreme values in
the data set is the
 a. range
 b. median
 c. mode
 d. mean
Central Tendency & Dispersion
The coefficient of variation is
  a. the same as the variance
  b. the standard deviation divided by the mean times 100
  c. the square of the standard deviation
  d. the mean divided by the standard deviation
Central Tendency & Dispersion
 The heights (in inches) of 25 individuals were recorded
and the following statistics were calculated

 mean = 70 range = 20
 mode = 73 variance = 784
 median = 74

 The coefficient of variation equals


 a. 11.2%
 b. 1120%
 c. 0.4%
 d. 40%
Central Tendency & Dispersion
Since the population is always larger than the sample, the value of the sample mean
  a. is always smaller than the true value of the population mean
  b. is always larger than the true value of the population mean
  c. is always equal to the true value of the population mean
  d. could be larger, equal to, or smaller than the true value of the population mean

Central Tendency & Dispersion
In 2005, the average age of students at UTC was 22 with a standard deviation of 3.96. In 2006, the average age was 24 with a standard deviation of 4.08. In which year do the ages show a more dispersed distribution? Show your complete work and support your answer.

Hint: CoV (coefficient of variation)
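Following the hint, the two years can be compared on relative dispersion by expressing each standard deviation as a percentage of its mean. A minimal sketch, with a hypothetical helper name:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

# Figures taken from the question text.
cv_2005 = coefficient_of_variation(3.96, 22)
cv_2006 = coefficient_of_variation(4.08, 24)

print(f"2005: CV = {cv_2005:.1f}%")
print(f"2006: CV = {cv_2006:.1f}%")
print("More dispersed relative to the mean:", 2005 if cv_2005 > cv_2006 else 2006)
```

Although 2006 has the larger standard deviation in absolute terms, the coefficient of variation puts both years on a common scale, which is the comparison the hint points to.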

Sri’s AQs

Measures of Dispersion
The measure of dispersion which is not measured in the same units as the original data is the
  a. median
  b. standard deviation
  c. coefficient of determination
  d. variance

Sri’s CQs

Central Tendency & Dispersion
Which of the following symbols represents the size of the sample?
  a. s^2
  b. s
  c. N
  d. n

Sri’s CQs

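Covariance vs Correlation
Understand the relationship between two variables.

Covariance:
◦ s_{xy} = \mathrm{Covar}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}

Correlation:
◦ r_{xy} = \mathrm{Correl}(x, y) = \frac{s_{xy}}{s_x \, s_y}

Here s_x and s_y are the sample standard deviations of x and y. The population counterpart of r_{xy} is written \rho_{xy} (rho).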
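To make the two formulas concrete, the sketch below computes the sample covariance and the correlation coefficient by hand. The data are the Bangalore price and sales columns from the "Word of Caution" table later in this deck; the function names are hypothetical and both measures use the n - 1 divisor.

```python
from math import sqrt

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = sqrt(sample_covariance(x, x))   # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Bangalore columns (Jan to Nov) from the price and sales table in this deck.
price = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
sales = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

cov = sample_covariance(price, sales)
corr = sample_correlation(price, sales)
print("covariance:", round(cov, 3))
print("correlation:", round(corr, 3))
```

The covariance carries the units of price times sales, while the correlation is unit-free and always falls between -1 and +1.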
Quiz Time…
Relationship between Two Variables
A numerical measure of linear association between two variables is the
  a. variance
  b. covariance
  c. standard deviation
  d. coefficient of variation

Sri’s CQs

Relationship between Two Variables
Positive values of covariance indicate
  a. a positive variance of the x values
  b. a positive variance of the y values
  c. the standard deviation is positive
  d. positive relation between the independent and the dependent variables
Sri’s CQs

Relationship between Two Variables
The coefficient of correlation ranges between
  a. 0 and 1
  b. -1 and +1
  c. minus infinity and plus infinity
  d. 1 and 100

Sri’s CQs

IMPORTANCE OF GRAPHICAL
REPRESENTATION
Recap
Population and Sample
Datasets: Elements, Variables and Observations
Scales of Measurements (Nominal, Ordinal, Interval and Ratio)
Qualitative and Quantitative Data
Cross sectional and Time series Data
Descriptive and Inferential Statistics
Measures of Central Tendency (3M)
◦ Mean
◦ Median
◦ Mode
Measures of Dispersion
◦ Standard Deviation
◦ Variance
◦ Range
◦ Coefficient of Variation
Covariance & Correlation
Word of Caution:
Central Tendency & Dispersion Measures

2015        Bangalore        Kolkata          Delhi            Chennai
Month       Price   Sales    Price   Sales    Price   Sales    Price   Sales
Jan          10.0    8.04     10.0    9.14     10.0    7.46      8.0    6.58
Feb           8.0    6.95      8.0    8.14      8.0    6.77      8.0    5.76
Mar          13.0    7.58     13.0    8.74     13.0   12.74      8.0    7.71
Apr           9.0    8.81      9.0    8.77      9.0    7.11      8.0    8.84
May          11.0    8.33     11.0    9.26     11.0    7.81      8.0    8.47
Jun          14.0    9.96     14.0    8.10     14.0    8.84      8.0    7.04
Jul           6.0    7.24      6.0    6.13      6.0    6.08      8.0    5.25
Aug           4.0    4.26      4.0    3.10      4.0    5.39     19.0   12.50
Sep          12.0   10.84     12.0    9.13     12.0    8.15      8.0    5.56
Oct           7.0    4.82      7.0    7.26      7.0    6.42      8.0    7.91
Nov           5.0    5.68      5.0    4.74      5.0    5.73      8.0    6.89
Average
Variance
Average price is the same. Average sales is the same too.
Variance in price is the same. So is the variance in sales.
Do these four cities look identical to you?

IMPORTANCE OF GRAPHICAL REPRESENTATION?

Average price is the same (9). Average sales is the same too (7.5).

Variance in price is the same (10). So is the variance in sales (3.75).
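These shared averages and variances can be checked directly from the table. The sketch below recomputes them for each city; it assumes the quoted variances are population variances (divisor n), since that is the convention that reproduces 10 and 3.75, and the variable names are chosen only for this example.

```python
from statistics import mean, pvariance

# Price and sales columns (Jan to Nov) copied from the table in this deck.
shared_price = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
cities = {
    "Bangalore": (shared_price,
                  [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Kolkata": (shared_price,
                [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Delhi": (shared_price,
              [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Chennai": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for city, (price, sales) in cities.items():
    summary[city] = {
        "avg_price": mean(price),
        "avg_sales": mean(sales),
        "var_price": pvariance(price),   # population variance (assumed convention)
        "var_sales": pvariance(sales),
    }
    rounded = {name: round(value, 2) for name, value in summary[city].items()}
    print(city, rounded)
```

All four cities produce the same four numbers to two decimal places, which is why the summary statistics alone cannot tell them apart.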

Let’s Go for a SCATTER PLOT
Plotting of RAW DATA Makes Perfect Sense

[Figure: four scatter plots of Sales against Price, one panel per city: Bangalore, Kolkata, Delhi, Chennai.]

But in fact, the four cities are totally different in behaviour.
Bangalore’s sales have generally increased with price.
Delhi has a nearly perfect increase in sales with price, except for one aberration.
Kolkata shows a decline in sales beyond a price of 10.
Chennai’s sales fluctuate despite a nearly constant price.
