Business Statistics
1 10:28 AM
Recommended Readings (Books)
Introduction to Statistics, Walpole, R. E., 3rd Edition (2000)
Statistical Methods for Practice and Research, Ajai S. Gaur and Sanjaya S. Gaur
Mode of Teaching
Lecture
SPSS Workshop
Discussion Session
Introduction to Statistics
Constant
A characteristic or
property that does not
change from individual
to individual.
Variable
A characteristic or
property that varies
from individual to
individual.
Types of Variables
Qualitative: Nominal, Ordinal
Quantitative: Discrete, Continuous
Nominal Scale
Variable categories are mutually
exclusive and exhaustive.
Variable categories have no
logical order.
Examples: eye color, hair color, gender.
Ordinal Scale
Data categories are mutually
exclusive.
Data classifications are ranked or
ordered according to the
particular trait they possess.
Example: level of knowledge about SPSS.
Interval Scale
Data categories are mutually exclusive.
Data classifications are ranked or ordered
according to the particular trait they
possess.
Equal differences in the characteristic are
represented by equal differences in
the measurements, but there is no true zero point.
Examples: temperature, shoe size and IQ scores.
Ratio Scale
Data categories are mutually exclusive.
Data classifications are ranked or ordered
according to the particular trait they possess.
Equal differences in the characteristic are
represented by equal differences in the
measurements.
The zero point is the absence of the
characteristic.
Examples: height, weight, distance.
Measurement Scales

Scale      Property                                          Examples
Nominal    Data may only be classified                       Eye color, hair color, gender
Ordinal    Data are ranked                                   Level of knowledge about SPSS
Interval   True zero point does not exist                    Temperature, shoe size, IQ scores
Ratio      Meaningful zero point and ratio between values    Height, weight, distance
Data
The information collected
for any kind of investigation.
Usually Numerical but can
be Qualitative.
Primary Data
The initial material collected
during the research process.
The information collected
directly from the respondent.
Methods: personal investigation, through an investigator, through a questionnaire,
through local sources, through telephone.
Secondary Data
The information
collected and processed
by people other than
the researcher.
Sources: government organizations, semi-government
organizations.
Data Collection
Any of the following methods may be
adopted:
(a) Personal interview
(b) Direct observation
(c) Mail interview (internet interview)
(d) Telephone interview
(e) Main Data Bases
Data management
Office Editing,
Post Coding,
Data entry and Verification,
Data Import.

Data organization and Analysis
Preparing data for analysis,
Extracting descriptive measures
from the data,
Using advanced statistical
techniques to analyze the data
and draw inferences from it.
SPSS Workshop
Frequency Distribution
Cross Tabulation
Measures of Central Tendency
Arithmetic Mean
Quantiles
(Median, Quartiles, Deciles, Percentiles)
Mode
Arithmetic Mean
A value obtained by dividing the sum of all the observations by
their number.
If X_1, X_2, ..., X_n are n observations of a variable X, then
\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \]
\[ \text{Arithmetic Mean} = \frac{\text{Sum of all the observations}}{\text{Number of the observations}} \]
Arithmetic Mean
The marks obtained by 8 students are:
67 72 68 70 65 68 75 63
\[ \bar{X} = \frac{67 + 72 + \cdots + 63}{8} = \frac{548}{8} = 68.5 \text{ Marks} \]
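The arithmetic mean of the eight marks can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` is an illustrative choice, not something taken from the slides.

```python
# Marks of the 8 students, copied from the slide.
marks = [67, 72, 68, 70, 65, 68, 75, 63]


def arithmetic_mean(values):
    """Sum of all the observations divided by their number."""
    return sum(values) / len(values)


mean_marks = arithmetic_mean(marks)
print(mean_marks)  # 68.5
```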
Quantiles
For individual observations/discrete frequency
distribution, the ith quartile, jth decile and kth
percentile are located in the array/discrete frequency
distribution by the following relations:
\[ Q_i = \frac{i(n+1)}{4}\text{th observation in the distribution}, \quad i = 1, 2, 3 \]
\[ D_j = \frac{j(n+1)}{10}\text{th observation in the distribution}, \quad j = 1, 2, \ldots, 9 \]
\[ P_k = \frac{k(n+1)}{100}\text{th observation in the distribution}, \quad k = 1, 2, \ldots, 99 \]
Quartiles
The weekly TV watching times (hours):
25 41 27 32 43 66 35 31 15 5
34 26 32 38 16 30 38 30 20 21

The array of the above data is given below:
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 37 38 41 43 66
Quartiles
\[ Q_1 = \frac{1(20+1)}{4}\text{th observation in the distribution} = 5.25\text{th observation in the distribution} \]
\[ Q_1 = 5\text{th obs.} + 0.25\{6\text{th obs.} - 5\text{th obs.}\} = 21 + 0.25\{25 - 21\} = 22.0 \text{ Hours} \]
Quartiles
\[ Q_2 = \frac{2(20+1)}{4}\text{th observation in the distribution} = 10.50\text{th observation in the distribution} \]
\[ Q_2 = 10\text{th obs.} + 0.50\{11\text{th obs.} - 10\text{th obs.}\} = 30 + 0.50\{31 - 30\} = 30.5 \text{ Hours} \]
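The location-then-interpolation procedure used for the quartiles can be sketched in Python as follows. The helper name `quantile_by_position` is hypothetical, and the data are the ordered array of TV watching times as printed on the slide.

```python
def quantile_by_position(sorted_values, fraction):
    """Value of the fraction*(n+1)-th observation, interpolating linearly.

    fraction is i/4 for quartiles, j/10 for deciles, k/100 for percentiles.
    """
    n = len(sorted_values)
    position = fraction * (n + 1)   # 1-based location in the array
    lower = int(position)           # whole part: rank of the lower observation
    weight = position - lower       # fractional part used for interpolation
    if lower < 1:                   # location falls before the first observation
        return sorted_values[0]
    if lower >= n:                  # location falls at or after the last observation
        return sorted_values[-1]
    low_value = sorted_values[lower - 1]
    high_value = sorted_values[lower]
    return low_value + weight * (high_value - low_value)


# Ordered array of weekly TV watching times (hours), as given on the slide.
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

q1 = quantile_by_position(tv_hours, 1 / 4)
q2 = quantile_by_position(tv_hours, 2 / 4)
q3 = quantile_by_position(tv_hours, 3 / 4)
print(q1, q2, q3)
```

The same helper locates deciles and percentiles by passing j/10 or k/100 as the fraction.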
Mode
The mode is the value that occurs
most frequently in a set of data, that is,
the value that occurs the
maximum number of times in a
sequence of observations.
Mode
The total automobile sales (in millions) in
the United States for the last 14 years:
9.0 8.2 8.0 9.1 10.3 11.0 11.5
10.3 10.5 9.8 9.3 8.2 8.2 8.5

Mode = 8.2 million
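The mode of the automobile sales figures can be found by counting how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# Automobile sales (in millions) for the 14 years, copied from the slide.
sales = [9.0, 8.2, 8.0, 9.1, 10.3, 11.0, 11.5,
         10.3, 10.5, 9.8, 9.3, 8.2, 8.2, 8.5]

counts = Counter(sales)                              # value -> number of occurrences
mode_value, mode_count = counts.most_common(1)[0]    # most frequent value and its count
print(mode_value, mode_count)
```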
Measures of variation measure the
variation present among the values
of a data set; they are measures of the
spread of the values in the data.
Absolute Measures of
Dispersion
Range
Inter-Quartile Range (I.Q.R)
Quartile Deviation (Q.D)
Mean (Average) Deviation (M.D)
Variance and Standard Deviation (S.D)
Relative Measures of
Dispersion
Coefficient of Range
Coefficient of Quartile Deviation
Coefficient of Mean Deviation
Coefficient of Variation (CV)
Range
Difference between the largest
and the smallest observations.
\[ \text{Range} = X_{\text{Largest}} - X_{\text{Smallest}} \]
Disadvantages of the Range
Ignores the way in which data are distributed:
[Figure: two differently distributed data sets on the same axis, 7 8 9 10 11 12]
Range = 12 - 7 = 5 for both data sets.

Sensitive to outliers:
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Inter-quartile Range (IQR)
Inter-quartile range = 3rd quartile - 1st quartile
\[ \text{IQR} = Q_3 - Q_1 \]
IQR is not affected by outliers.
Inter-quartile Range
[Figure: ordered data split by the quartiles into four parts of 25% each]
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
Inter-quartile Range (IQR) = 57 - 30 = 27
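The contrast between the range and the IQR in the presence of an outlier can be checked on the two 25-value data sets from the range slide. This is a hedged sketch: it assumes Python 3.8 or later and uses `statistics.quantiles` with `method="exclusive"`, which locates quartiles at the i(n+1)/4-th observation, matching the location rule used in these slides. The helper names are illustrative.

```python
import statistics


def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 - Q1, with quartiles located at the i(n+1)/4-th observation."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


# The two data sets from the slide; they differ only in the last value.
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(without_outlier), value_range(with_outlier))
print(interquartile_range(without_outlier), interquartile_range(with_outlier))
```

The range jumps from 4 to 119 when the last value changes, while the IQR is the same for both data sets.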
Quartile Deviation (Q.D)
Quartile Deviation is half of the
Inter-Quartile Range:
\[ \text{Q.D} = \frac{Q_3 - Q_1}{2} \]
The Mean (absolute) Deviation
Mean Deviation is the average of the absolute
deviations taken from the mean value.

X       X - Mean    |X - Mean|
8        3           3
5        0           0
2       -3           3
Total    0           6

\[ \text{M.D} = \frac{\sum |X - \bar{X}|}{n} = \frac{6}{3} = 2 \]
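The mean deviation of the three observations can be reproduced in a short Python sketch. The function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(values):
    """Average of the absolute deviations taken from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


x = [8, 5, 2]                 # observations from the slide
md = mean_deviation(x)
print(md)  # 2.0
```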
Variance
Variance is the average
of the squared
deviations taken from
the mean value.

X (cm)    (X-Mean)^2    X^2
4         36            16
6         16            36
9         1             81
12        4             144
13        9             169
16        36            256
Total 60  102           702

\[ \text{(i)}\quad S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{102}{6} = 17 \text{ cm}^2 \]
\[ \text{(ii)}\quad S^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{702}{6} - \left(\frac{60}{6}\right)^2 = 17 \text{ cm}^2 \]
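Both variance formulas (the definitional one and the computational shortcut) can be verified on the six observations. A minimal sketch, dividing by n as the slide does; the variable names are illustrative.

```python
x = [4, 6, 9, 12, 13, 16]     # observations in cm, from the slide
n = len(x)
mean_x = sum(x) / n

# (i) average of the squared deviations from the mean
variance_definition = sum((v - mean_x) ** 2 for v in x) / n

# (ii) mean of the squares minus the square of the mean
variance_shortcut = sum(v ** 2 for v in x) / n - (sum(x) / n) ** 2

print(variance_definition, variance_shortcut)  # both 17.0 (cm squared)
```

The standard deviation is the square root of this value, in the original unit (cm).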
Comparing Standard Deviations
[Figure: Data A, Data B and Data C plotted on a common axis from 11 to 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
The smaller the standard deviation, the more tightly
clustered the scores are around the mean.
The larger the standard deviation, the more spread out
the scores are from the mean.
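Relative Measures of Variation
\[ \text{Coefficient of Range} = \frac{X_{\text{Largest}} - X_{\text{Smallest}}}{X_{\text{Largest}} + X_{\text{Smallest}}} \]
\[ \text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]
\[ \text{Coefficient of Mean Deviation} = \frac{\text{MD}}{\text{Mean}} \]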
Coefficient of Variation (CV)
Can be used to compare two or more
sets of data measured in different
units, or in the same units but with different
average size.
\[ CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\% \]
Use of Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
\[ CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]

Stock B:
Average price last year = $100
Standard deviation = $5
\[ CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \]

Both stocks have the same standard deviation,
but stock B is less variable relative to its price.
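The stock comparison can be reproduced with a small helper. This is a sketch only; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(5, 50)     # Stock A: S = $5, mean price = $50
cv_b = coefficient_of_variation(5, 100)    # Stock B: S = $5, mean price = $100
print(cv_a, cv_b)  # 10.0 5.0
```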
Appropriate Choice of Measure
of Variability
If data are symmetric, with no serious
outliers, use range and standard
deviation.
If data are skewed, and/or have serious
outliers, use IQR.
If comparing variation across two data
sets, use coefficient of variation (C.V)
Five Number Summary
The five number summary of a data set consists of the
minimum value, the first quartile, the second quartile, the
third quartile and the maximum value written in that order:
Min, Q1, Q2, Q3, Max.

From the three quartiles we can obtain a measure of central
tendency (the median, Q2) and measures of variation of the
two middle quarters of the distribution, Q2 - Q1 for the
second quarter and Q3 - Q2 for the third quarter.
Five Number Summary
The weekly TV viewing times (in hours):
25 41 27 32 43 66 35 31 15 5
34 26 32 38 16 30 38 30 20 21

The array of the above data is given below:
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 37 38 41 43 66
Five Number Summary
Minimum value = 5.0, Maximum value = 66.0

\[ \text{LOCATION of } Q_1 = \frac{1(20+1)}{4}\text{th obs. in the data} = 5.25\text{th obs.} \]
\[ \text{VALUE of } Q_1 = 5\text{th obs.} + 0.25\{6\text{th obs.} - 5\text{th obs.}\} = 21 + 0.25\{25 - 21\} = 22.0 \text{ Hrs} \]

\[ \text{LOCATION of } Q_2 = \frac{2(20+1)}{4}\text{th obs. in the data} = 10.50\text{th obs.} \]
\[ \text{VALUE of } Q_2 = 10\text{th obs.} + 0.50\{11\text{th obs.} - 10\text{th obs.}\} = 30 + 0.50\{31 - 30\} = 30.5 \text{ Hrs} \]

\[ \text{LOCATION of } Q_3 = \frac{3(20+1)}{4}\text{th obs. in the data} = 15.75\text{th obs.} \]
\[ \text{VALUE of } Q_3 = 15\text{th obs.} + 0.75\{16\text{th obs.} - 15\text{th obs.}\} = 35 + 0.75\{37 - 35\} = 36.5 \text{ Hrs} \]
Box and Whisker Diagram
A box and whisker diagram or box-plot is a
graphical means of displaying the five number
summary of a set of data. In a box-plot the first
quartile is placed at the lower hinge and the
third quartile is placed at the upper hinge. The
median is placed in between these two hinges.
The two lines emanating from the box are
called whiskers. The box and whisker diagram
was introduced by Professor John W. Tukey.
Construction of Box-Plot
1. Start the box from Q1 and end at Q3.
2. Within the box, draw a line to represent Q2.
3. Draw the lower whisker from the minimum value up to Q1.
4. Draw the upper whisker from Q3 up to the maximum value.
[Figure: box-plot with the minimum value, Q1, Q2, Q3 and the maximum value labeled]
Construction of Box-Plot
1. Q1 = 22.0, Q3 = 36.5
2. Q2 = 30.5
3. Minimum Value = 5.0
4. Maximum Value = 66.0
[Figure: box-plot of these values on a vertical axis from 0 to 70]
Interpretation of Box-Plot
[Figure: box-plot on a vertical axis from 0 to 70]
A box-whisker plot is useful to identify:
Maximum and minimum values in the data
Median of the data
IQR = Q3 - Q1; a lengthy box indicates more variability in the data
Shape of the data, from the position of the line within the box:
  Line at the center of the box: symmetrical
  Line above the center of the box: negatively skewed
  Line below the center of the box: positively skewed
Detection of outliers in the data
Outliers
An outlier is a value that falls well outside the overall
pattern of the data. It might be

the result of a measurement or recording error,
a member from a different population,
simply an unusual extreme value.

An extreme value need not be an outlier; it might,
instead, be an indication of skewness.
Inner and Outer Fences
If Q1 = 22.0, Q2 = 30.5, Q3 = 36.5, then IQR = Q3 - Q1 = 14.5.

Inner Fences:
\[ \text{Lower Inner Fence} = Q_1 - 1.5(\text{IQR}) = 0.25 \]
\[ \text{Upper Inner Fence} = Q_3 + 1.5(\text{IQR}) = 58.25 \]

Outer Fences:
\[ \text{Lower Outer Fence} = Q_1 - 3(\text{IQR}) = -21.5 \]
\[ \text{Upper Outer Fence} = Q_3 + 3(\text{IQR}) = 80.0 \]
Identification of the Outliers
1. The values that lie within the inner
fences are normal values.
2. The values that lie outside the inner
fences but inside the outer fences
are possible/suspected/mild
outliers.
3. The values that lie outside the outer
fences are sure outliers.
Plot each suspected outlier with an asterisk
and each sure outlier with a hollow dot.
[Figure: box-plot on a vertical axis from 0 to 80 with one asterisk above the upper whisker]
Only 66 is a mild outlier.
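The fence calculation and the three-way classification can be sketched in Python. The function names `fences` and `classify` are hypothetical; the quartiles and the ordered array are the values quoted in the slides.

```python
def fences(q1, q3):
    """Inner fences at 1.5 * IQR and outer fences at 3 * IQR beyond the quartiles."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer


def classify(values, inner, outer):
    """Split values into mild outliers (between fences) and sure outliers (beyond outer fences)."""
    mild, sure = [], []
    for v in values:
        if v < outer[0] or v > outer[1]:
            sure.append(v)
        elif v < inner[0] or v > inner[1]:
            mild.append(v)
    return mild, sure


# Ordered array of weekly TV viewing times (hours), as given on the slide.
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

inner, outer = fences(q1=22.0, q3=36.5)
mild, sure = classify(tv_hours, inner, outer)
print(inner, outer)
print(mild, sure)
```

The minimum value 5 lies inside the inner fences, so it counts as a normal value even though it is the smallest observation.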
Uses of Box and Whisker Diagram
Box plots are
especially suitable for
comparing two or more
data sets. In such a
situation the box plots
are constructed on the
same scale.
[Figure: box plots for Male and Female drawn on the same scale]
Standardized Variable
A variable that has mean 0 and variance 1 is
called a standardized variable.
Values of a standardized variable are called
standard scores.
Standard scores are unit-less.
Construction:
\[ Z = \frac{\text{Variable} - \text{Mean of Variable}}{\text{Standard Deviation of Variable}} \]
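Standardized Variable

X         (X - Mean)^2    Z          (Z - Mean)^2
3         25              -1.3624    1.8561
6         4               -0.5450    0.2970
11        9               0.8174     0.6682
12        16              1.0899     1.1879
Total 32  54              0          4.009

\[ \bar{X} = \frac{\sum X}{n} = \frac{32}{4} = 8, \qquad S_x^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{54}{4} = 13.5, \qquad S_x = 3.67 \]
\[ Z = \frac{X - \bar{X}}{S_x} \]
\[ \bar{Z} = \frac{\sum Z}{n} = \frac{0}{4} = 0, \qquad S_z^2 = \frac{\sum (Z - \bar{Z})^2}{n} = \frac{4.009}{4} \approx 1 \]
Variable Z has mean 0 and
variance 1, so Z is a standard variable.
Standard score at X = 11 is
\[ Z = \frac{X - \bar{X}}{S_x} = \frac{11 - 8}{3.67} = 0.8174 \]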
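The standardization of the four observations can be checked in Python. In this sketch the standard deviation is kept at full precision (about 3.6742) rather than rounded to 3.67 as on the slide, so the z-scores differ from the tabulated ones in the third decimal place. Variable names are illustrative.

```python
import math

x = [3, 6, 11, 12]            # observations from the slide
n = len(x)
mean_x = sum(x) / n
sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)   # divides by n, as on the slide

# Standard scores: deviation from the mean in units of standard deviation.
z = [(v - mean_x) / sd_x for v in x]

mean_z = sum(z) / n
var_z = sum((v - mean_z) ** 2 for v in z) / n
print([round(v, 4) for v in z])
print(round(mean_z, 10), round(var_z, 10))
```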
Performance evaluation by z-scores
The industry in which sales rep Mr. Atif works has
mean annual sales = $2,500 and
standard deviation = $500.
The industry in which sales rep Mr. Asad works has
mean annual sales = $4,800 and
standard deviation = $600.
Last year Mr. Atif's sales were $4,000 and
Mr. Asad's sales were $6,000.
Which of the representatives would you hire
if you have one sales position to fill?
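Performance evaluation by z-scores
Sales rep. Atif: industry mean = $2,500, S = $500, X = $4,000
\[ Z_{\text{Atif}} = \frac{X - \bar{X}}{S} = \frac{4{,}000 - 2{,}500}{500} = 3 \]
Sales rep. Asad: industry mean = $4,800, S = $600, X = $6,000
\[ Z_{\text{Asad}} = \frac{X - \bar{X}}{S} = \frac{6{,}000 - 4{,}800}{600} = 2 \]
Mr. Atif is the better choice: his sales are 3 standard deviations above
his industry mean, against 2 for Mr. Asad.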
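The comparison of the two sales representatives reduces to two z-score calculations, sketched below. The function name `z_score` is an illustrative choice.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / std_dev


z_atif = z_score(4000, mean=2500, std_dev=500)
z_asad = z_score(6000, mean=4800, std_dev=600)
print(z_atif, z_asad)  # 3.0 2.0

# The higher z-score marks the stronger performance relative to the industry.
better = "Atif" if z_atif > z_asad else "Asad"
print(better)
```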
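The Empirical Rule
For a bell-shaped (approximately normal) distribution:
\[ \bar{X} \pm 1S \text{ contains about } 68.26\% \text{ of values} \]
\[ \bar{X} \pm 2S \text{ contains about } 95.45\% \text{ of values} \]
\[ \bar{X} \pm 3S \text{ contains about } 99.73\% \text{ of values} \]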
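The three percentages of the empirical rule can be recovered from the standard normal distribution. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(k, round(share * 100, 2))
```

The printed values agree with the slide's figures to within rounding.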
Measures of Skewness
A distribution in which the values equidistant from
the centre have equal frequencies is defined to be
symmetrical and any departure from symmetry is
called skewness.

For a symmetrical distribution:
1. Length of Right Tail = Length of Left Tail
2. Mean = Median = Mode
3. Sk = 0, where Sk may be computed as
a) Sk = (Mean - Mode)/SD
b) Sk = (Q3 - 2Q2 + Q1)/(Q3 - Q1)
Measures of Skewness
A distribution is positively skewed if the observations
tend to concentrate more at the lower end of the possible
values of the variable than the upper end. A positively
skewed frequency curve has a longer tail on the right
hand side.
1. Length of Right Tail > Length of Left Tail
2. Mean > Median > Mode
3. Sk > 0
Measures of Skewness
A distribution is negatively skewed if the
observations tend to concentrate more at the upper
end of the possible values of the variable than the
lower end. A negatively skewed frequency curve has a
longer tail on the left side.
1. Length of Right Tail < Length of Left Tail
2. Mean < Median < Mode
3. Sk < 0
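As an illustration, the quartile-based coefficient Sk = (Q3 - 2Q2 + Q1)/(Q3 - Q1) can be evaluated for the TV viewing data, using the quartiles computed earlier in the slides (Q1 = 22.0, Q2 = 30.5, Q3 = 36.5). The function name `quartile_skewness` is hypothetical, and applying the formula to this data set is an added example rather than part of the original slides.

```python
def quartile_skewness(q1, q2, q3):
    """Quartile coefficient of skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


sk = quartile_skewness(q1=22.0, q2=30.5, q3=36.5)
print(round(sk, 4))

if sk > 0:
    shape = "positively skewed"
elif sk < 0:
    shape = "negatively skewed"
else:
    shape = "symmetrical"
print(shape)
```

Because this coefficient uses only the quartiles, it describes the middle half of the data and is not influenced by the extreme value 66.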
Measures of Kurtosis
Kurtosis is the degree of peakedness or flatness of a
unimodal (single humped) distribution.
When the values of a variable are highly concentrated around
the mode, the peak of the curve becomes relatively high; the
curve is Leptokurtic.
When the values of a variable have low concentration around
the mode, the peak of the curve becomes relatively flat; the curve
is Platykurtic.
A curve that is neither very peaked nor very flat-topped, and
is taken as a basis for comparison, is called
Mesokurtic/Normal.
Measures of Kurtosis
\[ \text{Coefficient of Kurtosis} = \frac{n \sum (X - \bar{X})^4}{\left[\sum (X - \bar{X})^2\right]^2} \]
1. If Coefficient of Kurtosis > 3 ----------------- Leptokurtic.
2. If Coefficient of Kurtosis = 3 ----------------- Mesokurtic.
3. If Coefficient of Kurtosis < 3 ----------------- Platykurtic.
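The coefficient of kurtosis can be computed directly from its definition. The sketch below applies it to the six observations used in the variance example (4, 6, 9, 12, 13, 16); that choice of data and the function names are illustrative assumptions, not part of the original slides.

```python
def kurtosis_coefficient(values):
    """n * sum of 4th-power deviations / (sum of squared deviations) squared."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    sum_fourth = sum((v - mean) ** 4 for v in values)
    return n * sum_fourth / sum_sq ** 2


def kurtosis_type(coefficient, tolerance=1e-9):
    """Classify against the mesokurtic reference value of 3."""
    if abs(coefficient - 3) <= tolerance:
        return "Mesokurtic"
    return "Leptokurtic" if coefficient > 3 else "Platykurtic"


x = [4, 6, 9, 12, 13, 16]
k = kurtosis_coefficient(x)
print(round(k, 3), kurtosis_type(k))
```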
SPSS
Statistical Package
for Social Sciences