
Statistical Data Analysis Using Excel
By Dr. Shailaja Rego
Introduction to statistics
Definition
1. Statistical Data
By Statistics we mean aggregates of facts, affected to a
marked extent by a multiplicity of causes, numerically
expressed, enumerated or estimated according to
reasonable standards of accuracy, collected in a
systematic manner for a predetermined purpose and
placed in relation to each other.
2. Statistical Methods
Statistics may be defined as the science of collection,
organisation, presentation, analysis and interpretation of
numerical data.
Applications of Statistics in Various Areas
• Marketing
• Economics
• Finance
• Insurance
• Operations
• Human Resource Management or Development
• Information Systems
• Data Mining
Illustrative List of Statistical Techniques and their Applications

Sr No | Statistical Technique | Field | Specific Application
1 | Binomial Distribution | Quality Assurance | Sampling Inspection
2 | Cluster Analysis | Marketing | Target Marketing, Customer Profiling
3 | Cluster Analysis | Planning and Management | Identifying Similar Groups
4 | Control Chart | Production Engineering | Quality Control
5 | Correlation and Regression Analysis | Financial Risk | Hedging of Investments
6 | Correlation and Regression Analysis | Marketing | Cross-Market Analysis
7 | Decision Theory | Finance | Investments, Portfolio Selection, Mergers and Acquisitions
8 | Discriminant Analysis | Finance | Credit Risk Analysis
9 | Discriminant Analysis | Marketing | Customer Profiling
10 | Forecasting | Banking | Business Forecasting
11 | Forecasting | Finance | Pricing of Financial Products, Return on Investment
12 | Forecasting | HRD | Manpower Planning
13 | Forecasting | Insurance | Determining Premiums
14 | Forecasting | Marketing | Demand Forecasting
15 | Index Numbers | Economics | Wholesale and Consumer Price Indices
16 | Logistic Regression | Finance | Credit Risk Analysis
17 | Normal Distribution | Equity Research | EPS
18 | Normal Distribution | Finance | Risk Management
19 | Normal Distribution | Finance | Yield Curve
20 | Normal Distribution | HRD | Performance Appraisal
21 | Normal Distribution | Production | Six Sigma
22 | Normal Distribution | Production Engineering | Statistical Quality Control
23 | Normal Distribution | Project Management | PERT / CPM
24 | Percentiles | Education | Relative Ranking
25 | Percentiles | HR | Formulating Compensation Strategies
26 | Rank Correlation | Rankings | Rankings in Contests With Multiple Judges
27 | Rank Correlation | Rankings | Rankings with Multiple Criteria
28 | Sampling | Election | Opinion/Exit Polls
29 | Sampling | Market Research | Consumer Survey
30 | Sampling | Production Engineering | Inspection and Quality Control
31 | Testing of Hypothesis | Agriculture/Chemical | Testing a Pesticide on Field
32 | Testing of Hypothesis | Paramedical-Pharmaceutical | Testing a Drug on Clinical Trial
33 | Weighted Average | Finance | Sensex, NIFTY, Wholesale Price and Consumer Price Indices
34 | Weighted Average | Finance | WACC (Weighted Average Cost of Capital) and EVA (Economic Value Added)
Illustrative List of Decision Situations and Corresponding Statistical Techniques

Area | Decision Situation | Statistical Techniques Applicable
Marketing | Assessment/Forecast of Demand for the Product or a Service | Time Series; Correlation and Regression Analysis; Statistical Inference
Marketing | Customer Profiling | Cluster Analysis
Marketing | Market Research | Sample Surveys; Conjoint Analysis; Multidimensional Scaling
Retail Management | Identifying Customer Buying Behaviors and Patterns | Cluster Analysis; Correlation and Regression Analysis; Conjoint Analysis
Finance and Banking | Evaluation of Investment | Regression Analysis; Decision Analysis
Finance and Banking | Volatility of Stocks | [symbol lost] Analysis
Finance and Banking | Predicting EPS | Regression Analysis
Finance and Banking | Derivatives | [symbol lost] Analysis and Regression Analysis
Finance and Banking | Assessing Credit Worthiness | Discriminant Analysis; Logistic Regression; Correlation Analysis
Insurance | Determining the Premium | Probability; Time Series; Regression Analysis
Insurance | Impact of Different Factors on Health and Life | Regression Analysis; Discriminant Analysis
Operations | Controlling and Improving Production Process and Quality | Statistical Quality Control; Six Sigma; Sampling Inspections; Statistical Inference
Operations | Inventory Management | ABC Analysis
HRD | Performance Appraisal and Reward System | Normal Distribution; Percentiles
Limitations of statistics

• Does not deal with individual measurements


• Deals only with quantitative characters
• Results are true only on an average
• Only one of the methods of solving a problem
• Can be misused
Ways of manipulating the data
• Changing definition of a variable to suit preconceived
notion
• Inadequate sample or unrepresentative sample
selection
• Manipulative collection of data
• Interpretation of association and correlation
• Inappropriate comparison
• Defective data – definition confusing or not
considering all possibilities
Types of Data
• Primary and Secondary
• Qualitative / Categorical and Quantitative /
Numerical
• Nominal, Ordinal, Interval and Ratio
• Discrete
– Nominal and Ordinal
• Continuous
– Interval and Ratio
Nominal, Ordinal, Interval and Ratio Data
• There are four levels or types of measurement scales
– Nominal
– Ordinal
– Interval
– Ratio
There are five basic types of statistical analysis

• Descriptive Analysis
• Inferential Analysis
• Differences Analysis (Test of analysis)
• Associative Analysis
• Predictive Analysis
Type | Description | Example | Statistical Concepts
Descriptive | Data reduction | Describes the typical respondents; describes how similar respondents are to the typical respondent | Mean, mode, median, standard deviation, range, frequency distribution
Inferential | Determine population parameters; test hypothesis | Estimate population values | Standard error, null hypothesis
Differences | Determine if differences exist between groups | Evaluate statistical significance of the difference in the means of two groups in a sample | Z-test and t-test of differences, analysis of variance
Associative | Determine associations | Determine if two variables are related in a systematic way | Correlation, cross tabulation
Predictive | Forecast based on a statistical model | Estimate the level of Y, given the amount of X | Time series analysis, regression
When to use a particular descriptive measure?

Type of measurement | Central tendency | Dispersion/Variability
Nominal | Mode | Frequency / Percentage
Ordinal | Median | Cumulative percentage distribution
Interval / Ratio | Mean | Standard deviation and range
Formation of frequency distribution
Class limits
Class intervals (Exclusive, Inclusive)
Class frequency
Class Midpoint
Formation of a Continuous frequency Distribution
Class Limits
Class Intervals
Class Frequency
How to fix number of classes?
K = 1 + 3.322 log N
N = Total number of observations
log = logarithm (base 10) of the number

Magnitude of class interval:

$$i = \frac{L - S}{K} = \frac{\text{Range}}{1 + 3.322 \log N}$$
where L and S are the largest and smallest observations
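A minimal Python sketch of this rule, assuming the usual constant 3.322 (that is, 1/log10 2) and base-10 logarithms. The function names and the sample figures (100 observations ranging from 5 to 95) are hypothetical and only illustrate the arithmetic.

```python
import math

def number_of_classes(n):
    """K = 1 + 3.322 * log10(N), the rule quoted on the slide."""
    return 1 + 3.322 * math.log10(n)

def class_interval(largest, smallest, n):
    """i = (L - S) / K, the magnitude of the class interval."""
    return (largest - smallest) / number_of_classes(n)

# Hypothetical illustration: 100 observations between 5 and 95.
k = number_of_classes(100)
i = class_interval(95, 5, 100)
print(round(k, 3), round(i, 2))
```

In practice K is rounded to a whole number of classes and i to a convenient width.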
Diagrammatic or graphical Representation
Types of Graphs
One Dimensional
Two Dimensional
Three Dimensional
One Dimensional or Bar Diagrams
Types of Bar Diagram
Simple
Multiple
Deviation
Sub-Divided
Percentage
Birth Rate per thousand for different countries

Country | Birth Rate
India | 33
Germany | 16
U.K. | 20
China | 40
New Zealand | 30
Sweden | 15

[Simple bar diagram: Birth Rate (y-axis) by Countries (x-axis)]
Sub-divided bar diagram
Growth of production of fish, Production (Lakh Tonnes)

Year | Marine | Inland | Total
1991-92 | 5.34 | 2.18 | 7.52
1992-93 | 8.8 | 2.8 | 11.6
1993-94 | 10.86 | 6.7 | 17.56
1994-95 | 15.55 | 8.87 | 24.42
1995-96 | 16.98 | 11.03 | 28.01
1996-97 | 17.16 | 11.6 | 28.76
1997-98 | 12.47 | 8.42 | 20.89

[Sub-divided bar diagram: Production (Lakh Tonnes) by year, each bar split into Marine and Inland]
Multiple Bars
Zonewise Rainfall

Year | West | North | East | South | Centre
1996 | 78.4 | 88.9 | 83.7 | 89.9 | 86.5
1997 | 75.6 | 62.5 | 103.6 | 75.5 | 77.4
1998 | 121.2 | 116.5 | 107.6 | 123.9 | 90.3

[Multiple bar diagram: Rainfall (y-axis) by Zones (x-axis), one bar per year 1996, 1997, 1998]
Percentage Bars

Particulars | 1996 | 1997 | 1998
Wages | 45 | 50 | 52.5
Other costs | 30 | 33.3 | 35
Polishing | 15 | 16.7 | 17.5
Total cost | 90 | 100 | 105
Sale price | 100 | 100 | 100
Profit or loss | 10 | 0 | -5

[Percentage bar diagram: wages, other costs, polishing and profit or loss for the three years, scale from -20% to 100%]
Rupee Comes From
[Pie diagram. Sources shown: Excise, Customs, Internal Borrowing, Non-Tax Revenue, Deficit, Other Capital Receipts, Corporation Tax, Income Tax, External Assistance, Other Taxes. Slice sizes shown: 22%, 18%, 18%, 14%, 7%, 7%, 6%, 3%, 3%, 2%]
Graphs of frequency distribution
Histogram
Frequency polygon
Smoothed frequency curve
Ogives or cumulative frequency curves
Histogram
[Histogram: No of Students (y-axis, 0 to 70) by Marks (x-axis, classes 0-10 to 90-100)]
[Second chart: No of Students over the same mark classes]
Scatter Diagrams

x | 2 | 3 | 5 | 6 | 8 | 9
y | 6 | 5 | 7 | 8 | 12 | 11

[Scatter diagram: y (0 to 14) against x (0 to 10)]
Requisites of a good average

Easy to understand
Simple to compute
Based on all items
Not be unduly affected by extreme observations
Rigidly defined
Capable of further algebraic treatment
Sampling stability
Types of averages

Arithmetic Mean- Simple and weighted


Median
Mode
Arithmetic Mean- Simple

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$

OR

$$\bar{X} = \frac{\sum X}{N}$$
Arithmetic Mean- Discrete Series

$$\bar{X} = \frac{\sum fX}{N}$$

Ex- From the following data of the marks obtained by 60 students, calculate the arithmetic mean

Marks (X) | No of students (f) | fX
20 | 8 | 160
30 | 12 | 360
40 | 20 | 800
50 | 10 | 500
60 | 6 | 360
70 | 4 | 280

N = 60, $\sum fX = 2460$

$$\bar{X} = \frac{2460}{60} = 41$$
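The same calculation can be checked with a short Python sketch. The variable names are illustrative; the marks and frequencies are the ones given in the example.

```python
# Marks (X) and number of students (f) from the example.
marks = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]

n = sum(students)                                    # total frequency N
sum_fx = sum(f * x for x, f in zip(marks, students)) # sum of f * X
mean_marks = sum_fx / n                              # arithmetic mean

print(n, sum_fx, mean_marks)
```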
Arithmetic Mean- Continuous Series

$$\bar{X} = \frac{\sum fm}{N}$$
Where m is the midpoint of each class
f is the frequency of each class
N = total frequency

Ex- From the following data compute the arithmetic mean

Marks | Mid pt (m) | No of students (f) | fm
0-10 | 5 | 5 | 25
10-20 | 15 | 10 | 150
20-30 | 25 | 25 | 625
30-40 | 35 | 30 | 1050
40-50 | 45 | 20 | 900
50-60 | 55 | 10 | 550

N = 100, $\sum fm = 3300$

$$\bar{X} = \frac{3300}{100} = 33$$
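A brief Python sketch of the midpoint method, using the class limits and frequencies from the example. The names are illustrative only.

```python
# Class limits and frequencies from the example.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 10, 25, 30, 20, 10]

# m is the midpoint of each class.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

total_n = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
grouped_mean = sum_fm / total_n

print(total_n, sum_fm, grouped_mean)
```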
Combined Mean of two groups

$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$

Weighted Mean

$$\bar{X}_w = \frac{\sum WX}{\sum W}$$

For frequency Distribution

$$\bar{X}_w = \frac{\sum W(fX)}{\sum W}$$
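As a rough illustration of the combined mean and the simple weighted mean, the sketch below uses made-up figures (group sizes, group means, values and weights are all hypothetical, not taken from the slides).

```python
def combined_mean(n1, mean1, n2, mean2):
    """Combined mean of two groups: (N1*X1 + N2*X2) / (N1 + N2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

def weighted_mean(values, weights):
    """Weighted mean: sum(W * X) / sum(W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical groups: 50 items averaging 60 and 100 items averaging 45.
x12 = combined_mean(50, 60, 100, 45)

# Hypothetical values with weights 1, 2 and 3.
xw = weighted_mean([70, 80, 90], [1, 2, 3])

print(x12, round(xw, 2))
```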
Median
If N is odd: the value of the $\frac{N+1}{2}$th observation
If N is even: the average of the values of the $\frac{N}{2}$th and $\left(\frac{N}{2} + 1\right)$th observations
Median – Discrete Series
Median = size of the $\frac{N+1}{2}$th item
Ex- From the following data find the value of the median

Income | 1000 | 1500 | 800 | 2000 | 2500 | 1800
No of persons | 24 | 26 | 16 | 20 | 6 | 30
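One way to work this exercise is sketched below in Python: sort the incomes, accumulate the frequencies, and pick the first income whose cumulative frequency reaches (N+1)/2. The function name is hypothetical.

```python
incomes = [1000, 1500, 800, 2000, 2500, 1800]
persons = [24, 26, 16, 20, 6, 30]

def discrete_median(values, freqs):
    """Size of the (N+1)/2-th item after sorting the series by value.

    When (N+1)/2 is fractional this returns the upper of the two middle
    items, which matches the average whenever both fall on the same value.
    """
    n = sum(freqs)
    target = (n + 1) / 2
    cumulative = 0
    for value, freq in sorted(zip(values, freqs)):
        cumulative += freq
        if cumulative >= target:
            return value
    raise ValueError("empty series")

median_income = discrete_median(incomes, persons)
print(sum(persons), median_income)
```

Here N = 122, so the 61.5th item is needed; the cumulative frequencies in ascending order of income are 16, 40, 66, so the median income is 1500.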
Median – Continuous Series

$$\text{Median} = L + \frac{N/2 - c.f.}{f} \times i$$
L = lower limit of the median class
c.f. = cumulative frequency of the class preceding the median class
f = simple frequency of the median class
i = class interval of the median class
Calculate Median for following frequency distribution
Marks No of std
45-50 10
40-45 15
35-40 26
30-35 30
25-30 42
20-25 31
15-20 24
10-15 15
5-10 7
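A possible Python sketch for this exercise follows. It assumes the classes are first rearranged in ascending order, as the cumulative frequency requires; the function name is hypothetical.

```python
def grouped_median(classes, freqs):
    """Median = L + ((N/2 - c.f.) / f) * i for classes in ascending order."""
    n = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= n / 2:            # this is the median class
            return lower + (n / 2 - cf) / f * (upper - lower)
        cf += f
    raise ValueError("empty distribution")

# The marks distribution above, rearranged in ascending order.
mark_classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30),
                (30, 35), (35, 40), (40, 45), (45, 50)]
mark_freqs = [7, 15, 24, 31, 42, 30, 26, 15, 10]

median_marks = grouped_median(mark_classes, mark_freqs)
print(sum(mark_freqs), round(median_marks, 2))
```

With N = 200 the median class is 25-30 (c.f. = 77, f = 42, i = 5), giving a median of about 27.74.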
Calculation Of Mode – Frequency Distribution

$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$
L = lower limit of the modal class
$\Delta_1$ = difference between the frequency of the modal class and the frequency of the pre-modal class
$\Delta_2$ = difference between the frequency of the modal class and the frequency of the post-modal class
i = class interval of the modal class
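To illustrate the formula, the sketch below applies it to the marks distribution from the median exercise (classes in ascending order). The function name is hypothetical, and the sketch assumes a single, clearly defined modal class.

```python
def grouped_mode(classes, freqs):
    """Mode = L + d1 / (d1 + d2) * i, with the modal class taken as the
    class of highest frequency (assumes one clear modal class)."""
    k = max(range(len(freqs)), key=lambda idx: freqs[idx])
    f_modal = freqs[k]
    f_pre = freqs[k - 1] if k > 0 else 0
    f_post = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = f_modal - f_pre     # modal minus pre-modal frequency
    d2 = f_modal - f_post    # modal minus post-modal frequency
    lower, upper = classes[k]
    return lower + d1 / (d1 + d2) * (upper - lower)

classes_asc = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30),
               (30, 35), (35, 40), (40, 45), (45, 50)]
freqs_asc = [7, 15, 24, 31, 42, 30, 26, 15, 10]

mode_marks = grouped_mode(classes_asc, freqs_asc)
print(round(mode_marks, 2))
```

The modal class is 25-30 with frequency 42, so the two differences are 42 - 31 = 11 and 42 - 30 = 12.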
Measures Of Dispersion/ Central Tendency

Series A Series B Series C


100 100 1
100 105 489
100 102 2
100 103 3
100 90 5
Total 500 500 500
Mean 100 100 100
Quartile
Percentiles
[Figure: distributions with the same mean but different dispersion]
[Figure: distributions with different means but the same dispersion]
[Figure: distributions with different means and different dispersion]
Significance of Measuring variation

1. To determine reliability of an average


2. To serve as a basis for the control of variability
3. To compare two or more series with regard to
their variability
4. To facilitate the use of other statistical measure.
Methods of studying variation

1. The Range
2. Mean deviation
3. The Standard Deviation
Range
Range = L - S, where L is the largest and S the smallest observation

Coefficient of Range
$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$

Series A 46 6 46 46 46 46 46 46
Series B 6 10 6 6 46 46 46 46
Series C 6 6 15 25 30 32 40 46
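A small Python check of the three series shows why the range alone can mislead: it depends only on the two extreme values. The helper names are illustrative.

```python
series = {
    "A": [46, 6, 46, 46, 46, 46, 46, 46],
    "B": [6, 10, 6, 6, 46, 46, 46, 46],
    "C": [6, 6, 15, 25, 30, 32, 40, 46],
}

def value_range(values):
    """Range = L - S (largest minus smallest observation)."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Coefficient of Range = (L - S) / (L + S)."""
    return (max(values) - min(values)) / (max(values) + min(values))

results = {name: (value_range(v), coefficient_of_range(v))
           for name, v in series.items()}
for name, (rng, coeff) in results.items():
    print(name, rng, round(coeff, 3))
```

All three series have a range of 40 and the same coefficient, although their values are spread very differently between the extremes.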
Despite serious limitations, the range is useful in the
following cases

1. Quality Control
2. Fluctuations in share prices
3. Weather forecast
4. Everyday Life
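Mean Deviation
Mean Deviation – Individual Observations

$$\text{M.D.} = \frac{1}{N} \sum |X - A| = \frac{\sum |D|}{N}$$

Where |D| = |X - A|, the deviation from the average A (the median in the examples that follow), ignoring signs

$$\text{Coefficient of M.D.} = \frac{\text{M.D.}}{\text{Median}}$$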
Ex- Calculate the M.D. and its coefficient for the two income groups of 5 and 7 members

Group 1: 4000, 4200, 4400, 4600, 4800
Group 2: 3000, 4000, 4200, 4400, 4600, 4800, 5800

Deviations from the median (4400 in both groups):

Group 1 | |D| | Group 2 | |D|
4000 | 400 | 3000 | 1400
4200 | 200 | 4000 | 400
4400 | 0 | 4200 | 200
4600 | 200 | 4400 | 0
4800 | 400 | 4600 | 200
 | | 4800 | 400
 | | 5800 | 1400
N = 5 | $\sum |D| = 1200$ | N = 7 | $\sum |D| = 4000$
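The remaining step, dividing by N and by the median, can be sketched in Python as follows. The function names are hypothetical; deviations are taken from the median, as in the table.

```python
import statistics

group1 = [4000, 4200, 4400, 4600, 4800]
group2 = [3000, 4000, 4200, 4400, 4600, 4800, 5800]

def mean_deviation(values):
    """M.D. = sum(|X - median|) / N."""
    med = statistics.median(values)
    return sum(abs(x - med) for x in values) / len(values)

def coefficient_of_md(values):
    """Coefficient of M.D. = M.D. / Median."""
    return mean_deviation(values) / statistics.median(values)

md1, md2 = mean_deviation(group1), mean_deviation(group2)
c1, c2 = coefficient_of_md(group1), coefficient_of_md(group2)
print(md1, round(md2, 2), round(c1, 4), round(c2, 4))
```

Group 1 gives M.D. = 1200/5 = 240 and Group 2 gives M.D. = 4000/7, about 571.43, so the second group is the more variable one even though both medians are 4400.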
Mean Deviation – Discrete series

$$\text{M.D.} = \frac{\sum f|D|}{N}$$
Where |D| denotes deviation from the median, ignoring signs

Ex- Calculate Mean deviation from following series

X 10 11 12 13 14
F 3 12 18 12 3
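A short Python sketch of this exercise, assuming deviations are taken from the median located at the (N+1)/2-th item. Variable names are illustrative.

```python
x_values = [10, 11, 12, 13, 14]
f_values = [3, 12, 18, 12, 3]

n_total = sum(f_values)

# Locate the median as the value holding the (N+1)/2-th item.
cumulative = 0
series_median = None
for x, f in zip(x_values, f_values):
    cumulative += f
    if cumulative >= (n_total + 1) / 2:
        series_median = x
        break

# M.D. = sum(f * |X - median|) / N
sum_f_abs_d = sum(f * abs(x - series_median)
                  for x, f in zip(x_values, f_values))
md_discrete = sum_f_abs_d / n_total
print(n_total, series_median, sum_f_abs_d, md_discrete)
```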
Mean Deviation – Continuous series

$$\text{M.D.} = \frac{\sum f|D|}{N}$$
Ex- Find the median and mean deviation of the following data

Size Frequency
0-10 7
10-20 12
20-30 18
30-40 25
40-50 16
50-60 14
60-70 8
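For the continuous case, one workable approach (sketched below under the assumption that |D| is measured from class midpoints to the interpolated median) is:

```python
size_classes = [(0, 10), (10, 20), (20, 30), (30, 40),
                (40, 50), (50, 60), (60, 70)]
size_freqs = [7, 12, 18, 25, 16, 14, 8]

n_obs = sum(size_freqs)

# Median = L + ((N/2 - c.f.) / f) * i, found by scanning the classes.
cf = 0
for (lower, upper), f in zip(size_classes, size_freqs):
    if cf + f >= n_obs / 2:
        median_size = lower + (n_obs / 2 - cf) / f * (upper - lower)
        break
    cf += f

# |D| is the distance of each class midpoint from the median.
midpts = [(lower + upper) / 2 for lower, upper in size_classes]
md_continuous = sum(f * abs(m - median_size)
                    for f, m in zip(size_freqs, midpts)) / n_obs
print(round(median_size, 2), round(md_continuous, 3))
```

The median class is 30-40 (c.f. = 37, f = 25), so the median is 30 + (13/25) x 10 = 35.2 and the mean deviation works out to about 13.148.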
The Standard Deviation
Introduced by Karl Pearson

Measures of Variability – Standard Deviation

Compute standard deviation for the


following sample:
5, 4, 8, 2, 8, 10, 1, 2,
5, 7, 3, 6, 9, 2
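A minimal sketch using Python's standard `statistics` module. It reports both conventions: `pstdev` divides by N, as the formulas later in these slides do, while `stdev` divides by N - 1 and is the usual choice when the data are treated as a sample. Which of the two the exercise intends is an assumption left to the reader.

```python
import statistics

data = [5, 4, 8, 2, 8, 10, 1, 2, 5, 7, 3, 6, 9, 2]

mean_value = statistics.mean(data)
population_sd = statistics.pstdev(data)  # divides by N
sample_sd = statistics.stdev(data)       # divides by N - 1

print(len(data), round(mean_value, 3),
      round(population_sd, 3), round(sample_sd, 3))
```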
It is often practical to calculate the SD from deviations d = X - A taken from an assumed mean A:

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$$

Ex- Blood serum cholesterol levels of 10


persons are as under

240,260,290,245,255,288,272,263,277,251

Calculate SD with help of assumed mean


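X | d = X - 264 | d²
240 | -24 | 576
260 | -4 | 16
290 | 26 | 676
245 | -19 | 361
255 | -9 | 81
288 | 24 | 576
272 | 8 | 64
263 | -1 | 1
277 | 13 | 169
251 | -13 | 169

$\sum X = 2641$, $\sum d = 1$, $\sum d^2 = 2689$

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{2689}{10} - \left(\frac{1}{10}\right)^2} = 16.398$$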
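The assumed-mean shortcut can be verified with a few lines of Python. The sketch uses the assumed mean A = 264 from the example and compares the result with `statistics.pstdev`, which computes the same population SD directly.

```python
import math
import statistics

cholesterol = [240, 260, 290, 245, 255, 288, 272, 263, 277, 251]
assumed_mean = 264  # A, as chosen in the worked example

d = [x - assumed_mean for x in cholesterol]
n_persons = len(cholesterol)
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)

# sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2)
sigma = math.sqrt(sum_d2 / n_persons - (sum_d / n_persons) ** 2)
print(sum_d, sum_d2, round(sigma, 3))
```

The choice of A does not affect the answer; it only keeps the deviations small enough to handle by hand.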
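Calculation of SD – Discrete series

$$\sigma = \sqrt{\frac{\sum f x^2}{N}}$$
where $x^2 = (X - \bar{X})^2$

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$$

Where d = X - A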
Calculation of SD – Continuous series

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i$$

Where $d = \frac{m - A}{i}$

i = class interval
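Combined SD

$$\sigma_{12} = \sqrt{\frac{N_1 \sigma_1^2 + N_2 \sigma_2^2 + N_1 D_1^2 + N_2 D_2^2}{N_1 + N_2}}$$

where $D_1 = \bar{X}_1 - \bar{X}_{12}$ and $D_2 = \bar{X}_2 - \bar{X}_{12}$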
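A quick way to see that the combined formula works is to split one data set into two groups and compare. The sketch below splits the ten cholesterol readings from the earlier example into two arbitrary halves (the split is purely illustrative) and takes D1 and D2 as the distances of the group means from the combined mean.

```python
import math
import statistics

group_a = [240, 260, 290, 245, 255]
group_b = [288, 272, 263, 277, 251]

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and SDs."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    total = n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2
    return math.sqrt(total / (n1 + n2))

sigma_12 = combined_sd(
    len(group_a), statistics.mean(group_a), statistics.pstdev(group_a),
    len(group_b), statistics.mean(group_b), statistics.pstdev(group_b),
)
sigma_all = statistics.pstdev(group_a + group_b)
print(round(sigma_12, 3), round(sigma_all, 3))
```

Both routes give the same value, provided each group SD is computed with divisor N.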

Coefficient of variation

$$CV = \frac{\sigma}{\bar{X}} \times 100$$
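As an illustration, the coefficient of variation of the ten cholesterol readings from the SD example can be computed as below (population SD, divisor N, in line with the slides' formulas).

```python
import statistics

readings = [240, 260, 290, 245, 255, 288, 272, 263, 277, 251]

mean_reading = statistics.mean(readings)
sd_reading = statistics.pstdev(readings)   # divisor N
cv_percent = sd_reading / mean_reading * 100

print(mean_reading, round(sd_reading, 3), round(cv_percent, 2))
```

Because CV is a pure percentage, it allows series measured in different units or at different levels to be compared.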
Normal distribution: inflection points and standard deviations
Skewness
“Skewness” refers to deviations from symmetry
with respect to a location measure. A quantity
commonly used as a measure of asymmetry is
often referred to as b1.
Karl Pearson Coefficient of Skewness

$$Sk_p = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
In a moderately skewed distribution the averages have the
following relationship
Mode = 3 Median – 2 Mean

$$Sk_p = \frac{3(\bar{X} - \text{Median})}{\sigma}$$

Bowley’s Coefficient of Skewness

$$Sk_b = \frac{(Q_3 - \text{Median}) - (\text{Median} - Q_1)}{(Q_3 - \text{Median}) + (\text{Median} - Q_1)} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$
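Both coefficients can be sketched in Python on the cholesterol readings from the SD example. Quartile conventions differ between textbooks and software, so the Bowley value depends on the method used; here `statistics.quantiles` with its default method is an assumption, not the slides' own definition.

```python
import statistics

levels = [240, 260, 290, 245, 255, 288, 272, 263, 277, 251]

mean_level = statistics.mean(levels)
median_level = statistics.median(levels)
sd_level = statistics.pstdev(levels)

# Pearson's coefficient in its median form: 3 * (mean - median) / sigma.
sk_pearson = 3 * (mean_level - median_level) / sd_level

# Quartiles from the standard library (default 'exclusive' method).
q1, _, q3 = statistics.quantiles(levels, n=4)

# Bowley's coefficient, written both ways shown on the slide.
sk_bowley_long = ((q3 - median_level) - (median_level - q1)) / (
    (q3 - median_level) + (median_level - q1))
sk_bowley_short = (q3 + q1 - 2 * median_level) / (q3 - q1)

print(round(sk_pearson, 3), round(sk_bowley_short, 3))
```

A positive value of either coefficient indicates a longer tail on the right-hand side.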
Kurtosis
“Kurtosis” denotes the degree of
‘peakedness’ of the distribution, often as
compared to a Normal distribution.
Thank You
