& ANALYTICS
KMBN104
Dr. Mani Tyagi
UNIT 1
DESCRIPTIVE STATISTICS
STATISTICS IS A SCIENCE DEALING WITH COLLECTION,
ANALYSIS, INTERPRETATION, AND PRESENTATION OF
NUMERICAL DATA.
“AGGREGATES OF FACTS AFFECTED TO A
MARKED EXTENT BY MULTIPLICITY OF
CAUSES, NUMERICALLY EXPRESSED,
ENUMERATED, OR ESTIMATED ACCORDING TO
REASONABLE STANDARDS OF ACCURACY,
COLLECTED IN A SYSTEMATIC MANNER FOR A
PREDETERMINED PURPOSE AND PLACED IN RELATION TO
EACH OTHER”
HORACE SECRIST
IMPORTANT CHARACTERISTICS
• Collect data
– e.g., Survey
• Present data
– e.g., Tables and graphs
• Characterize data
– e.g., Sample mean = $\bar{X} = \frac{\sum X_i}{n}$
Methods of descriptive statistics
• Graphic method
• Numeric method
Summary of Types of Variables
FUNCTIONS OF STATISTICS
• Collection of Data
• Tabulation of Data
• Analysis of Data
• Interpretation of Data
Data Collection Methods
Collecting Data
• Primary (Data Collection)
– Observation
– Survey
– Experimentation
• Secondary (Data Compilation)
– Print or Electronic
Primary & Secondary
Primary:
• Direct personal interview.
• Indirect personal interview.
• Questionnaire.
• Enumerators.
Secondary:
• Government publications.
• International publications.
• Semi-official publications.
• Private publications.
• Unpublished sources.
Classification & Tabulation of Data
THREE TYPES OF SERIES
INDIVIDUAL SERIES
DISCRETE SERIES
CONTINUOUS SERIES
INDIVIDUAL SERIES:
DISCRETE SERIES:
CONTINUOUS SERIES :
1, 1, 3, 7, 10, 13
Mode = 1
MODE
“The mode or the modal value is that in a series of
observations which occurs with the greatest frequency.”
Calculation of mode:
Individual Series:
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
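A minimal Python sketch of the individual-series calculation, counting how often each of the sixteen observations above occurs. The variable names are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

# Observations from the individual-series example above
observations = [37, 43, 44, 46, 39, 43, 44, 46,
                40, 43, 44, 46, 40, 43, 45, 48]

counts = Counter(observations)
# most_common(1) returns the (value, frequency) pair with the highest count
mode_value, mode_frequency = counts.most_common(1)[0]
print(mode_value, mode_frequency)
```

The value with the greatest frequency is reported as the mode, which is the definition quoted above.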
Calculation of Mode
Discrete Series:
Grouping Table:
Example:
Marks : 10 15 20 25 30 35 40
Numbers: 08 12 36 25 28 18 09
Calculation of mode
Continuous Series:
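$Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$
Where:
L = the lower limit of the modal class
Δ1 = difference between the frequency of the modal class and that of the preceding class
Δ2 = difference between the frequency of the modal class and that of the succeeding class
i = the class interval of the modal class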
1, 3, 7, 10, 13
Median = 7
Median
• Middle value in an ordered array of numbers.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Least affected by extreme values.
Median: Computational
Procedure
• First Procedure
– Arrange the observations in an ordered array.
– If there is an odd number of terms, the median is
the middle term of the ordered array.
– If there is an even number of terms, the median is
the average of the middle two terms.
• Second Procedure
– The median’s position in an ordered array is given
by (n+1)/2.
How to Find the Median in
a Group of Numbers
• Step 1 – Arrange the numbers
in order from least to greatest.
21, 18, 24, 19, 27
18, 19, 21, 24, 27
• Step 2 – Find the middle
number.
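The two steps can be sketched in Python as follows. The function name `median_of` is a hypothetical helper; it follows the first procedure described earlier (middle term for an odd count, average of the two middle terms for an even count).

```python
def median_of(values):
    ordered = sorted(values)          # Step 1: arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: the single middle term
        return ordered[middle]
    # even count: average of the two middle terms
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median_of([21, 18, 24, 19, 27]))
```

For the five numbers above, the middle term of the ordered array 18, 19, 21, 24, 27 is the third one.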
Steps:
• Arrange data in ascending/descending order.
• Find out cumulative frequency.
• Apply the formula: Median = size of the (N+1)/2 th item.
• Now look at the cumulative frequency column and
find the value that is either equal to or next
higher than (N+1)/2.
• Determine the value of variable corresponding
to it.
• This value is the median.
EXAMPLE:
1000 24
1500 26
1501 16
2000 20
2500 06
1800 30
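The steps for a discrete series can be sketched in Python using the figures above. Reading the first column as the value of the variable and the second as its frequency is an assumption, since the column headings are not given; `discrete_median` is a hypothetical helper.

```python
# (value, frequency) pairs from the example above; the column meaning is an assumption
series = [(1000, 24), (1500, 26), (1501, 16), (2000, 20), (2500, 6), (1800, 30)]

def discrete_median(pairs):
    ordered = sorted(pairs)                 # arrange by value, ascending
    total = sum(f for _, f in ordered)      # N
    position = (total + 1) / 2              # (N + 1)/2
    cumulative = 0
    for value, frequency in ordered:
        cumulative += frequency             # running cumulative frequency
        if cumulative >= position:          # first c.f. equal to or above (N + 1)/2
            return value

print(discrete_median(series))
```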
MEDIAN (CONTINUOUS SERIES)
$Median = L + \frac{\frac{N}{2} - c.f.}{f} \times i$
Where:
N = Total frequency (number of observations)
L = Lower limit of the median class
c.f. = Cumulative frequency of the class preceding the
median class
f = Simple frequency of the median class
i = the class interval of the median class
EXAMPLE:
5-10 07
10-15 15
15-20 24
20-25 31
25-30 42
30-35 30
35-40 26
40-45 15
45-50 10
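A short Python sketch for this example, assuming the usual interpolation formula Median = L + ((N/2 − c.f.) / f) × i with the symbols defined above. The function name `grouped_median` is hypothetical.

```python
# (lower limit, upper limit, frequency) for each class in the example above
classes = [(5, 10, 7), (10, 15, 15), (15, 20, 24), (20, 25, 31), (25, 30, 42),
           (30, 35, 30), (35, 40, 26), (40, 45, 15), (45, 50, 10)]

def grouped_median(rows):
    total = sum(f for _, _, f in rows)      # N
    half = total / 2                        # N/2 locates the median class
    cumulative = 0                          # c.f. of the classes before the current one
    for lower, upper, frequency in rows:
        if cumulative + frequency >= half:  # this is the median class
            width = upper - lower           # i
            return lower + (half - cumulative) / frequency * width
        cumulative += frequency

print(round(grouped_median(classes), 2))
```

Here N = 200, so the median class is the first one whose cumulative frequency reaches 100, which is 25-30.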
ARITHMETIC MEAN
Definition
• Mean – the average of a
group of numbers.
2, 5, 2, 1, 5
Mean = 3
ARITHMETIC MEAN
“ Sum of observed values of a set divided by the
number of observations in the set is called a mean
or an average.”
ARITHMETIC MEAN – INDIVIDUAL SERIES
$\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{1}{N}\sum X_i$
EXAMPLE :
Income (Rs): 1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810,
1050, 1950
Calculate the arithmetic mean of incomes.
SHORT-CUT METHOD:
$\bar{X} = A + \frac{\sum d}{N}$
Where
A = Assumed mean
d = deviation of items from assumed mean (X − A)
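As a quick check, the income example can be worked both ways in Python. The assumed mean A = 1700 is a hypothetical choice; any convenient value gives the same result.

```python
incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]
n = len(incomes)

# Direct method: sum of the observations divided by their number
direct_mean = sum(incomes) / n

# Short-cut method: A + (sum of d) / N, with d = X - A
assumed_mean = 1700  # hypothetical choice of A
deviations = [x - assumed_mean for x in incomes]
shortcut_mean = assumed_mean + sum(deviations) / n

print(direct_mean, shortcut_mean)
```

Both methods agree because the deviations only shift every value by the same constant A.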
ARITHMETIC MEAN – DISCRETE SERIES
i) DIRECT METHOD:
$\bar{X} = \frac{\sum fX}{\sum f}$
Where
f = frequency
X = variable in question
N = total number of observations ($\sum f = N$)
EXAMPLE :
0-10 05
10-20 10
20-30 25
30-40 30
40-50 20
50-60 10
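A sketch of the direct method applied to this table in Python. Because the table gives class intervals, the mid-point of each class is taken as X; that is an assumption about how the example is meant to be worked.

```python
# (lower limit, upper limit, frequency) from the example above
classes = [(0, 10, 5), (10, 20, 10), (20, 30, 25),
           (30, 40, 30), (40, 50, 20), (50, 60, 10)]

total_f = sum(f for _, _, f in classes)                      # N = sum of f
total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # sum of f * X, X = mid-point
mean = total_fx / total_f
print(mean)
```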
MEASURES OF
DISPERSION
What is dispersion?
Dispersion means variation in the size of the data.
Dispersion literally means scatteredness: it shows whether a
frequency distribution is homogeneous or heterogeneous.
Objectives related to the measurement of Dispersion:
1. To estimate the average distance of items from the average
of the series.
2. To know the construction or formation of the series.
3. To know the limits of variation of the item values.
4. To make a comparative study of the variability of two
series.
5. To see how well the average represents the series.
Absolute and Relative Measures of Dispersion:
Absolute Measure:
When the variation or scatteredness of a series is measured in the
original units of the series, it is called an absolute measure of dispersion.
Ex: income, height, weight and age are represented in absolute
measure.
Relative Measure:
For comparative study, the absolute measure is converted into a
relative measure, expressed as a percentage or ratio, by dividing it by the
average.
$\text{Relative dispersion} = \frac{\text{Absolute measure}}{\text{Average}} \times 100$
Ex: In two factories the average wages of labour are Rs.
250 and Rs. 300 respectively, and the absolute dispersion
in both factories is Rs. 40. It would be wrong to say
that the dispersion in both factories is the same. For
comparative study, the relative dispersion of wages in
the two factories is as follows:
Relative dispersion in first factory:
(40 / 250) × 100 = 16%
Relative dispersion in second factory:
(40 / 300) × 100 = 13.33%
Thus it is clear that the dispersion of wages is greater in the first factory.
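The same comparison as a small Python sketch; `relative_dispersion` is a hypothetical helper that simply divides the absolute measure by the average and multiplies by 100.

```python
def relative_dispersion(absolute, average):
    """Absolute dispersion expressed as a percentage of the average."""
    return absolute / average * 100

first = relative_dispersion(40, 250)
second = relative_dispersion(40, 300)
print(round(first, 2), round(second, 2))
```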
Methods Of Measuring Dispersion:
RANGE
The difference between largest value and smallest value of a
data series is called a range.
Co-efficient of Range
For comparative study of dispersion, it is necessary to know
the coefficient of range .
$\text{Co-efficient of Range} = \frac{L - S}{L + S}$
where L = largest value and S = smallest value.
The following represents the current year’s Return on
Equity of the 25 companies in an investor’s portfolio.
4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower Quartile = 5½    Median = 8    Upper Quartile = 9
Example 2: Find the median and quartiles for the data below.
6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10
Order the data: 3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15
Q1 = Lower Quartile = 4
Q2 = Median = 8
Q3 = Upper Quartile = 10
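Several conventions exist for locating quartiles. The Python sketch below uses the median-of-halves convention (the median of the values below and above the overall median), which is an assumption chosen because it reproduces both examples above. The function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                 # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]   # values above the median
    return median(lower), median(data), median(upper)

print(quartiles([4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12]))
print(quartiles([6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10]))
```

With an odd number of observations the middle value itself is left out of both halves.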
Inter Quartile Range:
Interquartile range = Q3 - Q1
The interquartile range is the distance between the third
quartile Q3 and the first quartile Q1.
This distance will include the middle 50 percent of the
observations.
For a set of observations the third quartile is 24 and the
first quartile is 10. What is the interquartile range?
The interquartile range is 24 - 10 = 14.
Methods of Averaging Deviations:
Quartile Deviations:
This measurement of dispersion is also based on Q1 and Q3.
$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}$
Coefficient Of Quartile Deviation
$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
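A small Python sketch of these quartile-based measures, reusing Q1 = 10 and Q3 = 24 from the interquartile range example. `quartile_measures` is a hypothetical helper.

```python
def quartile_measures(q1, q3):
    iqr = q3 - q1                      # interquartile range
    qd = iqr / 2                       # quartile deviation
    coefficient = iqr / (q3 + q1)      # coefficient of quartile deviation
    return iqr, qd, coefficient

iqr, qd, coefficient = quartile_measures(10, 24)
print(iqr, qd, round(coefficient, 4))
```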
Important Formulas (For Individual and
Discrete Series)
$M.D. = \frac{\sum |D|}{N}$,
Where |D| = |X − mean or median| (deviation ignoring sign)
Coefficient of M.D. = Mean Deviation / Mean or Median
Example:
Calculate the mean deviation from median of the
following series.
4000
4200
4400
4600
4800
Calculate mean deviation and its co-efficient from the
following data from arithmetic mean and median.
Price (Rs): 47, 50, 45, 40, 52, 55, 58, 53, 60, 65, 69
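Both mean-deviation examples can be sketched in Python as follows. `mean_deviation` is a hypothetical helper that averages the absolute deviations from whichever centre (mean or median) is passed in.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Average absolute deviation of the values from the chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)

# First example: mean deviation from the median, and its coefficient
series = [4000, 4200, 4400, 4600, 4800]
md_series = mean_deviation(series, median(series))
coeff_series = md_series / median(series)

# Second example: mean deviation from the arithmetic mean and from the median
prices = [47, 50, 45, 40, 52, 55, 58, 53, 60, 65, 69]
md_from_mean = mean_deviation(prices, mean(prices))
md_from_median = mean_deviation(prices, median(prices))

print(md_series, round(md_from_mean, 2), round(md_from_median, 2))
```

Dividing each mean deviation by the centre it was measured from gives the corresponding coefficient.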
Discrete And Continuous Series:
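Standard deviation:
$\sigma = \sqrt{\frac{\sum x^2}{N}}$, where $x = X - \bar{X}$
Also
$\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$, where X = variables
With an assumed mean:
$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$
Where
d = X − A (Assumed mean)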
Example:
240 288
260 272
290 263
245 277
255 251
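A Python sketch comparing three equivalent ways of computing the standard deviation. Treating the ten values above as a single series is an assumption, and A = 260 is a hypothetical assumed mean.

```python
from math import sqrt

values = [240, 260, 290, 245, 255, 288, 272, 263, 277, 251]
n = len(values)
mean = sum(values) / n

# Deviations from the actual mean: sqrt(sum(x^2) / N), x = X - mean
sd_actual = sqrt(sum((v - mean) ** 2 for v in values) / n)

# Direct form: sqrt(sum(X^2)/N - (sum(X)/N)^2)
sd_direct = sqrt(sum(v * v for v in values) / n - mean ** 2)

# Assumed-mean form with d = X - A; A = 260 is a hypothetical choice
A = 260
d = [v - A for v in values]
sd_assumed = sqrt(sum(x * x for x in d) / n - (sum(d) / n) ** 2)

print(round(sd_actual, 2), round(sd_direct, 2), round(sd_assumed, 2))
```

All three forms divide by N, so they describe the same quantity and should agree up to rounding.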
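$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$, where d = X − A
With step deviations:
$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$
Here d = (X − A) / i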
Example:
The annual salaries of a group of employees are given
in the following table:
$C.V. = \frac{\sigma}{\bar{X}} \times 100$
X Y
35 108
54 107
52 105
53 105
56 106
58 107
52 104
50 103
51 104
49 101
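A Python sketch of the coefficient of variation for the two series X and Y tabulated above. `coefficient_of_variation` is a hypothetical helper; it uses `statistics.pstdev`, which divides by N.

```python
from statistics import mean, pstdev

x = [35, 54, 52, 53, 56, 58, 52, 50, 51, 49]
y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

def coefficient_of_variation(values):
    # pstdev divides by N, i.e. it treats the values as the whole series
    return pstdev(values) / mean(values) * 100

cv_x = coefficient_of_variation(x)
cv_y = coefficient_of_variation(y)
print(round(cv_x, 2), round(cv_y, 2))
```

The series with the smaller coefficient of variation is the more consistent of the two.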
Variance: It is the square of the standard deviation,
i.e., Variance = $\sigma^2$
Example:
The number of employees, the average wage per employee and the variance
of the wages per employee for two factories are given below:
                   Factory A   Factory B
No. of employees   100         150
Average wage       3200        2800
Variance of wage   625         729
(a) In which factory is there greater variation in the distribution
of wages per employee?
(b) Suppose in factory B the wage of an employee was
wrongly noted as Rs. 3050 instead of Rs. 3650. What
would be the correct variance for factory B?
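One possible way to work this example in Python. Part (a) is compared through the coefficient of variation, which is an assumption about the intended method. Part (b) rebuilds the totals ΣX and ΣX² from the given mean and variance and then swaps the wrong entry for the correct one.

```python
from math import sqrt

def cv(mean, variance):
    return sqrt(variance) / mean * 100      # C.V. = (sigma / mean) * 100

# Part (a): compare relative variation
cv_a = cv(3200, 625)
cv_b = cv(2800, 729)

# Part (b): recover the totals for factory B, then correct the wrong entry
n, mean_b, var_b = 150, 2800, 729
sum_x = n * mean_b                          # sum of X
sum_x2 = n * (var_b + mean_b ** 2)          # sum of X^2, from variance = sum(X^2)/N - mean^2
sum_x += 3650 - 3050                        # remove 3050, add 3650
sum_x2 += 3650 ** 2 - 3050 ** 2
corrected_mean = sum_x / n
corrected_variance = sum_x2 / n - corrected_mean ** 2

print(round(cv_a, 2), round(cv_b, 2), corrected_mean, corrected_variance)
```

Correcting a single entry changes both the mean and the variance, so both totals have to be adjusted before recomputing.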
Example:
The mean of 5 observations is 4.4 and the variance is 8.24.
If three of the five observations are 1, 2 and 6, find the other two.
Example:
The following table gives the marks obtained by a group of
80 students in an examination. Calculate the variance.
Marks obtained No. of Students
10-14 02
14-18 04
18-22 04
22-26 08
26-30 12
30-34 16
34-38 10
38-42 08
42-46 04
46-50 06
50-54 02
54-58 04
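A Python sketch of the variance for this table using step deviations, d = (X − A) / i, with the class mid-point taken as X. The assumed mean A = 32 is a hypothetical choice (the mid-point of the 30-34 class).

```python
# (lower limit, upper limit, number of students) from the table above
classes = [(10, 14, 2), (14, 18, 4), (18, 22, 4), (22, 26, 8), (26, 30, 12), (30, 34, 16),
           (34, 38, 10), (38, 42, 8), (42, 46, 4), (46, 50, 6), (50, 54, 2), (54, 58, 4)]

A, i = 32, 4          # assumed mean (hypothetical choice) and class interval
n = sum(f for _, _, f in classes)

sum_fd = sum_fd2 = 0
for lower, upper, f in classes:
    d = ((lower + upper) / 2 - A) / i      # step deviation d = (X - A) / i
    sum_fd += f * d
    sum_fd2 += f * d * d

# variance = (sum(f d^2)/N - (sum(f d)/N)^2) * i^2
variance = (sum_fd2 / n - (sum_fd / n) ** 2) * i ** 2
print(n, round(variance, 2))
```

The final multiplication by i² undoes the scaling introduced by dividing each deviation by the class interval.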
Skewness:
“It refers to the asymmetry or lack of symmetry
in the shape of a frequency distribution.”
Just as a statistical average is calculated to study central tendency,
and dispersion is measured to study the scatter of values, skewness is
calculated to study the symmetrical or asymmetrical nature of a series.
Types of frequency distribution:
1. Normal (Symmetrical) Distribution
2. Asymmetrical Distribution
Normal frequency distribution
One main feature of the normal distribution is that the mean, median
and mode are equal. In such a distribution the frequencies
gradually increase, reach a maximum in the centre and then
decrease. When this distribution is plotted on a graph it gives a
bell-shaped curve, also called the normal curve.
(Figure: normal curve; Mean, Median and Mode coincide at the centre.)
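(i) Positive Skewness: When the mean of a series is greater than the median
and the median is greater than the mode, skewness is positive, i.e. the peak
of the curve lies towards the left and the longer tail extends to the right.
Mean > Median > Mode
(Figure: positively skewed curve; from left to right: Mode, Median, Mean.)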
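(ii) Negative Skewness: When the mean is less than the median and the median
is less than the mode, skewness is negative; the mean and median lie to the left of the mode.
Mean < Median < Mode
(Figure: negatively skewed curve; from left to right: Mean, Median, Mode.)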
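Karl Pearson’s Coefficient of Skewness:
$S_k = \frac{\text{Mean} - \text{Mode}}{\sigma}$
Bowley’s Coefficient of Skewness:
$S_B = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$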
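A minimal Python sketch of the quartile-based coefficient S_B, applied to the quartiles found earlier for the eleven-value example (Q1 = 4, median = 8, Q3 = 10). `bowley_skewness` is a hypothetical helper name.

```python
def bowley_skewness(q1, median, q3):
    # S_B = (Q3 + Q1 - 2 * Median) / (Q3 - Q1)
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Quartiles from the eleven-value example earlier in these notes
print(round(bowley_skewness(4, 8, 10), 3))
```

A negative value indicates that the median lies closer to Q3 than to Q1, and a value of zero indicates that the quartiles are symmetric about the median.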