UNIT 2
CHAPTER 8
DESCRIPTIVE STATISTICS
Measures of central tendency
They describe the central theme of the data and summarize the characteristics of an entire
mass of data.
The most common and useful measure of central tendency is the arithmetic mean.
The other measures are median, mode, geometric mean, harmonic mean and weighted
mean.
Measures of dispersion
They describe the extent of scatter of the values around a measure of central tendency (how far
from or how near to an average the values lie).
Standard deviation is the most important and common measure of dispersion.
The other measures of dispersion are range, quartile deviation, decile range and mean
deviation.
Chapter 9
Measures of Central Tendency - Averages
Averages
Definition: An average is a value that summarizes the characteristics of an entire mass
of data.
Objectives: (i) To present a huge mass of statistical data in a simple and concise manner.
(ii) To make the central theme of the data readily understandable. (iii) To allow
comparison between sets of data.
Types of Averages: (i) arithmetic mean (ii) median (iii) mode (iv) geometric mean (v)
harmonic mean
Arithmetic mean / Mean
It is defined as the sum of all the variates of a variable divided by the total number of
items in the sample.
It should be expressed in the same unit in which the data is given.
Median
It is the value of the middle item of a given series of data arranged in ascending order of
magnitude. When the number of items is even, the mean of the two middle items is taken.
It should be expressed in the same unit in which the data is given.
Mode
Mode is defined as that value which occurs most frequently in a sample.
A sample with a single mode is referred to as unimodal. If a sample has two modes, it is
called bimodal. Multimodal or polymodal samples also occur. A sample in which no value
repeats has no mode, and its mode is said to be ill-defined.
Geometric mean
It is defined as the nth root of the product of the n items of ungrouped data.
It is used when the average of a rate of change is required.
Harmonic mean
It is defined as the reciprocal of the arithmetic mean of the reciprocals of the given data.
It is an appropriate measure for averaging rates, such as speed (distance per unit time).
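As a rough illustration of the five averages defined above, the following minimal sketch uses Python's standard statistics module on a small made-up sample. The sample values and variable names are assumptions chosen only for illustration.

```python
import statistics

# Hypothetical sample, chosen so the hand calculations are easy to follow.
data = [2, 4, 4, 8]

arithmetic_mean = statistics.mean(data)            # (2 + 4 + 4 + 8) / 4
median = statistics.median(data)                   # mean of the two middle items, 4 and 4
mode = statistics.mode(data)                       # the most frequent value
geometric_mean = statistics.geometric_mean(data)   # 4th root of 2 * 4 * 4 * 8 = 256
harmonic_mean = statistics.harmonic_mean(data)     # 4 / (1/2 + 1/4 + 1/4 + 1/8)

print("AM    :", arithmetic_mean)
print("Median:", median)
print("Mode  :", mode)
print("GM    :", round(geometric_mean, 4))
print("HM    :", round(harmonic_mean, 4))
```

All five results are in the same unit as the data, as noted for the mean and median above.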
Weighted averages
Properties of Mean:
The arithmetic mean possesses certain properties, some desirable and some not so
desirable. These properties include the following:
1. Uniqueness. For a given set of data there is one and only one arithmetic mean.
2. Simplicity. The arithmetic mean is easily understood and easy to compute.
3. Since each and every value in a set of data enters into the computation of the mean,
it is affected by each value. Extreme values, therefore, have an influence on the mean
and, in some cases, can so distort it that it becomes undesirable as a measure of central
tendency.
Merits and Demerits of arithmetic mean:
Merits:
It is easy to understand and easy to compute.
It is rigidly defined.
It is based upon all the observations.
Demerits:
It cannot be obtained if a single value is lost.
It is not suitable for data with open-ended classes.
It is not suitable for qualitative phenomena.
Properties of Median
Uniqueness. As is true with the mean, there is only one median for a given set of
data.
Simplicity. The median is easy to calculate.
It is not as drastically affected by extreme values as is the mean.
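The different sensitivity of the mean and the median to extreme values can be seen in a small experiment. This is only a sketch: the sample and the extreme value 500 are made up for illustration.

```python
import statistics

# Hypothetical sample and the same sample with one extreme value appended.
values = [10, 12, 13, 15, 20]
with_extreme = values + [500]

mean_before = statistics.mean(values)            # 70 / 5
median_before = statistics.median(values)        # middle item of 5 sorted values
mean_after = statistics.mean(with_extreme)       # 570 / 6
median_after = statistics.median(with_extreme)   # mean of the 3rd and 4th sorted values

print("mean  :", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

In this sample the single extreme value moves the mean from 14 to 95, while the median only moves from 13 to 14.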
Relation between AM, GM and HM
AM ≥ GM ≥ HM (for positive values; equality holds only when all the values are equal)
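The ordering of the three means can be checked numerically. The sketch below is an illustration, not a proof; the helper name three_means and both samples are assumptions. The second sample, with all values equal, shows the boundary case in which the three means coincide.

```python
import math
import statistics


def three_means(values):
    """Return (AM, GM, HM) for a list of positive numbers."""
    return (
        statistics.mean(values),
        statistics.geometric_mean(values),
        statistics.harmonic_mean(values),
    )


# By hand: AM = 21 / 3 = 7, GM = cube root of 64 = 4, HM = 3 / (1 + 1/4 + 1/16) = 16/7
am, gm, hm = three_means([1, 4, 16])
ordered = am >= gm >= hm
print(am, round(gm, 4), round(hm, 4), ordered)

# With identical values the three means agree (up to floating-point rounding).
same_am, same_gm, same_hm = three_means([5, 5, 5])
all_equal = math.isclose(same_am, same_gm) and math.isclose(same_gm, same_hm)
print(all_equal)
```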
Chapter 10.
Measures of Dispersion
Measures of dispersion
It is defined as an absolute or relative measure of differences of the values of the various
items from a measure of central tendency of these items.
The difference between the value of an item and a measure of central tendency is called
a deviation.
An average of the deviations of the values of various items from a measure of central
tendency is called a measure of dispersion.
The different measures of dispersion are range, quartile deviation, decile range,
standard deviation and mean deviation.
Range
It is defined as the difference between the maximum value and the minimum value of the given
series of data.
Quartile Deviation
The given data (in ascending order) are divided into four equal parts by three values called quartiles.
Q1: first quartile or lower quartile
Q2: second quartile or middle quartile or median
Q3: third quartile or upper quartile
Quartile deviation = (Q3 - Q1) / 2
Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
Decile Range
The given data (in ascending order) are divided into 10 equal parts by the deciles D1, D2, ..., D9.
Decile range = D9 - D1
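The quartile and decile measures can be tried out on a small sample. The sketch below assumes the cut points are located at the k(n + 1)/4 and k(n + 1)/10 positions of the sorted data, which is what method="exclusive" of statistics.quantiles does; the sample itself is made up.

```python
import statistics

# Hypothetical sample of n = 11 values (sorting is done inside quantiles()).
data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]

# Three quartiles: positions (n + 1)/4 = 3, 6 and 9 in the sorted data.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

# Nine deciles: positions k * (n + 1)/10, interpolated between neighbours.
deciles = statistics.quantiles(data, n=10, method="exclusive")
d1, d9 = deciles[0], deciles[-1]

quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)
decile_range = d9 - d1

print("Q1, Q3:", q1, q3)
print("Quartile deviation:", quartile_deviation)
print("Coefficient of quartile deviation:", coefficient_of_qd)
print("Decile range:", round(decile_range, 4))
```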
Mean Deviation from mean, median or mode
It is the arithmetic mean of the absolute deviations of the various items from a measure
of central tendency (mean, median or mode).
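A minimal sketch of the mean deviation, assuming a small made-up sample; the function name mean_deviation is hypothetical. The same function serves for deviations about the mean, the median or the mode, since only the centre changes.

```python
import statistics


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations of the values from `center`."""
    return sum(abs(x - center) for x in values) / len(values)


# Hypothetical sample: mean = 30 / 5 = 6, median = 5, mode = 5.
data = [3, 5, 5, 7, 10]

md_about_mean = mean_deviation(data, statistics.mean(data))      # (3 + 1 + 1 + 1 + 4) / 5
md_about_median = mean_deviation(data, statistics.median(data))  # (2 + 0 + 0 + 2 + 5) / 5
md_about_mode = mean_deviation(data, statistics.mode(data))      # same centre as the median here

print(md_about_mean, md_about_median, md_about_mode)
```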
Standard deviation (SD)
SD is defined as the square root of the arithmetic mean of the squared deviations of the
various items from arithmetic mean.
Variance is defined as the arithmetic mean of the squared deviations of the various items
from arithmetic mean.
Relation between SD and variance:
SD = square root of variance
Coefficient of variation (cv)
The relative measure of standard deviation is called the coefficient of variation.
It is used to study the variability or consistency of the data.
More cv => less consistent
Less cv => more consistent
cv = (SD / mean) * 100
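The following sketch computes the variance, SD and cv for two made-up series and compares their consistency. It assumes the divisor is the number of items, as in the definitions above, so it uses the population functions pvariance and pstdev; the function name coefficient_of_variation and both series are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """cv = (SD / mean) * 100, with SD computed using divisor n."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


# Hypothetical series A: mean = 5, squared deviations sum to 32, variance = 32 / 8 = 4.
series_a = [2, 4, 4, 4, 5, 5, 7, 9]
# Hypothetical series B: mean = 50, variance = (4 + 0 + 4) / 3.
series_b = [48, 50, 52]

variance_a = statistics.pvariance(series_a)
sd_a = statistics.pstdev(series_a)          # square root of the variance
cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)

print("variance A:", variance_a, "SD A:", sd_a)
print("cv A:", round(cv_a, 2), "cv B:", round(cv_b, 2))
print("more consistent series:", "A" if cv_a < cv_b else "B")
```

Series B has the smaller cv, so by the rule above it is the more consistent of the two.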
Chapter 11
Skewness and Kurtosis
Skewness
Skewness: the lack of symmetry in the shape of the frequency curve.
Coefficient of skewness = (mean - mode) / SD (Karl Pearson's)
(or)
= (Q3 + Q1 - 2 median) / (Q3 - Q1) (Bowley's)
(or)
β1 = μ3² / μ2³ (method of moments), where μ2 and μ3 are the second and third central moments,
and γ1 = μ3 / μ2^(3/2) is the square root of β1 taken with the sign of μ3
γ1 < 0 - negatively skewed
γ1 > 0 - positively skewed
γ1 = 0 - symmetric
(i) If mean = median = mode, symmetrical distribution
(ii) If mean > median > mode, positively skewed distribution
(iii) If mean < median < mode, negatively skewed distribution
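The three coefficients of skewness can be compared on one sample. The sketch below makes several assumptions: the sample is made up, SD and the central moments use divisor n, and quartiles are taken at the (n + 1)/4 and 3(n + 1)/4 positions (method="exclusive").

```python
import statistics

# Hypothetical sample with a long right tail: mean 4 > median 3 > mode 2.
data = [1, 2, 2, 3, 4, 6, 10]
n = len(data)

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
sd = statistics.pstdev(data)

# Karl Pearson's coefficient: (mean - mode) / SD
pearson = (mean - mode) / sd

# Bowley's coefficient: (Q3 + Q1 - 2 * median) / (Q3 - Q1), here (6 + 2 - 6) / 4
q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
bowley = (q3 + q1 - 2 * median) / (q3 - q1)

# Method of moments: central moments mu2 and mu3, then beta1 and gamma1
mu2 = sum((x - mean) ** 2 for x in data) / n
mu3 = sum((x - mean) ** 3 for x in data) / n
beta1 = mu3 ** 2 / mu2 ** 3
gamma1 = mu3 / mu2 ** 1.5

print("Pearson:", round(pearson, 4))
print("Bowley :", round(bowley, 4))
print("beta1  :", round(beta1, 4), "gamma1:", round(gamma1, 4))
```

All three coefficients come out positive for this sample, in agreement with mean > median > mode.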
Kurtosis
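The degree of peakedness of a frequency polygon.
β2 = μ4 / μ2², where μ2 and μ4 are the second and fourth central moments
Types of Kurtosis
(i) if β2 > 3, leptokurtic
(ii) if β2 < 3, platykurtic
(iii) if β2 = 3, mesokurtic
[Figure: frequency curves A (mesokurtic), B (platykurtic), C (leptokurtic)]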
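A minimal sketch of the moment measure of kurtosis, assuming central moments with divisor n. The sample and the function names beta2 and kurtosis_type are made up for illustration.

```python
def beta2(values):
    """Return mu4 / mu2**2 using central moments with divisor n."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((x - mean) ** 2 for x in values) / n
    mu4 = sum((x - mean) ** 4 for x in values) / n
    return mu4 / mu2 ** 2


def kurtosis_type(b2):
    """Classify by comparing beta2 with 3 (exact equality is rare with real data)."""
    if b2 > 3:
        return "leptokurtic"
    if b2 < 3:
        return "platykurtic"
    return "mesokurtic"


# Hypothetical sample: mean 5, mu2 = 32 / 8 = 4, mu4 = 356 / 8 = 44.5
data = [2, 4, 4, 4, 5, 5, 7, 9]
b2 = beta2(data)
label = kurtosis_type(b2)
print(b2, label)
```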
STEM-AND-LEAF-DIAGRAM
A simple technique to visualize the nature of the population using the data from a
sample of that population is the stem-and-leaf-diagram. It is one of the exploratory data
analysis (EDA) tools, and it can be constructed easily and quickly. A stem-and-leaf-diagram
is constructed as a series of horizontal rows of numbers. The first number of
each row is the label of that row and is called the stem. The remaining numbers in a row,
following the stem number, are called the leaves.
Construction of a Stem-and-Leaf-Diagram
Step 1: Not less than five numbers are chosen from the given data as stems. Usually the
first one or two digits of the numbers in the given data are chosen as the stems.
Step 2: The rows are labelled using the stem numbers.
Step 3: If the first one or two digits do not provide a sufficient number of stems to
visualize the shape of the distribution, each stem may be used twice. The first of the twin
stems takes the lower leaves, 0, 1, 2, 3 and 4, and the second takes the higher
leaves, 5, 6, 7, 8 and 9.
Step 4: Scanning the data, the digits following the stem number are entered as a leaf
on the appropriate stem.
Step 5: The diagram is turned on its side to visualize how the numbers are distributed.
Specifically the following aspects are considered:
Whether there is any tendency for the leaves to cluster close to a particular stem or
stems.
Whether there is any tendency for the data to taper towards one end or the other.
Whether a smooth curve drawn across the top of the diagram forms a rough bell shaped
curve. If so, whether the curve is symmetric, flat or peaked.
Step 6: The observations of the stem-and-leaf-diagram with reference to the above
aspects would throw light on the nature of the population, such as its pattern, symmetry
etc.
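The construction steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data are non-negative whole numbers, the stem is everything except the last digit, the leaf is the last digit, and stems are not split in two. The function name stem_and_leaf and the sample are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: [leaves]} with the last digit as leaf and the rest as stem."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return dict(rows)


# Hypothetical sample giving five stems (1 to 5).
data = [12, 15, 21, 23, 23, 28, 31, 34, 35, 35, 36, 42, 47, 53]
rows = stem_and_leaf(data)

# Print every stem in the range, including any stem with no leaves.
for stem in range(min(rows), max(rows) + 1):
    leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
    print(f"{stem} | {leaves}")
```

Turned on its side, the printed rows for this sample cluster around stem 3 and taper towards both ends.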
BOX PLOT
The box plot is a diagrammatic representation of data series to give visual information
about measures of central tendency, dispersion and direction of skewness.
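A box plot is commonly drawn from a five-number summary (minimum, Q1, median, Q3, maximum). That convention is an assumption here, since the construction is not spelled out above. The sketch below computes the summary for a made-up sample and reads off the direction of skewness from the two halves of the box; quartiles are taken at the (n + 1)-based positions.

```python
import statistics

# Hypothetical sample.
data = [1, 2, 2, 3, 4, 6, 10]

q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
summary = {
    "minimum": min(data),
    "Q1": q1,
    "median": median,
    "Q3": q3,
    "maximum": max(data),
}

# Compare the upper half of the box (Q3 - median) with the lower half (median - Q1).
upper_half = q3 - median
lower_half = median - q1
if upper_half > lower_half:
    direction = "positive"
elif upper_half < lower_half:
    direction = "negative"
else:
    direction = "symmetric"

print(summary)
print("box length (Q3 - Q1):", q3 - q1, "skew direction:", direction)
```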
Chapter 12
Inferential Statistics
Inferential Statistics
Its purpose is to reach decisions about a large body of data by examining only a small part of the data.
[Inferential Statistics: A decision, estimate, prediction, or generalization about a
population, based on a sample.]
Any descriptive measure of a sample is called a sample statistic; any descriptive measure of a
population is called a population parameter.
Inferential statistics includes hypothesis testing and tests of significance.
Chapter 13
Probability
Probability Basic Definitions
Random experiment is an experiment whose outcomes cannot be predicted in advance.
Problem
Venn Diagrams
A diagrammatic representation of a sample space enclosing all possible events
associated with an experiment is the Venn diagram. (J.Venn)
Chapter 14
Theoretical Probability Distributions
Theoretical Probability Distributions
Binomial
Poisson
Normal
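Measure: formulas for individual observations (raw data), discrete series and continuous series
Notation: n = number of individual observations; f = frequency; N = Σf; xmid = mid-value of a
class; L = lower limit of the median (or modal) class; h = class width; cf = cumulative frequency
of the class preceding the median class; f1, f0, f2 = frequencies of the modal class and of the
classes preceding and following it.

AM
Individual observations: Σx / n
Discrete: Σfx / N
Continuous: Σf xmid / N

GM
Individual observations: Antilog (Σ log x / n)
Discrete: Antilog (Σ f log x / N)
Continuous: Antilog (Σ f log xmid / N)

HM
Individual observations: n / Σ(1/x)
Discrete: N / Σ(f/x)
Continuous: N / Σ(f/xmid)

Median
Individual observations: Size of ((n + 1)/2)th position
Discrete: Size of ((N + 1)/2)th position
Continuous: Median = L + ((N/2 - cf) / f) × h

Mode
Individual observations: Max no. of repeated values (the value repeated most often)
Discrete: Value with the maximum frequency
Continuous: Mode = L + ((f1 - f0) / (2 f1 - f0 - f2)) × h

Range
Maximum value - minimum value

Q1
Individual observations: Size of ((n + 1)/4)th position
Discrete and continuous: Size of ((N + 1)/4)th position

Q3
Individual observations: Size of 3((n + 1)/4)th position
Discrete and continuous: Size of 3((N + 1)/4)th position

Inter Quartile range
Q3 - Q1

QD
(Q3 - Q1) / 2

MD about 3 measures (A = mean, median or mode)
Individual observations: Σ|x - A| / n
Discrete: Σ f |x - A| / N
Continuous: Σ f |xmid - A| / N

SD
Individual observations: square root of (Σ(x - mean)² / n)
Discrete: square root of (Σ f (x - mean)² / N)
Continuous: square root of (Σ f (xmid - mean)² / N)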
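For a continuous series, the grouped-data formulas AM = Σf xmid / N, Median = L + ((N/2 - cf) / f) × h and Mode = L + ((f1 - f0) / (2 f1 - f0 - f2)) × h can be worked through on a small frequency table. The sketch below is an illustration only: the classes and frequencies are made up, all classes are assumed to have the same width h, and a missing neighbouring class is treated as frequency 0.

```python
# Hypothetical frequency table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_limits = [0, 10, 20, 30, 40]
frequencies = [5, 8, 12, 10, 5]
h = 10                      # common class width
N = sum(frequencies)        # total frequency, 40

# Arithmetic mean from the class mid-values.
midpoints = [lower + h / 2 for lower in lower_limits]
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / N

# Median: first class whose cumulative frequency reaches N / 2.
cf = 0  # cumulative frequency of the classes before the current one
for lower, f in zip(lower_limits, frequencies):
    if cf + f >= N / 2:
        median = lower + (N / 2 - cf) / f * h
        break
    cf += f

# Mode: class with the largest frequency and its two neighbours.
m = frequencies.index(max(frequencies))
f1 = frequencies[m]
f0 = frequencies[m - 1] if m > 0 else 0
f2 = frequencies[m + 1] if m < len(frequencies) - 1 else 0
mode = lower_limits[m] + (f1 - f0) / (2 * f1 - f0 - f2) * h

print("mean  :", mean)
print("median:", round(median, 4))
print("mode  :", round(mode, 4))
```

Here the median class and the modal class are both 20-30, with cf = 13, f1 = 12, f0 = 8 and f2 = 10.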