Biostatistics

UNIT 2

CHAPTER 8
DESCRIPTIVE STATISTICS
Measures of central tendency
Measures of central tendency describe the central theme of the data and summarize the
characteristics of an entire mass of data.
The most common and useful measure of central tendency is the arithmetic mean.
The other measures are median, mode, geometric mean, harmonic mean and weighted
mean.
Measures of dispersion
They describe the extent of scatter of the values around a measure of central tendency
(how far from or how near to an average the values are).
Standard deviation is the most important and common measure of dispersion.
The other measures of dispersion are range, quartile deviation, decile range and mean
deviation.

Chapter 9
Measures of Central Tendency - Averages
Averages
Definition: An average is a value that summarizes the characteristics of an entire mass
of data.
Objectives: (i) To present a huge mass of statistical data in a simple and concise manner.
(ii) To make the central theme of the data readily understandable. (iii) To facilitate
comparison.
Types of Averages: (i) arithmetic mean (ii) median (iii) mode (iv) geometric mean (v)
harmonic mean
Arithmetic mean / Mean
It is defined as the sum of all the variates of a variable divided by the total number of
items in the sample.
It should be expressed in the same unit in which the data is given.
Median
It is the value of the middle item of a given series of data arranged in ascending order of
magnitude.
It should be expressed in the same unit in which the data is given.
Mode
Mode is defined as that value which occurs most frequently in a sample.

A sample with a single mode is referred to as unimodal. If a sample has two modes, it is
called bimodal. Multimodal or polymodal samples also occur. A sample with no mode
is called a non-modal sample or a sample of ill-defined mode.
Geometric mean
It is defined as the nth root of the product of the n items in ungrouped data.
It is used when the average of a rate of change is required.
Harmonic mean
It is defined as the reciprocal of the arithmetic mean of the reciprocals of the given data.
It is an appropriate measure for averaging rates such as speed (distance per unit time).
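The five averages defined above can be computed with Python's standard `statistics` module. This is a minimal sketch; the sample values are hypothetical and chosen only for illustration.

```python
import statistics as st

# Hypothetical sample (illustrative values only, not from the notes).
data = [2, 4, 4, 8, 16]

arithmetic_mean = st.mean(data)            # sum of the items / number of items
median = st.median(data)                   # middle item of the sorted series
mode = st.mode(data)                       # most frequently occurring value
geometric_mean = st.geometric_mean(data)   # nth root of the product of n items
harmonic_mean = st.harmonic_mean(data)     # reciprocal of the mean of reciprocals

print(arithmetic_mean, median, mode, geometric_mean, harmonic_mean)
```

All five results are in the same unit as the data.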
Weighted averages

Properties of Mean:
The arithmetic mean possesses certain properties, some are desirable and some are not
so desirable. These properties include the following:
1. Uniqueness. For a given set of data there is one and only one arithmetic mean.
2. Simplicity. The arithmetic mean is easily understood and easy to compute.
3. Since each and every value in a set of data enters into the computation of the mean,
it is affected by each value. Extreme values, therefore, have an influence on the mean
and, in some cases, can so distort it that it becomes undesirable as a measure of central
tendency.
Merits and Demerits of arithmetic mean:

Merits:
It is easy to understand and easy to compute.
It is rigidly defined.
It is based upon all the observations.
Demerits:
It cannot be obtained if a single value is lost.
Not suitable for open end class.
It is not suitable for qualitative phenomenon.
Properties of Median

Uniqueness. As is true with the mean, there is only one median for a given set of
data.
Simplicity. The median is easy to calculate.
It is not as drastically affected by extreme values as is the mean.
Relation between AM, GM, HM

AM ≥ GM ≥ HM (for positive values; the three are equal only when all the values are equal)
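The ordering of the three means can be checked numerically. The sketch below uses hypothetical positive values; the helper name `am_gm_hm` is an illustrative choice, not part of the notes.

```python
import math
import statistics as st

def am_gm_hm(values):
    """Return (AM, GM, HM) for a list of positive numbers."""
    return st.mean(values), st.geometric_mean(values), st.harmonic_mean(values)

# Unequal values: the means are strictly ordered.
am, gm, hm = am_gm_hm([1, 4, 16])
print(am, gm, hm)

# Equal values: all three means coincide (up to floating-point rounding).
am_eq, gm_eq, hm_eq = am_gm_hm([5, 5, 5])
print(math.isclose(am_eq, gm_eq) and math.isclose(gm_eq, hm_eq))
```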

Comparison of Mean, Median, Mode

A distribution in which the values of mean, median and mode coincide is known as a
symmetrical distribution. When the values of mean, median and mode are not equal, the
distribution is known as asymmetrical or skewed.
Karl Pearson has expressed this relationship by the following empirical formula:
Mode = 3 Median - 2 Mean (this formula is used to find the mode for ill-defined
distributions)
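As a small worked example of Pearson's empirical relation, Mode = 3 Median - 2 Mean, the sketch below estimates the mode from a given mean and median. The input values are hypothetical.

```python
def empirical_mode(mean, median):
    """Estimate the mode from Karl Pearson's empirical relation."""
    return 3 * median - 2 * mean

# Hypothetical summary values for a moderately skewed sample.
estimated_mode = empirical_mode(mean=52, median=50)
print(estimated_mode)  # 3*50 - 2*52 = 46
```

Here mean > median > estimated mode, which is the pattern of a positively skewed distribution.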

Chapter 10.
Measures of Dispersion
Measures of dispersion
It is defined as an absolute or relative measure of differences of the values of the various
items from a measure of central tendency of these items.
The difference between the value of an item and a measure of central tendency is called
a deviation.
An average of the deviations of the values of various items from a measure of central
tendency is called a measure of dispersion.
The different measures of dispersion are range, quartile deviation, decile range,
standard deviation and mean deviation.
Range
It is defined as the difference between the maximum value and the minimum value of the
given series of data.
Quartile Deviation
The given data (in ascending order) is divided into four equal parts by the quartiles.
Q1: first quartile or lower quartile
Q2: second quartile or middle quartile or median
Q3: third quartile or upper quartile
Quartile deviation = (Q3 - Q1) / 2
Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
Decile Range
The given data (in ascending order) is divided into 10 equal parts by the deciles D1, D2, ..., D9.
Decile range = D9 - D1
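The quartile and decile measures can be sketched with the position rule used for raw data in these notes: Q1 is the item at the (n+1)/4th position, Q3 at the 3(n+1)/4th position, and, by the same pattern, Dk at the k(n+1)/10th position. The data are hypothetical, and the linear interpolation for fractional positions is an assumption, since the notes do not say how such positions are handled.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based position in sorted data.

    Fractional positions are linearly interpolated between neighbours
    (an assumption; the notes do not specify this case).
    """
    lower = int(position)              # whole part of the position
    fraction = position - lower
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

# Hypothetical data, 19 items, already in ascending order.
data = [12, 15, 17, 18, 20, 21, 23, 24, 26, 27,
        29, 30, 32, 35, 36, 38, 41, 44, 50]
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)        # 5th item
q3 = value_at_position(data, 3 * (n + 1) / 4)    # 15th item
d1 = value_at_position(data, (n + 1) / 10)       # 2nd item
d9 = value_at_position(data, 9 * (n + 1) / 10)   # 18th item

quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)
decile_range = d9 - d1

print(q1, q3, quartile_deviation, coefficient_of_qd, decile_range)
```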
Mean Deviation from mean, median or mode
It is the arithmetic mean of the absolute deviations of the various items from a measure
of central tendency ( mean, median or mode)
Standard deviation (SD)
SD is defined as the square root of the arithmetic mean of the squared deviations of the
various items from arithmetic mean.
Variance is defined as the arithmetic mean of the squared deviations of the various items
from arithmetic mean.
Relation between SD and variance:
SD = square root of variance

Coefficient of variation (cv)
The relative measure of standard deviation is called the coefficient of variation.
It is used to study the variability or consistency of the data.
More cv => less consistent
Less cv => more consistent
cv = (SD / mean) * 100
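The sketch below computes the variance, SD and cv as defined above, dividing by the number of items (the `pvariance` and `pstdev` functions of Python's `statistics` module do this). Both series are hypothetical.

```python
import statistics as st

def coefficient_of_variation(values):
    """cv = (SD / mean) * 100, with SD computed by dividing by n."""
    return st.pstdev(values) / st.mean(values) * 100

# Two hypothetical series to compare for consistency.
series_a = [2, 4, 4, 4, 5, 5, 7, 9]
series_b = [48, 50, 50, 52]

variance_a = st.pvariance(series_a)   # mean of the squared deviations
sd_a = st.pstdev(series_a)            # square root of the variance
cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)

print(variance_a, sd_a, cv_a, cv_b)
print("more consistent:", "A" if cv_a < cv_b else "B")
```

Series B has the smaller cv, so it is the more consistent of the two.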

Chapter 11
Skewness and Kurtosis
Skewness
Skewness: a measure of the lack of symmetry in the shape of the frequency curve.
Coefficient of skewness = (Mean - Mode) / SD   (Karl Pearson's)
(or)
= (Q3 + Q1 - 2 Median) / (Q3 - Q1)   (Bowley's)
(or)
$\beta_1 = \mu_3^2 / \mu_2^3$   (method of moments), with $\gamma_1 = \pm\sqrt{\beta_1}$ taking the sign of $\mu_3$
$\gamma_1 < 0$ - negatively skewed
$\gamma_1 > 0$ - positively skewed
$\gamma_1 = 0$ - symmetric
(i) If mean = median = mode, symmetrical distribution
(ii) If mean > median > mode, positively skewed distribution
(iii) If mean < median < mode, negatively skewed distribution
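The three coefficients of skewness (Karl Pearson's, Bowley's and the method of moments) can be sketched as follows. The moment version assumes the standard definitions, with $\mu_r$ the rth central moment, $\beta_1 = \mu_3^2 / \mu_2^3$ and $\gamma_1 = \mu_3 / \mu_2^{3/2}$. The data and the quartile values passed to Bowley's formula are hypothetical.

```python
import statistics as st

def central_moment(values, order):
    """rth central moment: mean of (x - mean)**r, dividing by n."""
    mean = st.mean(values)
    return sum((x - mean) ** order for x in values) / len(values)

def pearson_skewness(mean, mode, sd):
    return (mean - mode) / sd

def bowley_skewness(q1, median, q3):
    return (q3 + q1 - 2 * median) / (q3 - q1)

def moment_skewness(values):
    """Return (beta1, gamma1); gamma1 carries the sign of the third moment."""
    mu2 = central_moment(values, 2)
    mu3 = central_moment(values, 3)
    return mu3 ** 2 / mu2 ** 3, mu3 / mu2 ** 1.5

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

sk_pearson = pearson_skewness(st.mean(data), st.mode(data), st.pstdev(data))
sk_bowley = bowley_skewness(q1=20, median=27, q3=36)   # hypothetical quartiles
beta1, gamma1 = moment_skewness(data)

print(sk_pearson, sk_bowley, beta1, gamma1)
```

For this sample both Pearson's coefficient and $\gamma_1$ are positive, indicating positive skewness.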

Kurtosis
The degree of peakedness of a frequency polygon.
$\beta_2 = \mu_4 / \mu_2^2$
Types of Kurtosis
(i) if $\beta_2 > 3$, leptokurtic
(ii) if $\beta_2 < 3$, platykurtic
(iii) if $\beta_2 = 3$, mesokurtic
[Figure: frequency curves A (mesokurtic), B (platykurtic) and C (leptokurtic)]
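A minimal sketch of the kurtosis measure, assuming the standard moment definition $\beta_2 = \mu_4 / \mu_2^2$ with central moments computed by dividing by n. The sample and the function names are illustrative.

```python
import math

def central_moment(values, order):
    """rth central moment: mean of (x - mean)**r, dividing by n."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def beta2(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def classify_kurtosis(b2, tolerance=1e-9):
    """Compare beta2 with 3, the value for the normal (mesokurtic) curve."""
    if math.isclose(b2, 3, abs_tol=tolerance):
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
b2 = beta2(data)
print(b2, classify_kurtosis(b2))
```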
STEM-AND-LEAF-DIAGRAM
A simple technique to visualize the nature of the population using the data from a
sample of that population is the stem-and-leaf-diagram. It is one of the exploratory data
analysis (EDA) tools, which can be constructed easily and quickly. A stem-and-leaf-diagram
is constructed as a series of horizontal rows of numbers. The first number of
each row is the label of that row and is called the stem. The remaining numbers in a row
following the stem number are called the leaves.
Construction of a Stem-and-Leaf-Diagram
Step 1: Not less than five numbers are chosen from the given data as stems. Usually the
first one or two digits of the numbers in the given data are chosen as the stems.
Step 2: The rows are labelled using the stem numbers.
Step 3: If the first one or two digits do not provide sufficient number of stems to
visualize the shape of the distribution, each stem may be used twice. The first of the twin
stems is to enter the lower levels such as 0,1,2,3 & 4 and the second one for the higher
levels viz., 5,6,7,8 & 9.
Step 4: Scanning the data, the digits following the stem number are reproduced as a leaf
on the appropriate stem.
Step 5: The diagram is turned on its side to visualize how the numbers are distributed.
Specifically the following aspects are considered:
Whether there is any tendency for the leaves to cluster close to a particular stem or
stems.
Whether there is any tendency for the data to taper towards one end or the other.
Whether a smooth curve drawn across the top of the diagram forms a rough bell shaped
curve. If so, whether the curve is symmetric, flat or peaked.
Step 6: The observations of the stem-and-leaf-diagram with reference to the above
aspects would throw light on the nature of the population, such as its pattern, symmetry
etc.
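The construction steps above can be sketched for two-digit whole numbers, taking the tens digit as the stem and the units digit as the leaf. The data are hypothetical, and twin stems (Step 3) are not handled in this minimal version.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit whole numbers into rows: stem = tens digit, leaf = units digit."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return dict(rows)

# Hypothetical sample giving five stems (1 to 5).
data = [23, 12, 35, 41, 27, 15, 38, 52, 21, 34, 46, 23]
diagram = stem_and_leaf(data)

# One row per stem, with the leaves listed after the stem label.
for stem in sorted(diagram):
    print(f"{stem} | {' '.join(str(leaf) for leaf in diagram[stem])}")
```

Reading the printed rows sideways shows where the leaves cluster and whether the data taper towards one end.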
BOX PLOT
The box plot is a diagrammatic representation of data series to give visual information
about measures of central tendency, dispersion and direction of skewness.

Chapter 12
Inferential Statistics
The aim is to reach decisions about a large body of data by examining only a small part of it.
[Inferential Statistics: A decision, estimate, prediction, or generalization about a
population, based on a sample.]
Any descriptive measure of a sample is called a sample statistic; the corresponding
measure of a population is called a population parameter.
Inferential Statistics includes Hypothesis testing and Tests of significance.

Chapter 13
Probability
Probability Basic Definitions
A random experiment is an experiment whose outcomes cannot be predicted in advance.
Performing a random experiment is known as a trial.
Each possible outcome of an experiment is called an event.
All possible events of a trial are known as Exhaustive Events.
A sample space of a random experiment is the collection of all possible outcomes.
A group of events is said to be Equiprobable Events if there is an equal chance for each
event in the group to occur.
Two or more events are said to be Mutually Exclusive if the occurrence of one event
excludes the occurrence of the others.
Two or more events are said to be Independent if the occurrence of any one of them
does not affect the occurrence of the others.
Simple choice, Permutation and Combination
If one operation can be performed in m ways and a second operation can be performed in
n ways, then the two operations can be performed together in m x n ways.
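A short illustration of the counting rule and of the difference between arrangement and selection, using `math.perm` and `math.comb` (available from Python 3.8). The items are hypothetical.

```python
import math
from itertools import product

# Simple choice: 3 ways for the first operation, 4 ways for the second.
first_operation = ["a", "b", "c"]
second_operation = [1, 2, 3, 4]
ways_together = len(list(product(first_operation, second_operation)))  # m x n

# Permutation: ordered arrangements of 2 items chosen from 5.
arrangements = math.perm(5, 2)

# Combination: unordered selections of 2 items chosen from 5.
selections = math.comb(5, 2)

print(ways_together, arrangements, selections)
```

Each selection of 2 items can be arranged in 2 ways, which is why there are twice as many arrangements as selections here.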
Permutation means arrangement
Combination means selection
Types of Probability
We can distinguish two types of probability for the purpose of computation of the
value of probability of occurrence of an event.
Apriori probability (mathematical probability)
Aposteriori probability (statistical probability)
Apriori Probability:
The probability is computed based on established facts.
If an event can happen in a ways and fail to happen in b ways, and all these ways
are mutually exclusive and equiprobable, then the probability of occurrence of the event
is p = a / (a + b) and the probability of failure of the event is q = b / (a + b). It can
also be described as
Apriori probability (p) = Number of favourable cases / Total number of
possible cases
Aposteriori Probability:
Aposteriori probability (statistical or empirical probability) is the ratio of the
number of occurrences of an event to the total number of trials.
P = Number of times the event occurred / Total number of trials
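The two types of probability can be contrasted with a die: the apriori value comes from counting equiprobable cases, while the aposteriori value comes from repeated trials. The sketch below simulates the trials with a seeded random number generator; the event (an even number on a fair die) and the number of trials are hypothetical choices.

```python
import random
from fractions import Fraction

# Apriori: 3 favourable cases (2, 4, 6) out of 6 equiprobable cases.
a_priori = Fraction(3, 6)

# Aposteriori: relative frequency of the event over many simulated trials.
rng = random.Random(0)      # fixed seed so that the run is repeatable
trials = 10_000
occurrences = sum(1 for _ in range(trials) if rng.randint(1, 6) % 2 == 0)
a_posteriori = occurrences / trials

print(a_priori, a_posteriori)
```

With a large number of trials the empirical ratio is expected to lie close to the apriori value.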
Theorems or rules of probability
Addition Rule: If A and B are Mutually exclusive events, then
P(A or B)= P(A)+P(B)
Product Rule: If A and B are independent then P(A and B)= P(A). P(B)
Conditional probability: If A and B are dependent events then
(i) P(A/B)= P(A and B) / P(B)
(ii) P(B/A)= P(A and B) / P(A)
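The three rules can be verified by direct enumeration of a small sample space. The sketch below uses the 36 equiprobable outcomes of rolling two dice; the chosen events are hypothetical examples.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Favourable cases / total cases, for an event given as a predicate."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

# Addition rule: "first die is 1" and "first die is 2" are mutually exclusive.
p_a = prob(lambda o: o[0] == 1)
p_b = prob(lambda o: o[0] == 2)
p_a_or_b = prob(lambda o: o[0] == 1 or o[0] == 2)

# Product rule: "first die even" and "second die above 4" are independent.
p_c = prob(lambda o: o[0] % 2 == 0)
p_d = prob(lambda o: o[1] > 4)
p_c_and_d = prob(lambda o: o[0] % 2 == 0 and o[1] > 4)

# Conditional probability: P(sum is 8 | first die is 6).
p_f = prob(lambda o: o[0] == 6)
p_e_and_f = prob(lambda o: sum(o) == 8 and o[0] == 6)
p_e_given_f = p_e_and_f / p_f

print(p_a_or_b, p_c_and_d, p_e_given_f)
```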
Application of Principles of Probability to biological problems

Problem 1: Phenylketonuria (PKU) is inherited as a simple autosomal recessive trait. (i)
What is the probability that 2 normal persons will produce a PKU child if we know that
both sets of grandparents are carriers? (ii) Suppose the couple has 3 children. What is the
probability that at least one of the 3 children will be afflicted with PKU?
Problem 2: A man and a lady are getting married.
The man has a brother who is afflicted with a sort of mental retardation that is inherited
as a simple recessive trait. The man and his parents are all normal. The lady is normal
and no one in her family has this form of mental abnormality. (i) What is the probability
that this couple will have a mentally afflicted child? The probability that any individual,
picked at random from the population, is heterozygous carrier of this mental retardation
gene is 1/100. (ii) Suppose the man also has no family history for this abnormality, what
is the probability of this couple having an afflicted child?
Problem 3: A woman has a haemophilic brother. She is married to a normal man. Her
parents were normal. What is the probability that any son born to her will be
haemophilic?
Problem 4: A dihybrid cross between AaBb and AaBb, where A and B are dominant,
produces 3 offspring. What is the probability that at least two of the three will be of the
genotype aabb?
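One possible way to set up Problem 4 and Problem 1 (i) is sketched below with exact fractions. It rests on assumptions that the problems do not state outright: the two gene pairs in Problem 4 assort independently, each offspring is an independent trial, and in Problem 1 each normal parent born to two carriers has a 2/3 chance of being a carrier. The helper name `binomial_at_least` is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_at_least(k_min, n, p):
    """P(at least k_min successes in n independent trials with success chance p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

# Problem 4: AaBb x AaBb gives aa with chance 1/4 and bb with chance 1/4.
p_aabb = Fraction(1, 4) * Fraction(1, 4)          # product rule, assumed independent
p_problem4 = binomial_at_least(2, 3, p_aabb)      # at least two aabb among three

# Problem 1 (i): both parents must be carriers, then the child must be recessive.
p_parent_carrier = Fraction(2, 3)                 # normal child of Aa x Aa parents
p_problem1_i = p_parent_carrier * p_parent_carrier * Fraction(1, 4)

print(p_problem4, p_problem1_i)
```

Under these assumptions the sketch gives 23/2048 for Problem 4 and 1/9 for Problem 1 (i).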

Venn Diagrams
A diagrammatic representation of a sample space enclosing all possible events
associated with an experiment is the Venn diagram. (J.Venn)

Chapter 14
Theoretical Probability Distributions
Binomial
Poisson
Normal

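Summary of formulae for individual observations (raw data), discrete series and continuous series
(n = number of individual observations; $N = \sum f_i$ = total frequency; $x_{mid}$ = mid-value of a class)

AM
Individual observations: $\bar{x} = \frac{\sum x_i}{n}$
Discrete: $\bar{x} = \frac{\sum f_i x_i}{N}$
Continuous: $\bar{x} = \frac{\sum f_i x_{mid,i}}{N}$

GM
Individual observations: $\text{Antilog}\left(\frac{\sum \log x_i}{n}\right)$
Discrete: $\text{Antilog}\left(\frac{\sum f_i \log x_i}{N}\right)$
Continuous: $\text{Antilog}\left(\frac{\sum f_i \log x_{mid,i}}{N}\right)$

HM
Individual observations: $\frac{n}{\sum (1/x_i)}$
Discrete: $\frac{N}{\sum (f_i/x_i)}$
Continuous: $\frac{N}{\sum (f_i/x_{mid,i})}$

Median
Individual observations: size of the $\frac{n+1}{2}$th position
Discrete: size of the $\frac{N+1}{2}$th position; the value corresponding to the next higher cumulative frequency
Continuous: size of the $\frac{N+1}{2}$th position locates the median class; Median $= L + \frac{N/2 - cf}{f} \times h$
(L = lower limit of the median class, cf = cumulative frequency of the preceding class, f = frequency of the median class, h = class width)

Mode
Individual observations: the value repeated the maximum number of times
Discrete: the value corresponding to the maximum frequency
Continuous: the modal class corresponds to the maximum frequency; Mode $= L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$
(L = lower limit of the modal class, $f_1$ = frequency of the modal class, $f_0$ and $f_2$ = frequencies of the preceding and succeeding classes, h = class width)

Range
Individual observations: Highest value - Lowest value
Discrete: Highest value - Lowest value
Continuous: Highest value - Lowest value

Q1
Individual observations: size of the $\frac{n+1}{4}$th position
Discrete: size of the $\frac{N+1}{4}$th position
Continuous: size of the $\frac{N+1}{4}$th position

Q3
Individual observations: size of the $3\left(\frac{n+1}{4}\right)$th position
Discrete: size of the $3\left(\frac{N+1}{4}\right)$th position
Continuous: size of the $3\left(\frac{N+1}{4}\right)$th position

Inter Quartile range
Individual observations: $Q_3 - Q_1$
Discrete: $Q_3 - Q_1$
Continuous: $Q_3 - Q_1$

QD
Individual observations: $\frac{Q_3 - Q_1}{2}$
Discrete: $\frac{Q_3 - Q_1}{2}$
Continuous: $\frac{Q_3 - Q_1}{2}$

MD about 3 measures (A = mean (or) median (or) mode)
Individual observations: $\frac{\sum |x_i - A|}{n}$
Discrete: $\frac{\sum f_i |x_i - A|}{N}$
Continuous: $\frac{\sum f_i |x_{mid,i} - A|}{N}$

SD
Individual observations: $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$
Discrete: $\sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$
Continuous: $\sqrt{\frac{\sum f_i (x_{mid,i} - \bar{x})^2}{N}}$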
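The continuous-series formulae for the mean, median and mode can be sketched for a small grouped frequency table. The class intervals and frequencies are hypothetical, and the median and mode use the usual interpolation formulae, Median = L + ((N/2 - cf) / f) x h and Mode = L + ((f1 - f0) / (2 f1 - f0 - f2)) x h, which is an assumption about the intended form of the table entries.

```python
# Hypothetical continuous series: (lower limit, upper limit, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]

def grouped_mean(classes):
    """AM = sum(f * x_mid) / N, where x_mid is the class mid-value."""
    total = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total

def grouped_median(classes):
    """Median = L + ((N/2 - cf) / f) * h for the median class."""
    total = sum(f for _, _, f in classes)
    cumulative = 0                      # cf of the classes before the current one
    for lo, hi, f in classes:
        if cumulative + f >= total / 2:
            return lo + (total / 2 - cumulative) / f * (hi - lo)
        cumulative += f

def grouped_mode(classes):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for the modal class."""
    i = max(range(len(classes)), key=lambda k: classes[k][2])
    lo, hi, f1 = classes[i]
    f0 = classes[i - 1][2] if i > 0 else 0
    f2 = classes[i + 1][2] if i < len(classes) - 1 else 0
    return lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

mean_value = grouped_mean(classes)
median_value = grouped_median(classes)
mode_value = grouped_mode(classes)
print(mean_value, median_value, mode_value)
```

For this table the median class and the modal class are both 20-30, so L = 20 and h = 10 in both formulae.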
