“The essence of mathematics is not to make simple things complicated, but to make complicated things simple.”
PRINCIPLE OF
BIOSTATISTICS AND ITS
APPLICATION IN
DENTISTRY
Contents
• Introduction
• Basic terminologies
• Principles of biostatistics and its application in dentistry
o Collection of the data
o Organization of the data
o Analysis of the data
o Presentation of the data
o Interpretation of the data
• Statistical packages
• Conclusions

Introduction
• The word statistics is derived from the Latin word status.
• Today, statistics refers to the interpretation of quantitative or qualitative information in a numerical manner.
• Statistics may be defined as the discipline concerned with the treatment of numerical data derived from groups of individuals.
Biostatistics (Biometrics)
• Biometry: Bios (life) + Metron (measurement), i.e., measurement of life.
• Biostatistics is the method of collecting, organizing, analyzing, tabulating and interpreting data related to living organisms and human beings.
“When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.”
- LORD KELVIN (Mathematician & Physicist, Glasgow University)
Basic terminologies
• Variable: any characteristic of an object that can be measured or categorized. Variables are of two types:
o Qualitative: characteristics of people or objects that cannot naturally be expressed as a numeric value.
Ex: Sex (male, female); orthodontic facial type (brachycephalic / dolichocephalic / mesocephalic); level of oral hygiene (Good/Fair/Poor)
o Quantitative: characteristics of people or objects that can naturally be expressed as a numeric value.
Ex: Age, height, blood pressure, attachment level, survival time of implants, fluoride concentration of water, bone loss caused by periodontitis
• Random variable:
o A variable that can assume a number of different values, such that any particular value is obtained purely by chance.
o Types: discrete variable and continuous variable.
• Discrete variable:
o A random variable that can take on a finite number of values or a countably infinite number of values (as many as there are whole numbers).
• Ex:
o The number of DMF teeth: it can be any one of the 33 numbers 0, 1, 2, 3 … 32.
o Size of the family
• Continuous variable:
o A random variable that can take any value in a range on a continuum, i.e., its range is uncountably infinite.
• Ex:
o Amount of new bone growth
o Force required to extract teeth
o Amount of blood loss in surgical procedures
Levels of Measurement
• Nominal: the categories are unordered; they have no quantitative relationships and cannot be quantified.
• Ordinal: values are categories that can be ordered or ranked.
• Interval: values can be ordered, and precise differences between units of measure exist; there is no meaningful absolute zero.
• Ratio: possesses the same properties as the interval scale, and there exists a true zero.
DATA:
• The set of values of one or more variables recorded on one or more individuals.
- OR -
• A set of values recorded on one or more observational units.
Types of Data
• Quantitative and qualitative
• Primary and secondary
Basic terminology
• Quantitative data:
o Data that can be quantified, i.e., the characteristic under consideration can be expressed as a numeric value.
o Ex: Plaque score, incisor width, height, weight, pulse rate, etc.
• Qualitative data:
o Data that cannot be quantified, i.e., the characteristic under consideration cannot be expressed as a numeric value.
o Ex: Beauty, oral hygiene status (Good, Fair, Poor), etc.
• Primary data:
o Data collected afresh and for the first time, and thus original in character.
• Secondary data:
o Data that have already been collected by someone else and have already been passed through the statistical process.
Principles of Biostatistics
Collection of the data → Organization of the data → Analysis of the data → Presentation of the data → Interpretation of the data
Population
• In statistics, population means the totality of the individual observations about which inferences are to be made.
• Populations can be finite or infinite.
• Samples of varied size can be drawn carefully, with appropriate procedures, from populations that are either finite or infinite.
Sample
• It is a part of the population.
• It is a small collection of observations from some larger aggregate about which we want to have information.
• Samples drawn should be representative of the population.
• Other things being equal, the larger the sample, the better the degree of representation of the sample selected.
Sampling
• Samples can be drawn from the entire population through various procedures.
• Sampling can be:
o Probability sampling
o Non-probability sampling
• Probability sampling:
o Simple random sampling
o Systematic sampling
o Stratified random sampling
o Cluster sampling
o Multistage sampling
o Multiphase sampling
Simple Random Sampling (UNRESTRICTED RANDOM SAMPLING)
• Applicable when the population is small, homogeneous and readily available.
• Used mainly in experimental medicine or clinical trials to check the efficacy of a particular drug.
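A minimal Python sketch of simple random sampling, assuming a hypothetical sampling frame of 100 patient IDs (the function name `simple_random_sample` and the IDs are illustrative, not from the source). `random.Random.sample` draws without replacement, so every patient has the same chance of being selected:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit has the same chance of selection."""
    rng = random.Random(seed)   # seeded generator so the draw can be reproduced
    return rng.sample(population, n)

# Hypothetical sampling frame: 100 patient IDs
patients = [f"P{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(patients, 10, seed=42)
print(sample)
```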
Systematic sampling
• A simple procedure.
• Utilized when a complete list of the population from which the sample is to be drawn is available.
• A systematic procedure is followed to choose the sample by taking every kth house or patient, where k is the sampling interval:
k = total population / desired sample size


Merits of systematic sampling
1. The procedure is simple and convenient to use.
2. The time and labor needed are relatively small.
3. If the population is sufficiently large and homogeneous, and a numbering of the subjects is available, this method can provide good results.
• An element of randomness is introduced into this kind of sampling by randomly selecting, from the first k units, the unit with which to start: the RANDOM START.
• A sample so chosen is sometimes called an “every kth systematic sample”.
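The every-kth rule with a random start can be sketched as follows, assuming a hypothetical numbered list of 500 houses and a desired sample of 50 (so k = 500 / 50 = 10); `systematic_sample` is an illustrative name:

```python
import random

def systematic_sample(population, n, seed=None):
    """Every k-th unit after a random start, where k = N // n is the sampling interval."""
    k = len(population) // n          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start among the first k units
    return population[start::k][:n]   # every k-th unit from the start, capped at n

houses = list(range(1, 501))          # hypothetical list of 500 numbered houses
sample = systematic_sample(houses, 50, seed=1)
print(sample[:5])
```

When N is not an exact multiple of n, the floor division keeps k an integer and the slice is trimmed to n units.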
Stratified random sampling
• Followed when the population is not homogeneous.
• The population under study is first divided into homogeneous groups called strata, and the sample is drawn from each stratum at random in proportion to its size.
• Gives a more representative sample than simple random sampling in a given large population.
Merits of stratified random sampling
• Better representation of each stratum
• Greater accuracy compared to simple random sampling
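A sketch of proportional allocation, assuming a hypothetical population of 1000 patients split into three age strata (the strata, sizes and function name are illustrative). Each stratum contributes n × (stratum size / N) units, drawn by simple random sampling within the stratum:

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional allocation, then simple random sampling inside each stratum.
    Rounding means the stratum shares may not add up exactly to n in general."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)      # share proportional to stratum size
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical population of 1000 patients stratified by age group
strata = {
    "children": [f"C{i}" for i in range(200)],
    "adults": [f"A{i}" for i in range(500)],
    "elderly": [f"E{i}" for i in range(300)],
}
sample = stratified_sample(strata, 100, seed=7)
print({name: len(units) for name, units in sample.items()})
# {'children': 20, 'adults': 50, 'elderly': 30}
```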
Cluster (Area) sampling
• A cluster is a group consisting of units such as villages, wards, blocks, factories, workshops, etc.
• Simple random sampling or systematic sampling procedure is utilized for the selection of clusters.
• After the random selection of clusters, enumeration of the individuals in the selected clusters is carried out.
Merits:
1. Simple and time saving
Demerits:
1. Costlier
2. Provides figures with higher standard errors than other procedures
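A minimal sketch of the two steps described above (random choice of whole clusters, then enumeration of everyone inside them), assuming hypothetical villages as clusters; `cluster_sample` and the data are illustrative:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random, then enumerate every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # simple random sample of cluster names
    return {name: clusters[name] for name in chosen}    # all members of each chosen cluster

# Hypothetical villages (clusters) and their residents
villages = {
    "V1": ["a", "b", "c"],
    "V2": ["d", "e"],
    "V3": ["f", "g", "h", "i"],
    "V4": ["j"],
    "V5": ["k", "l"],
}
selected = cluster_sample(villages, 2, seed=3)
for village, residents in selected.items():
    print(village, residents)
```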
Multistage sampling
• Refers to sampling procedures carried out in several stages using random sampling techniques.
• Employed in large-scale, country-wide or region-wide surveys.
• Stage-wise sampling procedures are utilized for the selection of households or subjects.
Multiphase sampling
• Here part of the information is collected from the whole sample and part from a subsample.
• Numbers in the 2nd and 3rd phases become successively smaller.
• Merits: less costly, less laborious, more purposeful.
• Ex: To know the number of respondents wanting different treatments in an outreach programme (N = 2000):
o Phase 1: sample population, n = 500
o Phase 2: how many need treatment, n = 300; referred to college, 200
o Phase 3: treatment types: conservative n = 100, OP + OHI n = 150, extraction n = 50
03/12/2020 33
Non-probability sampling
• Convenience sampling – sampling those most convenient
• Voluntary sampling – the sample is self-selected
• Quota sampling – convenience sampling within groups of the population
• Purposive sampling – handpicking supposedly typical or interesting cases
• Dimensional sampling – multidimensional quota sampling
• Snowball sampling – building up a sample through informants
Why non-probability sampling plays a role
• Non-probability sampling approaches are used when the researcher lacks a sampling frame for the population in question, or where a probabilistic approach is not judged to be necessary.
• Ex:
o If we are carrying out a series of in-depth interviews with adults about their working experiences, we may be content to restrict ourselves to suitable friends or colleagues.
o We may be studying an issue which is relatively sensitive, such as sexual orientation in the armed forces, and have to build up a sample confidentially through known and trusted contacts.
o Market researchers commonly use a quota sampling approach, with targets for the numbers they have to interview with different socio-demographic characteristics.
Other kinds of sampling
• Event sampling – using routine or special events as the
basis for sampling
• Time sampling – recognizing that different parts of the
day, week or year may be significant
COLLECTING THE DATA
Collection of the data
• Can be of two types:
o Primary data
o Secondary data
Primary data Collection
• Observation method
• Interview method
• Through questionnaire
• Through schedules

• Other methods
o Warranty cards
o Distribution audits
o Content analysis etc
Observation methods
• This method is mainly used in studies relating to behavioral sciences.
• This method becomes a scientific tool when the collection of data serves a formulated research purpose.
Observation Methods
Advantages:
• Subjective bias is eliminated.
• Information obtained by this method relates to what is currently happening.
• Independent of the respondent's willingness to respond.
Disadvantages:
• Expensive method.
• Information provided by this method is limited.
Interview method
• Involves collecting data through oral-verbal stimuli and replies in terms of oral-verbal responses.
• Types: personal interview and telephone interview.
Personal interview
• Structured: the interviewee is presented with exactly the same questions in the same order.
• Unstructured: questions can be changed or adapted to meet the respondent's intelligence, understanding or belief.
Personal interview
ADVANTAGES
• Easy
• More information obtained
• The interviewer, by his own skills, can overcome the resistance
DISADVANTAGES
• Expensive method
• More time consuming
Telephone interview
• Merits :
o Faster than other methods

o Cheaper than personal interviewing method

o Recall is easy

o Non response is very low compared to mailing method.

o Replies can be recorded without causing

embarrassment to respondents.

o No field staff is required


Demerits of telephone interview
1. Little time is given to respondents for considered answers.
2. Surveys are restricted to respondents who have telephone facilities.
3. Extensive geographical coverage may get restricted by cost considerations.
4. It is not suitable for intensive surveys where comprehensive answers are required to various questions.
5. Possibility of bias of the interviewer is relatively more.
6. Questions have to be short and to the point.
Questionnaire methods
o A predetermined set of questions.
o A questionnaire consists of a set of questions printed or typed in a definite order on a form or set of forms.
o Adopted by private individuals, research workers, private and public organizations and even by governments.
o Questions may be closed-ended (MCQs) or open-ended.
Merits of questionnaire method
• Low cost
• Free from interviewer bias
• Adequate time to respondents
• Respondents can be reached conveniently
• Large samples can be covered
Demerits
• Low rate of response
• Education and co-operation of respondents needed
• Control over the questionnaire is lost
• Inbuilt inflexibility
• Ambiguous replies, omission of replies
• Difficult to know whether truly representative
• Slowest method
Schedules
• Generally filled out by enumerators who are
specially appointed
• So non-response is low
• Information collected is usually complete & accurate
• Ex: Population census
Differences between schedule and questionnaire
Feature | Schedule | Questionnaire
Cost | High | Low
Response rate | Higher | Lower
Completion of questionnaire | High | Low
Complexity of questions | Can be high | Should be minimized
Interviewer bias | May be present | Not relevant
Interviewer variability | May be present | Not relevant
Total study duration | Considerably fast | Slow

Other methods
WARRANTY CARDS:
 Used by dealers of consumer durables to collect information
regarding their products.

DISTRIBUTOR AND STORE AUDITS:


 Performed by distributors and manufacturers through their
salesmen at regular intervals.
 Used to get information to estimate the market size, market
share, seasonal purchasing pattern etc.
Secondary data
Published data:
1. Various publications of central, state or local governments.
2. Various publications of foreign governments or of international bodies and their subsidiary organizations.
3. Technical and trade journals.
4. Books, magazines and newspapers.
5. Reports and publications of various associations.
6. Reports prepared by research scholars, universities.
7. Public records and statistics, historical documents and other sources of information.
Unpublished data:
1. Diaries, letters.
2. Unpublished biographies and autobiographies.
3. Data that may be available with scholars, research workers, trade associations, labor bureaus and other public or private individuals or organizations.
ORGANISING THE DATA
Organizing the Data
• Data in the original form in which they are collected are called “raw data”.
• Organizing the data involves:
o Coding the data
o Editing the data
o Frequency distribution
• Coding the data:
o Translation of information into numerical form, e.g., converting qualitative data into numerical codes.
o Ex: sex (male/female) coded as Male = 1, Female = 2.
• Editing the data:
o Examining the raw data for errors and omissions and correcting them.
o The data are then condensed into manageable groups and tables that are amenable to further analysis.
• Frequency distribution:
o The researcher organizes the raw data by using a frequency distribution.
o The frequency is the number of values in a specific class of the data.
o A frequency distribution is the organization of raw data in table form, using classes and frequencies.
• For the first data set, a frequency distribution is shown as follows:
Class limits | Frequency
1-3 | 10
4-6 | 14
7-9 | 10
10-12 | 6
13-15 | 5
16-18 | 5
• In summary, organization of the data:
o Gives a meaningful, intelligent way of listing the data
o Enables the reader to make comparisons among classes
o Enables the reader to have a crude impression of the shape of the distribution
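A sketch of how raw values can be grouped into classes of equal width, assuming a hypothetical set of 15 raw scores and classes of width 3 starting at 1 (as in the table above); `frequency_distribution` is an illustrative name. Empty classes are simply not listed:

```python
def frequency_distribution(data, width, lowest):
    """Count how many values fall in each class of the given width, starting at `lowest`."""
    counts = {}
    for x in data:
        lower = lowest + ((x - lowest) // width) * width   # lower limit of x's class
        label = f"{lower}-{lower + width - 1}"
        counts[label] = counts.get(label, 0) + 1
    # order the classes by their lower limit
    return dict(sorted(counts.items(), key=lambda item: int(item[0].split("-")[0])))

raw = [2, 5, 7, 1, 4, 4, 9, 12, 15, 3, 6, 18, 11, 8, 5]   # hypothetical raw scores
for cls, f in frequency_distribution(raw, width=3, lowest=1).items():
    print(cls, f)
```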
Analyzing the Data
• Classification and tabulation of data are helpful in reducing and understanding the bulk of a large mass of data.
• But they are descriptive in nature.
• So there is a need to find a constant that is representative of a group of data.
• Such numerical constants are:
o Measures of central tendency
o Measures of dispersion (variation)
o Measures of shape
Measures of central tendency
• After collection, the data have to be summarized by means of a couple of numbers that are descriptive of the entire data.
• The statistical measures that describe such characteristics as the center or middle of the data are called measures of central tendency or measures of location.
Objectives of measures of central tendency
• To condense the entire mass of data
• To facilitate comparison
Characteristics of an ideal measure of central tendency
1) It should be easy to understand and compute.
2) It should be based on each and every item in the series.
3) It should not be affected by extreme observations (either too small or too big).
4) It should be capable of further statistical computations.
5) It should have sampling stability.
Measures of central tendency
• Arithmetic mean
• Weighted mean
• Median
• Mode
• Geometric mean
• Harmonic mean
ARITHMETIC MEAN
• Mean is obtained by summing up all the observations
and dividing the total by the number of observations
Example
• A community dentist selected 7 chronic periodontitis patients and measured their attachment loss in mm.
• His observations were (in mm): 2.5, 3.1, 1.9, 2.0, 2.97, 1.75, 3.7.
• The mean attachment loss of the seven periodontitis patients is given by
$$\bar{x} = \frac{2.5 + 3.1 + 1.9 + 2.0 + 2.97 + 1.75 + 3.7}{7} = \frac{17.92}{7} = 2.56 \text{ mm}$$
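The same arithmetic can be checked in Python; a minimal sketch that computes the mean by hand (sum divided by count) and with the standard library's `statistics.mean`:

```python
from statistics import mean

attachment_loss_mm = [2.5, 3.1, 1.9, 2.0, 2.97, 1.75, 3.7]
x_bar = sum(attachment_loss_mm) / len(attachment_loss_mm)   # sum of observations / number of observations
print(round(x_bar, 2))                       # 2.56
print(round(mean(attachment_loss_mm), 2))    # statistics.mean gives the same value: 2.56
```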
For grouped data, the mean can be calculated using the formula:
$$\bar{x} = \frac{\sum f x}{n}$$
Where,
x = Mid value of the class
f = Frequency of the class
n = Total number of observations ($n = \sum f$)
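As an illustration, the grouped-data formula can be applied to the frequency table shown earlier under Organizing the Data (classes 1-3 to 16-18); this is a sketch, and the variable names are illustrative:

```python
# Frequency table from the Organizing the Data section: (lower limit, upper limit, frequency)
classes = [(1, 3, 10), (4, 6, 14), (7, 9, 10), (10, 12, 6), (13, 15, 5), (16, 18, 5)]

n = sum(f for _, _, f in classes)                          # total observations
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # f times the class mid value x
grouped_mean = sum_fx / n
print(n, sum_fx, grouped_mean)   # 50 391.0 7.82
```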
Merits of the arithmetic mean
1. Rigidly defined; easy to understand
2. Based upon all the observations
3. Amenable to algebraic treatment
Demerits
1. Affected by extreme values
2. If one observation is missing, the mean cannot be calculated
3. Cannot be calculated by inspection
• Calculated for quantitative data measured at the interval/ratio level.

What is the difference between MEAN and AVERAGE?

WEIGHTED MEAN
• In order to properly reflect the relative importance of the observations, it is necessary to assign them weights and then calculate a mean.
• Let $X_1, X_2, \ldots, X_n$ be the measurements and $w_1, w_2, \ldots, w_n$ the weights of the measurements.
• The weights are chosen so that they sum to 1.0, i.e.,
$$\sum_{i=1}^{n} w_i = w_1 + w_2 + \cdots + w_n = 1$$
• Therefore the weighted mean (or weighted average) is given by
$$\bar{X} = w_1 X_1 + w_2 X_2 + \cdots + w_n X_n = \sum_{i=1}^{n} w_i X_i$$
• Ex: Internal assessment given in BDS examinations:
Internal assessment = marks scored in 3 sessional examinations (60%) + record and threshold completion in time (20%) + good behavior and attendance (20%)
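A minimal sketch of the weighted mean with weights that sum to 1, using the 60% / 20% / 20% internal-assessment weights above and hypothetical component scores for one student (`weighted_mean` and the scores are illustrative):

```python
def weighted_mean(values, weights):
    """Sum of w_i * X_i, where the weights are required to add up to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical component scores (out of 100): sessionals, record/completion, behavior/attendance
scores = [70, 90, 80]
weights = [0.6, 0.2, 0.2]
print(weighted_mean(scores, weights))   # about 76 (42 + 18 + 16)
```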
MEDIAN
• When all the observations of a variable are arranged in either ascending or descending order of magnitude, the middle value is the median.
• If n is an odd number,
$$\text{Median} = \text{size of the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item}$$
• The median divides the observations exactly into two halves.
• If n is an even number, the median is the MEAN of the middle two terms.
• E.g.: The erythrocyte sedimentation rate of 8 subjects is 7, 5, 3, 4, 6, 4, 5, 2. Calculate the median.
Here n = 8; arranged in order: 2, 3, 4, 4, 5, 5, 6, 7
• Median = (4 + 5) / 2 = 4.5
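The odd/even rule can be sketched directly and compared with `statistics.median`; `median_manual` is an illustrative name:

```python
from statistics import median

def median_manual(values):
    s = sorted(values)                 # arrange in ascending order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                     # odd n: the ((n + 1) / 2)th item
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # even n: mean of the two middle items

esr = [7, 5, 3, 4, 6, 4, 5, 2]
print(median_manual(esr), median(esr))   # 4.5 4.5
```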
Median
Merits:
• Easy to understand and calculate
• Not affected by extreme values
Demerits:
• It is not based upon all the observations
• It is not amenable to algebraic treatment
MODE
• It is the value of the variable which occurs most frequently in a series of observations.
• E.g.: Find the mode of the respiration rate per minute in 9 cases when the rate was found to be 23, 22, 20, 24, 16, 17, 22, 18, 19.
The value of the mode is 22 (it occurs twice).
• If the mode is ill-defined, it can be estimated (for moderately skewed distributions) from the empirical relation:
Mode = 3 Median − 2 Mean
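A quick check of the respiration-rate example; `collections.Counter` tallies each value, and `statistics.multimode` returns every most frequent value, which shows when the mode is not unique:

```python
from collections import Counter
from statistics import multimode

rates = [23, 22, 20, 24, 16, 17, 22, 18, 19]   # respiration rate per minute, 9 cases
counts = Counter(rates)
print(counts.most_common(1))    # [(22, 2)]: 22 occurs most often
print(multimode(rates))         # [22]; several values would be listed if tied
```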
Mode
Merits:
• It is easy to understand and calculate
• It can be determined graphically also
• It can be calculated from both qualitative and quantitative data
Demerits:
• It is not based on all the observations
• In some cases the mode is ill-defined
• It is affected by fluctuations of sampling
Geometric mean
• The geometric mean is used when the rate of growth is multiplicative rather than additive.
• Ex:
o During a flu epidemic, 80 cases were reported to the county public health department in the first week, 160 cases in the second week, 320 cases in the third week, and 640 cases in the fourth week.
• The geometric mean can be calculated by
$$\bar{X}_G = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} = (X_1 \cdot X_2 \cdots X_n)^{1/n}$$
• Ex:
o The geometric mean of the two values 4 and 16 is $\sqrt{4 \times 16} = 8$.
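Applied to the flu-epidemic counts, a short sketch with `statistics.geometric_mean` (Python 3.8+) shows how the geometric mean differs from the arithmetic mean for a doubling series:

```python
import math
from statistics import geometric_mean

print(math.sqrt(4 * 16))                # 8.0
weekly_cases = [80, 160, 320, 640]      # flu epidemic example: cases double each week
gm = geometric_mean(weekly_cases)       # nth root of the product of the n values
print(round(gm, 1))                     # 226.3
print(round(sum(weekly_cases) / 4, 1))  # arithmetic mean for comparison: 300.0
```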
Harmonic mean
• Example:
o Suppose a dental clinic is 10 miles away from Raj's home. On the way to the clinic the traffic was light and Raj was able to drive at 60 miles per hour. However, on the return trip the traffic was heavy and he drove at 30 miles per hour. In total he travelled 20 miles in 30 min. What was the average speed?
• Arithmetic mean of the speeds = (60 + 30) / 2 = 45 miles/hour
• But according to physics, the actual average speed = total distance / total time = 20 miles / 0.5 hour = 40 mph
• The harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the n observations.
• The harmonic mean is given by
$$\bar{X}_{HM} = \frac{1}{\frac{1}{n}\sum_{i=1}^{n} \frac{1}{X_i}}$$
• The previous example can be calculated by
$$\bar{X}_{HM} = \frac{1}{\frac{1}{2}\left(\frac{1}{60} + \frac{1}{30}\right)} = \frac{1}{\frac{1}{2}\left(\frac{3}{60}\right)} = \frac{120}{3} = 40 \text{ mph}$$
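The speed example can be checked numerically; this sketch computes the harmonic mean by hand and with `statistics.harmonic_mean`, and compares both with distance divided by time:

```python
from statistics import harmonic_mean

speeds_mph = [60, 30]                                   # outward and return trips, 10 miles each
am = sum(speeds_mph) / len(speeds_mph)                  # 45.0 (misleading here)
hm = len(speeds_mph) / sum(1 / v for v in speeds_mph)   # reciprocal of the mean of the reciprocals
print(am, round(hm, 6), round(harmonic_mean(speeds_mph), 6))   # 45.0 40.0 40.0
print(20 / 0.5)                                         # total distance / total time: 40.0
```

The harmonic mean matches the physical average speed because equal distances, not equal times, were driven at each speed.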
• Ex:
o Seven dental and seven medical students are randomly selected for plaque control evaluation.
o From every subject a sample of supragingival plaque is collected and incubated for 24 hrs.
o The bacterial colony growth (BCG) observed is as follows:
o MEDICAL: 30, 150, 250, 280, 310, 410, 530
o DENTAL: 230, 260, 265, 280, 295, 300, 330
o MEAN: medical = dental = 280
• So a measure of central tendency alone is not enough to describe the data.
[Figure: colony counts of dental and medical students plotted around the common mean of 280]
Measures of variability
• Range
• Percentile
• Inter-quartile range
• Standard deviation
• Coefficient of variation
Range
• Distance between the largest and smallest observation
• R = X max – X min
• Ex: from the last example,
o For medical students: R = 530 – 30 = 500
o For dental students: R = 330 – 230 = 100
• Depends only on the 2 extreme values
• Rarely used
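A sketch using the colony counts from the plaque example: both groups share the same mean, but their ranges differ sharply:

```python
from statistics import mean

medical = [30, 150, 250, 280, 310, 410, 530]
dental = [230, 260, 265, 280, 295, 300, 330]

for name, counts in (("medical", medical), ("dental", dental)):
    r = max(counts) - min(counts)   # R = X_max - X_min
    print(name, mean(counts), r)
# medical 280 500
# dental 280 100
```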
Percentile
• A point below which a specified percent of observations lie
• Percentage ≠ Percentile
• If a dental student has scored 82 out of a possible 100 on an entrance exam, he obtained a percentage score of 82.
• But this doesn't explain the exact position of his score relative to the scores obtained by all the dental students who took the same exam.
• On the other hand, if his score of 82 corresponds to the 90th percentile,
o he has performed better than 90% of all the dental students.
Interquartile Range
• In quartiles, the data are divided into 4 equal parts.
• The 25th, 50th and 75th percentiles are known as the 1st, 2nd and 3rd quartiles.
• The interquartile range is the distance between Q1 and Q3:
• IQR = Q3 – Q1
• Better than the range, because it is not affected by the extreme values.
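Quartiles can be computed with `statistics.quantiles`; note that several conventions for quartiles exist, and this sketch uses the library's default `'exclusive'` method, so other software may give slightly different values for the same data:

```python
from statistics import quantiles

medical = [30, 150, 250, 280, 310, 410, 530]
dental = [230, 260, 265, 280, 295, 300, 330]

for name, counts in (("medical", medical), ("dental", dental)):
    q1, q2, q3 = quantiles(counts, n=4)   # Q1, median (Q2) and Q3, default 'exclusive' method
    print(name, q1, q3, q3 - q1)          # IQR = Q3 - Q1
# medical 150.0 410.0 260.0
# dental 260.0 300.0 40.0
```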


Standard Deviation
• It is defined as the square root of the arithmetic mean of the squared deviations of the individual values from their arithmetic mean.
$$\text{S.D.} = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \quad \text{for small samples}$$
$$\text{S.D.} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad \text{for large samples}$$
where $\bar{x}$ = mean of the observations.
In simple words…
• The standard deviation is a measure of how spread out numbers are.
• SD (σ) = √variance
• Variance:
o The average of the squared differences from the mean.
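Example
The heights are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm. What is the standard deviation?
• First step: find the mean
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394 mm
• Second step: calculate each height's difference from the mean: 206, 76, −224, 36, −94
• Third step: calculate the variance, i.e., take each difference, square it, and then average the result:
Variance = (206² + 76² + (−224)² + 36² + (−94)²) / 5 = 108,520 / 5 = 21,704
• Fourth step: calculate the standard deviation
• Standard deviation: σ = √21,704 = 147.32... ≈ 147 mm
• This example divides by n; with the n − 1 divisor used for small samples, the SD would be √(108,520 / 4) = √27,130 ≈ 164.7 mm.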
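The four steps can be reproduced in Python; `statistics.pstdev` divides by n (as in the worked example) and `statistics.stdev` divides by n − 1 (the small-sample formula):

```python
from statistics import mean, pstdev, stdev

heights_mm = [600, 470, 170, 430, 300]
m = mean(heights_mm)                                          # step 1: 394
deviations = [h - m for h in heights_mm]                      # step 2: 206, 76, -224, 36, -94
variance = sum(d * d for d in deviations) / len(heights_mm)   # step 3: divide by n -> 21704.0
sd = variance ** 0.5                                          # step 4: about 147.32
print(m, deviations, variance, round(sd, 2))
print(round(pstdev(heights_mm), 2), round(stdev(heights_mm), 2))   # n vs n - 1 divisor: 147.32 164.71
```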


Coefficient of Variation
• A measure used to compare the variability among 2 or more sets of data representing different quantities with different units of measurement.
$$CV = \frac{SD}{\text{Mean}} \times 100$$
Example of CV
o Fracture strength of 2 types of prefabricated posts, i.e., Carbon Fiber Post (CFP) and Polyethylene Fiber-Reinforced Post (PFRP):
Post | Mean | SD
Carbon Fiber Post (CFP) | 67.57 kg | 26.57 kg
Polyethylene Fiber-Reinforced Post (PFRP) | 132.55 lbs | 36.19 lbs
CV of the fracture load for CFP: $CV_{CFP} = \frac{26.57}{67.57} \times 100 = 39.32\%$
CV of the fracture load for PFRP: $CV_{PFRP} = \frac{36.19}{132.55} \times 100 = 27.30\%$
This indicates that the fracture load of the PFRP post is less dispersed and more precise.
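A short check of the post example; because CV is a ratio of SD to mean, the kg and lbs units cancel and the two posts can be compared directly (`coefficient_of_variation` is an illustrative name):

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean x 100; unit-free, so kg and lbs results can be compared."""
    return sd / mean * 100

cv_cfp = coefficient_of_variation(26.57, 67.57)     # kg
cv_pfrp = coefficient_of_variation(36.19, 132.55)   # lbs
print(f"CFP {cv_cfp:.2f}%  PFRP {cv_pfrp:.2f}%")    # CFP 39.32%  PFRP 27.30%
```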


[Figure: Population vs. samples; sampling distribution]
Standard error
• The standard error is the standard deviation of the sampling distribution of a statistic.
• For the mean, it measures the variability of sample means, i.e., the variation among the mean values of repeated samples.
$$\text{Standard error} = \frac{\text{Standard deviation}}{\sqrt{\text{Sample size}}}$$
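A minimal sketch of SE = SD / √n, assuming the sample SD (n − 1 divisor, `statistics.stdev`) is used; `standard_error` is an illustrative name and the data are the attachment-loss values from the arithmetic mean example:

```python
import math
from statistics import stdev

def standard_error(values):
    """Standard error of the mean = sample SD / sqrt(sample size)."""
    return stdev(values) / math.sqrt(len(values))

# Attachment-loss data from the arithmetic mean example (mm)
attachment_loss_mm = [2.5, 3.1, 1.9, 2.0, 2.97, 1.75, 3.7]
print(round(standard_error(attachment_loss_mm), 3))
```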
Skewness
• Skewness means lack of symmetry.
• Skewness indicates whether the curve is turned more to one side than to the other, i.e., the curve has a longer tail on one side.
• Skewness is said to be ‘positive’ if the curve is more elongated to the right side, i.e., mean > median.
• Skewness is said to be ‘negative’ if the curve is more elongated to the left side, i.e., median > mean.
Kurtosis
• The relative flatness or peakedness of the frequency curve is called kurtosis.
• For a normal distribution, the value of skewness is zero and the value of kurtosis is 3.
Good Morning
PRINCIPLE OF
BIOSTATISTICS AND ITS
APPLICATION IN
DENTISTRY
Part 2
By: MANOHAR BHAT, 1st yr. PG

NORMAL DISTRIBUTION
After collecting a large sample, prepare a frequency distribution with small class intervals; the following points will be seen:
1) Some observations are above the mean and others are below the mean.
2) If they are arranged in order, deviating towards the extremes from the mean on the plus or minus side, the maximum number of frequencies will be seen in the middle around the mean and fewer at the extremes, decreasing smoothly on both sides.
3) Almost half the observations lie above and half lie below the mean.
4) All the observations are symmetrically distributed on each side of the mean.
A distribution of this nature is called a NORMAL DISTRIBUTION or GAUSSIAN DISTRIBUTION.
[Figure: Normal distribution curve]
Importance of Normal Distribution
• Most biometric characters tend to follow a normal distribution.
• For large samples, the sampling distribution of the mean tends to be normal even when the original distribution is not (central limit theorem).
• Tests of significance are developed based on the assumption of normality.
Null hypothesis and Alternative hypothesis
• A test of a statistical hypothesis is a rule (criterion) that specifies for which sample results the hypothesis is to be accepted or rejected.
• The hypothesis which is to be tested is generally called the null hypothesis (H0).
• The hypothesis against which it is tested is generally called the alternative hypothesis (H1).
REGIONS OF ACCEPTANCE AND REJECTION
• The region of acceptance may be defined as a range of values such that, if the sample statistic falls in this range, the null hypothesis is accepted.
• The region of rejection may be defined as a range of values such that, if the sample statistic falls in this range, the null hypothesis is rejected.
• The limits of the acceptance and rejection regions may be constructed from mean ± 1.96 SE (5% level) and mean ± 2.58 SE (1% level).
[Figure: normal curve with the 95% acceptance region between −1.96 and +1.96 and rejection regions in both tails]
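The 1.96 and 2.58 multipliers come from the standard normal distribution; this sketch recovers them with `statistics.NormalDist` (Python 3.8+) and checks the area of each acceptance region:

```python
from statistics import NormalDist

z = NormalDist()                              # standard normal: mean 0, SD 1
for level in (0.95, 0.99):
    crit = z.inv_cdf(1 - (1 - level) / 2)     # two-tailed critical value
    coverage = z.cdf(crit) - z.cdf(-crit)     # area of the acceptance region
    print(level, round(crit, 2), round(coverage, 3))
# 0.95 1.96 0.95
# 0.99 2.58 0.99
```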
Type 1 & type 2 error
• True positive: the patient has the disease and the diagnostic test detects the condition.
• True negative: the patient doesn't have the disease and the diagnostic test doesn't detect the disease.
• False negative (Type 2 error): the patient has the disease but the diagnostic test is negative.
• False positive (Type 1 error): the patient doesn't have the disease but the diagnostic test detects the disease.
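Type 1 and type 2 errors
 | Diseased | Healthy
Diagnosed positive | T+ | F+ (Type 1 error)
Diagnosed negative | F- (Type 2 error) | T-
• Type 1 error (α error): rejecting the null hypothesis (H0) when it is true.
• Type 2 error (β error): accepting the null hypothesis (H0) when it is false.
• Power of the test = 1 − β, i.e., the probability of rejecting H0 when it is false.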
TESTS OF SIGNIFICANCE
• Tests of significance are the mathematical methods by which the probability (P), or relative frequency, of an observed difference occurring by chance is found.
P value
• “p” values are used to assess the degree of dissimilarity between two or more sets of measurements, or between one set of measurements and a standard.
• p values measure the strength of evidence in scientific studies.
• p value between 0.05 and 0.01 – statistically significant
• p value < 0.01 – highly (statistically) significant
• p value < 0.001 – very highly significant
Procedure and steps
1. Identify the type of problem and the question to be answered.
2. State the null hypothesis.
3. State the alternative hypothesis.
4. Select the appropriate test to be utilized.
5. Fix the level of significance (α).
6. Compare the calculated test criterion value with the theoretical (table) value at the prefixed level of significance.
7. If the calculated test criterion value is less than the theoretical value, the null hypothesis is ACCEPTED.
If the calculated test criterion value is greater than the theoretical value, the null hypothesis is REJECTED.
8. Draw the conclusion.
Types of Tests of Significance
1. Parametric tests
2. Non-parametric tests
Parametric tests
• Z-test for large samples & Z-proportionality test
• Student's t-test
o Unpaired t-test for small samples
o Paired t-test for small samples
• Chi-square test
• Poisson test
• Analysis of Variance
• Analysis of Covariance
Z-TEST (Large Samples)
• This test is for testing the significance of the difference between two means (n > 30).
• It compares two means to suggest whether both samples come from the same population.
CRITERIA
1. Random samples
2. Quantitative data
3. Normally distributed & n > 30
$$z = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$$
Where $SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$, and σ1 and σ2 are the S.D.s of the two samples.
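A sketch of the large-sample z-test for two means, assuming hypothetical DMFT summaries (mean, SD, n) for two groups of 100 children; the two-sided p-value uses `statistics.NormalDist`, and `z_test_two_means` is an illustrative name:

```python
import math
from statistics import NormalDist

def z_test_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """z = (x1 - x2) / sqrt(sd1^2/n1 + sd2^2/n2), with a two-sided p-value."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical DMFT summaries for two groups of 100 children each
z, p = z_test_two_means(3.2, 1.5, 100, 2.6, 1.4, 100)
print(round(z, 2), round(p, 4))
```

Compare |z| with 1.96 (5% level) or 2.58 (1% level), as in the acceptance/rejection regions above.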
Z-PROPORTIONALITY TEST
For testing the significance of the difference between two proportions,
$$z = \frac{P_1 - P_2}{SE(P_1 - P_2)}$$
Where P1 = proportion in the 1st sample, P2 = proportion in the 2nd sample, and
$SE(P_1 - P_2) = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$, with $Q = 1 - P$.
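A sketch of the two-proportion z-test using the unpooled SE given above (some textbooks use a pooled proportion instead, which gives a slightly different z); the caries counts and the function name are hypothetical:

```python
import math
from statistics import NormalDist

def z_test_two_proportions(x1, n1, x2, n2):
    """z = (P1 - P2) / sqrt(P1*Q1/n1 + P2*Q2/n2), with Q = 1 - P (unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))   # z and two-sided p-value

# Hypothetical: caries in 60 of 200 children vs 40 of 200 children
z, p = z_test_two_proportions(60, 200, 40, 200)
print(round(z, 2), round(p, 4))
```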
t-TEST:
• This test is applied to small samples.
• Types: unpaired t-test and paired t-test.
CRITERIA
1. Random samples
2. Quantitative data
3. Normally distributed & sample size < 30
UNPAIRED t-TEST
Applied to independent samples, checking the significance of the difference between two means:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}, \qquad SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where σ1 and σ2 are the respective S.D.s.
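A minimal sketch of the unpaired t statistic with the SE formula above (sample SDs, unequal-variance form); the plaque scores and `unpaired_t` are hypothetical. It computes only t; the p-value would come from a t table with the appropriate degrees of freedom, and the pooled-variance form of Student's t gives a different SE when the group sizes or SDs differ:

```python
import math
from statistics import mean, stdev

def unpaired_t(sample1, sample2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(sample1), len(sample2)
    se = math.sqrt(stdev(sample1) ** 2 / n1 + stdev(sample2) ** 2 / n2)
    return (mean(sample1) - mean(sample2)) / se

# Hypothetical plaque scores in two independent groups
group_a = [1.2, 1.5, 1.1, 1.8, 1.4, 1.6]
group_b = [0.9, 1.0, 1.3, 0.8, 1.1, 0.9]
print(round(unpaired_t(group_a, group_b), 2))
```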
PAIRED t-TEST
It is applied to paired (dependent) data from one sample only, when each individual gives a pair of observations (e.g., before and after treatment).
$$t = \frac{\bar{d}}{SD/\sqrt{n}}$$
Where,
$\bar{d}$ = mean of the differences between x1 and x2
SD = standard deviation of the differences
n = sample size
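A sketch of the paired t statistic, assuming hypothetical gingival index scores measured before and after treatment in the same 5 patients (`paired_t` is an illustrative name):

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """t = mean difference / (SD of differences / sqrt(n))."""
    d = [b - a for b, a in zip(before, after)]   # difference for each individual
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Hypothetical gingival index before and after treatment in the same 5 patients
before = [2.1, 1.8, 2.4, 2.0, 1.9]
after = [1.5, 1.4, 1.9, 1.6, 1.2]
print(round(paired_t(before, after), 2))
```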
CHI-SQUARE TEST:
• Is an alternative method of testing the significance of the difference between two or more proportions.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where, O = observed frequency
E = expected frequency, given by
E = (Row total × Column total) / Grand total
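A minimal sketch of the chi-square statistic for a contingency table, assuming a hypothetical 2×2 table (fluoride use yes/no against gingivitis present/absent); `chi_square` is an illustrative name:

```python
def chi_square(table):
    """Chi-square = sum of (O - E)^2 / E, with E = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical 2x2 table: rows = fluoride use yes/no, columns = gingivitis present/absent
observed = [[30, 20], [10, 40]]
print(round(chi_square(observed), 2))   # 16.67
```

For a 2×2 table the degrees of freedom are (2 − 1)(2 − 1) = 1, for which the 5% table value is 3.84.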
Uses of the chi-square test:
1) Test of proportion
2) Test of independence or association (whether there is an association between 2 variables, e.g., fluorides & gingivitis)
3) Goodness of fit (whether the observed data fit a theoretical distribution, e.g., whether the data are normally distributed)
Poisson test
• The Poisson distribution is a discrete distribution of the number of times a rare event occurs.
• The relationship between variance and mean suggests whether an observed frequency distribution fits in Poisson fashion or not:
$$P = \frac{SD^2}{\bar{X}}$$
• For a Poisson distribution the variance equals the mean, so this ratio is close to 1.
Analysis Of Variance (ANOVA)
• The t-test is an efficient method of testing the significance of the difference between two population means.
• ANOVA is an extension of the two-sample t-test to three or more samples.
• Ex: to decide whether there is a difference in the effectiveness of 4 commercial denture cleansers in eliminating oral pathogens.
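A sketch of the one-way ANOVA F ratio (between-group mean square over within-group mean square), assuming hypothetical colony counts for the 4 denture cleansers; `one_way_anova_f` is an illustrative name, and the p-value would come from an F table with (k − 1, n − k) degrees of freedom:

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical colony counts (x1000) after using 4 denture cleansers
print(round(one_way_anova_f([12, 15, 11, 14], [9, 8, 10, 11], [15, 17, 16, 14], [10, 12, 9, 11]), 2))
```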
Two way ANOVA
• Utilized when there is a need to study the impact of two factors on variations in a specific variable.
• Ex:
o The effect of age and sex on variations in height.
o The effects of socio-economic grade and literacy level of mothers on variations in the protein quality or dietary adequacy of their children.
Multiple Comparison test Procedures
post hoc procedure
• Used when there are more than two treatment groups
• Ex: treatment groups A, B, C, and D
Multiple comparisons are A vs B; A vs C; A vs D; B vs C; B vs D; and C vs D
• TESTS
o Tukey's method: when all the groups are similar in size
o Scheffe's method: when the groups differ in size
o Duncan's method: used for all pairwise comparisons among the treatment groups
o Dunnett's method: used when several test groups are each compared with one control group
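The set of comparisons listed above grows quickly with the number of groups: k groups give k(k - 1)/2 pairs. A small sketch enumerating them, alongside the smaller set obtained when each group is compared only with a control (the choice of A as control is a hypothetical assumption):

```python
from itertools import combinations

groups = ["A", "B", "C", "D"]

# All pairwise comparisons between treatment groups
pairs = [f"{a} vs {b}" for a, b in combinations(groups, 2)]

# Comparisons of each test group against a single control group
control = "A"   # hypothetical choice of control group
vs_control = [f"{g} vs {control}" for g in groups if g != control]

k = len(groups)
print(len(pairs), "pairwise comparisons; k(k - 1)/2 =", k * (k - 1) // 2)
print("; ".join(pairs))
print("; ".join(vs_control))
```

Because every extra comparison raises the chance of a false positive, post hoc procedures adjust for the number of comparisons made.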
Analysis of covariance (ANCOVA)

• Analysis of covariance is a more sophisticated form of analysis of variance.
• It is based on the inclusion of supplementary variables (covariates) into the model.
• This allows the analysis to account for inter-group variation that is associated not with the "treatment" itself but with the covariate(s).

• Ex: a hypersensitivity study which we conducted: comparing Products A, B and C alone is an ANOVA; comparing Products A, B and C with age added as a covariate is an ANCOVA.
LIMITATIONS OF TESTS OF HYPOTHESIS
1) These tests do not make decisions themselves; they are only useful aids for decision making. Hence proper interpretation of the statistical evidence is important for intelligent decisions.

2) Tests don't explain why differences exist between the two samples.

LIMITATIONS OF TESTS OF HYPOTHESIS
3) Results of significance tests are based on probability and as such can't be expressed with full certainty

4) Statistical inferences based on significance tests can't be said to be entirely correct evidence concerning the truth of the hypothesis. This is especially so in the case of small samples

NON-PARAMETRIC TESTS
• When the population from which the samples are drawn is not normally distributed, alternative procedures based on less stringent assumptions are used: non-parametric tests (distribution-free statistics)

• These procedures should be used when the researcher has any doubt about the normality assumption

• Even if the assumption of normality is reasonable, non-parametric tests can be used

NON-PARAMETRIC TESTS

ADVANTAGES
• Can be used without the normality assumption
• Can be used with nominal or ordinal data
• Computations are lighter and easy to understand
• Less sensitive to measurement errors than parametric techniques, because they deal with ranks rather than actual observed values

NON-PARAMETRIC TESTS
DISADVANTAGES
• They tend to use less information than parametric methods.
• Less sensitive than parametric tests: larger differences are required to reject the null hypothesis.
• Less efficient than their parametric counterparts, so larger sample sizes are required to overcome the loss of information.

NON-PARAMETRIC TESTS

• Sign test
• Fisher Irwin test
• Wilcoxon rank sum test (Mann Whitney U-Test)
• One sample run test
• Kruskal Wallis test
• Friedman test
• McNemar test


Sign test
• It is a very simple non-parametric test, applicable when we sample a continuous symmetrical population in which the probability of getting a sample value less than the mean is half and the probability of getting a sample value greater than the mean is also half.

• Every item of the sample is replaced by a plus (+) or minus (-) sign according to whether it lies above or below the mean specified by the null hypothesis (H0); the null hypothesis is then tested against the alternative hypothesis using the numbers of plus and minus signs.
Fisher Irwin test
• This test (also known as Fisher's exact test) is used in testing a hypothesis of no difference between two sets of data.

• It is employed to determine whether one can reasonably assume, for example, that two supposedly different treatments are in fact different in terms of the results they produce.

• This test is applied when observations can be classified as 'pass' or 'fail', 'yes' or 'no'.
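With pass/fail outcomes for two treatments the data form a 2 x 2 table, and the exact test uses the hypergeometric probability of each table with the same margins. A minimal sketch, assuming hypothetical counts; the two-sided p value here sums all tables no more probable than the observed one, which is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))

# Hypothetical results: treatment 1 = 8 pass / 2 fail, treatment 2 = 1 pass / 5 fail
p = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided p = {p:.4f}")
```

Because no large-sample approximation is used, the test remains valid for the small counts where the chi-square test is unreliable.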
One sample run test

• There are many applications in which it is difficult to decide whether the sample used is a random one or not.

• The one sample run test is used to judge the randomness of a sample on the basis of the order in which the observations are taken.
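A run is a maximal sequence of identical symbols; too few or too many runs suggest the order is not random. The sketch below counts runs in a two-symbol sequence and uses the normal approximation for the number of runs (Wald-Wolfowitz), which assumes moderately large samples. The enrolment order is hypothetical.

```python
import math

def runs_test(sequence):
    """One-sample runs test for a sequence of two symbols (normal approximation)."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two kinds of symbols")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])
    # A new run starts wherever the symbol changes
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Hypothetical order in which patients were enrolled (M = male, F = female)
order = "MMFFMFMMFFMFFMFM"
runs, expected, z = runs_test(order)
print(f"runs = {runs}, expected = {expected:.1f}, z = {z:.3f}")
```

A |z| below about 1.96 gives no evidence against randomness at the 5% level.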
Wilcoxon rank sum test

(Mann Whitney U-Test)

• The Wilcoxon rank sum test considers the magnitude of the differences via ranks. It was developed to test the null hypothesis that there is no difference between the two treatments, i.e. that the two samples come from the same population.

• It only requires that the samples come from continuous distributions, so that, in theory, ties do not occur.

• It is more informative than the sign test
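The rank sum and U statistics can be sketched as follows. The pooled observations are ranked together, the ranks of the first sample are summed (W), and U is obtained from W. Tied values, if any occur in practice, are given the average of their ranks, a common convention; the gingival index scores are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)
        last[v] = position
    return [(first[v] + last[v]) / 2 for v in values]

def mann_whitney_u(x, y):
    """Wilcoxon rank sum W of sample x and the two Mann Whitney U statistics."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])              # rank sum of the first sample
    u1 = w - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return w, u1, u2

# Hypothetical gingival index scores under two treatments
treatment_1 = [1.2, 0.8, 1.5, 1.1, 0.9]
treatment_2 = [1.9, 1.4, 2.1, 1.7, 1.6, 2.3]
w, u1, u2 = mann_whitney_u(treatment_1, treatment_2)
print(f"W = {w}, U = {min(u1, u2)}")
```

The smaller U is compared with the tabulated critical value for sample sizes n1 and n2.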


Kruskal Wallis test.
• In the ANOVA procedure, the population distributions must be normal, or approximately normal, and the variances must be equal. When these assumptions are not satisfied, a non-parametric test called the Kruskal Wallis test is applied.

• The population distributions should be continuous, with the same variance and skewness. It is recommended that at least five observations be drawn for each sample.
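The Kruskal Wallis statistic ranks all observations together and compares the group rank sums R_i: H = 12 / (N(N + 1)) x sum(R_i^2 / n_i) - 3(N + 1). A minimal sketch without the correction for ties; the sensitivity scores and function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)
        last[v] = position
    return [(first[v] + last[v]) / 2 for v in values]

def kruskal_wallis_h(*groups):
    """Kruskal Wallis H statistic (no correction for ties)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])   # rank sum R_i of this group
        total += r_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical sensitivity scores with three desensitising products, five patients each
product_a = [2, 4, 3, 5, 1]
product_b = [6, 8, 7, 9, 10]
product_c = [11, 13, 12, 15, 14]
h = kruskal_wallis_h(product_a, product_b, product_c)
print(f"H = {h:.2f}")
```

With at least five observations per group, H is usually compared with the chi-square table for k - 1 degrees of freedom.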


Friedman test
• The Kruskal Wallis test is for k independent samples, but the Friedman test is for k related samples.

• When the data are from k matched samples and are discrete, the Friedman two-way ANOVA by ranks is used to test the hypothesis.

• Under the null hypothesis, the k samples are drawn from the same population and have the same median value.
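In the Friedman test the k treatments are ranked within each block (e.g. each patient), and the column rank sums R_j give chi-square_F = 12 / (n k (k + 1)) x sum(R_j^2) - 3 n (k + 1). A minimal sketch without tie correction; the ratings and names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)
        last[v] = position
    return [(first[v] + last[v]) / 2 for v in values]

def friedman_statistic(blocks):
    """Friedman chi-square for n blocks (rows) by k treatments (columns); no tie correction."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):   # rank treatments within each block
            rank_sums[j] += r
    return 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)

# Hypothetical comfort ratings given by 4 patients to 3 denture adhesives
ratings = [[7, 5, 9],
           [6, 4, 8],
           [8, 5, 7],
           [6, 3, 9]]
chi2_f = friedman_statistic(ratings)
print(f"Friedman chi-square = {chi2_f:.2f}")
```

For larger samples the statistic is compared with the chi-square table for k - 1 degrees of freedom; very small samples call for exact tables.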

McNemar test
• It is an important test often used when the data are nominal and come from two related samples.

• This test is especially useful with before-and-after measurements of the same subjects.

• When three or more related samples are studied, the test applied is called Cochran's Q test
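For before-and-after data, only the discordant pairs carry information: b subjects changed one way and c the other way. A minimal sketch of the McNemar chi-square (1 degree of freedom), with and without the usual continuity correction; the bleeding-on-probing counts are hypothetical.

```python
def mcnemar(b, c, continuity=True):
    """McNemar chi-square (1 df) from the two discordant cells of a paired 2 x 2 table."""
    if b + c == 0:
        raise ValueError("no discordant pairs")
    diff = abs(b - c) - 1 if continuity else abs(b - c)
    return max(diff, 0) ** 2 / (b + c)

# Hypothetical before/after table for 50 patients (bleeding on probing):
#                  after: yes   after: no
# before: yes          12          15      <- b = 15 changed yes -> no
# before: no            4          19      <- c = 4 changed no -> yes
chi2_corrected = mcnemar(15, 4)
chi2_plain = mcnemar(15, 4, continuity=False)
print(f"corrected = {chi2_corrected:.3f}, uncorrected = {chi2_plain:.3f}")
```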

Sl No | Situation | Parametric test | Non Parametric test
1 | Sample size < 30 | Student's t test | Mann Whitney U test
2 | Sample size > 30 | z test / t test | Mann Whitney U test
3 | > 2 groups | ANOVA | Kruskal Wallis test
4 | Association between variables | Karl Pearson's correlation | Spearman's rank correlation

Correlation

• The relationship or association between two quantitatively measured or continuous variables is called correlation. Correlation does not prove that one particular variable alone causes the change in the other; the change in the same or opposite direction may be due to other factors.

• The extent or degree of the relationship is measured by another parameter called the coefficient of correlation (r).

• It ranges between minus one (-1) and plus one (+1), i.e. -1 ≤ r ≤ +1

• If r = +1, the variables X and Y rise or fall together in the same proportion. It is an indication of perfect positive correlation.

• If r = -1, the variables X and Y rise and fall in exactly opposite directions to one another (perfect negative correlation).

• If r = 0, there is no linear relationship between X and Y; the variables are uncorrelated, although this alone does not prove that they are independent.

Scatter diagrams
[Figures: examples of scatter diagrams]
• Correlation coefficient is given by the formula:
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$
• The limits for the correlation coefficient are -1 to +1
• i.e. -1 ≤ r ≤ +1
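Karl Pearson's r can be computed directly from the sums of products and squares of deviations about the means. A minimal sketch, assuming hypothetical data on daily sugar exposures and DMFT scores; the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # sum of products of deviations
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: daily sugar exposures vs DMFT score for 6 children
sugar = [1, 2, 3, 4, 5, 6]
dmft = [0, 1, 1, 3, 4, 5]
r = pearson_r(sugar, dmft)
print(f"r = {r:.3f}")
```

A value this close to +1 indicates a strong positive correlation in the sample, which by itself still says nothing about cause.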
Classification of correlation coefficient

Source: Rowntree D. Statistics without Tears: A Primer for Non-mathematicians (1981), p. 147-153
Regression analysis
• Regression is used more broadly to describe relationships between variables.

• Regression analysis is a means of studying the variation of one quantity (dependent variable) at selected values of another quantity (independent variable).

• A straight-line relation between them is given by the equation y = a + bx, where a (intercept) and b (slope or regression coefficient) are constants
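The constants a and b are usually estimated by least squares: b = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2) and a = y_mean - b x x_mean. A minimal sketch with hypothetical income and dental expenditure figures:

```python
def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

# Hypothetical monthly household income (thousands) vs dental expenditure (hundreds)
income = [10, 20, 30, 40, 50]
expenditure = [3, 5, 6, 9, 12]
a, b = least_squares_line(income, expenditure)
predicted_at_35 = a + b * 35   # predicted expenditure for an income of 35
print(f"y = {a:.2f} + {b:.2f}x; prediction at x = 35: {predicted_at_35:.2f}")
```

Predictions are most trustworthy within the range of x values used to fit the line.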
Examples

• To predict a family's dental and medical expenditure in terms of household income.

• To describe the relationship between drug potency and the assay response.

• To describe the relationship between height and weight of dentists

Simple linear regression
• To study the statistical relationship between two variables, first we plot the independent and dependent variables on a scatter plot.
[Figure: scatter plot of Y-values against X-values]

Multiple regression
• In simple linear regression we study only two variables, but the prediction may improve if we consider and include other independent variables. Examples:

o Prediction of blood pressure (y) based on age (x1) and the amount of weekly exercise (x2)
o Sales volume of toothpaste (y) predicted from the price (x1), advertisement expenditure (x2) and quality of the product (x3)
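For the blood pressure example, the coefficients of y = b0 + b1x1 + b2x2 can be found by solving the least-squares normal equations (X'X)b = X'y. The pure-Python sketch below does this with Gaussian elimination; the function names are hypothetical, and the data were generated from bp = 90 + age - 2 x exercise so that the fit can be checked, which real data would never satisfy exactly.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        s = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - s) / aug[r][r]
    return coeffs

def multiple_regression(predictors, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]   # intercept column first
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)   # normal equations (X'X) b = X'y

# Hypothetical data: blood pressure (y) from age (x1) and weekly exercise hours (x2)
age = [30, 40, 50, 60, 45, 35]
exercise = [5, 3, 2, 1, 4, 6]
bp = [110, 124, 136, 148, 127, 113]
b0, b1, b2 = multiple_regression([age, exercise], bp)
print(f"bp = {b0:.2f} + {b1:.2f} x age + ({b2:.2f}) x exercise")
```

Each coefficient describes the change in y for a unit change in one predictor while the others are held constant.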

Advantages achieved by presenting data

1. Data become concise without losing details.

2. Arouses interest in the reader.

3. Simple & meaningful.

4. Defines the problem & suggests solutions too.

5. Helpful in further analysis

PRESENTING THE DATA
Methods of presentation of data
1) Tabulation.

2) Drawing
o Diagrams
o Graphs

Methods of presentation of data

Tabulation

Tables are devices for presenting data from a mass of statistical data.

Preparation of the frequency distribution table is the first requirement.

Diagrammatic Representation
1.One Dimensional Diagrams
i. Simple Bar Diagram
ii. Multiple Bar Diagram
iii. Component Bar Diagram
iv. Percentage Bar Diagram

2.Two Dimensional Diagram


Pie-Diagram/Sector diagram

3. Pictograms/Picture diagram

4. Map diagram or spot map


Year wise Distribution of subjects
studying BDS
• Simple Table
Year | No. of subjects
First | 93
Second | 84
Third | 85
Fourth | 51
Total | 313

• Simple Bar Diagram
[Figure: simple bar diagram of the number of subjects in each year (First, Second, Third, Fourth)]
Distribution of subjects by Year and sex
Year | Male | Female
First | 39 | 54
Second | 38 | 46
Third | 47 | 38
Fourth | 34 | 17
Total | 158 | 155

[Figure: multiple bar diagram of male and female subjects in each year (First, Second, Third, Fourth)]
Component Bar Diagram/proportional
Bar diagram
• Distribution of subjects by blood groups and sex:

• The bars are constructed on the basis of the total, and the total is divided into its components.

Blood Grp | Male | Female | Total
A | 39 | 54 | 93
B | 38 | 46 | 84
O | 47 | 38 | 85
AB | 34 | 17 | 51
Component Bar Diagram/proportional Bar diagram
• Distribution of subjects by blood groups and sex
[Figure: component bar diagram; y-axis: No. of students; one bar per blood group (A, B, AB, O), each divided into male and female components]
Percentage Bar Diagram
The absolute values are converted into percentages and presented accordingly.
Percentage distribution of subjects by blood groups

Blood Grp | Male (%) | Female (%) | Total (n)
A | 41.9 | 58.1 | 93
B | 45.2 | 54.8 | 84
O | 55.3 | 44.7 | 85
AB | 66.7 | 33.3 | 51
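The percentage table can be reproduced from the male and female counts in the component bar diagram table: each component is divided by its own group total and multiplied by 100. A short sketch (the variable names are illustrative):

```python
# (male, female) counts for each blood group, from the component bar diagram table
counts = {"A": (39, 54), "B": (38, 46), "O": (47, 38), "AB": (34, 17)}

percentages = {}
for group, (male, female) in counts.items():
    total = male + female
    # Each component expressed as a percentage of its own bar total, to one decimal
    percentages[group] = (round(100 * male / total, 1), round(100 * female / total, 1), total)

for group, (male_pct, female_pct, total) in percentages.items():
    print(f"{group:>2}: male {male_pct}%, female {female_pct}% (n = {total})")
```

Because every bar is scaled to 100%, the diagram compares proportions rather than absolute numbers.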

Percentage Bar Diagram
• Percentage distribution of subjects by blood groups and sex:
[Figure: percentage bar diagram; y-axis: Percentage (0-100%); one bar per blood group (A, B, AB, O), divided into male and female percentages]
Pie or sector diagram (Two dimensional diagram)
• Area diagrams.

• Most frequently used.

• Frequencies of the groups are shown in a circle.

• The angle of each sector denotes the frequency, as does the area of the sector.
• Size of each angle is calculated with the following formula:
$$ \text{Angle} = \frac{\text{Class frequency}}{\text{Total frequency}} \times 360^\circ $$
Pie- Diagram
• Year-Wise Distribution of Study Population

Year | No. | Angle (degrees)
First | 93 | 107
Second | 84 | 97
Third | 85 | 98
Fourth | 51 | 59
Total | 313 | 360

[Figure: pie diagram with sectors FIRST year 107°, SECOND year 97°, THIRD year 98°, FOURTH year 59°]
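The sector angles in the table follow from the counts: each angle is the class frequency divided by the total frequency, times 360 degrees. A short sketch reproducing them (the variable names are illustrative):

```python
year_counts = {"First": 93, "Second": 84, "Third": 85, "Fourth": 51}
total = sum(year_counts.values())

# Angle of each sector = (class frequency / total frequency) x 360 degrees
angles = {year: count / total * 360 for year, count in year_counts.items()}
rounded = {year: round(angle) for year, angle in angles.items()}

for year, angle in angles.items():
    print(f"{year}: {angle:.2f} degrees (about {rounded[year]})")
```

The unrounded angles always add up to exactly 360 degrees; once each angle is rounded separately, the rounded values can total a degree more or less.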

Pictogram or Picture diagram
Popular method of explaining the frequency of occurrence of events to a common man.

Pictures are drawn in a horizontal line.

Each picture indicates a unit of 10/20/50 happenings.

The number of pictures in each row shows the relative frequency of an attribute.

• Eg. Number of deaths due to cholera in four cities
Presentation
City | Deaths
A | 200
B | 400
C | 800
D | 600
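The number of pictures in each row is the count divided by the unit that one picture represents. A minimal text-only sketch of the cholera example, assuming one picture stands for 50 deaths and drawing each picture as `*`; this only works cleanly because every count here is a multiple of the unit.

```python
deaths = {"A": 200, "B": 400, "C": 800, "D": 600}
UNIT = 50   # assumption: one picture stands for 50 deaths

# Number of pictures drawn in each city's row
pictures = {city: count // UNIT for city, count in deaths.items()}
for city, n in pictures.items():
    print(f"{city} | {'*' * n} ({deaths[city]} deaths)")
```

When a count is not a multiple of the unit, a partial picture is usually drawn for the remainder.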

Map diagram or spot map
Geographical distribution of frequencies of a
characteristic.
A dot indicates one unit of occurrence.

Graphical Representation
(Quantitative Data)
• Histogram

• Frequency Polygon

• Line chart or Graph

• Stem and Leaf plot

• Scatter or dot diagram


Frequency Distribution Table
• Distribution of patients according to their plaque scores

Plaque score | Frequency
0.5-1.0 | 10
1.0-1.5 | 15
1.5-2.0 | 20
2.0-2.5 | 25
2.5-3.0 | 12
3.0-3.5 | 6
3.5-4.0 | 2

Histogram
• It is a graphical representation of a frequency distribution.

• The variable character of the different groups is indicated on the X axis, called the abscissa.

• The number of observations is marked on the Y axis, called the ordinate.

Histogram
• Distribution of patients according to their plaque score
[Figure: histogram; x-axis: Plaque Class (0.5-1.0 to 3.5-4.0); y-axis: Frequency (0-28); bar heights 10, 15, 20, 25, 12, 6, 2]
Frequency polygon
• It is an area diagram of a frequency distribution, developed over a histogram.

• It is formed by joining the midpoints of the class intervals, i.e. the midpoints of the tops of the histogram bars.

• When the data are very large with narrow class intervals, the frequency polygon forms a smooth curve called the frequency curve.
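The points of a frequency polygon are the class midpoints paired with their frequencies. The sketch below uses the plaque score table; closing the polygon on the x axis at the midpoints of the empty classes on either side is a common drawing convention and is an assumption here.

```python
# Plaque score classes and frequencies from the frequency distribution table
classes = [(0.5, 1.0), (1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0), (3.0, 3.5), (3.5, 4.0)]
frequencies = [10, 15, 20, 25, 12, 6, 2]

width = classes[0][1] - classes[0][0]                    # equal class width (0.5)
midpoints = [(low + high) / 2 for low, high in classes]  # x value of each point

# Close the polygon on the x axis half a class beyond each end of the range
points = ([(midpoints[0] - width, 0)]
          + list(zip(midpoints, frequencies))
          + [(midpoints[-1] + width, 0)])

for x, y in points:
    print(f"({x:.2f}, {y})")
```

Joining these points with straight lines gives the polygon drawn over the histogram.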

Frequency polygon
• Distribution of patients according to their plaque score.
[Figure: frequency polygon; x-axis: Plaque Class (0.5-1.0 to 3.5-4.0); y-axis: Frequency (0-28)]
Line Chart or Graph
A frequency polygon representing variations by a line.
Shows the trend of an event occurring over a period of time.
[Figure: line chart; x-axis: years (1901-1971); y-axis: Population in millions (200-600)]
Stem and Leaf Plot
• Uses part of the data as the "stem" and part of the data as the "leaf"
• The values are grouped in such a way that the individual observed values are retained while the shape of the distribution is shown
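A minimal stem and leaf sketch for two-digit whole numbers, taking the tens digit as the stem and the units digit as the leaf. Empty stems are kept so the shape of the distribution is not distorted; the patient ages and the function name are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Lines of a stem and leaf plot: stem = tens digit(s), leaf = units digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):    # include empty stems
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical ages of 12 patients attending a dental clinic
ages = [23, 27, 31, 35, 35, 38, 42, 44, 49, 51, 56, 68]
for line in stem_and_leaf(ages):
    print(line)
```

Every original age can be read back from the plot (stem 3 with leaf 5 is 35), while the lengths of the rows show the shape of the distribution.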
Scatter or Dot diagram

• Correlation diagram.
• Graphic presentation showing the nature of correlation between 2 variables.
[Figure: scatter diagram showing positive correlation]
Interpreting the data

“All meanings, we know, depend on the key of interpretation.”

- George Eliot
Interpretation of the data
• Numbers do not speak for themselves.

• Interpretation is the process of attaching meaning to the data.
Interpretation
• Interpretation demands fair and careful judgments. Often the same data can be interpreted in different ways, so it is helpful to involve others or take time to hear how different people interpret the same information.
• Interpretation is done based on knowledge of the collection, organization, analysis and presentation of the data.
Statistical packages
• STATA
• SPSS
• Statistica
• Biostat

• Epi Info
o Ralloc
o nMASTER
Some of the online
biostatistics calculators
• STATISTIC CALCULATOR - VERSION 3
o http://www.danielsoper.com/statcalc3/default.aspx

• STAT TREK
o http://stattrek.com/tables/stattables.aspx

• FREE ONLINE T-TEST CALCULATOR


o http://studentsttest.com/

• GRAPHPAD SOFTWARE
o http://www.graphpad.com/quickcalcs/index.cfm

• ALCULA ONLINE CALCULATOR


o http://www.alcula.com/
Conclusions
• Biostatistics is an essential tool in health sciences research. It helps assess treatment effects, compare different treatment options, understand how treatments interact, and evaluate many life and death situations in dental and medical sciences.

• Knowledge of basic concepts in biostatistics is essential for decision making at the community level and for setting up policies.

• In this era of evidence-based studies, biostatistics lays down the scientific foundation for rational thinking.
References
• Kim J.S. Biostatistics for Oral Health Care. 1st Ed.
• Rao K.V. Biostatistics: A Manual of Statistical Methods for Use in Health, Nutrition and Anthropology. 2nd Ed.
• Peter S. Essentials of Preventive and Community Dentistry. 3rd Ed.
• Kothari C.R. Research Methodology: Methods and Techniques. 2nd revised Ed.
• Mahajan B.K. Methods in Biostatistics. 6th Ed.
References
• Armitage P. Statistical Methods in Medical Research (1971). Blackwell Scientific Publications, Oxford. p. 189-207
• Steel R.G.D., Torrie J.H., Dickey D.A. Principles and Procedures of Statistics: A Biometrical Approach. 3rd Ed. (1997). ISBN 0-07-061028-2, p. 191-192
• Pearson E.S., Hartley H.O. Biometrika Tables for Statisticians. 3rd Ed. (1966), Table 29 (studentised range tables)
• Pedhazur E.J. Multiple Regression in Behavioral Research: Explanation and Prediction. 3rd Ed. (1993). Harcourt Brace College Publishers, Orlando, Florida. ISBN 0-03-072831-2, p. 369-371
References
• Online access:
o http://www.mathsisfun.com/
o http://www.stattools.net/Posthoc_Exp.php
o http://www.mathsisfun.com/data/standard-deviation-formulas.html
o http://biostatistics.oxfordjournals.org/
o http://en.wikipedia.org/wiki/Biostatistics
o www.google.com/images
Thank you

Attacking is the best way of defense


-Chanakya
