
BIOSTATISTICS

DR SIDHARTHA MANGAL BORDOLOI


INTRODUCTION
STATISTICS
-Is the science that deals with the collection, organization, analysis, tabulation and
interpretation of data, for the purpose of making inferences about the
population from which the data are obtained.

BIOSTATISTICS-
–Is the branch of statistics applied to biological or medical
sciences (biometry).
OR
- Is that branch of statistics concerned with mathematical
facts and data relating to biological events.
Functions Of Biostatistics -

1. It simplifies and reduces the bulk of data.
2. Adds precision to thinking.
3. It helps to compare different sets of figures.
4. Helps in planning the programme.
5. Helps in studying the relationship between different facts.
Application and Uses of Biostatistics as a Science
In pharmacology
i. To find the action of a drug.
ii. To compare the actions of two different drugs or
two successive dosages of the same drug.
iii. To find the relative potency of a new drug with
respect to a standard drug.
In medicine
i. To compare the efficacy of a particular drug,
operation or line of treatment.
ii. To find an association between two attributes such as
cancer and smoking or filariasis and social class—an
appropriate test is applied for this purpose.
iii. To identify signs and symptoms of a disease or
syndrome.
In community medicine and public health
 To test the usefulness of sera and vaccines.
 Different health statuses and different health problems can be measured.
 Helps in comparing the health status of one country with that of another.
 Comparing the present health status with that of the past.
 For planning and administration of health services.
 For prediction of health trends.
 Helps in evaluating health programmes and ongoing programmes.
Basic principles of biostatistics
Collection of data
 Presentation of data
 Summarization of data
Analysis of data
Interpretation of data
Collection of data
Data are characteristics or information, usually numerical, that
are collected through observation.
Depending on the nature of the variable, data are classified into 2
categories-
1)Qualitative data-
-Data collected on the basis of attributes or qualities like sex,
malocclusion, cavity etc.
- Obtained by counting the individuals and not by measurement.
2)Quantitative data-
-Data collected through measurement (using calipers etc.), like arch
length, arch width, fluoride concentration in the water supply etc.
- Measurable in whole numbers or in fractions.
SAMPLING AND SAMPLE DESIGNS
Sampling is the process of obtaining information
about an entire population by examining only a
part of it.
Sample –A representative portion of the
population or a part of the whole- is examined for
purposes of research.
Need for Sampling

Sampling is used in practice for a variety of reasons, such as:

-It saves time and money. A sample study is usually less
expensive than a census study and produces results at a
relatively faster speed.

-A sample study is generally conducted by trained and
experienced investigators, which gives more accurate measurements.

-Sampling remains the only way when the population
contains infinitely many members.

-Sampling remains the only choice when a test involves
the destruction of the item under study.

-Sampling usually enables us to estimate the sampling errors
and, thus, assists in obtaining information concerning
some characteristic of the population.
SAMPLE DESIGN
• A sample design is a definite plan for obtaining a sample
from a given population.
• Sample design may as well lay down the number of items
to be included in the sample i.e., the size of the sample.
• It is determined before data are collected.

• Should be reliable and appropriate for the research study.


SAMPLE DESIGN
Based upon the type and nature of population and
objectives of the investigation-
 Simple Random Sampling
 Systematic Sampling
 Stratified Random Sampling
 Area or Cluster Sampling
 Multistage Sampling
 Multiphase Sampling
 Pathfinder surveys.
SIMPLE RANDOM SAMPLING
 Each and every unit in the population has an equal
probability of being selected.
 It is the foundation on which probability sampling is built.
 Each selection in a simple random sample is independent of
other selections.
 Applicable only when the population is small, homogeneous and
readily available.
 Used in experimental medicine or clinical trials,
like testing the efficacy of a particular drug.
 Methods for achieving random selection: the lottery
method or the table of random numbers method.
However, the most common method of random
selection is the use of a table of random numbers.
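As an illustration of the same idea in code, the sketch below draws a simple random sample in Python. The population list, sample size and seed are hypothetical choices made for illustration, not part of the original text.

```python
import random

# Hypothetical sampling frame: 500 numbered patient records (assumed for illustration).
population = list(range(1, 501))

random.seed(42)                          # fixed seed so the draw can be reproduced
sample = random.sample(population, 25)   # every unit has an equal chance of selection

print(sorted(sample))
```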
SYSTEMATIC RANDOM SAMPLING
Formed by selecting one unit at random and then
selecting additional units at evenly spaced intervals
till the sample of the required size has been formed.
The researcher must know the number of people
in the population and the sample size desired.
Drawback – the process might skip a significant
portion of the population that is grouped together
on the list.
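A minimal sketch of systematic selection, assuming a hypothetical list of 100 subjects and a desired sample of 10: the sampling interval k = N/n is fixed, a random start is chosen within the first interval, and then every k-th unit is taken.

```python
import random

population = list(range(1, 101))   # hypothetical numbered list of 100 subjects
n = 10                             # desired sample size
k = len(population) // n           # sampling interval (every k-th unit is taken)

start = random.randint(0, k - 1)   # random starting point within the first interval
sample = population[start::k]      # then every k-th unit until the list is exhausted

print(f"interval k = {k}, random start index = {start}")
print(sample)
```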
Stratified random sampling
 This method is followed when the population is not homogeneous.
 The population to be sampled is subdivided into groups (by age, sex, genetic
make-up etc.) known as strata, such that each group is homogeneous in its
characteristics.
 Then a simple random selection is done from each stratum.
 More representative, provides greater accuracy and can cover a wider
geographic area.
Area or cluster sampling
Units of population are natural groups or clusters
such as villages, wards, blocks, slums of a town,
children of a school etc.
First a sample of the clusters is selected and then
all the units in each of the selected clusters are
surveyed.
The data collection in this method is simpler and
involves less time and cost than in other sampling
techniques.
There are more chances of sampling error.
Multiphase Sampling

 In this method, part of the information is collected
from the whole sample and part from a sub-sample.
 Eg: In a school health survey,

Phase 1 - all the children in the school are examined.

Phase 2 - the ones with oral health problems are selected.

Phase 3 - the section needing treatment is selected.

 The number of children in the sub-samples in the
2nd and 3rd phases becomes smaller and smaller.

 A survey by such a procedure will be less costly, less
laborious and more purposeful.
ERRORS IN SAMPLING
• Sampling errors are the random variations in the
sample estimates around the true population
values.
• Sampling error decreases with the increase in the
size of the sample, and it is less in homogeneous
population.
• Sampling error can be measured for a given sample design
and size. The measurement of sampling error is usually
called the ‘precision of the sampling plan’.
• If we increase the sample size, the precision can be
improved.
• But a large sized sample increases the cost of collecting
data and also enhances the systematic bias.
• Thus the effective way to increase precision is usually to
select a better sampling design which has a smaller
sampling error for a given sample size at a given cost.
Two types of errors arise in a sampling investigation:
Sampling error-
-faulty sample design
-small size of sample
Non-sampling error-
-Coverage errors: due to non-response or non-cooperation
of the informant.
-Observational errors: due to interviewer bias or
imperfect experimental technique.
-Processing errors: errors in statistical analysis.
PRESENTATION OF STATISTICAL DATA

Statistical data, once collected, are presented in such a
way that they bring out the important points clearly and
strikingly.

 There are several methods of presenting data:
charts, tables, diagrams, graphs, pictures and special
curves.
Tabulation
Tables are devices for presenting data simply from
masses of statistical data. Tabulation is the first step
before the data are used for analysis or interpretation.
A table can be simple or complex.
 Simple table – a one-way table which supplies the
answer to questions about one characteristic of the data
only.
eg - Number of students in a dental college, year/session 2018-2019

Batch      No. of students
I BDS      100
II BDS     85
III BDS    86
IV BDS     70
Frequency distribution table -
 In a frequency distribution table the data are first split into convenient
groups (class intervals), and the number of observations (frequency) which occur in
each group is shown in the adjacent column.

 eg. Following are the ages of the patients who attended a dental
college.
Ages - 10, 11, 24, 25, 35, 40, 45, 41, 54, 15, 13, 33, 45, 42, 34, 43, 51, 52

Age group   Frequency
10-20       4
20-30       2
30-40       3
40-50       6
50-60       3
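The grouping above can be reproduced with a short Python sketch; the class limits follow the table, with the lower limit included in each class and the upper limit excluded.

```python
# Ages of the 18 patients listed above
ages = [10, 11, 24, 25, 35, 40, 45, 41, 54, 15, 13, 33, 45, 42, 34, 43, 51, 52]

# Class intervals 10-20, 20-30, ... with the lower limit included in each class
bins = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]

for low, high in bins:
    freq = sum(low <= age < high for age in ages)
    print(f"{low}-{high}: {freq}")
```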
Diagrammatic & graphic representation of data -

Though the tabular form of data presentation is the
best form of presentation, it is not easily understood
by the common man. For many people numerical data are
not interesting and not appealing; hence, to present
statistical data in a common manner, diagrammatic and
graphic methods are used.
Advantages of charts & diagrams –
-They are attractive.
- They are easily understood by common men.
- They facilitate comparison.
-The impressions created by them are long lasting.

Disadvantages -
-Detailed statistical analysis is not possible.
-They give only a rough picture of the data.
-They are sometimes misleading.
BAR CHARTS/diagrams -
 Bar charts are a popular medium for presenting statistical data and
enable values to be compared visually.
 The length of the bars, drawn vertically or horizontally, indicates the frequency of
a character.
 The bar chart or diagram is a popular and easy method adopted for visual
comparison of the magnitude of different frequencies in discrete data, such
as morbidity, mortality, or the immunization status of a population in different
ages and sexes.
 Bars may be drawn in ascending or descending order of magnitude or in
the serial order of events.
 Spacing between any two bars should be nearly equal to half of the width
of the bar.
 There are three types of bar diagrams for comparison of data:
simple,
multiple, and
proportional bar diagrams.
HISTOGRAM
It's a pictorial diagram of a frequency distribution.
 It consists of a series of blocks.
 The class intervals are given along the horizontal axis and the frequencies along the vertical axis.
The area of each block or rectangle is proportional to the frequency.
FREQUENCY POLYGON
A frequency distribution may also be represented
diagrammatically by the frequency polygon.

It is obtained by joining the midpoints of the tops of the
histogram blocks.
PIE DIAGRAM-
Instead of comparing the lengths of bars, the areas of
segments of a circle are compared.
The area of each segment depends upon the angle.
A circle is divided into different sectors corresponding
to the frequencies of the variable in the distribution.
PICTOGRAM -
Pictograms are a popular method of presenting
data to the man in the street and to those who
cannot understand orthodox charts.
 Small pictures or symbols are used to present
the data,
 e.g. a picture of a doctor to represent the
population per physician.
Measures of statistical average or central
tendency
The central value around which all the other observations
are distributed.
The main objective is to condense the entire mass of data
and to facilitate comparison.
Types of measures of central tendency or average -
1. Mean
2. Median
3. Mode
MEAN
The mean is the average value, or the sum (Σ) of all of
the observed values (xi) divided by the total number of
observations (n):
Mean x̅ = Σxi / n
The mean is denoted by the sign x̅ (x bar).

 eg. Diastolic BP of 10 individuals:
83, 75, 81, 79, 71, 95, 75, 77, 84, 90
mean = x̄ = 810/10 = 81
Advantages of Mean -
1) Easy to understand and calculate.
2) It is based on all values.
3) It is more stable than any other average.
4) It can be calculated even when some of the values are
equal to zero.

Disadvantages of Mean
1) Since it is based on all the values of the observations,
it cannot be calculated even if one of the observations is
missing.
2) Affected by extreme values.
MEDIAN -

To obtain the median, the data are first arranged in
ascending/descending order of magnitude, then the value
of the middle observation is located.
In the case of an even number of observations, the average of
the two middle values is taken.
Eg. Following are the numbers of visits to a dentist by 10
patients: 13, 8, 4, 3, 4, 2, 8, 1, 7, 4
For calculating the median, the numbers are first arranged in order
of magnitude as: 1, 2, 3, 4, 4, 4, 7, 8, 8, 13
The average of the 5th and 6th values is taken as the median.
Median = (4+4)/2 = 4
MODE
The most frequently occurring observation in a data set is
called the mode.
Not often used in medical science.
EXAMPLE
Number of decayed teeth in 10 children:
2, 2, 4, 1, 3, 0, 10, 2, 3, 8
Mode = 2 (occurs 3 times)
Mean = 35/10 = 3.5
Median = (0, 1, 2, 2, 2, 3, 3, 4, 8, 10) = (2+3)/2 = 2.5
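The three averages for the decayed-teeth data can be checked with Python's standard statistics module:

```python
import statistics

decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

print(statistics.mean(decayed_teeth))    # 3.5  (sum of 35 over 10 children)
print(statistics.median(decayed_teeth))  # 2.5  (average of the two middle values, 2 and 3)
print(statistics.mode(decayed_teeth))    # 2    (occurs 3 times)
```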
Measures of dispersion / variation

Dispersion is the degree of spread or variation of a variable
about a central value.
Helps us to study the spread of values about the central value.
A good measure of dispersion should be simple to understand,
easy to compute, should be amenable to further analysis and
should not be affected by extreme values.
The most common measures of dispersion are-
1. Range
2. Mean deviation
3. Standard deviation.
RANGE -
It is defined as the difference between the highest and
lowest values in the series.
It gives no information about the values that lie between the
extreme values.
Mean Deviation
It is the average of the deviations from the arithmetic mean.

MD = Σ|x − x̄| / n
where x = each value, x̄ = mean, n = number of values

Eg.
Diastolic BP (x)   Mean (x̄)   Deviation from mean |x − x̄|
83                 81          2
75                 81          6
81                 81          0
79                 81          2
71                 81          10
95                 81          14
75                 81          6
77                 81          4
84                 81          3
90                 81          9
Total                          56

MD = 56/10 = 5.6
STANDARD DEVIATION
Most important and widely used measure of studying
dispersion.
Also known as root mean square deviation.
Greater the standard deviation, greater will be the
magnitude of dispersion from the mean.
A small SD means a higher degree of uniformity of the
observations.
STANDARD DEVIATION
• SD = √[ Σ(x − x̄)² / n ]   (if the sample size is more than 30)

• For sample size less than 30:
SD (sample SD) = √[ Σ(x − x̄)² / (n − 1) ]

• Steps in calculating the standard deviation -
1. Take the deviation of each value from the mean (x − x̄).
2. Square each deviation, (x − x̄)².
3. Add the squared deviations, Σ(x − x̄)².
4. Divide the result by the number of observations (n), or by (n − 1) for a sample.
5. Then take the square root, which gives the standard deviation.
Sample standard deviation calculation using the sample values 9, 2, 5, 4, 12, 7

SD = √[ Σ(x − x̄)² / (n − 1) ]
n − 1 = 6 − 1 = 5

x      x̄ (mean)   x − x̄   (x − x̄)²
9      6.5         2.5      6.25
2      6.5        -4.5     20.25
5      6.5        -1.5      2.25
4      6.5        -2.5      6.25
12     6.5         5.5     30.25
7      6.5         0.5      0.25
Total                      65.5

SD = √(65.5/5) = √13.1 = 3.619
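A short sketch verifying the sample SD above, both step by step and with Python's statistics module (which also uses the n − 1 denominator):

```python
import math
import statistics

values = [9, 2, 5, 4, 12, 7]
n = len(values)
mean = sum(values) / n                              # 6.5

sum_sq_dev = sum((x - mean) ** 2 for x in values)   # 65.5
sd = math.sqrt(sum_sq_dev / (n - 1))                # sqrt(13.1) ≈ 3.619

print(sd)
print(statistics.stdev(values))   # built-in sample SD, same result
```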
Test of significance -
A statistical hypothesis test (also known as test of
significance) is a method of statistical inference using
data from a scientific study. Test of significance helps to
decide on the basis of a sample data, whether a hypothesis
about the population is likely to be true or false.
This is a test to estimate or compare the significant
difference between 2 or more samples.
 The test verifies whether the differences are real or
have arisen due to variation in sampling (by chance).
 Hence the test of significance distinguishes between
real differences and chance differences.
Test of Significance of Difference in Means
 Tests of significance of difference in means are discussed
under two heads.
1. Z-test for large samples.
2. t-test for small samples applied as:
i. Unpaired t-test (two independent samples); and
ii. Paired t-test (single sample correlated observations).
 The two essential conditions for application of these tests are:
i. Samples are selected randomly from the corresponding
populations.
ii. There should be homogeneity of variances in the two
samples.
To test the homogeneity of variances-Fisher’s F-test
also called variance ratio test is applied.
 If the difference is found to be insignificant, there is
homogeneity of variances and the Z-test and/or t-test
can be carried out.
STANDARD ERROR OF MEAN (SE X̄ )
 Gives the standard deviation of the means of several
samples from the same population.
SE X̄=SD/√n
SD=standard deviation
n=no of observation
Standard error of proportion
Standard error of proportion may be defined as a unit
that measures variation which occurs by chance in the
proportions of a character from sample to sample or
from sample to population or vice versa in a qualitative
data.
Standard error of proportion(SE p)=√pq/n
p=population proportion
q=1-p
n=sample size.
Standard error of difference between two means

Group                Number   Mean   SD
Control group        12       318    10.2
Experimental group   12       370    24.1

 SE(d) between the means
= √[ (10.2)²/12 + (24.1)²/12 ]
= √(8.67 + 48.4)
= √57.07
= 7.5

The standard error of the difference between the two means is 7.5.
The actual difference between the two means is 52 (370 − 318),
which is more than twice the standard error of the difference
between the means, and is therefore significant.
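A quick check of the worked example, using the formula SE(d) = √(SD₁²/n₁ + SD₂²/n₂) applied above:

```python
import math

# Control group:      n = 12, mean = 318, SD = 10.2
# Experimental group: n = 12, mean = 370, SD = 24.1
n1, mean1, sd1 = 12, 318, 10.2
n2, mean2, sd2 = 12, 370, 24.1

se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # ≈ 7.55
diff = mean2 - mean1                             # 52

print(f"SE of the difference ≈ {se_diff:.2f}")
print(f"observed difference = {diff}, about {diff / se_diff:.1f} times its SE")
```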
Standard Error of Difference Between
Proportions
If the proportions of two samples are being
compared, the standard error of the difference
(with p1, p2 the sample proportions and q = 1 − p) is given by
SE(p1 − p2) = √( p1q1/n1 + p2q2/n2 )
t-Test
• It was designed by W.S. Gosset, whose pen name
was Student. Hence, this test is also called
Student's t-test.
• One of the most commonly used tests.
• "Student" derived a new distribution, known as the "t"
distribution, calculated as:
• t = (x̄1 − x̄2) / Smd
where x̄1 and x̄2 are the means of the two samples
and Smd is the standard error of this difference.
t-Test (contd.)
There are two types of Student's t-test: the unpaired t-test
and the paired t-test.

Criteria for applying the t-test-
Samples are randomly selected
Data utilized are quantitative
Variable normally distributed
Sample size less than 30

Two types-
PAIRED t-TEST: if observations are made on the same
unit of study before and after an intervention.
UNPAIRED t-TEST: if observations are made from 2
different groups.
T-TEST FOR COMPARING PAIRED
OBSERVATIONS
When each individual gives a pair of observations, the
paired t-test is used to test for the difference in the pair of
values.
The test procedure for testing the significance of the
difference is as follows-
1. Find the difference in each set of paired
observations, before and after (X1 − X2 = x).
2. Calculate the mean of the differences (x̄).
3. Work out the SD of the differences and then the SE of
the mean from the same.
4. Determine the 't' value by substituting the above values
in the formula:
t = x̄ / (SD/√n)
As per the null hypothesis, there should be no real
difference between the means of the two sets of observations.
5. Find the degrees of freedom; the degree of freedom is n − 1.
6. Compare the calculated t value with the table value
for (n − 1) d.f. to find the p value.
7. If the probability (P) is more than 0.05, the difference
observed has no significance. But if P is less than 0.05,
the difference observed is significant.
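A sketch of the paired t-test procedure on hypothetical before/after data (the blood pressure values are invented for illustration); scipy.stats.ttest_rel, assumed available, is used only to cross-check the step-by-step formula above.

```python
from math import sqrt
from statistics import mean, stdev

from scipy import stats   # assumed available; used only to cross-check the hand formula

# Hypothetical diastolic BP of 6 patients before and after an intervention (invented data)
before = [130, 128, 135, 140, 126, 133]
after  = [125, 126, 130, 135, 127, 129]

# Steps 1-2: differences for each pair and their mean
d = [b - a for b, a in zip(before, after)]
d_mean = mean(d)

# Steps 3-4: SD of the differences, SE of the mean difference, then t
n = len(d)
t = d_mean / (stdev(d) / sqrt(n))
print(f"t = {t:.2f} on {n - 1} degrees of freedom")   # ≈ 3.37 on 5 d.f.

# Cross-check with scipy's paired t-test
t_scipy, p = stats.ttest_rel(before, after)
print(f"scipy: t = {t_scipy:.2f}, p = {p:.3f}")
```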
ANALYSIS OF VARIANCE

 Analysis of variance — often ANOVA for short.

 It is a statistical technique for identifying sources of variation within
the data.

 ANOVA is frequently used in statistics to analyze data obtained from
experiments.

 Although ANOVA can be considered as an extension of the t-test, it is
more widely applicable than the t-test.

 While the t-test can be used only to test two group means, ANOVA
can test two or more than two group means.
 In ANOVA, we use an F-distribution. We draw a
sample from one population and a second
sample from another population, then we calculate
the F-ratio of the two variances by the formula

 F = S₁² / S₂²
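A minimal sketch, assuming scipy is available and using three invented groups, showing how the F-ratio and p-value are obtained from a one-way ANOVA:

```python
from scipy import stats   # assumed available

# Hypothetical plaque scores from three treatment groups (invented data)
group_a = [2.1, 2.4, 1.9, 2.6, 2.3]
group_b = [3.0, 2.8, 3.2, 2.9, 3.1]
group_c = [2.2, 2.5, 2.0, 2.4, 2.6]

f_ratio, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")
```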
ASSUMPTIONS for ANOVA

• Samples are random and independent of each other.

• The distribution of the dependent variable is normal. If
the distribution is skewed, the ANOVA may be
invalid.

• The groups should have equal variances.
Chi-square Test for Qualitative Data (χ² test)
• The chi-square test is an important non-parametric test
and it is used to make comparisons between two or
more nominal variables.

• Unlike other tests of significance, the chi-square is
used to make comparisons between frequencies.

• It was developed by Karl Pearson.

• If the calculated value of χ² is less than the table value
at a certain level of significance, the fit is considered
to be good and the null hypothesis is not rejected (the two
attributes are independent, or not associated) - called a
test of independence.

• But if the calculated value of χ² is greater than its
table value, the fit is not considered to be a good one
and the null hypothesis is rejected (the two attributes are
associated).

The χ² value is obtained by the formula:

χ² = Σ (O − E)² / E
O = observed frequency.
E = expected frequency.
Steps Involved In Applying Chi-square Test
EXAMPLE
To study the gender-wise prevalence of oral cancer

Gender    Oral cancer   No oral cancer   Total
Males     38            7                45
Females   29            17               46
Total     67            24               91

 Null hypothesis - There is no difference in the
prevalence of oral cancer in males and females.
Steps Involved In Applying Chi-square Test [cont…]

1) Calculate the expected frequencies on the basis of the
given hypothesis or on the basis of the null hypothesis.
Expected frequency = (Row total) × (Column total) / (Grand total)
Calculation of expected frequencies:
45/91 × 67 = 33.13
45/91 × 24 = 11.87
46/91 × 67 = 33.87
46/91 × 24 = 12.13
Steps Involved In Applying Chi-square Test [cont…]

2) Obtain the difference between the observed and expected frequencies
and find the squares of such differences, i.e., calculate (O − E)².

3) Divide the quantity (O − E)² obtained as stated above by the
corresponding expected frequency to get (O − E)²/E, and do this
for all the cell frequencies or the group frequencies.
Find the summation of (O − E)²/E, i.e., Σ (O − E)²/E.

 Calculation of the χ² value
= (38 − 33.13)²/33.13 + (7 − 11.87)²/11.87 + (29 − 33.87)²/33.87 + (17 − 12.13)²/12.13

= 0.72 + 2.00 + 0.70 + 1.96 = 5.38
Steps Involved In Applying Chi-square Test [cont…]

 Degree of freedom - The term "degree of freedom"
refers to the number of observations that are free to
vary.
Calculation of the degrees of freedom (df) for a
contingency table, based on the number of rows (r)
and columns (c):
Degrees of freedom = (r − 1) × (c − 1)
df = (2 − 1)(2 − 1) = 1
Steps Involved In Applying Chi-square Test [cont…]
Determination of the p value
Value from the chi-square table for 5.38 on 1 df: 0.01 < p < 0.025
(statistically significant).
Since the calculated value of 5.38 is higher than the table
value of 3.841 at the 5% level of significance, the null hypothesis is
rejected.
Interpretation - The results noted in this 2×2 table
are statistically significant because the observed p-value
is less than the alpha level (0.025 < 0.05).
That is, the investigator can reject the null hypothesis of
independence and accept the alternative hypothesis that
there is a significant difference in the prevalence of oral
cancer between males and females.
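The worked example above can be reproduced with scipy, assumed available; correction=False disables Yates' continuity correction so the result matches the hand calculation.

```python
from scipy.stats import chi2_contingency   # assumed available

# Observed 2x2 table: rows = males, females; columns = oral cancer, no oral cancer
observed = [[38, 7],
            [29, 17]]

# correction=False disables Yates' continuity correction, matching the hand calculation
chi2, p, dof, expected = chi2_contingency(observed, correction=False)

print(f"chi-square = {chi2:.2f} on {dof} df, p = {p:.3f}")   # ≈ 5.37, p ≈ 0.02
print(expected)   # ≈ [[33.13, 11.87], [33.87, 12.13]]
```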
Case control study
Retrospective study
A case–control study is an observational study in which
subjects are sampled based upon presence or absence of
disease and then their prior exposure status is
determined.
Distinct feature
a. Both exposure and outcome (disease) have occurred
before the start of the study.
b. The study proceeds backwards from effect to cause.
c. It uses a control or comparison group to support or refute
an inference.
Basic Steps
1. Selection of cases and controls
2. Matching
3. Measurement of exposure
4. Analysis and interpretation
Selection of cases and controls
 SELECTION OF CASES:
The prior definition of what constitutes a case is
crucial to the case-control study.
It involves 2 specifications:
1. Diagnostic criteria
2. Eligibility criteria

 SOURCE OF CASES: Cases can be drawn from
hospitals or the general population.
SELECTION OF CONTROLS
 THE CONTROLS MUST BE FREE FROM THE DISEASE UNDER
STUDY
 They must be as similar to the cases as possible, except for the
absence of the disease under study.
 SOURCES OF CONTROLS:
Hospitals
Relatives
Neighborhood
General population
 One control per case if many cases are available, or as many
as 2, 3 or even 4 controls per case if the study group is small.
Matching
Defined as the process by which we select controls in
such a way that they are similar to cases with regards
to certain pertinent selected variables (e g. age, sex,
occupation, social status etc. ) which are known to
influence the outcome of the disease.
MEASUREMENT OF EXPOSURE:
Information about exposure should be obtained in precisely
the same manner in both the cases and the controls.
 It can be recorded by interview, by questionnaires or
by studying past records.
ANALYSIS:
 The final step is analysis, which covers:
1. Exposure rates among cases and controls to a
suspected factor.
2. Estimation of disease risk associated with exposure.
EXPOSURE RATES
A case-control study provides a direct estimation of the
exposure rates to a suspected factor in the disease and non-
disease groups.

Exposure rates
Cases: a/(a+c)
Controls: b/(b+d)
Case-control study of smoking and lung cancer:

               Cases (with lung cancer)   Controls (without lung cancer)
Smokers        33 (a)                     55 (b)
Non-smokers    2 (c)                      27 (d)
Total          35 (a+c)                   82 (b+d)

Exposure rates:
Cases = a/(a+c) = 33/35 = 94.2%
Controls = b/(b+d) = 55/82 = 67%

This shows that smoking (the exposure) is found much more
frequently among the lung cancer cases than among the controls.
ESTIMATION OF RELATIVE RISK (Odds
ratio/relative odds)
The second analytical step is estimation of the disease risk
associated with exposure.
 Estimation of risk: it is obtained by an index termed the
"relative risk" or "risk ratio", which is defined as the
probability of an event (developing a disease) occurring in
exposed people compared to the probability of the event in
non-exposed people, or as the ratio of the two probabilities.
 Relative risk = risk in exposed / risk in non-exposed.
In a case-control study, the odds ratio = odds that a case was exposed / odds that a control was
exposed.
The odds ratio is a key parameter in the analysis of case-control
studies.
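Applying the odds-ratio definition to the smoking and lung cancer table above (a = 33, b = 55, c = 2, d = 27), a short sketch:

```python
# 2x2 table from the smoking and lung cancer example above
a, b = 33, 55   # smokers (exposed): cases, controls
c, d = 2, 27    # non-smokers (non-exposed): cases, controls

odds_cases = a / c        # odds that a case was exposed
odds_controls = b / d     # odds that a control was exposed

odds_ratio = odds_cases / odds_controls   # equivalently (a*d)/(b*c)
print(f"odds ratio = {odds_ratio:.1f}")   # ≈ 8.1, i.e. exposure positively associated with disease
```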
INTERPRETING ODDS RATIO
 OR = 1
-Odds of exposure among cases
and controls are same
-Exposure is not associated with
disease
 OR > 1
-Odds of exposure among cases
are higher than controls
-Exposure is positively associated
with disease
 OR < 1
- Odds of exposure among cases
are lower than controls
- Exposure is negatively associated
with disease
ADVANTAGES
 Relatively easy to carry out
 Rapid and inexpensive compared to a cohort study
 Requires comparatively few subjects
 No risk to the subject
 Allows the study of several different etiological factors,
like the effect of smoking, physical activity, diet

DISADVANTAGES
 Problems of bias
 Selection of an appropriate control group may be difficult
 Incidence cannot be measured; only relative risk can be estimated
 Not suited for evaluation of therapy or prophylaxis of disease
COHORT STUDY:
Definition: it is a type of study usually undertaken to
obtain additional evidence to refute or support the
existence of an association between a suspected cause
and disease.
 Features of cohort studies
1. Cohorts are identified prior to the appearance of the
disease under investigation.
2. The study groups are observed over a period of time
to determine the frequency of the disease among them.
3. The study proceeds forward from cause to effect.
COHORT SELECTION:
 A cohort is a group of people who share a common
characteristic or experience within a defined time
period.
 For selecting a cohort, the following facts should be
considered:
1. Cohorts must be free from the disease under study.
2. Both the study and control cohorts should be equally
susceptible to the disease under study.
3. Both cohorts must be comparable in respect of all
possible variables which may influence the frequency of
the disease.
STEPS OF COHORT STUDY
Selection of study subjects
 Obtaining data on exposure
 Selection of comparison groups
Follow up
 Analysis
SELECTION OF STUDY SUBJECTS
Subjects of a cohort study are usually assembled from the
general population or from special groups.

 OBTAINING DATA ON EXPOSURE:
Information about exposure is obtained directly from
any one or all of the following sources:
cohort members, review of records, medical
examinations or special tests, and environmental
surveys.
SELECTION OF COMPARISON GROUPS:
 There are many ways of assembling a comparison group.
1. Internal comparison: when a single cohort enters
the study, it can, on the basis of the information obtained,
be classified into several comparison groups according to
the degrees or levels of exposure to risk before the
development of the disease in question.
2. External comparison: when information on the degree of
exposure is not available within the cohort, it is necessary to put up an
external comparison group to evaluate the experience of the exposed
group.
3. Comparison with general population rates.
FOLLOW UP
 Follow up comprises:
1. Periodic medical examination of each member of the cohort
2. Reviewing physician and hospital records
3. Routine surveillance of death records
4. Mailed questionnaires, telephone calls and periodic home
visits.
 Limitation: A certain percentage of losses to follow up
are inevitable due to death, change of
residence, migration, or withdrawal from occupation.
These losses may bias the result.
 ANALYSIS: -
Incidence rates
Estimation of risks
 INCIDENCE RATES:
Among the exposed: a/(a+b)
Among the non-exposed: c/(c+d)
• ESTIMATION OF RISK
-Relative risk
-Attributable risk
-Absolute risk
Relative Risk
 It is the direct measure of the strength of the association
between the suspected cause and effect.
 Relative risk =
incidence of disease among exposed / incidence of disease among non-exposed
Interpretation:
 I exposed = I non-exposed : no association
 I exposed > I non-exposed : positive association
 I exposed < I non-exposed : negative association
Attributable Risk:
 It is the difference in the incidence rates of disease
between the exposed group and the non-exposed group.
 Attributable risk =
(incidence of disease among exposed − incidence of disease among non-exposed)
/ incidence of disease among exposed × 100
 It signifies the potential for prevention.
Absolute risk
 The risk of developing the disease, irrespective of
exposure to the risk factor, expressed as a percentage.
 = (incidence among exposed + incidence among non-exposed)
/ total study population × 100
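A sketch with a hypothetical cohort 2x2 table (invented numbers) showing how the incidence rates, relative risk and attributable risk follow from the formulas above, with AR taken in the usual form (Ie − Io)/Ie × 100:

```python
# Hypothetical cohort 2x2 table (invented numbers, for illustration only)
#                 developed disease   did not develop   total
# exposed:        a = 30              b = 270           300
# non-exposed:    c = 10              d = 490           500
a, b = 30, 270
c, d = 10, 490

incidence_exposed = a / (a + b)        # 0.10
incidence_nonexposed = c / (c + d)     # 0.02

relative_risk = incidence_exposed / incidence_nonexposed                                   # 5.0
attributable_risk = (incidence_exposed - incidence_nonexposed) / incidence_exposed * 100   # 80%

print(f"RR = {relative_risk:.1f}")
print(f"AR = {attributable_risk:.0f}% of the disease among the exposed is attributable to the exposure")
```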
Types of Cohort Study
Prospective cohort study
 Retrospective (historical) cohort study
 Combination of Retrospective and Prospective cohort
study.
ADVANTAGES
 Incidence can be calculated
 Several possible outcomes related to exposure can be
studied simultaneously
 Provide a direct estimation of relative risk
 Dose-response ratios can also be calculated

DISADVANTAGES
 Involve a large number of people
 Unsuitable for investigating uncommon diseases or
diseases with low incidence in the population
 Long time to complete the study
 Loss of a substantial proportion of the cohort due to
migration or loss of interest
 Expensive
RANDOMIZED CONTROLLED TRIAL
A randomized controlled trial (or randomized control
trial; RCT) is a type of scientific (often medical)
experiment that aims to reduce certain sources of bias
when testing the effectiveness of new treatments.
It is a trial in which subjects are randomly assigned to
one of two groups: one (the experimental group)
receiving the intervention that is being tested, and the
other (the comparison group or control) receiving an
alternative (conventional) treatment.
The two groups are then followed up to see if there are
any differences between them in the outcome.
