Biostatistics Word

MEGHNA INSTITUTE OF
DENTAL SCIENCES
MALLARAM, NIZAMABAD
SEMINAR TOPIC:
BIOSTATISTICS
PRESENTED BY
DR. Harish Kumar Thota
PG III YEAR
DEPARTMENT OF ORTHODONTICS
AND DENTOFACIAL ORTHOPAEDICS
MEGHNA INSTITUTE OF DENTAL SCIENCES
BIOSTATISTICS
Contents
▪ Introduction
▪ Sampling and sampling designs
▪ Collection of Data
▪ Presentation of Data
▪ Measures of central tendency
▪ Measures of variation or dispersion
▪ Normal distribution
▪ Null hypothesis
▪ Tests of statistical significance
▪ Conclusion
▪ References
INTRODUCTION
• “Statistics” comes from the Italian word ‘statista’, meaning statesman, or the German word ‘statistik’, meaning political state. The science of statistics is said to have originated from two main sources, viz., (1) government records and (2) mathematics. It started with the registration of heads of families in ancient Egypt and extended to the Roman census of military strength, births, deaths, etc.
• John Graunt (1620-1674), who was neither a physician nor a mathematician, is called the Father of Health Statistics.
• Statistics is a field of study concerned with the techniques or methods of collecting, classifying, summarizing and interpreting data, drawing inferences, testing hypotheses and making recommendations.
• Biostatistics is the term used when the tools of statistics are applied to data derived from the biological sciences. While conducting an oral examination, the investigator makes observations according to his judgement of the situation, which depends on his skill, knowledge and experience. When the same observer repeats the procedure, or another investigator performs it, there may be some variability in opinions. This variability in measurement can be handled using statistics. Epidemiology and statistics are called sister sciences. Epidemiology collects facts relating to groups of population in different places, times and situations, whereas biostatistics converts these facts into figures and then translates the figures back into facts to interpret the significance of the results. Statistics is also called the ‘science of figures’.
• Uses of biostatistics:
1. To test whether the difference between two population groups is real or due to chance.
2. To study the correlation between attributes in the same population.
3. To measure mortality and morbidity.
4. To fix priorities in public health programs.
5. To assess the state of oral health in the community and to determine the availability and utilization of dental care facilities.
6. To determine the success or failure of specific oral health care programs.
7. To evaluate the achievements of public health programs.
SAMPLING AND SAMPLING METHODS

• Sample- A sample is a part of a population, called the “universe”, “reference population” or “parent population”. It is basically a subset of a population.
• Sampling- the process of selecting a sample.
• Sampling frame- the set of sampling units from which a sample is to be selected, e.g., a list of names, places or ages.
Ideal requirements of a sample

1. Efficiency- the ability of the sample to yield the desired information.
2. Representativeness- the sample should not differ from the parent population.
3. Measurability- the sample design should be measurable, i.e., the investigator should be able to estimate the extent to which findings from the sample are likely to differ from those of the parent population.
4. Size- the sample should be large enough to minimize sampling variability and to allow estimates of population characteristics to be made with precision.
5. Coverage- adequate coverage is essential if the sample is to remain representative. Refusals, loss to follow-up and withdrawals make the sample non-representative.
6. Goal orientation- selection should be oriented towards the objectives and the research design.
7. Feasibility- the design should be simple enough to be carried out in practice.
8. Economy and cost efficiency- the sample design should save time and cost.
Types of sampling
Purposive or non-probability sampling
It is the procedure of selecting a sample from a population without the use of probability; individuals are selected deliberately (purposively).
Random or probability sampling
It is a sample in which every individual has a known chance of being selected; in simple random sampling, every individual has an equal chance.
Sampling methods
a) Simple random sampling – each and every unit has an equal chance of being selected, and selection is by chance alone. It is carried out either by the lottery method or with a table of random numbers. In the lottery method, the units are numbered on slips, the slips are shuffled and the investigator picks them blindfolded. In a table of random numbers, the digits 0-9 are arranged randomly, and units are selected by reading the table in a horizontal or vertical direction.
b) Systematic random sampling – one unit is selected at random, and additional units are then selected at evenly spaced intervals until a sample of the required size has been drawn.
c) Stratified random sampling – the population is subdivided into groups (strata), and simple random selection is done from each stratum.
d) Cluster sampling – the population forms natural groups (clusters), such as villages or the children of a school; the sample is then selected by any of the above methods.
e) Multiphase sampling – part of the information is collected from the whole sample and part from a subsample. E.g., students from a school are examined, students with malocclusion are selected, and then students with skeletal malocclusion are selected.
f) Multistage sampling – sampling is done in various stages: the first stage is to select the groups or clusters, and samples are then taken from them in subsequent stages, e.g.,
1st stage: choice of states within the country
2nd stage: choice of towns within each state
3rd stage: choice of neighborhoods within each town.
Errors of sampling
• Sampling errors – occur due to faulty sample design and small sample size.
• Non-sampling errors –
o Coverage errors are due to non-cooperation and non-response of the informants.
o Observational errors are due to imperfect experimental technique and interviewer bias.
o Processing errors may occur in statistical analysis.
COLLECTION AND PRESENTATION OF DATA

Data are a set of values of one or more variables recorded on one or more individuals. Data consist of discrete observations of people or events that carry little meaning when considered alone. They need to be transformed into information by reducing, summarizing and adjusting them in such a way that comparisons over time and place are possible. Data are of two types:
1. Qualitative data- when data are collected on the basis of attributes or qualities, like sex, malocclusion, cavity, etc.
2. Quantitative data- when data are collected through measurement, e.g., arch length, arch width. They are of two types:
a) Discrete- when the variable under observation takes only fixed values, like whole numbers, the data are discrete, e.g., DMF teeth.
b) Continuous- if the variable can take any value in a given range, including decimals, the data are continuous, e.g., arch length, mesiodistal width of the erupted teeth.
Sources- data can be collected from:
a) Primary source- data obtained by the investigator himself, collected by
i) direct personal interviews
ii) oral health examination
iii) questionnaire method
b) Secondary source- data already recorded are utilized to serve the objectives of the study, e.g., records of the out-patient department (OPD) of dental clinics.
There are two main types of data presentation:
1. Tabulation
2. Graphic presentation
• Charts – bar charts, pie charts, doughnut charts
• Diagrams – histograms, pictograms, maps
• Bar charts: they represent a set of data by the length of bars, which is proportional to the magnitude of the data. They are of three types: 1) simple bar, 2) multiple bar and 3) component bar.
• Pie chart: here, instead of comparing the lengths of bars, the areas of segments of a circle are compared. The area of each segment depends upon the percentage, which is converted to an angle (percentage × 3.6°) and drawn.
• Pictogram: here, pictures or symbols are used to present the data.
• Histogram: a set of vertical bars whose areas are proportional to the frequencies represented. Class intervals are given on the horizontal axis and frequencies along the vertical axis.
• Line chart: it shows trends or changes in data over time, at even intervals. It emphasizes the flow of time and the rate of change, rather than the amount of change.
• Frequency curve: a graphical display of a frequency table. The midpoints of the tops of the frequency bars are located and connected to form a polygon (frequency polygon).
MEASURES OF CENTRAL TENDENCY
▪ A measure of central tendency is the central (middle) value that serves as a single estimate of a series of data and enables comparison. The objectives of central tendency are to condense the entire mass of data and to facilitate comparison. There are three types: mean, median and mode.
Mean:
• It is the average value obtained by summing all observations and dividing by the total number of observations.
• It is the simplest method of measuring central tendency.
Merits
✓ It is easy to understand and calculate.
✓ It is based on all the values in the data.
✓ It is rigidly defined.
✓ It is not much affected by sampling fluctuations.
Demerits
X It cannot be calculated if any observations are missing.
X It is affected by extreme values.
X It cannot be located graphically.
X It may be a number that is not present in the given data.
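A small sketch of the mean, using hypothetical observations, illustrates two of the demerits above: one extreme value pulls the mean strongly upward, and the mean need not be one of the observed values.

```python
def mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)

scores = [2, 3, 3, 4, 5, 6, 7]          # hypothetical observations
with_extreme = scores[:-1] + [70]       # replace the largest value with an extreme one

print(mean(scores))         # uses every value; about 4.29, not itself in the data
print(mean(with_extreme))   # about 13.29, pulled upward by one extreme value
```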
Median:
▪ The data are arranged in either ascending or descending order, and the value of the middle observation is located. When the number of observations is even, the median is the average of the two middle values.
Merits
✓ It is rigidly defined.
✓ It is easy to calculate and understand.
✓ It is not affected by extreme values.
✓ In many cases, it can be located just by inspection.
✓ It can be located on a graph.
✓ It can be calculated for data based on an ordinal scale.
Demerits
X It is not based upon all the values of the given data.
X In the case of large samples, it is difficult to arrange the data in order.
X It is not capable of further mathematical treatment.
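The ordering rule can be sketched as follows (the data are hypothetical). Note that the extreme value 70 does not change the result; Python's standard `statistics.median` follows the same odd/even rule.

```python
def median(values):
    """Arrange in ascending order and locate the middle observation."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values

print(median([7, 2, 5, 3, 70]))      # odd number of observations -> 5
print(median([7, 2, 5, 3, 70, 4]))   # even number of observations -> 4.5
```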
Mode:
▪ The mode is the predominant or most commonly occurring value in a distribution of data. Sometimes there may be no single mode, or the distribution may be bimodal, trimodal or multimodal.
Merits
✓ It is easy to understand and calculate.
✓ It is not affected by extreme values.
✓ It can be calculated even if the extreme values are unknown.
✓ It is applicable to both qualitative and quantitative data.
Demerits
X It is not rigidly defined.
X It is not based upon all the values of the data.
X It is not capable of further mathematical treatment.
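Because a distribution may have one mode or several, a sketch that returns every most-frequent value is more faithful than one returning a single number. It uses the standard `collections.Counter`. When every value occurs equally often, it returns all of them, which corresponds to the "no single mode" case. The inputs are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (one, two or more modes)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, c in counts.items() if c == top]

print(modes([2, 3, 3, 4, 5]))                                  # single mode
print(modes([2, 2, 3, 4, 4]))                                  # bimodal
print(modes(["Class I", "Class II", "Class I", "Class III"]))  # qualitative data
```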
MEASURES OF VARIATION / DISPERSION
▪ The scatter, or variation, of observations from their average is called dispersion. The objective is to study the variability of the data and to account for it.
▪ Types – range, mean deviation, standard deviation.
Range:
▪ The difference between the highest and lowest values in the given data is called the range. It is the simplest measure of dispersion.
Merits
✓ It is easy to understand.
✓ It can be calculated quickly.
Demerits
X Its value fluctuates with the size of the sample.
X It is unstable in repeated sampling.
X It is not suitable for precise and accurate studies.
X It is of little practical importance, as it indicates only the two extreme values and tells nothing about the dispersion of values between them.
Example: 1 2 3 4 5 6 7 8 9
The lowest value is 1 and the highest value is 9, hence the range of these values is 9 - 1 = 8.
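A minimal sketch of the range on the example data. The second, hypothetical data set has the same range but a very different spread, which illustrates why the range says nothing about the values between the extremes.

```python
def value_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)

print(value_range([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # 9 - 1 = 8
print(value_range([1, 5, 5, 5, 5, 5, 5, 5, 9]))   # also 8, despite clustering at 5
```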
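The Mean Deviation:
• It is the average of the absolute deviations of the observations from the arithmetic mean.
$$\text{M.D.} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$
where $X_i$ are the observations, $\bar{X}$ is their mean and $n$ is the number of observations.
Example: 1 2 3 4 5 6 7 8 9
Mean = 45/9 = 5. The absolute deviations from the mean are 4, 3, 2, 1, 0, 1, 2, 3, 4, which add up to 20, so M.D. = 20/9 ≈ 2.22.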
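The mean deviation takes absolute values because signed deviations from the mean always add up to zero. A short sketch on the example data 1 to 9:

```python
def mean_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(round(mean_deviation(data), 2))   # 20 / 9, about 2.22
```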
Standard Deviation:
• In simple terms, it is defined as the "root mean square deviation". The standard deviation is the most frequently used measure of dispersion.
Steps:
1. First, take the deviation of each value from the arithmetic mean.
2. Then, square each deviation.
3. Add up the squared deviations.
4. Divide the result by the number of observations n [or by (n - 1) when the sample size is less than 30].
5. Then take the square root, which gives the standard deviation.
$$\text{S.D.} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}$$
(with $n - 1$ in place of $n$ for small samples, as in step 4).
• It gives us an idea of the spread (dispersion) of the observations.
• The larger the standard deviation, the greater the dispersion of values about the mean.
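The five steps listed above translate almost line by line into code. This is a sketch in which the `sample` flag selects the (n - 1) divisor; when to use it follows the text's small-sample rule, although many statistical packages use n - 1 for any sample.

```python
import math

def standard_deviation(values, sample=False):
    """Root mean square deviation, following the five steps above."""
    n = len(values)
    m = sum(values) / n
    deviations = [x - m for x in values]          # step 1: deviation from the mean
    squared = [d * d for d in deviations]         # step 2: square each deviation
    total = sum(squared)                          # step 3: add them up
    divisor = n - 1 if sample else n              # step 4: divide by n (or n - 1)
    return math.sqrt(total / divisor)             # step 5: square root

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(round(standard_deviation(data), 3))               # divide by n: 2.582
print(round(standard_deviation(data, sample=True), 3))  # divide by n - 1: 2.739
```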
NORMAL DISTRIBUTION
The shape of the normal curve depends upon the mean and standard deviation, which in turn depend upon the number and nature of the observations.
In a normal curve:
o The area within one standard deviation on either side of the mean includes approximately 68 per cent of the values in the distribution; the area within two standard deviations on either side of the mean covers approximately 95 per cent; and the area within three standard deviations includes 99.7 per cent of the values. These limits on either side of the mean are called "confidence limits".
Standard normal curve
1. The standard normal curve is a smooth, bell-shaped, perfectly symmetrical curve based on an infinitely large number of observations.
2. The total area under the curve is 1; its mean is 0 and its standard deviation is 1.
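The 68/95/99.7 figures can be checked from the standard normal curve itself. For a standard normal variable Z, the area between -k and +k equals erf(k/√2), available as `math.erf`. The value k = 1.96 is included because it is the multiple that covers almost exactly 95 per cent.

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    # For the standard normal curve, P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(k, round(area_within(k) * 100, 1), "%")
```

This prints about 68.3 %, 95.0 %, 95.4 % and 99.7 %.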
Tests of significance
Different samples drawn from the same population give different estimates. The difference between the estimates is called sampling variability. Hence, when dealing with two or more samples, one is interested in knowing whether the difference in the values is due to sampling variation or not.

Null hypothesis:
The first step in testing a hypothesis is to set up an appropriate hypothesis for the problem. The null hypothesis asserts that there is no real difference between the sample and the population in the particular matter under consideration, and that the difference found is accidental, arising out of sampling variation. E.g., to test the association between thumb sucking and upper anterior proclination, the null hypothesis would be "there is no association between thumb sucking and upper anterior proclination".
Type I and type II errors
Even in the best research, there is a possibility that the researcher will make a mistake regarding the relationship between two variables. There are two possible errors:
o Type I
o Type II
Type I error (false positive)
• Occurs if the investigator rejects a null hypothesis that is actually true in the population.
Type II error (false negative)
• Occurs if the investigator fails to reject a null hypothesis that is actually false in the population.
Tests of statistical significance:
Parametric tests
Z-test
It is used to test the significance of the difference between means for large samples.
Prerequisites:
The sample must be randomly selected, and the data must be quantitative. The variable measured is assumed to follow a normal distribution in the population. The sample size should be greater than 30.
$$z = \frac{X - \bar{X}}{SD}$$
where $X$ is the observed value, $\bar{X}$ the mean and $SD$ the standard deviation.
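The z formula in the text standardizes a single value. For comparing two large-sample means, a commonly used form (an addition here, not stated in the text) divides the difference between the means by its standard error, SE = √(s1²/n1 + s2²/n2). The difference is then taken as significant at the 5% level when |z| > 1.96. The arch-width figures are hypothetical.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z for the difference between two large-sample means."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
    return (mean1 - mean2) / se

# Hypothetical mean intermolar widths (mm) in two groups of 50 and 60 subjects
z = two_sample_z(mean1=45.2, sd1=2.5, n1=50, mean2=44.0, sd2=2.8, n2=60)
print(round(z, 2))   # about 2.37
print("significant at 5% level" if abs(z) > 1.96 else "not significant at 5% level")
```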
T test
When the sample size is small, the t test is used to test the hypothesis. It was designed by W.S. Gosset, whose pen name was "Student"; hence it is called Student's t test. The t ratio is the ratio of the observed difference between the means of two small samples to the standard error of the difference between them.
There are two types:
• Unpaired t test
• Paired t test [1]
The paired t test is applied when there is a pair of data from a single element in an observation, e.g., data collected before and after an intervention, so that the same group acts as both case and control; the means of the two sets of observations are then compared.
Example: two BP measurements on the same person using different equipment.
The unpaired t test is used to compare the means of two independent or unrelated groups to determine whether there is a difference between them.
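Both forms of the t ratio can be sketched with the standard library. The paired version works on the within-person differences. The unpaired version assumes equal variances and pools them, which is the classical Student's form and one of several variants. All readings are hypothetical.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations divided by (n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def paired_t(first, second):
    """Paired t: mean of the differences divided by its standard error; df = n - 1."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / math.sqrt(sample_variance(diffs) / n), n - 1

def unpaired_t(group1, group2):
    """Student's unpaired t with a pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * sample_variance(group1) + (n2 - 1) * sample_variance(group2)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))   # standard error of the difference
    return (mean(group1) - mean(group2)) / se_diff, n1 + n2 - 2

# Hypothetical systolic BP (mmHg) of the same 5 people on two devices
device_1 = [120, 130, 125, 140, 135]
device_2 = [115, 128, 120, 132, 130]
t_paired, df_paired = paired_t(device_1, device_2)

# Hypothetical scores in two independent groups of 5
group_a = [4, 5, 6, 7, 8]
group_b = [2, 3, 4, 5, 6]
t_unpaired, df_unpaired = unpaired_t(group_a, group_b)

print(round(t_paired, 2), df_paired)       # about -5.27 with 4 df
print(round(t_unpaired, 2), df_unpaired)   # 2.0 with 8 df
```

Each |t| is then compared with the table value for its degrees of freedom. The two-tailed 5% values are about 2.776 for 4 df and 2.306 for 8 df, so only the paired difference would be significant here.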

ANOVA
Analysis of variance (ANOVA) is used because many research problems involve comparing more than two groups. If the design includes only one independent variable, the technique is called one-way ANOVA. If there are two or more factors, two-way ANOVA, or more generally N-way analysis of variance, is used.
Analysis of covariance: In many experiments, the outcome of a variable depends on the magnitude of that variable before the experimental units are subjected to experimentation. As such, it may be necessary to analyze the outcome values in relation to the initial values. In other cases, the outcome of a particular variable may depend on the outcome of another variable, and it is desired to analyze the significance of the effect of this variable on the outcome of the experimental variable. This technique, analysis of covariance (ANCOVA), combines features of analysis of variance and regression analysis.
ONE-WAY ANOVA (F TEST)
In a study, an investigator wanted to study the effect of drugs A and B on blood pressure. Subjects were randomly allocated into four groups:
o Those taking drug A alone.
o Those taking drug B alone.
o Those taking both A and B.
o Those taking placebo.
The difference between pre-treatment and post-treatment systolic blood pressure is determined, and the mean difference is calculated for each group. The F test is a kind of "super t test" that allows investigators to compare more than two means simultaneously. The null hypothesis for the F test is that the mean change in blood pressure is the same in all groups, indicating that all samples came from the same population.
The ANOVA test uses two measures of variance:
1. Between-group variance, which is based on the variation of the group means.
2. Within-group variance, which is based on the variation within each group.
$$F = \frac{\text{between-group variance}}{\text{within-group variance}}$$
If the F ratio is fairly close to 1, the two estimates of variance are similar, and the null hypothesis that all of the means came from the same underlying population is not rejected. If the ratio is much larger than 1, there must have been some group differences.
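The F ratio for a design like the four-group drug study can be sketched as follows. Each variance estimate is a sum of squares divided by its degrees of freedom (k - 1 between groups, N - k within groups). The blood-pressure falls are hypothetical, three subjects per group.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n_total = len(groups), len(all_values)
    # between-group sum of squares: spread of group means about the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares: spread of observations about their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k

# Hypothetical falls in systolic BP (mmHg), three subjects per group
drug_a  = [10, 12, 14]
drug_b  = [8, 10, 12]
both    = [16, 18, 20]
placebo = [2, 4, 6]

f_ratio, df_b, df_w = one_way_anova([drug_a, drug_b, both, placebo])
print(f_ratio, df_b, df_w)   # 25.0 3 8
```

An F of 25 is far larger than 1 (the tabulated 5% value for 3 and 8 df is about 4.07), so the null hypothesis of equal mean changes would be rejected for these made-up data.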
TWO-WAY ANOVA
The two-way ANOVA compares the mean differences between groups that have been split on two independent variables (called factors). The primary purpose of a two-way ANOVA is to understand whether there is an interaction between the two independent variables on the dependent variable. For example, you could use a two-way ANOVA to understand whether there is an interaction between gender and educational level on test anxiety amongst university students, where gender (male/female) and education level (undergraduate/postgraduate) are the independent variables and test anxiety is the dependent variable. Alternatively, you may want to determine whether there is an interaction between physical activity level and gender on blood cholesterol concentration in children, where physical activity (low/moderate/high) and gender (male/female) are the independent variables and cholesterol concentration is the dependent variable. The interaction term in a two-way ANOVA tells you whether the effect of one independent variable on the dependent variable is the same for all values of the other independent variable (and vice versa). For example, is the effect of gender (male/female) on test anxiety influenced by educational level (undergraduate/postgraduate)? [2]
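Tukey test
• Once an ANOVA results in the rejection of the null hypothesis, the only conclusion we have is that not all means are equal; we do not know which means are different. The Tukey (honestly significant difference, HSD) test is used to find out which pairs of means differ.
Steps
1. Determine how many pairs of means there are (for k groups, k(k - 1)/2 pairs).
2. For each pair of means, set up a pair of hypotheses (the two means are equal / the two means differ).
3. Repeat for all the pairs of means.
4. Calculate the critical value and compare each difference between means with it.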
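A sketch of these steps, assuming equal group sizes, computes the critical difference HSD = q × √(MS within / n) and compares it with every pairwise difference. The group means are hypothetical. The studentized range value q = 4.53 is an assumed table value for 4 groups, 8 within-group degrees of freedom and α = 0.05, and should be checked against a published table before use.

```python
import math
from itertools import combinations

def tukey_hsd_pairs(means, ms_within, n_per_group, q_crit):
    """Flag pairs of group means whose difference exceeds Tukey's HSD.

    q_crit must be taken from a studentized range table for the number of
    groups, the within-group degrees of freedom and the chosen alpha.
    """
    hsd = q_crit * math.sqrt(ms_within / n_per_group)   # honestly significant difference
    results = []
    for (name_i, m_i), (name_j, m_j) in combinations(means.items(), 2):
        diff = abs(m_i - m_j)
        results.append((name_i, name_j, diff, diff > hsd))
    return hsd, results

# Hypothetical group means, MS(within) = 4 with 3 subjects per group
means = {"A": 12.0, "B": 10.0, "A+B": 18.0, "Placebo": 4.0}
hsd, pairs = tukey_hsd_pairs(means, ms_within=4.0, n_per_group=3, q_crit=4.53)  # assumed q
print(f"HSD = {hsd:.2f}")
for a, b, diff, significant in pairs:
    print(a, "vs", b, diff, "different" if significant else "not different")
```

With four groups there are 4 × 3 / 2 = 6 pairs, and here every pair except A vs B exceeds the HSD of about 5.23.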
Non-parametric tests
Chi-square test
• It was developed by Karl Pearson. When data are measured in terms of attributes or qualities, and the aim is to test whether the difference in the distribution of attributes in different groups is due to sampling variation or not, the chi-square test is used. It is used to test the significance of the difference between two proportions, and it can also be used when more than two groups are to be compared. E.g., consider two groups: those who have the habit of thumb sucking and those who do not.
                                      Occurrence of malocclusion
Group                                 Present   Absent   Total
Those who did not suck their thumb       10        40       50
Those who sucked their thumb             32         8       40
Total                                    42        48       90

Steps
1. State the null hypothesis.
• To test whether there is an association between thumb sucking and malocclusion, the null hypothesis would be "there is no association between thumb sucking and malocclusion".
2. Calculate the expected numbers.
• Overall, 42/90 ≈ 0.47 of the children have malocclusion and 48/90 ≈ 0.53 do not. Among those who sucked their thumb (40 children), the expected number with malocclusion = 40 × 0.47 = 18.8.
• The expected number without malocclusion = 40 × 0.53 = 21.2.
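The whole calculation can be sketched for any table of counts. Each expected count is (row total × column total) / grand total, and the statistic sums (O - E)²/E over all cells. The 2 × 2 counts below are hypothetical, chosen only so that the totals match the example (42 with and 48 without malocclusion, 90 in all). Yates' continuity correction, which is sometimes applied to 2 × 2 tables, is deliberately left out.

```python
def chi_square(table):
    """Pearson chi-square for a table of observed counts (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        exp_row = []
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand   # expected = row total x column total / N
            exp_row.append(e)
            chi2 += (observed - e) ** 2 / e
        expected.append(exp_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Hypothetical counts: rows = non-thumb-suckers, thumb suckers;
# columns = malocclusion present, absent
observed = [[10, 40],
            [32, 8]]
chi2, df, expected = chi_square(observed)
print(round(chi2, 2), df)                               # about 32.14 with 1 df
print([[round(e, 1) for e in row] for row in expected])
```

The exact expected count for a row of 40 children is 40 × 42/90 ≈ 18.7, slightly below 18.8 because 0.47 is a rounded proportion. A chi-square above 3.84 (1 df) is significant at the 5% level.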
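Mann-Whitney U test (procedure)
• All the observations in the two samples are ranked numerically from smallest to largest without regard to the group. The observations belonging to sample I and sample II are then identified, and the sum of ranks for sample I (R1) and for sample II (R2) is determined separately. The difference between the two sums is T = R1 - R2. The U value is then calculated as
$$U_1 = R_1 - \frac{n_1(n_1 + 1)}{2}, \qquad U_2 = R_2 - \frac{n_2(n_2 + 1)}{2}$$
where $n_1$ and $n_2$ are the sample sizes, and the smaller of $U_1$ and $U_2$ is taken as U. If the resulting P value is less than or equal to 0.05, the null hypothesis, i.e., that the samples have been drawn from the same population, is rejected [4].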
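The ranking procedure can be sketched as follows, assuming the usual convention that tied values share the average of their ranks. The two small samples are hypothetical. The resulting U would then be compared with a Mann-Whitney table, or converted to a P value, which is not shown here.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(sample1, sample2):
    """Rank both samples together, sum ranks per sample, then compute U1, U2 and U."""
    ranks = average_ranks(list(sample1) + list(sample2))
    n1, n2 = len(sample1), len(sample2)
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return r1, r2, u1, u2, min(u1, u2)

# Hypothetical scores in two independent samples
sample_1 = [3, 5, 8]
sample_2 = [1, 2, 4, 6]
r1, r2, u1, u2, u = mann_whitney_u(sample_1, sample_2)
print("R1 =", r1, "R2 =", r2, "T =", r1 - r2)
print("U1 =", u1, "U2 =", u2, "U =", u)
```

A useful check is that U1 + U2 always equals n1 × n2 (here 9 + 3 = 12).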
Meta-analysis
It is a quantitative approach that systematically combines the results of previous research to arrive at conclusions about the body of research. It collects data from individual studies and identifies heterogeneity in effect among multiple studies.
Steps
1. Define the research question and specific hypothesis.
2. Define the criteria for including and excluding studies.
3. Locate research studies.
4. Determine which studies are eligible for inclusion.
5. Classify and code important study characteristics (e.g., sample size, length of follow-up, etc.).
6. Select or translate the results of each study into a common metric.
7. Aggregate the findings across studies, generating weighted pooled estimates of effect size.
8. Evaluate the statistical homogeneity of the pooled studies [5].
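Steps 7 and 8 can be sketched with one common choice, the fixed-effect inverse-variance method. Each study is weighted by 1/SE², and homogeneity is checked with Cochran's Q and the derived I² statistic. This method is an assumption made for illustration, not necessarily the one described in reference [5]; random-effects models are the usual alternative when heterogeneity is present. The three study results are hypothetical.

```python
import math

def fixed_effect_pool(effects, std_errors):
    """Inverse-variance weighted pooled estimate with Cochran's Q and I^2."""
    weights = [1 / se**2 for se in std_errors]          # more precise studies weigh more
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))   # heterogeneity statistic
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0      # share of variation due to heterogeneity
    return pooled, pooled_se, q, i2

# Hypothetical effect sizes (e.g. mean differences) and standard errors from three studies
pooled, pooled_se, q, i2 = fixed_effect_pool([0.2, 0.5, 0.3], [0.1, 0.2, 0.1])
print(round(pooled, 3), round(pooled_se, 3), round(q, 2), round(i2, 2))
```

Here Q (about 1.89) is below its 2 degrees of freedom, so I² is 0 and these made-up studies look homogeneous.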
SPSS- Stands for Statistical Package for the Social Sciences. It is a software package used for interactive, computerized statistical analysis.
References
1. Peter S. Essentials of Public Health Dentistry. 6th ed.
2. Kim H-Y. Analysis of variance (ANOVA) comparing means of more than two groups. Restor Dent Endod. 2014;39:74–77.
3. Abdi H, Williams LJ. Honestly significant difference (HSD) test. In: Salkind NJ, Dougherty DM, Frey B, editors. Encyclopedia of Research Design. Thousand Oaks, CA, USA: Sage; 2010. p. 583–585.
4. McHugh ML. The chi-square test of independence. Biochemia Medica. 2013;23(2):143–149.
5. Russo MW. How to review a meta-analysis. (N Y). 2007;3:637–642.
