BIOSTATISTICS AND RESEARCH METHODOLOGY
PRESENTED BY
Dr. Aswathi S Nair
1st year MDS
Department of Periodontology
CONTENTS
• What is statistics?
• Biostatistics-Definition & Uses
• Data: Definition, types of data, collection of data, presentation of data
• Measures of central tendency
• Measures of variability
• Normal Distribution & Curve
• Probability
• Tests of significance
• Correlation & Regression
• Report Writing
What is Statistics?
According to Croxton and Cowden:
Statistics is defined as the collection, presentation, analysis and
interpretation of numerical data.
DESCRIPTIVE STATISTICS
• Organizing and summarizing data using numbers and graphs.
• Simplify, describe and present data.
• Reduce information to a convenient form.
• Describe the characteristics of the sample or population.
INFERENTIAL STATISTICS
• Using sample data to make an inference or draw a conclusion about the population.
• Use sample data to study associations, or to compare differences or predictions about a larger set of data.
• The objective is to draw conclusions about the population.
Parameters:
• A characteristic of the population in which we have a particular interest.
• Examples: The proportion of the population that would respond to a certain drug.
The association between a risk factor and a disease in a population.
SAMPLES AND STATISTICS
Types of variables
• Quantitative / Numerical: Discrete, Continuous
• Qualitative / Categorical
Qualitative Variable:
Quantitative Variable:
• It is a characteristic of people or objects that can be naturally
expressed in a numeric value.
E.g.: Age, height, bond strength
Discrete Variable:
It is a random variable that can take on a finite number of values or a
countably infinite number (as many as there are whole numbers) of values.
E.g.: The size of a family.
The number of DMFT teeth, T. T can be any one of the 33 numbers,
0, 1, 2, 3, …, 32.
Continuous Variable:
It is a random variable that can take on a range of values on a continuum,
i.e., its range is uncountably infinite.
E.g.: Treatment time, temperature, torque value on tightening an implant
abutment.
Confounding Variable:
• For example, the distance between the 8th and 9th points on the scale is the same as that
between the 3rd and 4th.
• Date is a very widely used interval scale variable.
• There is no absolute zero, so it is not possible to say that the 9th value is 3 times that of
the 3rd.
E.g.:
• IQ score representing the level of intelligence. An IQ score of 0 is not indicative of no
intelligence.
• Temperature in °C on 4 successive days
Day : A B C D
4. Ratio Scale
• The highest level of measurement
• Incorporates the properties of nominal, ordinal and interval
scales
• Includes an absolute zero; in addition, all mathematical
procedures of +, -, x and / are possible
• Examples are length and mass; for example, a length of 150 mm is
three times as long as 50 mm
• Possesses all the characteristics of interval measurement, and
there exists a true zero. E.g.: weight in pounds of 6 individuals:
136, 124, 148, 118, 125, 142
• Besides heights and numbers, ratio scales include weights (mg,
g), volumes (cc, cu.m), capacities (ml, l), rates (cm/sec, km/h)
and lengths of time (h, yr), etc.
• There cannot be negative measurements
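The difference between an interval scale and a ratio scale can be sketched in a few lines. The length figures are the ones quoted above; the two Celsius temperatures are hypothetical values chosen only for illustration.

```python
# Ratio scale: length has a true zero, so ratios are meaningful.
length_ratio = 150 / 50  # 150 mm is three times as long as 50 mm

# Interval scale: 0 degrees Celsius is an arbitrary zero, so a Celsius
# ratio is not meaningful. The two temperatures below are hypothetical.
temp_a_c, temp_b_c = 30.0, 10.0
naive_ratio = temp_a_c / temp_b_c  # numerically 3.0, but not "three times as hot"


def celsius_to_kelvin(celsius):
    """Convert to kelvin, a temperature scale with an absolute zero."""
    return celsius + 273.15


# On a scale with a true zero the same two days differ by only about 7%.
kelvin_ratio = celsius_to_kelvin(temp_a_c) / celsius_to_kelvin(temp_b_c)

print(length_ratio)
print(naive_ratio)
print(round(kelvin_ratio, 3))
```

The same pair of temperatures gives a very different ratio once the zero point is absolute, which is why ratios are only interpreted on ratio scales.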
Data
• Data are the quantities (numbers) or
qualities measured or observed that are to be
collected and analysed.
• The term “data” refers to the kinds of
information researchers obtain on the
subjects of their research. Fraenkel &
Wallen (2000)
• Collective recording of observations is data.
• Main sources: experiments, surveys,
records (census, public reports)
• Demographic data: details of population
Are the data reliable and valid?
Validity: Are you measuring what you think you are measuring?
Suitability:
• Direct personal observation is adopted in the following cases:
• Where greater accuracy is needed
• Where the field of enquiry is not large
• Where confidential data are to be collected
• Where sufficient time is available
Merits:
• Original data are collected.
• True and reliable data can be included.
• Response will be more encouraging, because of personal approach.
• A high degree of accuracy can be aimed.
Demerits:
• It is unsuitable where the area is large.
• It is expensive and time-consuming.
• An untrained investigator will not bring good results.
• One has to collect information according to the convenience of the informant.
Indirect oral interview
• The investigator approaches the witness or third parties, who are in touch with
the informant.
• The enumerator interviews the people, who are directly or indirectly connected
with the problem under the study.
• Generally this method is employed by different enquiry committees and
commissions. The police department generally adopts this method to get clues of
thefts, riots, murders, etc.
Suitability:
It is more suitable when the area to be studied is large.
It is used when direct information cannot be obtained. This system is generally
adopted by governments.
Merits
• It is simple and convenient.
• It saves time, money and labor.
• It can be used in the investigation of a large area.
• Adequate information can be had.
Demerits
• The information cannot be relied on because of the absence of direct contact.
• Interview with an improper man will spoil the results.
• In order to get the real position, a sufficient number of people are to be interviewed.
• The careless attitude of the informant will affect the degree of accuracy.
Information through agencies
• Local agents or correspondents are appointed; they collect the
information and transmit it to the office or person.
• They do so according to their own ways and tastes.
• This system is adopted by newspapers, agencies, etc., when
information is needed in different fields. The informants are generally
called correspondents.
• Suitability: In those cases where the information is to be obtained at
regular intervals from a wide area.
Merits
• Extensive information can be had.
• It is the most cheap and economical method.
• Speedy information is possible.
• It is useful where information is needed regularly.
Demerits
• The information may be biased.
• Degree of accuracy cannot be maintained.
• Uniformity cannot be maintained.
• Data may not be original.
Mailed Questionnaires
Demerits
• We cannot be sure about the accuracy and reliability of the data.
• There is a long delay in receiving the duly filled-in questionnaires.
Secondary Data
SIGNIFICANCE OF TABULATION
• Simplifies complex data
• Unnecessary details and repetitions of data are avoided in tabulation
• Facilitates comparison
• Gives identity to data
• Reveals patterns within the figures which cannot be seen in the narrative
form
RULES OF TABULATION
• A number should be assigned to the table (Table No.).
• A title should be given to the table; it should be concise and self-explanatory.
• Contents of the table should be defined clearly.
• Subtitles should be properly mentioned with columns and rows.
• Group intervals in columns and rows should neither be too narrow nor too wide.
• They should also be mutually exclusive.
• Unit of measurement must be mentioned clearly wherever necessary.
• Any short forms/symbols, if used, should be explained in the footnote.
• No place should be left blank in the body of the table.
• There should be logical arrangement of data in the table.
Simple Table:
They are one-way tables which supply answers to questions
about one characteristic of data only.
Master Tables:
They are tables which contain all the data obtained from a
survey.
Reference tables (General purpose or primary tables)
• These tables present the original data for
reference purposes.
• They contain only absolute and actual
figures and round numbers or
percentages.
• E.g.: Tables in census records, appendices
of publications.
TEXT TABLES (SPECIAL PURPOSE OR DERIVATIVE TABLES)
• Constructed to present selected
data from one or more general
purpose tables.
• It brings out a specific point or
answers a specific question.
• It includes ratios, percentages,
averages etc.
• It should be found in the body of
the text.
VISUAL DATA SUMMARIES
QUANTITATIVE/CONTINUOUS/MEASURED DATA
• Histogram
• Frequency polygon
• Frequency curve
• Line chart/graph
• Cumulative frequency diagram
• Scatter/dot diagram
QUALITATIVE/DISCRETE
• Bar Diagram
• Pie Sector Diagram
• Pictogram
• Map Diagram
• Impact on imagination
• Better retained in memory
• Easy comparisons
ADVANTAGES
• They are attractive
• They give a bird’s eye-view of the data
• They can be easily understood by common men
• They facilitate comparison of various characteristics
• The impressions created by them are long lasting
• Theorems and results of statistics can be visualized using graphs
DISADVANTAGES
• They are visual aids. They cannot be considered as alternatives for
numerical data.
• Though theories and results could be easily visualized by diagrams and
graphs, mathematical rigour cannot be brought in
• Diagrams and graphs are not as accurate as tabular data. Only tabular data
can be used for further analysis.
• By diagrammatical and graphical misrepresentation observers can be
misled easily. It is possible to create wrong impressions using diagrams
and graphs.
HISTOGRAM
• A histogram is a special sort of bar chart.
• The successive groups of data are linked in a definite numerical order.
• Represented by a set of rectangular bars.
• The variable (class) is taken along the X-axis and frequency along the Y-axis.
• With the class intervals as base, rectangles with height proportional to
class frequency are drawn.
• The set of rectangular bars so obtained gives the histogram.
Note:
• The total area of the rectangles in a histogram represents total frequency.
• If the frequency distribution has inclusive class intervals, they should be
converted into exclusive type.
• Mode of the distribution can be obtained from the histogram (from the
highest rectangular bar).
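As a rough sketch of how the bars of a histogram are obtained, the snippet below counts observations in exclusive class intervals (lower limit included, upper limit excluded) and picks the modal class from the tallest bar. The data values, the starting point 55 and the class width 5 are illustrative assumptions, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(values, start, width):
    """Count values in exclusive class intervals [lower, upper)."""
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(min(counts), max(counts) + 1):
        lower = start + k * width
        table.append((lower, lower + width, counts.get(k, 0)))
    return table


# Illustrative measurements (assumed values, not survey data).
values = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
table = frequency_table(values, start=55, width=5)

# Text histogram: one '#' per observation in the class.
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {'#' * freq}")

# With equal class widths, the tallest bar marks the modal class.
modal_class = max(table, key=lambda row: row[2])
total_frequency = sum(freq for _, _, freq in table)
print(modal_class, total_frequency)
```

Because the classes have equal width here, bar height and bar area are proportional, so the total area stands for the total frequency of 10 observations.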
FREQUENCY CURVE
• The variable is taken along the X-axis and
frequencies along the Y-axis.
• Frequencies are plotted against the class mid-values
and then these points are joined by a
smooth curve.
• The curve so obtained is the frequency
curve.
• Total area under the frequency curve
represents total frequency.
Frequency Polygons:
• The data have items whose magnitudes have two or more components.
• Here, the items are represented by rectangular bars of equal width and
height proportional to magnitude.
• Then, the bars are divided so that the sub-divisions in height represent
the components.
• To distinguish the components from one another clearly, different shades
are applied and an index describing the shades is provided.
• Component bars are drawn when a comparison of total magnitudes along
with the components is required.
PERCENTAGE BAR DIAGRAM
Statistics/parameters such as
• Mean (the arithmetic average)
• Median (the middle datum)
• Mode (the most frequent score).
Objectives
• To condense the entire mass of data.
• To facilitate comparison.
Mean
• The most common measure of central tendency.
• Affected by extreme values(outliers)
• This measure implies the arithmetic average or arithmetic mean.
• It is obtained by summing up all the observations and dividing the total by number
of observations.
E.g. The following gives the fasting blood glucose levels of a sample of 10
children.
• Child:   1  2  3  4  5  6  7  8  9  10
• Glucose: 56 62 63 65 65 65 65 68 70 71
• Total = 650; Mean = 650 / 10 = 65
• Mean is denoted by the sign $\bar{X}$ (X bar)
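The arithmetic for the glucose example can be checked with a short sketch. The second list, in which the last reading is replaced by 171, is a hypothetical outlier added only to show how an extreme value pulls the mean.

```python
# Fasting blood glucose levels of the 10 children from the example.
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]

total = sum(glucose)
mean = total / len(glucose)
print(total, mean)

# Hypothetical outlier: replace the last reading (71) with 171.
with_outlier = glucose[:-1] + [171]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(mean_with_outlier)
```

A single extreme reading shifts the mean from 65 to 75, even though nine of the ten values are unchanged.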
Advantages:
• Easy to calculate
• Easily understood
• Utilizes entire data
• Affords good comparison
Disadvantages:
• Mean is affected by extreme values; in such cases it leads
to bad interpretation.
Median
• Robust measure of central tendency
• Not affected by extreme values.
• In an ordered array, the median is the “middle” number.
If n or N is odd, the median is the middle number.
If n or N is even, the median is the average of the two middle numbers.
• To find the median, the data are arranged in ascending or descending order of
magnitude and the value of the middle observation is located.
For example:
71, 75, 75, 77, 79, 81, 83, 84, 90, 95.
• Median = (79 + 81) / 2 = 80
• If there are only the first 9 observations, then median = 79.
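The odd/even rule for the median can be written out as a minimal sketch. The function name `median_of` is a hypothetical helper; the result is compared against `statistics.median` from the Python standard library.

```python
import statistics


def median_of(values):
    """Median by the odd/even rule on the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle numbers


scores = [71, 75, 75, 77, 79, 81, 83, 84, 90, 95]

median_all = median_of(scores)        # 10 observations
median_first9 = median_of(scores[:9])  # first 9 observations only

print(median_all, median_first9)
print(statistics.median(scores))
```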
Advantages:
• It is more representative than mean.
• It does not depend on every observation.
• It is not affected by extreme values.
Mode
• Value that occurs most often.
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes.
E.g. Diastolic blood pressure of 10 individuals.
85, 75, 81, 79, 71, 80, 75, 78, 72, 73
• Here mode = 75, i.e. the distribution is uni-modal.
85, 75, 81, 79, 80, 71, 80, 78, 75, 73
• Here mode = 75 and 80, i.e. the distribution is bi-modal.
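A small sketch of how the mode(s) of the two blood pressure series can be found by counting. The helper name `modes` and the convention of returning an empty list when no value repeats ("no mode") are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values; an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: treated here as "no mode"
    return sorted(v for v, c in counts.items() if c == highest)


series_1 = [85, 75, 81, 79, 71, 80, 75, 78, 72, 73]
series_2 = [85, 75, 81, 79, 80, 71, 80, 78, 75, 73]

print(modes(series_1))   # one mode: uni-modal
print(modes(series_2))   # two modes: bi-modal
print(modes([71, 72, 73]))  # no repeated value
```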
MEASURES OF DISPERSION (VARIATION)
• Dispersion refers to the variations of the items among themselves /
around an average.
• The greater the variation amongst different items of a series, the greater
the dispersion.
• As per Bowley, “Dispersion is a measure of the variation of the items.”
• Measures of central tendency – single value to represent data
• Measures of Dispersion - degree of spread or variation of the variable
about the central value.
OBJECTIVES OF MEASURING
DISPERSION
• To determine the reliability of an average
• To compare the variability of two or more series
• For facilitating the use of other statistical measures
• Basis of Statistical Quality Control
PROPERTIES OF A GOOD MEASURE OF
DISPERSION
• Easy to understand.
• Simple to calculate.
• Uniquely defined.
• Based on all observations
• Not affected by extreme observations.
• Capable of further algebraic treatment
RANGE
• Measure of variation
• Difference between the largest
and the smallest observations.
• Ignores the way in which data
are distributed.
MERITS AND DEMERITS OF RANGE
MERITS
• Gives a quick answer
• Simple and easy to understand
DEMERITS
• Cannot be calculated in open ended distributions
• Affected by sampling fluctuations
• Changes from one sample to the next in population
• Gives a rough answer and is not based on all observations
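The point that the range ignores how the data are distributed can be illustrated with two hypothetical series that share the same extremes but differ in between. The values and the helper name `value_range` are assumptions for this sketch.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# Two hypothetical series with identical extremes.
series_a = [10, 50, 50, 50, 90]  # most values sit at the centre
series_b = [10, 30, 50, 70, 90]  # values spread evenly

range_a = value_range(series_a)
range_b = value_range(series_b)
print(range_a, range_b)
```

Both series give the same range although their spread around the centre clearly differs, which is why the range is only a rough measure.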
MEAN DEVIATION
The average of the absolute values of the deviations from the mean (or median or mode)
is called mean deviation.
MERITS
• Simplifies calculations
• Can be calculated by mean, median and mode
• Is not affected by extreme measures
• Used to make healthy comparisons
DEMERITS
• Not reliable
• Mathematically illogical to assume all negatives as positives
• Not suitable for comparing series
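A minimal sketch of the mean deviation, taken about the mean and about the median. The five data values are hypothetical and `mean_deviation` is an illustrative helper name.

```python
import statistics


def mean_deviation(values, centre):
    """Average of the absolute deviations of the values from a chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)


data = [3, 5, 6, 8, 13]  # hypothetical observations

md_about_mean = mean_deviation(data, statistics.mean(data))      # centre = 7
md_about_median = mean_deviation(data, statistics.median(data))  # centre = 6

print(md_about_mean, md_about_median)
```

The signs of the deviations are dropped before averaging, which is the step the demerits list describes as treating all negatives as positives.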
VARIANCE
• Variance is the average squared deviation from
the mean of a set of data.
• It is used to find the standard deviation.
Processes To Find Variance
1. Find the Mean of the data.
2. Mean is the average so add up the values and divide
by the number of items.
3. Subtract the mean from each value – the result is
called the deviation from the mean.
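4. Square each deviation from the mean.
5. Find the sum of the squares.
6. Divide the total by the number of items (for a sample, divide by one less
than the number of items, n - 1, as in the standard deviation calculation).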
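The steps above can be followed literally in code. This sketch divides by the number of items, which gives the population variance; the eight data values are hypothetical, and the result is cross-checked against `statistics.pvariance` from the standard library.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

# Steps 1-2: the mean is the sum of the values divided by the number of items.
mean = sum(data) / len(data)

# Step 3: deviation of each value from the mean.
deviations = [x - mean for x in data]

# Step 4: square each deviation.
squared = [d ** 2 for d in deviations]

# Step 5: sum of the squares.
sum_of_squares = sum(squared)

# Step 6: divide the total by the number of items.
variance = sum_of_squares / len(data)

print(mean, sum_of_squares, variance)
print(statistics.pvariance(data))
```

Dividing by `len(data) - 1` instead would give the sample variance, which is the form used when the data are a sample from a larger population.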
Standard Deviation
• Most important and widely used.
• Root mean square deviation
• Summary measure of the differences of each observation from the mean of all
observations.
• The greater the deviation, the greater the dispersion.
• The lesser the deviation, the greater the uniformity.
CALCULATION OF SD
For ungrouped data:
• Calculate the mean ($\bar{X}$) of the series.
• Take the deviations (d) of the items from the mean by $d = X_i - \bar{X}$,
where $X_i$ is the value of each observation.
• Square the deviations ($d^2$) and obtain the total ($\sum d^2$).
• Divide $\sum d^2$ by one less than the total number of observations, i.e. (n - 1), and
obtain the square root. This gives the standard deviation.
• Symbolically, standard deviation is given by:
$SD = \sqrt{\dfrac{\sum d^2}{n - 1}}$
For grouped data with single units for class intervals:
$S = \sqrt{\dfrac{\sum f_i (X_i - \bar{X})^2}{N - 1}}$
Where, $X_i$ is the individual observation in the class interval
$f_i$ is the corresponding frequency
$\bar{X}$ is the mean
N is the total of all frequencies
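Both calculations can be sketched side by side. The data are the fasting blood glucose values from the mean example; the function names `sample_sd` and `grouped_sample_sd` are hypothetical, and the grouped version assumes each distinct value forms its own single-unit class with its frequency.

```python
import math
import statistics
from collections import Counter

glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]


def sample_sd(values):
    """Ungrouped data: square root of sum of squared deviations over (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq / (n - 1))


def grouped_sample_sd(freq):
    """Grouped data: freq maps each value X_i to its frequency f_i."""
    total_n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / total_n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in freq.items())
    return math.sqrt(sum_sq / (total_n - 1))


sd_ungrouped = sample_sd(glucose)
sd_grouped = grouped_sample_sd(Counter(glucose))

print(round(sd_ungrouped, 4), round(sd_grouped, 4))
print(round(statistics.stdev(glucose), 4))  # standard-library cross-check
```

The grouped and ungrouped results agree because weighting each squared deviation by its frequency is the same as listing every observation individually.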
• Use the standard normal table to find the cumulative area under the standard
normal curve.
Application of Normal Curve Model
• Using z scores to compare two raw scores from different distributions.
• Can determine relative frequency and probability.
• Can determine percentile rank.
• Can determine the proportion of scores between the mean and a
particular score.
• Can determine the number of people within a particular range of scores
by multiplying the proportion by N.
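These applications can be sketched with `statistics.NormalDist` from the Python standard library (available from Python 3.8), which plays the role of the standard normal table. All scores, means, standard deviations and the group size N = 200 are hypothetical, and the calculation assumes the scores are normally distributed.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from its mean."""
    return (raw - mean) / sd


# Two raw scores from different (hypothetical) distributions.
z_a = z_score(85, mean=70, sd=10)
z_b = z_score(58, mean=50, sd=5)

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Percentile rank: cumulative area below the z score.
percentile_rank_a = standard_normal.cdf(z_a) * 100

# Proportion of scores between the mean and the score.
proportion_mean_to_score = standard_normal.cdf(z_a) - 0.5

# Expected number of people in that range, for a hypothetical N.
n_people = 200
expected_count = proportion_mean_to_score * n_people

print(z_a, z_b)
print(round(percentile_rank_a, 1), round(proportion_mean_to_score, 4), round(expected_count, 1))
```

Although 85 is the larger raw score, the second score has the higher z score, so it is the better performance relative to its own distribution.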
SKEWED DISTRIBUTION
A skewed distribution is a distribution with data clumped up on one side
or the other, with decreasing amounts trailing off to the left or the right.
• Non-symmetrical distribution: mean, median and mode are not the same.
• Negatively skewed: extreme scores at the lower end.
Mean < median < mode: most did well, a few poorly.
• Positively skewed: extreme scores at the higher end.
Mean > median > mode: most did poorly, a few well.
• The further apart the mean and median, the more the distribution is
skewed.
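The ordering of mean, median and mode can be checked on two small hypothetical sets of scores. Comparing the mean with the median, as done in `skew_direction` below, is only a rough indicator assumed for this sketch and not a formal skewness coefficient.

```python
import statistics


def skew_direction(values):
    """Rough direction of skew from the position of the mean relative to the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "none detected"


# Hypothetical scores: most low, one very high.
positive_scores = [2, 2, 2, 3, 4, 5, 6, 20]
# Hypothetical scores: most high, one very low.
negative_scores = [80, 94, 95, 96, 97, 98, 98, 98]

for scores in (positive_scores, negative_scores):
    print(
        statistics.mean(scores),
        statistics.median(scores),
        statistics.mode(scores),
        skew_direction(scores),
    )
```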
CONCLUSION
Statistics is central to most medical research. Basic principles of
statistical methods or techniques equip medical and dental students
to appreciate the utility and usefulness of statistics in medical and
other biosciences. Certain essential methods in biostatistics must be
learnt to understand their application in diagnosis, prognosis,
prescription and management of diseases in individuals and the community.