
GOOD MORNING

BIOSTATISTICS AND
RESEARCH
METHODOLOGY
PRESENTED BY
Dr. Aswathi S Nair
1st year MDS
Department of Periodontology
CONTENTS
• What is statistics?
• Biostatistics-Definition & Uses
• Data: Definition
Types of data
Collection of data
Presentation of data
• Measures of central tendency
• Measures of variability
• Normal Distribution & Curve
• Probability
• Tests of significance
• Correlation & Regression
• Report Writing
What is Statistics?
According to Croxton and Cowden:
Statistics is defined as the collection, presentation, analysis and
interpretation of numerical data.

The American Heritage Dictionary defines statistics as “The mathematics of
the collection, organization, and interpretation of numerical data, and
especially the analysis of population characteristics by inference from
sampling”.
TYPES OF STATISTICS

DESCRIPTIVE STATISTICS
• Organize, summarize and simplify data.
• Describe and present data.
• Reduce information to a convenient form.

INFERENTIAL STATISTICS
• Generalize from samples to populations.
• Hypothesis testing.
• Use sample data to study associations, to compare differences, or to make predictions about a larger set of data.
DESCRIPTIVE STATISTICS
• Organizing and summarizing data using numbers and graphs.
• Describes the characteristics of the sample or population.
• Collecting, organizing, summarizing and presenting data.
• Form of results: charts, tables and graphs.
• Tools: measures of central tendency, measures of dispersion.
• Used when the data set is small.

INFERENTIAL STATISTICS
• Using sample data to make an inference or draw a conclusion about the population.
• The objective is to draw conclusions about the population data.
• Drawing conclusions, performing estimations and making predictions.
• Form of results: probability.
• Tools: hypothesis tests, ANOVA.
• Used when the population data set is large.
DESCRIPTIVE V/S INFERENTIAL
DESCRIPTIVE
• A bowler wants to find his bowling average for the past 12 months.
• A housewife wants to determine the average weekly amount she spent on groceries in the past 3 months.
• A politician wants to know the exact number of votes he received in the last election.
INFERENTIAL
• A bowler wants to estimate his chance of winning a game based on his current season averages and the averages of his opponents.
• A housewife would like to predict, based on last year's grocery bills, the average weekly amount she will spend on groceries this year.
• A politician would like to estimate, based on opinion polls, his chance of winning the upcoming election.
BIOSTATISTICS(Biometrics or Biometry)
• It is the branch of statistics concerned with mathematical facts and
data related to biological events.

• It is the science that helps in managing medical uncertainties.

• It is the science which deals with the development and application of
the most appropriate methods for the
- Collection of data.
- Presentation of the collected data.
- Analysis and Interpretation of the results.
- Making decisions on the basis of such analysis.
Collection of Data:
It is the first step in a statistical investigation. Careful planning is essential before collecting
the data.
Presentation of Data:
• The mass data collected should be presented in a suitable form for further analysis.
• The collected data may be presented in the form of tabular or diagrammatic or
graphical form.
Analysis of Data:
The presented data should be carefully analyzed using measures such as
central tendency, dispersion, correlation, regression, etc.
Interpretation of Data:
• The final step is drawing conclusions from the data collected.
• A valid conclusion must be drawn on the basis of analysis.
• A high degree of skill and experience is necessary for the interpretation.
• Here the focus is on human life and health. Thus the areas of
application relate to:
Pharmacology
Medicine
Epidemiology
Public health
Physiology and anatomy
Genetics

AS THE KNIFE IS TO SURGERY, SO IS BIOSTATISTICS TO MEDICAL RESEARCH
HISTORY
• Sir Francis Galton is considered as the Father of Biostatistics.
• He was the first to apply statistical methods to the study of human differences
and inheritance of intelligence, and introduced the use of Questionnaires and
Surveys for collecting data on human communities, which he needed for
genealogical and biographical works and for his anthropometric studies.
Origin and development of statistics in medical research
• In 1929, a huge paper on the application of statistics was published in a physiology
journal by Dunn.
• In 1937, 15 articles on statistical methods by Austin Bradford Hill were
published in book form.
• In 1948, an RCT of streptomycin for pulmonary tuberculosis was published, in which
Bradford Hill had a key influence. The use of statistics in medicine then grew
8-fold between 1952 and 1982.
Applications of Biostatistics
• Public health, including epidemiology, health services research, nutrition,
environmental health and healthcare policy & management.
• Design and analysis of clinical trials in medicine.
• Assessment of severity state of a patient with prognosis of outcome of a disease.
• Population genetics, and statistical genetics in order to link variation in genotype
with a variation in phenotype. This has been used in agriculture to improve crops
and farm animals (animal breeding). In biomedical research, this work can assist in
finding candidates for gene alleles that can cause or influence predisposition to
disease in human genetics.
• Analysis of genomics data, for example from microarray or proteomics
experiments. Often concerning diseases or disease stages.
• Ecology, ecological forecasting.
• Biological sequence analysis.
• Systems biology for gene network inference or pathway analysis.
USES OF BIOSTATISTICS
• To test whether the difference between two populations is real or a chance occurrence.
• To study the correlation between attributes in the same population.
• To evaluate the achievements of public health.
• To calculate the average, median, mode and standard deviation of the collected data.
• To measure mortality and morbidity.
• To compare two sets of data .
• To get a conclusion (or) result.
• To find the association between the two variables .
• To find the correlation between the two variables.
• To give the results in a tabular or diagrammatic form.
USES OF BIOSTATISTICS IN DENTAL SCIENCES

• Assess the state of oral health in the community.
• Indicate the basic factors underlying the state of oral health.
• Determine the success or failure of specific oral health care programmes
or evaluate the programme action.
• Promote health legislation and help create administrative standards for
oral health.
DEFINITION OF TERMS
Variable - a characteristic or attribute that can assume different
values, e.g. color, height, temperature, texture.
A variable is any characteristic of an object that can be measured or
categorized.
• Denoted by an upper-case letter of the alphabet: X, Y, or Z.
E.g.: Age, sex, waiting time in clinic, diabetic levels

Data - the values (measurements or observations) that the variables
can assume. A collection of data values forms a data set. Each value in
a data set is called a datum or data value.

Population
• The collection of all elements of interest having one or more common characteristics
is called a population.
• Consists of all subjects that are being studied.
• The elements can be individual subjects, objects, or events.
• A population that contains an infinite number of elements is called an infinite
population.
• A population that contains a finite number of elements is called a finite
population.
Sample population - part of the population, or a group of subjects selected from a
population.

Parameters:
• A characteristic of the population in which we have a particular interest.
• Examples:The proportion of the population that would respond to a certain drug.
The association between a risk factor and a disease in a population.
SAMPLES AND STATISTICS

Sample – a subset of a population (hopefully representative).

Statistic – a characteristic of the sample.
Examples:
• The observed proportion of the sample that responds to treatment.
• The observed association between a risk factor and a disease in this sample.
Types of Variables
• Quantitative / Numerical
- Discrete
- Continuous
• Qualitative / Categorical
Qualitative Variable:

• It is a characteristic of people or objects that cannot be
naturally expressed in a numeric value.
E.g.:
Sex – male, female
Facial type – Brachyfacial, Dolichofacial, Mesiofacial
Level of oral hygiene – poor, fair, good

Quantitative Variable:
• It is a characteristic of people or objects that can be naturally
expressed in a numeric value.
E.g.:Age,Height,Bond strength
Discrete Variable:
It is a random variable that can take on a finite number of values or a
countably infinite number of values (as many as there are whole numbers).
E.g.: The size of a family.
The number of DMFT teeth, T. T can be any one of the 33 numbers
0, 1, 2, 3, …, 32.

Continuous Variable:
It is a random variable that can take on a range of values on a continuum,
i.e., its range is uncountably infinite.
E.g.:Treatment time,Temperature,Torque value on tightening an implant
abutment .
Confounding Variable:

• Statistical results are said to be confounded when the results can have
more than one explanation.
E.g.: In a study, smoking is found to be the most important etiological factor in the
development of oral squamous cell carcinoma. It has been suggested that
alcohol is also one of the major causes of squamous cell carcinoma, and alcohol
consumption is known to be closely related to smoking. Therefore, in this
study, alcohol is a confounding variable.
SCALES OF MEASUREMENT
• There are four scales of measurement used to measure any variable.
1.Nominal Scale
2.Ordinal Scale
3.Interval Scale
4.Ratio Scale
1.Nominal Scale
• The most elementary scale
• It is the simplest type of data, in which the values are in unordered categories.
• No ranking order can be placed on the data.
• For qualitative / categorical variables.
• The variables are classified by some quality rather than by a numerical
measurement. In such cases, the variable is called an attribute and is said to
use a nominal scale of measurement.
• The order in which the categories are arranged makes no difference.

For example:
1. Data are represented as male or female.
2. Heights may be recorded as tall or short.
3. Eye color: black, brown, green.
4. Blood type (A, B, AB and O).
2.Ordinal Scale :
• Classifies data into categories that can be ranked, however precise differences
between the ranks do not exist. The categories can be ordered or ranked.
• The amount of the difference between any two categories, though they can be ordered,
is not quantified.
• Has all the properties of the nominal scale: classifying and labeling.
• The new property is a sense of order / arrangement.
E.g.: Pain level
0 - no pain
1 - mild pain
2 - moderate pain
3 - severe pain
4 - extremely severe pain
The numbers are assigned only for statistical convenience.
3.Interval Scale
• In addition to classification and ranking, the interval scale allows the recognition of
precisely how far apart the individual classes are from each other on the scale.
• Observations can be ordered, and precise differences between units of measure exist.
However, there is no meaningful absolute zero.

• For example, the distance between the 8th and 9th points on the scale is the same as that
between the 3rd and 4th.
• Date is a very widely used interval scale variable.
• There is no absolute zero, so it is not possible to say that the 9th value is 3 times that of
the 3rd.
E.g.:
• IQ score representing the level of intelligence. An IQ score of 0 is not indicative of no
intelligence.
• Temperature in °C on 4 successive days
Day : A B C D

4.Ratio Scale
• The highest level of measurement
• Incorporates the properties of nominal, ordinal and interval
scales
• Includes an absolute zero; in addition, all the mathematical
operations of +, -, x and / are possible.
• Examples are length and mass; for example, a length of 150 mm is
three times as long as 50 mm.
• Possesses all the characteristics of interval measurement, and
there exists a true zero. eg. Weight in pounds of 6 individuals
136, 124, 148, 118, 125, 142
• Besides heights and numbers, ratio scales include weights (mg,
g), volumes (cc, cu.m), capacities (ml, l), rates (cm/sec., Km/h)
and lengths of time (h, Yr) etc.,
• There cannot be negative measurements
Data
• Data are the quantities (numbers) or
qualities measured or observed that are to
be collected and analysed.
• The term “data” refers to the kinds of
information researchers obtain on the
subjects of their research. Fraenkel &
Wallen (2000)
• Collective recording of observations is data.
• Main sources :experiments, surveys ,
records [ census , public reports] 
• Demographic data- details of population
Are the data reliable and valid?
Validity: Are you measuring what you think you are measuring?

Reliability: If something was measured again using the same
instrument, would it produce the same (or nearly the same) results?
Depending upon the nature of the variable, data is classified into 2 broad
categories
• Qualitative Data
• Quantitative Data:- 1. discrete
2. continuous

• Qualitative data :- (characterized by words) when the data is collected on the
basis of attributes or qualities like sex, malocclusions, cavity etc.
• Quantitative data :- (characterized by numbers) when the data is collected
through measurement, like arch length, fluoride concentration etc.
• Discrete data :- when the variable under observation takes only fixed
values like whole numbers.
• Continuous data :- if the variable can take any value in a given range,
decimal or fractional.
Based on the source of collection
Data can be collected through either :-
1) Primary source
2) secondary source
Primary Data :-
• Here the data is obtained by the investigator himself. This is a first hand
information.
Advantages :- Precise information and reliable.
Disadvantages :- Time consuming, expensive.
• Primary data can be obtained using :- 1) Direct personal
2) Oral health examination
3) Questionnaire method
Secondary Data :- The data already recorded is utilized to serve the purpose
COLLECTION OF DATA
• First and most important stage of a statistical investigation.
• Since the data are the raw material of any statistical
investigation, utmost care must be taken to collect reliable and
accurate data.

Choice of methods: The investigator must choose between the two methods:
Primary Data / Secondary Data (information through agencies)
Quantitative / qualitative data
Quantitative/ qualitative data
The method of data collection depends upon a number of factors:

• Object and nature of enquiry


• Availability of financial resources.
• Availability of time. 
• Accuracy required. 
• Collecting agencies.
Primary Data
• The data which are collected from the field under the control and supervision
of an investigator.
• These data are generally fresh and collected for the first time. They
are original in character.
• For the collection of primary data, the investigator must choose any of the
following methods:
- Direct personal observation
- Indirect oral interview
- Information through agencies
- Mailed questionnaires
- Schedules sent through enumerators
Direct personal observation:
• The data are collected personally by the investigator, who must be a keen observer,
tactful and courteous in behavior.
• The investigator asks or cross-examines the informant and collects the necessary information.
• It is original in character.

Suitability:
Direct personal observation is adopted in the following cases:
• Where greater accuracy is needed
• Where the field of enquiry is not large
• Where confidential data are to be collected
• Where sufficient time is available
Merits:
• Original data are collected.
• True and reliable data can be included.
• Response will be more encouraging because of the personal approach.
• A high degree of accuracy can be aimed at.
Demerits:
• It is unsuitable where the area is large.
• It is expensive and time-consuming.
• An untrained investigator will not bring good results.
• One has to collect information according to the convenience of the informant.
Indirect oral interview 
• The investigator approaches the witness or third parties, who are in touch with
the informant. 
• The enumerator interviews the people, who are directly or indirectly connected
with the problem under the study.
• Generally this method is employed by different enquiry committees and
commissions. The police department generally adopts this method to get clues of
thefts, riots , murders, etc.

Suitability: 
It is more suitable when the area to be studied is large.
It is used when direct information cannot be obtained.This system is generally
adopted by governments. 
Merits
• It is simple and convenient.
• It saves time, money and labor.
• It can be used in the investigation of a large area.
• Adequate information can be had.
Demerits
• The information cannot be relied upon because of the absence of direct contact.
• Interviewing an unsuitable person will spoil the results.
• In order to get the real position, a sufficient number of people must be interviewed.
• A careless attitude on the part of the informant will affect the degree of accuracy.
Information through agencies
• Local agents or correspondents are appointed; they collect the
information and transmit it to the office or person.
• They work according to their own ways and tastes.
• This system is adopted by newspapers, agencies, etc., when
information is needed in different fields.The informants are generally
called correspondents.
• Suitability: In those cases where the information is to be obtained at
regular intervals from a wide area.
Merits
• Extensive information can be had.
• It is the cheapest and most economical method.
• Speedy information is possible.
• It is useful where information is needed regularly.
Demerits
• The information may be biased.
• Degree of accuracy cannot be maintained.
• Uniformity cannot be maintained.
• Data may not be original.
Mailed Questionnaires

• In this method, a questionnaire consisting of a list of questions pertaining to
the enquiry is prepared.
• The questionnaire is sent to the respondents, with blank spaces for
answers.
• A covering letter is also sent along with the questionnaire, requesting the
respondents to extend their full cooperation by giving correct replies.
• This method is adopted by research workers, private individuals,
non-official agencies and State and Central Governments.

Suitability: This method is appropriate in cases where informants are spread
over a wide area.
Merits 
• Of all the methods, the mailed questionnaire is the most economical.
• It can be widely used when the area of investigation is large.
• It saves money, labor and time.

Demerits
• We cannot be sure about the accuracy and reliability of the data.
• There is a long delay in receiving the duly filled-in questionnaires.
Secondary Data

• Secondary data are data which have already been collected
and analysed by some earlier agency for its own use, and are later
used by a different agency.
• Sources of Secondary Data
1. Published Sources
2. Unpublished Sources
Published sources:
Various governmental, international and local agencies publish statistical data, and
chief among them are: 
• International publications: U.N.O., I.M.F., etc.
• Official publications of Central and State Govt.: Reserve Bank of India Bulletin,
Census of India, Indian Trade Journal, etc.
• Semi-official publications: Semi-Govt. institutions like Municipal Corporation,
District Board, Panchayat, etc. publish reports.
• Publications of Research Institutions: The Indian Statistical Institute (I.S.I.), the Indian
Council of Agricultural Research (I.C.A.R.), etc. publish the findings of their
research programmes.
• Journals and Newspapers: Current and important materials on statistics and
socio-economic problems can be obtained from journals and newspapers like,
Economic Times, Commerce, Indian Finance etc.
Unpublished sources:
• There are various sources of unpublished data.
• They are the records maintained by various government and private offices, the
researches carried out by individual research scholars in the universities or research
institutes.
• We must take extra care when using secondary data.
• According to Prof. Bowley “It is never safe to take published statistics at their face
value without knowing their meaning and limitations and it is always necessary to
criticize arguments that can be based on them.”

Precautions in the use of Secondary Data:


Before using the secondary data, the investigators should consider the following
factors:
• Suitability of the data 
• Adequacy of the data 
• Reliability of data
COMMON DATA COLLECTION
METHODS
Presentation of Data
• The objective of classifying data is to make the data simple,
concise, meaningful, interesting and helpful in further analysis.
• It is the method by which people organize, summarize and communicate
information using a variety of tools such as tables, graphs and
diagrams.
• Data collected and compiled from experimental work, surveys and records
are raw data, which need to be sorted and classified to make them
simple, concise, meaningful, interesting and helpful.
2 methods: 1. Tabulation
2. Diagrams / drawings
ADVANTAGES
• Easy and better understanding of the subject
• Provides first hand information about data
• Helpful in future analysis
• Easy for making comparisons
• Very attractive
PRINCIPLES OF PRESENTATION

• Data should be presented in simple form
• Arouse interest in the reader
• Should be concise but without losing important details
• Facilitate further statistical analysis
• Define the problem and suggest its solution
Tabulation 

It is a systematic and logical arrangement of classified data in rows and
columns.

SIGNIFICANCE OF TABULATION
•Simplifies complex data
•Avoids unnecessary details and repetitions of data
•Facilitates comparison
•Gives identity to data
•Reveals patterns within the figures which cannot be seen in the narrative
form
RULES OF TABULATION
• A number should be assigned to the table ( Table No.)
• A title should be given to the table , it should be concise and self explanatory
• Contents of the table should be defined clearly.
• Subtitles should be properly mentioned with columns and rows.
• Group intervals in columns and rows should neither be too narrow nor too wide.
• They should also be mutually exclusive.
• Units of measurement must be mentioned clearly wherever necessary.
• Any short forms / symbols, if used, should be explained in a footnote.
• No cell should be left blank in the body of the table.
• There should be logical arrangement of data in the table.
Simple Table:
They are one-way tables which supply answers to questions
about one characteristic of data only.

Frequency distribution Table:


The simplest table is a two-column frequency table. The first
column lists the classes into which the data are grouped. The
second column lists the frequencies for each class.
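
As a rough illustration of such a two-column table, the sketch below groups raw values into classes and counts each class. The function name `frequency_table`, the class width and the ages are all assumptions made for this example; classes are treated as exclusive intervals (lower limit included, upper limit excluded), and values are assumed not to fall below `start`.

```python
from collections import Counter


def frequency_table(values, class_width, start):
    """Count how many values fall in each class [lower, lower + class_width)."""
    # Class index 0 is [start, start + width), index 1 the next class, and so on.
    counts = Counter((v - start) // class_width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * class_width
        table.append(((lower, lower + class_width), counts.get(k, 0)))
    return table


# Hypothetical ages of 10 patients (illustrative only, not from the source).
ages = [21, 25, 34, 38, 39, 42, 47, 51, 55, 58]

print("Class\tFrequency")
for (lower, upper), freq in frequency_table(ages, class_width=10, start=20):
    print(f"{lower}-{upper}\t{freq}")
```

The first column printed is the class and the second its frequency, matching the layout described above.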

Master Tables:
They are tables which contain all the data obtained from a
survey.
Reference tables(General purpose or
primary tables)
• These tables present the original data for
reference purposes. 
• It contains only absolute and actual
figures and round numbers or
percentages.
• E.g.: Tables in census records, appendices
of publications.
TEXT TABLES (SPECIAL
PURPOSE OR DERIVATIVE
TABLES)
• Constructed to present selected
data from one or more general
purpose tables. 
• It brings out a specific point or
answers a specific question.
• It includes ratios, percentages,
averages etc. 
• It should be found in the body of
the text.
VISUAL DATA SUMMARIES
QUANTITATIVE/CONTINUOUS/MEASURED DATA
• Histogram
• Frequency polygon
• Frequency curve
• Line chart/graph
• Cumulative frequency diagram
• Scatter /dot diagram
QUALITATIVE/DISCRETE
• Bar Diagram
• Pie Sector Diagram
• Pictogram
• Map Diagram
Diagrams and graphs: impact on imagination, better retained in memory, easy comparisons.
ADVANTAGES
• They are attractive
• They give a bird’s eye-view of the data 
• They can be easily understood by common men 
• They facilitate comparison of various characteristics 
• The impressions created by them are long lasting
• Theorems and results of statistics can be visualized using graphs
DISADVANTAGES
• They are visual aids. They cannot be considered as alternatives for
numerical data. 
• Though theories and results could be easily visualized by diagrams and
graphs, mathematical rigour cannot be brought in 
• Diagrams and graphs are not as accurate as tabular data. Only tabular data
can be used for further analysis.
• Observers can easily be misled by diagrammatic and graphical
misrepresentation. It is possible to create wrong impressions using diagrams
and graphs.
HISTOGRAM
• A histogram is a special sort of bar chart .
• The successive groups of data are linked in a definite numerical order
• Represented by a set of rectangular bars
• The variable (class) is taken along the X-axis and frequency along the Y-axis.
• With the class intervals as base, rectangles with height proportional to
class frequency are drawn.
• The set of rectangular bars so obtained gives the histogram.
Note:
•The total area of the rectangles in a histogram represents the total frequency
•If the frequency distribution has inclusive class intervals, they should be
converted into the exclusive type
•The mode of the distribution can be obtained from the histogram (from the
highest rectangular bar).
FREQUENCY CURVE
• The variable is taken along the X-axis and frequencies along the Y-axis.
• Frequencies are plotted against the class mid-values and then these points are
joined by a smooth curve.
• The curve so obtained is the frequency curve.
• The total area under the frequency curve represents the total frequency.
Frequency Polygons:

• The variable is taken along the X-axis and frequencies along the Y-axis.
• Class frequencies are plotted against the class mid-values and then these points
are joined by straight lines. The figure so obtained is the frequency polygon.
• The total area under the frequency polygon represents the total frequency.
• A frequency distribution may also be represented
diagrammatically by the frequency polygon.
• It is obtained by joining the midpoints of the tops of the
histogram blocks.
Ogives (CUMULATIVE FREQUENCY CURVES)
• An ogive is a smooth graph with cumulative
frequency (cf) plotted against values of the
variable (class limits).
• Class limits are taken along the X-axis and cf
along the Y-axis.
There are 2 types of ogives:
- Less than cf curve or less than ogive (<cf)
- Greater than cf curve or greater than ogive (>cf)
• Less than cf curve (< ogive): the variable values (class limits) are taken
along the X-axis and <cf along the Y-axis. The <cf values are plotted against the
respective variable values. Then these points are joined by a smooth
curve. The resulting graph is the less than ogive.
• Greater than cf curve (> ogive): Here, the variable values (class limits)
are taken along the X-axis and >cf along the Y-axis. The >cf values are plotted
against the respective variable values. Then these points are joined by a
smooth curve. The resulting graph is the greater than ogive.
• Note: The two ogives are drawn together on common axes. The
point of intersection of the two ogives gives the median of the
distribution. Ogives are also used to locate partition values (like
median, quartiles, deciles, percentiles).
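
The cumulative frequencies behind the two ogives can be sketched in a few lines. The class boundaries and frequencies below are hypothetical, and `median_from_ogive` is an illustrative helper that joins the plotted points by straight lines (linear interpolation) rather than a smooth curve, which is an assumption of this sketch.

```python
from itertools import accumulate

# Hypothetical grouped data: 4 classes (0-10, 10-20, 20-30, 30-40).
boundaries = [0, 10, 20, 30, 40]
frequencies = [3, 7, 6, 4]

total = sum(frequencies)
# Less-than cf at each boundary: 0 at the first boundary, total at the last.
less_than_cf = [0] + list(accumulate(frequencies))
# Greater-than cf at each boundary is whatever remains above it.
greater_than_cf = [total - cf for cf in less_than_cf]


def median_from_ogive(boundaries, less_than_cf):
    """Locate the value where the less-than cf reaches half the total."""
    half = less_than_cf[-1] / 2
    for i in range(1, len(boundaries)):
        lo_cf, hi_cf = less_than_cf[i - 1], less_than_cf[i]
        if hi_cf >= half:
            lo, hi = boundaries[i - 1], boundaries[i]
            # Straight-line interpolation inside the class containing the median.
            return lo + (half - lo_cf) / (hi_cf - lo_cf) * (hi - lo)


print(less_than_cf)     # [0, 3, 10, 16, 20]
print(greater_than_cf)  # [20, 17, 10, 4, 0]
print(median_from_ogive(boundaries, less_than_cf))  # 20.0
```

In this example both cumulative frequencies equal 10 (half the total of 20) at the value 20, which is where the two ogives would cross and where the median is read off.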
Line Graph:(Time series graph)

• When the quantity is a continuous variable, e.g. time or
temperature, data are plotted as a continuous line.
• Line graphs are used to display the comparison
between two variables, which are plotted on the X-axis
and Y-axis.
• The X-axis represents measures of time, while the Y-axis
represents percentage or measures of quantity.
• They organize and present data in a clear manner and
show relationships between the data.
• Line graphs display a change in direction.
• They show the trend of an event over a period of
time, i.e. whether it has increased or decreased. E.g.:
IMR, cancer deaths, etc.
SCATTER DIAGRAMS
• Used to study and identify the possible relationship between the
changes observed in two different sets of variables.
SCATTER GRAPHS AND
CORRELATIONS
Pie diagram
• Used for presenting discrete data of qualitative characteristics
such as blood groups, Rh factor, age group, sex
group, causes of mortality or social group in a
population. The frequencies of the groups are
shown in a circle.
• The angle (in degrees) and area of each sector denote
the frequency.
• It gives comparative differences at a glance.
• The size of each angle is calculated by multiplying
(frequency / total frequency) by 360°.
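
The angle rule can be checked with a short calculation. The blood-group counts below are hypothetical and `pie_angles` is an illustrative name, not part of any library.

```python
def pie_angles(frequencies):
    """Return the sector angle in degrees for each group: f / total * 360."""
    total = sum(frequencies.values())
    # Multiply before dividing so whole-number results stay exact.
    return {group: f * 360 / total for group, f in frequencies.items()}


# Hypothetical blood-group counts in a sample of 100 people.
blood_groups = {"A": 20, "B": 30, "AB": 10, "O": 40}
angles = pie_angles(blood_groups)

for group, angle in angles.items():
    print(f"{group}: {angle} degrees")
```

The angles always add up to 360 degrees, so a quick sum is a useful check before drawing the diagram.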
Spot Map/Cartogram
- Cartograms are used to represent data on a
geographical basis, for example, possible date
of rainfall.

- These diagrams are very effective and
attractive if a message is to be communicated.
Pictograms:
• Pictorial or diagrammatic data
represented by pictorial symbols.
Population per physician:
USA 50
SINGAPORE 1100
INDIA 3700
BANGLADESH 9700


BAR DIAGRAMS
• Bar diagram consists of a series of rectangular bars of equal width.
• The bars stand on common base line with equal gap between one bar and another.
• The bars may be either horizontal or vertical.
• The bars are constructed in such a way that their lengths are proportional to the magnitudes
(frequency).
• Note: • The spaces between consecutive bars are equal
• The bars are of equal width.
SIMPLE BAR DIAGRAM
• Used when items have to be compared with regard to a single characteristic.
• Here, the items are represented by rectangular bars of equal width and height proportional to
their magnitude.
• The bars are drawn on a common base line, with equal distance between consecutive bars.
• The bars may be shaded.
SUBDIVIDED (Component or stacked or proportional) Bar
diagrams

• The data have items whose magnitudes have two or more components.
• Here, the items are represented by rectangular bars of equal width and
height proportional to magnitude.
• Then, the bars are divided so that the sub-divisions in height represent
the components.
• To distinguish the components from one another clearly, different shades
are applied and an index describing the shades is provided.
• Component bars are drawn when a comparison of total magnitudes along
with the components is required.
PERCENTAGE BAR DIAGRAM

• Used to represent items whose magnitudes have two or more components.
• The components are expressed as percentages of the
corresponding totals. The totals are represented by bars of equal width
and height equal to 100 each.
• These bars are divided according to the percentage components.
• The different sub-divisions are shaded properly and an index which
describes the shades is provided. Percentage bars are useful in
comparing percentage components.
MULTIPLE BAR DIAGRAM
• When there are two or more different comparable sets of values, multiple
bars are drawn. Eg: Imports and exports.
• Here, sets of rectangular bars of equal width with height proportional to
the value are drawn.
• The bars corresponding to the same unit are placed together adjacent to
one another.
• The diagram is shaded properly and an index is provided.
DEVIATION BAR DIAGRAMS
• Useful for presenting net quantities which have both
positive and negative values.
• The positive deviations are presented by bars above the base
line, while negative deviations are presented by bars below the
base line.
Other presentation of data
• A stem and leaf plot is a quick way to
organize large amounts of data.
• A special table where each data value is
split into a "leaf" (usually the last digit)
and a "stem" (the other digits).
• The "stem" values are listed down, and
the "leaf" values go right (or left) from
the stem values.
• The "stem" is used to group the scores
and each "leaf" indicates the individual
scores within each group.
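A stem and leaf plot as described above could be built roughly as follows. This is a minimal sketch that assumes non-negative whole numbers and takes the last digit as the leaf; the readings are the diastolic blood pressure values used in the mode example of these slides, and the function name `stem_and_leaf` is only illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its remaining digits (stem)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 85 -> stem 8, leaf 5
        groups[stem].append(leaf)
    return dict(groups)

# Diastolic blood pressure readings from the mode example in these slides.
readings = [85, 75, 81, 79, 71, 80, 75, 78, 72, 73]
plot = stem_and_leaf(readings)

# Stems are listed down; leaves go to the right of each stem.
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```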
SUMMARY MEASURES
Central Tendency / Statistical Averages:

• Central tendency refers to the center of the distribution of data points.
• Value or parameter which serves as single estimate of a series of
data
• One central value around which all other observations are dispersed.

Statistics/parameters such as
• Mean (the arithmetic average)
• Median (the middle datum)
• Mode (the most frequent score).

Objectives
• To condense the entire mass of data.
• To facilitate comparison.
Mean
• The most common measure of central tendency.
• Affected by extreme values(outliers)
• This measure implies the arithmetic average or arithmetic mean.
• It is obtained by summing up all the observations and dividing the total by number
of observations.
E.g. The following gives the fasting blood glucose levels of a sample of 10 children.
• Child: 1 2 3 4 5 6 7 8 9 10
• Level: 56 62 63 65 65 65 65 68 70 71
• Total = 650; Mean = 650 / 10 = 65
• Mean is denoted by the symbol X̄ (X bar)
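The same arithmetic can be checked with a short sketch, using the ten glucose values from the example above:

```python
# Fasting blood glucose levels of the 10 children in the example.
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]

total = sum(glucose)          # sum of all observations
mean = total / len(glucose)   # divide by the number of observations

print(total, mean)  # 650 65.0
```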
Advantages:
• Easy to calculate
• Easily understood
• Utilizes entire data
• Affords good comparison

Disadvantages:
• Mean is affected by extreme values; in such cases it can lead to a misleading interpretation.
Median
• Robust measure of central tendency
• Not affected by extreme values.
• In an ordered array, the median is the “middle” number
If n or N is odd, the median is the middle number
If n or N is even, the median is the average of the two middle numbers.
• To find the median, the data are arranged in ascending or descending order of
magnitude and the value of the middle observation is located.

For example:
71,75,75,77,79,81,83,84,90,95.
• Median = (79 + 81) / 2 = 80
• If only the first 9 observations are taken, the median is the 5th value = 79.
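The odd and even cases can be sketched as a small function. The name `median` is only illustrative; Python's standard `statistics.median` performs the same calculation.

```python
def median(values):
    """Return the middle value of the ordered data (average of the two
    middle values when the number of observations is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [71, 75, 75, 77, 79, 81, 83, 84, 90, 95]

print(median(data))      # 80.0 (average of 79 and 81)
print(median(data[:9]))  # 79   (5th of the first 9 observations)
```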
Advantages:
• It is more representative than the mean when the data contain extreme values.
• It does not depend on every observation.
• It is not affected by extreme values.
Mode
• Value that occurs most often.
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes.
E.g. Diastolic blood pressure of 10 individuals.
85,75,81,79,71,80,75,78,72,73
• Here mode = 75, i.e. the distribution is uni-modal.
85,75,81,79,80,71,80,78,75,73
• Here mode = 75 and 80, i.e. the distribution is bi-modal.
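Both examples can be reproduced with `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value tied for the highest frequency:

```python
from statistics import multimode

# Diastolic blood pressure readings from the two examples above.
first = [85, 75, 81, 79, 71, 80, 75, 78, 72, 73]
second = [85, 75, 81, 79, 80, 71, 80, 78, 75, 73]

modes_first = multimode(first)    # one mode: uni-modal
modes_second = multimode(second)  # two modes: bi-modal

print(modes_first)   # [75]
print(modes_second)  # [75, 80]
```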
MEASURES OF DISPERSION (VARIATION)
• Dispersion refers to the variation of the items among themselves /
around an average.
• The greater the variation amongst the different items of a series, the greater
the dispersion.
• As per Bowley, “Dispersion is a measure of the variation of the items.”
• Measures of central tendency – single value to represent data
• Measures of dispersion – degree of spread or variation of the variable
about the central value.
OBJECTIVES OF MEASURING
DISPERSION
• To determine the reliability of an average
• To compare the variability of two or more series
• For facilitating the use of other statistical measures
• Basis of Statistical Quality Control
PROPERTIES OF A GOOD MEASURE OF
DISPERSION
• Easy to understand.
• Simple to calculate.
• Uniquely defined.
• Based on all observations.
• Not affected by extreme observations.
• Capable of further algebraic treatment
RANGE
• Measure of variation
• Difference between the largest
and the smallest observations.
• Ignores the way in which data
are distributed.
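A short sketch of the range, using the glucose values from the mean example. The second series is hypothetical and is included only to show that two very different distributions can share the same range.

```python
# Fasting blood glucose values from the mean example in these slides.
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
value_range = max(glucose) - min(glucose)
print(value_range)  # 15

# Hypothetical series with the same extremes but all values at the two ends.
spread_out = [56, 56, 56, 56, 56, 71, 71, 71, 71, 71]
same_range = max(spread_out) - min(spread_out)
print(same_range)   # 15, although the data are distributed very differently
```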
MERITS AND DEMERITS OF RANGE
MERITS
• Gives a quick answer
• Simple and easy to understand

DEMERITS
• Cannot be calculated in open-ended distributions
• Affected by sampling fluctuations
• Changes from one sample to the next in a population
• Gives a rough answer and is not based on all observations
MEAN DEVIATION
The average of the absolute values of the deviations from the mean (or the median
or mode) is called the mean deviation.
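As a minimal sketch, the mean deviation about any chosen central value can be computed as below. The function name `mean_deviation` is illustrative, and the data are the glucose values from the mean example.

```python
from statistics import mean, median

def mean_deviation(values, center):
    """Average of the absolute deviations of the values from `center`."""
    return sum(abs(x - center) for x in values) / len(values)

glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]

md_about_mean = mean_deviation(glucose, mean(glucose))
md_about_median = mean_deviation(glucose, median(glucose))

# Absolute deviations from 65 are 9, 3, 2, 0, 0, 0, 0, 3, 5, 6 (sum 28).
print(md_about_mean, md_about_median)
```

Here the mean and the median are both 65, so the two results coincide at 28 / 10 = 2.8; for skewed data they would generally differ.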

MERITS
• Simplifies calculations
• Can be calculated by mean, median and mode
• Is not affected by extreme measures
• Used to make healthy comparisons

DEMERITS
• Not reliable
• Mathematically illogical to assume all negatives as positives
• Not suitable for comparing series
VARIANCE
• Variance is the average squared deviation from
the mean of a set of data.
• It is used to find the standard deviation.
Processes To Find Variance
1. Find the mean of the data.
2. The mean is the average, so add up the values and divide
by the number of items.
3. Subtract the mean from each value – the result is
called the deviation from the mean.
4. Square each deviation from the mean.
5. Find the sum of the squares.
6. Divide the total by the number of items (for a sample, divide by n - 1,
as in the standard deviation formula).
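The steps above can be sketched directly in code. This follows the slide as written and divides by the number of items (the population variance); dividing by n - 1 instead would give the sample variance. The data are the glucose values from the mean example.

```python
from statistics import pvariance

glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]

# Steps 1-2: find the mean.
mean = sum(glucose) / len(glucose)

# Step 3: deviation of each value from the mean.
deviations = [x - mean for x in glucose]

# Steps 4-5: square each deviation and add the squares up.
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 6: divide by the number of items.
variance = sum_of_squares / len(glucose)

print(sum_of_squares, variance)
# Cross-check against the standard library's population variance.
print(pvariance(glucose))
```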
Standard Deviation
• Most important and widely used.
• Root mean square deviation
• Summary measure of the differences of each observation from the mean of all
observations.
• Greater the deviation, greater the dispersion.
• Lesser the deviation, greater the uniformity.
CALCULATION OF SD
For ungrouped data:
• Calculate the mean (X̄) of the series.
• Take the deviations (d) of the items from the mean: d = Xi - X̄,
where Xi is the value of each observation.
• Square the deviations (d²) and obtain the total (∑d²).
• Divide ∑d² by one less than the total number of observations, i.e. (n - 1), and
take the square root. This gives the standard deviation.
• Symbolically, standard deviation is given by:
SD = √( ∑d² / (n - 1) )
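A minimal sketch of this calculation for ungrouped data, again using the glucose values from the mean example and cross-checked against `statistics.stdev`, which also divides by n - 1:

```python
from math import sqrt
from statistics import stdev

glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
n = len(glucose)

mean = sum(glucose) / n
d_squared_total = sum((x - mean) ** 2 for x in glucose)  # sum of d squared

sd = sqrt(d_squared_total / (n - 1))

print(round(sd, 3))
print(round(stdev(glucose), 3))  # standard-library result for comparison
```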

For grouped data with single units for class intervals:
S = √( ∑(Xi - X̄)² × fi / (N - 1) )
Where, Xi is the individual observation in the class interval
fi is the corresponding frequency
X̄ is the mean
N is the total of all frequencies

For grouped data with a range for the class interval:
S = √( ∑(Xi - X̄)² × fi / (N - 1) )
Where, Xi is the midpoint of the class interval
fi is the corresponding frequency
X̄ is the mean
N is the total of all frequencies
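For grouped data, the calculation could be sketched as follows. Each squared deviation is weighted by its class frequency before summing. The class midpoints and frequencies are hypothetical, made up purely to illustrate the arithmetic.

```python
from math import sqrt

# Hypothetical frequency table: class midpoints (Xi) and frequencies (fi).
midpoints = [5, 15, 25, 35]
frequencies = [2, 5, 8, 5]

N = sum(frequencies)  # total of all frequencies

# Mean of grouped data: frequency-weighted average of the midpoints.
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / N

# Sum of squared deviations, each weighted by its frequency.
weighted_squares = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))

s = sqrt(weighted_squares / (N - 1))

print(mean, weighted_squares, round(s, 3))
```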
Relationship between standard deviation and variance
• Standard deviation is the positive square root of the variance; equivalently,
variance = SD².
Merits of standard deviation
• It takes into account all the items and is capable of further algebraic
treatment and statistical analysis.
• It is possible to calculate standard deviation for two or more series.
• This measure is most suitable for making comparisons among two or
more series about variability.

Demerits of Standard Deviation


• It is difficult to compute.
• It assigns more weight to extreme items and less weight to items that
are nearer to the mean.
COEFFICIENT OF VARIATION
• A relative measure of dispersion.
• Coefficient of Variation is a measure of spread that describes the amount of
variability relative to the mean.
• Used to compare two or more series of data with either different units of
measurement or marked difference in mean.
• C.V. = (S × 100) / X̄
• Where, C.V. is the coefficient of variation
• S is the standard deviation
• X̄ is the mean
• Higher the C.V., greater is the variation in the series of data
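A small sketch of the C.V. comparison, reusing two series from these slides (fasting blood glucose and diastolic blood pressure). The function name `coefficient_of_variation` is illustrative, and the sample standard deviation (n - 1 divisor) is assumed.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """C.V. = (S x 100) / mean, using the sample standard deviation."""
    return stdev(values) * 100 / mean(values)

glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
diastolic_bp = [85, 75, 81, 79, 71, 80, 75, 78, 72, 73]

cv_glucose = coefficient_of_variation(glucose)
cv_bp = coefficient_of_variation(diastolic_bp)

# The series with the higher C.V. shows more variation relative to its mean.
print(round(cv_glucose, 2), round(cv_bp, 2))

# Changing the unit of measurement (here multiplying by 10) leaves C.V. unchanged.
cv_rescaled = coefficient_of_variation([x * 10 for x in glucose])
```

Because the standard deviation and the mean scale together, the ratio carries no unit, which is what allows series measured on different scales to be compared.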
Merits
• Best one
• Most appropriate one
• Based on Mean and Standard Deviation

Demerits
• COV is dimensionless or non-unitized
• It is impossible to calculate if Mean is 0
• It is difficult to calculate if the values are both positive and
negative numbers & if the mean is close to 0.
SHAPE OF THE DISTRIBUTION
• Describes how data is distributed
• Measures of shape: Symmetric or skewed.
Normal or Gaussian distribution
• The normal curve is a graphical representation of the normal
distribution.
• It is determined by the mean and the standard
deviation.
• It is a symmetric, unimodal, bell-shaped curve.
• Its tails extend infinitely in both directions.
• The wider the curve, the larger the standard
deviation and the more variation exists in the
process.
• Half of the observations lie above and half below
the mean
50% > mean; 50% < mean

Properties Of Normal Curve
• Bell shaped.
• Symmetrical about the midpoint.
• Mean = median = mode
• Total area under the curve is 1.
• For the standard normal curve, the mean is zero and the standard deviation is 1.
• Height of the curve is maximum at the mean, where all three measures of central
tendency coincide.
• The maximum number of observations is at the value of the variable corresponding
to the mean; the number of observations gradually decreases on either side, with
few observations at the extreme points.
• Area under the curve between any two points
can be found out in terms of a relationship
between the mean and the standard deviation
as follows:
• Mean ± 1 SD covers 68.3% of the
observations
• Mean ± 2 SD covers 95.4% of the
observations
• Mean ± 3 SD covers 99.7% of the
observations
• These limits on either side of mean are called
confidence limits.
• Forms the basis for various tests of
significance.
EMPIRICAL RULE
For any normally distributed data:
• 68% of the data fall within 1 standard deviation of the mean.
• 95% of the data fall within 2 standard deviations of the mean.
• 99.7% of the data fall within 3 standard deviations of the mean.
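These percentages can be checked numerically with `statistics.NormalDist` from the Python standard library. The sketch uses the standard normal curve; the proportions are the same for any normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Proportion of the area lying within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, proportion in coverage.items():
    print(f"Mean ± {k} SD covers {100 * proportion:.1f}% of the observations")
```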
CONFIDENCE INTERVAL AND LIMIT
• CONFIDENCE INTERVAL: A type of estimate computed
from the statistics of the observed data. It
proposes a range of plausible values for an unknown
parameter. The interval has an associated
confidence level that the true parameter is in the
proposed range. In other words, it is a range of values
so constructed that there is a specified probability of
including the true value of a parameter within it.
• CONFIDENCE LEVEL: Probability of including the true
value of a parameter within a confidence interval,
expressed as a percentage.
• CONFIDENCE LIMITS: Two extreme measurements
within which an observation lies; the end points of the
confidence interval.
• The larger the confidence level, the wider the confidence interval.
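The link between confidence level and interval width can be illustrated with a rough sketch. It assumes a normal approximation for the mean (mean ± z × SD / √n), which the slides do not spell out; for a sample this small a t-based interval would normally be preferred. The function name `normal_ci` is hypothetical, and the data are the glucose values from the mean example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def normal_ci(values, level):
    """Approximate confidence limits for the mean (normal approximation)."""
    m = mean(values)
    standard_error = stdev(values) / sqrt(len(values))
    # z value leaving (1 - level) / 2 of the area in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m - z * standard_error, m + z * standard_error

glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]

lower_95, upper_95 = normal_ci(glucose, 0.95)
lower_99, upper_99 = normal_ci(glucose, 0.99)

print(round(lower_95, 2), round(upper_95, 2))
print(round(lower_99, 2), round(upper_99, 2))  # wider than the 95% interval
```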
The Standard Normal Distribution (Z)

• Standard scores are expressed in standard
deviation units.
• They are used to compare variables measured on different
scales. There are many kinds of standard
scores.
• The most common is the ‘z’ score, which tells how
far the original score lies above or below
the mean of a normal curve, in standard deviation units.
• All normal distributions can be converted into
the standard normal curve by subtracting the
mean and dividing by the standard deviation: z = (x - mean) / SD.
• If each data value of a normally distributed random variable x is
transformed into a z score, the result will be a standard normal distribution.
• Use the standard normal table to find the cumulative area under the standard
normal curve.
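The transformation can be sketched as follows, reusing the glucose values from the mean example purely as sample numbers (no claim is made that they are normally distributed). After subtracting the mean and dividing by the standard deviation, the z scores have mean 0 and standard deviation 1. `NormalDist().cdf` stands in for the standard normal table.

```python
from statistics import NormalDist, mean, stdev

glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
m, s = mean(glucose), stdev(glucose)

# z = (x - mean) / SD for every observation.
z_scores = [(x - m) / s for x in glucose]

# The transformed values are centred on 0 with unit standard deviation.
print(round(mean(z_scores), 6), round(stdev(z_scores), 6))

# Cumulative area under the standard normal curve below the largest z score,
# i.e. what would be looked up in a standard normal table.
area_below = NormalDist().cdf(z_scores[-1])
print(round(area_below, 4))
```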
Application of Normal Curve Model
• Using z scores to compare two raw scores from different distributions.
• Can determine relative frequency and probability.
• Can determine percentile rank.
• Can determine the proportion of scores between the mean and a
particular score.
• Can determine the number of people within a particular range of scores
by multiplying the proportion by N.
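These applications could be sketched as below. All numbers (the two tests, their means and standard deviations, the raw scores and N = 200) are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Comparing two raw scores from different distributions (hypothetical tests).
z_a = z_score(85, mean=70, sd=10)  # test A
z_b = z_score(56, mean=50, sd=5)   # test B
better = "A" if z_a > z_b else "B"

# Percentile rank of the test A score.
percentile_rank = 100 * standard_normal.cdf(z_a)

# Proportion of scores between the mean and the test A score.
proportion_mean_to_score = standard_normal.cdf(z_a) - 0.5

# Expected number of people in that range, for a hypothetical N of 200.
n_people = 200
expected_count = proportion_mean_to_score * n_people

print(z_a, z_b, better)
print(round(percentile_rank, 1), round(proportion_mean_to_score, 4), round(expected_count, 1))
```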
SKEWED DISTRIBUTION
A skewed distribution is a distribution with data clumped up on one side
or the other, with decreasing amounts trailing off to the left or the right.
• Non-symmetrical distribution: mean, median and mode are not the same
• Negatively skewed: extreme scores at the lower end
Mean < median < mode → most did well, a few poorly
• Positively skewed: extreme scores at the higher end
Mean > median > mode → most did poorly, a few well
• The further apart the mean and median, the more the distribution is
skewed.
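The ordering of the three averages can be seen on two small hypothetical score lists (invented for illustration, one the mirror image of the other):

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: most low, a few high.
positive = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]

# Mirror image (11 - x): most high, a few low, so negatively skewed.
negative = [11 - x for x in positive]

pos_summary = (mean(positive), median(positive), mode(positive))
neg_summary = (mean(negative), median(negative), mode(negative))

print("positive skew: mean, median, mode =", pos_summary)  # mean > median > mode
print("negative skew: mean, median, mode =", neg_summary)  # mean < median < mode
```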
CONCLUSION
Statistics is central to most medical research. Basic principles of
statistical methods or techniques equip medical and dental students
to appreciate the utility and usefulness of statistics in medical and
other biosciences. Certain essential methods in biostatistics must be
learnt to understand their application in diagnosis, prognosis,
prescription and management of diseases in individuals and the community.
