
Introduction to Biostatistics for Basic Clinical Research

Hernan Cortez Labao, PTRP BSPT MSPT (Neuro) CPE


Lecturer, Department of Physiotherapy
Faculty of Health and Life Sciences
INTI International University, Nilai, Negeri Sembilan, Malaysia
STATISTICS is…
• …“the science of collecting, organizing, presenting, analysing, and interpreting numerical data to assist in making more effective decisions”

BIOSTATISTICS is…
• …the branch of applied statistics that applies statistical methods to medical and biological problems
- Rosner, 2016
MEASUREMENT is…
• The process of associating numbers or symbols to observations obtained in a research study
- Kothari & Garg, 2014
• It consists of rules for assigning numbers to objects in such a way as to represent quantities of attributes
- Carter, Lubinsky & Domholdt, 2011
What is the TYPE OF DATA that I am going to collect?
How do I categorize my variables?
VARIABLE is…
“a value or characteristic that changes from one subject/sample to another”

QUALITATIVE
• Have OBSERVED values that are attributes or categories
• Can be counted but cannot be computed
• Uses “dummy codes”, but the numbers do not reflect an actual value
• Examples: Gender, Race, Skin color

QUANTITATIVE
• Typically NUMERIC, and mathematical operations can be applied
• Can be calculated/computed
• Can be discrete or continuous
• Examples: Weight, Temperature, Height
- Kothari & Garg, 2014


VARIABLES

QUALITATIVE
• Attributes or categories
• Examples: Gender, Race, Skin color
• CATEGORIES: COUNTED, CLASSIFIED
• BINARY or DICHOTOMOUS: dual response (Ex: Yes or No)

QUANTITATIVE
• Typically numeric, can be counted and computed
• Examples: Height, Weight, Temperature
• DISCRETE: have a finite or countable number of possible values; COUNTED
• CONTINUOUS: have an infinite number of possible values; MEASURED
CATEGORICAL
• Can be CLASSIFIED
• Examples: Race, Marital Status, Income Category

BINARY or DICHOTOMOUS
• Only DUAL category, level, or type
• Examples: Yes or No, Male or Female, Positive or Negative, Decrease or Increase

DISCRETE
• Can be COUNTED
• Examples: No. of Shoes, No. of Women

CONTINUOUS
• Can be MEASURED
• Examples: Height, Weight, Time, Distance
Levels of Measurement

Nominal
• Classified, counted
• Valueless
• No order, distance, or origin
• Examples: Race, Sex, Civil Status, Hand Dominance

Ordinal
• Ordered/ranked
• Intervals between categories or ranks may or may not be equal
• Examples: MMT Score, Ashworth Scale, Likert Scale

Interval
• Ordered, with equal distance
• Meaningful difference between values
• No origin
• Examples: IQ Score, Temp. in °F, Dress Size

Ratio
• Origin, equal distance, and order
• Meaningful zero point and interval between values
• Arithmetic functions
• Examples: Speed, Weight, Distance Travelled

Discrete Data (Nominal, Ordinal) | Continuous Data (Interval, Ratio)
- Kothari & Garg, 2014
- Plichta, Kelvin & Munro, 2013
What is my RESEARCH OBJECTIVE?
Do I want to describe, correlate, test relationships, or compare differences among variables?
If your objective is to DESCRIBE variables…
DESCRIPTIVE STATISTICS are often used in…
DESCRIPTIVE STATISTICS
• Consist of methods used in organizing, summarizing, and presenting data in an informative way
• Develop certain indexes from raw data
- Kothari & Garg, 2014
• Measure data as they presently occur in all subjects
• Provide an overall picture of what the data look like
- Rosner, 2016
MEASURES OF CENTRAL LOCATION: CENTRAL TENDENCY
• Data must be summarized as succinctly as possible
• Looking at individual sample points may lead to losing track of the overall picture
• These measures summarize the data by indicating where the center or middle of the distribution lies
- Rosner, 2016; Blaxter, Hughes & Tight, 2010
MEASURES OF CENTRAL TENDENCY

MEAN
• Sum of all observations divided by the number of observations
• For INTERVAL OR RATIO DATA
- Carter, Lubinsky & Domholdt, 2011
• The expected value represents the “average” value of the random variable
• May be profoundly influenced by extreme values (OUTLIERS)
- Plichta, Kelvin & Munro, 2013
MEDIAN
• “Middle” score of a distribution, or the score above which half of the distribution lies
• For ORDINAL DATA
• STEPS: 1st – rank the scores; 2nd – find the middle score
• If the number of scores is odd – the median is the middle score
• If the number of scores is even – the median is the mean of the two middle scores, N1 and N2:
$\text{Median} = \dfrac{N_1 + N_2}{2}$
- Carter, Lubinsky & Domholdt, 2011
MODE
• Score that occurs most frequently in a distribution
• If there are 2 modes – the distribution is termed bimodal
• Often used to describe NOMINAL/CATEGORICAL DATA
- Carter, Lubinsky & Domholdt, 2011
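As a minimal sketch of the three measures of central tendency, the following uses Python's standard `statistics` module. The scores are hypothetical values invented for illustration; they do not come from the cited texts.

```python
import statistics

# Hypothetical range-of-motion scores in degrees (illustrative data only).
scores = [32, 60, 75, 75, 80, 88, 95, 120]

# MEAN: sum of all observations divided by the number of observations.
mean_score = statistics.mean(scores)

# MEDIAN: with an even number of scores, the mean of the two middle ranked scores.
median_score = statistics.median(scores)

# MODE: the score that occurs most frequently.
mode_score = statistics.mode(scores)

print(f"mean={mean_score}, median={median_score}, mode={mode_score}")
```

Here the mean sits above the median because the high score of 120 pulls it upward, which is the outlier effect described for the mean.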


MEASURES OF DISPERSION/ VARIABILITY

• The amount of SPREAD of data

- Carter, Lubinsky & Domholdt, 2011

• Opposite of centrality of data


• Three measures of variability:
1. Range
2. Variance
3. Standard Deviation

- Plichta, Kelvin & Munro, 2013


RANGE
• Technically a single score, but often reported as both the high and low scores
• Example:
High Score – Low Score = Range
95 – 32 = 63°
- Carter, Lubinsky & Domholdt, 2011
• Difference between the highest and lowest values in the distribution
• Can be used as a measure of variability with any of the measures of central tendency but is particularly appropriate for use with the MEDIAN
- Plichta, Kelvin & Munro, 2013
VARIANCE
• Measure of variability that, like the mean, requires that every score in the distribution be used in its calculation
• Variance and Standard Deviation are generally reported with the MEAN
• Degrees of Freedom – the number of values that are free to vary
- Carter, Lubinsky & Domholdt, 2011


STANDARD DEVIATION
• The square root of the variance, expressed in the units of the original measure
• Appropriate for use with the MEAN
- Carter, Lubinsky & Domholdt, 2011
- Plichta, Kelvin & Munro, 2013
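The three measures of variability can be sketched with Python's standard `statistics` module. The scores below are hypothetical; only the high (95) and low (32) values echo the range example. Note that `statistics.variance` and `statistics.stdev` compute the sample versions, which divide by n - 1 degrees of freedom.

```python
import statistics

# Hypothetical scores in degrees (illustrative data only).
scores = [32, 50, 65, 70, 78, 95]

# RANGE: highest score minus lowest score.
score_range = max(scores) - min(scores)

# VARIANCE: uses every score; the sample variance divides by n - 1.
variance = statistics.variance(scores)

# STANDARD DEVIATION: square root of the variance, in the original units.
sd = statistics.stdev(scores)

print(f"range={score_range}, variance={variance:.1f}, sd={sd:.2f}")
```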
EMPIRICAL RULE
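The empirical rule is usually stated for normally distributed data: roughly 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean. A minimal sketch, assuming a perfectly normal distribution, checks these figures with `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of values lying within k standard deviations of the mean.
coverage = {}
for k in (1, 2, 3):
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)

for k, proportion in coverage.items():
    print(f"within {k} SD: {proportion:.1%}")
```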
STANDARD ERROR OF MEAN
• Allows inference of how closely the sample mean matches the true population mean
• Helps to determine confidence in the data collected from a sample
• 95% confidence interval ≈ mean ± 2 SE
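A minimal sketch of the standard error and the approximate 95% confidence interval follows. It assumes the usual formula SE = SD / √n and uses hypothetical scores. The multiplier 2 is a rounding of 1.96, and with a sample this small a t-based multiplier would give a wider interval.

```python
import math
import statistics

# Hypothetical scores in degrees (illustrative data only).
scores = [32, 50, 65, 70, 78, 95]

n = len(scores)
mean = statistics.mean(scores)
sd = statistics.stdev(scores)        # sample standard deviation

# Standard error of the mean: SD divided by the square root of n.
se = sd / math.sqrt(n)

# Approximate 95% confidence interval: mean ± 2 SE.
ci_lower = mean - 2 * se
ci_upper = mean + 2 * se

print(f"mean={mean}, SE={se:.2f}, 95% CI ≈ ({ci_lower:.1f}, {ci_upper:.1f})")
```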
MEASURES OF NON-CENTRAL LOCATION: FREQUENCY DISTRIBUTION
• Tallies of the number of times each value is represented in a data set
• It is the frequency or count of the occurrences of values within a particular group
- Carter, Lubinsky & Domholdt, 2011
• Usually used for nominal or ordinal data
• INCLUDES: Proportions, percentages, and ratios
- Blaxter, Hughes & Tight, 2010
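A frequency distribution for nominal data can be sketched with `collections.Counter`. The hand-dominance responses below are hypothetical and serve only to show the tally and the percentages derived from it.

```python
from collections import Counter

# Hypothetical nominal data: hand dominance of 10 subjects.
hand_dominance = ["Right", "Right", "Left", "Right", "Left",
                  "Right", "Right", "Right", "Left", "Right"]

# Tally how many times each category occurs.
counts = Counter(hand_dominance)

# Express each count as a percentage of the whole sample.
total = sum(counts.values())
percentages = {category: 100 * n / total for category, n in counts.items()}

for category in counts:
    print(f"{category}: n={counts[category]} ({percentages[category]:.0f}%)")
```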


PERCENTAGES
• Used to compare parts to a whole
• The whole is divided into 100 equal parts
• RATE – a proportion with a specification of time


RATIO
• Comparison of two quantities/units
• Allows quantifying part-to-part relationships within a whole
• Allows quantifying part-to-whole relationships
PROPORTIONS
• Comparison of the equality of two ratios or fractions
• Used to determine whether two parts are equal
• Example:
Population WITH DISEASE: 4:6
Population WITHOUT DISEASE: 8:12
WITH DISEASE = WITHOUT DISEASE
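The equality in this example can be checked by reducing both ratios to lowest terms. A minimal sketch uses `fractions.Fraction` from the Python standard library; treating each ratio a:b as the fraction a/b is an assumption made for illustration.

```python
from fractions import Fraction

# Ratios from the example, written as fractions (a:b treated as a/b).
with_disease = Fraction(4, 6)
without_disease = Fraction(8, 12)

# Fraction reduces to lowest terms automatically, so equal ratios compare equal.
ratios_equal = with_disease == without_disease

print(with_disease, without_disease, ratios_equal)
```

Both ratios reduce to 2/3, so they form a proportion.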


What is my RESEARCH OBJECTIVE?
Do I want to describe, correlate, test relationships, or compare differences among variables?
If your objective is to TEST a hypothesis…
INFERENTIAL STATISTICS are often used in…
INFERENTIAL STATISTICS
• Consists of tests used to make a decision, estimate, prediction, or generalization about a population based on a sample
- Kothari & Garg, 2014
• Samples are only estimates of the population
• Based on Probability Theory – permits us to estimate the accuracy or representativeness of the sample
• Allows us to:
1. Assess the likelihood of something happening at some point in the future
2. Test a sample of the population to generalize to the entire population
PROBABILITY
Inferential Statistics
• Probability is the measure of the likelihood that an event will occur.
• Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).
• In inferential statistics, the null hypothesis (H0, read ‘H-naught’ or ‘H-null’) states that there is no relationship (difference) between the population variables in question.
• The alternative hypothesis (H1 or Ha) states that a relationship (difference) between the variables is expected.
• The P value (or calculated probability) is the probability of obtaining a result at least as extreme as the one observed, by chance alone, if the null hypothesis is true.
• The P value is a number between 0 and 1 and is interpreted by researchers in deciding whether to reject or retain the null hypothesis.
- Plichta, Kelvin & Munro, 2013
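To make the P value concrete, here is a minimal sketch of an exact binomial test written with the standard library only. The scenario is hypothetical: 15 of 20 patients improve, and H0 says improvement is as likely as not (probability 0.5). The 0.05 cut-off is the conventional significance level, assumed here for illustration.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided P value: the total probability, under H0, of every
    outcome that is no more likely than the observed one."""
    def pmf(k):
        return comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small tolerance so outcomes tied with the observed one are included.
    return sum(pmf(k) for k in range(trials + 1)
               if pmf(k) <= observed * (1 + 1e-9))

# Hypothetical result: 15 of 20 patients improved.
p_value = binomial_two_sided_p(successes=15, trials=20)

alpha = 0.05  # conventional significance level (an assumption)
decision = "reject H0" if p_value < alpha else "retain H0"

print(f"P = {p_value:.4f} -> {decision}")
```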




If your objective is to test CAUSE AND EFFECT RELATIONSHIPS…
Inferential Statistics
REGRESSION ANALYSIS
Tests the effect of one or more variables (predictor variables) on another variable (outcome variable) – cause and effect

Examples | Predictor Variable | Outcome Variable
Simple Linear Regression | 1 Continuous | 1 Continuous
Multiple Linear Regression | 2 or more Continuous | 1 Continuous
Logistic Regression | Continuous | Binary/Dichotomous
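For the simplest case, one continuous predictor and one continuous outcome, a least-squares line can be sketched in plain Python. The variable names and data (therapy sessions and improvement scores) are hypothetical and serve only to illustrate the calculation.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: number of therapy sessions (predictor)
# and improvement score (outcome).
sessions = [1, 2, 3, 4, 5]
improvement = [2, 4, 5, 4, 5]

slope, intercept = simple_linear_regression(sessions, improvement)

# Predicted outcome for a new predictor value.
predicted = intercept + slope * 6

print(f"improvement = {intercept:.1f} + {slope:.1f} * sessions")
```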
If your objective is to test whether 2 variables are related…
Inferential Statistics
CORRELATION TESTS
Used to check whether two variables are related without assuming cause-and-effect relationships

Examples | Predictor Variable | Outcome Variable
Pearson’s Correlation | 1 Continuous | 1 Continuous
Spearman’s Correlation | 1 Continuous (but violated parametric assumptions) | 1 Continuous (but violated parametric assumptions)
Chi-Square | Categorical | Categorical
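The two correlation coefficients for continuous data can be sketched in plain Python. Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, with tied values given their average rank. The data are hypothetical, and the chi-square test is not covered by this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman's correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
rho = spearman_rho(x, y)

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```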
If your objective is to compare differences among variables…
Inferential Statistics
COMPARISON TESTS/ TEST OF DIFFERENCE
Used to test differences among group means
Used to test the effect of a categorical variable on the mean of some other characteristic

Examples | Predictor Variable | Outcome Variable
Paired T-Test (one group) | 1 Categorical | Quantitative
Independent T-Test (2 groups) | 1 Categorical | Quantitative
ANOVA (more than 2 groups) | 1 or more Categorical | Quantitative
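As an illustration of a test of difference between two groups, the sketch below computes the independent-samples t statistic with a pooled variance, which assumes similar variance in both groups. The group data are hypothetical. The standard library has no t distribution, so the statistic is compared with a critical value from a t table rather than converted to a P value.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff
    return t, df

# Hypothetical outcome scores for two independent groups.
treatment = [10, 12, 14, 16, 18]
control = [8, 9, 10, 11, 12]

t_stat, df = independent_t(treatment, control)

# Two-tailed critical value for df = 8 at the 0.05 level, from a t table.
critical_value = 2.306
significant = abs(t_stat) > critical_value

print(f"t({df}) = {t_stat:.3f}, significant at 0.05: {significant}")
```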
• What is the type of data collected?
• Do the data collected meet certain assumptions?
…when deciding the statistical tests and tools that you will use.
PARAMETRIC ASSUMPTIONS
1. Data must be RATIO or INTERVAL.
2. Subjects should be RANDOMLY SELECTED.
3. Data should be NORMALLY DISTRIBUTED.
4. There should be SIMILAR VARIANCE.

• #1 CANNOT be violated
• #2-4 can be violated to some extent
- Kothari & Garg, 2014; Carter, Lubinsky & Domholdt, 2011


The data collected should be RATIO or INTERVAL
Subjects are RANDOMLY SELECTED
The data should be NORMALLY DISTRIBUTED
TEST FOR NORMALITY
• Can be determined GRAPHICALLY
• Using the distribution graph (histogram), data should not be negatively or positively skewed
• On the normal probability (Q-Q) plot, the data points should be close to the diagonal line
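The visual check for skew can be complemented by a skewness coefficient. This is a rough sketch using the moment-based formula m3 / m2^1.5, and it is not a formal normality test. A value near 0 suggests symmetry, a positive value a right (positive) skew, and a negative value a left (negative) skew. The data are hypothetical.

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # one large value stretches the right tail

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_skewed)

print(f"symmetric: {skew_symmetric:.2f}, right-skewed: {skew_right:.2f}")
```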
• Can be determined NUMERICALLY
• SHAPIRO-WILK TEST – more appropriate for small sample sizes (less than 50) but can also handle sample sizes as large as 2000
• The test should give a significance (p) value greater than 0.05 for the data to be considered normal
- Laerd Statistics, 2018
• KOLMOGOROV-SMIRNOV TEST – generic method to test whether two random samples are drawn from the same distribution
• Generally less powerful; used for larger sample sizes or when other parametric assumptions are already violated
- Plichta, Kelvin & Munro, 2013

There should be SIMILAR VARIANCE between groups
TEST OF VARIANCE
• LEVENE’S TEST – assesses equality of variances between 2 or more groups
• Assesses homogeneity of variance (homoscedasticity)
• Ensuring equal variances can minimize Type I error
- Laerd Statistics, 2018
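The sketch below computes Levene's W statistic by hand, assuming the original mean-centered form of the test (statistical packages often default to a median-centered variant). The group data are hypothetical. W is compared with a critical value from an F table with (k - 1, N - k) degrees of freedom; a W below the critical value gives no evidence of unequal variances.

```python
import statistics

def levene_statistic(*groups):
    """Levene's W using absolute deviations from each group's mean."""
    # Step 1: replace each score with its absolute deviation from the group mean.
    z = []
    for group in groups:
        center = statistics.mean(group)
        z.append([abs(v - center) for v in group])

    n_total = sum(len(g) for g in z)
    k = len(z)
    z_group_means = [statistics.mean(g) for g in z]
    z_grand_mean = sum(sum(g) for g in z) / n_total

    # Step 2: one-way ANOVA F ratio computed on the deviations.
    between = sum(len(g) * (m - z_grand_mean) ** 2
                  for g, m in zip(z, z_group_means))
    within = sum((v - m) ** 2
                 for g, m in zip(z, z_group_means) for v in g)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical outcome scores for two groups.
treatment = [10, 12, 14, 16, 18]
control = [8, 9, 10, 11, 12]

w = levene_statistic(treatment, control)

# Critical F value for (1, 8) degrees of freedom at the 0.05 level, from an F table.
critical_value = 5.32
variances_differ = w > critical_value

print(f"W = {w:.3f}, evidence of unequal variances: {variances_differ}")
```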


References
• Blaxter, L., Hughes, C., & Tight, M. (2010). How to research, 4th Ed. McGraw Hill Companies, New York.
• Carter, R., Lubinsky, J., & Domholdt, E. (2011). Rehabilitation research: principles and applications, 4th Ed. Elsevier Saunders, St. Louis, Missouri.
• Kothari, C.R., & Garg, G. (2014). Research methodology: methods and techniques, 3rd Ed. New Age International Publishers, New Delhi.
• Kumar, R. (2011). Research methodology: a step-by-step guide for beginners, 3rd Ed. SAGE, Los Angeles.
• Plichta, S. B., Kelvin, E. A., & Munro, B. H. (2013). Munro’s statistical methods for health care research. Wolters Kluwer Health/Lippincott Williams & Wilkins.
• Laerd Statistics (2018). https://statistics.laerd.com/
• Pictures from public domains
ACKNOWLEDGMENT: INDIAN CLINICAL RESEARCH ASSOCIATION (ICRA)
GET A COPY: https://qrgo.page.link/oku62
