Statistics Basis To Advance
Dr Mahalakshmy T
Special thanks:
Dr Ajay Kumar, The Union
Dr. Palanivel C
What is intended?
• APPROACH to analysis
[Figure: a research question is studied in a sample drawn from the population]
Inferential statistics
• Conclusions based on the sample
• Generalisation to the population
Examples of descriptive and
inferential statistics
TYPES OF VARIABLES
VARIABLES
• QUALITATIVE (Categorical)
• QUANTITATIVE (Numeric)
• Measures of dispersion
How to summarize a quantitative variable?
• Mean (SD) or Median (IQR/Range)
• Mean difference
• Correlation coefficient
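The summary measures listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the blood pressure readings and the `summarize` helper are hypothetical and not part of the slides.

```python
import statistics

def summarize(values):
    """Return mean, SD, median, quartiles and range for a quantitative variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),        # sample SD (n - 1 denominator)
        "median": statistics.median(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }

# Hypothetical systolic blood pressure readings (mmHg), for illustration only
sbp = [110, 114, 118, 120, 120, 124, 130, 140, 162]
summary = summarize(sbp)
print(f"Mean (SD): {summary['mean']:.1f} ({summary['sd']:.1f})")
print(f"Median (IQR): {summary['median']} ({summary['iqr'][0]} to {summary['iqr'][1]})")
```

Mean (SD) is the usual choice when the data are roughly normal; median (IQR) is more robust when a few extreme values, such as the 162 here, pull the mean upward.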
[Figure: research question, population, and sample, with 60% shown for the sample; conclusions are based on the sample]
Statistics – Confidence interval
(95% CI)
▪ A Confidence Interval is a range of values within
which the “true” population parameter is believed to
be found with a given level of confidence.
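As a rough illustration of the definition above, a 95% CI for a sample proportion can be computed with the normal (Wald) approximation. The sample size of 100 and the helper name `proportion_ci` are assumptions made for this sketch; the slides do not state them.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate (Wald) confidence interval for a proportion; z = 1.96 gives 95%."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se

# Hypothetical: 60% of a sample of 100 people have the characteristic
low, high = proportion_ci(0.60, 100)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

A larger sample gives a smaller standard error and therefore a narrower interval around the same 60%.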
Interpret
What is alpha error and beta
error?
Errors in the study….
                                    TRUTH
STUDY             Association                No association
Association       Power (1-beta)             Type I error (alpha error)
                                             FP error, ‘P’ value
No association    Type II error              Confidence (1-alpha)
                  (beta error), FN error
The Judge's Dilemma
                                         TRUTH
JUDGE                      Guilty                      Innocent
                           (associated with crime)
Guilty                     Power (1-beta)              Type I error (alpha error)
(associated with crime)                                FP error
Not guilty                 Type II error
▪ Conventionally
• Accepted alpha error is 0.05
• Accepted Beta error is 0.2
▪ Alpha error is “error of commission” – it
changes practice
▪ Beta error is “error of omission”
What is p value?
‘p’ value < 0.05 – what does it really
mean?
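▪ “p” is the probability of getting a result at least as extreme as the one observed just by chance, that is, when in reality there is no association
▪ It is judged against the accepted alpha error (0.05)
▪ Alpha error is… when we conclude that there is an association but in reality there is no association.
▪ The lower the p value, the stronger the evidence that the result is not just due to chance…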
Statistics – P-value
▪ Statistical significance:
• P value < 0.05 - Significant
• P value < 0.01 - Very significant
• P value < 0.001 - Highly significant
What is the “p” value?
What is the power of the study?
Example
Yoga vs standard treatment
7%
Standard
10%
Example..contd
▪ The study got over; as decided I have
included 466 individuals in each group
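The two questions posed for this example (the “p” value and the power) can be sketched numerically. The code below assumes that 7% and 10% are the outcome proportions in the two arms and that each arm has 466 individuals; the slides do not say which arm is which, so treat this as an illustration under those assumptions. The helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def two_proportion_z_test(p1, p2, n1, n2):
    """Two-sided z test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - normal_cdf(abs(z)))

def approx_power(p1, p2, n_per_group, z_alpha=1.96):
    """Normal-approximation power to detect p1 vs p2 at two-sided alpha = 0.05."""
    se = math.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return normal_cdf(abs(p1 - p2) / se - z_alpha)

# Assumed inputs: 7% vs 10% with 466 individuals per group
z, p_value = two_proportion_z_test(0.07, 0.10, 466, 466)
power = approx_power(0.07, 0.10, 466)
print(f"z = {z:.2f}, p = {p_value:.3f}, power = {power:.2f}")
```

Under these assumptions the p value is above 0.05 and the power is well below the conventional 80%, so a non-significant result here could easily be a beta error.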
[Figure: two panels showing the observed association between coffee drinking and pancreatic cancer; the second panel adds smoking as a third factor]
▪ Function of the complex interrelationship between exposure and disease
▪ Factor should differ between the study groups
▪ Mixing of the effect of the exposure under study on the disease with that of a third factor.
▪ Factor must be associated with the exposure
Review of literature
Identify the direction of effect of confounding
Positive Confounding
• Over-estimate risk
• Eg: Smoking in a study on coffee drinking and MI
Negative Confounding
• Underestimate risk/protection
• Eg: Gender in a study on Physical Activity and MI
Hypothetical example of confounding in an unmatched case control study: number of exposed and nonexposed cases and controls
ODDS RATIO = (30 × 82) / (70 × 18) = 1.95
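The odds ratio above can be reproduced, together with an approximate 95% CI, in a short sketch. The cell assignment (30 exposed cases, 70 nonexposed cases, 18 exposed controls, 82 nonexposed controls) is inferred from the formula and is an assumption; the CI uses the standard log (Woolf) method, which the slides do not mention.

```python
import math

def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio (cross-product ratio) with a log-based (Woolf) confidence interval."""
    or_value = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    log_or = math.log(or_value)
    return or_value, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)

# Assumed cell counts, read off the odds ratio formula in the slide
crude_or, ci_low, ci_high = odds_ratio_with_ci(30, 70, 18, 82)
print(f"OR = {crude_or:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```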
[Figure: A. Causal: exposure and disease linked by the observed association. B. Due to confounding: exposure and disease linked by the observed association, with older age as the third factor.]
Stratified by age

Age      Exposed    Cases    Controls
< 40     TOTAL        50        80
≥ 40     YES          25        10
         NO           25        10       OR = 1
         TOTAL        50        20
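A stratified analysis of this example can be sketched as follows. Only the ≥ 40 stratum and the totals of the other stratum survive in the table, so the remaining cell counts (5 and 45 cases, 8 and 72 controls) are back-calculated here by subtracting the ≥ 40 stratum from the crude table (30, 70, 18, 82). That back-calculation and the function names are assumptions of this sketch. The pooled estimate uses the Mantel-Haenszel formula, a standard method that the slides do not name.

```python
def stratum_or(a, b, c, d):
    """Odds ratio for one stratum: a, b = exposed cases, controls; c, d = unexposed cases, controls."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata of (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# (exposed cases, exposed controls, unexposed cases, unexposed controls)
younger = (5, 8, 45, 72)    # assumed: back-calculated from the crude table
older = (25, 10, 25, 10)    # the stratum shown in the table (age 40 and above)

crude = stratum_or(*(x + y for x, y in zip(younger, older)))
adjusted = mantel_haenszel_or([younger, older])
print(f"Crude OR = {crude:.2f}; stratum ORs = {stratum_or(*younger):.2f}, "
      f"{stratum_or(*older):.2f}; adjusted OR = {adjusted:.2f}")
```

When the stratum-specific odds ratios agree with each other but differ from the crude odds ratio, the crude association is explained by the stratifying factor, here age.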
At the design stage
Both
Restriction
Matching
Stratification
[Figure: Age-adjusted lung cancer death rates per 100,000 man-years by place of residence (city of 50000+, city of 10000-50000, suburb or town, rural), for those who never smoked and those who regularly smoked cigarettes. Bar labels: 85, 71, 72, 65 and 15, 9, 5, 0.]
▪ Confounding describes the reality of the interrelationship between certain factors and a certain outcome
▪ It characterizes every situation in which etiology is addressed, because most causal questions involve the relationship of multiple exposures and multiple etiological factors
▪ In all analytical observational studies, bias and confounding must always be considered as alternative explanations for study findings.
▪ A number of methods are available for controlling confounders
▪ No single method can be considered optimal in every situation
▪ A combination of strategies is preferred
Epidemiology- by Leon Gordis (6th Edition)
1+1=2
• No interaction
1+1=4 or 1+1=-2
• Interaction
Stratified analysis
[Figure: two panels, "Interaction not present" and "Interaction is present"]
Definition by MacMahon
                     Smoking
                     -        +
Urbanisation   -     3        9
               +     15       9 + 15 - 3 = 21
Additive model

Incidence of lung cancer
                     Smoking
                     -        +
Urbanisation   -     3        9
               +     15       > 21 (interaction present): synergism
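The additive model in the tables above amounts to one line of arithmetic: the expected incidence with both factors is the sum of the two single-factor incidences minus the background. A minimal sketch follows; the function names and the label "antagonism" for a smaller-than-expected joint effect are additions for illustration, and the observed value of 30 is hypothetical.

```python
def additive_expected(background, factor_a_only, factor_b_only):
    """Expected joint incidence if the two factors simply add (no interaction)."""
    return factor_a_only + factor_b_only - background

def classify_interaction(observed, expected):
    """Compare the observed joint incidence with the additive expectation."""
    if observed > expected:
        return "interaction present (synergism)"
    if observed < expected:
        return "interaction present (antagonism)"
    return "no interaction"

# Incidences from the slide: background 3, smoking only 9, urbanisation only 15
expected = additive_expected(3, 9, 15)
print(expected)
print(classify_interaction(30, expected))  # 30 is a hypothetical observed value
```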
Deaths from lung cancer (per 1,00,000)
                Asbestos exposure
                -        +
Smoking   -     11       58
          +     123      602
Relative Risk of oral cancer
                Smoking
                -        +
Alcohol   -     1        1.5
          +     1.2      5.7
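For the asbestos and oral cancer tables, the observed joint effect can be compared with what an additive and a multiplicative model would predict. The multiplicative model is a standard alternative that the slides do not spell out, so its use here is an assumption, as are the function names.

```python
def expected_additive(background, a_only, b_only):
    """Joint value expected if the excess effects add."""
    return a_only + b_only - background

def expected_multiplicative(background, a_only, b_only):
    """Joint value expected if the relative effects multiply."""
    return a_only * b_only / background

# Deaths from lung cancer per 1,00,000: neither 11, asbestos only 58, smoking only 123, both 602
asbestos_add = expected_additive(11, 58, 123)
asbestos_mult = expected_multiplicative(11, 58, 123)

# Relative risk of oral cancer: neither 1, smoking only 1.5, alcohol only 1.2, both 5.7
oral_add = expected_additive(1, 1.5, 1.2)
oral_mult = expected_multiplicative(1, 1.5, 1.2)

print(f"Asbestos and smoking: observed 602, additive {asbestos_add}, multiplicative {asbestos_mult:.0f}")
print(f"Alcohol and smoking: observed 5.7, additive {oral_add:.1f}, multiplicative {oral_mult:.1f}")
```

In both tables the observed joint value exceeds the additive expectation, which is the pattern the slides call synergism.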
Relative Risks of Liver Cancer for persons exposed
to Aflatoxin or Chronic Hepatitis B Infection
Blog: https://significantlystatistical.wordpress.com/2014/12/12/confounders-mediators-moderators-and-covariates/
CHOOSING A STATISTICAL TEST
Dr Mahalakshmy T
Inferential Statistics
• Estimation (Confidence Interval)
Example:
1. Smoking vs cancer
• Interval or ratio/continuous (Parametric)
• Ordinal scale (Non parametric)
• Nominal or Dichotomous
CHOICE OF STATISTICAL TESTS
Scale of dependent        COMPARE   2 groups,              2 groups,             3 or more groups,
variable                            independent samples    repeated measures     independent samples
Interval or ratio/        Means     Independent samples    Paired samples        One way ANOVA
continuous (Parametric)             t test                 t test
1 34 1 12 housewife 0 124 120 86 y n n 1.55 65 98 24 32
2 50 1 6 housewife 0 100 118 82 n n n 1.72 47 73 48 24
3 34 2 12 fisherman 0 114 120 88 y n n 1.68 61 85 24 24
5 34 2 7 fisherman 0 120 140 80 n n n 1.62 67 92 24 24
Scale of dependent variable                   COMPARE   Independent samples
Parametric (Interval or ratio/continuous)     Values    Pearson correlation
Non parametric (ordinal scale)                Ranks     Spearman correlation
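The distinction in the table (Pearson works on the values, Spearman on the ranks) can be made concrete with a small standard-library sketch. The data are hypothetical and the helper names are illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient computed on the raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y always rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson = {pearson(x, y):.3f}, Spearman = {spearman(x, y):.3f}")
```

Because Spearman only uses the ordering, any relationship that always increases gives a coefficient of 1, while Pearson is reduced when the relationship is not linear.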
CHOICE OF STATISTICAL TESTS
Predicting single outcome variable from several independent variables
• Continuous outcome: Multiple regression
• Dichotomous outcome: Logistic regression
CHOOSE A STATISTICAL TEST TO BE USED IN THE FOLLOWING SITUATIONS
FRAMEWORK FOR CHOICE OF STATISTICAL TESTS
• Aim
• Analysis type
• Parameter to be analysed
• Distribution of data (Normal or Non-normal)
• Design (Paired or Unpaired)
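For a comparison of a quantitative parameter between groups, the framework above can be read as a small decision function. The sketch below is one possible encoding, not the presenter's algorithm: the parametric branch follows the names used in these slides, while the non-parametric names are the usual textbook counterparts and are an assumption here.

```python
def choose_test(normal, groups, paired):
    """Suggest a test for comparing a quantitative parameter between groups.

    normal: True if the data are normally distributed
    groups: number of groups or data sets being compared (2 or more)
    paired: True for paired / repeated measures designs
    """
    if groups < 2:
        raise ValueError("at least two groups or data sets are needed")
    if normal:
        if groups == 2:
            return "Paired t test" if paired else "Unpaired t test"
        return "Repeated measures ANOVA" if paired else "One way ANOVA"
    # Non-parametric counterparts (standard textbook choices, assumed here)
    if groups == 2:
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    return "Friedman test" if paired else "Kruskal-Wallis test"

print(choose_test(normal=True, groups=2, paired=False))
print(choose_test(normal=True, groups=6, paired=True))
```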
1. A new drug is proposed to lower total cholesterol. A randomized controlled trial is conducted in which 30 participants are randomly assigned to receive either the new drug or a placebo. Each participant is asked to take the assigned treatment for 6 months. At the end of 6 months, each patient's total cholesterol level is measured.
SOLUTION
• Aim: To see whether the new drug alters blood cholesterol levels
• Analysis type: Comparison of means
• Parameter to be analysed: Cholesterol levels
• No. & Name of the groups / data sets to be analysed: Two: New drug and Placebo
• Distribution of data (Normal or Non-normal): Normal
• Design (Paired or Unpaired): Unpaired
Unpaired t test
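The unpaired t test chosen above can be sketched with the standard library. The cholesterol values are hypothetical, and the split of the 30 participants into 15 per arm is an assumption. The statistic is Student's t with pooled variance; instead of computing a p value, the sketch compares |t| with 2.048, the two-sided 5% critical value for 28 degrees of freedom.

```python
import math
import statistics

def unpaired_t(group_a, group_b):
    """Student's t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(group_a), len(group_b)
    var1, var2 = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, n1 + n2 - 2

# Hypothetical total cholesterol (mg/dL) at 6 months, 15 participants per arm (assumed split)
new_drug = [182, 190, 175, 201, 168, 195, 187, 179, 210, 172, 185, 198, 176, 189, 193]
placebo = [205, 214, 198, 221, 209, 230, 201, 217, 195, 226, 212, 208, 219, 203, 224]

t_stat, df = unpaired_t(new_drug, placebo)
CRITICAL_T_DF28 = 2.048  # two-sided 5% critical value of t for 28 degrees of freedom
print(f"t = {t_stat:.2f}, df = {df}, significant at 0.05: {abs(t_stat) > CRITICAL_T_DF28}")
```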
2. Health education to adolescents on healthy
intervention
Paired t test
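A paired t test works on the within-person differences rather than on the two groups separately. The sketch below uses hypothetical before and after scores for the same individuals; the scores, the variable names and the function name are assumptions for illustration.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic on within-person differences; returns (t, degrees of freedom)."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    se = statistics.stdev(differences) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(differences) / se, n - 1

# Hypothetical knowledge scores for the same 8 adolescents before and after an intervention
score_before = [10, 12, 9, 14, 11, 13, 8, 12]
score_after = [13, 15, 9, 17, 14, 14, 11, 16]

t_paired, df_paired = paired_t(score_before, score_after)
print(f"t = {t_paired:.2f}, df = {df_paired}")
```

Each person acts as their own control, so between-person variability drops out of the comparison.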
3. Three different formulations of iron to
of 6 months
Correlation
6. To evaluate an adverse effect, two groups of 1000 subjects were administered either a drug or placebo. The proportions of subjects affected were 78.54% and 21.63%.
Wilcoxon test
8. A new drug was tested to see how its concentration in the body alters with time. 10 mg of the drug was given IV and plasma concentration was measured at 4, 8, 12, 24, 48 & 72 hr.
Repeated measures ANOVA
Summary
CHOICE OF STATISTICAL TESTS
Scale of dependent        COMPARE   2 groups,              2 groups,             3 or more groups,
variable                            independent samples    repeated measures     independent samples
Interval or ratio/        Means     Independent samples    Paired samples        One way ANOVA
continuous (Parametric)             t test                 t test