28 December 2017
Statistics estimate parameters
Representative
Sampling error
Population: N = 35, Red = 18/35 = 51.4% (parameter)
Sample 1: N = 6, Red = 3/6 = 50% (statistic)
Sample 2: N = 6, Red = 2/6 = 33.3% (statistic)
Sample 3: N = 6, Red = 4/6 = 66.7% (statistic)
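The gap between each sample statistic and the population parameter can be worked out directly. The following is a minimal sketch using the counts from the slide; the function name `sampling_error` is a hypothetical helper introduced only for illustration.

```python
from fractions import Fraction

# Population from the slide: 35 units, 18 of them red (the parameter).
POPULATION_RED = Fraction(18, 35)

# Red counts in the three samples of size 6 shown on the slide.
SAMPLE_RED_COUNTS = [3, 2, 4]
SAMPLE_SIZE = 6


def sampling_error(red_count, n, parameter=POPULATION_RED):
    """Return (statistic, statistic - parameter) as floats."""
    statistic = Fraction(red_count, n)
    return float(statistic), float(statistic - parameter)


for red in SAMPLE_RED_COUNTS:
    stat, err = sampling_error(red, SAMPLE_SIZE)
    print(f"statistic = {stat:.1%}, sampling error = {err:+.1%}")
```

Each sample gives a different statistic, and none of them equals the parameter exactly; that difference is the sampling error.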
Variable & its role
A variable is associated with a value, and its associated value may be changed
Causation
Relation of events (cause and effect)
But correlation (between two events) does not imply causation
Hill’s criteria
1. Strength of association
6. Plausibility
7. Coherence
Time & causation
[Figure: exposures (e.g. smoking) and the outcome placed along a time axis]
Causal web
Web of causation
Conceptual framework
Basic Biostatistics (C) Jamalludin Ab Rahman 2015 28 December 2017
Exposure
Mediator
Confounder
Effect modifier or Moderator
http://www.apa.org/science/about/psa/2008/06/ahn.aspx
Categorical
  Nominal, e.g. Gender, Race
  Ordinal, e.g. Cancer staging, Severity of CXR for PTB
Numerical
  Discrete, e.g. Parity, Gravida
  Continuous, e.g. Hb, RBS, cholesterol
Distribution (shape) of data
Applicable to numerical value
Discrete or Continuous
Central limit theorem
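“Given a distribution with a mean μ and
variance σ², the sampling distribution of the
mean approaches a normal distribution with a
mean μ and a variance σ²/N as N, the sample
size, increases.”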
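The theorem can be checked with a small simulation. This is a sketch under stated assumptions: the parent distribution is an exponential with mean 1 and variance 1 (chosen here only because it is clearly skewed), and `sample_means` is a hypothetical helper name.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_means(n, repeats=5000):
    """Means of `repeats` samples of size n drawn from Exponential(1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]


# Exponential(1) has mean 1 and variance 1, so the sampling distribution
# of the mean should have mean close to 1 and variance close to 1/n.
for n in (2, 10, 50):
    means = sample_means(n)
    print(
        f"n={n:>2}  mean of means={statistics.fmean(means):.3f}  "
        f"variance of means={statistics.variance(means):.4f}  "
        f"sigma^2/n={1 / n:.4f}"
    )
```

As n grows, the variance of the sample means shrinks towards σ²/N, and a histogram of `means` would look increasingly bell shaped even though the parent distribution is skewed.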
Normal Distribution
$$f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
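The normal density f(x; μ, σ²) can be evaluated term by term. A minimal sketch, with the hypothetical function name `normal_pdf`, cross-checked against `statistics.NormalDist` from the Python standard library:

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-0.5 * ((x - mu) / sigma)^2) / (sigma * sqrt(2 * pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Cross-check the hand-written formula against the standard library.
for x in (-1.0, 0.0, 2.5):
    assert math.isclose(normal_pdf(x, 1.0, 2.0), NormalDist(1.0, 2.0).pdf(x))

print(normal_pdf(0.0))  # peak height of the standard normal curve
```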
Characteristics
Bell-shaped curve
Unimodal
Symmetrical
Use normality tests with caution
Small samples almost always pass a normality
test. Normality tests have little power to tell whether or not a small
sample comes from a normal distribution.
Why run a statistical test?
1. Measure magnitude of event
2. Determine presence of difference (or similarity)
Is there any difference between A & B?
[Figure: groups A, B and C]
Statistical analysis
Descriptive Analytical
How to describe data
Categorical: Frequency (count) & Percentage
Numerical (Not Normal): Median (Range/IQR)
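These summaries can be produced with the Python standard library. The sketch below uses made-up data and hypothetical helper names (`describe_categorical`, `describe_numerical`). It also reports the mean and standard deviation, the conventional summary for normally distributed data, which is an addition here and not part of the extracted slide text.

```python
import statistics
from collections import Counter

# Hypothetical data, for illustration only.
gender = ["F", "M", "F", "F", "M", "F"]
hb = [11.8, 12.4, 13.1, 12.9, 10.5, 14.2, 12.0, 13.5]


def describe_categorical(values):
    """Frequency (count) and percentage for each category."""
    counts = Counter(values)
    total = len(values)
    return {level: (count, 100 * count / total) for level, count in counts.items()}


def describe_numerical(values):
    """Mean (SD) plus median with IQR and range; report the pair that fits the data."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }


print(describe_categorical(gender))
print(describe_numerical(hb))
```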
Analytical statistics
3 methods to compare values
1. P-value
2. Confidence interval
3. Effect size
P value
The P-value is the probability of a result at least as extreme as the one observed if Ho is true
Taking 0.05 as the cut-off point (α), if P ≤ 0.05 the data are
‘unlikely’ under Ho, therefore reject Ho
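The decision rule can be written out in a few lines. This is a sketch only: it assumes the test statistic is a z statistic that follows a standard normal distribution under Ho, and the names `two_sided_p_from_z` and `decide` are hypothetical.

```python
from statistics import NormalDist

ALPHA = 0.05  # cut-off point from the slide


def two_sided_p_from_z(z):
    """Two-sided P-value for a z statistic, assuming a standard normal under Ho."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p, alpha=ALPHA):
    """Apply the cut-off: reject Ho when P <= alpha."""
    return "reject Ho" if p <= alpha else "fail to reject Ho"


for z in (0.5, 1.96, 3.0):
    p = two_sided_p_from_z(z)
    print(f"z = {z:.2f}  P = {p:.4f}  -> {decide(p)}")
```

Failing to reject Ho is not the same as showing Ho is true; it only means the data are compatible with it at the chosen cut-off.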
Hypothesis testing
                 Truth
             Ho True             Ho False
Reject Ho    Type I error (α)    Correct
One-tailed vs. two-tailed
Two-sided
Left-sided
Right-sided
P < 0.05
Why 5%?
Cut-off point proposed by Sir Ronald A. Fisher
Hypothesis testing using bivariable analysis
Try to prove that Exposure causes the Disease
e.g. Smoking causing Lung Cancer
[2×2 table, columns: Lung Cancer | No Lung Cancer]
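A 2×2 table of exposure against disease is commonly analysed with a Pearson chi-square test. The sketch below uses invented counts (the slide's actual numbers are not available) and a hypothetical function name, `chi_square_2x2`. It applies no continuity correction, and the P-value uses the closed form for one degree of freedom.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with df = 1.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: rows = smoker / non-smoker,
# columns = lung cancer / no lung cancer.
chi2, p = chi_square_2x2(30, 70, 10, 90)
print(f"chi-square = {chi2:.2f}, P = {p:.5f}")
```

A small P-value here indicates an association between exposure and disease; on its own it does not establish that the exposure causes the disease.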
Confidence Interval
Range of plausible values
Narrow interval = high precision
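The link between interval width and precision can be seen by computing a confidence interval for the same proportion at different sample sizes. This sketch assumes the Wald (normal-approximation) interval, which is only one of several methods and behaves poorly for very small samples; `proportion_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Wald (normal-approximation) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Same observed proportion (50%), increasing sample size.
for successes, n in ((3, 6), (50, 100), (500, 1000)):
    lower, upper = proportion_ci(successes, n)
    print(f"n = {n:>4}: {lower:.3f} to {upper:.3f} (width {upper - lower:.3f})")
```

The larger the sample, the narrower the interval and the more precise the estimate.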
Interpret single CI
Compare with the null value
i.e. 0 for a difference (e.g. in %) or 1 for a ratio (e.g. relative risk)
[Figure: confidence intervals B and D, each shown against the null value]
Source: http://www.childrens-mercy.org/stats/journal/confidence.asp
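Comparing a single interval with its null value is a one-line check. The intervals below are invented for illustration, and `excludes_null` is a hypothetical helper name.

```python
def excludes_null(ci, null_value):
    """True when the confidence interval does not contain the null value."""
    lower, upper = ci
    return not (lower <= null_value <= upper)


# Hypothetical intervals, for illustration only.
difference_ci = (-2.0, 5.0)   # difference in percentage points, null value 0
relative_risk_ci = (1.2, 3.4)  # ratio measure, null value 1

print(excludes_null(difference_ci, 0))     # interval contains 0
print(excludes_null(relative_risk_ci, 1))  # interval lies entirely above 1
```

An interval that excludes the null value corresponds to a statistically significant result at the matching level (for example, a 95% interval and a two-sided cut-off of 0.05).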
Comparing multiple CIs
[Figure: confidence intervals A and B]
Statistical Test
Bivariable (univariate) ~ One dependent & one independent variable
What test to use?
Variable 1             Variable 2            Test
Categorical            Categorical           Chi-square
Categorical (2 pop)    Numerical (Normal)    Independent sample t-test
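The table can be read as a simple lookup from the two variable types to a test. The sketch below encodes only the two rows shown above; `TESTS` and `choose_test` are hypothetical names, and any combination not listed returns `None` rather than a guess.

```python
# Only the rows that appear in the table above are encoded.
TESTS = {
    ("categorical", "categorical"): "Chi-square",
    ("categorical (2 pop)", "numerical (normal)"): "Independent sample t-test",
}


def choose_test(variable_1, variable_2):
    """Look up the suggested test; None when the pair is not in the table."""
    key = (variable_1.strip().lower(), variable_2.strip().lower())
    return TESTS.get(key)


print(choose_test("Categorical", "Categorical"))
print(choose_test("Categorical (2 pop)", "Numerical (Normal)"))
```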
Bivariable Analyses
Compare means
Independent sample t-test (Unpaired t-test) ~ Two unrelated groups
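The test statistic for comparing two unrelated groups can be computed by hand. This is a minimal sketch of Student's t with a pooled variance, which assumes roughly equal variances in the two groups; the data are invented and `independent_t` is a hypothetical name. The P-value would then be read from a t distribution with the returned degrees of freedom, which the standard library does not provide.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = na + nb - 2
    pooled_variance = ((na - 1) * var_a + (nb - 1) * var_b) / df
    standard_error = math.sqrt(pooled_variance * (1 / na + 1 / nb))
    return (mean_a - mean_b) / standard_error, df


# Hypothetical measurements from two unrelated groups.
t, df = independent_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t:.2f}, df = {df}")
```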
Writing plan for statistical analysis #1
Data were analyzed using the complex sample function of SPSS
(version 13.0). Sampling errors were estimated using the primary
sampling units and strata provided in the data set. Sampling
weights were used to adjust for nonresponse bias and the
oversampling of blacks, Mexican Americans, and the elderly in
NHANES. The prevalence of hypertension, as well as the
awareness, treatment, and control rates, were age adjusted by
direct standardization to the US 2000 standard population.10 To
analyze differences over time, the 2003–2004 data were compared
with the 1999–2000 data. Estimates with a coefficient of variation
>0.3 were considered unreliable. A 2-tailed P value <0.05 was
considered statistically significant.
(Ong et al. 2009)
Writing plan for statistical analysis #2
To assess the effect of the selection process on the characteristics of the
cases, we compared cases included in the final analysis to the rest of the
cases. Since controls included in the present analysis were different from
the rest of the diabetes free participants by design, no similar comparisons
were performed for that group. To compare baseline characteristics of
cases and controls appropriate univariate statistics were used. Similar
binary logistic and multiple linear regression models were built with incident
diabetes or HbA1c as respective outcomes and additive block entry of
adiponectin and potential confounders. For linear regression CRP and
triglycerides were log transformed. Since HbA1c could be modified by drug
treatment, we ran a sensitivity analysis excluding all participants on
antidiabetic medication. A p-value of <0.05 was considered significant.
Analyses were performed with SPSS 14.0 for Windows.
Reporting analysis (example)