
Basic Biostatistics

By. Oczhinvia Dwitasari, M.Si.


Population vs. Sample

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 28 December 2017


Parameter vs. Statistic
Parameter
characteristic of the whole population

Statistic
characteristic of a sample, presumably
measurable.

Statistics estimate parameters
- Representative
- Sampling error
- Different samples yield different estimates
- Statistic ≈ Parameter if sampling is done properly
- How to prove?

N = 6             N = 6               N = 6
Red = 3/6 = 50%   Red = 2/6 = 33.3%   Red = 4/6 = 66.7%

Average % Red = (50% + 33.3% + 66.7%) / 3 = 50%   (Statistic)

N = 35
Red = 18/35 = 51.4%   (Parameter)

50% ≈ 51.4%
Statistic ≈ Parameter
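The same idea can be checked by simulation. The sketch below assumes a population of 35 items of which 18 are red (the figures on the slide) and repeatedly draws samples of 6 without replacement. The function name, the seed and the number of repeats are illustrative choices, not part of the original slides.

```python
import random

def mean_sample_proportion(population, sample_size, n_samples, seed=0):
    """Average the proportion of red items over many random samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0.0
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)  # draw without replacement
        total += sum(sample) / sample_size            # statistic for this sample
    return total / n_samples

# Population from the slide: 35 items, 18 of them red (1 = red, 0 = not red)
population = [1] * 18 + [0] * 17
parameter = sum(population) / len(population)  # 18/35

estimate = mean_sample_proportion(population, sample_size=6, n_samples=20000)
print(f"parameter = {parameter:.3f}, mean of sample statistics = {estimate:.3f}")
```

Any single sample of 6 can be far from 51.4%, but the average over many samples should settle close to the parameter.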

Variable & its role
- A characteristic that takes a value, and whose associated value may be changed

Independent variable → Dependent variable

Causation
- Relation of events (cause and effect)
- But correlation (between two events) does not (always) imply causation
- Rooster's crow does not cause the sun to rise
- Switch does not cause the bulb to light

Hill’s criteria
1. Strength of association
2. Consistency
3. Specificity
4. Temporality
5. Biological gradient
6. Plausibility
7. Coherence
8. Experiment
9. Analogy

Hill AB. The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine (London). 1965;58:295–300.

Time & causation
[Figure: several exposures acting over time before the outcome]


Time & causation (example)
[Figure: age, exposure to silica and smoking acting over time before lung cancer]

Causal web
- Web of causation
- Conceptual framework
- Path analysis/web
- Relationship between variables
- Cause and effect

[Figure: roles of a third variable between exposure and outcome: mediator, confounder, and effect modifier (moderator)]

[Figures not reproduced. Source: http://www.apa.org/science/about/psa/2008/06/ahn.aspx]


Type of data
(Level of measurement)

Categorical
  Nominal: e.g. Gender, Race
  Ordinal: e.g. Cancer staging, Severity of CXR for PTB
Numerical
  Discrete: e.g. Parity, Gravida
  Continuous: e.g. Hb, RBS, cholesterol

Distribution (shape) of data
- Applicable to numerical values
- Discrete or Continuous
- Discrete ~ Binomial, Poisson, Negative Binomial, Hypergeometric, Multinomial etc.
- Continuous ~ Normal, t, chi-square, F etc.

Central limit theorem
“Given a distribution with a mean μ and
variance σ², the sampling distribution of the
mean approaches a normal distribution
with a mean (μ) and a variance σ²/N as N,
the sample size, increases” (David M. Lane)
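A small simulation can make the theorem concrete. The sketch below assumes a deliberately skewed parent distribution (exponential with rate 1, so μ = 1 and σ² = 1) and checks that the mean of the sample means stays near μ while their variance shrinks towards σ²/N. The function names, seed and repeat count are illustrative assumptions.

```python
import random
import statistics

def sample_means(draw, n, n_repeats, seed=0):
    """Return n_repeats sample means, each computed from n draws."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(n_repeats)]

# Skewed parent distribution: exponential with rate 1, so mu = 1 and sigma^2 = 1
def draw_exponential(rng):
    return rng.expovariate(1.0)

for n in (2, 10, 50):
    means = sample_means(draw_exponential, n, n_repeats=5000)
    print(f"N={n:>2}  mean of means={statistics.fmean(means):.3f}  "
          f"variance of means={statistics.variance(means):.4f}  expected={1 / n:.4f}")
```

Plotting a histogram of `means` for each N would also show the shape moving from skewed towards bell-shaped.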

Normal Distribution
$$f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

Why Normal?
- Because many biological & psychological variables are distributed normally

Characteristics
- Bell-shaped curve
- Unimodal
- Symmetrical
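The normal density and its symmetry can be checked numerically. This is a minimal sketch: `normal_pdf` is a hypothetical helper written for illustration, compared against `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Peak of the standard normal curve, at x = mu
print(round(normal_pdf(0.0), 4))  # 0.3989
# Symmetry about the mean: equal density at mu - d and mu + d
print(math.isclose(normal_pdf(8.0, mu=10, sigma=2), normal_pdf(12.0, mu=10, sigma=2)))
# Cross-check against the standard library implementation
print(math.isclose(normal_pdf(1.3, 10, 2), NormalDist(10, 2).pdf(1.3)))
```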


Test of Normality
- Anderson–Darling Test
- Corrected Kolmogorov–Smirnov Test (Lilliefors Test)
- Cramér–von Mises Criterion
- D'Agostino's K-squared Test
- Jarque–Bera Test
- Pearson's Chi-square Test
- Shapiro–Francia Test
- Shapiro–Wilk Test

Use Normality test with caution
- Small samples almost always pass a normality test. Normality tests have little power to tell whether or not a small sample of data comes from a Gaussian distribution.
- With large samples, minor deviations from normality may be flagged as statistically significant, even though small deviations from a normal distribution won’t affect the results of a t test or ANOVA.
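The sample-size effect can be illustrated with one of the listed tests. The sketch below uses the Jarque–Bera statistic only because it is short to compute by hand. The data are hypothetical, and the cut-off 5.991 is the chi-square critical value for df = 2 at the 5% level, which is an asymptotic approximation. Repeating the same data 100 times keeps the shape identical while multiplying the sample size.

```python
def jarque_bera(data):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)

CRITICAL_5PCT = 5.991  # chi-square critical value, df = 2, alpha = 0.05

small = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]  # hypothetical, right-skewed data
large = small * 100                      # same shape, 100 times the sample size

for label, data in (("n=10", small), ("n=1000", large)):
    jb = jarque_bera(data)
    verdict = "reject normality" if jb > CRITICAL_5PCT else "do not reject"
    print(f"{label}: JB = {jb:.2f} -> {verdict}")
```

The skewness and kurtosis are the same in both runs, yet only the larger sample is flagged, because the statistic grows in proportion to n.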

Why run statistical test?
1. Measure magnitude of event
2. Determine presence of difference (or similarity)
3. Determine degree of difference
4. Determine the direction of changes (trend)
5. Predict changes (outcomes)

Is there any difference between A & B?
Which one is taller? A or B?
How big is the difference between A & B?
Is C different from A & B?
Is there any pattern now?
If there will be D, can you predict how tall D is?

[Figure: A, B and C, of different heights]

Statistical analysis

Descriptive
  Univariable (IV or DV, one at a time)
    e.g. Describe socio-demographic characteristics - Age, Sex, Race etc.
    e.g. Prevalence of hypertension.
Analytical
  Bivariable (IV → DV)
    e.g. Compare demographic characteristics between two populations – Compare age between male & female
    e.g. Distribution of gender by hypertension status
  Multivariable (more than one IV → DV)
    e.g. How demographic characteristics (more than one factor) explain hypertension

Descriptive statistics
- Explain one variable at a time
- Method based on type of measure

Categorical
Frequency (Percentage)
Numerical
Central measures (e.g. mean, median) & Dispersion
(e.g. variance, standard deviation, range, min-max,
interquartile range)

How to describe data

Categorical: Frequency (count) & Percentage
Numerical
  Normal: Mean (SD)
  Not Normal: Median (Range/IQR)
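The rule above can be sketched with the Python standard library. All data and function names here are hypothetical and for illustration only. Whether a variable is treated as normal is passed in as a flag rather than tested.

```python
import statistics
from collections import Counter

def describe_categorical(values):
    """Return {category: (count, percentage)}."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

def describe_numerical(values, normal):
    """Mean (SD) if treated as normal, otherwise median (IQR)."""
    if normal:
        return statistics.mean(values), statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return statistics.median(values), (q1, q3)

# Hypothetical data for illustration only
sex = ["F", "M", "F", "F", "M"]
hb = [12.1, 13.4, 11.8, 12.9, 14.2, 13.0]   # g/dL
stay_days = [1, 2, 2, 3, 3, 4, 21]           # skewed by one long stay

print(describe_categorical(sex))
print(describe_numerical(hb, normal=True))
print(describe_numerical(stay_days, normal=False))
```

For `stay_days`, the single long stay pulls the mean well above the median, which is why the median and IQR are the safer summary for skewed data.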

Analytical statistics

Comparing difference
Which of the following shows a true difference between two populations?
[Figure: three panels, labelled A, B and C]

3 methods to compare values
1. P-value
2. Confidence interval
3. Effect size

P value
- The P-value is the probability of obtaining a result at least as extreme as the one observed, assuming Ho is true; it indicates whether the data are ‘likely’ or ‘unlikely’ under Ho
- Taking 0.05 as the cut-off point (α), if P ≤ 0.05, the data are ‘unlikely’ if Ho is true, therefore reject Ho

Hypothesis testing

                    Truth
Test                Ho True             Ho False
Do not reject Ho    Correct             Type II error (β)
Reject Ho           Type I error (α)    Correct

α is the probability of making a Type I error (based on frequentist inference); Ho is rejected when the P-value falls below α

One-tailed vs. two-tailed
- Is there a difference between Hb 14 g% vs. Hb 12 g% in male & female respectively?
Ho: HbM – HbF = 0
H1: HbM ≠ HbF
H2: HbM > HbF
H3: HbM < HbF
- Note: Should be determined a priori

One-tailed vs. two-tailed
[Figure: rejection regions for two-sided, left-sided and right-sided tests]
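The three alternatives give different P values for the same test statistic. The sketch below assumes a z test (standard normal reference distribution) purely for simplicity; the slides do not specify the test, and `z_test_p_values` is a hypothetical helper.

```python
from statistics import NormalDist

def z_test_p_values(z):
    """P values for a z statistic under the three alternatives."""
    std_normal = NormalDist()          # mean 0, SD 1
    left = std_normal.cdf(z)           # alternative: difference < 0
    right = 1 - std_normal.cdf(z)      # alternative: difference > 0
    two_sided = 2 * min(left, right)   # alternative: difference != 0
    return {"left": left, "right": right, "two-sided": two_sided}

p = z_test_p_values(1.96)
print({name: round(value, 3) for name, value in p.items()})
```

With z = 1.96 the two-sided P is about 0.05 while the right-sided P is about 0.025, which is why the direction has to be chosen before looking at the data.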


P & Sample Size
[Figure: P value and sample size]


The truth about P value
- Used as a measure of effectiveness (even by the US FDA)
- A small P means statistically significant, NOT clinically significant
- But, be careful when interpreting the P value
- P is affected BOTH by effectiveness AND sample size
- P can be < 0.05 even though the effectiveness is marginal when the sample size is huge
- Comparing Ps between studies is only appropriate if the sampling & sample size are the same
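The dependence of P on sample size can be shown with a fixed, marginal effect. The sketch below assumes a two-sample z test with known, equal standard deviations and equal group sizes, which is a simplification chosen for illustration. The effect of 0.1 SD and the group sizes are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z_p(effect_in_sd, n_per_group):
    """Two-sided P for a difference in means, known equal SD, equal group sizes."""
    z = effect_in_sd / math.sqrt(2 / n_per_group)  # difference / standard error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same marginal effect (0.1 SD), increasing sample size
for n in (50, 500, 5000):
    print(f"n per group = {n:>4}: P = {two_sample_z_p(0.1, n):.4f}")
```

The effect never changes, yet P moves from clearly non-significant to far below 0.05 as the groups grow.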

P < 0.05
- Why 5%?
- Cut-off point proposed by Sir Ronald A. Fisher (1925) to reject or not to reject a hypothesis
- If P < 0.05: a difference at least this large would occur in less than 5% of samples if Ho were true, so Ho is rejected
- If P > 0.05: a difference at least this large would occur by chance in more than 5% of samples even without a TRUE difference, so Ho is not rejected

Hypothesis Testing using bivariable analysis
- Try to prove that Exposure causes the Disease, e.g. Smoking causing Lung Cancer
- Ho: No difference in the risk of getting Lung Cancer between smokers and non-smokers

               Lung Cancer    No Lung Cancer
Smoking        20 (18.2%)     90 (81.8%)
Not Smoking    5 (4.5%)       105 (95.5%)

The occurrence of lung cancer is significantly higher (18.2%) among smokers compared to non-smokers (4.5%) (χ² (df=1) = 10.15, P = 0.001, OR = 4.7 (95% CI 1.7 – 13.0))
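The reported figures can be reproduced from the four cell counts. This is a sketch under stated assumptions: Pearson chi-square without continuity correction, and a Wald (log-scale) interval for the odds ratio. The slides do not say which methods were used, and the upper limit from this method comes out slightly below the 13.0 quoted.

```python
import math

def two_by_two(a, b, c, d):
    """Chi-square (no continuity correction), P, odds ratio and Wald 95% CI.

    Table layout: a, b = exposed with / without disease;
                  c, d = unexposed with / without disease.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, df = 1
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return chi2, p_value, odds_ratio, (lower, upper)

chi2, p_value, odds_ratio, ci = two_by_two(20, 90, 5, 105)
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}, OR = {odds_ratio:.1f}, "
      f"95% CI {ci[0]:.1f} to {ci[1]:.1f}")
```

The interval excludes 1, the null value for an odds ratio, which agrees with the small P value.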

Confidence Interval
- Range of plausible values
- Narrow interval → high precision
  Wide interval → poor precision
- How narrow is narrow? And how wide is wide? Based on your clinical judgment
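Precision improves with sample size. The sketch below assumes a simple Wald interval for a proportion and compares the same observed proportion (18.2%) at two sample sizes. The larger sample is hypothetical, and `proportion_ci` is an illustrative helper rather than a recommended method for small samples.

```python
import math

def proportion_ci(events, n, z=1.96):
    """Wald 95% confidence interval for a proportion."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Same observed proportion (18.2%), two sample sizes
for events, n in ((20, 110), (200, 1100)):
    low, high = proportion_ci(events, n)
    print(f"n = {n:>4}: {100 * low:.1f}% to {100 * high:.1f}% "
          f"(width {100 * (high - low):.1f} points)")
```

A tenfold increase in sample size narrows the interval by a factor of about √10, not 10.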

Interpret single CI
- Compare with the null value, i.e. 0 for a difference (e.g. in %) or 1 for a ratio (e.g. risk)
- Compare with practical significance or the clinical significance/indifference

[Figure: four confidence intervals, A to D, each shown against the null value]
Source: http://www.childrens-mercy.org/stats/journal/confidence.asp

Comparing multiple CIs
[Figure: confidence intervals A and B compared]


Effect size
- The measure of effect irrespective of sample size
- Cohen (1988) classifies effect size into
  Low (<0.3)
  Medium (0.3-0.7)
  Large (> 0.7)
  (Cohen's own benchmarks for the standardized mean difference d are 0.2 small, 0.5 medium, 0.8 large)
- Manual calculation or web based calculation

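A manual effect size calculation can be sketched as follows. The example assumes Cohen's d with a pooled standard deviation for two independent groups. The slides do not say which effect size measure is meant, and the data are hypothetical.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

# Hypothetical measurements for two groups
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

Unlike the P value, d does not change if both groups are made larger while keeping the same means and spread.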
Statistical Test
- Bivariable (univariate) ~ One dependent & one independent variable
- Multivariate ~ Multiple dependent & multiple independent variables

What test to use?

Variable 1                       Variable 2                         Test
Categorical                      Categorical                        Chi-square
Categorical (2 pop)              Numerical (Normal)                 Independent sample t-test
Categorical (2 pop)              Numerical (Not Normal)             Mann-Whitney U test
Categorical (> 2 pop)            Numerical (Normal)                 One-way ANOVA
Categorical (> 2 pop)            Numerical (Not Normal)             Kruskal-Wallis test
Numerical (Normal)               Numerical (Normal)                 Pearson Correlation Coefficient Test
Numerical (Normal/Not Normal)    Numerical (Not Normal)             Spearman Correlation Coefficient Test
Numerical (Normal)               Numerical (Normal) – Paired        Paired t-test
Numerical (Not Normal)           Numerical (Not Normal) – Paired    Wilcoxon Signed Rank Test
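The table can be read as a small decision rule. The sketch below encodes it as a lookup function. `choose_test` and its parameters are hypothetical names, and `normal=False` is taken to mean that at least one numerical variable is not normally distributed.

```python
def choose_test(var1, var2, normal=True, groups=2, paired=False):
    """Suggest a bivariable test following the table above.

    var1, var2: "categorical" or "numerical"; normal refers to the
    numerical variable(s); groups is the number of populations compared.
    """
    kinds = {var1, var2}
    if kinds == {"categorical"}:
        return "Chi-square"
    if kinds == {"categorical", "numerical"}:
        if groups == 2:
            return "Independent sample t-test" if normal else "Mann-Whitney U test"
        return "One-way ANOVA" if normal else "Kruskal-Wallis test"
    if kinds == {"numerical"}:
        if paired:
            return "Paired t-test" if normal else "Wilcoxon Signed Rank Test"
        return "Pearson correlation" if normal else "Spearman correlation"
    raise ValueError("variables must be 'categorical' or 'numerical'")

print(choose_test("categorical", "numerical", normal=False, groups=3))
```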

Bivariable Analyses
- Compare means
  Independent sample t-test (Unpaired t-test) ~ Two unrelated means
  Paired t-test ~ Two related means
  One-way ANOVA ~ More than 2 means
- χ² Test ~ Between categorical variables
- Non-parametric tests (Kruskal-Wallis, Mann-Whitney U tests) ~ If data are not normally distributed

Writing plan for statistical analysis
#1
Data were analyzed using the complex sample function of SPSS
(version 13.0). Sampling errors were estimated using the primary
sampling units and strata provided in the data set. Sampling
weights were used to adjust for nonresponse bias and the
oversampling of blacks, Mexican Americans, and the elderly in
NHANES. The prevalence of hypertension, as well as the
awareness, treatment, and control rates, were age adjusted by
direct standardization to the US 2000 standard population.10 To
analyze differences over time, the 2003–2004 data were compared
with the 1999–2000 data. Estimates with a coefficient of variation
>0.3 were considered unreliable. A 2-tailed P value <0.05 was
considered statistically significant.
(Ong et al. 2009)
Writing plan for statistical analysis
#2
To assess the effect of the selection process on the characteristics of the
cases, we compared cases included in the final analysis to the rest of the
cases. Since controls included in the present analysis were different from
the rest of the diabetes free participants by design, no similar comparisons
were performed for that group. To compare baseline characteristics of
cases and controls appropriate univariate statistics were used. Similar
binary logistic and multiple linear regression models were built with incident
diabetes or HbA1c as respective outcomes and additive block entry of
adiponectin and potential confounders. For linear regression CRP and
triglycerides were log transformed. Since HbA1c could be modified by drug
treatment, we ran a sensitivity analysis excluding all participants on
antidiabetic medication. A p-value of <0.05 was considered significant.
Analyses were performed with SPSS 14.0 for Windows.
Reporting analysis (example)

Summary
1. Identify & define variables
2. Type – independent vs. dependent
3. Level of measurements – nominal, ordinal or continuous
4. Check distribution – Normal vs. Not Normal
5. Decide what to do - descriptive vs. analytical
