Dealing with variables and hypotheses

Wojciech Fendler, M.D. Ph.D.

Department of Biostatistics and Translational Medicine


We formulate universal truths that will be
confirmed in the general population on the
basis of small samples

How to do it properly?

1. The cornerstone of experiment design – random selection of samples

2. Good experimental design


The root of the problem?
The general concept of sampling
Population – unknown relationship:

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

Random sample:

$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
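The idea of estimating an unknown population relationship from a random sample can be sketched as a small simulation. The "true" parameters below (`BETA0`, `BETA1`, `NOISE_SD`), the sample size and the seed are hypothetical values chosen only for illustration; in real research they are unknown, and only the estimates with hats are available.

```python
import random

def ols_fit(xs, ys):
    """Least-squares estimates (b0_hat, b1_hat) for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1_hat = sxy / sxx
    b0_hat = mean_y - b1_hat * mean_x
    return b0_hat, b1_hat

# Hypothetical "true" population parameters (unknown in real research).
BETA0, BETA1, NOISE_SD = 2.0, 0.5, 1.0

rng = random.Random(42)  # fixed seed so the sketch is reproducible
sample_x = [rng.uniform(0, 10) for _ in range(200)]
sample_y = [BETA0 + BETA1 * x + rng.gauss(0, NOISE_SD) for x in sample_x]

# The sample yields estimates that are close to, but not equal to, the truth.
b0_hat, b1_hat = ols_fit(sample_x, sample_y)
print(f"true: b0={BETA0}, b1={BETA1}; estimated: b0={b0_hat:.2f}, b1={b1_hat:.2f}")
```

Re-running with a different seed gives slightly different estimates, which is the sampling variability that the rest of the lecture deals with.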


Three things to remember in medical research

1. Anything, however unlikely, may happen due to chance

2. It is impossible to directly infer causation in the clinical setting

3. One rarely cares about the observed effect in the studied sample, but rather about the potential to generalize the result to the whole target population
Statistical hypotheses
• Parity
• Mutually complementary

imply that

• They cannot both occur


• They both include all possible outcomes

H0: x1 = x2 (null hypothesis)

HA: x1 ≠ x2 (alternative hypothesis)
Statistical hypotheses
• Hypothesis parity: "what is not false has to be true"
• The null hypothesis ('straw man' hypothesis, H0) is most often the opposite of what the researcher believes in (alternative hypothesis, HA)
H0: x1 = x2
HA: x1 ≠ x2
• Both H0 and HA must complement each other fully – no alternative options

Neyman, Pearson, Gosset


Simple example
• Is body height associated with gender?
H0 = height of men and women is equal
HA = height of men and women differs

• Why is it easier to reject the null hypothesis than to find universal evidence confirming the HA?
Why is it more convenient to formulate untrue hypotheses as the starting point?
                          The truth
Our verdict    Guilty                 Not guilty
Guilty         OK                     Innocent person sentenced
Not guilty     Murderer goes free     OK

H0 – Innocent until proven guilty


HA – Guilty as charged
How to formulate a statistical hypothesis?

• in the majority of cases the null hypothesis is the opposite of what is stated in the research theory (Reject-Support approach)

why is it so?

• the test is aimed at rejecting the null hypothesis and accepting the alternative one: HA = research theory, H0 = opposite of the research theory
Principles of hypothesis testing

• We can only reject H0 and never prove that it is true

• A non-significant result does not mean accepting H0, but merely not rejecting it under the particular conditions of the study
• Even when we have no grounds to reject H0, this does not mean that it is true
When do we commit errors in hypothesis testing?
                          The truth
Our verdict    Guilty                                  Not guilty
Guilty         OK                                      Type 1 error – H0 rejected incorrectly
Not guilty     Type 2 error – false H0 not rejected    OK
Type I statistical error – we reveal the non-existing (false) difference
Type II statistical error – we conceal the real (existing) difference
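Both kinds of error can be made visible with a simulation. The sketch below is only an illustration under stated assumptions: two groups of 50 drawn from normal distributions with SD 1, a z-test based on the normal approximation (not an exact t-test), and an assumed true difference of 0.5 SD for the type 2 case. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

def two_sided_p(a, b):
    """Approximate two-sided p-value for a difference in means.

    Uses a z-test with the normal approximation, which is reasonable
    only for fairly large groups.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_diff, n=50, runs=2000, alpha=0.05, seed=1):
    """Share of simulated studies in which H0 (no difference) is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(runs):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        if two_sided_p(a, b) < alpha:
            rejected += 1
    return rejected / runs

# H0 is true: every rejection is a type 1 error (expected to be near alpha).
type1_rate = rejection_rate(true_diff=0.0)
# H0 is false: every non-rejection is a type 2 error.
type2_rate = 1 - rejection_rate(true_diff=0.5)
print(f"type 1 error rate: {type1_rate:.3f}, type 2 error rate: {type2_rate:.3f}")
```

Increasing `n` in the second call lowers the type 2 error rate, while the type 1 error rate stays near the chosen `alpha`.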
The meaning of errors in hypothesis testing

• Type 1 error (how to interpret):


– Rejection of a true null hypothesis – detection of a difference where there is
none
– False positive finding – discovery of an association which is purely by chance

• Type 2 error:
– Not rejecting a false null hypothesis – failure to detect a true effect
– False negative result – failure to discover the effect in a properly planned study
How do we combat the errors of hypothesis
testing?
• Type 1 error
– Plan the study properly
– Use adequate statistical tests
– …be lucky

• Type 2 error
– Plan a sufficiently large group
– Use adequate statistical tests
– …be lucky
The acceptable probability of errors (type 1 – α)
• For type 1 error – it is generally assumed that a 5% probability of this error is the maximum tolerable margin allowing the researcher to reject the null hypothesis
– Results of statistical tests with p < 0.05, meaning that data at least this extreme would occur with a probability below 5% if the null hypothesis were true, are considered "statistically significant"
P value
• The probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true
• A low p value means that the observed results would be unlikely to arise by chance alone if there were no actual effect
• Typically we consider p values <0.05 as "statistically significant", which is conventionally treated as a sufficient weight of evidence to reject the null hypothesis and claim that the observed effect is real
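One way to see what a p value measures is an exact permutation test: if H0 were true, group labels would be interchangeable, so we can count how many relabellings give a difference at least as large as the observed one. The sketch below uses made-up heights (in cm) for four men and four women; the data and the function name are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        # Mean of the values labelled "A" minus the mean of the rest.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    splits = 0
    as_extreme = 0
    for idx in combinations(range(len(pooled)), n_a):
        splits += 1
        diff = abs(mean_diff(sum(pooled[i] for i in idx)))
        if diff >= observed - 1e-9:  # small tolerance for rounding
            as_extreme += 1
    return as_extreme / splits

# Made-up heights in cm, for illustration only.
men = [178, 180, 183, 185]
women = [162, 165, 168, 170]
p_value = permutation_p_value(men, women)
print(f"p = {p_value:.4f}")  # 2 of the 70 possible splits, about 0.0286
```

Here only the observed split and its mirror image are as extreme as the data, so p = 2/70, which is below 0.05.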
What if we want the differences to be non-existent?

"Absence of evidence is not evidence of absence"

C. Sagan
The acceptable probability of errors (type 2 – β)
• For type 2 error – it is generally assumed that the admissible probability of
this error is 20%

• The lower the better, but lowering it escalates the number of samples and
cost

• 1 − β is called statistical power – the probability that the study will be able to reject a wrong null hypothesis (i.e. the probability of not making a type 2 error)
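The trade-off between β and the number of samples can be illustrated with a common textbook approximation for comparing two means, n per group ≈ 2(z₁₋α/₂ + z₁₋β)² / d², where d is the expected difference divided by the SD. This is a sketch under the normal approximation, not the method used in the lecture; the function name and the effect size of 0.5 are assumptions.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate group size for a two-sided comparison of two means.

    effect_size is the expected difference divided by the SD.
    Normal approximation; an exact t-based calculation gives
    slightly larger numbers.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

n_80 = n_per_group(0.5, power=0.80)  # beta = 0.20
n_90 = n_per_group(0.5, power=0.90)  # beta = 0.10
print(n_80, n_90)  # 63 85
```

Lowering β from 20% to 10% raises the required group size from 63 to 85 in this example, which is the cost escalation mentioned above.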
RS and AS approach to hypothesis testing
• Reject-support (RS) testing

– rejection of H0 supports research theory

• Accept-support (AS) testing

– non-rejecting H0 supports research theory


Comparison between RS and AS testing
RS testing                          AS testing
interested in rejecting H0          not interested in rejecting H0
care about low α                    care about low β
large sample size beneficial        large sample size disadvantageous
high power important for society    low α important


Statistical inference
The process of deducing the
underlying distribution by analysis
of data. Inferential analysis provides
the properties of a variable in the
population by testing hypotheses
and deriving estimates.
Basic statistical terms

Variables (also known as features or characteristics) – values that are monitored or measured, or that are controlled or manipulated by the researcher in the course of a study
May be classified according to a given criterion
Variables
• Independent variables – variables that may be controlled and/or
modified (manipulated) by a researcher in an experiment

• Dependent variables – variables that may be merely monitored and measured by a researcher, cannot be manipulated or changed; researcher does not affect these values
Discrete variables
• Example: responses to a five-point rating scale can only take on
the values 1, 2, 3, 4, and 5
• All qualitative variables are discrete; some quantitative variables
are discrete, such as performance rated as 1,2,3,4, or 5, or
temperature rounded to the nearest degree

• Sometimes, a variable that takes on enough discrete values can be considered to be continuous for practical purposes
• Example: time rounded to the nearest millisecond
Continuous variables
• Examples: Length, weight, concentration, time, and the points on a
line are continuous variables

• The variable "Time to solve an anagram problem" is continuous since it could take 2 minutes, 2.13 minutes etc. to finish a problem

• The variable "Number of correct answers on a 100 point multiple-choice test" is not a continuous variable since it is not possible to get 54.12 problems correct
Types of variables

What are the implications of these characteristics?

Different tests are used for different types of variables
Measurement scales
Measurement is the assignment of numbers to
objects or events in a systematic fashion

Four levels of measurement scales are commonly distinguished:

• nominal
• ordinal
• interval
• ratio
Measurement scales - nominal
Nominal measurement - consists of assigning items to groups or categories
No quantitative information is conveyed and no ordering of the items is implied
Nominal scales are therefore qualitative rather than quantitative
Variables measured on a nominal scale are often referred to as categorical or
qualitative variables

Examples:

- religious preference
- race
- sex
- living in a village or a city
Measurement scales - ordinal
Ordinal measurements - are ordered in the sense that higher numbers represent
higher values, although the intervals between the numbers are not necessarily equal

Example:

- NYHA score of cardiac insufficiency

- A 4-grade rating scale:
- Grade I symptoms after exertion
- Grade II symptoms after moderate exertion
- Grade III symptoms after light exertion
- Grade IV symptoms at rest (indication for heart transplant)

- A change of 1 grade is always an improvement or a deterioration, but its magnitude differs from grade to grade


Measurement scales - interval

Interval scale – a scale on which the intervals between the numbers are equal; one unit on the scale represents the same magnitude of the trait or characteristic being measured across the whole range of the scale
An interval scale does not have a "true" zero point – it is not possible to make statements about how many times higher one value is than another
Interval scales continued

Examples:

- The Fahrenheit scale for temperature; equal differences on this scale represent
equal differences in temperature, but a temperature of 30 degrees is not twice as
warm as one of 15 degrees

- Anxiety scale of behaviour; if anxiety were measured on an interval scale, then a difference between a score of 10 and a score of 11 would represent the same difference in anxiety as would a difference between a score of 50 and a score of 51
Measurement scales - ratio

Ratio scale – like an interval scale, except that it has a true zero point

Examples:

- Kelvin scale of temperature, which has an absolute zero; a temperature of 300 Kelvin is twice as high as a temperature of 150 Kelvin

The values of the majority of continuous variables are measured on either ratio or interval scales
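The difference between interval and ratio scales can be checked numerically with the temperature examples: converting the two Fahrenheit values to the Kelvin scale shows why "twice as warm" is not a valid statement on an interval scale. This is a small illustrative sketch; the function name is hypothetical.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit (interval scale) to kelvins (ratio scale)."""
    return (deg_f - 32) * 5 / 9 + 273.15

# Dividing Fahrenheit readings gives 2.0, but the number is meaningless
# because the Fahrenheit zero point is arbitrary.
naive_ratio = 30 / 15

# On the Kelvin scale the same two temperatures differ by only about 3%.
true_ratio = fahrenheit_to_kelvin(30) / fahrenheit_to_kelvin(15)

# Ratios of Kelvin values are meaningful: 300 K really is twice 150 K.
kelvin_ratio = 300 / 150

print(f"naive: {naive_ratio}, on a ratio scale: {true_ratio:.3f}")
```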
When monitoring variables …

… we take care of:

• Precision – the degree of repeatability of measurements in a series

• Validity/accuracy – how well a measurement reflects what it is supposed to reflect (internal validity)
Precision and accuracy
[Figure: four targets illustrating the combinations of precision and accuracy]
• High precision, high accuracy
• Low accuracy, high precision
• Low accuracy and low precision
• Low precision, high accuracy
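These combinations can also be expressed in numbers: accuracy as the bias of the mean against a reference value, precision as the spread (SD) of repeated measurements. The reference value and the three measurement series below are made up for illustration, and the function name is hypothetical.

```python
from statistics import mean, stdev

REFERENCE = 100.0  # hypothetical reference ("gold standard") value

def describe(measurements, reference=REFERENCE):
    """Return (bias, spread) for a series of repeated measurements."""
    bias = mean(measurements) - reference  # accuracy: closer to 0 is better
    spread = stdev(measurements)           # precision: smaller is better
    return bias, spread

# Made-up measurement series, for illustration only.
precise_accurate = [99.8, 100.1, 100.0, 99.9, 100.2]
precise_biased = [104.8, 105.1, 105.0, 104.9, 105.2]   # systematic error
imprecise_accurate = [92.0, 108.0, 97.0, 103.0, 100.0]  # random error

for name, series in [
    ("high precision, high accuracy", precise_accurate),
    ("high precision, low accuracy", precise_biased),
    ("low precision, high accuracy", imprecise_accurate),
]:
    bias, spread = describe(series)
    print(f"{name}: bias = {bias:+.2f}, SD = {spread:.2f}")
```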
Precision
(repeatability)

• How is it estimated? – by comparing measurements and estimating their repeatability in a series

• How significant is it for an investigation? – it increases the chance of detecting real differences between groups (statistical power), because it reduces within-group variability

• What reduces it? – random errors


Accuracy
(internal validity)

• How is it estimated? – by comparing values of measurements with a 'gold standard' or reference values

• How significant is it for an investigation? – it increases the reliability of the results

• What reduces it? – systematic errors


Measures of central tendency
– which should be used and when ?

• mean (arithmetic)
• geometric mean
• harmonic mean
• median
• mode
• minimum, maximum
Mean

• The arithmetic mean - distinguished from the geometric mean or harmonic mean
• The expected value of a random variable, which is also called the
population mean
Measures of location - Median
example

The median is the value of the middle element in the ordered list of values or, in a series with an even number of elements, the arithmetic mean of the two middle values
10 12 13 14 18 24 25 80 89 90 120 140 145

If the number of values is even, the median is the mathematical average of the two middle values
10 12 13 14 18 24 25 26 80 89 90 120 140 145
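The two series above can be checked with Python's standard `statistics` module; this is just a quick illustrative sketch and the variable names are arbitrary.

```python
from statistics import median

odd_series = [10, 12, 13, 14, 18, 24, 25, 80, 89, 90, 120, 140, 145]
even_series = [10, 12, 13, 14, 18, 24, 25, 26, 80, 89, 90, 120, 140, 145]

# 13 values: the median is the 7th value in the ordered list.
median_odd = median(odd_series)    # 25

# 14 values: the median is the average of the 7th and 8th values.
median_even = median(even_series)  # (25 + 26) / 2 = 25.5

print(median_odd, median_even)
```

Note that the large values at the upper end (120, 140, 145) do not affect the median, unlike the mean.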
Mode
• The value that occurs the most frequently
• Has to occur more frequently than other values
• There can be several modes, provided that they occur
more frequently than other values and equally frequently
with each other
• Examples:
– 1, 2, 3, 4, 4, 5, 6, 7, 10 (Mode 4)
– 1, 2, 3, 4, 4, 5, 5, 6, 7, 10 (Modes 4 AND 5)
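Both examples can be reproduced with `statistics.multimode` (available in Python 3.8 and later), which returns every value that shares the highest frequency. A brief sketch:

```python
from statistics import multimode

# A single most frequent value.
one_mode = multimode([1, 2, 3, 4, 4, 5, 6, 7, 10])      # [4]

# Two values tied for the highest frequency: both are modes.
two_modes = multimode([1, 2, 3, 4, 4, 5, 5, 6, 7, 10])  # [4, 5]

print(one_mode, two_modes)
```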
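Statistics of central tendency

[Figure: two distribution curves. In the first, the mean, median and mode coincide, with −SD and +SD marked on either side; in the second, the mode, median and mean fall at different positions.]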
Measures of dispersion
Statistical dispersion (also called statistical variability or variation) is variability or
spread in a variable or a probability distribution

Used to express the variability of a given characteristic in the studied population
• variance
• standard deviation (SD)
• standard error (of the mean) SE(M)
• variability coefficient
• agreement
• quantiles, quartiles
Variance – the total variability of the variable
Continuous case

If the random variable X is continuous with probability density function p(x), then

$\operatorname{Var}(X) = \int (x - \mu)^2 \, p(x) \, dx$

where

$\mu = \int x \, p(x) \, dx$

and where the integrals are definite integrals taken for x ranging over the range of X.

This is the continuous counterpart of the sum of squared differences from the mean

Standard deviation

The standard deviation of a statistical population, a data set, or a probability distribution is the square root of its variance. Standard deviation is a widely used measure of variability or dispersion, being algebraically more tractable though practically less robust than the expected deviation or average absolute deviation

The value of SD represents the typical distance of observations from the mean of the group
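For a finite data set the variance is the mean of squared differences from the mean, and the SD is its square root. The sketch below uses a small made-up data set and Python's `statistics` module; it also shows the sample versions, which divide by n − 1 instead of n.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, for illustration only

mu = mean(values)                 # 5
# Squared differences from the mean: 9, 1, 1, 1, 0, 0, 4, 16 (sum = 32)
pop_var = pvariance(values)       # 32 / 8 = 4
pop_sd = pstdev(values)           # square root of 4 = 2.0

# When the data are a sample from a larger population, divide by n - 1.
sample_var = variance(values)     # 32 / 7
sample_sd = stdev(values)

print(mu, pop_var, pop_sd, round(sample_var, 3), round(sample_sd, 3))
```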
Standard deviation

[Figure: a data set with a mean of 50 (shown in blue) and a standard deviation (σ) of 20]
Normal (Gaussian) distribution

[Figure: normal curve with tail areas labelled 0.5% and 0.1% on each side]
Standard error (of mean)

Depends on sample size and SD

Used as a measure of precision for estimating the true value of the mean
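The dependence on sample size and SD follows from the standard formula SEM = SD / √n. The sketch below is illustrative only; the function names and the example numbers are assumptions.

```python
import math
from statistics import stdev

def sem_from_sd(sd, n):
    """Standard error of the mean for a given SD and sample size."""
    return sd / math.sqrt(n)

def sem(values):
    """Standard error of the mean estimated from a sample."""
    return sem_from_sd(stdev(values), len(values))

# Same SD, larger sample: the mean is estimated more precisely.
sem_n25 = sem_from_sd(20, 25)    # 20 / 5 = 4.0
sem_n100 = sem_from_sd(20, 100)  # 20 / 10 = 2.0

sample_sem = sem([2, 4, 4, 4, 5, 5, 7, 9])  # made-up data
print(sem_n25, sem_n100, round(sample_sem, 3))
```

Quadrupling the sample size halves the SEM, while the SD itself, as a property of the population, does not shrink with n.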
When SD and when SEM ?
• Standard deviation – measure of within-population variability

• Standard error (of mean) – measure of imprecision in technical replicates

Using SEM to compare groups (controls vs. cases) does not make sense, and neither does using SD for calibration curves
What is the median?

What will you get if you divide the upper and lower halves of an ordered data series by their respective medians?
Interquartile range

The interquartile range (IQR), also called the midspread or middle fifty, is a
measure of statistical dispersion, being equal to the difference between the third
and first quartiles

IQR = Q3 − Q1

The median is the corresponding measure of central tendency

Used together with the median for describing non-normally distributed samples
Interquartile range
i     x[i]   Quartile
1     102
2     104
3     105    Q1
4     107
5     108
6     109    Q2 (median)
7     110
8     112
9     115    Q3
10    116
11    118
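The quartiles in the table can be reproduced by splitting the ordered series at the median and taking the median of each half. This is a sketch of that "median of halves" convention, which excludes the middle value when the number of observations is odd; other quartile conventions exist and may give slightly different values for other data sets. The function name is hypothetical.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd number of values the median itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

series = [102, 104, 105, 107, 108, 109, 110, 112, 115, 116, 118]
q1, q2, q3 = quartiles_by_halves(series)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 105 109 115 10
```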
How do we read measures of central tendency and dispersion?

Middle line – mean
Box – standard deviation

Plotted variable: total cholesterol
How do we read measures of central tendency and dispersion?
Middle line – median
Box – upper and lower quartiles
Thank you for your attention
