
GEA1000

Chapter 1 Review

david.chew@nus.edu.sg

ADMIN: ASSESSMENT

• 6 LumiNUS quizzes: 10%

• Tutorial participation: 15%

– If you’re sick: get an MC and make up for the class


– Present tutorial questions
– Participation in group discussions
– Mini in-class data quiz

• Group project: 30%

• Mid-term test: 15%


02 Oct, Sat at 1400 hrs

• Final examinations: 30%


20 Nov, Sat at 1300 hrs

IN THIS TUTORIAL...

1. Sampling

2. Variables, Summary Statistics

3. Experiments

4. Observational studies

1 SAMPLING

• Population
The collection of all units about which we want to know a parameter.

• Parameter
A numerical fact about a population. For example, mean, variance, range, etc.

• Census
An attempt to take measurement of EVERY unit in the population.

• Sample
A selection of units in a population, used to estimate a population parameter.
Done because it is cheaper, faster, or because a census is not possible.

estimate = parameter + bias + random error

• Sampling frame
A sampling frame is a list that identifies all units in a population, either directly or via a
variable linked to them. For example, a list of home addresses or of physicians.

– What makes a good frame?


> Good coverage: every unit in a population has a chance of being selected.
> Up-to-date and complete.
– Imperfect sampling frame
> Does not cover all units of target population or include unwanted units.
> If units are left out, need to redefine target population or assess impact of exclusion.

• Types of bias
– Selection bias: some types of units are excluded (due to an imperfect sampling frame or a bad
sampling plan).
– Non-response bias: happens when not all selected units are contactable or willing to take
part. Results are biased because non-respondents usually differ from respondents.
– Other types of bias: for example, understated responses about undesirable habits, or the
phrasing of questions.

• Probability sampling plans
Definition: Probability of selecting each unit from the population is known (but may not be the
same for each unit!). This is used to avoid selection bias.

• Random error

– Any sample that is taken randomly carries some random error: the estimate varies from sample to sample.
– The larger the sample size, the smaller the random error of the sample estimate tends to be.
– If there is no bias, then the sample estimate will fluctuate about the parameter.

• Check out the following R Shiny app


https://david-chew.shinyapps.io/WhySRS/
It demonstrates what random error (and SRS) is about.
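
In the same spirit as the app, here is a minimal Python sketch of random error. The population of 10,000 made-up heights, the seed, and the helper name `spread_of_estimates` are all assumptions for illustration. It draws many simple random samples and reports where the sample means centre and how much they spread for each sample size.

```python
import random
import statistics

def spread_of_estimates(population, sample_size, repeats, rng):
    """Draw `repeats` simple random samples; return (mean, SD) of the sample means."""
    estimates = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)

rng = random.Random(1000)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 heights in cm (made-up numbers).
population = [rng.gauss(165, 10) for _ in range(10_000)]
parameter = statistics.mean(population)
print(f"population mean (parameter): {parameter:.2f}")

results = {}
for n in (10, 100, 1000):
    results[n] = spread_of_estimates(population, n, repeats=500, rng=rng)
    centre, spread = results[n]
    print(f"n={n:4d}: estimates centre on {centre:.2f}, spread (SD) {spread:.2f}")
```

The centre should stay close to the parameter for every n (no bias), while the spread should shrink as n grows (smaller random error).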

PROBABILITY SAMPLING PLANS

Plan 1: Simple random sampling


Draw tickets at random without replacement or use software.
Every unit has an equal chance of being selected.
Plan 2: Systematic sampling
Suppose we have N units and we want a sample of size n. Let k = N/n. Select a random
starting point r (where 1 ≤ r ≤ k), then include unit r and every kth unit after it in the sample.
For example, say N = 12, n = 4. Then let k = 12/4 = 3. Suppose we randomly selected r = 2.
Then we include unit 2 and every 3rd unit after it, giving units 2, 5, 8 and 11.
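
Plans 1 and 2 can be sketched in Python as follows. This is only an illustration: the function names and the optional `start` argument are assumptions, and the systematic version assumes N is a multiple of n, as in the example above.

```python
import random

def simple_random_sample(units, n, rng):
    """Plan 1: draw n units without replacement; every unit is equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng, start=None):
    """Plan 2: pick a random start r in 1..k, then take units r, r+k, r+2k, ..."""
    k = len(units) // n              # assumes N is a multiple of n
    r = rng.randint(1, k) if start is None else start
    return units[r - 1::k][:n]       # r is 1-based, Python lists are 0-based

rng = random.Random(2)               # fixed seed so the run is repeatable
units = list(range(1, 13))           # N = 12 units labelled 1..12

print(simple_random_sample(units, 4, rng))
print(systematic_sample(units, 4, rng, start=2))   # [2, 5, 8, 11]
```

Passing `start=2` reproduces the worked example; leaving it out lets the function choose r at random.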

Plan 3: Stratified random sampling
• Divide population into homogeneous subgroups (strata), e.g., faculties/schools in NUS.
• Take a random sample from each subgroup (stratum).
• Combine these samples to form the final sample.
Plan 4: Cluster sampling
• Divide population into naturally occurring subgroups (clusters), e.g., tutorial groups in
GEA1000.
• Take a random sample of clusters.
• Combine ALL units in selected clusters to form the final sample.
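
The difference between Plans 3 and 4 can be sketched in Python. The three groups of five students and the function names below are made up for illustration; the same groups are treated first as strata, then as clusters.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Plan 3: take a simple random sample from EVERY stratum, then combine."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(clusters, num_clusters, rng):
    """Plan 4: randomly pick whole clusters, then keep ALL units in them."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(3)               # fixed seed so the run is repeatable
# Hypothetical groups: 3 groups of 5 students each (labels are made up).
groups = {
    "A": [f"A{i}" for i in range(1, 6)],
    "B": [f"B{i}" for i in range(1, 6)],
    "C": [f"C{i}" for i in range(1, 6)],
}

strat = stratified_sample(groups, per_stratum=2, rng=rng)
clus = cluster_sample(groups, num_clusters=1, rng=rng)
print(strat)  # 2 students from each of A, B and C
print(clus)   # all 5 students from one randomly chosen group
```

Stratified sampling touches every group but only some units in each; cluster sampling touches only some groups but every unit in them.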

SAMPLING PLANS: A COMPARISON

Note: The last two plans in the comparison (table not reproduced here) are non-probability sampling methods.

WHEN WILL OUR ESTIMATION BE GOOD?

Recall: Our aim – estimate a population parameter using a sample.

estimate = parameter + bias + random error

How to choose a good sample so as to get a good estimate?

We need to minimise (i) bias and (ii) random error:

• use a good sampling frame ← avoid selection bias

• select a suitable probability sampling plan ← avoid selection bias

• ensure response rate is not too low (say ≥ 70%) ← avoid non-response bias

• avoid asking “bad” questions ← avoid other types of bias

• use a sample that is as large as possible ← minimise random error
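
A small Python simulation can show why bias and random error need different remedies. Everything here is a made-up assumption for illustration: a population of 20,000 people's weekly exercise hours, of whom 30% are gym members who exercise more, and a "bad" sampling frame that lists gym members only.

```python
import random
import statistics

rng = random.Random(4)               # fixed seed so the run is repeatable
# Hypothetical population: weekly exercise hours (made-up numbers).
members = [rng.gauss(6, 1.5) for _ in range(6_000)]    # gym members
others = [rng.gauss(2, 1.0) for _ in range(14_000)]    # everyone else
population = members + others
parameter = statistics.mean(population)

def average_estimate(frame, n, repeats=200):
    """Average of `repeats` sample means, each from a simple random sample of size n."""
    return statistics.mean(
        statistics.mean(rng.sample(frame, n)) for _ in range(repeats)
    )

good = {n: average_estimate(population, n) for n in (20, 2000)}
bad = {n: average_estimate(members, n) for n in (20, 2000)}  # frame lists gym members only

print(f"parameter: {parameter:.2f}")
for n in (20, 2000):
    print(f"n={n:4d}: good frame {good[n]:.2f}, bad frame {bad[n]:.2f}")
```

With the bad frame the estimates stay well above the parameter even at n = 2000: a larger sample reduces random error but does nothing about selection bias.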

2 VARIABLES

SUMMARY STATISTICS

Numerical "summaries" computed using samples are known as statistics.


• Measures of central tendency
– mean
– median
– mode
These can be thought of as a "representative" value of the data set.
• Measures of dispersion
– variance / standard deviation
– range
– interquartile range
These can be thought of as a measurement of the "spread" of the data set.
https://david-chew.shinyapps.io/esquisse/
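
As a quick illustration, the statistics listed above can be computed with Python's standard `statistics` module. The seven data values are made up, and note that different tools use slightly different conventions for quartiles, so the interquartile range may differ a little elsewhere.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]    # hypothetical sample of 7 observations

# Measures of central tendency
mean = statistics.mean(data)      # 6
median = statistics.median(data)  # 5
mode = statistics.mode(data)      # 4

# Measures of dispersion
variance = statistics.variance(data)   # sample variance (divides by n - 1): 10
sd = statistics.stdev(data)            # square root of the sample variance
data_range = max(data) - min(data)     # 9
q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles; conventions vary by tool
iqr = q3 - q1                          # 9 - 4 = 5 under Python's default method

print(mean, median, mode)
print(variance, round(sd, 2), data_range, iqr)
```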

3 EXPERIMENTS

An experiment compares outcomes between a treatment group and a control group, to see the effect of a
treatment.

• How similar must the treatment group and the control group be?

• How different must the treatment group and control group be?

• What if the control group and treatment group are different in other factors?
It becomes difficult to ascertain if the observed differences between both groups are due to the treatment.

Ask the question: “Are the groups different, aside from the treatment?”

ENSURING SIMILARITY: RANDOM ALLOCATION

• We allocate subjects to the control and treatment groups randomly.


• The laws of probability ensure that, when the groups are large enough, all other variables are almost equally present in both groups.

Thus the random allocation / assignment of subjects into control and treatment groups ensures that, apart from
chance variation, the only systematic difference between the two groups is the treatment.
Check out
https://david-chew.shinyapps.io/RandomAssignment/
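
Random allocation can also be sketched in a few lines of Python. The 200 subjects and their ages are made up for illustration; the point is that after shuffling, a variable we did not control (age) should come out roughly balanced between the two groups.

```python
import random
import statistics

rng = random.Random(5)               # fixed seed so the run is repeatable
# Hypothetical subjects: 200 people with made-up ages between 21 and 70.
subjects = [{"id": i, "age": rng.randint(21, 70)} for i in range(200)]

shuffled = subjects[:]               # copy so the original order is untouched
rng.shuffle(shuffled)                # random order decides the allocation
treatment, control = shuffled[:100], shuffled[100:]

mean_age_t = statistics.mean(s["age"] for s in treatment)
mean_age_c = statistics.mean(s["age"] for s in control)
print(f"mean age: treatment {mean_age_t:.1f}, control {mean_age_c:.1f}")
```

The two mean ages will not be identical, but with groups of this size they should be close; the same reasoning applies to variables we never measured.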

AVOIDING BIAS: BLINDING

In a single-blinded experiment, either

• the participants are blind to whether they are in the treatment or control group; XOR

• the evaluators are blind to whether they are assessing someone in the treatment or control
group.

In a double-blinded experiment, both

• the participants are blind to whether they are in the treatment or control group; AND

• the evaluators are blind to whether they are assessing someone in the treatment or control
group.

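Blinding is done to guard against human bias, whether from participants' expectations (e.g., the placebo effect) or from evaluators' expectations.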

THE GOLD STANDARD

A randomised, controlled, double-blinded experiment is the best!

• randomised controlled
– subjects are randomly assigned to the treatment and control groups

• double-blinded
– both participants and evaluators are "blinded"

4 OBSERVATIONAL STUDIES

In an observational study, the investigators do not assign the subjects to either group.

• Participants assign themselves to a group.

• For example: smokers in "treatment" group, non-smokers in "control" group

• Assignment is non-random: we cannot ask people to smoke!

ASSOCIATION

• Observational studies can only establish association between variables.

• Association between variables: there is a relationship/link between variables.

• Association does not imply causation.

• With observational studies, we often need to grapple with the issue of a confounder.

• Coming soon in Chapter 2!

COMPARISON: EXPERIMENTS AND OBSERVATIONAL STUDIES

                            Experiments                                  Observational Studies

Assignment                  By researchers                               Participants self-assign
Randomisation               Preferable                                   Not possible
Ethical issues              Possible (if intervention may be harmful)    Unlikely
Confounders                 Unlikely (if random assignment is done)      Likely
Possible to show causation  Yes in the ideal case                        Very difficult
Able to show association    Yes                                          Yes
