QS104 Lecture Week 10 PDF

QS104 - Introduction to Social Analytics I
Week 10: Sample Size and Statistical Inference
Dr Florian Reiche
Why didn’t the null hypothesis win the costume contest?

- He got rejected.
Table of Contents
Determining the Sample Size
Significance Tests for a Mean
Type I and Type II Errors
Determining the Sample Size
The Research Cycle
• If we sample, we need to accept that we have to deal with uncertainty
• We can quantify and control this uncertainty
• The key to this is the standard error, which we defined as
$$ se = \frac{s}{\sqrt{n}} \qquad (1) $$
where s is the standard deviation of the sample and n the sample size.
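As a minimal sketch of equation (1), the standard error can be computed from any sample with Python's standard library. The data values and the function name `standard_error` are invented for illustration and do not come from the lecture.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(n)


# Hypothetical observations, made up purely to show the calculation
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = standard_error(sample)
print(round(se, 4))
```

Because n sits under a square root, halving the standard error requires roughly four times as many observations.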
Recall the equation for calculating the standard error:
$$ se = \frac{s}{\sqrt{n}} \qquad (2) $$
If we solve it for n, we obtain:
$$ n = \frac{s^2}{se^2} \qquad (3) $$
Example (taken from Walliman, N., 2011)
Assume we have a population of 10,000, $s^2 = 0.2$, and our desired standard error is
$se = 0.016$. When we substitute these values into our equation
$$ n = \frac{s^2}{se^2} \qquad (4) $$
we obtain:
$$ n = \frac{0.20}{0.000256} = 781.25 \qquad (5) $$
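The calculation in this example can be checked with a short Python sketch. The function name `required_sample_size` is an assumption made for illustration; the input values are those of the example.

```python
import math


def required_sample_size(variance, target_se):
    """Sample size n = s^2 / se^2 for a desired standard error."""
    return variance / target_se ** 2


n = required_sample_size(variance=0.2, target_se=0.016)
print(round(n, 2))   # the value reported on the slide
print(math.ceil(n))  # rounded up, since we cannot sample a fraction of a unit
```

In practice the result would be rounded up to the next whole unit, here 782.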
Large Samples
If the sample size is large relative to the population, we need to add a correction
by calculating the optimal sample size n':
$$ n' = \frac{n}{1 + \frac{n}{N}} \qquad (6) $$
where:
• N: population size
• n: sample size
• n': optimal sample size
Example (contd.)
In our example:
$$ n' = \frac{781.25}{1 + \frac{781.25}{10{,}000}} \approx 725 \qquad (7) $$
This does not work here properly. Why?
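The correction in equation (6) can be sketched in Python as follows. The function name `corrected_sample_size` and the second population size of 1,000 are assumptions added for comparison and are not part of the lecture example.

```python
def corrected_sample_size(n, population_size):
    """Corrected sample size n' = n / (1 + n / N)."""
    return n / (1 + n / population_size)


# Values from the example: n = 781.25, N = 10,000
n_prime = corrected_sample_size(781.25, 10_000)
print(round(n_prime))

# Hypothetical smaller population, to show that the correction bites harder
# when the sample is a larger share of the population
n_prime_small = corrected_sample_size(781.25, 1_000)
print(round(n_prime_small))
```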
Considerations
• Size of the sampling error (standard error)
• Nonresponse error
  • Uninterviewable
  • Not found
  • Not at home
  • Refusals
• Sample size appropriate for statistical method? (e.g. crosstabulations, see QS105)
• Sample size appropriate for all variables?
Any Questions?
Significance Tests for a Mean
The Research Cycle
Significance Tests
Significance Test
A significance test uses data to summarise the evidence about a hypothesis. It
compares point estimates of parameters to the values predicted by the hypothesis.
(Based on Agresti and Finlay, 2013)
Example
• Suppose we have a study on the impact of CCTV on crimes

• We have data from 29 cameras (n=29)
• ȳ = −3.007, where y is the change in the number of crimes committed
• standard deviation = 7.309
• We want to explore from this sample whether CCTV has had an effect.
• Our hypothesis is that CCTV has an impact on the number of crimes committed.
5 Steps of a Significance Test
1. Assumptions
2. Hypotheses
3. Test statistic
4. p-value
5. Conclusion
1. Assumptions
• Randomisation
• Population Distribution (here: normal)
• (Type of data)
• Sample Size
2. Hypotheses
• In empirical social science research, we try to find out whether the data agree
with certain predictions
• These predictions result from theories we want to test
• The predictions are called hypotheses
Hypothesis
"In statistics, a hypothesis is a statement about a population. It is usually a
prediction that a parameter describing some characteristic of a variable takes a
particular numerical value or falls in a certain range of values." (Agresti and Finlay,
2014, p. 143)
2. Hypotheses (contd.)
A hypothesis must be falsifiable

• “it must be logically possible to make true observational statements that
conflict with the hypothesis” (Walliman, 2011, p. 63)
Examples:
• “All unicorns are pink.”
• “Countries are democracies if their per capita GDP exceeds $ 10,000.”
• “A person is either an immigrant or not.”
• Each significance test has TWO hypotheses about the value of a parameter
• Null hypothesis (H0): a statement that the parameter takes a particular
value, usually one that indicates no effect
• Alternative hypothesis (Ha): states that the parameter falls into some
alternative range of values, representing an effect of some type
• H0: µ = µ0, where µ0 is a particular value for the population mean
• Ha: µ ≠ µ0, such as Ha: µ ≠ 0
• This is called a two-sided test
• H0: µ = µ0 = 0 (CCTV has made no difference)
• Ha: µ ≠ µ0, i.e. µ ≠ 0 (CCTV has made an impact on the number of crimes)
• We assume here that crimes can go up or down.
3. Test Statistic
Test Statistic
"The parameter to which the hypotheses refer has a point estimate. The test
statistic summarizes how far that estimate falls from the parameter value in H0.
Often this is expressed by the number of standard errors between the estimate and
the H0 value." (Agresti and Finlay, 2014, p. 145)
3. Test Statistic (contd.)
• The sample mean ȳ estimates the population mean µ.

• We assume under H0 that µ = µ0 (see graph on the board)
• Center of the sampling distribution of ȳ is the value µ0
• A value of ȳ that falls far out in the tail of the distribution would be unusual,
and provide strong evidence against H0
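The idea that a sample mean far out in the tail is unusual under H0 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with mean µ0 = 0, and its standard deviation is set equal to the sample value 7.309 from the CCTV example. The seed and the number of simulated samples are arbitrary.

```python
import random
import statistics

random.seed(104)  # arbitrary seed so the run is repeatable

N_SAMPLES = 5000  # number of simulated samples (arbitrary)
n = 29            # sample size from the CCTV example
mu0 = 0.0         # population mean under H0
sigma = 7.309     # assumption: population sd equals the sample sd

# Draw many samples under H0 and record each sample mean
means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))

observed = -3.007
# Share of simulated means at least as far from mu0 as the observed mean
share_as_extreme = sum(abs(m - mu0) >= abs(observed - mu0) for m in means) / N_SAMPLES

print(round(statistics.fmean(means), 3))  # centre of the sampling distribution
print(share_as_extreme)
```

The simulated means cluster around µ0, and only a small share of them lie as far from µ0 as the observed mean.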
t-Test Statistic
• The evidence about H0 is summarised by the number of standard errors that ȳ
falls from the null hypothesis value µ0
• Recall from week 8 that the true standard error is $\sigma_{\bar{y}} = \sigma / \sqrt{n}$
• In reality, we do not know what σ (the standard deviation of the population) is
• We can estimate it, however, by $se = s / \sqrt{n}$, where s is the sample standard
deviation
t-Test Statistic (contd.)
• The resulting test statistic is the t-score
$$ t = \frac{\bar{y} - \mu_0}{se}, \quad \text{where } se = \frac{s}{\sqrt{n}} $$
• In principle, this is the same as the z-value from week 8
• BUT we use s to estimate σ, and therefore introduce additional error
• Therefore, this test uses the t-distribution
Calculating the t-value
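• We calculate the t-test statistic as follows:
$$ t = \frac{\bar{y} - \mu_0}{se}, \quad \text{where } se = \frac{s}{\sqrt{n}} $$
• In our example:
$$ t = \frac{-3.007 - 0}{1.357} = -2.22, \quad \text{where } se = \frac{7.309}{\sqrt{29}} = 1.357 $$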
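The same arithmetic can be reproduced in Python. The function name `t_statistic` is an assumption made for illustration; the inputs are the summary statistics of the CCTV example.

```python
import math


def t_statistic(sample_mean, mu0, sample_sd, n):
    """Return (t, se) for a one-sample t-test from summary statistics."""
    se = sample_sd / math.sqrt(n)  # estimated standard error
    t = (sample_mean - mu0) / se   # number of standard errors between ybar and mu0
    return t, se


t, se = t_statistic(sample_mean=-3.007, mu0=0.0, sample_sd=7.309, n=29)
print(round(se, 3))
print(round(t, 2))
```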
4. The p-value
• We need to create a probability statement of the evidence against H0 .

• For this, we use the test statistic, under the assumption that H0 is true.
• The purpose is to find out how unusual the observed test statistic value is
compared to what H0 predicts
4. The p-value (contd.)
p-value
"The p-value is the probability that the test statistic equals the observed value or a
value even more extreme in the direction predicted by Ha. It is calculated
presuming that H0 is true. The p-value is denoted by p." (Agresti and Finlay, 2014,
p. 145)
The smaller the p-value, the stronger the evidence against H0 .
Determining the p-value
• We have calculated the t-statistic and know that our observed value of ȳ lies
2.22 standard errors away from the H0 value (in our case zero).
• We now use this value to determine what proportion of the area under the
t-distribution lies within this distance of the centre
Determining the p-value (contd.)
• Different t-scores apply for each df value
• Here df = n − 1 = 28
• Our value of 2.22 lies between the tabulated t-scores 2.048 and 2.467
• This corresponds to between 5% and 2% of the area in the two tails combined
(to the left of −2.22 and to the right of 2.22)
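The table lookup above can be cross-checked numerically. The sketch below uses only Python's standard library and integrates the t-density with Simpson's rule; this is one possible approach rather than the method used in the lecture, and the function names `t_pdf` and `two_sided_p_value` are assumptions made for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t|
    return 1 - 2 * central_half   # what remains in the two tails


p = two_sided_p_value(-2.22, df=28)
print(round(p, 3))
print(round(two_sided_p_value(2.048, df=28), 3))  # tabulated 5% boundary
print(round(two_sided_p_value(2.467, df=28), 3))  # tabulated 2% boundary
```

The computed p-value falls between the two tabulated boundaries, in line with the bracket read from the t-table.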
• The remaining area beyond the t-value is the p-value (for a two-sided test you
need to sum up both sides)
• This is the blue area in the graph below
• Stata will tell you the exact value automatically
[Figure: density of the t-distribution, with the areas beyond −2.22 and 2.22 shaded]
5. Conclusion
• p-value summarises the evidence against H0

• If the p-value is sufficiently small, we reject H0 and accept Ha
• Most studies require p ≤ 0.05
• In our example, p lies between 0.02 and 0.05, so we have evidence to reject the
null hypothesis at the 5% level
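The decision rule of this step can be written as a short Python sketch. The function name `conclude` and the wording of the returned strings are assumptions made for illustration; the threshold 0.05 is the conventional level named above.

```python
def conclude(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 if the p-value is at most alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "do not reject H0"


# Both ends of the bracket found for the CCTV example lead to the same decision
print(conclude(0.02))
print(conclude(0.05))
# A hypothetical larger p-value, for contrast
print(conclude(0.20))
```

Note that failing to reject H0 is not the same as showing that H0 is true.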
Any Questions?
Type I and Type II Errors
Why not go for p = 0?
• Not possible (therefore you CANNOT PROVE anything)

• We can merely decide how much risk of each of two errors we are willing to accept
• These are called the Type I error (rejecting H0 when it is true) and the Type II
error (failing to reject H0 when it is false)
The Relationship between Type I and Type II Errors
Why does it matter?
• Court Trial
  • H0: Defendant is innocent
  • Ha: Defendant is guilty
  • Type I error: We send an innocent person to jail
  • Type II error: We let a guilty person run free
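Both error types can be made concrete with a simulation based on the CCTV setting. This is a sketch under stated assumptions: the population is normal with standard deviation 7.309, the critical value 2.048 is the tabulated t-score for df = 28 at the 5% level, and the "true effect" of −3.0 used for the Type II error is hypothetical. Seed and number of runs are arbitrary.

```python
import math
import random
import statistics

random.seed(2019)  # arbitrary seed so the run is repeatable


def rejects_h0(sample, mu0=0.0, critical_t=2.048):
    """Two-sided t-test decision; 2.048 is the 5% critical value for df = 28."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.fmean(sample) - mu0) / se
    return abs(t) > critical_t


def rejection_rate(true_mean, sd=7.309, n=29, runs=4000):
    """Share of simulated samples in which H0: mu = 0 is rejected."""
    hits = 0
    for _ in range(runs):
        sample = [random.gauss(true_mean, sd) for _ in range(n)]
        hits += rejects_h0(sample)
    return hits / runs


# H0 is true: every rejection is a Type I error
type_1_rate = rejection_rate(true_mean=0.0)
# H0 is false (hypothetical true mean of -3.0): every non-rejection is a Type II error
type_2_rate = 1 - rejection_rate(true_mean=-3.0)

print(type_1_rate)
print(type_2_rate)
```

The Type I error rate stays close to the chosen 5% level, while the Type II error rate depends on the true effect and the sample size. Lowering the threshold to make Type I errors rarer would make Type II errors more frequent.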
Any Questions?
Congratulations, you have survived QS104!
Goodbye, QS104
Academic Year   2019/20                                       2020/21
Term 1          QS104: Introduction to Social Analytics I     PO11Q: Introduction to Quantitative Political Analysis I
Term 2          QS105: Introduction to Social Analytics II    PO12Q: Introduction to Quantitative Political Analysis II
