Gov2000: Quantitative Methodology for Political Science I
Lecture 1: Introduction

September 17, 2007

Outline

1 Syllabus and Course Outline
2 How do we analyze data?
    Enumeration, Summary, and Comparison
    Inference
3 Statistical Inference
    The Role of Probability
    Reversing the problem
4 Prediction, Explanation, and the Role of Models
    Prediction versus Explanation
    The Role of Models

2 How do we analyze data?

Figure: Levels of data analysis, from Enumeration at the base, through Summary and Comparison, up to Inference (Estimation and Hypothesis Testing). Diagram from Efron 1982. “Maximum Likelihood and Decision Theory.” The Annals of Statistics. 10: 340-356.
Enumeration
Data from Fish, M. Steven. 2002. “Islam and Authoritarianism.” World Politics. 55: 4-37.

       Democracy   Income     Muslim   OPEC
1      1.100000    2.250420   1        0
2      4.100000    2.925312   1        0
3      2.150000    3.214314   1        1
4      1.900000    2.824126   0        0
5      5.650000    3.762078   0        0
6      3.950000    3.187803   0        0
⋮      ⋮           ⋮          ⋮        ⋮
156    4.200000    2.653213   0        0
157    3.000000    2.848805   0        0

Summary

           Democracy   Income   Muslim   OPEC
Min.       1.000       2.000    0.0000   0.00000
1st Qu.    2.550       2.660    0.0000   0.00000
Median     4.075       3.178    0.0000   0.00000
Mean       4.102       3.220    0.3013   0.07051
3rd Qu.    5.675       3.649    1.0000   0.00000
Max.       7.000       4.662    1.0000   1.00000
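Each row of the summary table is a simple function of one column. As a rough sketch (not the software used to produce the slide), the Python below computes the same six statistics for a single column. It uses only the eight Democracy values listed in the Enumeration table, so its output will not match the full-sample column above. Quartiles use `statistics.quantiles(..., method="inclusive")`, which interpolates the same way as R's default quantile rule; that the table came from R's `summary()` is an assumption.

```python
import statistics

def six_number_summary(values):
    """Min., quartiles, median, mean, and max. of a list of numbers."""
    # "inclusive" treats the sample min and max as the 0th and 100th percentiles.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

# Democracy scores for the eight countries shown in the Enumeration table only.
democracy = [1.10, 4.10, 2.15, 1.90, 5.65, 3.95, 4.20, 3.00]

for name, value in six_number_summary(democracy).items():
    print(f"{name:8s} {value:.3f}")
```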

Summary

Figure: Scatterplot of Democracy (vertical axis, 1 to 7) against Income (horizontal axis, 2.0 to 4.5) for all countries in the data set, with a fitted line.

Comparison

Figure: Democracy (vertical axis, 1 to 7) against Income (horizontal axis, 2.0 to 4.5), in separate panels for Muslim and Non-Muslim countries, each with its own fitted line.

Figure: Democracy (vertical axis, 1 to 7) against Income (horizontal axis, 2.0 to 4.5) in four panels, crossing OPEC membership (OPEC Member, Not OPEC Member) with Muslim and Non-Muslim countries.
Inference

The observed data set (the sample) is interesting, but we are often more interested in a larger data set that we have not observed (the population). For example:
  a large population of individuals
  a conceptually infinite data set (i.e., the process by which these data were generated)
  counterfactual values for the variables

Estimation and Testing

Estimation – Would the summary be accurate in the "larger data set"? For example, is 1.669 (the slope of the line in the first plot) close to the slope of the line in some larger population of countries?
Testing – Would the comparison hold in the "larger data set"? For example, the slope of the Income-Democracy line for Muslim countries is not equal to the slope for non-Muslim countries. Is this difference due to something other than chance variation? Would the difference still be there in the larger (perhaps conceptually infinite) group of countries?
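Assuming the lines in these plots were fitted by ordinary least squares (the slides do not say how), the slope is the sum of cross-deviations of Income and Democracy divided by the sum of squared deviations of Income. The sketch below computes that slope and the difference in slopes between two groups, the quantity the testing question asks about. The data lists are hypothetical placeholders, not the Fish (2002) data, and the function names are illustrative.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def slope_difference(x, y, group):
    """Slope for observations with group == 1 minus slope for group == 0."""
    in_group = [(xi, yi) for xi, yi, g in zip(x, y, group) if g == 1]
    out_group = [(xi, yi) for xi, yi, g in zip(x, y, group) if g == 0]
    return ols_slope(*zip(*in_group)) - ols_slope(*zip(*out_group))

# Hypothetical data: group 1 lies on y = 1 + x, group 0 on y = -1 + 2x.
income    = [2.0, 2.5, 3.0, 3.5, 2.2, 2.8, 3.4, 4.0]
democracy = [3.0, 3.5, 4.0, 4.5, 3.4, 4.6, 5.8, 7.0]
muslim    = [1, 1, 1, 1, 0, 0, 0, 0]

print(f"pooled slope: {ols_slope(income, democracy):.3f}")
print(f"slope difference (group 1 - group 0): {slope_difference(income, democracy, muslim):.3f}")
```

Computing the difference is the easy part; deciding whether it reflects more than chance variation is the inference problem taken up in the rest of the lecture.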

3 Statistical Inference

In order to make inferences about a population/process, we often need a model. A deterministic model cannot account for observations that do not fit it exactly. Probability models allow us to deal with "noise" in the data.
Loosely speaking...
  Probability allows us to reason from populations/processes to samples.
  Statistical inference is the practice of reasoning from samples to populations/processes.

Coin Flipping Example


Suppose we have a fair coin that we plan to flip 10 times. “Fair” means that the probability of getting “H” on any given flip is 1/2. This P(H) is the parameter of the population/process.

We will usually represent population/process parameters with Greek letters (e.g., θ ≡ P(H)).

Given θ, we can answer questions like the following:
  Q: What is the probability of seeing 4 or fewer heads?
  A: 386/1024, approximately 0.377.
  Q: What is the probability of seeing the sequence {T, H, H, T, T, T, H, T, H, T}?
  A: (1/2)^10 = 1/1024, approximately 0.0009766.
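Both answers follow from treating the 10 flips as independent with P(H) = θ: the number of heads is Binomial(10, θ), and any one specific sequence has probability θ^(#H) (1 - θ)^(#T). A minimal sketch of these two calculations (function names are illustrative):

```python
from math import comb

def binom_cdf(k, n, theta):
    """P(X <= k) for X ~ Binomial(n, theta)."""
    return sum(comb(n, j) * theta**j * (1 - theta) ** (n - j) for j in range(k + 1))

def sequence_prob(seq, theta):
    """Probability of one particular sequence of independent flips."""
    p = 1.0
    for flip in seq:
        p *= theta if flip == "H" else 1 - theta
    return p

theta = 0.5
print(round(binom_cdf(4, 10, theta), 3))        # 0.377
print(sequence_prob("THHTTTHTHT", theta))       # 0.0009765625
```

Both functions take θ as given; the rest of this section asks the reverse question of what the data say about an unknown θ.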
“In solving a problem of this sort, the grand thing is to be able to reason backward... Most people, if you describe a train of events to them, will tell you what the result would be. They can put those events together in their minds, and argue from them that something will come to pass. There are few people, however, who, if you told them a result, would be able to evolve from their own inner consciousness what the steps were which led up to that result. This power is what I mean when I talk of reasoning backward...”

– Sherlock Holmes

[Demonstration]

Let θ = (# of red cards) / (total # of cards).
  What is our best guess for θ, and how good is our guess? (estimation)
  Is it a regular deck? (i.e., does θ = 1/2?) (testing)

Our ability to answer these questions depends on our assumptions about the sampling process.
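Reasoning backward from draws to θ can be sketched as below, assuming cards are drawn one at a time and replaced and reshuffled after each draw, so that draws are independent with constant probability θ (drawing without replacement would call for a hypergeometric model instead). The result of 14 red cards in 20 draws is hypothetical, not the in-class demonstration, and doubling the smaller tail is one common convention for an exact two-sided p-value.

```python
from math import comb

def binom_pmf(j, n, theta):
    """P(X = j) for X ~ Binomial(n, theta)."""
    return comb(n, j) * theta**j * (1 - theta) ** (n - j)

def estimate_theta(n_red, n_draws):
    """Point estimate: the sample proportion of red cards."""
    return n_red / n_draws

def two_sided_p_value(n_red, n_draws, theta0=0.5):
    """Exact p-value for H0: theta = theta0, doubling the smaller tail."""
    lower = sum(binom_pmf(j, n_draws, theta0) for j in range(n_red + 1))
    upper = sum(binom_pmf(j, n_draws, theta0) for j in range(n_red, n_draws + 1))
    return min(1.0, 2 * min(lower, upper))

n_draws, n_red = 20, 14  # hypothetical demonstration result
theta_hat = estimate_theta(n_red, n_draws)
se = (theta_hat * (1 - theta_hat) / n_draws) ** 0.5  # estimated standard error
print(f"estimate {theta_hat:.2f}, standard error {se:.3f}")
print(f"p-value for theta = 1/2: {two_sided_p_value(n_red, n_draws):.3f}")
```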

2004 Ohio Example

Ohio Vote Counts (from 2004 FEC report)
  Bush: 2,859,768 (≈ 51% of the Bush + Kerry total)
  Kerry: 2,741,167

Exit Poll Counts (Freeman, S.F. 2004)
  Bush: 941 (≈ 48% of the Bush + Kerry total)
  Kerry: 1,022

What if we knew the truth?


Population
  Population Index   1           2           3      ...   5,600,935
  Vote Choice        Bush        Kerry       Bush   ...   Bush

Repeated Sampling
  Sample Index       1           2           ...   1,963
  Population Index   2,790,375   47,893      ...   3,983,486
  Vote Choice        Kerry       Kerry       ...   Bush

  Sample Index       1           2           ...   1,963
  Population Index   3,548,192   5,168,386   ...   1,926,017
  Vote Choice        Kerry       Bush        ...   Bush
  ⋮
1,000 SRS “Polls” from Ohio Vote Counts (n=1,963)

Figure: Histogram of Simulated Polls. Density (vertical axis) of the Bush Proportion (horizontal axis, about 0.48 to 0.54) across the 1,000 simulated polls.

1,000 SRS “Polls” from Ohio Vote Counts (n=1,963)

Figure: The same Histogram of Simulated Polls (Density against Bush Proportion), with reference lines marking the Vote Count and the Exit Poll Bush proportions.
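The simulation behind these histograms can be sketched as below, assuming each “poll” is a simple random sample (SRS) of 1,963 voters drawn without replacement from the 5,600,935 Bush and Kerry voters. Real exit polls are not simple random samples, so this is the idealized version described on the slides. The function name, the seed, and coding the first 2,859,768 population indices as Bush voters are illustrative choices, not the course's code.

```python
import random

N_BUSH, N_KERRY = 2_859_768, 2_741_167   # 2004 Ohio vote counts
N = N_BUSH + N_KERRY                     # 5,600,935 voters in the population
n = 1_963                                # exit-poll sample size
exit_poll_share = 941 / n

def simulate_polls(num_polls, seed=0):
    """Bush proportion in num_polls SRS draws of size n, without replacement.
    Population indices below N_BUSH are treated as Bush voters."""
    rng = random.Random(seed)
    shares = []
    for _ in range(num_polls):
        sample = rng.sample(range(N), n)
        shares.append(sum(i < N_BUSH for i in sample) / n)
    return shares

polls = simulate_polls(1_000)
as_extreme = sum(s <= exit_poll_share for s in polls) / len(polls)
print(f"true Bush share: {N_BUSH / N:.4f}")
print(f"exit-poll share: {exit_poll_share:.4f}")
print(f"fraction of simulated polls at or below the exit poll: {as_extreme:.3f}")
```

Because the truth is known here, the spread of `polls` shows directly how far an SRS of this size can stray from the vote count by chance alone.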

Usually we don’t know the truth!


Sample Estimate of the Population:
  Population Index   1      2       3      ...   1,963
  Vote Choice        Bush   Kerry   Bush   ...   Bush

Resampling with Replacement:
  Resample Index     1      2       ...   1,963
  Sample Index       925    396     ...   842
  Vote Choice        Kerry  Kerry   ...   Bush

  Resample Index     1      2       ...   1,963
  Sample Index       447    1,076   ...   447
  Vote Choice        Kerry  Bush    ...   Bush
  ⋮
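Resampling the observed sample with replacement is what is usually called the (nonparametric) bootstrap: the sample stands in for the unknown population, and the spread of the resampled proportions estimates the spread of the sampling distribution. A minimal sketch, treating the exit-poll counts (941 Bush, 1,022 Kerry) as the sample and assuming the poll can be analyzed as if it were an SRS; the order of voters in the list does not matter, and the names and seed are illustrative.

```python
import random
import statistics

sample = ["Bush"] * 941 + ["Kerry"] * 1_022  # the exit-poll sample, n = 1,963

def bootstrap_shares(data, num_resamples, seed=0):
    """Bush proportion in num_resamples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    shares = []
    for _ in range(num_resamples):
        resample = rng.choices(data, k=n)  # sampling WITH replacement
        shares.append(resample.count("Bush") / n)
    return shares

boot = bootstrap_shares(sample, 1_000)
se = statistics.stdev(boot)  # bootstrap estimate of the standard error
print(f"estimate: {941 / 1_963:.4f}, bootstrap standard error: {se:.4f}")
```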

Estimation versus Testing

Figure: Null and Estimated Sampling Distributions. Densities of the Bush Proportion (horizontal axis, about 0.44 to 0.56) under the null distribution (Null Dist.) and the estimated sampling distribution (Sampling Dist.).
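The two curves correspond to two calculations. Estimation centers a sampling distribution at the exit-poll estimate and asks which values of the Bush share are plausible; testing centers a null distribution at the official vote share and asks how surprising the exit poll would be if that were the truth. The sketch below does both with a normal approximation to SRS sampling; the normal approximation, the 95% level, and the two-sided test are simplifying assumptions of this sketch, whereas the slide shows simulated distributions.

```python
from statistics import NormalDist

n = 1_963
p_hat = 941 / n                                # exit-poll estimate
p_null = 2_859_768 / (2_859_768 + 2_741_167)   # official Bush share of the two-party vote

# Estimation: center the sampling distribution at the estimate.
se_hat = (p_hat * (1 - p_hat) / n) ** 0.5
z975 = NormalDist().inv_cdf(0.975)
ci = (p_hat - z975 * se_hat, p_hat + z975 * se_hat)

# Testing: center the null distribution at the hypothesized value.
se_null = (p_null * (1 - p_null) / n) ** 0.5
z = (p_hat - p_null) / se_null
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"95% interval: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

Both conclusions depend on the SRS assumption; if the exit poll was not a simple random sample of voters, neither the interval nor the p-value means what it appears to.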

4 Prediction, Explanation, and the Role of Models

The 1994 State Failure Task Force


State Failure:
  revolutionary wars
  genocide or politicide
  adverse or disruptive regime transitions

In addition to information on state failures, the task force collected data on more than one thousand variables for 195 countries between 1955 and 1998.

Two Possible Questions:
  1 Using these data, can we predict state failure?
  2 Using these data, can we explain why states fail?
The Prediction Problem


Figure: Population Density (lagged 2 yrs; vertical axis, 0.0 to 3.0) plotted against Infant Mortality (lagged 2 yrs; horizontal axis, 1 to 6), with F marking a state failure and S a case without failure.

The Training Set

Figure: The subset of cases used for training, on the same axes (Infant Mortality and Population Density, both lagged 2 yrs; F = failure, S = no failure).

The Validation Set


Figure: The cases used to check (validate) predictions made from the training set, on the same axes (Infant Mortality and Population Density, both lagged 2 yrs; F = failure, S = no failure).
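The slides do not say which classifier is fitted to the training set. As an illustrative assumption, the sketch below uses a 1-nearest-neighbor rule: each validation case is labeled F or S according to the closest training case in (infant mortality, population density) space, and the validation error rate is the share of cases labeled wrongly. All coordinates are hypothetical, not the task force data.

```python
import math

# Hypothetical (infant_mortality, population_density, label) triples.
training = [(1.5, 2.8, "S"), (2.0, 0.3, "S"), (4.5, 2.1, "F"),
            (5.0, 1.7, "F"), (3.0, 1.2, "S")]
validation = [(1.8, 2.5, "S"), (4.8, 1.9, "F"), (2.5, 0.5, "S"), (4.0, 1.0, "F")]

def predict(point, train):
    """1-nearest-neighbor prediction using Euclidean distance."""
    nearest = min(train, key=lambda case: math.dist(point, case[:2]))
    return nearest[2]

def error_rate(cases, train):
    """Fraction of cases whose predicted label differs from the observed one."""
    wrong = sum(predict(case[:2], train) != case[2] for case in cases)
    return wrong / len(cases)

print(f"training error rate:   {error_rate(training, training):.2f}")
print(f"validation error rate: {error_rate(validation, training):.2f}")
```

The training error of a nearest-neighbor rule is always zero, which is why prediction is judged on cases held out of training.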

Explaining State Failure


Does high infant mortality explain state failure?
Does high population density explain state failure?
Is classification enough?

Figure: The same plot of failures (F) and non-failures (S), Population Density against Infant Mortality (both lagged 2 yrs).

Figure: The same plot with the outcome coded numerically: 1 for a state failure (F) and 0 for no failure (S).

The Role of Models

Speaking of functional form... This is primarily a course on linear regression, and therefore we will usually assume a linear relationship.

Q: Is linearity a reasonable modeling assumption?
  A1: In some cases, the relationship may be close enough to linear that we will get reliable answers.
  A2: We may be able to tell when the linear model is inadequate.
  A3: We may be able to make small changes in order to fix things up.

We'll be making lots of assumptions, and some of them can't be tested with data!
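A2 and A3 can be illustrated with a small sketch. The data below are hypothetical, generated from y = x², so a straight line in x is inadequate: the least-squares residuals are positive at both ends and negative in the middle, a systematic pattern that flags the problem. Regressing y on the transformed variable x² is one example of a “small change” (the slides do not specify which changes they have in mind) and removes the pattern.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope

def residuals(x, y):
    """Observed y minus the fitted line's prediction, case by case."""
    intercept, slope = ols_fit(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

x = [1, 2, 3, 4, 5, 6, 7]
y = [a ** 2 for a in x]                            # hypothetical curved relationship

print([round(r, 2) for r in residuals(x, y)])      # U-shaped pattern: line in x is inadequate
x_sq = [a ** 2 for a in x]                         # transformed predictor
print([round(r, 2) for r in residuals(x_sq, y)])   # residuals essentially zero
```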

Mice and Tigers

“Since all models are wrong the scientist cannot obtain a ‘correct’ one by excessive elaboration. ... Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity. Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad.” – George E. P. Box, 1976

“All models are wrong. Some are useful.” – George E. P. Box

Summary

In this course, we will focus on estimation (point and interval) and testing.
Predictive and explanatory models have different goals, and often use different statistical techniques. In this course, we focus on explanation.
We will not get the model “right”, but we may be close enough to say something useful. In this course, we will learn when the linear model is (and is not) useful.
