Syllabus and Course Outline How do we analyze data?

Statistical Inference Prediction, Explanation, and the Role of Models Summary

Gov2000: Quantitative Methodology for Political Science I
Lecture 1: Introduction

September 17, 2007


Outline
1 Syllabus and Course Outline
2 How do we analyze data?
    Enumeration, Summary, and Comparison
    Inference
3 Statistical Inference
    The Role of Probability
    Reversing the problem
4 Prediction, Explanation, and the Role of Models
    Prediction versus Explanation
    The Role of Models


Figure: Diagram from Efron 1982. “Maximum Likelihood and Decision Theory.” The Annals of Statistics. 10: 340-356. (Diagram labels: Enumeration, Summary, Comparison, Inference, Estimation, Hypothesis Testing.)

Enumeration
Data from Fish, M. Steven. 2002. “Islam and Authoritarianism.” World Politics. 55: 4-37.

       Democracy    Income      Muslim   OPEC
1      1.100000     2.250420    1        0
2      4.100000     2.925312    1        0
3      2.150000     3.214314    1        1
4      1.900000     2.824126    0        0
5      5.650000     3.762078    0        0
6      3.950000     3.187803    0        0
...
156    4.200000     2.653213    0        0
157    3.000000     2.848805    0        0

Summary

             Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
Democracy    1.000     2.550     4.075     4.102     5.675     7.000
Income       2.000     2.660     3.178     3.220     3.649     4.662
Muslim       0.0000    0.0000    0.0000    0.3013    1.0000    1.0000
OPEC         0.00000   0.00000   0.00000   0.07051   0.00000   1.00000

Summary

Figure: Scatterplot of Democracy (vertical axis, 1 to 7) against Income (horizontal axis, 2.0 to 4.5).

Comparison
Figure: Scatterplots of Democracy (vertical axis, 1 to 7) against Income (horizontal axis, 2.0 to 4.5), in two panels: Muslim and Non-Muslim.

Figure: Scatterplots of Democracy (vertical axis, 1 to 7) against Income (horizontal axis, 2.0 to 4.5), in four panels: OPEC Member Muslim, OPEC Member Non-Muslim, Not OPEC Member Muslim, and Not OPEC Member Non-Muslim.

Inference

The observed data set (sample) is interesting, but we may be more interested in a larger data set that we haven’t observed (population). For example...
- a large population of individuals
- a conceptually infinite data set (i.e. the process by which these data were generated)
- counterfactual values for the variables

Estimation and Testing
Estimation – Would the summary be accurate in the "larger data set"? For example, is 1.669 (the slope of the line in the first plot) close to the slope of the line in some larger population of countries?

Testing – Would the comparison be accurate in the "larger data set"? For example, the slope for Muslim countries (Inc v Dem) is not equal to the slope for non-Muslim countries (Inc v Dem). Is this difference due to something other than chance variation? Would the difference still be there in the larger (maybe conceptually infinite) group of countries?

In order to make inferences about a population/process, we often need a model.
A deterministic model will not allow us to account for observations that do not fit.
Probability models allow us to deal with "noise" in the data.

Loosely speaking...
Probability allows us to reason from populations/processes to samples.
Statistical inference is the practice of reasoning from samples to populations/processes.

Coin Flipping Example
Suppose we have a fair coin that we plan on flipping 10 times. “Fair” means that the probability of getting “H” on any given flip is 1/2. This P(H) is the parameter of the population/process.

We will usually represent population/process parameters with Greek letters (e.g. θ ≡ P(H)).

Given θ we can answer questions like the following:

Q: What is the probability of seeing 4 or fewer heads?
A: approximately 0.377.

Q: What is the probability of seeing the sequence {T, H, H, T, T, T, H, T, H, T}?
A: approximately 0.0009766.
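
The two answers in the coin flipping example can be checked directly from the binomial model. What follows is a minimal sketch in Python, assuming independent flips with θ = 1/2; the function names are illustrative and do not come from the lecture.

```python
from math import comb


def prob_at_most(k_max, n, theta):
    """P(number of heads <= k_max) in n independent flips with P(H) = theta."""
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(k_max + 1)
    )


def prob_sequence(sequence, theta):
    """Probability of one exact sequence of 'H'/'T' under independent flips."""
    p = 1.0
    for flip in sequence:
        p *= theta if flip == "H" else 1 - theta
    return p


theta = 0.5  # fair coin, as assumed in the example
p_four_or_fewer = prob_at_most(4, 10, theta)
p_exact_sequence = prob_sequence("THHTTTHTHT", theta)

print(round(p_four_or_fewer, 3))   # 0.377
print(round(p_exact_sequence, 7))  # 0.0009766
```

With a fair coin every particular sequence of 10 flips has the same probability, (1/2)^10 = 1/1024, while "4 or fewer heads" collects 386 of the 1,024 equally likely sequences.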

“In solving a problem of this sort, the grand thing is to be able to reason backward... Most people, if you describe a train of events to them, will tell you what the result would be. They can put those events together in their minds, and argue from them that something will come to pass. There are few people, however, who, if you told them a result, would be able to evolve from their own inner consciousness what the steps were which led up to that result. This power is what I mean when I talk of reasoning backward...” – Sherlock Holmes


[Demonstration] Let θ be (# of red cards) / (total # of cards).

What is our best guess for θ, and how good is our guess? (estimation)
Is it a regular deck? (i.e. does θ = 1/2) (testing)

Our ability to answer these questions depends on our assumptions about the sample process.


2004 Ohio Example

Ohio Vote Counts (from 2004 FEC report)
  Bush: 2,859,768 (≈ 51%)
  Kerry: 2,741,167

Exit Poll Counts (Freeman, S.F. 2004)
  Bush: 941 (≈ 48%)
  Kerry: 1,022

What if we knew the truth?
Population:

Population Index   Vote Choice
1                  Bush
2                  Kerry
3                  Bush
...                ...
5,600,935          Bush

Repeated Sampling:

Sample Index   Population Index   Vote Choice
1              2,790,375          Kerry
2              47,893             Kerry
...            ...                ...
1,963          3,983,486          Bush

Sample Index   Population Index   Vote Choice
1              3,548,192          Kerry
2              5,168,386          Bush
...            ...                ...
1,963          1,926,017          Bush

. . .

1,000 SRS “Polls” from Ohio Vote Counts (n=1,963)
Figure: Histogram of Simulated Polls. Density (vertical axis, 0 to 35) against Bush Proportion (horizontal axis, 0.48 to 0.54).

1,000 SRS “Polls” from Ohio Vote Counts (n=1,963)
Figure: Histogram of Simulated Polls. Density (vertical axis, 0 to 35) against Bush Proportion (horizontal axis, 0.48 to 0.54), with the Vote Count and the Exit Poll marked.
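
The repeated-sampling exercise behind this histogram can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for the lecture: it treats the official Ohio counts as the full population, draws each "poll" as a simple random sample without replacement, and uses an arbitrary fixed seed; all variable and function names are hypothetical.

```python
import random

BUSH_VOTES = 2_859_768   # Ohio vote count for Bush (from the slide)
KERRY_VOTES = 2_741_167  # Ohio vote count for Kerry (from the slide)
POLL_SIZE = 1_963        # exit poll sample size: 941 + 1,022
N_POLLS = 1_000          # number of simulated polls
EXIT_POLL_BUSH_SHARE = 941 / POLL_SIZE

population_size = BUSH_VOTES + KERRY_VOTES
rng = random.Random(2004)  # arbitrary fixed seed, for reproducibility only


def simulate_poll():
    """Draw one SRS without replacement and return the Bush proportion."""
    # Label voters 0 .. BUSH_VOTES-1 as Bush voters and the rest as Kerry voters.
    drawn = rng.sample(range(population_size), POLL_SIZE)
    return sum(1 for voter in drawn if voter < BUSH_VOTES) / POLL_SIZE


simulated_shares = [simulate_poll() for _ in range(N_POLLS)]

true_share = BUSH_VOTES / population_size
mean_share = sum(simulated_shares) / N_POLLS
# Fraction of simulated polls with a Bush share at or below the exit poll's.
share_at_or_below_exit_poll = (
    sum(1 for s in simulated_shares if s <= EXIT_POLL_BUSH_SHARE) / N_POLLS
)

print(f"true Bush share:         {true_share:.4f}")
print(f"mean of simulated polls: {mean_share:.4f}")
print(f"exit poll Bush share:    {EXIT_POLL_BUSH_SHARE:.4f}")
print(f"simulated polls at or below the exit poll: {share_at_or_below_exit_poll:.3f}")
```

The simulated proportions cluster around the true Bush share of about 0.51, so the last printed number indicates how unusual an exit poll share of about 0.48 would be if the exit poll really were a simple random sample of Ohio voters.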


Usually we don’t know the truth!
Sample Estimate of the Population:

Population Index   Vote Choice
1                  Bush
2                  Kerry
3                  Bush
...                ...
1,963              Bush

Resampling with Replacement:

Resample Index   Sample Index   Vote Choice
1                925            Kerry
2                396            Kerry
...              ...            ...
1,963            842            Bush

Resample Index   Sample Index   Vote Choice
1                447            Kerry
2                1,076          Bush
...              ...            ...
1,963            447            Bush

. . .
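
Resampling the observed poll with replacement (a procedure often called the bootstrap) can be sketched as follows. This is a minimal Python illustration, assuming the 1,963 exit poll responses are the only data available; the seed, the number of resamples, and the percentile interval are choices made for this sketch and are not taken from the lecture.

```python
import random
import statistics

BUSH_IN_POLL = 941     # exit poll count for Bush (from the slide)
KERRY_IN_POLL = 1_022  # exit poll count for Kerry (from the slide)
N_RESAMPLES = 1_000    # assumed number of resamples

# The observed sample stands in for the unknown population (1 = Bush, 0 = Kerry).
sample = [1] * BUSH_IN_POLL + [0] * KERRY_IN_POLL
n = len(sample)
rng = random.Random(1963)  # arbitrary fixed seed, for reproducibility only

resampled_shares = []
for _ in range(N_RESAMPLES):
    resample = rng.choices(sample, k=n)  # n draws with replacement
    resampled_shares.append(sum(resample) / n)

point_estimate = sum(sample) / n
bootstrap_se = statistics.stdev(resampled_shares)

# Approximate 95% percentile interval: drop the lowest and highest 2.5%.
ordered = sorted(resampled_shares)
lower = ordered[N_RESAMPLES * 25 // 1000]
upper = ordered[N_RESAMPLES * 975 // 1000 - 1]

print(f"point estimate of Bush share: {point_estimate:.4f}")
print(f"resampling standard error:    {bootstrap_se:.4f}")
print(f"approximate 95% interval:     ({lower:.4f}, {upper:.4f})")
```

The spread of the resampled proportions serves as an estimate of how much the poll result would vary from sample to sample, which is what the "how good is our guess?" question asks for when the truth is unknown.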


Estimation versus Testing
Figure: Null and Estimated Sampling Distributions. Density (vertical axis, 0 to 30) against Bush Proportion (horizontal axis, 0.44 to 0.56); legend: Null Dist., Sampling Dist.

The 1994 State Failure Task Force
State Failure:
- revolutionary wars
- genocide or politicide
- adverse or disruptive regime transition

In addition to information on state failures, the task force collected data on more than one thousand variables for 195 countries between 1955 and 1998.

Two Possible Questions:
1 Using these data, can we predict state failure?
2 Using these data, can we explain why states fail?

The Prediction Problem

Figure: Scatterplot of Population Density (lagged 2 yrs) (vertical axis, 0.0 to 3.0) against Infant Mortality (lagged 2 yrs) (horizontal axis, 1 to 6); points plotted as the letters S and F.

The Training Set

Figure: Scatterplot of Population Density (lagged 2 yrs) (vertical axis, 0.0 to 3.0) against Infant Mortality (lagged 2 yrs) (horizontal axis, 1 to 6), showing only the training-set points, plotted as the letters S and F.

The Validation Set

Figure: Scatterplot of Population Density (lagged 2 yrs) (vertical axis, 0.0 to 3.0) against Infant Mortality (lagged 2 yrs) (horizontal axis, 1 to 6); points plotted as the letters S and F.

Explaining State Failure
Does high infant mortality explain state failure?
Does high population density explain state failure?
Is classification enough?

Figure: Scatterplot of Population Density (lagged 2 yrs) (vertical axis, 0.0 to 3.0) against Infant Mortality (lagged 2 yrs) (horizontal axis, 1 to 6); points plotted as the letters S and F.

Figure: Scatterplot of Population Density (lagged 2 yrs) (vertical axis, 0.0 to 3.0) against Infant Mortality (lagged 2 yrs) (horizontal axis, 1 to 6); points plotted as 0 and 1.

The Role of Models
Speaking of functional form...

This is primarily a course on linear regression, and therefore, we will usually assume a linear relationship.

Q: Is linearity a reasonable modeling assumption?
A1: In some cases, the relationship may be close enough to linear that we will get reliable answers.
A2: We may be able to tell when the linear model is inadequate.
A3: We may be able to make small changes in order to fix things up.

We’ll be making lots of assumptions, and some of them can’t be tested with data!

Mice and Tigers
“Since all models are wrong the scientist cannot obtain a ‘correct’ one by excessive elaboration. ... Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity. Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad.” – George E. P. Box, 1976

“All models are wrong. Some are useful.” – George E. P. Box

Summary

- In this course, we will focus on estimation (point and interval) and testing.
- Predictive and explanatory models have different goals, and often utilize different statistical techniques. In this course, we focus on explanation.
- We will not get the model “right”, but we may be close enough to say something useful.
- In this course, we will learn when the linear model is (and is not) useful.