Poli 30: Political Inquiry
Fall Quarter, 2012
Review
Correlation & Causation
Correlation is a relationship between two (or more) variables
Correlation does not imply causation
Requirements for establishing causality
Temporal ordering
Correlation
Mechanism
Confounds ruled out
Research Design & Hypotheses
Tests are designed in the face of constraints (e.g., data, time,
funding) and with different goals
Internal Validity vs. External Validity
Designs & validity
Controlled observational studies (- internal validity, + external validity)
Natural experiments (internal validity depends on selection; + external validity)
Randomized/lab experiments (+ internal validity, - external validity)
Hypotheses
Specific and measurable implications of theories
Variable Types & Measurement
Operation                        Nominal  Ordinal  Interval  Ratio
Count                            Yes      Yes      Yes       Yes
Calculate median & percentiles   No       Yes      Yes       Yes
Add or subtract                  No       No       Yes       Yes
Calculate mean & std. deviation  No       No       Yes       Yes
Calculate ratio                  No       No       No        Yes
Measurement error stems from differences between conceptual and
operational definitions
Systematic error: a chronic and consistent distortion in observations;
leads to bias in analysis
Random error: inconsistent distortions in observations; increases
noise but does not bias inferences
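The distinction can be illustrated with a quick simulation (a sketch with made-up numbers: a true value of 10, a constant bias of +2 for the systematic case, and pure noise for the random case):

```python
import random

random.seed(0)  # reproducible
true_value = 10.0

# Systematic error: every observation is shifted by a constant bias (+2 here),
# so the average is wrong no matter how much data we collect.
systematic = [true_value + 2.0 + random.gauss(0, 0.1) for _ in range(1000)]

# Random error: observations are noisy but unbiased,
# so the average stays close to the true value.
random_err = [true_value + random.gauss(0, 2.0) for _ in range(1000)]

mean_sys = sum(systematic) / len(systematic)  # near 12: biased
mean_rnd = sum(random_err) / len(random_err)  # near 10: just noisier
```

More data shrinks the random-error problem but never fixes the systematic one.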
Visualizations
Stem & Leaf Plots
Histograms
Bar Charts
Calculating a Confidence Interval
Confidence intervals contain a range of values based on a
sample that are likely to contain the actual population
parameter
CI for a mean: x̄ ± 1.96 × (s / √n)
CI for a proportion: p̂ ± 1.96 × √(p̂(1 − p̂) / n)
See Week 7 slides for more.
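The calculation can be wrapped in two short functions (a sketch; the inputs in the example, x̄ = 50, s = 10, n = 100, are made up):

```python
import math

def ci_mean(xbar, s, n, z=1.96):
    """95% confidence interval for a mean: xbar +/- z * (s / sqrt(n))."""
    se = s / math.sqrt(n)
    return (xbar - z * se, xbar + z * se)

def ci_proportion(p_hat, n, z=1.96):
    """95% confidence interval for a proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - z * se, p_hat + z * se)

low, high = ci_mean(50.0, 10.0, 100)  # SE = 1, so 50 +/- 1.96
```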
T-Scores vs. Z-Scores
Z of 1.96 gives us 95% confidence with a two-tailed test
This means each tail contains 2.5% of data, given a normal
distribution
Use a t-score for small samples when comparing means
As sample size increases, t approaches z
At 100 or more observations, you can replace t with z
To find the correct value of t, calculate degrees of freedom
DF = Sample Size - 1
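The convergence of t toward z can be checked directly with scipy's quantile functions (an assumption: scipy is available):

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)            # two-tailed 95%: about 1.96
t_small = t.ppf(0.975, df=10 - 1)   # n = 10: about 2.26, noticeably wider
t_large = t.ppf(0.975, df=100 - 1)  # n = 100: about 1.98, close to z
```

At 100 observations the gap between t and z is already under 0.03, which is why the slides allow swapping in z.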
Z Table: note that this particular table gives the area in the body of the
distribution, rather than the tail. To get the tail, simply subtract this
value from 1.
T Table
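The body/tail relationship from the Z-table note can be verified with the normal CDF, built here from the standard-library error function:

```python
import math

def z_body(z):
    """Area under the standard normal curve below z (the 'body' in the table)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

body = z_body(1.96)  # about 0.975
tail = 1 - body      # about 0.025: the 2.5% in each tail at 95% confidence
```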
Hypothesis Testing
One-sample proportion test (IV and DV are nominal)
Difference in proportions (IV and DV are nominal)
Difference in means (IV binary, DV interval)
Chi-square (IV and DV are categorical)
Regression framework (IV and DV are interval)
Steps
Identify H0 and H1
Choose test
Calculate key value (compare to z, χ², or t)
Interpret
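The four steps can be walked through for a difference-in-proportions test (made-up numbers: 55% support in one group of 400, 48% in another):

```python
import math

# Step 1: H0 is p1 = p2; H1 is p1 != p2
p1, n1 = 0.55, 400
p2, n2 = 0.48, 400

# Steps 2-3: difference-in-proportions z test with a pooled standard error
p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Step 4: interpret by comparing |z| to 1.96 (95%, two-tailed)
reject_h0 = abs(z) > 1.96
```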
Crosstabs, Summary Tables & Controlled Comparisons
Cross-tabulation
Simple test where IV and DV are nominal or ordinal
Summary Table
IV is nominal or ordinal, DV is interval or ratio
Controlled Comparison
Create crosstabs based on different levels of confound
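With pandas (an assumption that it is available), a controlled comparison is just one crosstab per level of the confound; the tiny dataset here is invented for illustration:

```python
import pandas as pd

# Hypothetical data: vote choice (DV), income (IV), region (possible confound)
df = pd.DataFrame({
    "income": ["low", "low", "high", "high", "low", "high", "low", "high"],
    "vote":   ["A",   "A",   "B",    "B",    "A",   "B",    "B",   "A"],
    "region": ["N",   "N",   "N",    "N",    "S",   "S",    "S",   "S"],
})

# Simple crosstab: IV against DV
simple = pd.crosstab(df["income"], df["vote"])

# Controlled comparison: one crosstab for each level of the confound
controlled = {region: pd.crosstab(grp["income"], grp["vote"])
              for region, grp in df.groupby("region")}
```

If the income-vote pattern vanishes within each region, the original relationship is spurious; if it survives, region is not doing the work.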
Causal Relationships
Spurious
IV's effect on DV disappears when controlling for confound
Additive
IV's effect on DV remains stable when adding confound into
analysis
Interactive
IV's effect on DV increases or decreases (but remains significant)
when confound is included
Regression Analysis (I)
Y = a + b1·X1 + b2·X2 + e
Provides a way to analyze the effect of your IV on your DV,
especially when controlling for one or more additional IVs
In a typical regression table, coefficients (b), standard errors
(SE), t-values (t), and an indication of significance (P>|t|) are
given
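The coefficient b at the heart of the table can be computed directly; a minimal numpy sketch with invented data:

```python
import numpy as np

# Hypothetical data: one IV (x) and one DV (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# OLS slope: b = cov(x, y) / var(x); intercept from the means
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
# A 1-unit increase in x is associated with a b-unit change in y
```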
Regression Analysis (II)
Functional link between DV and IVs varies based on the type
of DV
Nominal DV: Probit or Logit/Logistic Regression
Ordinal DV: Ordered Probit/Logit OR Multinomial Probit/Logit
Interval/Ratio DV: Linear Regression
When in doubt, ASK
Regression Analysis (II)
Interpreting coefficients
For each of these, sign indicates the same directional
relationship, and determining significance is done in the same
way
Further interpretation
Linear regression: b is the slope, the change in Y attributed
to X
i.e., a 1-unit increase in X results in a b-unit change in Y
Other forms of regression: this is a huge mess, so don't worry
about it unless you want to take years and years of econometrics
Regression Analysis (III)
Determining significance of a coefficient
Compare t to relevant t/Z score for 95% significance
Other levels of significance are acceptable (e.g., 90%, 99%)
If presented with P>|t|, simply compare to 0.05 (1 − 0.95)
For other levels of significance, this might be 0.1 or 0.01
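The comparison is mechanical enough to write down; the two p-values below are taken from the regression tables that follow:

```python
def significant(p_value, level=0.95):
    """Reject H0 when the reported p-value is below 1 - level."""
    return p_value < (1 - level)

# P>|t| = 0.160 for Total Revenue in the linear regression: not significant
# P>|z| = 0.019 for Productivity in the logit: significant at 95%, not at 99%
```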
Regression Analysis (IV)
. regress exports revt emp ppent, robust cluster(naics)

Linear regression                               Number of obs =     6677
                                                F(3, 843)     =     1.44
                                                Prob > F      =   0.2285
                                                R-squared     =   0.0159
                                                Root MSE      =   312.23

                          (Std. Err. adjusted for 844 clusters in naics)
------------------------------------------------------------------------------
             |               Robust
     exports |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Total Revenue|   .0029434   .0020906     1.41   0.160      -.00116    .0070467
   Employees |   .2512895   .4425897     0.57   0.570    -.6174177    1.119997
Phys. Capital|  -.0009527   .0034821    -0.27   0.784    -.0077873    .0058819
   Intercept |    6.60017   4.801907     1.37   0.170    -2.824926    16.02527
------------------------------------------------------------------------------
Regression Analysis (V)
. logit procafta atfp tsale exporter, robust cluster(naics)

Iteration 0:  log pseudolikelihood = -111.17378
Iteration 1:  log pseudolikelihood = -107.81316
Iteration 2:  log pseudolikelihood = -100.04385
Iteration 3:  log pseudolikelihood = -99.262411
Iteration 4:  log pseudolikelihood = -99.244544
Iteration 5:  log pseudolikelihood = -99.244511

Logistic regression                             Number of obs =     6138
                                                Wald chi2(3)  =    21.68
                                                Prob > chi2   =   0.0001
Log pseudolikelihood = -99.244511               Pseudo R2     =   0.1073

                          (Std. Err. adjusted for 832 clusters in naics)
------------------------------------------------------------------------------
             |               Robust
    procafta |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Productivity |   .3844836    .164192     2.34   0.019     .0626732     .706294
 Total Sales |   .0188455   .0050635     3.72   0.000     .0089213    .0287697
    Exporter |   1.403025   .7771746     1.81   0.071    -.1202088     2.92626
   Intercept |  -7.956071    .753366   -10.56   0.000    -9.432642   -6.479501
------------------------------------------------------------------------------