PLSC 506: Measurement, Estimation and Inference (With Applications To Text Data)
Spring 2017
Course Description
This course covers a wide array of methodologies that aim to improve the quality of measurement,
estimation, and inference, particularly in light of challenges that emerge in the analysis of text
data. Though the topics are broadly applicable to political science research, the course will draw
most of its examples from text-analytic problems and will focus closely on methods for studying
text statistically. Topics will include measurement, reliability and error, text and web scraping,
supervised and unsupervised learning, Bayesian inference, cluster and topic modeling, ideal point
scaling, and some advanced topics in statistical inference. The aim of the course is to provide
students with a host of practical tools that can be used to evaluate and replicate other research, as
well as to help students address methodological issues arising in their own work. Prerequisites for
the course include PLSC 500a, 503b, and 504a or equivalent.
Course Requirements
Final grades will be based on a series of homework assignments (30% of final grade), presentations
(20% of final grade), a term paper (40% of final grade), and course participation (10% of final
grade). Collaboration on the final paper is encouraged, but students may not coauthor with more
than two other students.
In addition to the weekly course readings, the following books are recommended for consultation
or purchase:
Programming in R:
• Krause, Andreas and Melvin Olson (2005), The Basics of S-PLUS, New York: Springer.
• Venables, W. N. and Brian D. Ripley (2003), Modern Applied Statistics with S, New York:
Springer-Verlag.
• Kruschke, John K. (2011), Doing Bayesian Data Analysis: A Tutorial with R and BUGS,
New York: Elsevier.
Bayesian Inference and Machine Learning:
• Bishop, Christopher (2006), Pattern Recognition and Machine Learning, New York: Springer.
• Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2009), The Elements of Statistical
Learning: Data Mining, Inference, and Prediction, 2nd edition, New York: Springer.
• Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin (2004), Bayesian Data
Analysis, Boca Raton, Florida: Chapman and Hall/CRC. [Third edition is also good]
• Jackman, Simon (2009), Bayesian Analysis for the Social Sciences, London: Wiley.
• Aggarwal, Charu C. and ChengXiang Zhai (2012), Mining Text Data, New York: Springer.
• Manning, Christopher D., Prabhakar Raghavan and Hinrich Schütze (2008), Introduction to
Information Retrieval, Cambridge: Cambridge University Press, available online at
http://nlp.stanford.edu/IR-book/information-retrieval-book.html
Academic Dishonesty
It is your responsibility to be familiar with and to follow the university policy on academic
dishonesty. (See http://catalog.yale.edu/handbook-instructors-undergraduates-yale-college/teaching/academic-dishonesty/
and http://gsas.yale.edu/academic-professional-development/professional-ethics-regulations.)
Any student caught plagiarizing or engaging in other academic dishonesty will receive an F in the
course and will be reported to the Dean’s Office for further sanction.
Course Schedule
Week 1
Week 2
- Monroe and Schrodt, “Introduction to the Special Issue: The Statistical Analysis of Political
Text” at http://pan.oxfordjournals.org/content/16/4/351.full.pdf#page=1&view=FitH
- Jackman, “Data from the Web Into R” at
http://www.nyu.edu/projects/spirling/documents/tpm.pdf
Suggested
Week 3
- Feinerer, “Introduction to the tm Package: Text Mining in R” at
http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf
- Denny and Spirling, “Assessing the Consequences of Text Preprocessing Decisions” at
http://www.nyu.edu/projects/spirling/documents/preprocessing.pdf
Suggested
Week 4
Suggested
- Turney and Pantel, “From Frequency to Meaning: Vector Space Models of Semantics”
at http://www.jair.org/media/2934/live-2934-4846-jair.pdf
- Classical Multidimensional Scaling:
http://www.stat.pitt.edu/sungkyu/course/2221Fall13/lec8_mds_combined.pdf
Week 5
- Monroe, Colaresi, and Quinn, “Fightin’ Words: Lexical Feature Selection and Evaluation
for Identifying the Content of Political Conflict” at
http://pan.oxfordjournals.org/content/16/4/372.full.pdf#page=1&view=FitH
- Manning, Raghavan and Schütze, Ch. 13 (especially 13.2 – 13.5) in Introduction to
Information Retrieval
Suggested
Week 6
- Slapin and Proksch, “A Scaling Model for Estimating Time-Series Party Positions from
Texts” at http://www.wordfish.org/uploads/1/2/9/8/12985397/slapin_proksch_ajps_2008.pdf
- Beauchamp, “Using Text to Scale Legislatures with Uninformative Voting” at
http://nickbeauchamp.com/work/Beauchamp_scaling_current.pdf
- Lowe, “Understanding Wordscores” at http://faculty.washington.edu/jwilker/tft/Lowe.pdf
- Bafumi, Gelman, Park and Kaplan, “Practical Issues in Implementing and Understanding
Bayesian Ideal Point Estimation” at
http://www.stat.columbia.edu/~gelman/research/published/171.pdf
Suggested
- Clinton, Jackman, and Rivers, “The Statistical Analysis of Roll Call Data” at
https://my.vanderbilt.edu/joshclinton/files/2011/10/CJR_APSR2004.pdf
- Martin and Quinn, “Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for
the U.S. Supreme Court, 1953-1999” at http://mqscores.wustl.edu/media/pa02.pdf
- Jackman, “Bayesian Analysis for Political Research” at
http://www.annualreviews.org/doi/abs/10.1146/annurev.polisci.7.012003.104706
Week 7
- Gelman, Carlin, Stern, and Rubin, Chs. 1 – 3, 5 in Bayesian Data Analysis [pay closer
attention to Chs. 3 & 5]
- Jackman, “Introduction” starting on page xxvii, Chs. 1 and 2, in Bayesian Analysis
for the Social Sciences
Suggested
- Jackman, Chs. 3 – 5 in Bayesian Analysis for the Social Sciences [more comprehensive
treatment of posterior sampling approximations]
- Gelman, Carlin, Stern, and Rubin, Chs. 4, 6 & 8 in Bayesian Data Analysis
- Assorted Materials/Tutorials on the JAGS/BUGS Model Environment
• http://www.uvm.edu/~bbeckage/Teaching/DataAnalysis/Manuals/manual.jags.pdf
• http://people.math.aau.dk/~kkb/Undervisning/Bayes14/sorenh/docs/JAGS-intro-slides.pdf
• http://files.meetup.com/1406240/Probabalistic%20Programming%20-%20JMW.pdf
• http://blog.nus.edu.sg/alexcook/files/2010/06/w6_handout.pdf
- Implementing JAGS/BUGS in R
• http://www.johnmyleswhite.com/notebook/2010/08/20/using-jags-in-r-with-the-rjags-p
• http://cran.r-project.org/web/packages/rjags/rjags.pdf
• http://cran.r-project.org/web/packages/R2WinBUGS/R2WinBUGS.pdf
• http://voteview.com/bayes_beach/R2WinBUGS.pdf
Week 8
- Tan, Steinbach, and Kumar, “Cluster Analysis: Basic Concepts and Algorithms” at
http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf
- Blei, Ng, and Jordan, “Latent Dirichlet Allocation” at
http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
- Quinn, Monroe, Colaresi, Crespin, and Radev, “How to Analyze Political Attention
with Minimal Assumptions and Costs” at
http://onlinelibrary.wiley.com/doi/10.1111/j.1540-5907.2009.00427.x/abstract
- Grimmer, “A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed
Agendas in Senate Press Releases” at http://www.stanford.edu/~jgrimmer/ExpAgendaFinal.pdf
- Blaydes and Grimmer, “Political Cultures: Exploring the Long-Run Determinants of Values
Transmission” at https://ncgg.princeton.edu/IPES/2013/papers/S930_rm3.pdf
Suggested
- Hastie, Tibshirani, and Friedman, Ch. 14.3: “Cluster Analysis” in The Elements of
Statistical Learning
- Blei and McAuliffe, “Supervised Topic Models” at
https://www.cs.princeton.edu/~blei/papers/BleiMcAuliffe2007.pdf
- Implementing LDA in R
• http://cran.r-project.org/web/packages/lda/lda.pdf
• http://obphio.us/pdfs/lda_tutorial.pdf [nice tutorial on the LDA algorithm]
• https://github.com/cjrd/SimpleLDA-R [see here for R implementation of a slow
version of LDA]
Week 9 & 10
Week 11
Suggested
Week 12
April 5: Unsupervised Learning III: Structural Ideal Points and Topic Models
- Kim, Londregan and Ratkovic, “Estimating Spatial Preferences from Votes and Text”
at http://web.mit.edu/insong/www/pdf/sfa_pa.pdf
- Gerrish and Blei, “The Ideal Point Topic Model” at
https://people.cs.umass.edu/~wallach/workshops/nips2010css/papers/gerrish.pdf
- Roberts et al., “Structural Topic Models for Open-Ended Surveys” at
http://scholar.harvard.edu/dtingley/files/topicmodelsopenendedexperiments.pdf
Suggested
- Gerrish and Blei, “The Issue-Adjusted Ideal Point Model” at https://arxiv.org/pdf/1209.6004.pdf
- Gerrish, “Applications of Latent Variable Models in Modeling Influence and Decision
Making” at http://www.seangerrish.com/data/thesis.pdf
- Fox, “Multilevel IRT Modeling in Practice with the Package mlirt” at
https://www.jstatsoft.org/article/view/v020i05/v20i05.pdf
- STM in R: https://cran.r-project.org/web/packages/stm/stm.pdf
Week 13
April 12: Supervised Learning II: Lasso and Ridge Regression, Ensemble Learners,
Cross-Validation
Suggested
Week 14
Week 15