$y = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
The statistical approximation
The probability model – and everything else in the world
Data Generating Process
Think of simply rolling dice. What factors might influence the
outcome?
◦ Symmetry of the dice; starting orientation; direction of throw; force of throw;
shape of the surface; spin; coefficient of friction between dice and surface; age of
dice (shape of edges and corners); air movement; temperature; etc.
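A minimal sketch of the idea, assuming a fair six-sided die: rather than measuring every physical factor listed above, the probability model folds all of them into chance. The function names, seed, and number of rolls are illustrative choices, not part of the original slides.

```python
import random
from collections import Counter


def roll_die(rng):
    """One roll of a fair die; every unmeasured physical factor is treated as chance."""
    return rng.randint(1, 6)


def simulate_rolls(n_rolls, seed=0):
    """Return the relative frequency of each face over n_rolls simulated throws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter(roll_die(rng) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


freqs = simulate_rolls(60_000)
for face, share in freqs.items():
    print(face, round(share, 3))
```

Each face should come up in roughly one sixth of the rolls, even though no single roll can be predicted.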
Definitions
◦ Population: the entire set of your unit of analysis that you wish to draw
conclusions about.
◦ Sample: a subset of units in the population of interest.
◦ Sampling Frame: the population from which the sample is actually drawn.
Sampling and Randomization
How might a sampling frame differ from a population?
◦ EX: With a phone survey, the sampling frame would be all citizens living in
households with a phone
◦ How would this change if you used the internet instead?
◦ What about asking people on the street?
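The gap between sampling frame and population can be sketched with a small simulation. All numbers here are hypothetical assumptions chosen for illustration: 70% of the population has a phone, phone owners participate in politics at a rate of 60%, and everyone else at 30%.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population of (has_phone, participates) pairs.
population = []
for _ in range(100_000):
    has_phone = rng.random() < 0.70
    participates = rng.random() < (0.60 if has_phone else 0.30)
    population.append((has_phone, participates))


def share_participating(units):
    """Share of units that participate."""
    return sum(participates for _, participates in units) / len(units)


# The sampling frame of a phone survey: only units with a phone.
frame = [unit for unit in population if unit[0]]

sample_from_frame = rng.sample(frame, 2_000)
sample_from_population = rng.sample(population, 2_000)

print("population:", round(share_participating(population), 3))
print("frame:", round(share_participating(frame), 3))
print("sample drawn from frame:", round(share_participating(sample_from_frame), 3))
print("sample drawn from population:", round(share_participating(sample_from_population), 3))
```

Under these assumptions the sample drawn from the frame describes phone owners well but overstates participation in the full population, however large the sample is.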
[Figure: level of interest by country, in percent (0 to 100). Countries: FR, CH, GB, IE, AT, NO, NL, FI, BE, CY, EE, SI, PL, IT, RS, BG, HU, CZ. Response categories: Not at all interested, Hardly interested, Quite interested, Very interested.]
Data Collection: ESS Example
Operationalization: Data Collection
Equivalence?
◦ Cross-national/cultural congruency of concepts
◦ When/where can generalizations be drawn?
◦ Risk of imposing our own understandings, i.e. our cultural values,
through the questions we ask
Levels of analysis: fallacies
◦ Who are the units of analysis?
Data Collection: Aggregate Data
◦ [Aggregate] Observation
◦ Political Participation
◦ GDELT monitors print, broadcast, and
web news media in over 100 languages
from across every country in the world to
keep continually updated on breaking
developments anywhere on the planet.
◦ Electoral data for turnout
◦ What does this tell us about individuals?
◦ Ecological Fallacy
Fallacies: Consistency in the data
Ecological Fallacy:
◦ EX1: A county is 60% purple and 40% green. The mayor wins
with 60% of the vote.
◦ What can we conclude about individual purple people or greens?
◦ EX2: If you have regional level education data and regional
level election turnout data, which of these hypotheses can
you use?
◦ Turnout is higher in regions with more educated citizens.
◦ More educated citizens are more likely to vote.
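For EX1, the aggregate figures only bound what individuals did. The sketch below computes those bounds under two assumptions that are not in the slides: everyone votes, and the mayor's 60% is a share of all residents. The function name `support_bounds` is illustrative.

```python
from fractions import Fraction


def support_bounds(group_share, candidate_share):
    """Lowest and highest share of a group that could have backed the candidate."""
    other_share = 1 - group_share
    # Lowest: the other group backs the candidate as fully as possible.
    lowest = max(Fraction(0), (candidate_share - other_share) / group_share)
    # Highest: the candidate's votes come from this group as far as possible.
    highest = min(Fraction(1), candidate_share / group_share)
    return lowest, highest


purple_share = Fraction(60, 100)  # 60% of the county is purple
mayor_share = Fraction(60, 100)   # the mayor wins 60% of the vote

purple_bounds = support_bounds(purple_share, mayor_share)
green_bounds = support_bounds(1 - purple_share, mayor_share)

print("purple support for the mayor lies between", purple_bounds)
print("green support for the mayor lies between", green_bounds)
```

Under these assumptions, anywhere from one third to all of the purple residents may have voted for the mayor, and green support may be anywhere from 0% to 100%. The matching 60% figures say very little about any individual purple or green voter.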
Fallacies
Deductive fallacy: Individual conclusions based on group
properties.
◦ “Liberals support this issue, so Joe, a liberal, supports this issue.”
Inductive fallacy: Group conclusions based on individual
evidence.
◦ Individualistic fallacy: “Joe, a liberal, likes it, so Liberals like it”
◦ In other words, one observation serves as the basis for generalization: “This swan is
white, so all swans must be white.”
Data Collection: Big Data Revolution (?)
Explosion of data availability, largely due to the growth of the internet.
Big Data: Huge amounts of available data and very large datasets
◦ Billions of data points on thousands of variables and units of analysis.
◦ Google answers 100 billion search queries each month (Sullivan 2012).