Big Data in Political Economy

This document introduces the concept of "big data" in political economy research. It notes that the massive growth in computing and data collection since the 1980s has allowed researchers to study questions empirically that were previously only theoretical. Big data allows analysis at both a high level of disaggregation and high frequency. The document highlights examples of new data sources in political economy research and identifies key aspects that make big data transformative, including the ability to link large datasets, extract data directly from web pages, increased computing capacity, and private sector data provision.

Introduction: Big Data in Political Economy

Author(s): Atif Mian and Howard Rosenthal


Source: RSF: The Russell Sage Foundation Journal of the Social Sciences, Vol. 2, No. 7, Big
Data in Political Economy (November 2016), pp. 1-10
Published by: Russell Sage Foundation
Stable URL: [Link]
Accessed: 05-02-2020 17:40 UTC

This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs
3.0 Unported License (CC BY-NC-ND 3.0). To view a copy of this license, visit
[Link]

Russell Sage Foundation is collaborating with JSTOR to digitize, preserve and extend access to
RSF: The Russell Sage Foundation Journal of the Social Sciences

This content downloaded from [Link] on Wed, 05 Feb 2020 [Link] UTC
All use subject to [Link]
Atif Mian is Theodore A. Wells ’29 Professor of Economics and Public Affairs at Princeton University. Howard Rosenthal is professor of politics at New York University and Roger Williams Straus Professor of Social Sciences, Emeritus, at Princeton University.

Direct correspondence to: Atif Mian at atif@[Link], 26 Prospect Ave., Princeton, NJ 08540; and Howard Rosenthal at hr31@[Link], NYU Department of Politics, 19 W. 4th St., New York, NY 10012.

The massive growth in computing since the 1980s and 1990s has revolutionized data gathering and how people transact with one another. The result is that practically every economic and financial transaction is recorded somewhere by someone and can be linked to the individuals undertaking the transaction. Such proliferation of “big data” has made it possible for both economists and political scientists to empirically analyze questions that earlier could be addressed only theoretically. In particular, big data permits us to study behavior at both a high level of disaggregation and a high time frequency. For example, what is a household’s spending behavior and how does it depend on changes in interest rates, asset prices, or political events? How do households form expectations of future events? How do ideology and electoral politics affect these expectations? What are the distributional consequences of macro shocks, such as the impact of monetary policy or housing collapse on the rich versus the poor? These are fundamental economic and political questions that can now be addressed using advancements in data collection and computing.

BIG DATA: WHAT IS NEW AND DISTINCTIVE

There are numerous examples of research using new, disaggregated data sources, several appearing in this issue. These include data on mortgage originators (Igan, this issue); national data on individual voter registration and turnout (Catalist); data on the characteristics of individual professionals such as medical doctors (Bonica, Rothman, and Rosenthal 2014, 2015) or lawyers (Bonica, Chilton, and Sen 2015); government payments to contractors; Medicare payments to physicians; pharmaceutical company payments to physicians; campaign contributions (Bonica, this issue; Dimmery and Peterson, this issue); lobbying (Igan, this issue); tariffs (Kim 2014); traditional and social media content; government documents (O’Halloran et al., this issue); Google searches (Chae et al. 2015); and Twitter followers (Barberá 2015).

Of course, for big data to be seen as transforming research in political economy, it must be more than just the analysis of data sets with very large numbers of observations. Researchers have been exploiting the census for decades. Similarly, the pathbreaking research of Thomas Piketty and Emmanuel Saez (2003), using individual IRS records, dates from the turn of the century. In the 1980s, Keith Poole and Howard Rosenthal (1991) studied the entire congressional history of tens of millions of individual roll call voting decisions with a supercomputer. So what is distinctive about the current use of “big data” in political economy? At least the following considerations appear relevant:¹

1. A new ability to link large data sets that are of far more limited use if unlinked has emerged. For example, political activity in the form of lobbying can be linked to microlevel data on the firm’s business activity, such as mortgage lending behavior in metropolitan statistical areas (see Igan, this issue). Another example is political activity in the form of roll call voting by a member of Congress, which can now be linked not just to aggregate economic characteristics such as median income but more finely to characteristics of small geographic units in congressional districts, such as mortgage foreclosure activity in portions of the district that have a high level of Republican voting (Mian, Sufi, and Trebbi 2010).

An important aspect of record linkage is the development of automated record linkage through the use of algorithms that assign a probability that a record from one data set can be matched to another. Record linkage is also facilitated by geocoding techniques. Researchers are recognizing that matches must carry an acceptable level of measurement error but need not be perfect. For example, political activity in the form of campaign contributions can be linked to the professional and demographic characteristics of individuals in most licensed professions (medicine, law, nursing) or state government employment and in some cases to income data (state government employees, including academics and physicians in university hospitals). More recently, researchers have been able to link public records such as bankruptcy filings (for example, Dobbie and Song 2015) with Social Security data to address questions like the impact of debtor relief on earnings and labor supply. Atif Mian, Amir Sufi, and Nasim Khoshkhou (2016) link constituent ideology and voting outcomes with consumer spending at the county level and with individual survey data on consumer sentiments to analyze the link between consumer spending and sentiments about government policy.

2. A growing ability to extract data directly from web pages, using Python and other tools, has become an important source of additional data. For example, Matthew Gentzkow and Jesse Shapiro (2010) use textual analysis of online newspaper data to construct measures of “slant” in various newspapers.

3. The growth of computing capacity remains important. For example, Chris Tausanovitch (this issue) carries out an ideological scaling using hundreds of thousands of public opinion surveys. The scaling takes advantage of special software that uses graphics chips to turn PCs into parallel processors. Changes in estimation strategy are also likely to accompany the use of big data. For example, Kosuke Imai, James Lo, and Jonathan Olmsted (2015) have recently proposed using efficient expectation-maximization (EM) algorithms for ideological scaling to replace the widely used Markov chain Monte Carlo (MCMC) methods. Computing capacity and estimation strategy are likely to be particularly important in the growing area of text analysis, as illustrated by O’Halloran and her colleagues in this issue.

4. The private sector has become a large provider of big data of potential usefulness to political economists. Big data about financial markets have been available for many years, accessible to academics through Wharton Research Data Services (WRDS) and other sources. More recently, data have begun to be accumulated about career paths (LinkedIn) and about housing and consumer markets. The private sector both complements and substitutes for the government sector. For example, LinkedIn can provide data about workers in unlicensed professions that can complement the data in government databases about those in licensed professions.

The growth of online payment and personal finance tools has given researchers access to people’s spending and investing behavior. For example, Scott Baker (2014) uses data from individual accounts at a personal finance site to investigate how consumers respond to income shocks in the presence of debt. Mian, Rao, and Sufi (2013) use data on credit card spending to analyze the impact of the housing collapse on spending. Similar data have been used to analyze the impact of political shocks (as when the federal government approached the fiscal cliff or when it was threatened with shutdown) on consumer spending behavior as well.

A related feature of these data is that they are potentially available at high frequency, such as daily spending behavior. The high frequency enables researchers to exploit the sharp timing of certain events (such as the fall of Lehman Brothers in September 2008 or the attacks of September 11, 2001) to analyze the impact on consumer spending and investment behavior.

Credit bureaus, both in the United States and abroad, are another important private source of data. The credit bureaus contain data on all types of borrowing at the individual level at monthly frequency. These data also contain information on an individual’s location and basic demographics and are thus potentially linkable to other data sets. Mian and Sufi (2014) describe a number of examples of research studies using credit bureau data.

A number of private firms specialize in collecting and consolidating data from a large number of public data sources. For example, the Securities and Exchange Commission (SEC) requires publicly traded corporations to file a variety of reports, including information on trading by insiders and on large block holdings. Since 1993, this information has been available in electronic form on the SEC’s EDGAR platform. But the SEC has done little to summarize these reports in a way that would be useful to researchers. One cannot go to the SEC site and download a spreadsheet with the details of the largest owners of S&P 1500 companies. On the other hand, firms like Vickers Stock Research have such data in more accessible forms.

5. Government electronic record keeping has also expanded dramatically. About the time the SEC created EDGAR, for example, government agencies in the fifty states, such as education departments, were shifting to electronic, web-accessible data. Records that were previously accessible only as copied or scanned documents became available in spreadsheet form. A transition to transparency has accompanied the technological transition to electronic record keeping.

Data on government payments to most contractors have long been a matter of public record, but the provision to the public of information on government payments to health care providers, long resisted by the providers, did not become federal government policy until 2014. Similarly, disclosure of payments to providers by pharmaceutical companies was required by the Physician Payments Sunshine Act, a part of the Affordable Care Act passed in 2010.

There have also been important shifts in the availability of large individual-level data sets at various governmental organizations. For example, academics have worked with the Internal Revenue Service (IRS) on tax return data and the Social Security Administration (SSA) on payroll data. These data have been extremely useful in illuminating trends in inequality and social mobility. At the same time, the granularity of the data sets is useful in helping us better understand the impact of changes in tax laws and other public policy interventions. The U.S. Census Bureau also maintains data on sales and employees by firm.

6. At the same time, the development of optical character recognition (OCR) made it possible to process older data at relatively low cost. Ten years ago, analysis of roll call voting data was largely limited to the U.S. Congress. Boris Shor and Nolan McCarty (2011) have extended this work to all fifty states.

¹ The software firm SAS characterizes big data as having volume, velocity, variety, variability, and complexity. See SAS, “Big Data: What It Is and Why It Matters,” [Link]-[Link]?keyword=big%20data&matchtype=e&publisher=google&gclid=CjwKEAiAxfu1BRDF2cfnoPyB9jESJADF-MdJIJyvsnTWDXHchganXKpdoer1lb_DpSy6IW_pZUTE_hoCCwDw_wcB (accessed February 13, 2016). “Velocity” and “variability” refer to real-time applications, which are not yet present in political economy. The papers in this issue represent applications that have large volumes of data, data arising in different formats, and data with complex structures.

THE CHALLENGES OF BIG DATA

The use of big data does present some challenges for academic research. There are questions of data accuracy. There is a question of equal access to data. There is a question of the ethics of the relationship of academic researchers to private-sector collectors of data. Although the challenges we identify apply more generally to the social sciences, political economy faces some particularly intensive challenges: because political economy addresses the interplay between political transactions and market transactions, the need for market transaction data makes political economists heavily dependent on the private sector.

Data Accuracy

There are several potential problems with respect to data accuracy:

What Inferences Can Be Made from the Sampling Universe?

This question is particularly relevant for data from search engines and social media. Are individuals who search on Google representative of the larger population? Are heavy searchers representative of all searchers? Are Twitter users representative of the broader population? Some of the data in the Tausanovitch paper in this issue come from surveys conducted through the Internet. In longitudinal studies, how will these data match up with data collected in the 1950s through door-to-door interviews or with telephone interviews in the 1990s?

Campaign contributions, explored in the Bonica paper in this issue, allow us to study groups that are not reported in sample surveys. For example, medical doctors would represent only on the order of 1 percent of the respondents in a survey of 2,000 adult Americans. But 145,000 physicians have made campaign contributions, with an indication of partisan preference, over the past twenty years (Bonica et al. 2014, 2015). Those 145,000 physicians, in turn, can be broken down into still large samples by specialty, gender, and employer type. But are these 145,000 representative of the nearly 900,000 physicians in the United States?

Another larger source of partisan preference could come from voter registration data put together by Catalist. The Catalist data have also been used to study physician preferences about patient management (Hersh and Goldenberg 2016). Again, are physicians who are registered voters representative of physicians? Are former government employees with LinkedIn accounts representative of all former government employees?

A related big data development is represented by attempts to “bridge” different sampling universes by using common stimuli, for example, a legislative roll call vote on a bill and media editorials on the same bill. Jeffrey Lewis and Chris Tausanovitch (2015) survey this literature and discuss its promises and shortcomings.

Record linkage introduces inaccuracy. In the case of campaign contributions to candidates, the reports of individual candidates may be prepared by unpaid interns who lack strong incentives to be accurate. Even when the initial reports are filed accurately, reports across candidates can have a different name spelling and address for the same individual. Conversely, individuals with common last names can be confused. When the contributors are linked to another database, such as the National Provider Identifier (NPI) database that the government maintains for physicians, there is further opportunity for mismatch.

New Sources of Big Data May Contain Misrepresentation

Misrepresentation is hardly a new problem. For instance, the November Current Population Survey (CPS) has long been used to study voter turnout (Wolfinger and Rosenstone 1980), but turnout is substantially overreported in the CPS. Citizenship is also likely to be overreported (McCarty, Poole, and Rosenthal 2006). Income tax and estate tax returns are subject to fraud. Misrepresentation may be particularly important in loan markets (Griffin and Maturana 2013; Mian and Sufi 2015; Keys, Seru, and Vig 2012). Mian and Sufi (2015) show that income reporting on publicly available Home Mortgage Disclosure Act (HMDA) files was subject to large-scale overstatement by mortgage applicants during the mortgage credit boom of 2002 to 2006. The financial incentives of firms to misreport do, however, represent a new concern.
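The record linkage problem raised in this section, in which the same contributor appears under different name spellings and addresses while different people share a common last name, can be illustrated with a minimal Python sketch. It assumes that a crude string-similarity score can stand in for the match probability that automated linkage algorithms estimate. The record fields, the weights, and the 0.85 threshold are hypothetical choices for illustration, not values taken from any study cited here.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """String similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(rec_a, rec_b, name_weight=0.7, addr_weight=0.3):
    """Weighted name and address similarity (weights are illustrative assumptions)."""
    return (name_weight * similarity(rec_a["name"], rec_b["name"])
            + addr_weight * similarity(rec_a["address"], rec_b["address"]))


def is_probable_match(rec_a, rec_b, threshold=0.85):
    """Accept a link when the score clears a hypothetical threshold."""
    return match_score(rec_a, rec_b) >= threshold


# Hypothetical records: one donor reported two ways, plus a different person
# who shares a common last name.
filing_1 = {"name": "Smith, John A.", "address": "123 Main St."}
filing_2 = {"name": "SMITH JOHN A", "address": "123 Main Street"}
filing_3 = {"name": "Smith, Jane B.", "address": "9 Oak Ave."}

same_person = is_probable_match(filing_1, filing_2)
different_person = is_probable_match(filing_1, filing_3)
```

Lowering the threshold links more true matches but also confuses more people with similar names, which is the trade-off behind the remark that matches must carry an acceptable level of measurement error but need not be perfect.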


Data Access

As we previously indicated, much of the new big data is being generated by organizations, both for-profit (LinkedIn) and nonprofit (ProPublica), that charge fees for data access. When an academic researcher uses proprietary data, what are the conditions for replication? Should journals allow publication if the entire data set cannot be made available for replication and further study? When the data are purchased, the purchase agreement may exclude the posting of replication materials.²

Government agency rules regarding data access have not been sufficiently streamlined yet. There is a natural aversion by government agencies to “sharing” their internal data. The reluctance may be due to the fear of either lawsuits or scrutiny by outsiders of how the agency works. The latter excuse warrants greater transparency, as access might have the beneficial side effect of improving the functioning of some government agencies. Another source of reluctance is pressure from private interests. For example, until recently, such pressure kept Medicare payments to individual physicians from public scrutiny.

A related issue is the ability to link various government data sets, which raises a natural concern about privacy. Data are often anonymized before they are shared with researchers. Although this is a good practice to follow, anonymizing data makes it difficult to link them across different sources. It would be useful if the government came up with a mechanism to link the various data sets before anonymizing them so as to expand the scope of the research that could be conducted using governmental sources of data.

Along similar lines, there is also a need for the government to come up with uniform data access rules across its various agencies. Access to governmental data sometimes depends on who within the agency one knows and can collaborate with. As such, the playing field is not level when it comes to access to government-owned data.

Another question concerns funding: not all academic researchers have the resources to purchase the data in the first place. As government funding for research ebbs (the National Science Foundation (NSF) cut out political science for a period starting in 2013), researchers with large internal research funds, in either professional schools or elite universities, will have an advantage over others. It is also conceivable that private sources of data could grant differential access, essentially limiting access to those individuals whom an organization believes to be “safe.” Many private data contracts already give the right of refusal to the data provider in case the provider objects to the research findings.

These important questions regarding access and scientific bias need to be addressed carefully as more and more private data sources are used by academics.

The Ethics of Collaboration

An ethical issue arises when there is an academic collaboration with a for-profit generator of big data. The situation was highlighted by the Facebook deception study in 2014 (Albertson and Gadarian 2014). The study, which involved a researcher from Cornell University, had the “big data” advantage that it was possible to study the behavior of 700,000 individuals. The big data issue is that a private firm, such as Facebook, has proprietary interests and research objectives that can differ from those of a small, on-campus laboratory experiment monitored by a university’s institutional review board (IRB). In the case of the Facebook deception experiment, the Cornell IRB approved the study with the argument that it was Facebook, not Cornell, that practiced the deception. The study was published in the prestigious Proceedings of the National Academy of Sciences. Certainly some researchers would argue that Cornell and PNAS made bad choices. Debate is needed about the wider issue of conflicts of interest generated by the interaction of non-academic data providers and academic research.

² This issue arose with a recent publication (Lucca, Seru, and Trebbi 2014) that analyzed the revolving door. Although Francesco Trebbi, at a Harvard conference in 2013, orally stated that the data were from LinkedIn, the conference paper and the published version did not identify LinkedIn as the data source.


A SUMMARY OF THE PAPERS IN THIS ISSUE

Research in political economy is increasingly focused on the role of money expenditures, as against votes, in shaping the outcomes of elections and policy. These expenditures can take the form of lobbying or campaign expenditures. Three of the papers in this issue (by Adam Bonica, by Drew Dimmery and Andrew Peterson, and by Deniz Igan) focus on political expenditures.

The consequences of political expenditures have been debated in the academic literature (compare Levitt 1994 and Erikson and Palfrey 2000). It is easy to identify cases where massive expenditures came up empty. One example is Michael Huffington’s record-breaking personal expenditure of $28 million in his 1994 California Senate race against Dianne Feinstein. Another is Sheldon Adelson’s $140 million expenditure on the 2012 election, most of which went into Newt Gingrich’s attempt to be the Republican presidential nominee. Comcast’s attempt to acquire Time Warner Cable failed in 2015 despite massive lobbying and personal connections to the Obama administration. On the other hand, expenditure by Adelson and others is said to have forced a total alignment between the Republican congressional delegation in the United States and the Netanyahu government in Israel. Similarly, intense lobbying by hedge funds appears to have maintained the carried interest deduction in the 2012 tax bill.

We are, in terms of the research frontier, several steps away from tightly drawing the linkages between expenditures and the outcomes of elections or legislation. Research at this point, including the three papers on the subject in this issue, is more focused on the motivations and characteristics of the makers of political expenditures. At the individual level, what is the connection to income, wealth, and ideology? At the corporate level, what is the connection to firm characteristics, such as the propensity to take risks or to engage in fraud?

In his paper “Income, Ideology, and Representation,” Chris Tausanovitch stresses the low level of voter information about the policy stances of their representatives. The electorate’s awareness of where unelected candidates stand is quite arguably even lower. Tausanovitch also points to a very weak linkage between the policy preferences of voters in a constituency and the preferences of their representatives in Congress.

In “A Data-Driven Voter Guide for U.S. Elections,” Adam Bonica develops a platform for better informing voters about candidates. So the ambition of the Bonica paper is potentially important. The paper exploits government electronic records, computing capacity, the linkage of a variety of different data sources, and text analysis, four of the important big data facets outlined earlier.

The central innovation of the Bonica paper is the use of the big data present in hundreds of millions of campaign contribution records. Informing voters in the United States is inherently a big data problem because of the decentralized aspect of both campaign finance and the political system, which only weakly controls candidate entry. In parliamentary systems, where online voter guides are important, the informational problem largely reduces to presenting the platforms of one or two handfuls of national parties. In the United States, politics can be described in one-dimensional liberal-conservative terms (Poole and Rosenthal 2007), but placing candidates on this continuum is challenging. Most candidates in an election have not previously been elected to a legislature, either because they are new entrants or because they have never won a past election. So their positions cannot be estimated from the well-established methods of roll call vote scaling developed for Congress (Poole and Rosenthal 2007) or state legislatures (Shor and McCarty 2011). But candidates, not only in federal elections but also for state legislatures and elected positions in state courts, can be placed on a common scale using the information provided by campaign contributors. If an individual, for example, contributes to a candidate for a U.S. Senate seat, a state lower house seat, and a judicial contest, the individual’s contributions will provide information that glues together the continuum for two legislatures and a judicial body (Bonica 2013, 2014). More information is provided by candidates who, as is most often the case, are also contributors in other races.
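The way a single donor's gifts can "glue together" candidates in different arenas can be illustrated with a toy calculation. The Python sketch below is not the estimator used in the work cited above. It assumes a simple two-step, money-weighted averaging: donors inherit the mean position of the already-scaled incumbents they fund, and unscaled candidates inherit the mean position of their donors. All names, dollar amounts, and scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical roll-call-based positions for incumbents (negative = liberal).
known_scores = {"senator_A": -0.8, "senator_B": 0.7}

# Hypothetical contribution records: (donor, recipient, dollars).
contributions = [
    ("donor_1", "senator_A", 100),
    ("donor_1", "house_challenger_X", 100),
    ("donor_1", "judge_Y", 50),
    ("donor_2", "senator_B", 200),
    ("donor_2", "judge_Y", 50),
    ("donor_3", "senator_B", 100),
    ("donor_3", "house_challenger_Z", 100),
]


def weighted_means(triples):
    """Return {key: dollar-weighted mean score} from (key, score, dollars) triples."""
    total, weight = defaultdict(float), defaultdict(float)
    for key, score, dollars in triples:
        total[key] += score * dollars
        weight[key] += dollars
    return {key: total[key] / weight[key] for key in total}


# Step 1: place each donor at the weighted mean of the scaled incumbents they fund.
donor_scores = weighted_means(
    (donor, known_scores[cand], amt)
    for donor, cand, amt in contributions
    if cand in known_scores
)

# Step 2: place each unscaled candidate at the weighted mean of their donors.
candidate_scores = weighted_means(
    (cand, donor_scores[donor], amt)
    for donor, cand, amt in contributions
    if cand not in known_scores and donor in donor_scores
)
```

In this toy data the judicial candidate, funded equally by donors on both sides, lands near the middle of the scale, while each house challenger inherits the position of the single donor who bridges to a scaled senator.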


To provide context to the contribution data, example, the websites are likely to identify the
the platform also incorporates information officers of the association (see the websites of
from political text, election outcomes, and roll Planned Parenthood and Crossroads GPS, two
call scaling. Use of this additional information organizations mentioned in the paper), who in
allows the platform to provide voters with in- turn are very likely to have made individual po-
formation on candidate positions on specific litical contributions. Record linkage of this
issues. In practice, given the unidimensional- type can “out” the expenditures and ideology
ity of American politics, information on specific issues is attractive in presentation but marginal in terms of information value. Bonica nicely refers to this problem as the “curse of unidimensionality.”

Bonica’s paper emphasizes the importance of disclosure of campaign contributions and roll call votes in providing information to the public. Disclosure, however, is not always implicit in democracy. Roll call votes in the Italian parliament were secret until 1988 (Giannetti 2010). In the United States, political expenditures by nonprofits, specifically 501(c) organizations, are subject to minimal disclosure and have become increasingly important. Drew Dimmery and Andrew Peterson, in “Shining the Light on Dark Money,” take a big data approach to identifying the political activity and expenditure of 340,000 nonprofits. The paper crosswalks government electronic websites and information from the websites of the nonprofits. Dimmery and Peterson use automated techniques to identify the websites of nonprofits and then to scrape the websites of the organizations. They argue that the websites reveal more about these organizations than what the organizations report to the federal government or what has previously been gleaned by the Center for Responsive Politics. To ferret out political nonprofits, they match the larger set of nonprofits with a much smaller number of nonprofits whose names or IRS reports directly reveal them to be political organizations and with nearly 11,000 political action committees (PACs) that register with the Federal Election Commission (FEC). Nonprofits are deemed political when their websites use language similar to that used on the websites of known political organizations. The automated sources are validated by human evaluations that are crowdsourced.

The study is an important entry point to bring nondisclosing organizations into the disclosed world explored in the Bonica paper. For [...] of undisclosed nonprofits.

As we discussed earlier, a key challenge in the political economy literature is to draw a tighter connection between political expenditures and legislation or policy. One way to address this challenge is to focus on expenditure in specific industries and investigate the relationship between political spending and legislative impact. Deniz Igan takes this approach, with specific focus on the household credit, or mortgage, industry.

Focusing on the mortgage industry has some natural advantages, from both a political economy and a big data perspective. First, the financial industry is regulated in a number of different ways. The largest players in the mortgage industry, Freddie Mac and Fannie Mae, have heavy mandates from the government. There is thus a natural incentive for the private sector to try to influence the ways in which the industry is regulated. Second, large data are available for analysis, both for campaign contributions and for disbursements of mortgage credit. Igan describes a comprehensive data set on political influence exerted by financial institutions on Congress and links it to the mortgage lending activity of these institutions. She then describes the role of political influence in dictating financial regulation and credit disbursement during the U.S. credit boom of 1999 to 2006.

Results suggest that lobbying by financial institutions helped sway legislative decisions. Legislators who changed their vote in favor of deregulation under various bills were more likely to have been lobbied by the financial industry. At the same time, financial institutions that engaged in greater lobbying of the legislature were more likely to engage in risky lending behavior. For example, financial institutions that spent more on lobbying activity gave out loans with higher loan-to-income ratios, were more likely to securitize the loans, and had higher delinquency rates ex post.
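The kind of linkage Igan describes, joining lender-level lobbying records to loan records and then comparing lending outcomes, can be illustrated with a minimal sketch. Everything in it is hypothetical: the lender identifiers, field layout, and dollar amounts are toy placeholders, not data or results from the paper, and the paper's own analysis is far richer than a comparison of group means.

```python
from statistics import mean

# Hypothetical lender-level lobbying totals in dollars (toy values).
lobbying = {"lender_a": 2_000_000, "lender_b": 0, "lender_c": 750_000}

# Hypothetical loan records: (lender id, loan amount, borrower income).
loans = [
    ("lender_a", 400_000, 80_000),
    ("lender_a", 300_000, 75_000),
    ("lender_b", 200_000, 100_000),
    ("lender_c", 240_000, 60_000),
    ("lender_d", 150_000, 50_000),  # no lobbying record, so the join drops it
]


def mean_lti_by_lobbying(lobbying, loans):
    """Inner-join loans to lobbying totals, then average loan-to-income by group."""
    groups = {"lobbied": [], "not_lobbied": []}
    for lender, amount, income in loans:
        if lender not in lobbying:
            continue  # lenders that cannot be matched are excluded
        key = "lobbied" if lobbying[lender] > 0 else "not_lobbied"
        groups[key].append(amount / income)
    # Report only groups that contain at least one matched loan.
    return {name: mean(ratios) for name, ratios in groups.items() if ratios}


result = mean_lti_by_lobbying(lobbying, loans)
```

The inner join is a deliberate simplification: lenders that cannot be matched across the two sources simply drop out, which is one reason record linkage of this kind needs care in practice.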


By linking lobbying and campaign contribution data with actual voting and lending behavior, Igan presents evidence that suggests that lobbying by the financial sector influences legislators’ voting behavior. Moreover, the financial institutions that benefit the most from deregulation, such as subprime lenders, are more likely to devote greater resources to lobbying activity.

Another prominent topic in political economy is income inequality (see Piketty 2014, as well as the papers in the summer 2013 issue of the Journal of Economic Perspectives). Politics, in turn, can exacerbate income inequality if the political process overweights those with high incomes. Larry Bartels (2009) filed the opening claim that members of Congress represented the views of their rich constituents and largely ignored the views of poor ones. Bartels’s methodology and measurement have subsequently been challenged (Bhatti and Erikson 2011; Brunner, Ross, and Washington 2013).

Tausanovitch brings big data to this problem by making substantial increases in the number of respondents used in the analysis. He estimates an item response model for 362,000 respondents. The large sample size permits analysis of the U.S. House of Representatives, whereas the earlier studies were limited to the Senate. Doing so required developing special software that took advantage of graphics processing units in desktop computers. The paper innovates in a way that goes beyond increasing sample size. Whereas Bartels, and Bhatti and Erikson, used responses to a single survey item (five-point or seven-point ideological self-placements), Tausanovitch applies the item response model to policy questions. He can then measure ideology on a continuum and eliminate the granularity in the other measurements. For a similar policy question approach but with smaller samples, see Stephen Jessee (2012).

The bottom line in the results is that how the distribution of income in a district influences whether Democrats or Republicans represent the district is far more important than how differences in income affect within-party representation. Moreover, the mean overall preference of the district, which is likely to have less measurement error than either the mean preference of the poor or the mean preference of the rich, is a better predictor than the mean of either group.

One limitation of the Tausanovitch study is that income is top-coded so that the “rich” in the study are all respondents reporting an income over $100,000, hardly the infamous 1 percent (Edsall 2013). One could apply the big data capacity of the Bonica study to use contributor zip codes to compute a money-weighted average ideology of contributors in a district. This might be a better measurement of the opinion of the truly rich, and it could be run through the Tausanovitch analytics.

Rather than looking at contributions, roll call voting, public opinion, or cheap talk text, the paper by Sharyn O’Halloran, Sameer Maskey, Geraldine McAllister, David K. Park, and Kaiping Chen goes directly to a policy analysis of financial regulatory structure. A major objective, shared with the Bonica and Dimmery and Peterson papers, is to replace tedious hand-coding of volumes of text with automated procedures. And volumes there are: the paper ambitiously tackles all regulatory legislation since 1950. The analytical problem has worsened over time as legislation has become increasingly wordy. (Dodd-Frank alone has over 30,000 words.) The main topics of interest, classic in the political science literature, are regulatory delegation and procedural constraints. The work shows that traditional coding and automated coding are complementary.

The authors use their processing of text to test two hypotheses: (1) that there is more discretion when the president and Congress have similar preferences or there is more market uncertainty, and (2) that higher risk aversion leads to more regulation, but with more discretion.

To summarize the methodology, O’Halloran and her coauthors started by identifying the texts of all financial regulation laws, excluding those dealing with mortgage lending. The laws were then coded for delegation and procedural constraint. Both delegation and constraint were reduced to one-dimensional indexes, and discretion was measured as the product of the delegation index and one minus the constraint index. The analysis shows that discretion is least with a Democratic president and a Republican Congress.
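The discretion measure just described is a single product, and a short sketch makes the arithmetic concrete. The sketch assumes both indexes have been scaled to lie between 0 and 1, which the text does not state; the function name and the example values are illustrative only and are not taken from the paper.

```python
def discretion(delegation: float, constraint: float) -> float:
    """Discretion as the delegation index times one minus the constraint index.

    Assumes (hypothetically) that both indexes are scaled to [0, 1].
    """
    if not (0.0 <= delegation <= 1.0 and 0.0 <= constraint <= 1.0):
        raise ValueError("both indexes are assumed to lie in [0, 1]")
    return delegation * (1.0 - constraint)


# Illustrative values: moderate delegation with light procedural constraint.
example = discretion(delegation=0.5, constraint=0.25)  # 0.5 * 0.75 = 0.375
```

Under this scaling, discretion is zero whenever a law delegates nothing or constrains the agency completely, and it equals the delegation index when no procedural constraints apply.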
REFERENCES

Albertson, Bethany, and Shana Gadarian. 2014. “Was the Facebook Emotion Experiment Unethical?” Washington Post, July 1.

Baker, Scott. 2014. “Debt and the Consumption Response to Household Income Shocks.” April. [Link]_DebtConsumption.pdf (accessed May 31, 2016).

Barberá, Pablo. 2015. “Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data.” Political Analysis 23(1): 76–91.

Bartels, Larry M. 2009. Unequal Democracy: The Political Economy of the New Gilded Age. Princeton, N.J.: Princeton University Press.

Bhatti, Yosef, and Robert S. Erikson. 2011. “How Poorly Are the Poor Represented in the U.S. Senate?” In Who Gets Represented? edited by Peter K. Enns and Christopher Wlezien (New York: Russell Sage Foundation).

Bonica, Adam. 2013. “Ideology and Interests in the Political Marketplace.” American Journal of Political Science 57(2): 294–311.

Bonica, Adam. 2014. “Mapping the Ideological Marketplace.” American Journal of Political Science 58(2): 367–86.

Bonica, Adam. 2016. “A Data-Driven Voter Guide for U.S. Elections: Adapting Quantitative Measures of the Preferences and Priorities of Political Elites to Help Voters Learn About Candidates.” RSF: The Russell Sage Foundation Journal of the Social Sciences 2(7). doi: 10.7758/RSF.2016.2.7.02.

Bonica, Adam, Adam S. Chilton, and Maya Sen. 2015. “The Political Ideologies of American Lawyers.” Journal of Legal Analysis. First published online October 13. doi: 10.1093/jla/lav011.

Bonica, Adam, David J. Rothman, and Howard L. Rosenthal. 2014. “The Political Polarization of Physicians in the United States: An Analysis of Campaign Contributions to Federal Elections, 1991–2012.” JAMA Internal Medicine 174(8): 1308–17.

Bonica, Adam, David J. Rothman, and Howard L. Rosenthal. 2015. “The Political Alignment of U.S. Physicians: An Update Including Campaign Contributions to the Congressional Midterm Elections in 2014.” JAMA Internal Medicine 175(7): 1236–37.

Brunner, Eric, Stephen L. Ross, and Ebonya Washington. 2013. “Does Less Income Mean Less Representation?” American Economic Journal: Economic Policy 5(2): 53–76.

Chae, David H., Sean Clouston, Mark L. Hatzenbuehler, Michael R. Kramer, Hannah L. F. Cooper, Sacoby M. Wilson, Seth I. Stephens-Davidowitz, Robert S. Gold, and Bruce G. Link. 2015. “Association Between an Internet-Based Measure of Area Racism and Black Mortality.” PLOS ONE (April 24).

Dimmery, Drew, and Andrew Peterson. 2016. “Shining the Light on Dark Money: Political Spending by Nonprofits.” RSF: The Russell Sage Foundation Journal of the Social Sciences 2(7). doi: 10.7758/RSF.2016.2.7.04.

Dobbie, Will, and Jae Song. 2015. “Debt Relief and Debtor Outcomes: Measuring the Effects of Consumer Bankruptcy Protection.” American Economic Review 105(3): 1272–1311.

Edsall, Thomas B. 2013. “When Class Trumps Identity.” New York Times, October 29. [Link].[Link]/2013/10/29/opinion/edsall-when-[Link] (accessed May 23, 2016).

Erikson, Robert S., and Thomas R. Palfrey. 2000. “Equilibria in Campaign Spending Games: Theory and Data.” American Political Science Review 94(3): 595–609.

Gentzkow, Matthew, and Jesse M. Shapiro. 2010. “What Drives Media Slant? Evidence from U.S. Newspapers.” Econometrica 78(1): 35–71.

Giannetti, Daniela. 2010. “Secret Voting in the Italian Parliament.” Paper presented at the annual meeting of Rationalité et Sciences Sociales. College de France, Paris (June 3–4).

Griffin, John M., and Gonzalo Maturana. 2013. “Who Facilitated Misreporting in Securitized Loans?” Working paper. University of Texas, Austin.

Hersh, E. D., and M. Goldenberg. 2016. “Political Spillover Effects on Physician Clinical Practice.” Working paper. Yale University, New Haven, Conn.

Igan, Deniz. 2016. “Home Truths: Promises and Challenges in Linking Mortgages and Political Influence.” RSF: The Russell Sage Foundation Journal of the Social Sciences 2(7). doi: 10.7758/RSF.2016.2.7.05.

Imai, Kosuke, James Lo, and Jonathan Olmsted. 2015. “Fast Estimation of Ideal Points with Massive Data.” Working paper. Princeton University.

Jessee, Stephen A. 2012. Ideology and Spatial Voting in American Elections. New York: Cambridge University Press.

Keys, Benjamin J., Amit Seru, and Vikrant Vig. 2012. “Lender Screening and the Role of Securitization: Evidence from Prime and Subprime Mortgage Markets.” Review of Financial Studies 25(7): 2071–108.

Kim, In Song. 2014. “Intra-industry Trade and Trade Liberalization: Evidence from Dyad-Level Tariff Data.” Working paper. Massachusetts Institute of Technology, Cambridge.

Levitt, Steven D. 1994. “Using Repeat Challengers to Estimate the Effect of Campaign Spending on Election Outcomes in the U.S. House.” Journal of Political Economy 102(4): 777–98.

Lewis, Jeffrey B., and Chris Tausanovitch. 2015. “When Does Joint Scaling Allow for Direct Comparisons of Preferences?” Working paper. University of California, Los Angeles.

Lucca, David, Amit Seru, and Francesco Trebbi. 2014. “The Revolving Door and Worker Flows in Banking Regulation.” Journal of Monetary Economics 65: 17–32.

McCarty, Nolan, Keith T. Poole, and Howard Rosenthal. 2006. Polarized America: The Dance of Ideology and Unequal Riches. Cambridge, Mass.: MIT Press.

Mian, Atif, and Amir Sufi. 2014. House of Debt: How They (and You) Caused the Great Recession, and How We Can Prevent It from Happening Again. Chicago: University of Chicago Press.

Mian, Atif, and Amir Sufi. 2015. “Fraudulent Income Overstatement on Mortgage Applications During the Credit Expansion of 2002 to 2005.” Working paper. Princeton University and University of Chicago.

Mian, Atif, Kamalesh Rao, and Amir Sufi. 2013. “Household Balance Sheets, Consumption, and the Economic Slump.” Quarterly Journal of Economics (July 25). doi: 10.1093/qje/qjt020.

Mian, Atif, Amir Sufi, and Nasim Khoshkhou. 2016. “Government Economic Policy, Sentiments and Consumption.” NBER working paper.

Mian, Atif R., Amir Sufi, and Francesco Trebbi. 2010. “The Political Economy of the U.S. Mortgage Default Crisis.” American Economic Review 100(5): 1967–98.

O’Halloran, Sharyn, Sameer Maskey, Geraldine McAllister, David K. Park, and Kaiping Chen. 2016. “Data Science and Political Economy: Application to Financial Regulatory Structure.” RSF: The Russell Sage Foundation Journal of the Social Sciences 2(7). doi: 10.7758/RSF.2017.2.6.06.

Piketty, Thomas. 2014. Capital in the Twenty-First Century. Cambridge, Mass.: Belknap Press of Harvard University Press.

Piketty, Thomas, and Emmanuel Saez. 2003. “Income Inequality in the United States, 1913–1998.” Quarterly Journal of Economics 118(1): 1–39.

Poole, Keith T., and Howard L. Rosenthal. 1991. “Patterns of Congressional Voting.” American Journal of Political Science 35(1): 228–78.

Poole, Keith T., and Howard L. Rosenthal. 2007. Ideology and Congress. New Brunswick, N.J.: Transaction Publishers.

Shor, Boris, and Nolan McCarty. 2011. “The Ideological Mapping of American Legislatures.” American Political Science Review 105(3): 530–51.

Tausanovitch, Chris. 2016. “Income, Ideology, and Representation.” RSF: The Russell Sage Foundation Journal of the Social Sciences 2(7). doi: 10.7758/RSF.2016.2.7.03.

Wolfinger, Raymond E., and Steven J. Rosenstone. 1980. Who Votes? New Haven, Conn.: Yale University Press.
