Big Data: A Systematic View

An Output of Whiteboard @ SonicRim, Oct-Nov 2013.


Every month we open our San Francisco office to all professionals and students working in or interested in innovation, design, and research that inspires change. Using 128 square feet of whiteboard space, we have created a forum for creative collaboration to build a shared understanding of emerging issues that shape our work. For the October 2013 Session, a variety of design, research, innovation management, and product management practitioners gathered to discuss Big Data.


This document is a synthesis of a Whiteboard session at SonicRim on Big Data. This is an attempt to arrive at a systematic understanding of Big Data, its impact on research and design, and opportunities afforded by the trend. Insights and ideas came from the group discussion, but all errors in this document come from SonicRim’s synthesis. [November 2013]

Cocreated by: Aaron Marcus, Brigitte Jordan, Elena O'Curry, Eric Nehrlich, Jeff Greger, Joshua Kauffmann, Kaaren Hanson, Keren Solomon, Larry Cheng, Marc Hébert, Mike Alvarado, Millicent Cooley, Min Lee, Paul Van Slembrouck
And the SonicRim gang: Arvind Venkataramani, Caroline Smith, Christopher Avery, Uday Dandavate, Vidya Venkateswaran


Remixed iconography credits: ‘City’ designed by Thibault Geffroy & ‘People’ designed by Charlene Chen from The Noun Project; Iconshock.


[This definition and the following perspectives are a synthesis of dominant themes shared by attendees during the Whiteboard session.]

Big Data is:
1. massive, interconnected, unstructured data
2. that paints a complex picture of the world, connecting micro and macro scales,
3. which needs and enables new kinds of analysis and produces new kinds of insights,
4. leading to new value propositions and businesses.

1.1 Massive, interconnected, unstructured data
“Big Data is fundamentally networked. Its value comes from the patterns that can be derived by making connections between pieces of data, about an individual, about individuals in relation to others, about groups of people, or simply about the structure of information itself.” Boyd and Crawford (2011), Six Provocations for Big Data.
Massive: we have new, widely available sources of data such as smartphones and social media, combined with a vastly improved set of technologies to gather, store, and process data.
Interconnected: because this data is both gathered through and lives in computer networks, it’s possible to combine datasets gathered independently, or for very different purposes, and draw insight from making heretofore unlikely connections.
Unstructured: Big Data is a mix of data produced either intentionally (as in tracking user-supplied inputs) or incidentally (by observing an environment or ecosystem). Because of the multiple sources, formats, and data pipelines, the resulting data set is often messy.
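As a rough illustration of what "combining datasets gathered independently" can look like in practice, here is a minimal Python sketch. The two record sets, their field names, and the `link_records` helper are all hypothetical, invented for this example; real sources rarely share keys this cleanly.

```python
from collections import defaultdict

# Hypothetical records from two independently gathered sources
# (all values are invented for illustration).
step_counts = [
    {"user": "a", "day": "2013-10-01", "steps": 9200},
    {"user": "a", "day": "2013-10-02", "steps": 3100},
    {"user": "b", "day": "2013-10-01", "steps": 7600},
]
check_ins = [
    {"user": "a", "day": "2013-10-01", "place": "park"},
    {"user": "b", "day": "2013-10-01", "place": "office"},
    {"user": "c", "day": "2013-10-01", "place": "cafe"},
]

def link_records(left, right, keys=("user", "day")):
    """Join two lists of dicts on shared keys, keeping unmatched left rows."""
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    linked = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys), [])
        if not matches:
            # No counterpart in the other source: keep the row as it is.
            linked.append(dict(row))
        for match in matches:
            linked.append({**row, **match})
    return linked

linked = link_records(step_counts, check_ins)
```

The gaps are part of the point: one row has no matching check-in and one check-in has no matching step count, which is a small-scale version of the messiness described above.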

1.2 …That paints a complex picture of the world, connecting micro and macro scales
A complex picture is enabled by making connections between a variety of different kinds of data. Unlike with ‘quant’ data sets, this means that more data gives you a more complex picture of the phenomena, not just a statistically more accurate one. A corollary of this complexity is the ability to connect multiple types of data (numerical, textual, geographical, media inputs, etc.) at different scales, with the possibility of not just achieving complex insights, but also telling complex stories that synthesize micro- and macro-scale trends in meaningful ways. Part of the promise of Big Data is connecting events at the micro-scale, through identified chains of cause and effect, to broader trends at the macro-scale, though different beneficiaries might wish to look at these connections from different ends.

“In the Quantified Self movement, the data and their visualization are illuminated because of the reflective dialogue people have with them.”
Joshua Kauffmann

“At Intuit, we’re asking ourselves questions like: how can we help small business owners learn from the best practices of other small businesses?”
Larry Cheng


1.3 … Which needs and enables new kinds of analysis and produces new kinds of insights
Big Data differs from classic analytic approaches in terms of:
Cadence: data changes quickly, and can be captured & analyzed quickly.
Pipelines: constantly running flows of data transformed algorithmically into consumable units, in contrast with planned data collection projects.
Skill sets: need to involve engineering/database people, data scientists, domain experts/researchers and information designers.
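The "pipeline" idea above can be sketched in a few lines of Python. This is only an illustration under assumed conditions: the readings are invented, and `rolling_mean` is a hypothetical helper standing in for whatever transformation a real pipeline would apply.

```python
from collections import deque

def rolling_mean(stream, window=3):
    """Yield the mean of the most recent `window` values as each value arrives."""
    recent = deque(maxlen=window)  # older values fall out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)

# Hypothetical readings arriving one at a time (invented for illustration).
readings = iter([10, 20, 30, 40])
summaries = list(rolling_mean(readings, window=2))
# summaries == [10.0, 15.0, 25.0, 35.0]
```

Unlike a planned data collection project, nothing here waits for the data set to be "finished": each incoming value immediately produces an updated, consumable summary.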

New kinds of insight are being invented – and not always by people possessing data literacy/research skills – attempting to answer essentially ‘qualitative’ questions using quantitative approaches to data that is fundamentally both and neither. Looking at data in this dual sense highlights its many meanings, offering the possibility of both capturing its context and problematizing it in dialog with multiple stakeholders and perspectives. Bottom line: You can do now what you couldn’t do before, but you need to do it in different ways from the way you imagined doing it.

1.4 … Leading to new value propositions and businesses
Organizations with access to larger and richer data sets are seeking to make accurate predictions related to their business questions/objectives, or to tune in to events/trends/opinions much more quickly. For individuals, Big Data connects personal/social/biometric data with a corpus of similar data in order to achieve deeper insight or clarity for oneself and enable others to do the same. Likewise, publicly available data that lives in the commons works at the personal and organizational scales for the public good, and enables stakeholders in the public sphere to interrogate aspects of governance and civic life. In some cases the goals of individuals and organizations converge; in these cases the data itself is less important than the experiences and values built on it (e.g. Mint.com and how it enables smarter financial decision making for individuals while connecting them to relevant service providers).

BIG TRUTHS

BIG LIES

Will Big Data make us wiser or just better informed?

Will Big Data produce Big Truths or Big Lies?


[The following is a commentary on the state of conversations – in this Whiteboard session as well as in the broader research/design community – that became visible during the discussion.]

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it... Dan Ariely, January 2013 on Facebook


2 Big Data means different things to different people.
During the Whiteboard, the well-informed group of people in attendance took a considerable amount of time to reach a consensus as to what Big Data is; this lack of consensus is visible in the Big Data conversations around the web. We see five stakeholder types with differing concerns:
– People constructing the actual software tools are more concerned about the elements of the data pipelines and fitting them into existing business processes & systems.
– People consuming the data care about the pipeline outputs, but may not care about or understand the pipeline architecture.
– People producing data care about the experiences they might have and the value they might derive by sharing their personal data (privacy is one component of a set of trade-offs).
– Researchers care about people’s rights, and truth claims emerging from analyses by researchers from other disciplines/institutions.
– Designers care about creating experiences that incorporate data.

Without a systematic perspective, these sets of people will continue to talk past each other, and struggle to achieve mutually acceptable data outcomes.

3 Quantitative vs. qualitative methods debates are part of, but not the defining challenge for Big Data.
The quant. vs. qual. debate is moving from whether to define a phenomenon through numbers or through qualities, towards how to tell stories & give explanations rooted in and combining a variety of kinds of evidence. Essentially, this debate is about how to interpret data. However, the primary Big Data challenge is not identifying the correct way to perform data interpretation, but how the existence of data sets and tools affects the various stakeholders in any data ecosystem and what they will do in response. The role of researchers – whether quantitative or qualitative – may have to shift from that of privileged purveyors of insight into people’s lives (users, consumers or stakeholders) to that of facilitators of the right kinds of inquiry into the data by the various stakeholders involved.


“Every ten years it seems we have another round of conversations about data/data visualization, and all the things they are going to accomplish”
Aaron Marcus

“If this is really a revolution, we should ask ourselves what kind. Is Big Data like the printing press, or nuclear fission?”
Mike Alvarado


4 This is one in a long series of ‘data revolutions’, but with subtle differences.
Concerns about data use in general (e.g. questions of privacy, sampling, etc.) are being applied to Big Data without sufficient reframing to address the differences between data and Big Data sets.
– Differences lie in Big Data’s interconnected nature, and the much more complex system of technologies and institutions involved in collecting, sharing and combining data sets.

Reactively applying old analytic methods to Big Data without sufficient acknowledgement that data and Big Data offer different constraints and opportunities creates dissonance:
– There is a need to acknowledge/accommodate multiple perspectives on data (e.g. asking different types of questions of data to get different types of answers).
– Researchers might need to revisit their opinions on which phenomena fall under their purview and how they are described.
– There are different requirements for predictability (data must be “clean”, outliers are unwanted) and signal detection (outliers are more interesting than the norm) as analytical goals.
– Tools being developed today (and some data sources) are having a democratizing influence on who can ask questions and produce insight.
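The contrast between predictability and signal detection can be made concrete with a small Python sketch. It is not a method discussed in the session: it assumes the modified z-score (based on the median and the median absolute deviation, with the commonly used 3.5 cutoff) as one possible way to flag outliers, and the visit counts and the `split_outliers` helper are hypothetical.

```python
from statistics import median

def split_outliers(values, threshold=3.5):
    """Separate typical values from outliers using the modified z-score."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No measurable spread, so nothing can be flagged as unusual.
        return list(values), []
    typical, outliers = [], []
    for v in values:
        score = 0.6745 * abs(v - med) / mad
        (outliers if score > threshold else typical).append(v)
    return typical, outliers

# Hypothetical daily visit counts (invented for illustration).
daily_visits = [102, 98, 105, 97, 101, 99, 480]
typical, outliers = split_outliers(daily_visits)
# typical  == [102, 98, 105, 97, 101, 99]
# outliers == [480]
```

The same split serves both goals: someone building a predictive model would likely keep `typical` and discard the rest, while someone looking for signals would start with `outliers` and ask what happened on that day.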

5 The ethical issues related to Big Data include those related to data in general, but must also contend with new possibilities.
– Existing ethical questions become complicated because of developments such as mining of personal data and outsourcing analysis to third-party contractors (specifically the impact of these on individual freedoms & rights).
– Data protection laws and policies are designed around single-institution data sets, not networked data sets.
– New possibilities introduced by technology to find and act on previously inaccessible types and scales of data (specifically the commons, public spaces and resources, natural phenomena, governance at the community/organizational scale) produce new conflicts and politics.
– Data literacy is key: researchers will need to become fluent in multiple ways of comprehending, interpreting, and presenting data, and in understanding/critiquing the algorithms that power Big Data tools. Further, it is important to have widespread data literacy, both so that audiences can interpret data properly and so that people are empowered to draw their own (correct) conclusions.
– Big Data is not a one-person effort: it is no longer possible for a single individual to perform all of the data collection, management, analysis and reporting tasks. Until the management of massive data sets becomes commoditized and broadly affordable, the power to do things with Big Data lies primarily with large organizations and networked communities of practice, with the result that organizations have more power in this regard.


6 A systematic approach to Big Data as a multidisciplinary endeavor would need to consider:
1. A comprehensive view of the kinds of domains and inquiries benefited by Big Data.
2. The uses that different kinds of stakeholders can put Big Data to.
3. Analytic methods that acknowledge and integrate multiple disciplines and perspectives.
4. The development of tools, reporting formats, and explanatory mechanisms that combine multiple kinds of data.
5. The identification of new kinds of work and workflows made necessary by Big Data.
6. Legal and policy updates to address questions of ownership and limits on transferring or sharing data, and other actions that can produce unanticipated consequences.

SonicRim will continue to host conversations about Big Data, and continue to pose and answer questions not examined in this session. Stay tuned for more, and drop by if you can!


Suggested Readings
Danah Boyd and Kate Crawford (2011). Six Provocations for Big Data.
SSRN: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1926431

Tricia Wang (2013). Big Data needs Thick Data.
Ethnography Matters blog: http://ethnographymatters.net/2013/05/13/big-data-needs-thick-data/

Shvetank Shah, Andrew Horne, and Jaime Capellá (2012). Good Data Won’t Guarantee Good Decisions.
Harvard Business Review http://hbr.org/2012/04/good-data-wont-guarantee-good-decisions/ar/pr

Ben Kerschberg (2012). How Xerox Uses Analytics, Big Data and Ethnography To Help Government Solve “Big Problems”.
Forbes. http://www.forbes.com/sites/benkerschberg/2012/10/22/how-xerox-uses-analytics-big-data-and-ethnography-to-help-government-solve-bigproblems/

Techrux blog (2013). What is Big Data? The definition.
http://techrux.net/big-data-definition/ (instructive for its extremely technical/enterprise perspective)

Abby Margolis (2013). Five Misconceptions about Personal Data: Why We Need a People-centred Approach to “Big” Data
Ethnographic Praxis in Industry Conference. http://www.claropartners.com/wp-content/uploads/downloads/2013/10/Five-misconceptions-about-Personal-Data-Margolis-for-EPIC-2013.pdf

Eli Pariser (2011). Beware online “filter bubbles”.
TED conference. http://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles.html

Anthony Townsend (2013). Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia.

Daniel Huffman (2011). No Swearing in Utah.
Cartastrophe blog. http://cartastrophe.wordpress.com/2011/01/24/no-swearing-in-utah/


Provocations, Triggers, and Trends
If you are new to this discussion, the following experiments, tools, and businesses will be insightful.
– 23andMe: personal genetic profiling
– Google Flu Trends
– Sourcemap: crowdsourced product ingredient map
– Palantir Technologies: helps the US government trawl intelligence data
– The Quantified Self movement
– Strata conference: O’Reilly-sponsored conference on Big Data
– DataElite: a Big Data venture capital firm
– Stamen MapStack: makes constructing maps easy
– Infosthetics, Flowing Data: blogs documenting interesting data visualizations

Thank you for reading! Please send comments to: whiteboard@sonicrim.com
