Chapter 10: Data at Scale
10.1 Introduction
We no longer just study 10 users in a lab. We can study 10 million users via the cloud. This
chapter covers "Big Data"—how to collect it, visualize it, and the ethical responsibilities of
handling it.
10.2 Approaches for Collecting and Analyzing Data
● Crowdsourcing: asking the "crowd" (large numbers of internet users) to perform small tasks.
○ Vivid Example: Wikipedia. Millions of people contributing small bits of data.
● Social Media Mining: Scraping platforms (e.g., Twitter/X) to analyze public sentiment.
● A/B Testing: A randomized experiment comparing two versions of a design (see the first sketch after this list).
○ Method: Show Version A (Green Button) to 50% of users and Version B (Blue Button) to 50%. Measure which gets more clicks.
○ Scale: Companies like Google and Netflix run thousands of these daily.
● Predictive Analytics: Using historical data to predict future behavior (see the second sketch after this list).
○ Vivid Example: Amazon suggesting "You might also like..." based on what other people bought.
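Deciding whether a click-rate difference is real or just noise is a standard statistics problem. The following is a minimal Python sketch using a two-proportion z-test; the click counts are invented illustration data, not results from any real experiment.

```python
# Minimal sketch: is Version A's click-through rate significantly different
# from Version B's? Uses a two-proportion z-test; the counts are made up.
from math import sqrt, erf

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Return the z statistic and two-sided p-value for the difference
    between two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided p
    return z, p_value

# Version A (green button) vs. Version B (blue button), 50/50 split
z, p = two_proportion_z_test(clicks_a=520, n_a=10_000, clicks_b=480, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # conventionally, p < 0.05 counts as significant
```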
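At its simplest, "people who bought X also bought Y" can be approximated by counting how often items co-occur in the same order. The sketch below uses invented purchase data and is only an illustration of the idea; real recommenders (including Amazon's) rely on far richer models.

```python
# Minimal sketch: item-to-item co-occurrence as a toy recommender.
# The orders below are invented illustration data.
from collections import Counter
from itertools import combinations

orders = [
    {"keyboard", "mouse", "monitor"},
    {"keyboard", "mouse"},
    {"monitor", "hdmi cable"},
    {"keyboard", "wrist rest"},
]

# Count how often each pair of items appears in the same order.
pair_counts = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        pair_counts[(a, b)] += 1

def also_bought(item, k=3):
    """Top-k items most often purchased together with `item`."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(k)]

print(also_bought("keyboard"))  # e.g. ['mouse', 'monitor', 'wrist rest']
```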
10.3 Visualizing and Exploring Data
● The Problem: You can't read a spreadsheet with 1 billion rows. You need visuals.
● Techniques:
○ Dashboards: Real-time overviews (e.g., Google Analytics).
○ Tag Clouds: Sizing words by their frequency in a text (see the sketch after this list).
○ Geographical Maps: Plotting data points on a map (e.g., a disease outbreak
heatmap).
○ Network Diagrams: Showing connections between people (e.g., who is friends with whom on Facebook).
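The counting step behind a tag cloud is simple: tally word frequencies and map each count to a font size. The sample text and the linear 12pt-to-48pt scaling below are illustrative choices; real tag-cloud libraries also handle layout and color.

```python
# Minimal sketch: word frequencies mapped to font sizes for a tag cloud.
from collections import Counter
import re

text = "big data big ideas big questions small answers data data"
counts = Counter(re.findall(r"[a-z']+", text.lower()))

# Scale font size linearly between 12pt and 48pt by relative frequency.
max_count = max(counts.values())
for word, count in counts.most_common():
    size = 12 + 36 * (count / max_count)
    print(f"{word}: {count} occurrence(s) -> {size:.0f}pt")
```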
10.4 Ethical Design Concerns
● The Dilemma: Just because we can collect data, should we?
● 1. Privacy & Anonymity: It is very hard to truly anonymize data. Researchers de-anonymized "anonymized" Netflix data by cross-referencing it with public IMDb reviews.
● 2. Fairness & Bias: Algorithms learn from past data. If the past data is racist or sexist,
the algorithm will be too.
○ Vivid Example: A hiring AI learned to penalize resumes containing the word "women's" (as in "women's chess club captain") because the successful resumes it trained on, mostly from men, lacked that term.
● 3. Transparency (Explainability): Can the user understand why an algorithm made a
decision? (e.g., Why was my loan denied?).
● 4. Ownership: Who owns the data? The user who created it, or the platform that hosted it?
📖 HCI Decoder
● Scraping: Automated extraction of data from websites.
● Sentiment Analysis: Using Natural Language Processing (NLP) to determine whether a piece of text is positive, negative, or neutral (see the sketch after this list).
● A/B Testing: A user experience research method that compares two versions of a web page or app to determine which performs better.
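To make the sentiment-analysis definition concrete, here is a minimal lexicon-based sketch. The tiny word lists and the sign-of-the-score rule are placeholder assumptions; production systems use trained NLP models or large curated lexicons rather than naive word counting.

```python
# Minimal sketch: lexicon-based sentiment scoring with placeholder word lists.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "awful", "sad"}

def sentiment(text: str) -> str:
    """Classify text by counting positive vs. negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great app"))   # positive
print(sentiment("The update is terrible"))  # negative
```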