
Data and Explore!

7/28/2011 Check-in and Waffles


Friday, July 29, 2011

Ben Lee - @benlee

Overview
Data intro
Explore recommendation engine
Technology stack
More infographics




Data model at a glance

Users: > 10 million
Venues: > 15 million
Check-ins: > 750 million, > 3 million / day

(Diagram: a check-in records a user @ a venue, with a comment such as "Lobster roll!" and a time such as 8 PM.)


Our data
Human mobility patterns
Explicit geo-spatial/temporal data
Many ways to interpret, filter, slice, visualize


Growth


Wall Street Journal Study

Week of Jan 21, 2011
Firehose of ~10.9 million check-ins
Examined NYC vs SF Bay Area


WSJ NYC Heatmap


WSJ SF Heatmap


WSJ factoids
NYC vs SF Bay Area
Male vs Female

Women check into sushi, libraries, karaoke more
Men check into burgers, tech startups, gay bars more
50/50 split on Mexican, ramen, beaches, desserts


http://graphicsweb.wsj.com/documents/FOURSQUAREWEEK1104/


Explore
Social recommendation engine
Leverage check-in data


Coffee?
Realtime recommendations
  Time of day
  Previous check-ins
  Friend history
  User history

Time of day
Relevant for the time of day, day of week
Affinities for different categories at different times
Per-venue check-in distribution over time is useful
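
The slides do not say how the per-venue distribution is represented. As a minimal sketch, assuming check-ins are bucketed into 168 hour-of-week slots, the idea could look like this. The function names, the bucketing, and the sample timestamps are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to one of 168 buckets (Monday 00:00 is bucket 0)."""
    return ts.weekday() * 24 + ts.hour


def venue_time_distribution(checkin_times):
    """Return {bucket: fraction of this venue's check-ins in that bucket}."""
    counts = Counter(hour_of_week(ts) for ts in checkin_times)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: n / total for bucket, n in counts.items()}


def time_relevance(distribution, now: datetime) -> float:
    """Share of the venue's check-ins that fall in the current hour of week."""
    return distribution.get(hour_of_week(now), 0.0)


# Hypothetical check-in times for a bakery that is busy on weekend mornings.
bakery = [
    datetime(2011, 7, 23, 9),   # Saturday 9 AM
    datetime(2011, 7, 23, 10),  # Saturday 10 AM
    datetime(2011, 7, 24, 9),   # Sunday 9 AM
    datetime(2011, 7, 30, 9),   # Saturday 9 AM
]
dist = venue_time_distribution(bakery)
saturday_morning = time_relevance(dist, datetime(2011, 8, 6, 9))
friday_night = time_relevance(dist, datetime(2011, 7, 29, 23))
```

Under these assumptions, a venue scores higher when the query time falls in a bucket where it has historically seen more check-ins, which is one way to tell a bakery from a nightclub at 8 PM.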


Tartine Bakery


DNA Lounge


Blue Bottle


Previous checkins
We are not unique snowflakes
Repeat check-in %
Median user checks in to a place that their social circle has been to before: > 60%


Friend history
Social justification
Call out similar friends
Surface their tips


User history
Places I've been


User history
Highlight places similar to places I've been


Venue similarity
People that go to Blue Bottle, also go to:


Venue similarity
People that go to Tartine, also go to:


Computing similarity
NYC Food sample

(Figure: users-by-venues matrix, labeled "Incredibly Sparse Matrix")


Computing similarity
Venue similarity
(Figure: users-by-venues matrix, labeled "Incredibly Sparse Matrix", with venue vectors v_i and v_j highlighted)
For all i, j: compute sim(v_i, v_j)
Take a flavor of co-occurrence based similarity
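
The slides do not name which flavor of co-occurrence similarity is used. As an illustration only, the sketch below assumes cosine similarity over binary "user visited venue" vectors: the number of users who visited both venues, divided by the square root of the product of each venue's visitor count. The user names and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def venue_similarity(user_venues):
    """user_venues: {user: set of venues visited}.

    Returns {(vi, vj): similarity} for every co-visited pair, with vi < vj.
    """
    visitors = defaultdict(int)  # venue -> number of distinct visitors
    cooccur = defaultdict(int)   # (vi, vj) -> users who visited both
    for venues in user_venues.values():
        for venue in venues:
            visitors[venue] += 1
        for vi, vj in combinations(sorted(venues), 2):
            cooccur[(vi, vj)] += 1
    return {
        (vi, vj): n / sqrt(visitors[vi] * visitors[vj])
        for (vi, vj), n in cooccur.items()
    }


# Hypothetical check-in history for three users.
visits = {
    "alice": {"Blue Bottle", "Tartine Bakery"},
    "bob": {"Blue Bottle", "Tartine Bakery", "DNA Lounge"},
    "carol": {"DNA Lounge"},
}
sims = venue_similarity(visits)
```

Because the matrix is so sparse, the sketch only ever touches pairs that at least one user actually co-visited, rather than all venue pairs.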



Computing similarity
User similarity
(Figure: users-by-venues matrix, labeled "Incredibly Sparse Matrix", with user vectors u_i and u_j highlighted)
For all i, j: compute sim(u_i, u_j)


Similarity pipeline
Input: Mongo dumps on S3 or HDFS
Java mapreduces via Hadoop / Elastic MapReduce
Load output into Mongo for serving



Mapreduce overview
map
  Input: key = user, value = visited venues
  Emit all pairs of visited venues for each user: (vi, vj) -> score

reduce
  Input: key = (vi, vj), values = score, score, ..., score
  Sum up each user's score contribution to this pair of venues -> final score
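
The map and reduce steps above can be sketched in plain Python, with an in-memory dictionary standing in for Hadoop's shuffle. The production jobs are described as Java mapreduces, so this is only an illustration of the data flow. The per-user score is not given in the slides; weighting each user by 1 / (number of venues visited) is an assumption made here so that very active users do not dominate, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_user(user, venues):
    """Map: emit ((vi, vj), score) for every pair of venues this user visited."""
    unique = sorted(set(venues))
    if len(unique) < 2:
        return  # a single venue produces no pairs
    score = 1.0 / len(unique)  # assumed weighting, not taken from the slides
    for vi, vj in combinations(unique, 2):
        yield (vi, vj), score


def reduce_pair(pair, scores):
    """Reduce: sum every user's contribution to one venue pair."""
    return pair, sum(scores)


def run_job(user_venues):
    """Run map, group by key (the shuffle), then reduce."""
    grouped = defaultdict(list)
    for user, venues in user_venues.items():
        for pair, score in map_user(user, venues):
            grouped[pair].append(score)
    return dict(reduce_pair(pair, scores) for pair, scores in grouped.items())


# Hypothetical input: one record per user, keyed by user.
final_scores = run_job({
    "alice": ["Blue Bottle", "Tartine Bakery"],
    "bob": ["Blue Bottle", "Tartine Bakery", "DNA Lounge"],
    "carol": ["DNA Lounge"],
})
```

Sorting each user's venues before pairing keeps (vi, vj) and (vj, vi) from landing on different reduce keys.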



Putting it all together


Nearby relevant venues
User's check-in history
Friends' check-in history, similarity
Similar venues
MOAR signals

< 200 ms


Explore performance
Track behavior over time
Measure improvements
Model updated
Run experiments

Explore in the future


We've just started!
New controls
Beyond here/now


Data Stack
MongoDB (production)
Amazon S3, Elastic MapReduce
Hadoop
Hive
Flume
R/RStudio

Hive interface

6/28/2011 Big Data Camp



Rudest cities


Boca is Nice!


Happiness analysis
Aditya Mukerjee's awesome work

Sad:
  SF: "stupid at&t and no reception"
  NYC: "fuck mondays"

Happy:
  SF: "woohoo!"
  NYC: "i love you, bone-in filet. so, so much."


*pocalypse
A tradition of hyperbole
Snowpocalypse 2011
Heatpocalypse 2011
NY State Senate passes marriage equality 6/24
Marriage Equalitocalypse


Aditya's Equalitocalypse Gephi viz


Love data?

Join the team! http://foursquare.com/jobs

