Blake Shaw, PhD Data Scientist @ Foursquare @metablake

What is foursquare?

• An app that helps you explore your city and connect with friends
• A platform for location-based services and data

What is foursquare?

People use foursquare to: • share with friends • discover new places • get tips • get deals • earn points and badges • keep track of visits

What is foursquare?

Mobile Social

Local

Stats

• 15,000,000+ people
• 30,000,000+ places
• 1,500,000,000+ check-ins
• 1500+ actions/second

Video: http://vimeo.com/29323612

Overview • Intro to Foursquare Data • Place Graph • Social Graph • Explore • Conclusions

**The Place Graph**

• 30m places interconnected w/ different signals:
  • flow
  • co-visitation
  • categories
  • menus
  • tips and shouts

NY Flow Network

**People connect places over time**

• Places people go after the Museum of Modern Art (MOMA):
  • MOMA Design Store, Metropolitan Museum of Art, Rockefeller Center, The Modern, Abby Aldrich Rockefeller Sculpture Garden, Whitney Museum of American Art, FAO Schwarz
• Places people go after the Statue of Liberty:
  • Ellis Island Immigration Museum, Battery Park, Liberty Island, National September 11 Memorial, New York Stock Exchange, Empire State Building
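One way to derive "places people go after X" lists like these is to count venue-to-venue transitions in each person's time-ordered check-ins. The sketch below is only an illustration under that assumption: the `(user, venue, timestamp)` record layout, the function name, and the sample data are hypothetical, not Foursquare's actual pipeline.

```python
from collections import Counter, defaultdict

def next_place_counts(checkins):
    """Count venue -> next-venue transitions, per user, in timestamp order."""
    by_user = defaultdict(list)
    for user, venue, ts in checkins:
        by_user[user].append((ts, venue))

    transitions = defaultdict(Counter)
    for visits in by_user.values():
        visits.sort()  # order each user's check-ins by timestamp
        for (_, here), (_, there) in zip(visits, visits[1:]):
            if here != there:  # ignore repeat check-ins at the same venue
                transitions[here][there] += 1
    return transitions

# Hypothetical check-ins: (user, venue, timestamp)
checkins = [
    ("u1", "MOMA", 1), ("u1", "MOMA Design Store", 2),
    ("u2", "MOMA", 1), ("u2", "Rockefeller Center", 3),
    ("u3", "MOMA", 5), ("u3", "MOMA Design Store", 6),
]
flow = next_place_counts(checkins)
top_after_moma = flow["MOMA"].most_common(2)
```

Here `top_after_moma` ranks the venues most often visited directly after MOMA in the toy data.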

**Predicting where people will go next**

• Cultural places (landmarks etc.)
• Bus stops, subways, train stations
• Airports
• College places
• Nightlife

After “bars”: American restaurants, nightclubs, pubs, lounges, cafes, hotels, pizza places
After “coffee shops”: offices, cafes, grocery stores, dept. stores, malls

Collaborative filtering

How do we connect people to new places they’ll like?

People → Places

Collaborative filtering

[Koren, Bell ’08]

• Item-Item similarity
  • Find items which are similar to items that a user has already liked
• User-User similarity
  • Find items from users similar to the current user
• Low-rank matrix factorization
  • First find latent low-dimensional coordinates of users and items, then find the nearest items in this space to a user

Collaborative filtering

• Item-Item similarity
  • Pro: can easily update w/ new data for a user
  • Pro: explainable, e.g. “people who like Joe’s pizza also like Lombardi’s”
  • Con: not as performant as richer global models
• User-User similarity
  • Pro: can leverage social signals here as well: “similar” can mean people you are friends with, whom you’ve colocated with, whom you follow, etc.
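The "explainable" property of item-item similarity can be sketched as follows: score each unseen item by its summed similarity to the items a user already likes, and keep the liked item that contributed most as the explanation. The similarity values, venue names other than the two quoted on the slide, and the `recommend` helper are hypothetical.

```python
def recommend(liked, item_sims, top_n=3):
    """Rank unseen items by summed similarity to liked items.

    Returns (item, score, because_of) tuples, where because_of is the liked
    item with the strongest similarity to the recommendation.
    """
    scores = {}
    reasons = {}
    for item in liked:
        for other, sim in item_sims.get(item, {}).items():
            if other in liked:
                continue  # never recommend something already liked
            scores[other] = scores.get(other, 0.0) + sim
            if other not in reasons or sim > reasons[other][1]:
                reasons[other] = (item, sim)
    ranked = sorted(scores, key=lambda v: (-scores[v], v))[:top_n]
    return [(v, scores[v], reasons[v][0]) for v in ranked]

# Hypothetical precomputed item-item similarities
item_sims = {
    "Joe's Pizza": {"Lombardi's": 0.8, "Corner Cafe": 0.1},
    "Corner Bar": {"Lombardi's": 0.2, "Late Lounge": 0.5},
}
recs = recommend({"Joe's Pizza", "Corner Bar"}, item_sims)
```

Because each user's scores depend only on that user's liked items and a fixed similarity table, a new check-in only requires re-running `recommend` for that one user.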

**Finding similar items**

• Large sparse k-nearest neighbor problem
• Items can be places, people, brands
• Different distance metrics
• Need to exploit sparsity, otherwise intractable

**Finding similar items**

• Metrics we find work best for recommending:
• Places: cosine similarity

$$\mathrm{sim}(x_i, x_j) = \frac{x_i^\top x_j}{\|x_i\| \, \|x_j\|}$$

• Friends: intersection

$$\mathrm{sim}(A, B) = |A \cap B|$$

• Brands: Jaccard similarity

$$\mathrm{sim}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
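The three metrics can be written in a few lines each. This is a minimal sketch that assumes sparse vectors are stored as `{index: value}` dicts and that friend or brand audiences are plain Python sets; the function names are illustrative.

```python
import math

def cosine(x, y):
    """Cosine similarity of two sparse vectors stored as {index: value} dicts."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the shorter vector to exploit sparsity
    dot = sum(v * y[k] for k, v in x.items() if k in y)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0

def intersection(a, b):
    """Number of elements shared by two sets."""
    return len(a & b)

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

cos_example = cosine({"u1": 1.0, "u2": 2.0}, {"u2": 2.0, "u3": 1.0})
int_example = intersection({1, 2, 3}, {2, 3, 4})
jac_example = jaccard({1, 2, 3}, {2, 3, 4})
```

Only indices present in both vectors contribute to the dot product, which is what keeps the computation tractable when most entries are zero.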

Computing venue similarity

$$X \in \mathbb{R}^{n \times d}$$

One row for each of the 30m venues; each entry is log(# of check-ins at place i by user j).

$$K_{ij} = \mathrm{sim}(x_i, x_j) = \frac{x_i^\top x_j}{\|x_i\| \, \|x_j\|}, \qquad K \in \mathbb{R}^{n \times n}$$

**Computing venue similarity**

• Naive solution for computing K: $O(n^2 d)$
• Requires ~4.5m machines to compute in < 24 hours, and 3.6PB to store!

$$K_{ij} = \mathrm{sim}(x_i, x_j) = \frac{x_i^\top x_j}{\|x_i\| \, \|x_j\|}, \qquad K \in \mathbb{R}^{n \times n}$$
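The 3.6PB storage figure is consistent with storing a dense n × n matrix for n = 30m venues at 4 bytes per entry. The 4-byte entry size is an assumption made here to reproduce the number; the slide does not state it.

```python
n = 30_000_000           # venues (from the slide)
bytes_per_entry = 4      # assumption: one 32-bit float per similarity score

entries = n * n                        # dense K has n^2 entries
total_bytes = entries * bytes_per_entry
petabytes = total_bytes / 10**15       # decimal petabytes
```

With these values `entries` is 9 × 10^14 and `petabytes` comes out at 3.6.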

**Venue similarity w/ map reduce**

• Input: key = user, value = visited venues
• map: emit “all” pairs of visited venues for each user, as key = (vi, vj), value = score
• reduce: sum up each user’s score contribution to this pair of venues, giving key = (vi, vj), value = final score
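The map and reduce steps can be simulated in memory to see how the per-user contributions add up to a cosine similarity. This sketch assumes each user's visit weights are already computed and positive, and that venue norms are available from an earlier pass. The function names and toy data are hypothetical, and a real job would run on a framework such as Hadoop rather than in plain Python.

```python
import math
from collections import defaultdict
from itertools import combinations

def venue_norms(user_visits):
    """Pre-pass: Euclidean norm of each venue's vector over all users."""
    sq = defaultdict(float)
    for visits in user_visits.values():
        for venue, w in visits.items():
            sq[venue] += w * w
    return {v: math.sqrt(s) for v, s in sq.items()}

def map_user(visits, norms):
    """Map: emit ((vi, vj), score) for every pair of venues one user visited."""
    for vi, vj in combinations(sorted(visits), 2):
        yield (vi, vj), (visits[vi] / norms[vi]) * (visits[vj] / norms[vj])

def reduce_pairs(emitted):
    """Reduce: sum every user's contribution for each venue pair."""
    totals = defaultdict(float)
    for key, score in emitted:
        totals[key] += score
    return dict(totals)

# Hypothetical input: user -> {venue: weight}
user_visits = {
    "u1": {"cafe": 1.0, "bar": 2.0},
    "u2": {"cafe": 1.0, "bar": 1.0, "gym": 3.0},
    "u3": {"gym": 1.0},
}
norms = venue_norms(user_visits)
emitted = [kv for visits in user_visits.values() for kv in map_user(visits, norms)]
similarity = reduce_pairs(emitted)
```

Only venue pairs that share at least one visitor are ever emitted, so the work scales with co-visitation rather than with all n² pairs.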

**The Social Graph**

• 15m person social network w/ lots of different interaction types:
  • friends
  • follows
  • dones
  • comments
  • colocation

What happens when a new coffee shop opens in the East Village?

A new coffee shop opens...

The Social Graph

**The Social Graph**

How can we better visualize this network?

$$A \in \mathbb{B}^{n \times n}, \qquad L \in \mathbb{R}^{n \times d}$$

Graph embedding

• Spring Embedding - Simulate physical system by iterating Hooke’s law
• Spectral Embedding - Decompose adjacency matrix A with an SVD and use eigenvectors with highest eigenvalues for coordinates
• Laplacian eigenmaps [Belkin, Niyogi ’02] - form graph Laplacian from adjacency matrix, L = D − A, apply SVD to L and use eigenvectors with smallest non-zero eigenvalues for coordinates
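To make the Laplacian step concrete, the sketch below builds L = D − A for a small graph, taking D to be the diagonal degree matrix, and checks the identity that motivates Laplacian eigenmaps: the quadratic form xᵀLx equals the sum of squared coordinate differences over edges, so small eigenvalues correspond to coordinates that keep neighbors close. The helper names and the 3-node path graph are illustrative, and the eigendecomposition itself is left out.

```python
def graph_laplacian(adj):
    """Return L = D - A for a symmetric 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    lap = [[-adj[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] += sum(adj[i])  # add the degree D_ii on the diagonal
    return lap

def quadratic_form(lap, x):
    """Compute x^T L x."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))

# Path graph 0 - 1 - 2
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = graph_laplacian(A)

x = [0.0, 1.0, 3.0]  # hypothetical 1-D coordinates for the three nodes
energy = quadratic_form(L, x)
edge_sum = sum((x[i] - x[j]) ** 2
               for i in range(3) for j in range(i + 1, 3) if A[i][j])
```

Each row of `L` sums to zero, which is why the constant vector has eigenvalue zero and is skipped when choosing coordinates.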

Preserving structure

A connectivity algorithm G(K) such as k-nearest neighbors should be able to recover the edges from the coordinates such that G(K) = A
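As a small illustration of this criterion, the sketch below uses k-nearest neighbors as the connectivity algorithm and checks whether the graph recovered from a set of coordinates matches the input adjacency matrix. The function name, the 4-node graph, and both coordinate sets are hypothetical.

```python
def knn_graph(points, k):
    """Connect each point to its k nearest neighbors (squared Euclidean distance)."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = 1
    return adj

# Input graph: two disconnected edges, 0-1 and 2-3
A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

good = [(0.0,), (1.0,), (10.0,), (11.0,)]   # neighbors are placed close together
bad = [(0.0,), (10.0,), (1.0,), (11.0,)]    # neighbors are placed far apart

preserves_good = knn_graph(good, k=1) == A
preserves_bad = knn_graph(bad, k=1) == A
```

The first embedding is structure preserving for k = 1 and the second is not, even though both use the same four coordinate values.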

Embedding

Connectivity G(K)

Edges

Points

**Structure Preserving Embedding**

[Shaw, Jebara ’09]

• SDP to learn an embedding K from A
• Linear constraints on K preserve the global topology of the input graph
• Convex objective favors low-rank K close to the spectral solution, ensuring low-dimensional embedding
• Use eigenvectors of K with largest eigenvalues as coordinates for each node

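**Structure Preserving Embedding**

[Shaw, Jebara ’09]

$$\max_{K \in \mathcal{K}} \operatorname{tr}(KA) \quad \text{s.t.} \quad D_{ij} > (1 - A_{ij}) \max_m (A_{im} D_{im}) \quad \forall i,j$$

$$\text{where } \mathcal{K} = \Big\{ K \succeq 0,\ \operatorname{tr}(K) \le 1,\ \sum_{ij} K_{ij} = 0 \Big\}, \qquad D_{ij} = K_{ii} + K_{jj} - 2K_{ij}$$

$$A \in \mathbb{B}^{n \times n} \xrightarrow{\ \text{SDP}\ } K \in \mathbb{R}^{n \times n} \xrightarrow{\ \text{SVD}\ } L \in \mathbb{R}^{n \times d}$$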

[Poster excerpt: Large-scale Structure Preserving Embedding via SGD, Shaw, Jebara ’11. Recoverable content:]

• From only connectivity information describing which nodes in a graph are connected, can we learn a set of low-dimensional coordinates for each node such that these coordinates can easily be used to reconstruct the original structure of the network?
• An embedding is structure preserving if the connectivity algorithm G(K) (e.g. k-nearest neighbors, b-matching, or maximum weight spanning tree) exactly reproduces the input graph: G(K) = A.
• Structure preserving constraints can be written as a set of matrices $S = \{C_1, C_2, \ldots, C_m\}$, where each $C_l$ is a constraint matrix corresponding to a triplet (i, j, k) such that $A_{ij} = 1$ and $A_{ik} = 0$, with

$$\operatorname{tr}(C_l K) = K_{jj} - 2K_{ij} + 2K_{ik} - K_{kk}$$

• Temporarily dropping the centering and scaling constraints, the SDP is reformulated as maximizing the following objective over L, where $K = L^\top L$:

$$f(L) = \lambda \operatorname{tr}(L^\top L A) - \sum_{l \in S} \max\big(\operatorname{tr}(C_l L^\top L),\ 0\big)$$

• f(L) is maximized via projected stochastic subgradient descent. For each randomly chosen triplet, if $\operatorname{tr}(C_l L^\top L) > 0$, update L according to

$$L_{t+1} = L_t + \eta\, \Delta(f(L_t), C_l), \qquad \Delta(f(L), C_l) = \begin{cases} 2L(\lambda A - C_l) & \text{if } \operatorname{tr}(C_l L^\top L) > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $\eta$ is the step-size.
• After each step, projection enforces $\operatorname{tr}(L^\top L) \le 1$ and $\sum_{ij} (L^\top L)_{ij} = 0$ by subtracting the mean from L and dividing each entry of L by its Frobenius norm.
• L is initialized randomly, or optionally to the spectral embedding or Laplacian eigenmaps solution.
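The triplet constraints are easier to read in terms of distances. With D_ij = K_ii + K_jj − 2K_ij, the quantity K_jj − 2K_ij + 2K_ik − K_kk equals D_ij − D_ik, so it is positive exactly when neighbor j sits farther from i than non-neighbor k does. The sketch below checks that identity and lists the violated triplets for a toy graph. It is not the SGD solver itself, and the helper names, graph, and coordinates are hypothetical.

```python
def gram(points):
    """Gram matrix K with K[i][j] = <p_i, p_j>."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]

def sq_dist(K, i, j):
    """Squared distance D_ij = K_ii + K_jj - 2 K_ij."""
    return K[i][i] + K[j][j] - 2 * K[i][j]

def triplet_value(K, i, j, k):
    """K_jj - 2 K_ij + 2 K_ik - K_kk, which equals D_ij - D_ik."""
    return K[j][j] - 2 * K[i][j] + 2 * K[i][k] - K[k][k]

def violated_triplets(A, K):
    """Triplets (i, j, k) with A_ij = 1, A_ik = 0 whose value is positive."""
    n = len(A)
    return [(i, j, k)
            for i in range(n) for j in range(n) for k in range(n)
            if len({i, j, k}) == 3 and A[i][j] == 1 and A[i][k] == 0
            and triplet_value(K, i, j, k) > 0]

A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
K_good = gram([(0.0,), (1.0,), (10.0,), (11.0,)])
K_bad = gram([(0.0,), (10.0,), (1.0,), (11.0,)])

identity_holds = all(
    triplet_value(K_bad, i, j, k) == sq_dist(K_bad, i, j) - sq_dist(K_bad, i, k)
    for i in range(4) for j in range(4) for k in range(4)
)
bad_violations = violated_triplets(A, K_bad)
good_violations = violated_triplets(A, K_good)
```

A stochastic solver would sample one such triplet at a time and only adjust the coordinates when the sampled triplet is violated.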

Large-scale SPE

A

L

Video: http://vimeo.com/39656540 (notes on next slide)

Notes for previous slide: Each node in this network is a person, and each edge represents a friendship on foursquare. The size of each node is proportional to how many friends that person has. We can see dense clusters of users on the right, at the top, and on the left. There is a large component in the middle, and there are clear hubs. We can now use this low-dimensional representation of a high-dimensional network to better track what happens when a new coffee shop opens in the East Village. As expected, it spreads like a virus across this social substrate. As each person checks in to La Colombe, their friends light up. People who have discovered the place are shown in blue. The current check-in is highlighted in orange. It’s amazing to see how La Colombe spreads. Many people have been talking about how ideas, tweets, and memes spread across the internet. For the first time we can track how new places opening in the real world spread in a similar way.

**The Social Graph**

• What does this low-dimensional structure mean?
• Homophily
  • Location, Demographics, etc.

The Social Graph

**Influence on foursquare**

• Tip network
  • sample of 2.5m people “doing” tips from other people and brands
  • avg. path length 5.15, diameter 22.3
• How can we find the authoritative people in this network?

**Measuring influence w/ PageRank**

[Page et al ’99]

• Iterative approach
  • start with random values and iterate
  • works great w/ map-reduce

$$PR(i) = (1 - d) + d \sum_{j \in \{A_{ij} = 1\}} \frac{PR(j)}{\sum_k A_{ik}}$$
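A minimal sketch of the iterative approach follows. It assumes the standard PageRank convention in which each node j splits its score evenly across its own out-links, so the normalization here is by j's out-degree. The toy graph, d = 0.85, the iteration count, and the uniform starting values (used instead of random ones for reproducibility) are all assumptions.

```python
def pagerank(out_links, d=0.85, iterations=50):
    """Iterate PR(i) = (1 - d) + d * sum over in-neighbors j of PR(j) / outdeg(j)."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    pr = {v: 1.0 for v in nodes}  # uniform start; random values also work
    for _ in range(iterations):
        new = {v: 1.0 - d for v in nodes}
        for j, targets in out_links.items():
            if not targets:
                continue  # nodes without out-links pass on nothing in this sketch
            share = d * pr[j] / len(targets)
            for i in targets:
                new[i] += share
        pr = new
    return pr

# Hypothetical tip network: an edge j -> i means j "did" a tip written by i
out_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(out_links)
most_influential = max(ranks, key=ranks.get)
```

Each pass only needs a node's current score and its out-links, which is why the update maps naturally onto map-reduce: the map step emits shares along edges and the reduce step sums the shares arriving at each node.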

**Measuring influence w/ PageRank**

[Page et al ’99]

• Equivalent to finding the principal eigenvector of the normalized adj. matrix

$$A \in \mathbb{B}^{n \times n}, \qquad P_{ij} = \frac{A_{ij}}{\sum_k A_{ik}}$$

$$PR(i) \propto v_i \quad \text{where} \quad Pv = \lambda_1 v$$

**Influence on foursquare**

• Most influential brands:
  • History Channel, Bravo TV, National Post, Eater.com, MTV, Ask Men, WSJ, Zagat, NY Magazine, visitPA, Thrillist, Louis Vuitton
• Most influential users:
  • Lockhart S, Jeremy B, Naveen S

Explore

A social recommendation engine built from check-in data

Foursquare Explore

• Realtime recommendations from signals:
  • location
  • time of day
  • check-in history
  • friends’ preferences
  • venue similarities

**Putting it all together**

• Nearby relevant venues
• Friends’ check-in history, similarity
• Similar venues
• User’s check-in history
• MOAR signals

< 200 ms

Our data stack

• MongoDB
• Amazon S3, Elastic MapReduce
• Hadoop
• Hive
• Flume
• R and MATLAB

**Open questions**

• What are the underlying properties and dynamics of these networks?
• How can we predict new connections?
• How do we measure influence?
• Can we infer real-world social networks?

Conclusion

• Unique networks formed by people interacting with each other and with places in the real world
• Massive scale -- today we are working with millions of people and places here at foursquare, but there are over a billion devices in the world constantly emitting this signal of userid, lat, long, timestamp

Join us!

foursquare is hiring! 110+ people and growing
foursquare.com/jobs

Blake Shaw @metablake blake@foursquare.com
