Machine Learning With Large Networks of People and Places

Blake Shaw, PhD Data Scientist @ Foursquare @metablake
What is foursquare?
An app that helps you explore your city and connect with friends
A platform for location-based services and data
What is foursquare?
People use foursquare to:
share with friends
discover new places
get tips
get deals
earn points and badges
keep track of visits
What is foursquare?
Mobile Social
Local
Stats
15,000,000+ people 30,000,000+ places 1,500,000,000+ check-ins 1500+ actions/second
Video: http://vimeo.com/29323612
Overview
Intro to Foursquare Data
Place Graph
Social Graph
Explore
Conclusions
The Place Graph
30m places interconnected w/ different signals:
flow
co-visitation
categories
menus
tips and shouts
NY Flow Network
People connect places over time

Places people go after the Museum of Modern Art (MOMA):
MOMA Design Store, Metropolitan Museum of Art, Rockefeller Center, The Modern, Abby Aldrich Rockefeller Sculpture Garden, Whitney Museum of American Art, FAO Schwarz
Places people go after the Statue of Liberty:
Ellis Island Immigration Museum, Battery Park, Liberty Island, National September 11 Memorial, New York Stock Exchange, Empire State Building
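Lists like these can be built by counting consecutive check-ins per user. The sketch below is a minimal illustration under that assumption, not foursquare's actual pipeline; the `(user, timestamp, place)` tuple layout, the `next_place_counts` helper, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict

def next_place_counts(checkins):
    """Count place -> next-place transitions from (user, timestamp, place) tuples."""
    by_user = defaultdict(list)
    for user, ts, place in checkins:
        by_user[user].append((ts, place))
    flows = defaultdict(Counter)
    for visits in by_user.values():
        visits.sort()  # order each user's check-ins by time
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            if src != dst:  # ignore repeated check-ins at the same place
                flows[src][dst] += 1
    return flows

# Hypothetical sample data, for illustration only.
checkins = [
    ("u1", 1, "MOMA"), ("u1", 2, "MOMA Design Store"),
    ("u2", 1, "MOMA"), ("u2", 3, "Rockefeller Center"),
    ("u3", 5, "MOMA"), ("u3", 6, "MOMA Design Store"),
]
flows = next_place_counts(checkins)
top_after_moma = flows["MOMA"].most_common(1)
```

Each directed edge in the resulting flow network carries the number of times people moved from one place to the next.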
Predicting where people will go next

Cultural places (landmarks etc.)
Bus stops, subways, train stations
Airports
College places
Nightlife
After bars: american restaurants, nightclubs, pubs, lounges, cafes, hotels, pizza places
After coffee shops: offices, cafes, grocery stores, dept. stores, malls
Collaborative filtering
How do we connect people to new places they'll like?
[Figure labels: People, Places]
[Koren, Bell 08]
Item-Item similarity
Find items which are similar to items that a user has already liked
User-User similarity
Find items from users similar to the current user
Low-rank matrix factorization
First find latent low-dimensional coordinates of users and items, then find the nearest items in this space to a user
Item-Item similarity

Pro: can easily update w/ new data for a user
Pro: explainable, e.g. "people who like Joe's Pizza also like Lombardi's"
Con: not as performant as richer global models
User-User similarity
Pro: can leverage social signals here as well... "similar" can mean people you are friends with, whom you've colocated with, whom you follow, etc.
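To make the item-item approach concrete, here is a minimal sketch that scores unseen places by their summed similarity to places a user already liked, and keeps the strongest liked place as the explanation. The `recommend_item_item` helper, the similarity table, and the venue names other than those on the slide are hypothetical.

```python
def recommend_item_item(liked, item_sims, top_n=3):
    """Rank unseen items by summed similarity to the items a user already liked."""
    scores = {}
    reasons = {}
    for item in liked:
        for other, sim in item_sims.get(item, {}).items():
            if other in liked:
                continue
            scores[other] = scores.get(other, 0.0) + sim
            # Remember the most similar liked item, to explain the recommendation.
            if other not in reasons or sim > reasons[other][1]:
                reasons[other] = (item, sim)
    ranked = sorted(scores, key=lambda v: (-scores[v], v))[:top_n]
    return [(v, scores[v], reasons[v][0]) for v in ranked]

# Hypothetical precomputed item-item similarities.
item_sims = {
    "Joe's Pizza": {"Lombardi's": 0.8, "Corner Cafe": 0.1},
    "Corner Bar": {"Lombardi's": 0.2, "Late Lounge": 0.5},
}
recs = recommend_item_item({"Joe's Pizza", "Corner Bar"}, item_sims)
```

Because only the rows for a user's liked items are touched, new check-ins for that user can be folded in without recomputing a global model, which is the "easily update" advantage noted above.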
Finding similar items

Large sparse k-nearest neighbor problem
Items can be places, people, brands
Different distance metrics
Need to exploit sparsity, otherwise intractable
Finding similar items

Metrics we find work best for recommending:
Places: cosine similarity
$\mathrm{sim}(x_i, x_j) = \frac{x_i^\top x_j}{\|x_i\| \|x_j\|}$
Friends: intersection
$\mathrm{sim}(A, B) = |A \cap B|$
Brands: Jaccard similarity
$\mathrm{sim}(A, B) = \frac{|A \cap B|}{|A \cup B|}$
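The three metrics can be sketched in a few lines. This is an illustrative version only: representing sparse vectors as `{index: value}` dicts and the function names are assumptions, not the production implementation.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity for sparse vectors stored as {index: value} dicts."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the smaller vector to exploit sparsity
    dot = sum(v * y[k] for k, v in x.items() if k in y)
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

def intersection_similarity(a, b):
    """Size of the overlap between two sets, e.g. mutual friends."""
    return len(a & b)

def jaccard_similarity(a, b):
    """Overlap divided by union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For example, `jaccard_similarity({1, 2, 3}, {2, 3, 4})` shares two of four distinct elements.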
Computing venue similarity
$X \in \mathbb{R}^{n \times d}$
each entry is the log(# of checkins at place i by user j)
one row for each of the 30m venues...
$K_{ij} = \mathrm{sim}(x_i, x_j) = \frac{x_i^\top x_j}{\|x_i\| \|x_j\|}$
$K \in \mathbb{R}^{n \times n}$
Computing venue similarity
Naive solution for computing $K$: $O(n^2 d)$
Requires ~4.5m machines to compute in < 24 hours!!! and 3.6PB to store!
$K_{ij} = \mathrm{sim}(x_i, x_j) = \frac{x_i^\top x_j}{\|x_i\| \|x_j\|}$
$K \in \mathbb{R}^{n \times n}$
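The 3.6PB figure is consistent with storing every entry of the dense n × n matrix K for n = 30m venues. The arithmetic below assumes 4-byte (single-precision) entries and decimal petabytes; both are assumptions, since the slide does not state them.

```python
n = 30_000_000           # venues, from the slide
bytes_per_entry = 4      # assumption: single-precision float per similarity
entries = n * n          # dense n x n similarity matrix
storage_pb = entries * bytes_per_entry / 1e15  # decimal petabytes
```

That is 9 × 10^14 entries, which is why the sparsity of check-in data has to be exploited.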
Venue similarity w/ map reduce
map (key: user, value: visited venues)
emit all pairs of visited venues (vi, vj), with a score, for each user
reduce (key: vi, vj, values: score, score, ...)
Sum up each user's score contribution to this pair of venues
output: (vi, vj), final score
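The map and reduce steps can be simulated in-process to show the data flow. This is a sketch under stated assumptions: the per-user score is taken to be that user's term of the cosine similarity, weights use log(1 + count) so that a single check-in still contributes, and venue norms are computed in a separate pass. The function names and sample data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def venue_norms(user_visits):
    """Pre-pass: Euclidean norm of each venue's vector across all users."""
    squares = defaultdict(float)
    for visits in user_visits.values():
        for venue, weight in visits.items():
            squares[venue] += weight * weight
    return {venue: math.sqrt(s) for venue, s in squares.items()}

def map_user(visits, norms):
    """Map: emit ((vi, vj), score) for every pair of venues one user visited."""
    for vi, vj in combinations(sorted(visits), 2):
        yield (vi, vj), visits[vi] * visits[vj] / (norms[vi] * norms[vj])

def reduce_pairs(emitted):
    """Reduce: sum each user's score contribution per venue pair."""
    totals = defaultdict(float)
    for pair, score in emitted:
        totals[pair] += score
    return dict(totals)

# Hypothetical check-in counts per user and venue.
checkin_counts = {
    "u1": {"A": 3, "B": 1},
    "u2": {"A": 1, "B": 2, "C": 5},
    "u3": {"C": 2},
}
user_visits = {
    user: {venue: math.log(1 + count) for venue, count in venues.items()}
    for user, venues in checkin_counts.items()
}
norms = venue_norms(user_visits)
emitted = [kv for visits in user_visits.values() for kv in map_user(visits, norms)]
similarity = reduce_pairs(emitted)
```

Only venue pairs that share at least one visitor are ever emitted, so the work scales with co-visitation rather than with all n² pairs.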
The Social Graph
15m person social network w/ lots of different interaction types:
friends
follows
dones
comments
colocation
What happens when a new coffee shop opens in the East Village?
A new coffee shop opens...
The Social Graph
The Social Graph

How can we better visualize this network?
$A \in \mathbb{B}^{n \times n}$ (adjacency matrix) → $L \in \mathbb{R}^{n \times d}$ (coordinates)
Graph embedding
Spring Embedding - Simulate physical system by iterating Hooke's law
Spectral Embedding - Decompose adjacency matrix $A$ with an SVD and use eigenvectors with highest eigenvalues for coordinates
Laplacian eigenmaps [Belkin, Niyogi 02] - form graph Laplacian from adjacency matrix, $L = D - A$, apply SVD to $L$ and use eigenvectors with smallest non-zero eigenvalues for coordinates
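As a small illustration of the first step of Laplacian eigenmaps, the sketch below forms the graph Laplacian (degree matrix minus adjacency matrix) from a symmetric 0/1 adjacency matrix. The `graph_laplacian` helper and the example graph are hypothetical; the eigendecomposition itself is left to a linear algebra library and is not shown.

```python
def graph_laplacian(adj):
    """Return D - A for a symmetric 0/1 adjacency matrix given as nested lists."""
    n = len(adj)
    degrees = [sum(row) for row in adj]  # diagonal of the degree matrix D
    return [
        [(degrees[i] if i == j else 0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 4-node graph: a triangle (0, 1, 2) with node 3 attached to node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
lap = graph_laplacian(adj)
```

Every row of the result sums to zero, which is why the constant vector is an eigenvector with eigenvalue zero and the embedding uses the smallest non-zero eigenvalues instead.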
Preserving structure
A connectivity algorithm G(K) such as k-nearest neighbors should be able to recover the edges from the coordinates such that G(K) = A
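The structure-preservation test can be sketched directly from coordinates: run the connectivity algorithm and compare its output with the original adjacency matrix. The version below assumes a symmetrized k-nearest-neighbor rule with squared Euclidean distance; the helper names and the two example embeddings are hypothetical.

```python
def knn_graph(points, k):
    """Connectivity algorithm G: link each point to its k nearest neighbors (symmetrized)."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1
    return adj

def preserves_structure(points, adj, k):
    """True when the k-NN graph of the coordinates equals the input graph."""
    return knn_graph(points, k) == adj

# A 4-cycle 0-1-2-3-0 and two candidate 2-D embeddings.
cycle = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
square_ok = preserves_structure(square, cycle, k=2)
line_ok = preserves_structure(line, cycle, k=2)
```

Placing the cycle on the corners of a square recovers every edge, while placing it on a line links node 0 to node 2 and loses the 0-3 edge.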
Embedding
Connectivity G(K)
Edges
Points
Structure Preserving Embedding

[Shaw, Jebara 09]
SDP to learn an embedding $K$ from $A$
Linear constraints on $K$ preserve the global topology of the input graph
Convex objective favors low-rank $K$ close to the spectral solution, ensuring low-dimensional embedding
Use eigenvectors of $K$ with largest eigenvalues as coordinates for each node
Structure Preserving Embedding

[Shaw, Jebara 09]
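$\max_{K \in \mathcal{K}} \operatorname{tr}(KA)$
subject to $D_{ij} > (1 - A_{ij}) \max_m (A_{im} D_{im}) \quad \forall i, j$
where $\mathcal{K} = \{K \succeq 0,\ \operatorname{tr}(K) \le 1,\ \sum_{ij} K_{ij} = 0\}$
and $D_{ij} = K_{ii} + K_{jj} - 2K_{ij}$
Pipeline: $A \in \mathbb{B}^{n \times n}$ → SDP → $K \in \mathbb{R}^{n \times n}$ → SVD → $L \in \mathbb{R}^{n \times d}$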
[Slide image: overlapping excerpts from the Structure Preserving Embedding papers by Shaw and Jebara (Dept. of Computer Science, Columbia University, New York, NY 10027). Recoverable content follows.]
From only connectivity information describing which nodes in a graph are connected, can we learn a set of low-dimensional coordinates for each node such that these coordinates can easily be used to reconstruct the original structure of the network?
An embedding is structure preserving if a connectivity algorithm $G(K)$ (k-nearest neighbors, b-matching, or maximum weight spanning tree) exactly reproduces the input graph: $G(K) = A$.
SPE-SGD [Shaw, Jebara 11]: optimize $L$ directly using projected stochastic subgradient descent. In these excerpts $L$ is $d \times n$, so that $K = L^\top L$.
Each constraint $C_l$ corresponds to a triplet $(i, j, k)$ such that $A_{ij} = 1$ and $A_{ik} = 0$:
$\operatorname{tr}(C_l K) = K_{jj} - 2K_{ij} + 2K_{ik} - K_{kk}$
Objective: $f(L) = \lambda \operatorname{tr}(L^\top L A) - \sum_{l \in S} \max(\operatorname{tr}(C_l L^\top L), 0)$
Subgradient for a single randomly chosen triplet: $\Delta(f(L), C_l) = 2L(\lambda A - C_l)$ if $\operatorname{tr}(C_l L^\top L) > 0$, and $0$ otherwise
Update: $L_{t+1} = L_t + \eta \Delta(f(L_t), C_l)$, with step size $\eta$
After each step, use projection to enforce $\operatorname{tr}(L^\top L) \le 1$ and $\sum_{ij} (L^\top L)_{ij} = 0$, by subtracting the mean from $L$ and dividing each entry of $L$ by its Frobenius norm.
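The large-scale variant shown on this slide [Shaw, Jebara 11] optimizes the coordinates L directly, taking a projected stochastic subgradient step for a randomly chosen triplet (i, j, k) with A_ij = 1 and A_ik = 0. Below is a minimal sketch of one such step, assuming L is stored as a d × n nested list with one column per node. The function names, the step size `eta`, and the weight `lam` are illustrative assumptions, and nested lists are used only to keep the example dependency-free.

```python
import math

def gram_entry(L, a, b):
    """K_ab = (L^T L)_ab, where L is d x n with one column per node."""
    return sum(row[a] * row[b] for row in L)

def triplet_violation(L, i, j, k):
    """tr(C_l K) = K_jj - 2 K_ij + 2 K_ik - K_kk; positive when the triplet is violated."""
    return (gram_entry(L, j, j) - 2 * gram_entry(L, i, j)
            + 2 * gram_entry(L, i, k) - gram_entry(L, k, k))

def sgd_step(L, A, triplet, eta, lam):
    """Apply L <- L + eta * 2 L (lam * A - C_l) when the triplet is violated."""
    i, j, k = triplet
    if triplet_violation(L, i, j, k) <= 0:
        return [row[:] for row in L]  # constraint satisfied: no update
    n = len(A)
    new_L = []
    for row in L:
        la = [sum(row[a] * A[a][c] for a in range(n)) for c in range(n)]  # row of L A
        lc = [0.0] * n  # row of L C_l, non-zero only in columns i, j, k
        lc[i] = row[k] - row[j]
        lc[j] = row[j] - row[i]
        lc[k] = row[i] - row[k]
        new_L.append([row[c] + eta * 2 * (lam * la[c] - lc[c]) for c in range(n)])
    return new_L

def project(L):
    """Center each row (so sum_ij K_ij = 0) and scale to Frobenius norm 1 (tr(K) = 1)."""
    n = len(L[0])
    centered = [[x - sum(row) / n for x in row] for row in L]
    norm = math.sqrt(sum(x * x for row in centered for x in row))
    return [[x / norm for x in row] for row in centered] if norm > 0 else centered

# Hypothetical 1-D example: node 0 is linked to node 1 but sits closer to node 2.
A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
L0 = [[0.0, 1.0, 0.5]]
before = triplet_violation(L0, 0, 1, 2)
L1 = sgd_step(L0, A, (0, 1, 2), eta=0.1, lam=0.0)
after = triplet_violation(L1, 0, 1, 2)
P = project(L1)
```

The step pulls the neighbor j toward i and pushes the non-neighbor k away, so the violation shrinks; setting `lam` to zero here isolates the constraint term.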
Large-scale SPE
Video: http://vimeo.com/39656540
Notes on next slide
Notes for previous slide: Each node in this network is a person; each edge represents friendship on foursquare. The size of each node is proportional to how many friends that person has. We can see dense clusters of users on the right, the top, and the left. There is a large component in the middle. There are clear hubs. We can now use this low-dimensional representation of this high-dimensional network to better track what happens when a new coffee shop opens in the East Village. As expected, it spreads... like a virus, across this social substrate. As each person checks in to La Colombe, their friends light up. People who have discovered the place are shown in blue. The current check-in is highlighted in orange. It's amazing to see how La Colombe spreads. Many people have been talking about how ideas, tweets, and memes spread across the internet. For the first time we can track how new places opening in the real world spread in a similar way.
The Social Graph
What does this low-dimensional structure mean?
Homophily
Location, Demographics, etc.
The Social Graph
Influence on foursquare
Tip network
sample of 2.5m people doing tips from other people and brands
avg. path length 5.15, diameter 22.3
How can we find the authoritative people in this network?
Measuring influence w/ PageRank
[Page et al 99]
Iterative approach
start with random values and iterate
works great w/ map-reduce
$PR(i) = (1 - d) + d \sum_{j \in \{A_{ij} = 1\}} \frac{PR(j)}{\sum_k A_{ik}}$
Measuring influence w/ PageRank
[Page et al 99]
Equivalent to finding the principal eigenvector of the normalized adj. matrix
$A \in \mathbb{B}^{n \times n}$
$P_{ij} = \frac{A_{ij}}{\sum_j A_{ij}}$
$PR(i) \propto v_i$ where $Pv = \lambda_1 v$
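The iterative approach can be sketched with plain dictionaries. This version uses the usual [Page et al 99] convention, in which each node divides its score evenly among its out-links; the edge orientation (an edge u → v meaning u did a tip left by v), the damping factor d = 0.85, the fixed iteration count, the uniform starting values, and the sample network are all assumptions for illustration.

```python
def pagerank(out_links, d=0.85, iterations=50):
    """Iterate PR(i) = (1 - d) + d * sum over in-neighbors j of PR(j) / outdegree(j)."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    pr = {node: 1.0 for node in nodes}  # any positive start works; uniform here
    for _ in range(iterations):
        nxt = {node: 1.0 - d for node in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue  # dangling nodes pass on nothing in this simple sketch
            share = d * pr[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        pr = nxt
    return pr

# Hypothetical tip network: an edge u -> v means u did a tip left by v.
out_links = {
    "ann": ["zagat"],
    "bob": ["zagat", "ann"],
    "cat": ["zagat"],
    "zagat": [],
}
scores = pagerank(out_links)
most_influential = max(scores, key=scores.get)
```

Each iteration only needs every node's current score and its out-links, which is why the computation maps naturally onto map-reduce: map emits a share per out-link, reduce sums the shares per target.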
Influence on foursquare
Most influential brands:
History Channel, Bravo TV, National Post, Eater.com, MTV, Ask Men, WSJ, Zagat, NY Magazine, visitPA, Thrillist, Louis Vuitton
Most influential users:
Lockhart S, Jeremy B, Naveen S
Explore
A social recommendation engine built from check-in data
Foursquare Explore
Realtime recommendations from signals:
location
time of day
check-in history
friends' preferences
venue similarities
Putting it all together

Nearby relevant venues
Friends' check-in history, similarity
Similar Venues
User's check-in history
MOAR Signals
< 200 ms
Our data stack
MongoDB
Amazon S3, Elastic MapReduce
Hadoop
Hive
Flume
R and Matlab
Open questions
What are the underlying properties and dynamics of these networks?
How can we predict new connections?
How do we measure influence?
Can we infer real-world social networks?
Conclusion
Unique networks formed by people interacting with each other and with places in the real world
Massive scale: today we are working with millions of people and places here at foursquare, but there are over a billion devices in the world constantly emitting this signal of userid, lat, long, timestamp
Join us!
foursquare is hiring! 110+ people and growing foursquare.com/jobs
Blake Shaw @metablake blake@foursquare.com
