Machine Learning with Large Networks of People and Places

Published by foursquareHQ on Mar 23, 2012
Machine Learning with Large Networks of People and Places

Blake Shaw, PhD Data Scientist @ Foursquare @metablake

What is foursquare?
• An app that helps you explore your city and connect with friends
• A platform for location-based services and data

What is foursquare?
People use foursquare to:
• share with friends
• discover new places
• get tips
• get deals
• earn points and badges
• keep track of visits

What is foursquare?
Mobile Social



15,000,000+ people 30,000,000+ places 1,500,000,000+ check-ins 1500+ actions/second

Video: http://vimeo.com/29323612

Overview • Intro to Foursquare Data • Place Graph • Social Graph • Explore • Conclusions

The Place Graph
• 30M places interconnected w/ different signals:
• flow
• co-visitation
• categories
• menus
• tips and shouts

NY Flow Network

People connect places over time
• Places people go after the Museum of Modern Art (MoMA): MoMA Design Store, Metropolitan Museum of Art, Rockefeller Center, The Modern, Abby Aldrich Rockefeller Sculpture Garden, Whitney Museum of American Art, FAO Schwarz
• Places people go after the Statue of Liberty: Ellis Island Immigration Museum, Battery Park, Liberty Island, National September 11 Memorial, New York Stock Exchange, Empire State Building

Predicting where people will go next
• Cultural places (landmarks etc.)
• Bus stops, subways, train stations
• Airports
• College places
• Nightlife

After “bars”: American restaurants, nightclubs, pubs, lounges, cafes, hotels, pizza places
After “coffee shops”: offices, cafes, grocery stores, dept. stores, malls
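These “after X” lists can be produced by a simple first-order transition count over consecutive check-ins. A minimal sketch with toy category sequences (illustrative only, not foursquare's actual pipeline):

```python
from collections import Counter, defaultdict

def next_place_counts(checkin_sequences):
    """Count category-to-category transitions from consecutive check-ins."""
    transitions = defaultdict(Counter)
    for seq in checkin_sequences:
        for src, dst in zip(seq, seq[1:]):
            transitions[src][dst] += 1
    return transitions

def top_next(transitions, category, k=3):
    """Most frequent categories that follow `category`."""
    return [c for c, _ in transitions[category].most_common(k)]

# Toy check-in sequences: categories visited in order by each user
seqs = [
    ["bar", "pizza place", "bar", "nightclub"],
    ["coffee shop", "office", "bar", "pub"],
    ["coffee shop", "office"],
]
t = next_place_counts(seqs)
print(top_next(t, "coffee shop"))  # ['office']
```

Normalizing each counter's rows turns this into an empirical Markov transition matrix over categories.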

Collaborative filtering
How do we connect people to new places they’ll like?
(diagram: bipartite graph linking People to Places)

Collaborative filtering
[Koren, Bell ’08]

• Item-item similarity: find items which are similar to items that a user has already liked
• User-user similarity: find items from users similar to the current user
• Low-rank matrix factorization: first find latent low-dimensional coordinates of users and items, then find the nearest items in this space to a user

Collaborative filtering
• Item-item similarity
  • Pro: can easily update w/ new data for a user
  • Pro: explainable, e.g. “people who like Joe’s Pizza also like Lombardi’s”
  • Con: not as performant as richer global models
• User-user similarity
  • Pro: can leverage social signals here as well... similar can mean people you are friends with, whom you’ve colocated with, whom you follow, etc.
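The item-item approach above can be sketched in a few lines: score each candidate place by its summed similarity to places the user already likes. All venue names, similarity values, and the `recommend` helper here are toy illustrations:

```python
def recommend(user_places, similarity, candidates, k=2):
    """Rank candidates by summed similarity to the user's liked places."""
    scores = {
        c: sum(similarity.get(frozenset((c, p)), 0.0) for p in user_places)
        for c in candidates if c not in user_places
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy symmetric place-place similarities (keys are unordered pairs)
sim = {
    frozenset(("Joe's Pizza", "Lombardi's")): 0.9,
    frozenset(("Joe's Pizza", "Shake Shack")): 0.4,
    frozenset(("MoMA", "Whitney Museum")): 0.8,
}
liked = {"Joe's Pizza", "MoMA"}
print(recommend(liked, sim, ["Lombardi's", "Shake Shack", "Whitney Museum"]))
```

The similarity lookups are also what makes the recommendations explainable: each candidate's score decomposes into named contributions from places the user already likes.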

Finding similar items
• Large sparse k-nearest-neighbor problem
• Items can be places, people, brands
• Different distance metrics
• Need to exploit sparsity, otherwise the computation is intractable at this scale

Finding similar items
• Metrics we find work best for recommending:
• Places: cosine similarity
  sim(x_i, x_j) = (x_i · x_j) / (‖x_i‖ ‖x_j‖)
• Friends: intersection
  sim(A, B) = |A ∩ B|
• Brands: Jaccard similarity
  sim(A, B) = |A ∩ B| / |A ∪ B|
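The three metrics are each a one-liner over sparse data. A minimal sketch, representing a venue's check-in vector as a dict and a friend/brand audience as a set:

```python
import math

def cosine(xi, xj):
    """Cosine similarity between sparse vectors (dicts of index -> value)."""
    dot = sum(v * xj.get(k, 0.0) for k, v in xi.items())
    ni = math.sqrt(sum(v * v for v in xi.values()))
    nj = math.sqrt(sum(v * v for v in xj.values()))
    return dot / (ni * nj) if ni and nj else 0.0

def intersection(a, b):
    """Friend similarity: size of the overlap."""
    return len(a & b)

def jaccard(a, b):
    """Brand similarity: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

xi = {0: 1.0, 1: 2.0}
xj = {1: 2.0, 2: 1.0}
print(cosine(xi, xj))                      # ≈ 0.8
print(intersection({1, 2, 3}, {2, 3, 4}))  # 2
print(jaccard({1, 2, 3}, {2, 3, 4}))       # 0.5
```

Iterating only over a vector's non-zero entries is the sparsity exploitation the previous slide calls for.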

Computing venue similarity
X: one row for each of 30M venues; each entry is log(# of check-ins at place i by user j)
K_ij = sim(x_i, x_j) = (x_i · x_j) / (‖x_i‖ ‖x_j‖)



Computing venue similarity
• Naive solution for computing K: O(n² d)
• Requires ~4.5M machines to compute in < 24 hours, and 3.6PB to store!
K_ij = sim(x_i, x_j) = (x_i · x_j) / (‖x_i‖ ‖x_j‖)
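The storage figure follows from simple arithmetic. A quick check, assuming a dense n × n matrix of 4-byte floats (the machine count similarly follows once you assume a per-machine similarity throughput):

```python
n = 30_000_000                    # venues
entries = n * n                   # dense K is n x n
storage_pb = entries * 4 / 1e15   # 4-byte floats, in petabytes

print(entries)     # 900000000000000 entries
print(storage_pb)  # 3.6 PB
```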



Venue similarity w/ MapReduce
• Map: key on user; for each user’s set of visited venues, emit “all” pairs (v_i, v_j) of visited venues, each with a partial score
• Reduce: key on (v_i, v_j); sum up each user’s score contribution to this pair of venues to get the final score
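This decomposition exploits sparsity: only venue pairs co-visited by at least one user are ever emitted. A minimal in-memory sketch of the two phases with toy weights (a real job would run on Hadoop, and the emitted sums are the dot products x_i · x_j; dividing by precomputed venue norms then gives the cosine similarity):

```python
from collections import defaultdict
from itertools import combinations

# user -> {venue: log-scaled check-in weight} (toy data)
visits = {
    "u1": {"moma": 2.0, "modern": 1.0},
    "u2": {"moma": 1.0, "modern": 2.0},
    "u3": {"moma": 3.0},
}

# Map phase: per user, emit every pair of visited venues with a partial score
emitted = []
for user, venues in visits.items():
    for (vi, wi), (vj, wj) in combinations(sorted(venues.items()), 2):
        emitted.append(((vi, vj), wi * wj))   # sorted() makes the key canonical

# Reduce phase: sum each user's contribution per venue pair
dot = defaultdict(float)
for pair, score in emitted:
    dot[pair] += score

print(dot[("modern", "moma")])  # 4.0
```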

The Social Graph
• 15M-person social network w/ lots of different interaction types:
• friends
• follows
• dones
• comments
• colocation

What happens when a new coffee shop opens in the East Village?

A new coffee shop opens...

The Social Graph

The Social Graph
How can we better visualize this network?



Graph embedding
Goal: coordinates L ∈ ℝ^(n×d)
• Spring embedding: simulate a physical system by iterating Hooke’s law
• Spectral embedding: decompose adjacency matrix A with an SVD and use the eigenvectors with the highest eigenvalues as coordinates
• Laplacian eigenmaps [Belkin, Niyogi ’02]: form the graph Laplacian L = D − A from the adjacency matrix, apply an SVD, and use the eigenvectors with the smallest non-zero eigenvalues as coordinates

Preserving structure
A connectivity algorithm G(K), such as k-nearest neighbors, should be able to recover the edges from the coordinates, such that G(K) = A.

Connectivity G(K)



Structure Preserving Embedding
[Shaw, Jebara ’09]

• SDP to learn an embedding K from A
• Linear constraints on K preserve the global topology of the input graph
• Convex objective favors low-rank K close to the spectral solution, ensuring a low-dimensional embedding
• Use eigenvectors of K with the largest eigenvalues as coordinates for each node

Structure Preserving Embedding
[Shaw, Jebara ’09]

max_K tr(KA)
s.t. D_ij > (1 − A_ij) max_m (A_im D_im)  ∀ i,j
where K ∈ 𝒦 = {K ⪰ 0, tr(K) ≤ 1, Σ_ij K_ij = 0}
and D_ij = K_ii + K_jj − 2K_ij






Large-scale SPE
[Shaw, Jebara ’11]
• Instead of solving the full SDP, directly optimize the low-dimensional coordinates L ∈ ℝ^(n×d), where K = L^T L, with projected stochastic gradient descent
• The set of triplets (i, j, k) with A_ij = 1 and A_ik = 0 subsumes the individual distance constraints above; for each randomly chosen triplet whose constraint C_l is violated (tr(C_l L^T L) > 0), update L_{t+1} = L_t + λ ∇f(L_t) with step size λ = 1/t
• Project after each step: subtract the mean from L and divide each entry by its Frobenius norm, enforcing the centering and scaling constraints
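The stochastic scheme can be sketched compactly. The code below is a simplified triplet-hinge version of the idea, not the exact SPE objective: sample (i, j, k) with A_ij = 1 and A_ik = 0, and whenever neighbor j is not closer to i than non-neighbor k, take a gradient step and renormalize:

```python
import random
import numpy as np

def spe_sgd(A, d=2, steps=5000, seed=0):
    """Simplified triplet-hinge embedding inspired by large-scale SPE."""
    n = len(A)
    rng = random.Random(seed)
    L = np.random.RandomState(seed).randn(n, d) * 0.1
    nbrs = [[j for j in range(n) if A[i][j]] for i in range(n)]
    nons = [[j for j in range(n) if j != i and not A[i][j]] for i in range(n)]
    for t in range(1, steps + 1):
        i = rng.randrange(n)
        if not nbrs[i] or not nons[i]:
            continue
        j, k = rng.choice(nbrs[i]), rng.choice(nons[i])
        # Violated constraint: neighbor j is not closer to i than non-neighbor k
        if np.sum((L[i] - L[j]) ** 2) >= np.sum((L[i] - L[k]) ** 2):
            L[i] += (1.0 / np.sqrt(t)) * (L[j] - L[k])  # gradient step on the hinge
            L /= np.linalg.norm(L)                      # projection: bound the scale
    return L

# Two triangles joined by a bridge edge (2-3)
A = np.zeros((6, 6), dtype=int)
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[a, b] = A[b, a] = 1
L = spe_sgd(A)
print(L.shape)  # (6, 2)
```

Each update touches one row of L, which is what makes the approach feasible at millions of nodes where the SDP is not.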



Video: http://vimeo.com/39656540 (notes on next slide)

Notes for previous slide: Each node in this network is a person; each edge represents a friendship on foursquare. The size of each node is proportional to how many friends that person has. We can see dense clusters of users on the right, the top, and the left; there is a large component in the middle, and there are clear hubs. We can now use this low-dimensional representation of a high-dimensional network to better track what happens when a new coffee shop opens in the East Village. As expected, it spreads like a virus across this social substrate. As each person checks in to La Colombe, their friends light up. People who have discovered the place are shown in blue; the current check-in is highlighted in orange. It’s amazing to see how La Colombe spreads. Many people have been talking about how ideas, tweets, and memes spread across the internet; for the first time we can track how new places opening in the real world spread in a similar way.

The Social Graph
• What does this low-dimensional structure mean?
• Homophily
• Location, demographics, etc.

The Social Graph

Influence on foursquare
• Tip network
• sample of 2.5M people “doing” tips from other people and brands
• avg. path length 5.15, diameter 22.3
• How can we find the authoritative people in this network?

Measuring influence w/ PageRank
[Page et al ’99]

• Iterative approach
• start with random values and iterate
• works great w/ MapReduce

PR(i) = (1 − d) + d Σ_{j ∈ {A_ij = 1}} PR(j) / Σ_k A_jk
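The iterative update maps directly to code. A minimal dense sketch on a toy follow graph (at check-in scale, each iteration would be one MapReduce pass over the edge list):

```python
import numpy as np

def pagerank(A, d=0.85, iters=100):
    """PR(i) = (1 - d) + d * sum over in-neighbors j of PR(j) / outdeg(j)."""
    n = len(A)
    out = A.sum(axis=1)
    pr = np.ones(n)
    for _ in range(iters):
        pr = (1 - d) + d * A.T.dot(pr / np.maximum(out, 1))
    return pr

# Toy follow graph: nodes 1 and 2 both point at node 0
A = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
], dtype=float)
print(pagerank(A).argmax())  # 0
```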

Measuring influence w/ PageRank
[Page et al ’99]

• Equivalent to finding the principal eigenvector of the normalized adjacency matrix:

PR(i) ∝ v_i  where  Pv = λv  and  P_ij = A_ij / Σ_j A_ij


Influence on foursquare • Most influential brands:
• History Channel, Bravo TV, National Post,
Eater.com, MTV, Ask Men, WSJ, Zagat, NY Magazine, visitPA, Thrillist, Louis Vuitton

• Most influential users
• Lockhart S, Jeremy B, Naveen S

A social recommendation engine built from check-in data

Foursquare Explore
• Realtime recommendations from signals:
• location
• time of day
• check-in history
• friends’ preferences
• venue similarities

Putting it all together
• Nearby relevant venues
• Friends’ check-in history, similarity
• Similar venues
• User’s check-in history
• MOAR signals

< 200 ms
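One way to read this slide: each signal contributes a score, and a weighted blend ranks candidate venues inside the latency budget. The sketch below is purely illustrative; the signal names, weights, and `explore_score` function are hypothetical, not foursquare's actual model:

```python
def explore_score(venue, signals, weights):
    """Blend per-signal scores for a candidate venue."""
    return sum(weights[name] * fn(venue) for name, fn in signals.items())

# Hypothetical signals over a candidate venue's features (all toy values)
signals = {
    "proximity":     lambda v: 1.0 / (1.0 + v["distance_km"]),
    "similarity":    lambda v: v["sim_to_history"],
    "friend_visits": lambda v: min(v["friend_checkins"], 5) / 5.0,
}
weights = {"proximity": 0.5, "similarity": 0.3, "friend_visits": 0.2}

venue = {"distance_km": 1.0, "sim_to_history": 0.8, "friend_checkins": 3}
print(round(explore_score(venue, signals, weights), 3))  # 0.61
```

Keeping each signal a cheap, independent lookup is what makes a sub-200 ms response plausible.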

Our data stack
• MongoDB
• Amazon S3, Elastic MapReduce
• Hadoop
• Hive
• Flume
• R and Matlab

Open questions
• What are the underlying properties and dynamics of these networks?
• How can we predict new connections?
• How do we measure influence?
• Can we infer real-world social networks?

Conclusions
• Unique networks formed by people interacting with each other and with places in the real world
• Massive scale: today we are working with millions of people and places here at foursquare, but there are over a billion devices in the world constantly emitting this signal of userid, lat, long, timestamp

Join us!
foursquare is hiring! 110+ people and growing foursquare.com/jobs
Blake Shaw @metablake blake@foursquare.com
