Intelligent Techniques for Web Personalization and Recommender Systems

AAAI Technical Report WS-12-09

Using Lists to Measure Homophily on Twitter

Jeon Hyung Kang and Kristina Lerman
USC Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292

Abstract

Homophily is the tendency of individuals in a social system to link to others who are similar to them and understanding homophily can help us build better user models for personalization and recommender systems. Many studies have verified homophily along demographic dimensions, such as age, location, occupation, etc., not only in real-world social networks but also online. However, there is limited research showing that homophily also exists when similarity is judged by topics of expertise or interests. We demonstrate the existence of topical homophily on Twitter using a novel source of evidence provided by Twitter lists. In this paper, we use LDA to extract topics from Twitter lists (a collection of user accounts created by some user that others can follow) and measure similarity between listed users based on the learned topics. We show that topically similar users are more likely to be linked via a follow relationship than less similar users.

Copyright © 2012, Association for the Advancement of Artificial Intelligence. All rights reserved.

Homophily is a strong organizing principle of social systems and has been used to explain human and social behavior. Homophily refers to the tendency of individuals in a social system to link with others who are similar to them rather than those who are less similar. The community structure homophily imposes on the social network may, in turn, through the processes of influence (Christakis and Fowler 2007) and selection (Crandall et al. 2008) cause linked individuals to become even more similar. Over time, preferential linking will structure the network in such a way as to make the behavior of individuals (Lerman et al. 2011) and even future friendships (Liben-Nowell and Kleinberg 2007) more predictable.

Understanding homophily can help us build better models for user/item recommendation systems and web personalization services by taking into account users' similarities and their social behaviors. Existence of demographic homophily, that is homophily based on demographic characteristics, is well established (Feld 1981; Kossinets and Watts 2009). (McPherson, Smith-Lovin, and Cook 2001), for example, cites over a hundred studies that support homophily along multiple dimensions, such as race, ethnicity, sex, gender, age, religion, education, occupation, abilities, beliefs, aspirations, and so on. Less empirical evidence exists for homophily in online social networks in which individual interactions are not constrained by geographic and organizational proximity and are instead based on shared interests or expertise. In an online social network of Twitter, for example, one is more likely to find a Semantic Web researcher who is linked to another Semantic Web researcher than to an app developer and vice versa, despite similar demographic characteristics of the two groups of users. One challenge to demonstrate homophily is to define a metric that properly accounts for topical similarity. (Singla and Richardson 2008) used the categories of search queries issued by users, in addition to their demographic characteristics, to measure similarity and demonstrated that people who talk to each other via instant messaging are more likely to be similar than a random pair of users. Others (Weng et al. 2010; Schifanella et al. 2010) found that linked social media users share topical interests and tagging vocabulary, and (Wu et al. 2011) found homophily within categories, with celebrities tending to follow other celebrities, bloggers other bloggers, and so on.

In this paper we show homophily on Twitter using Twitter lists as a novel source of evidence for topical similarity. In addition to broadcasting short messages, called tweets, registered Twitter users can follow accounts of other users to receive their tweets. Twitter introduced lists to help users manage the friends they follow. A list is created by some user, referred to as the curator, who names it and adds up to 500 members to it. A curator can create up to 20 lists. Other users can then subscribe to the list to see tweets from list members without having to follow them directly. Essentially, Twitter users categorize others by tagging them with list names. By applying topic modeling techniques to lists, we find the reduced dimension topic space which serves as a basis for measuring similarity between list members. We find that topically similar users are more likely to be linked via a follower relationship than less similar users.

In Section “Twitter Lists” we describe our data collection methodology and properties of Twitter lists. Just like tags that are used to annotate resources in social bookmarking sites, list names are used to categorize user accounts on Twitter along multiple dimensions and have a long tailed frequency distribution. Unlike tags, however, lists add a new relational layer to Twitter data, since they are used to indirectly follow users. In Section “Topical Homophily on Twitter”,
we use Latent Dirichlet Allocation to learn the topic distribution of Twitter list members in our sample. We define a similarity measure based on these topics and empirically demonstrate existence of homophily. Our work demonstrates the potential of Twitter lists for numerous applications, including discovering communities of shared interests, experts on particular topics, ranking popular or influential users, categorizing people within subject matter directories, recommending interesting users or tweets based on a topic, and so on.

Figure 1: Schematic representation of snowball sample data collection. (a) We collected all lists that n users are members of (from L1 to Lk) and subscribers to (from Lk+1 to Lm). (b) Starting from m newly discovered lists, we collected all users (from U1 to Uk) that subscribed to them or were members of (from Uk+1 to Un).

Twitter Lists

Twitter offers an Application Programming Interface (API) for data collection. Researchers have used various data collection strategies to overcome the biases of limited sample size exposed by Twitter through the API (Krishnamurthy, Gill, and Arlitt 2008; Weng et al. 2010; Bakshy et al. 2011). We collected a snowball sample of users and lists as follows. Starting with two initial seed users, we collected all lists, schematically shown in Fig. 1(a), that they subscribed to or were members of: a total of 260 lists. Next, we expanded the user layer, as shown in Fig. 1(b), based on the current lists by collecting all other users who are members of or subscribers to these lists. This yielded an additional 2573 users. We repeated steps (a) and (b) to collect another layer of lists and users. In the next iteration, we expanded the list layers by collecting all lists that these users subscribe to or are members of. This raised the number of lists from 260 to almost 298K. In the last step, we collected users associated with these 298K lists, yielding 905K users. In addition, we also collected information about who these users were following. In total, the snowball sample contained 298K lists and 2.3M users with 111M friendship, 10M membership, and 1.5M subscription links.

As a specific example, one of the seed users @jahendler (Prof. Jim Hendler of RPI), a self-described “SemWeb guru, Web Science evangelist, general web geek,” has been listed 107 times. The lists to which he has been assigned have names containing terms ‘semantic’ (31 lists), ‘web’ (30 lists), ‘semweb’ (22 lists), ‘semanticweb’ (18 lists), ‘tech’ (8 lists), ‘science’ (6 lists), ‘technology’ (6 lists), ‘rpi’ (5 lists), ‘research’ (4 lists), ‘opendata’ (4 lists), ‘analytics’ (3 lists), ‘media’ (3 lists), ‘people’ (3 lists), etc. These terms offer additional insights into @jahendler’s interests and expertise.

Figure 2: Term statistics. (a) Term frequency over the entire lists on log-log scale: terms are sorted by rank with the most frequent term ranked one. (b) Vocabulary size distribution of individual list curators.

Structural Analysis of Twitter Lists

The distribution of the number of friends and followers in the snowball data sample has a long tailed form, consistent with previous measurements (Java et al. 2007; Kwak et al. 2010; Wu et al. 2011). List subscription also shows long-tailed distribution with the numbers of list subscribers ranging from one to 70K.

In naming lists, Twitter users act very much like users of social tagging systems. In tagging systems, users’ tag vocabulary (number of distinct terms they use) is broadly distributed and grows over time as users discover new interests and describe resources according to them (Golder and Huberman 2006). Figure 2 shows the frequency-rank distribution of terms in list names, as well as the vocabulary size distribution of list curators [1]. Both show a long-tailed distribution, with a few terms, twenty of which are shown in Table 1, used many times, while the vast majority of terms are used only infrequently.

[1] We decompose compound names into individual terms by tokenizing list names on the hyphen.
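The hyphen tokenization of list names and the frequency-rank ordering behind Figure 2(a) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample list names are hypothetical, and stemming is left out to keep the example dependency-free.

```python
from collections import Counter

def tokenize_list_name(name):
    """Split a compound list name into lower-cased terms on hyphens."""
    return [term for term in name.lower().split("-") if term]

def term_frequencies(list_names):
    """Count how often each term occurs across all list names."""
    counts = Counter()
    for name in list_names:
        counts.update(tokenize_list_name(name))
    return counts

def rank_frequency(counts):
    """Return (rank, term, frequency) triples; the most frequent term has rank one."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, term, freq) for rank, (term, freq) in enumerate(ordered, start=1)]

# Hypothetical list names, used only to exercise the functions.
sample_names = ["tech-news", "world-news", "semantic-web", "web-design", "news"]
ranking = rank_frequency(term_frequencies(sample_names))
```

Plotting rank against frequency on log-log axes would then give a curve of the kind described for Figure 2(a); ties are broken alphabetically here, which is an arbitrary choice.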

Table 1: Top 20 most frequent terms in list names

Term            Frequency   List name example
new             21040       tech-news
tech            12940       all-about-tech
twibe           8784        twibes-socialmedia
politics        7955        politics
media           7810        social-media
celeb           7266        celebs
people          7025        people
design          5522        design
celebrity       5046        celebrities
social          4853        social-media
list            4599        my-favstar-fm-list
funny           4210        funny-people
web             4050        web-development
technology      4016        science-technology
science         3468        science-space
my              3160        my-govluv-reps
business        3102        business-leaders
famous          2955        famous-people
follow          2825        fav-follows
entertainment   2764        entertainers

Table 2: Categorization of Twitter list names (category, sub-category, and list name examples).

Linguistic Analysis of Twitter Lists

In social tagging systems, tags are freely chosen by the user from an uncontrolled vocabulary and describe different aspects of the tagged resource (Golder and Huberman 2006). Similarly, terms in list names are drawn by users from their personal uncontrolled vocabularies to describe a variety of characteristics of listed users (members). List names can:

• identify the member
• identify the type of member
• identify its characteristics
• specify some social behavior
• provide refining categories

To analyze the terms in Twitter lists, we first normalized terms by tokenizing list names (on hyphens) and stemming individual terms. Our data set contains 298K lists with a total of 462K terms, of which 48K are unique. The most frequent term was “news”, which occurred 21K times, suggesting the important role mass media plays even on Twitter (Wu et al. 2011). In addition, in our data set people curate or subscribe to lists not just for keeping track of funny things (i.e., terms such as “funny”, “interesting”) but also to be inspired (23rd ranked “inspire” and 29th ranked “creative”). We manually categorized the most frequent Twitter list names in Table 2. Even though we started from two computer science researchers, our final snowball data sample contains users who subscribe to or are members of a wide variety of lists.

Direct and Indirect Following

Twitter users broadcast messages to their followers, and in turn follow the updates of others. The following behavior on Twitter has been studied by many researchers (McPherson, Smith-Lovin, and Cook 2001; Krishnamurthy, Gill, and Arlitt 2008; Huberman, Romero, and Wu 2008; Kwak et al. 2010; Weng et al. 2010; Wu et al. 2011; Mislove et al. 2007). In addition to directly following others on Twitter, lists allow users to indirectly follow them, since by subscribing to a list, a user will receive updates from list members even if she is not directly following them (the user has to click on the list to get the updates).

How much overlap exists between the users one follows directly and those one follows indirectly? We compute the overlap quantitatively. A user i can follow others through n direct (follower) links, $DIR_i = \{u_1, u_2, \ldots, u_n\}$, and m indirect (list) links, $INDIR_i = \{u_1, u_2, \ldots, u_m\}$. Let k be the number of common links, $OVERLAP_i = \{u_1, u_2, \ldots, u_k\}$. Figure 3 shows the degree of overlap relative to the numbers of people followed directly and indirectly, with each point having values $(k/n, k/m)$. Figure 3(a) shows this distribution for list curators, and Fig. 3(b) for list subscribers. In both cases, people tend to indirectly follow fewer than 50% of the people they follow directly. List curators tend to indirectly follow the users they are already following, while subscribers tend to use lists to indirectly follow new people whom they are not already following. This indicates that list curators tend to use lists as a means to categorize people they already follow, while subscribers use them as a source of new information.
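The overlap point (k/n, k/m) used for Figure 3 can be computed per user as in the sketch below. The function name and the sample account names are hypothetical; the sketch assumes the direct and indirect followees of a user are available as collections of account identifiers.

```python
def overlap_point(direct, indirect):
    """Return (k/n, k/m) for one user.

    direct   -- accounts the user follows via follower links (n of them)
    indirect -- accounts the user follows via lists (m of them)
    k is the number of accounts that appear in both sets.
    Returns None when either set is empty, since the ratio is undefined.
    """
    direct, indirect = set(direct), set(indirect)
    n, m = len(direct), len(indirect)
    if n == 0 or m == 0:
        return None
    k = len(direct & indirect)
    return (k / n, k / m)

# Hypothetical user: follows four accounts directly, three through lists.
point = overlap_point({"a", "b", "c", "d"}, {"c", "d", "e"})
```

A point near (1, 1) would correspond to a user whose lists mirror the accounts she already follows, while a point near (0, 0) would correspond to a user whose lists bring in mostly new accounts.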

Figure 3: Direct (via follower links) vs indirect (via lists) following comparison for (a) list curators and (b) list subscribers.

Figure 4: (a) DC scores of 2,243 users who curated the most lists. (b) Degree comparison between list curators (blue circles) and non-curators (red squares).

List Curating Behaviors

Two different behaviors were observed in social tagging systems: categorizing and describing (Körner et al. 2010). Describers use a variety of tags to describe an object, while categorizers use one or at most a few tags to exactly place the object within some categorization scheme. Tagging pragmatics can be measured by vocabulary size, tag/resource ratio, average tags per post, and orphan ratio (Körner et al. 2010). Twitter lists are similar to social tagging systems in the sense that lists represent a non-hierarchical assignment of objects (in this case people) to categories. We observe the two behaviors also on Twitter. Some users (describers) generate lists with similar membership but different names, while others (categorizers) generate disjoint lists with different users.

To analyze different list curating behaviors, we compute $DC_i = \sum_{j=1}^{L_i} (j \times n_{ij}) / (N_i \times L_i)$, where $L_i$ is the number of lists that user i curates, $N_i$ is the number of unique members in the lists that user i curates, and $n_{ij}$ is the number of users who appear in j different lists that user i curates. If user i creates $L_i$ lists that all have the same $N_i$ members, $DC_i$ will be 1 and we will conclude that i is a describer. On the other hand, if user i creates $L_i$ disjoint lists, every member appears in exactly one list, so $DC_i = 1/L_i$, which is close to 0 when the user curates many lists, and we will conclude that the user is a categorizer. When two users both have totally disjoint lists, the score therefore gives more credit as a categorizer to the one who curates more lists.

We selected 2,243 users who curate between 6 and 20 lists (note that 20 is the maximum number of lists one can curate) and computed their DC scores, which are shown in Fig. 4(a). The figure illustrates that people tend to create multiple lists using a similar set of users rather than create disjoint sets of lists for exclusive categorization.

Figure 4(b) shows the degree comparison between Twitter list curators and non-curators. Blue circles represent curators, and red rectangles represent users who have not curated any lists. The x-axis shows the number of friends a user follows, and the y-axis the number of followers she has. List curators tend to follow many people, and many of them have no followers. Compared to most curators, non-curators have more followers. Curators tend to follow more users than they themselves are followed by.
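The DC score defined in this section can be sketched in a few lines. The function name is hypothetical, and the input is assumed to be the member collections of all lists curated by a single user.

```python
from collections import Counter

def dc_score(curated_lists):
    """Compute DC_i = sum_j (j * n_ij) / (N_i * L_i) for one curator.

    curated_lists -- iterable of member collections, one per curated list.
    """
    lists = [set(members) for members in curated_lists]
    num_lists = len(lists)  # L_i
    # For each member, count how many of the curator's lists contain them.
    appearances = Counter(member for members in lists for member in members)
    num_unique = len(appearances)  # N_i
    if num_lists == 0 or num_unique == 0:
        raise ValueError("the curator needs at least one non-empty list")
    # n_ij: number of members that appear in exactly j lists.
    n_by_j = Counter(appearances.values())
    weighted = sum(j * n_ij for j, n_ij in n_by_j.items())
    return weighted / (num_unique * num_lists)

# Describer: three lists with identical membership.
describer = dc_score([{"a", "b"}, {"a", "b"}, {"a", "b"}])
# Categorizer: four disjoint lists.
categorizer = dc_score([{"a"}, {"b"}, {"c"}, {"d"}])
```

The two toy curators reproduce the limiting cases from the text: identical lists give a score of 1, and L_i disjoint lists give 1/L_i.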

Learning the Topic Model of Twitter Lists

Latent Dirichlet Allocation (LDA) (Blei, Ng, and Jordan 2003) is a popular method for automatically extracting a compressed description of a document corpus. LDA is a completely unsupervised model that views documents as mixtures of topics represented as a K dimensional random variable θ. Each topic is represented as a probability distribution over words. Given a collection of documents, it is possible to learn the latent topics that best explain the words observed in the documents. In this generative model, a document is generated by first picking a topic distribution θ from the Dirichlet prior, and then using the document’s topic distribution θ to sample latent topic variables $z_i$. LDA makes the assumption that each word is generated from one topic, where $z_i$ is a latent variable indicating the hidden topic assignment for word $w_i$. Probability of choosing a word $w_i$ under topic $z_i$, $p(w_i | z_i, \beta)$, is different for all documents.

We use LDA to learn the hidden topics of Twitter lists. We view each list member as a “document,” which is represented as a mixture over 200 topics, and topics as distributions over terms in list names that they are members of. The corpus consists of 140,231 users who have been listed at least ten times. Figure 5 shows the topic distributions of nine popular Twitter users, with six highest probability (stemmed) terms for selected topics listed under the figure. Topic distributions of @BarackObama and @whitehouse are both peaked at topic 61 (politics, government, politician, etc.). Accounts of mass media news sources @cnnbrk and @nytimes are both peaked at topic 50 (news, information, media). Note that @BarackObama and @whitehouse also have a substantial probability mass at this topic. @techcrunch, @google and @mashable have a peak at topic 52 (technology, web, news and geek). Topic distributions of @DalaiLama and @BillGates are peaked at topic 14 (celebrity and famous people). However, their other topics are different in that @DalaiLama has the second largest peak at topic 190 (politics and government) while @BillGates has the second largest peak at topic 154 (technology, science, etc.). Qualitatively at least, lists seem to capture topical similarity between list members.

Figure 5: Topic distribution θ of nine members (for simplicity, distributions over 20 topics only are shown), and most probable words in six of the topics.

Topical Homophily on Twitter

Does topical homophily exist on Twitter? In other words, are users who are more topically similar to each other, such as @DalaiLama and @BillGates or @BillGates and @google, more likely to be linked in the follower graph than users who are dissimilar, e.g., @DalaiLama and @google? We study this question using Twitter lists as evidence to measure topical similarity. If two users are assigned to Twitter lists on similar topics, their topic distributions should be similar. Specifically, we calculate the similarity between two users who are list members using Jensen-Shannon divergence (Lin 1991) of their learned topic distributions. We say that users are linked if either a friend or a follower relationship exists between them. In other words, a link exists between users a and b if either user a follows user b or user b follows user a.

Empirical Results

We performed two experiments to answer the question “are more similar users more likely to be linked?” We computed user-to-user pair similarities by subtracting Jensen-Shannon divergence from 1, so that the similarity score ranges from 0 (most dissimilar) to 1 (most similar).

In the first experiment, we analyze the relationship between the likelihood of a link between pairs of users and their topical similarities. We paired 4,980 members from the 140K list members; these members have 306 followers on average (from 0 to 26K) and 1,208 friends on average (from 0 to 30K), while the users that the 2,980 members are connected to have on average 32K followers (from 699 to 7.4M) and 6,151 friends (from 1 to 699K) in the whole Twitter follower graph. Note that each pair is either linked or not, with a certain similarity value, and we compute the likelihood of a link between pairs by binning together 2,000 pairs whose similarity is below some threshold. We used five different threshold values (1.0, 0.8, 0.6, 0.4, and 0.2) to bin pairs of list members. Figure 6(a) shows that the probability of a link increases steadily with the similarity threshold. In other words, topically similar users are more likely to be linked.

To make our findings concrete, we present continuous trends on a more granular level. For each user in our data set who is a member of at least ten lists, we computed pair-wise similarities with the remaining 140K list members and binned together the top 2,000 pairs, sorted them by similarity and divided them evenly between 1,000 buckets. Next, we computed the percentage of linked pairs in each bin. Note that only 1.4M pairs are linked by either friend or follower relationships (0.098%). Figure 6(b) shows that 24.5% of pairs are linked when their average similarity is high (similarity > 93%). However, only 2.3% of pairs are linked when their average similarity is low (similarity < 1%). As similarity of pairs increases, the probability of a link also increases. These results show that similar users are more likely to be linked than dissimilar users.
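The similarity measure and the bucketed link probability from the two experiments can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function names are hypothetical, topic distributions are assumed to be equal-length lists of probabilities, and base-2 logarithms are an assumption chosen so that the divergence (and hence the similarity) stays between 0 and 1, as the text requires.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence of two distributions, using base-2 logs."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)

def similarity(p, q):
    """Topical similarity: 1 minus the Jensen-Shannon divergence."""
    return 1.0 - js_divergence(p, q)

def link_probability_by_bucket(pairs, num_buckets):
    """Sort (similarity, is_linked) pairs by decreasing similarity, split them
    evenly into buckets, and return (average similarity, fraction linked)
    per bucket. Pairs left over after an even split are ignored."""
    if num_buckets <= 0 or len(pairs) < num_buckets:
        raise ValueError("need at least one pair per bucket")
    ordered = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    size = len(ordered) // num_buckets
    results = []
    for b in range(num_buckets):
        bucket = ordered[b * size:(b + 1) * size]
        avg_sim = sum(sim for sim, _ in bucket) / size
        frac_linked = sum(1 for _, linked in bucket if linked) / size
        results.append((avg_sim, frac_linked))
    return results

# Hypothetical pairs: (similarity score, whether a follow link exists).
toy_pairs = [(0.9, True), (0.8, True), (0.2, False), (0.1, True)]
buckets = link_probability_by_bucket(toy_pairs, 2)
```

Each returned tuple corresponds to one symbol of the kind plotted in Figure 6(b): the bucket's average similarity on the x-axis and its share of linked pairs on the y-axis.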

However, is this effect produced by homophily, or some other phenomenon, such as assortativity on Twitter? Assortativity measures the preference of popular (many followers) users to be linked to other popular users. To verify that the observed trends are not caused by assortativity, we divided list members into two categories of equal size: popular and less popular members. Popular members have on average 306K followers, while less popular members have on average 1K followers. We repeated the analysis above, dividing pairs of users into bins based on similarity threshold and measuring average probability of linking within each bin. Results, shown in Figure 6, show that the probability of a link increases from 2.22% to 9.26% for popular users, while for less popular users, probability of a link increases from 0.54% to 3.26%. Even though linking probability is different in the two classes of users, possibly due to varying visibility or accessibility of their accounts, in both classes the probability of a link increases monotonically with similarity. We believe that homophily is the remaining explanation for this trend.

Figure 6: Probability of a link between list members. (a) Probability of a link among top 2,000 pairs whose similarity is below some threshold. (b) 2.4M pairs of list members sorted by similarity score in decreasing order and divided evenly between 1,000 buckets. Each symbol represents the probability of a tie in each bucket, with the average similarity value on the x-axis. Users are divided into two classes: popular and less-popular members.

Related Work

Homophily, which describes the propensity of similar individuals to link at a higher rate, is an important factor in the evolution of social networks and the diffusion of ideas and behaviors on these networks. Homophily is a well-researched topic in social science. Many studies have verified that people associate with each other and communicate at a higher rate if they are similar along demographic characteristics, such as race, ethnicity, sex, gender, age, religion, etc. (McPherson, Smith-Lovin, and Cook 2001; Leskovec and Horvitz 2008; Kossinets and Watts 2009). (Crandall et al. 2008) studied the interplay between similarity and network ties among Wikipedia editors and found that rising similarity predicts future interactions. (Singla and Richardson 2008) showed that people chat with each other more often when they share interests. Demographic homophily in social media sites such as Twitter has been demonstrated by several researchers. (Mackinnon 2006; Kwak et al. 2010) found homophily on Twitter based on users’ age and country of residence. (Gilbert and Karahalios 2009) presented a predictive model that maps social media data to social tie strength using thirty two variables including demographic, structural (e.g., number of mutual friends, friends of friends), emotional, and other features. (De Choudhury et al. 2010) investigated the interplay between homophily along diverse user attributes and the information diffusion process on social media. (Weng et al. 2010) found that users who reciprocate friendship links on Twitter tend to share topical interests. However, there is only limited research demonstrating that homophily also exists when similarity is judged by users’ expertise or topics of interest.

Unlike other studies that rely on users’ demographic features, or features of content they create, to compute similarity, we use labels created by other users to categorize the users in question. These labels serve as a basis for calculating topical similarity. We demonstrate the existence of topical homophily on Twitter using a novel source of evidence provided by Twitter lists.

Conclusion

In this paper, we use Twitter lists to demonstrate homophily on Twitter. Twitter lists are created by Twitter users to organize and categorize other users, and to indirectly follow topical accounts. As artifacts of human activity, Twitter lists offer a novel and rich data source for social data mining. We characterize statistical and linguistic properties of Twitter lists and show how they can be used to measure topical similarity between pairs of users. We demonstrated that Twitter users who are topically more similar are also more likely to be linked via a follower relationship than users who are less similar, and that this effect cannot be explained by other factors.

Twitter lists allow us to explore two distinct properties of social networks: semantics and structure. Unlike other studies, we also studied the relationship between topical and structural similarity. We show that users who are more similar are more likely to link to each other via a friend or a follower relationship than users who are less similar. In future work we will study how topical similarity affects the behavior of social networks, such as information diffusion.

References

Bakshy, E.; Hofman, J. M.; Mason, W. A.; and Watts, D. J. 2011. Everyone’s an influencer: quantifying influence on twitter. In King, I.; Nejdl, W.; and Li, H., eds., WSDM, 65–74.

Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent dirichlet allocation. J. Machine Learning Research 3:993–1022.

Christakis, N. A., and Fowler, J. H. 2007. The Spread of Obesity in a Large Social Network over 32 Years. New England Journal of Medicine 357(4):370–379.

Crandall, D.; Cosley, D.; Huttenlocher, D.; Kleinberg, J.; and Suri, S. 2008. Feedback effects between similarity and social influence in online communities. In Proc. 14th ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, KDD ’08, 160–168.

De Choudhury, M.; Sundaram, H.; John, A.; Seligmann, D. D.; and Kelliher, A. 2010. “Birds of a Feather”: Does User Homophily Impact Information Diffusion in Social Media?

Feld, S. L. 1981. The focused organization of social ties. American Journal of Sociology 86(5):1015–1035.

Gilbert, E., and Karahalios, K. 2009. Predicting tie strength with social media. In Proc. 27th Int. Conf. on Human factors in computing systems, CHI ’09, 211–220.

Golder, S. A., and Huberman, B. A. 2006. The structure of collaborative tagging systems. Journal of Information Science 32(2).

Huberman, B. A.; Romero, D. M.; and Wu, F. 2008. Social networks that matter: Twitter under the microscope. arXiv:0812.1045.

Java, A.; Song, X.; Finin, T.; and Tseng, B. 2007. Why We Twitter: Understanding Microblogging Usage and Communities. Proceedings of the Joint 9th WEBKDD and 1st SNA-KDD Workshop 2007.

Körner, C.; Benz, D.; Hotho, A.; Strohmaier, M.; and Stumme, G. 2010. Stop thinking, start tagging: tag semantics emerge from collaborative verbosity. In Proc. 19th Int. Conf. on World wide web, WWW ’10, 521–530.

Kossinets, G., and Watts, D. J. 2009. Origins of Homophily in an Evolving Social Network. The American Journal of Sociology 115(2).

Krishnamurthy, B.; Gill, P.; and Arlitt, M. 2008. A few chirps about twitter. In WOSP ’08: Proc. first workshop on Online social networks, 19–24.

Kwak, H.; Lee, C.; Park, H.; and Moon, S. 2010. What is twitter, a social network or a news media? In WWW ’10: Proc. 19th Int. Conf. on World wide web, 591–600.

Lerman, K.; Intagorn, S.; Kang, J.-H.; and Ghosh, R. 2011. Using proximity to predict activity in social networks.

Leskovec, J., and Horvitz, E. 2008. Planetary-scale views on a large instant-messaging network. In Proceeding of the 17th international conference on World Wide Web, WWW ’08, 915–924.

Liben-Nowell, D., and Kleinberg, J. 2007. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. 58(7):1019–1031.

Lin, J. 1991. Divergence measures based on the shannon entropy. IEEE Transactions on Information theory 37:145–151.

Mackinnon, I. 2006. Age and geographic inferences of the livejournal social network. In Statistical Network Analysis Workshop.

McPherson, M.; Smith-Lovin, L.; and Cook, J. M. 2001. Birds of a feather: Homophily in social networks. Annual Review of Sociology 27(1):415–444.

Mislove, A.; Marcon, M.; Gummadi, K. P.; Druschel, P.; and Bhattacharjee, B. 2007. Measurement and analysis of online social networks. In Proc. 7th ACM SIGCOMM conference on Internet measurement, IMC ’07, 29–42.

Schifanella, R.; Barrat, A.; Cattuto, C.; Markines, B.; and Menczer, F. 2010. Folks in folksonomies: social link prediction from shared metadata. In Proceedings of the third ACM international conference on Web search and data mining, WSDM ’10, 271–280.

Singla, P., and Richardson, M. 2008. Yes, There is a Correlation - From Social Networks to Personal Behavior on the Web.

Weng, J.; Lim, E.-P.; Jiang, J.; and He, Q. 2010. Twitterrank: finding topic-sensitive influential twitterers. In Davison, B. D.; Suel, T.; Craswell, N.; and Liu, B., eds., WSDM, 261–270.

Wu, S.; Hofman, J. M.; Mason, W. A.; and Watts, D. J. 2011. Who says what to whom on twitter. In Proc. 20th Int. Conf. on World wide web, WWW ’11, 705–714.