Trusted Search Communities

Peter Briggs and Barry Smyth
Adaptive Information Cluster
School of Computer Science & Informatics
University College Dublin, Dublin, Ireland
peter.briggs@ucd.ie, barry.smyth@ucd.ie

ABSTRACT

We describe a social search technique that harnesses the search experiences of a community of searchers to generate result recommendations, in a collaborative fashion, to complement results returned from some underlying search engine. We describe a dynamic model of trust as a way to coordinate collaboration, and provide experimental results to show that search performance improves as the network evolves.
ACM Classification: H.3.3 Information Storage and Retrieval: Information Search and Retrieval.
General terms: Human Factors, Algorithms, Experimentation.
Keywords: Trust, social networks, communities, collaborative search.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IUI’07, January 28–31, 2007, Honolulu, Hawaii, USA. Copyright 2007 ACM 1-59593-481-2/07/0001 ...$5.00.

INTRODUCTION

Collaborative Web Search (CWS) is a centralised, community-based approach to Web search. It reuses the past search behaviours of a community of like-minded searchers to influence the ranking of results from some underlying search engine. In particular, results that have been reliably selected in the past for queries similar to the current target query are promoted within the result-list, so that the final result-ranking is better adapted to the learned preferences of the community. The work of [4] highlights the effectiveness of this approach in a number of search scenarios.

CWS has traditionally adopted a single repository of search experience per community, in which the queries and result selections of community members are combined to form an anonymous community profile. Recently we have considered the potential benefits of breaking with this tradition so that individual users can be identified. The aim is to learn how trustworthy particular individuals are and to use their trust scores to influence ranking. In [1] we describe how user trustworthiness can be mined from the selection data of other community members. For example, if result promotions suggested by user $u$ are regularly selected by other users, then $u$ will receive a high trust score, and his/her promotion suggestions will be considered more reliable in the future. In contrast, malicious users who attempt to fraudulently promote certain results will find their trust scores degrading as other users ignore their suggestions. Eventually, their recommendations will be eliminated from consideration.

Until now, even in the work on trust-based promotion [1], community experience has always been stored centrally. In this paper, we first review previous work on collaborative search. We then introduce, for the first time, a distributed approach to CWS in which each searcher is associated with their own local search agent, which stores his/her own search history. We describe a technique for constructing a search network by connecting users. In this network, a query $q_T$ submitted by a user $u_T$ may receive promotions from the search agent of a different user $u_S$, if $u_S$ has relevant search experience for $q_T$. Furthermore, we describe how the traditional CWS ranking algorithm can be adapted for this distributed approach to promotion. We also describe how linkages between searchers can be strengthened and weakened as their trust relationships evolve. This distributed setting also supports a more flexible approach to community membership and community evolution. Rather than relying on explicit search communities, implicit communities are allowed to form and evolve through the creation and elimination of trust relationships. Finally, we describe the results of a preliminary evaluation that demonstrates the potential of this approach. It shows how the performance of the search network improves as it evolves to form a trusted network of relationships.

RELATED WORK

As Romano et al. [2] noted, there is an increasing demand for information retrieval systems that allow for collaborative, social information discovery. Their search system, CIRE, encouraged collaboration between teams of work colleagues. It allowed users to view team queries, to add annotations to pages and queries, and to rate result pages on a relevance scale. Whereas the CIRE system assumed a predefined group with a common task, ReferralWeb [3] sought to determine the social neighbourhood of individuals automatically by mining Web pages and looking for co-occurring names. When a user performed a query, the Web search results were prioritised so that documents authored by close neighbours were promoted. Using yet another approach, the SOAP system described by [5] builds a profile for each user based on their bookmark collection. Collaborative filtering techniques are then used to find profiles similar to that of the target user, and relevant results are suggested using these profiles as sources of recommendations.
DISTRIBUTED COLLABORATIVE WEB SEARCH

The central contribution of this paper is to describe how we can disassemble the centralised model of CWS. In its place, we support a network of search partners connected by trust-weighted linkages that evolve over time.

A Review of Collaborative Web Search

The basic model of CWS [4] stores search behaviour as a community matrix (the hit-matrix, $H_C$) that relates queries to selected results. Here $H_C(i, j)$ is the number of times that the members of a community $C$ have selected page $p_j$ for query $q_i$. When a new query $q_T$ is submitted, a set of recommendations is generated by choosing results that have been selected for similar queries. These recommendations are then promoted within the result-list returned by some underlying search engine. Query similarity is typically measured in terms of query-term overlap (Equation 1), and results are ranked by their weighted relevance. The relevance of $p_j$ to $q_i$ is calculated as the proportion of times that $p_j$ has been selected for $q_i$ (Equation 2). These local relevance values are then combined across all similar queries $q_1, \ldots, q_n$, with each query weighted by its similarity to $q_T$ (Equation 3).

$$\mathrm{Sim}(q, q') = \frac{|q \cap q'|}{|q \cup q'|} \tag{1}$$

$$\mathrm{Relevance}(p_j, q_i) = \frac{H_C(i, j)}{\sum_{\forall j} H_C(i, j)} \tag{2}$$

$$\mathrm{WRel}(p_j, q_T, q_1, \ldots, q_n) = \frac{\sum_{i=1}^{n} \mathrm{Relevance}(p_j, q_i) \cdot \mathrm{Sim}(q_T, q_i)}{\sum_{i=1}^{n} \mathrm{Exists}(p_j, q_i) \cdot \mathrm{Sim}(q_T, q_i)} \tag{3}$$

Building a Trust-Enhanced Search Network

The basic idea behind our new search network model is to associate each user $u$ with his or her own hit-matrix, $H_u$, which stores their personal search history. In turn, each user is supported by a CWS agent that works in the usual way. In response to the user's queries, it generates recommendations from their personal hit-matrix and from an underlying search engine. The limitation, of course, is that these recommendations are now drawn only from an individual user's personal search history, rather than from a much richer community search history. The solution is to let user search agents share search recommendations collaboratively, and so we construct a search network of connected search agents. These connections can be made in a variety of ways. For example, users with similar search histories might be connected initially, or users might be connected at random.

Trust-Based Ranking

Connecting two users $u_T$ and $u_S$ allows them to collaborate during future searches. When $u_T$ enters a new query, recommendations then come not only from his/her own hit-matrix but also from the hit-matrix of $u_S$; we discuss query propagation through the network in the next section. For now, it is important to understand that the strength of the connection between a pair of users reflects how trustworthy one user is to the other with respect to providing relevant search result recommendations. The strengths of these connections therefore vary from pair to pair and are modelled as levels of trust, or trust scores. The trust score between users $u_T$ and $u_S$ is the proportion of $u_S$'s recommendations that $u_T$ has selected (Equation 4). In this equation, $\mathrm{SelRecs}(u_S, u_T)$ is the number of times that $u_T$ has selected one of $u_S$'s recommended results, and $\mathrm{NumRecs}(u_S, u_T)$ is the total number of recommendations made by $u_S$ to $u_T$. Note that the connections carrying these trust scores are directed. That is, $\mathrm{Trust}(u_T, u_S)$ does not necessarily equal $\mathrm{Trust}(u_S, u_T)$, so our search network takes the form of a directed graph.

$$\mathrm{Trust}(u_T, u_S) = \frac{\mathrm{SelRecs}(u_S, u_T)}{\mathrm{NumRecs}(u_S, u_T)} \tag{4}$$

According to our model of trust, recommendations that originate from users with high trust scores should be considered more reliable than recommendations from users with lower trust scores. This is captured by the trust-based relevance formula in Equation 5. A recommendation from $u_S$ to $u_T$ is weighted by the level of trust that $u_T$ has in the recommendations of $u_S$, based on past experience. Here $\mathrm{WRel}_{u_S}$ denotes Equation 3 computed over $u_S$'s personal hit-matrix $H_{u_S}$.

$$\mathrm{TRel}(u_T, u_S, p_j, q_T, q_1, \ldots, q_n) = \mathrm{Trust}(u_T, u_S) \cdot \mathrm{WRel}_{u_S}(p_j, q_T, q_1, \ldots, q_n) \tag{5}$$

Query Propagation

In our trust network, not all users are interconnected. For example, in Figure 1, user $u_T$ is connected to users $u_1$ and $u_4$. This does not mean, however, that $u_T$ can only receive recommendations from these two directly connected users. Instead, we allow queries to be propagated through the network at search time along chains of connected users, with recommendations propagated back along the chains to the target user. In this way, a query $q_T$ submitted by $u_T$ is first propagated to $u_1$ and $u_4$, then on to users $u_2$, $u_3$, $u_5$, and $u_6$, and so on, as shown in Figure 1. If a user has recommendations for $q_T$, these are returned to the intermediary who passed on the query, and so on back up the chain to $u_T$. Some recommendations come from further down the chain than others. To reflect this, recommendations that originate from an indirect search partner are weighted with a trust score that combines the trust scores of the sequence of partners along the chain. Currently we use a simple multiplicative model to combine these trust scores.
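The following minimal Python sketch makes the interplay between Equations 1 to 5 and multiplicative chain trust concrete. It shows how a target user's agent might score pages gathered from the network. It is an illustration under stated assumptions, not the authors' implementation:

- queries are treated as sets of whitespace-separated terms;
- Exists(p_j, q_i) is read as 1 when $p_j$ has been selected for $q_i$ and 0 otherwise;
- the target's own hit-matrix contributes with weight 1.0;
- partners are reached hop by hop up to a hypothetical `max_hops` limit, and each partner keeps the best trust product among the shortest chains that reach it;
- a page recommended by several partners keeps its highest TRel score.

All function and type names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: query string -> {page URL: times selected for that query}
HitMatrix = Dict[str, Dict[str, int]]
# trust[u][v] holds Trust(u, v), the trust user u places in partner v
TrustGraph = Dict[str, Dict[str, float]]


def sim(q: str, q2: str) -> float:
    """Equation 1: query-term overlap |q & q2| / |q | q2|."""
    a, b = set(q.split()), set(q2.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevance(hits: HitMatrix, page: str, query: str) -> float:
    """Equation 2: share of the selections for `query` that went to `page`."""
    row = hits.get(query, {})
    total = sum(row.values())
    return row.get(page, 0) / total if total else 0.0


def wrel(hits: HitMatrix, page: str, target: str) -> float:
    """Equation 3 over every stored query with non-zero similarity to `target`."""
    num = den = 0.0
    for q, row in hits.items():
        s = sim(target, q)
        if s == 0.0:
            continue
        num += relevance(hits, page, q) * s
        if row.get(page, 0) > 0:  # Exists(p_j, q_i), assumed to be a 0/1 indicator
            den += s
    return num / den if den else 0.0


def local_recommendations(hits: HitMatrix, target: str) -> Dict[str, float]:
    """WRel for every page that was selected for a query similar to `target`."""
    pages = {p for q, row in hits.items() if sim(target, q) > 0 for p in row}
    return {p: wrel(hits, p, target) for p in pages}


def chain_trust(start: str, trust: TrustGraph, max_hops: int = 3) -> Dict[str, float]:
    """Multiply edge trusts along chains, expanding the network one hop at a time."""
    reached = {start: 1.0}
    frontier = {start: 1.0}
    for _ in range(max_hops):
        nxt: Dict[str, float] = {}
        for u, t in frontier.items():
            for v, w in trust.get(u, {}).items():
                if v not in reached:  # only the shortest chains to v are considered
                    nxt[v] = max(nxt.get(v, 0.0), t * w)
        reached.update(nxt)
        frontier = nxt
    return reached


def network_recommendations(
    target_user: str,
    query: str,
    hit_matrices: Dict[str, HitMatrix],
    trust: TrustGraph,
    max_hops: int = 3,
) -> Tuple[List[Tuple[str, float]], Dict[str, Set[str]]]:
    """Rank pages by TRel (Equation 5) using chained trust.

    Also returns which users recommended each page, so that their trust
    scores can be updated once the target user's selections are known.
    """
    best: Dict[str, float] = {}
    sources: Dict[str, Set[str]] = {}
    for user, t in chain_trust(target_user, trust, max_hops).items():
        # The target's own history enters with t = 1.0 (an assumption).
        for page, score in local_recommendations(hit_matrices.get(user, {}), query).items():
            trel = t * score
            if trel > 0:
                best[page] = max(best.get(page, 0.0), trel)
                sources.setdefault(page, set()).add(user)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked, sources
```

As an example of the multiplicative model, suppose $u_T$ trusts $u_1$ at 0.5 and $u_1$ trusts $u_2$ at 0.8. A page that $u_2$ recommends is then weighted by 0.5 × 0.8 = 0.4. Taking the maximum over duplicate pages is only one possible combination rule; summing the TRel scores would favour pages endorsed by several partners.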

Figure 1: Example of a trust network.

Updating the Trust Network

This model allows new trust links to be created when users select results that have been recommended by a distant search partner. For instance, in the example in Figure 1, $u_T$ receives and ultimately selects a recommendation from $u_2$, which leads to the creation of a new link between $u_T$ and $u_2$. We are currently exploring a variety of heuristics for determining the initial trust scores of such connections. In addition, at the end of each search session the selections of the target user, in this case $u_T$, are used to update the trust scores of all users who contributed a recommendation. If a user contributed a recommendation that was selected, their trust score increases; if not, it decreases. Once again, we are currently evaluating a number of different trust models as alternatives to the proportional model described in Equation 4.

EVALUATION

In this section we describe the results of a preliminary evaluation of this distributed approach to CWS using data collected from real users.

Setup

The data for our evaluation comes from the profiles of 50 users of the online social bookmarking site Del.icio.us¹. For each user, we view their bookmarked pages as search results that they were interested in. We treat the tags they associated with these pages as the queries that they might have used when searching for those pages. Such logged search data has been difficult to obtain in the past, so the Del.icio.us dataset provides a valuable and reasonably plausible alternative. To achieve some degree of overlap in user interests, we downloaded the profiles of the first 50 users listed as having tagged the http://www.foaf-project.org URL (the home page of the Friend of a Friend project). For each user, we retrieved all the URLs that they had bookmarked and the tags that they had used to label them. On average we downloaded 406 bookmarked pages and tag-sets for each user, with each profile containing an average of 242 unique tags (or queries). We then created a hit-matrix for each user, associating every bookmarked page with its respective queries to simulate a search history for each individual.

In this experiment we are primarily interested in the ability of collaborating searchers to generate relevant recommendations from their hit-matrices. In what follows, therefore, we do not rely on an underlying search engine to produce complete result-lists. We adopt a leave-one-out approach to the evaluation. For each user, we simulate a search for each one of their queries, $q_T$, in each case temporarily removing that row from their hit-matrix. We then compare the recommended search results from the search network with the known selections that this user has made for this query. Specifically, a result $r_S$ suggested by some user $u_S$ in response to a query $q_T$ submitted by user $u_T$ is considered relevant if and only if $r_S$ is actually contained within the hit-matrix of $u_T$ as a selection for $q_T$. In this way we can measure precision and recall for each $q_T$ by comparing the recommendations made to $u_T$ with this user's prior selections for $q_T$.

This method is repeated for a number of iterations. Each iteration, or epoch, replays the leave-one-out strategy for all users so that the trust scores, which start out at 0.5 for all connected users, can evolve. In the initial stages of the experiment, when no trust connections exist between users, 10 users are selected at random to respond to query $q_T$ as potential recommenders. During the experiment, the trust score between a pair of users is updated after each recommendation between them, but only once at least 10 recommendations have been made. This ensures that there is sufficient recommendation history to calculate a reliable trust score.

¹ http://del.icio.us
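The leave-one-out protocol, the strict relevance test, and the trust-update rule described above can be sketched in Python as follows. This is an illustrative harness under assumptions, not the authors' evaluation code:

- the network recommender is abstracted as a hypothetical callable `recommend(query, hits)`;
- precision is computed over the recommendations actually returned (at most k);
- a session with no recommendations scores 0 for both precision and recall;
- `success_rate` counts only sessions in which recommendations were made, in the spirit of Figure 4.

`TrustLedger` and `SessionResult` are hypothetical names. The ledger reports the default trust of 0.5 until 10 recommendations have passed from one user to another, and applies Equation 4 thereafter.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence, Set, Tuple

HitMatrix = Dict[str, Dict[str, int]]
# Hypothetical signature: given q_T and the target's remaining hit-matrix,
# return the network's ranked recommendations (page URLs).
Recommender = Callable[[str, HitMatrix], List[str]]


class SessionResult(NamedTuple):
    query: str
    precision: float
    recall: float
    had_recommendations: bool
    relevant_in_top_k: bool


def precision_recall_at_k(recs: Sequence[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision/recall of the top k; 0.0 for both when nothing is recommended."""
    top = list(recs)[:k]
    if not top or not relevant:
        return 0.0, 0.0
    hits = sum(1 for page in top if page in relevant)
    return hits / len(top), hits / len(relevant)


def leave_one_out(hits: HitMatrix, recommend: Recommender, k: int) -> List[SessionResult]:
    """Simulate one search per stored query, hiding that query's row each time."""
    results = []
    for query in list(hits):
        row = hits.pop(query)  # temporarily remove q_T's row
        try:
            recs = recommend(query, hits)
        finally:
            hits[query] = row  # restore it before the next simulated search
        relevant = set(row)  # relevant iff the user selected it for q_T
        p, r = precision_recall_at_k(recs, relevant, k)
        in_top_k = any(page in relevant for page in recs[:k])
        results.append(SessionResult(query, p, r, bool(recs), in_top_k))
    return results


def success_rate(results: List[SessionResult]) -> float:
    """Share of sessions with recommendations that have a relevant result in the top k."""
    made = [s for s in results if s.had_recommendations]
    return sum(s.relevant_in_top_k for s in made) / len(made) if made else 0.0


class TrustLedger:
    """Counters for Equation 4; trust stays at a default until enough history."""

    def __init__(self, default: float = 0.5, min_recs: int = 10) -> None:
        self.default = default
        self.min_recs = min_recs
        self.selected: Dict[Tuple[str, str], int] = {}
        self.made: Dict[Tuple[str, str], int] = {}

    def record(self, target: str, source: str, was_selected: bool) -> None:
        """Log one recommendation from `source` to `target`."""
        key = (target, source)
        self.made[key] = self.made.get(key, 0) + 1
        if was_selected:
            self.selected[key] = self.selected.get(key, 0) + 1

    def trust(self, target: str, source: str) -> float:
        """Trust(target, source) = SelRecs / NumRecs once min_recs are logged."""
        key = (target, source)
        n = self.made.get(key, 0)
        if n < self.min_recs:
            return self.default
        return self.selected.get(key, 0) / n
```

Keying the ledger on the ordered pair (target, source) keeps the trust graph directed, matching the observation that Trust($u_T$, $u_S$) need not equal Trust($u_S$, $u_T$).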

The Evolution of Trust

How, then, do the trust scores evolve during the course of the experiment? The results are presented in Figure 2 as a series of trust distribution graphs produced at the end of epochs 1, 5, 10, 15, and 20. These graphs indicate that, during the course of the experiment, trust is effectively being distributed across the search network. At the end of the first epoch, the majority of trust relationships still have their default strength of 0.5. There are 579 trust relationships in our search network, and over 90% of these (529) have a score between 0.5 and 0.75 at the end of epoch 1. However, the trust scores gradually settle as a result of search activity, and by the end of epoch 20 just under 30% of the relationships have a trust score in this range. Overall, we see a gradual flattening of the trust distribution curve.

Figure 2: The evolution of the distribution of trust across the search network.
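A trust distribution graph of the kind shown in Figure 2 is essentially a histogram over all directed trust scores in the network. The short sketch below shows that computation. Its function names and bin width of 0.25 are hypothetical. It also assumes inclusive band bounds, because the text does not say whether a score of exactly 0.75 falls inside "between 0.5 and 0.75". As a check on the reported figures, 529 of 579 relationships is about 91.4%, consistent with "over 90%".

```python
from typing import List, Sequence


def trust_histogram(scores: Sequence[float], width: float = 0.25) -> List[int]:
    """Count scores in [0, width), [width, 2*width), ...; a score of 1.0 joins the top bin."""
    n_bins = round(1 / width)
    bins = [0] * n_bins
    for s in scores:
        bins[min(int(s / width), n_bins - 1)] += 1
    return bins


def share_in_band(scores: Sequence[float], lo: float = 0.5, hi: float = 0.75) -> float:
    """Fraction of relationships whose trust lies in [lo, hi] (inclusive bounds assumed)."""
    return sum(lo <= s <= hi for s in scores) / len(scores) if scores else 0.0
```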
The Evolution of Search Performance

Ultimately, we are interested in the ability of the search network to deliver improved search performance as it evolves to accommodate more refined recommendations. To investigate this, we compute the average precision and recall scores for recommended result-lists of varying sizes (k = 1, ..., 10) at the end of each epoch. Once again we present the results for epochs 1, 5, 10, 15, and 20, this time as a standard precision-recall graph in Figure 3. The first thing to note is that the precision and recall values are very low. This is more an artifact of the strict test for relevance used in this experiment than a reflection of search quality. Nevertheless, we see marked changes in the precision and recall statistics of the search network as it evolves, with both precision and recall increasing steadily with each epoch. In future work we hope to show that, in a realistic search setting, these increases will scale up and translate into a noticeable increase in result quality as perceived by users.

Figure 3: Precision versus recall for result-list sizes from 1 to 10. The numbers alongside the epoch-1 nodes indicate the value of k (result-list size) for each precision-recall pair.

The above results include precision and recall scores of 0 for search sessions that receive no recommendations from the search network. Figure 4 therefore presents an alternative performance measure: the average percentage of sessions that include a relevant result within the top k recommendations. This measure counts only sessions in which recommendations are actually made. Once again we see a steady increase as the trust network evolves. For example, during epoch 1, a relevant result is found in the top result-list position about 3% of the time, rising to just over 9% of the time if we consider the top 10 result-list positions. By epoch 20, this success rate has more than doubled for k = 1, to over 6% at this position, and has increased to nearly 11% for the top 10 results.

Figure 4: Percentage of sessions with recommendations containing a relevant result within the top k.

CONCLUSIONS

We have introduced a new distributed model of collaborative Web search that generates community-focused search result recommendations from a search network of trusted search partners. Preliminary experimental evidence supports the view that the recommendations become increasingly accurate as trust relationships evolve.

ACKNOWLEDGMENTS

This material is based on works supported by Science Foundation Ireland under Grant No. 03/IN.3/I361.

REFERENCES

1. P. Briggs and B. Smyth. On the Role of Trust in Collaborative Web Search. Artificial Intelligence Review, 2006. To appear.
2. N. C. Romano Jr., D. Roussinov, J. F. Nunamaker, and H. Chen. Collaborative Information Retrieval Environment: Integration of Information Retrieval with Group Support Systems. In HICSS ’99: Proceedings of the Thirty-Second Annual Hawaii International Conference on System Sciences-Volume 1, pages 1053–1062, Washington, DC, USA, 1999. IEEE Computer Society.
3. H. Kautz, B. Selman, and M. Shah. Referral Web: Combining Social Networks and Collaborative Filtering. Communications of the ACM, 40(3):63–65, 1997.
4. B. Smyth, E. Balfe, J. Freyne, P. Briggs, M. Coyle, and O. Boydell. Exploiting Query Repetition and Regularity in an Adaptive Community-Based Web Search Engine. User Modeling and User-Adapted Interaction: The Journal of Personalization Research, 14(5):383–423, 2004.
5. A. Voss and T. Kreifelts. SOAP: Social Agents Providing People with Useful Information. In Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work: The Integration Challenge, pages 291–298. ACM Press, 1997.