International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10 ISSN: 1837-7823

Dynamic Ranking Algorithm Using Multi Graph Technology


S. N. Sheela Evangelin Prasad¹ and Dr. M. V. Srinath²

¹ Associate Professor, CSE Dept, Sri Krishna Engineering College, Chennai
² Director, STET Women's College, Mannargudi

Abstract Dynamic Ranking is a system that approximates Object Rank results by utilizing a hybrid approach inspired by materialized views in traditional query processing. A number of relatively small subsets of the multi graph are materialized in such a way that any keyword query can be answered by running Object Rank on only one of the multi graphs. Dynamic ranking generates the multi graphs by partitioning all the terms in the corpus based on their co-occurrence, executing Object Rank for each partition using the terms to generate a set of random walk starting points, and keeping only those objects that receive non-negligible scores. The intuition is that a multi graph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to any one of these terms. We present a theoretically well-founded retrieval model for dynamically generating rankings based on interactive user feedback. Unlike conventional rankings that remain static after the query is issued, dynamic rankings allow and anticipate user activity, thus providing a way to combine the otherwise contradictory goals of result diversification and high recall.

Keywords: Object Rank, Page Rank, Dynamic Rank, Multi graph

I Introduction

Object Rank is a system that performs authority-based keyword search on databases, inspired by Page Rank. Page Rank is an excellent tool for ranking the global importance of Web pages, as proven by the success of Google. However, Google uses Page Rank to measure the global importance of pages independently of any keyword query. It uses traditional IR techniques to estimate the relevance of a page to a keyword query, and then combines this estimate with the Page Rank value to calculate the final score of the page. We appropriately extend and modify Page Rank to perform keyword search on databases. For example, consider the publications database of Figure 1, where edges denote citations (each edge starts at the citing paper and ends at the cited paper), and the keyword query "Sorting". Using the original variant of Object Rank, the paper "Access Path Selection in a Relational Database Management System" would be ranked highest, because it is cited by four papers containing "sorting" (or "sort"). The paper "Fundamental Techniques for Order Optimization" would be ranked second, since it is cited by only three sorting papers.

The Page Rank algorithm utilizes the Web graph link structure to assign global importance to Web pages. It works by modeling the behavior of a random Web surfer
who starts at a random Web page and follows outgoing links with uniform probability. Dynamic versions of the Page Rank algorithm, namely Personalized Page Rank (PPR) for Web graph datasets and Object Rank for graph-modeled databases, have become popular; they are characterized by a query-specific choice of the random walk starting points. PPR is a modification of Page Rank that personalizes search based on a preference set containing Web pages that a user likes. Object Rank extends (personalized) Page Rank to perform keyword search in databases. It uses a query term posting list as the set of random walk starting points and conducts the walk on the instance graph of the database. Object Rank has been applied successfully to databases that have social networking components, such as bibliographic data and collaborative product design. However, Object Rank suffers from the same scalability issues as personalized Page Rank, because it requires multiple iterations over all nodes and links of the entire database graph.
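The random walk with query-specific starting points can be illustrated with a minimal sketch. It is a plain power iteration under stated assumptions, not the authors' implementation: the function name `personalized_pagerank`, the damping factor 0.85, the tolerance, the handling of nodes without outgoing links, and the toy citation graph (which only mirrors the four-versus-three citation example above) are all hypothetical choices made for illustration.

```python
def personalized_pagerank(out_links, base_set, damping=0.85, tol=1e-8, max_iter=1000):
    """Power iteration in which random jumps land only on nodes of base_set.

    out_links maps every node to the list of nodes it links to.
    """
    nodes = sorted(out_links)
    base = set(base_set)
    jump = {v: (1.0 / len(base) if v in base else 0.0) for v in nodes}
    rank = dict(jump)
    for _ in range(max_iter):
        nxt = {v: (1.0 - damping) * jump[v] for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                # Follow each outgoing link with uniform probability.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # Assumption: a node without out-links returns its mass to the base set.
                for v in base:
                    nxt[v] += damping * rank[u] / len(base)
        delta = max(abs(nxt[v] - rank[v]) for v in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank


# Hypothetical citation graph: four "sorting" papers cite one paper, three cite another.
citations = {
    "sort_paper_1": ["access_path_selection", "order_optimization"],
    "sort_paper_2": ["access_path_selection", "order_optimization"],
    "sort_paper_3": ["access_path_selection", "order_optimization"],
    "sort_paper_4": ["access_path_selection"],
    "access_path_selection": [],
    "order_optimization": [],
}
sorting_papers = ["sort_paper_1", "sort_paper_2", "sort_paper_3", "sort_paper_4"]
scores = personalized_pagerank(citations, base_set=sorting_papers)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Every iteration touches all nodes and links, which is the scalability problem noted above: the cost of answering one query grows with the size of the whole graph.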

Fig. 1

II Dynamic Ranking

Web documents are dynamic. Newspaper homepages such as The Hindu change several times a day, marketplace sites such as Amazon can change many times an hour, and blogs are updated with varying frequencies as new posts and comments are added. Some of these changes are substantial and significant for information seekers, such as new stories appearing on a homepage or new comments on a blog post. Others hold less interest for those looking for information: visitation counters, advertisement content, and formatting changes have little impact on the page content.


Currently, document ranking algorithms only have a static view of the page content. In this work we explore the interaction between the dynamics of Web documents and relevance ranking, using document representations that view a document as a dynamic entity. We focus specifically on navigational searches, where there is very little variation across users in the clicked results, and where there tends to be a small number of highly relevant documents that remain relevant over time. We find that, for these queries, there are significant relationships between the likelihood of change and the relevance level of the page. We develop a novel probabilistic retrieval model that takes dynamic content into account, and show significant performance improvements over a model that only views a document at a single point in time. To our knowledge, this is the first published study looking at content change within documents from a relevance ranking perspective.

III Document Dynamics and Relevance

Documents change for many reasons. The Hindu pages change whenever new stories are added or old stories are updated, Amazon pages when new classified ads are added, and academics' home pages when new papers are published. All of these pages change at different frequencies and in different amounts. In this section we provide some examples and intuitions about how such change may be used to improve relevance ranking. We examine two change features: (1) a query-relevant feature reflecting how the terms on a page (in particular those that match the query) change over time, and (2) a query-independent feature reflecting how frequently or by how much the page changes over time. Different terms in a page's vocabulary may be more stable or more dynamic: they may remain constant over the lifetime of the page, or they may appear or disappear as the document changes.
These differences in temporal term characteristics may lend some insight into the terms' importance on the page for various information needs. For example, on the page http://allrecipes.com, a popular website for sharing and rating recipes, stable terms that appear consistently over time include: all recipes, cook, cookbooks, copyright, desserts, easy, healthy, newsroom, quick, recipe, and recipes. These terms represent a mix of navigational elements and characteristic terms that describe the overall central topic of the page. In contrast, terms that come and go during the summer months include: independence, themed, ag, fourth, macaroni, cream, zucchini, and grilled. These terms represent specific content that may have been on the page for a period of time, in this case relating to current holidays or the most recent recipes. This dynamic group of terms, although pertinent to the content of the page at a particular time, is not central to the main topic of the page. When considering whether a document is relevant for a particular query, we may wish to consider whether the information need is more likely to be addressed by consistent or by changing terms. Is the searcher more likely to be seeking dynamic or static content? Queries reflecting current events or late-breaking news may be better served by content that is recent (and thus dynamic over time). In the above example, a searcher looking for recipes to cook for the Fourth of July holiday might be satisfied with term matches in the more dynamic portion of the page. On the other hand, for navigational searches we may want to favor content that is stable over a longer period of time and characteristic of the page in general. In our example, a searcher looking for the allrecipes.com homepage would be better served by the portion of the document that does not change.
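The distinction between stable and dynamic terms can be made concrete with a small sketch. It assumes that a page is available as a series of text snapshots and that a term counts as stable when it occurs in every snapshot. The function names, the whitespace tokenization, the threshold, and the three toy snapshots are illustrative assumptions, not data or code from the study.

```python
from collections import Counter


def term_stability(snapshots):
    """Return, for each term, the fraction of snapshots in which it appears."""
    counts = Counter()
    for text in snapshots:
        # Count each term at most once per snapshot.
        counts.update(set(text.lower().split()))
    n = len(snapshots)
    return {term: c / n for term, c in counts.items()}


def split_terms(snapshots, threshold=1.0):
    """Split the vocabulary into stable terms (stability >= threshold) and dynamic terms."""
    stability = term_stability(snapshots)
    stable = sorted(t for t, s in stability.items() if s >= threshold)
    dynamic = sorted(t for t, s in stability.items() if s < threshold)
    return stable, dynamic


# Hypothetical snapshots of a recipe homepage taken at three points in time.
snapshots = [
    "recipes cook desserts easy grilled zucchini",
    "recipes cook desserts easy macaroni cream",
    "recipes cook desserts easy fourth independence",
]
stable, dynamic = split_terms(snapshots)
```

Under this split, a navigational query would be matched against `stable`, while a query about a current event would also be matched against `dynamic`.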

IV Dynamic Ranked Retrieval

We now formalize the goal of Dynamic Ranked Retrieval as a well-founded yet simple decision-theoretic model. The core component is the notion of a ranking tree, which replaces the static ranking of a conventional retrieval system. The nodes in the tree correspond to individual results (i.e., documents), and each user's search experience corresponds to a path in the tree. The path a particular user takes depends on that user's actions, in particular on whether the user decides to expand a result to view the corresponding indented ranking. Expanding a result corresponds to taking the right branch of the corresponding node in the ranking tree, and skipping corresponds to taking the left branch. For now, we assume a simple user policy in which users expand relevant documents and skip non-relevant documents. Note that users with different query intents consider different documents relevant, and so will take different paths through the tree. We will explore other user policies later, in particular policies involving noisy user behavior. It is now very natural to score the retrieval quality of a particular user's search experience via the documents encountered on her path through the ranking tree. Note that the traversed path corresponds to the final dynamic ranking presented in the user's browser, so that the i-th document on the traversed path is the i-th document the user sees. Thus, the traversed path is essentially a user-specific ranking, which we can evaluate using existing performance measures such as nDCG, average precision, or Precision@k.

V Personalization

Personalization is one of the latest trends in search engines. The two key ways to achieve personalization in authority flow-based search systems like Page Rank are using a personalized base set and adjusting the authority flow weights of the edges. The former involves selecting user-dependent entities as the source of authority in the data graph. The latter allows users to assign different importance to different types of edges.
For instance, a biologist querying NCBI Entrez genomic resources may assign a high weight to the gene-to-protein link type, whereas a practitioner may assign a higher weight to the publication-cites-publication link type. Object Rank was the first work to propose customization of the weights associated with link types. This type of ranking is referred to as authority flow ranking. The problem of achieving scalable personalization based on a personalized base set, i.e., a personalization vector, has been addressed in prior work. However, no previous work has addressed the problem of scalable link-based personalization based on user-dependent authority flow weights. The latter is the focus of this paper. The problem arises both in the context of the Web and in other databases with association links between their entities, e.g., biological, clinical, or bibliographic databases. There are two reasons why personalization of authority flow is expensive. One is that the weights associated with a link type are determined by the specific user at the time a query is submitted. The other is the query-specific vs. query-independent nature of computing the ranking. Page Rank creates a global ranking of Web pages, whereas Object Rank creates a query-specific ranking. This is achieved by adding all query-related nodes of the data graph to a base set. To summarize, the choice of a personalized authority flow weight assignment is orthogonal to the choice of the base set. Hence, our work is applicable to both the Page Rank and the Object Rank problem variants.
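To see how user-dependent link-type weights change a ranking, consider the following minimal sketch. It assumes that the weight of a link type is split evenly among a node's outgoing edges of that type, that authority originates from a base set, and that a fixed number of iterations suffices. The function `authority_flow_rank`, the edge type names, the weight values, and the four-node graph are hypothetical and chosen only to echo the biologist and practitioner example.

```python
from collections import defaultdict


def authority_flow_rank(edges, base_set, type_weights, damping=0.85, iters=50):
    """Rank nodes by authority flow; edges is a list of (source, target, edge_type)."""
    base = set(base_set)
    nodes = {n for s, t, _ in edges for n in (s, t)} | base
    # Number of outgoing edges of each type per source node.
    out_degree = defaultdict(int)
    for source, _, edge_type in edges:
        out_degree[(source, edge_type)] += 1
    jump = {n: (1.0 / len(base) if n in base else 0.0) for n in nodes}
    rank = dict(jump)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * jump[n] for n in nodes}
        for source, target, edge_type in edges:
            # Assumption: the type weight is shared equally by same-type out-edges.
            flow = type_weights[edge_type] / out_degree[(source, edge_type)]
            nxt[target] += damping * rank[source] * flow
        rank = nxt
    return rank


# Hypothetical data graph: a publication cites another publication and mentions a gene.
edges = [
    ("pubA", "pubB", "cites"),
    ("pubA", "geneG", "pub-gene"),
    ("geneG", "protP", "gene-protein"),
]
biologist = {"cites": 0.1, "pub-gene": 0.5, "gene-protein": 0.9}
practitioner = {"cites": 0.7, "pub-gene": 0.2, "gene-protein": 0.1}

bio_rank = authority_flow_rank(edges, ["pubA"], biologist)
prac_rank = authority_flow_rank(edges, ["pubA"], practitioner)
```

With the same base set, the biologist's weights place the protein above the cited publication, while the practitioner's weights reverse that order. Each new weight assignment requires a full recomputation in this sketch, which is the cost that scalable personalization aims to avoid.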


VI Quality and Scalability

Object Rank returns top-k search results for a given query using both the content and the link structure in the database graph G. Because it utilizes the link structure that captures the semantic relationships between objects, an object that does not contain a given keyword but is highly relevant to it can be included in the top-k list. This is in contrast to the static Page Rank approach, which only returns objects containing the keyword, sorted according to their Page Rank score. This key difference is one of the main reasons for Object Rank's superior result quality, as demonstrated by a previously reported relevance feedback survey. For a given query, Object Rank iterates over the entire graph G to calculate the Object Rank vector $r$ until $|r_i^{(k+1)} - r_i^{(k)}|$ is less than the convergence threshold for every component $r_i^{(k+1)}$ of $r^{(k+1)}$ and the corresponding component $r_i^{(k)}$ of $r^{(k)}$, where $k$ is the iteration number. This is a very strict stopping condition. The iterative computation may take a very long time if G has a large number of nodes and edges. Therefore, instead of evaluating a keyword query at query time, the original Object Rank system precomputes the Object Rank vectors of all keywords in H, the set of keywords, during the preprocessing stage, and then stores a list of <ObjId, RankValue> pairs per keyword. However, the preprocessing stage of Object Rank is expensive, as it requires $|H|$ Object Rank executions and $O(|V| \cdot |H|)$ bits of storage, where V denotes the set of nodes of G. In fact, according to the worst-case bounds for PPR index size proven in [4], the index size must be $\Omega(|V| \cdot |H|)$ bits for any system that returns the exact Object Rank vectors.

ScaleRank assumes a repository of precomputed rankings for a given set of authority flow weights.
It approximates the authority flow ranking for a user-specified assignment of authority flow weights by first selecting a subset of rankings from the repository and then computing a weighted combination of the selected rankings. A key principle behind ScaleRank is the authority flow linearity theorem for the aggregate surfer, whose behavior is controlled by multiple personalized rankings.

VII Algorithms for Dynamic Ranking

In the following, we propose two efficient algorithms for constructing dynamic ranking trees. Both algorithms build ranking trees top-down by recursively adding child nodes to the current leaves (similar to most decision-tree learning algorithms). Unlike StaticMyopic, document selection is performed by conditioning on the sequence of user interactions (e.g., result expansions and skips) that led the user to that node.
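A ranking tree and its evaluation along one user's path can be sketched as follows. This is not one of the two construction algorithms. It only shows the data structure such algorithms would produce and how a traversed path is scored, assuming a simple policy in which the user expands relevant documents and skips non-relevant ones. The class `RankingNode`, the document identifiers, and the two user intents are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RankingNode:
    doc: str
    skip: Optional["RankingNode"] = None    # left branch: user skips the result
    expand: Optional["RankingNode"] = None  # right branch: user expands the result


def traverse(root, relevant):
    """Return the documents seen by a user who expands relevant results and skips the rest."""
    path = []
    node = root
    while node is not None:
        path.append(node.doc)
        node = node.expand if node.doc in relevant else node.skip
    return path


def precision_at_k(path, relevant, k):
    """Fraction of the first k documents on the path that are relevant."""
    return sum(1 for d in path[:k] if d in relevant) / k


# Hypothetical tree for an ambiguous query with a "car" intent and a "cat" intent.
tree = RankingNode(
    "car1",
    skip=RankingNode("cat1", skip=RankingNode("os1"), expand=RankingNode("cat2")),
    expand=RankingNode("car2"),
)

car_path = traverse(tree, {"car1", "car2"})
cat_path = traverse(tree, {"cat1", "cat2"})
car_p2 = precision_at_k(car_path, {"car1", "car2"}, 2)
cat_p3 = precision_at_k(cat_path, {"cat1", "cat2"}, 3)
```

The two users start from the same root but see different rankings, because each later document is chosen conditioned on the expansions and skips that led to it.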


VIII The ScaleRank System Architecture

Figure 2 shows the architecture of the system, which takes a query (a weight assignment vector q) as input and outputs the top K objects based on their authority scores. The system maintains a repository of M candidate rankings. For each candidate ranking we store its weight assignment vector and its ranking vector. Given a query, the Candidate Ranking Selector selects m of the M candidate rankings in the repository based on a heuristic described below. Only m are selected because the cost of ScaleRank depends on the number of input rankings. The ScaleRank algorithm then computes the best way to linearly combine these m rankings. Finally, a top-K algorithm is used to produce the top K objects.

Figure shows the architecture of the BinRank system. During the query processing stage, we execute the Object Rank algorithm on materialized subgraphs (MSGs) instead of the full graph and produce high-quality approximations of top-k lists at a small fraction of the cost. In order to save preprocessing cost and storage, each MSG is designed to answer queries for multiple terms. We observed in the Wikipedia data set that a single MSG can be used for 330-2,000 terms, on average.
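The select, combine, and top-K pipeline can be outlined in a short sketch. The selection heuristic and the method for finding the best linear combination are not spelled out in the text, so this example substitutes two plainly labeled stand-ins: it selects the m candidates whose weight vectors are closest to q in Euclidean distance, and it combines their ranking vectors with normalized inverse-distance coefficients. All function names, the repository layout, and the numbers are hypothetical.

```python
import heapq
import math


def select_candidates(query, repository, m):
    """Stand-in heuristic: keep the m candidates whose weight vectors are nearest to the query."""
    return sorted(repository, key=lambda c: math.dist(query, c["weights"]))[:m]


def combine(query, candidates, eps=1e-9):
    """Stand-in combination: normalized inverse-distance coefficients, not an optimal fit."""
    coeffs = [1.0 / (math.dist(query, c["weights"]) + eps) for c in candidates]
    total = sum(coeffs)
    combined = {}
    for coeff, cand in zip(coeffs, candidates):
        for obj, score in cand["ranking"].items():
            combined[obj] = combined.get(obj, 0.0) + (coeff / total) * score
    return combined


def top_k(scores, k):
    """Return the k objects with the highest combined authority score."""
    return heapq.nlargest(k, scores, key=scores.get)


# Hypothetical repository of M = 3 candidate rankings over two link types.
repository = [
    {"weights": (0.9, 0.1), "ranking": {"a": 0.6, "b": 0.3, "c": 0.1}},
    {"weights": (0.1, 0.9), "ranking": {"a": 0.1, "b": 0.3, "c": 0.6}},
    {"weights": (0.5, 0.5), "ranking": {"a": 0.3, "b": 0.4, "c": 0.3}},
]
query = (0.8, 0.2)

selected = select_candidates(query, repository, m=2)
combined = combine(query, selected)
result = top_k(combined, k=2)
```

The cost of `combine` grows with the number of selected rankings, which matches the stated reason for passing only m of the M stored rankings to the combination step.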


Fig.2

IX Conclusion

This paper proposed a dynamic ranked retrieval model that allows users to interactively expand the ranking to further refine the information need. The model is based on a concise decision-theoretic framework that naturally generalizes both the standard and the intent-aware static retrieval models. The framework provides a principled way of evaluating dynamic retrieval systems, as well as a basis for deriving dynamic ranked retrieval algorithms. We presented two such algorithms and proved theoretical guarantees for their retrieval quality. We also evaluated the algorithms empirically and found that dynamic rankings can provide very substantial gains in retrieval performance. Finally, we showed that the retrieval functions of these algorithms can be learned from training data.

Our contributions in this work include the first evaluation of the relationship between document dynamics and relevance ranking, the introduction of a novel document ranking algorithm for use with dynamic documents, and a query-independent document prior based on document dynamics. We showed that these two approaches to ranking dynamic documents are complementary and that both yield significant performance gains.

In this paper we also studied the problem of finding the most probable ranking of a set of objects when preference probabilities are known for every pair of objects. We showed the connection between this problem and a problem on multi graphs, and proposed three algorithms for finding the most probable ranking. Evaluation on both synthetic and real-world datasets showed that none of the algorithms outperformed the others in all situations; each has its strengths and weaknesses. This suggests that combining the algorithms is likely to give the best results.


References

[1] J. Aalbersberg. Incremental relevance feedback. In ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 11-22, 1992.
[2] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In ACM Conference on Web Search and Data Mining (WSDM), 2009.
[3] A. Anagnostopoulos, L. Becchetti, C. Castillo, and A. Gionis. An optimization framework for query recommendation. In ACM Conference on Web Search and Data Mining (WSDM), 2010.
[4] A. Abdulkader, J. A. Drakopoulos, and Q. Zhang. Comparative classifier aggregation. In ICPR '06: Proceedings of the 18th International Conference on Pattern Recognition, pages 156-159, Washington, DC, USA, 2006. IEEE Computer Society.
[5] N. Ailon and M. Mohri. An efficient reduction of ranking to classification. Technical report, NYU, 2007.
[6] F. Balcan, N. Bansal, A. Beygelzimer, D. Coppersmith, J. Langford, and G. B. Sorkin. Robust reductions from ranking to classification. Mach. Learn., 72(1-2):139-153, 2008.
[7] Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. In KDD '06.
[8] A. Balmin, V. Hristidis, and Y. Papakonstantinou. Object Rank: Authority-based keyword search in databases. In VLDB, pages 564-575, 2004.
[9] S. Chakrabarti. Dynamic personalized Page Rank in entity-relation graphs. In WWW '07: Proceedings of the 16th International Conference on World Wide Web, pages 571-580, New York, NY, USA, 2007. ACM.
[10] J. Cho and U. Schonfeld. Rankmass Crawler: A Crawler with High Page Rank Coverage Guarantee. Proc. Int'l Conf. Very Large Data Bases (VLDB), 2007.
[11] R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee. Comparing and aggregating rankings with ties. In PODS '04.
[12] H. Hwang, A. Balmin, B. Reinwald, and E. Nijkamp. Binrank: Scaling dynamic authority-based search using materialized subgraphs. In ICDE '09, 2009, pp. 66-77.
[13] G. Jeh and J. Widom. Scaling personalized web search. In WWW '03. New York, NY, USA: ACM, 2003, pp. 271-279.
[14] D. Fogaras, B. Racz, K. Csalogany, and Sarlos. Towards Scaling Fully Personalized Page Rank: Algorithms, Lower Bounds, and Experiments. Internet Math., vol. 2, no. 3, pp. 333-358, 2005.
[15] K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte Carlo Methods in Page Rank Computation: When One Iteration Is Sufficient. SIAM J. Numerical Analysis, vol. 45, no. 2, pp. 890-904, 2007.
[16] A. Balmin, V. Hristidis, and Y. Papakonstantinou. Object Rank: Authority-Based Keyword Search in Databases. Proc. Int'l Conf. Very Large Data Bases (VLDB), 2004.
[17] Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma. Object-Level Ranking: Bringing Order to Web Objects. Proc. Int'l World Wide Web Conf. (WWW), pp. 567-574, 2005.
