parsed by the process and for each result, theURL, Title, Description, Rank andSearchSource are stored, creating a Resultobject. These results are entered into a
TreeMap
data structure with the key as the
url
and the item as the
Result
object.The GUI also provides for advanced searchoptions for entering Boolean queries, Phrasesearches, selecting the number of results per search engine and the selection of searchengines to be queried.
5.1 Design Decisions
During the design of Tadpole, wevarious design decisions were taken. Someof them are listed below:
Why TreeMap?
TreeMap data structure combines the nicefeatures of a tree ( low search and retrievaltime) and Map (easy association) datastructures. By storing the results with theURL as the key, we can retrieve a result in(log n) time while removing the duplicatesand merging them in the ranking algorithm.This helps in a considerable speed up whenwe have hundreds of results from eachsearch engine.The TreeMaps thus obtained from each of the threads are then inserted in an array and passed on to the Ranking algorithm. TheRanking algorithm then returns a tree mapsorted on rank.
Why these three ranking strategies?
The positional methods are computationallymore efficient. They give a good precisionwhen compared to just aggregation of resultswithout using any ranking. The scaled-footrule method is computationally morecomplex, but is proven to have given much better performance. It is also useful in thereduction of spam to an extent. As the basicidea of this project was to study the trade-offs involved, we wanted to get a gradationin the level of computational complexity and performance and so we chose these threerank aggregation methods.
5.2 Ranking Aggregation MethodsImplementedTake the Best Rank
In this algorithm, we try to place a URL atthe best rank it gets in any of the searchengine rankings.That is,MetaRank (x) =Min(Rank1(x),Rank2(x),…. , Rankn(x));Clashes are avoided by an ordering of thesearch engines based on popularity. Thatmeans, if two results claim the same positionin the meta-rank list, the result from a more popular search engine, (say Google) is preferred to the result from a less popular one.
Borda’s Positional Method
In this algorithm,
the MetaRank of a url isobtained by computing the Lp-Norm of theranks in different search engines.MetaRank(x)=[Σ(Rank1(x)
p
,Rank2(x)
p
,…. , Rankn(x)
p
)]
1/p
In our algorithm, we have considered theL1-Norm which is the sum of all the ranksin different search engine result lists.Clashes are again avoided by search engine popularity.The search source for a URL, which isdisplayed in the meta search results, is set asthe search engine in which the URL isranked the best.
Scaled Footrule Optimization Method
In this algorithm, the scaled footruledistances are used to rank the variousresults. Let T1, T2 , … Tn be partial listsobtained from various search engines. Lettheir union be S. A weighted bipartite graphfor scaled footrule optimization (C,P,W) isdefined asC = set of nodes to be rankedP = set of positions availableW(c,p) = is the scaled- footrule distance( from the Ti’s ) of a ranking that placeselement ‘c’ at position ‘p’, given byW(c,p) =
∑
I=1k
| Ti(c)/|Ti| - p/n|Where n = number of results to be rankedand |Ti| gives the cardinality of Ti.Computation of foot-rule aggregation for partial lists is NP-hard [1]. Hence the use of
Add a Comment