Surve Akshay J., Shah Manav B., Tripathy Amiya K.
Don Bosco Institute of Technology
akshay.surve@yahoo.com, reachmanav@gmail.com, tripathy.a@gmail.com


Abstract

Search engines use ranking algorithms to determine the order in which web pages are returned on the results page in response to a user query. However, users are the best judges of a web page's relevance to their search query. This paper presents an approach to optimizing the relevance of search engine results through explicit user feedback. The goal of this paper is to build an experimental model named "UFeedRevs - User Feeds Relevant Results". Experimental results on a set of subjective queries suggest a marked improvement in the ranked search result list.


1 Introduction

Search engines build indexes based on the frequency of occurrence and position of keywords, along with link popularity and back-link analysis [14]. Search engines do a good job when queries are clear and specific, but user queries are often ill-formed and ambiguous. When a query such as "sports" is entered into a search engine, it is unclear which results the user expects. In this scenario, human intelligence seems a better judge of the relevance of search results to the query. In this paper, we propose a model named UFeedRevs which incorporates user feedback along with the other factors involved in the ranking algorithm.

This paper is structured as follows: Section 2 describes some existing approaches to improving the relevance of search results. Section 3 elaborates the proposed system and its experimental architecture for incorporating user feedback into the ranking process. Section 4 describes the UFeedRevs system, which collects feedback for web pages, computes ratings based on that feedback, and displays the ratings in the search result list. Section 5 presents the evaluation criteria for the UFeedRevs system. In Section 6, we discuss actual experimental results from sample queries.


2 Related Work

Various approaches have been explored to improve the relevance of web search results. Some of them are as follows:

Web click-through data
User click-through data can be extracted from the large volume of search logs accumulated by web search engines. These logs typically contain user-submitted search queries, followed by the URLs of the web pages clicked by users on the corresponding search result page. However, the problem is that the vote is cast before the page has actually been viewed for the query. Click-through data may therefore be misleading and may attach inaccurate metadata to the associated web pages, as discussed in [14, 9].


Active feedback

This approach is based on the concept of a balanced tree and presents critical questions that guide users toward giving proper feedback during further searching [13]. An active feedback approach analyzes the distribution of web pages and suggests appropriate feedback to users. Users are guided toward faster searching and are asked interactive questions. Moreover, because the system tries to understand what impact the feedback will have, the items clicked for feedback greatly benefit the system's performance [11]. The major drawback of this approach is the high level of user involvement it requires.


Adaptive web search

In this scheme, search results adapt to users with different information needs. Search systems that adapt to each user's preferences [12] can be built by constructing user profiles based on modified collaborative filtering [4]. However, such a system is exhaustive to maintain. Moreover, a user might occasionally need to retrieve information that differs from what his profile defines.


3 Proposed System

This section describes UFeedRevs as implemented by us for experimental purposes.


The following tasks are involved in the system:

• Find search results for a particular query.
• Fetch feedback ratings (if any) for those web pages and display them alongside the search results.
• Accumulate user feedback.
• Filter user feedback.
• Compute new feedback ratings based on the newly arrived feedback, applying a forgetting factor to the old ones.
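The tasks above can be sketched as a minimal pipeline. Everything in this sketch is an illustrative assumption on our part (the function names, the in-memory stores, the tiny batch size); the actual feedee prototype sits on the Google Search API and database-backed stores, and the filtering and forgetting-factor steps are elided here, with a plain mean standing in for the full UFeedRevs rating.

```python
# Minimal sketch of the UFeedRevs task pipeline (all names are assumptions).

N_BATCH = 3  # batch size 'N' after which ratings are recomputed (small for demo)

feedback_logs = {}     # (query, url) -> pending feedback values (0-100)
feedback_ratings = {}  # (query, url) -> current rating

def search(query):
    """Stand-in for the conventional search engine behind feedee."""
    return ["http://example.com/a", "http://example.com/b"]

def results_with_ratings(query):
    """Tasks 1 and 2: fetch results and attach any stored ratings."""
    return [(url, feedback_ratings.get((query, url))) for url in search(query)]

def submit_feedback(query, url, value):
    """Task 3: accumulate feedback; recompute once a full batch arrives."""
    batch = feedback_logs.setdefault((query, url), [])
    batch.append(value)
    if len(batch) >= N_BATCH:
        # Tasks 4 and 5 (filtering, forgetting factor) are elided;
        # a plain mean stands in for the full rating computation.
        feedback_ratings[(query, url)] = sum(batch) / len(batch)
        batch.clear()
```

A page's rating stays empty until a full batch of feedback has arrived, mirroring the batch-of-N behaviour described later in the paper.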

To test the UFeedRevs approach, we built a prototype named feedee which performs the above-mentioned tasks. It is available for testing at http://feedee.dbit.in. To avoid building a full-fledged search engine, we used the Google Search API to access Google search results. In practice, UFeedRevs should be closely tied to the search engine to exploit its full potential. Below is a short description of our prototype feedee.


Figure 3.1: Architecture of feedee

As seen in Figure 3.1, the user submits a query to feedee. The Core then queries a conventional search engine (in this case, Google) and fetches the search results. The results, along with the computed ratings from the Feedback Ratings database, are shown to the user. The user may browse any of the search results, read through them, and provide feedback for the web pages on a scale (here 0-100). This feedback is stored in the Feedback Logs database. The Feedback Engine calculates the actual feedback rating for the documents and stores it in the Feedback Ratings database, which maintains the relationship between the search results, the query, and the computed feedback.


4 The UFeedRevs Algorithm

The algorithm proceeds by collecting feedback (also called feedee calories), classifying it, and computing final ratings.

Collecting feedback
The system allows a user to give feedback on a search result page for the query after analyzing the page. The user gives feedback on a scale ranging from 0 to 100. A feedback for a web page is defined as a vector.

Here, 'N' feedbacks are gathered in a batch, 'N' being a large number. Ratings for a web page are calculated only once 'N' feedbacks have been received for it.

Classifying feedback

Collected user feedback is classified into classes, yielding a set of discrete values for each class. Thus, we maintain a record of the number of users who have chosen a particular class of feedback.

For example:

feed[10] = 20; // 20 users have given a feedback rating of 10 on the 0-100 scale
feed[20] = 22; ...
feed[100] = 2;

In general, feed[i] = Ni, where Ni is the number of users who have given a rating of i for a web page.
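The classification step can be sketched as a simple binning pass. The class width of 10 is our assumption, inferred from the feed[10], feed[20], ... example; the paper does not state it explicitly.

```python
from collections import Counter

def classify(feedbacks, width=10):
    """Bin raw 0-100 feedback values into discrete classes.

    Each value is rounded down to the nearest multiple of `width`,
    mirroring the feed[10], feed[20], ... record in the text. The
    class width of 10 is an assumption, not stated by the paper.
    """
    feed = Counter((v // width) * width for v in feedbacks)
    return dict(feed)
```

For instance, `classify([12, 15, 27, 100])` places two values in class 10, one in class 20, and one in class 100.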

UFeedRevs System Parameters

Figure 4.1 shows the various parameters involved in the system. The process starts with identification of the 'Mode' (i.e., the value with the highest number of feedbacks), which is 50 in this case.

Figure 4.1: UFeedRevs System Parameters

The dispersion is then computed, along with the cut-off line. This process focuses the feedback-rating evaluation on mass user opinion and tries to reduce the effect of noise.

The various steps involved are as follows:

Mode evaluation:

We define the mode as the value with the highest number of feedbacks. After classification, we evaluate which class has the maximum support among users; this value becomes the mode.
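Mode evaluation over the classified feed[] record reduces to picking the class with the largest count. The tie-breaking rule (lower class wins) is our assumption, as the paper does not specify one.

```python
def mode_class(feed):
    """Return the class with the highest feedback count (the Mode).

    `feed` maps class value -> number of users, e.g. {10: 20, 20: 22}.
    Ties are broken in favour of the lower class value; this tie rule
    is an assumption, since the paper does not state one.
    """
    return min(feed, key=lambda c: (-feed[c], c))
```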

Statistical Dispersion around Mode:

Dispersion is a measure of the diversity in a data set. Its value is zero for identical data and increases with diversity. An important measure of dispersion is the standard deviation, which is the square root of the variance.

Variance determines how far the actual values are from the mean (expected value). The variance σ² can be defined as:

σ² = E[(X − μ)²]

We consider the deviation around the arithmetic mode and include the number of feedbacks for the corresponding classes falling within that range; these classes form the Dispersion Set X.

The mean of all the elements of the Dispersion Set is then computed. This value is reduced by an empirical constant, derived by successive testing, to give what we call the cut-off parameter.


Cut-off Set evaluation:

An imaginary horizontal line is drawn at the cut-off value on the graph. The elements of the feed set that it cuts on its way become members of the 'Cut-off Set'.

We use the Cut-off Set to take other significant classes into account along with those in the Dispersion Set.

Web page rating R

First, we define a union of the two sets that forms the basis for computing the rating. The final set T is defined as:

T = Dispersion Set ∪ Cut-off Set

The arithmetic mean of the final set T then becomes the rating 'R' for the page.
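The rating computation can be sketched end-to-end, with strong caveats: the paper's exact Dispersion Set definition and cut-off reduction constant are not reproduced here, so this sketch assumes the Dispersion Set holds the classes within one standard deviation of the mode, uses an arbitrary reduction constant `lam`, and takes the plain (unweighted) arithmetic mean of the class values in T.

```python
import math

def rating(feed, lam=1.0):
    """Sketch of the UFeedRevs page rating R (several assumptions).

    `feed` maps class value -> user count. Assumptions: the Dispersion
    Set holds the classes within one standard deviation of the mode;
    `lam` stands in for the paper's empirical cut-off reduction; the
    final rating is the unweighted mean of the class values in T.
    """
    total = sum(feed.values())
    mode = min(feed, key=lambda c: (-feed[c], c))
    # Standard deviation of the feedback values around the mode.
    sigma = math.sqrt(sum(n * (c - mode) ** 2 for c, n in feed.items()) / total)
    dispersion = {c for c in feed if abs(c - mode) <= sigma}
    # Cut-off value: mean count inside the Dispersion Set, reduced by lam.
    cutoff_value = sum(feed[c] for c in dispersion) / len(dispersion) - lam
    cutoff = {c for c in feed if feed[c] >= cutoff_value}
    t = dispersion | cutoff
    return sum(t) / len(t)
```

With a sharply peaked distribution such as `{40: 10, 50: 30, 60: 12, 100: 1}`, the noise class at 100 falls outside both sets and the rating collapses to the mode, which is the noise-suppressing behaviour the text describes.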

Aging and Flushing

While computing new ratings for a web-page-and-keyword pair, old ratings may already exist for it. Very old feedback may no longer be relevant, because the web page may have been updated or user trends may have changed. Thus, there is a need for a model which flushes very old feedback while letting recent and current feedback influence the ratings.

Let N_old be the number of user feedbacks preserved and R_old be their cumulative rating. Then, if N_new and R_new represent the size of the current batch and the current rating respectively, the new rating R can be derived as follows:

R = (N_old · R_old + N_new · x · R_new) / (N_old + N_new · x)

where x is an optional multiplier which may be used to boost the weight of the current feedback rating.

When x = 1, the formula reduces to its simple weighted-average form, and thus more importance is given to the accumulated feedback, since N_new would always be less than N_old.

As N_new · x tends to N_old, R approaches the arithmetic mean of R_old and R_new.

Old feedback is thus flushed from the system gradually while new feedback is incorporated.
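The aging formula translates directly into code; the parameter names n_old and n_new (preserved feedback count and batch size) are our labels for the quantities described above.

```python
def aged_rating(n_old, r_old, n_new, r_new, x=1.0):
    """New rating as a weighted average of old and incoming batch ratings.

    x > 1 boosts the weight of the current batch. With x = 1 this is a
    plain weighted average, and when n_new * x equals n_old the result
    is exactly the arithmetic mean of r_old and r_new.
    """
    return (n_old * r_old + n_new * x * r_new) / (n_old + n_new * x)
```

For example, with an old rating of 80 over 300 preserved feedbacks and a new batch of 100 rating the page at 60, the rating decays to 75 rather than jumping to the batch mean.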

Counterattacking possible drawbacks

UFeedRevs is expected to perform well in an ideal environment. However, it may be subjected to unfair ratings.

A company web site can seek help from a group of users in order to be given unfairly high or low ratings by them. This has the effect of boosting or degrading a company's reputation, thereby allowing the company's web page a higher ranking than it deserves.
The filtering is expected to be most effective when the proportion of unfair raters is low. As the proportion of unfair raters increases, it becomes more difficult to determine which raters are truthful and which are lying.

Dellarocas et al. [7] identify two categories of unfair ratings:

• unfairly positive ratings (called 'ballot stuffing')
• unfairly negative ratings (called 'badmouthing')

This is best understood with the following example: suppose a particular web page has a high user rating (say 92%). If a user then rates it as just 1% relevant, this feedback can be considered dishonest and thus rejected.

More sophisticated filtering techniques, such as those developed for Bayesian reputation systems, could alternatively be used with the system [2].
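A crude version of this distance-based rejection might look as follows; the threshold value of 50 is an illustrative assumption, not a parameter from the paper.

```python
def filter_unfair(feedbacks, current_rating, max_gap=50):
    """Drop feedback values implausibly far from the page's current rating.

    A simple distance filter in the spirit of the 92%-vs-1% example:
    any value more than `max_gap` points from the current rating is
    treated as ballot stuffing or badmouthing and rejected. The
    threshold of 50 is an illustrative assumption.
    """
    return [v for v in feedbacks if abs(v - current_rating) <= max_gap]
```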


5 Evaluation Criteria

In this section, several metrics are described which can be used to compare two ranked lists.


Precision

Precision is applied to measure the performance of our proposed algorithm. Given a query Q, let R be the set of pages relevant to the query and |R| be the size of that set; let A be the set of the top 20 results returned by our system. Precision is defined as:

Precision = |A ∩ R| / |A|

Precision measures the degree to which the algorithm gives accurate results.
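Under the standard set-based reading of this metric (|A ∩ R| / |A|, which is our reconstruction of the elided formula), precision is a one-liner:

```python
def precision(relevant, returned):
    """Fraction of returned pages that are relevant: |A ∩ R| / |A|."""
    a = set(returned)
    return len(a & set(relevant)) / len(a)
```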


Probation

Given a query, we ask ten volunteers to identify the top 10 authoritative pages according to their own judgment. The set of these 10 authoritative web pages is denoted by M, and the set of the top 10 results returned by the search engine is denoted by N.

Probation measures the ability of the algorithm to produce the pages that are most likely to be visited by users. The probation measurement is more closely related to users' degree of satisfaction with the performance of a web search engine.

Relevance of search results

The results displayed for a particular query should be relevant to that query. Relevancy is a relative term, and therefore the user is the key judge of how relevant a web page is to a particular query.

Result overlap

The results are then compared with the results of some standard web search engines. This overlap criterion is useful for checking how much impact user feedback has had on the search system.

Given the same data set, the distributed search engine system is expected to return a very similar, if not identical, ranked page list to the user-feedback results.
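A simple set-based overlap measure consistent with this criterion might look as follows; the paper gives no formula, so ignoring rank positions and normalising by the shorter list are our choices.

```python
def result_overlap(list_a, list_b):
    """Fraction of shared URLs between two top-k result lists.

    Overlap is measured on set membership only; rank positions are
    ignored, and the count is normalised by the shorter list. Both
    choices are assumptions, as the paper states no formula.
    """
    a, b = set(list_a), set(list_b)
    return len(a & b) / min(len(a), len(b))
```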


6 Experimental Results

We used a set of sample queries to test our system. The following shows the performance of the system on two of those queries:

Query: "3d animation"

Query: "run a marathon"

The above queries are subjective in their own context; humans can analyze the content for its substance and quality. This section uses two terms:

• μ (Pure Average): the plain average of all the feedbacks collected in the batch.
• Calories: the rating of the page when the UFeedRevs algorithm is applied.

As shown in Figure 6.1, our system filters out dishonest feedback and any feedback which does not have much impact on the system.

Figure 6.2 is a typical scenario wherein users are confused about the rating for the web page. Thus, the system rating approximates the arithmetic mean.

Figure 6.3 drives out inconsistencies in the feedback and provides a rating which most of the users consider appropriate.

Figure 6.4 shows the rating when two modes are possible because they occur with very similar user frequencies.

Figure 6.1: Query 1: μ: 71.4, Calories: 81.83

Figure 6.2: Query 1: μ: 49.2, Calories: 53.38

Figure 6.3: Query 1: μ: 58, Calories: 70.67

Figure 6.4: Query 1: μ: 52.8, Calories: 52.8


7 Conclusion

UFeedRevs is our attempt to improve the relevance of web search results by incorporating explicit user feedback, thus introducing a human element into the process of ranking web pages.

After testing UFeedRevs, its viability as a fully functional web search engine is still unclear; the controlled testing environment made it difficult to estimate the impact it would have at the scale of a full web search engine.

Our prototype feedee, deployed at Don Bosco Institute of Technology, showed signs of adapting to the tastes of its users. Such a system therefore has potential either as a standalone implementation or as an abstraction layer over conventional search engines.

The favorable response from the faculty and student community of Don Bosco Institute of Technology and other institutes reinforces the faith we have in UFeedRevs.


Acknowledgments

This work is fully supported by Don Bosco Institute of Technology, Mumbai, India. We thank Dr. S. Krishnamoorthy, Dr. N. G. Joag, and Dr. Revathy Sundararajan for their invaluable support of this work.


References

[1] A. Jøsang, E. Gray, and M. Kinateder. Analysing Topologies of Transitive Trust. In Proceedings of the Workshop of Formal Aspects of Security and Trust (FAST 2003), Pisa, September 2003.
[2] A. Whitby, A. Jøsang, and J. Indulska. Filtering Out Unfair Ratings in Bayesian Reputation Systems. The Icfain Journal of Management Research, 4(2), pp. 48-64, February 2005.
[3] A. Jøsang, S. Hird, and E. Faccer. Simulating the Effect of Reputation Systems on E-markets. Lecture Notes in Computer Science, Volume 2692, January 2003, pp. 179-194.
[4] B. Chidlovskii, N. S. Glance, and M. A. Grasso. Collaborative Re-Ranking of Search Results. Xerox Research Centre Europe.
[5] S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the 7th International WWW Conference, Brisbane, Australia, 1998, pp. 107-117.
[6] Z. Chen, S. Liu, L. Wenyin, G. Pu, and W. Ma. Building a Web Thesaurus from Web Link Structure. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2003.
[7] C. Dellarocas. Immunizing Online Reputation Reporting Systems Against Unfair Ratings and Discriminatory Behavior. Proceedings of the ACM Conference on Electronic Commerce, pp. 150-157, 2000.
[8] C. N. Dellarocas. Sanctioning Reputation Mechanisms in Online Trading Environments with Moral Hazard. MIT Sloan Working Paper No. 4297-03, July 2004. Available at SSRN: http://ssrn.com/abstract=393043 or DOI: 10.2139/ssrn.393043.
[9] G.-R. Xue, S. Huang, Y. Yu, H.-J. Zeng, Z. Chen, and W.-Y. Ma. Optimizing Web Search Using Spreading Activation on the Clickthrough Data. Lecture Notes in Computer Science, Volume 3306, January 2004, pp. 409-414.
[10] I. Rogers. The Google PageRank Algorithm and How It Works. Article.
[11] K. Sugiyama, K. Hatano, M. Yoshikawa, and S. Uemura. User-Oriented Adaptive Web Information Retrieval Based on Implicit Observations. Lecture Notes in Computer Science, Volume 3007, January 2004, pp. 636-643.
[12] M. Yoshikawa. Adaptive Web Search Based on User Profile Constructed without Any Effort from Users. Nagoya University, Furo, Chikusa, Nagoya, Aichi 464-8601, Japan. yosikawa@itc.nagoya-u.ac.jp
[13] R.-I Chang and J.-M. Ho. Active Feedback for Effective Web Search. Technical Report No. TR-IIS-05-013, September 2005. http://www.iis.sinica.edu.tw/LIB/TechReport/tr2005/tr05.html
[14] S. Sadi and H. R. Jamali. Shifts in Search Engine Development: A Review of Past, Present and Future Trends in Research on Search Engines. Webology, 1(2), Article 6, 2004.
[15] W. Buntine, J. Löfström, J. Perkiö, S. Perttu, V. Poroshin, T. Silander, H. Tirri, A. Tuominen, and V. Tuulos. A Scalable Topic-Based Open Source Search Engine. July 5, 2005.
