Fuzzy Information Retrieval pt1
INFORMATION RETRIEVAL
UNIVERSITY OF TRIESTE
Introduction
Document representation
Indexing function
Query representation
Query matching
Similarity function
Discussion
Bibliography
Introduction
The Boolean retrieval model is the best-known and most widely used information retrieval model, probably
because of its simplicity. This simplicity comes with several disadvantages:
a) Good queries are difficult to formulate, because the Boolean operators are hard to combine correctly.
b) The importance of the terms within a query cannot be specified.
c) Retrieved documents cannot be ranked: if the list of retrieved documents must be truncated, there is no
measure of importance with which to discard the least relevant documents.
d) Terms in the body of a document are treated with the same importance as terms in the
title.
To overcome these issues, more advanced models have been developed, such as the Vector Space model, the
Bayesian Network model, and the Probabilistic model. However, these models differ substantially from the
Boolean model. Alternatively, the Boolean model can be extended with fuzzy set theory. Instead of
assigning 1 to a term that is present in a document and 0 otherwise, we use the degree of membership
of the term in the document: the larger the membership, the more important the term is for characterizing
the content of the document.
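To make the difference between the two representations concrete, here is a minimal Python sketch that builds both a Boolean and a fuzzy representation of a tokenized document. The membership function used here (the term frequency divided by the largest term frequency in the document) is only an assumption for the example; the indexing function adopted in this report may differ. The function names `boolean_index` and `fuzzy_index` are hypothetical.

```python
from collections import Counter


def boolean_index(doc_tokens):
    """Boolean representation: every term that occurs gets weight 1."""
    return {term: 1 for term in set(doc_tokens)}


def fuzzy_index(doc_tokens):
    """Fuzzy representation: each term gets a membership degree in [0, 1].

    Assumption for illustration: membership = tf(t, d) / max tf in d,
    so the most frequent term has membership 1. Absent terms have membership 0.
    """
    tf = Counter(doc_tokens)
    if not tf:  # empty document: no term belongs to it
        return {}
    max_tf = max(tf.values())
    return {term: count / max_tf for term, count in tf.items()}


doc = "fuzzy retrieval uses fuzzy sets and fuzzy logic for retrieval".split()
print(sorted(boolean_index(doc).items()))
print(sorted(fuzzy_index(doc).items()))
```

Both representations record that "uses" occurs in the document, but only the fuzzy one records that "fuzzy" (three occurrences) characterizes the document more strongly than "uses" (one occurrence).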
First, fuzzy logic and fuzzy set theory are presented. Then, the use of fuzzy set theory in an information
retrieval system is described. Finally, our own implementation of a fuzzy information retrieval system is presented.
Boolean operator    Fuzzy (Zadeh) operator
AND(x, y)           min(x, y)
OR(x, y)            max(x, y)
NOT(x)              1 − x
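Note that, if true is encoded as 1 and false as 0, the Zadeh operators behave exactly like the Boolean operators:
on the values 0 and 1, min(x, y) coincides with AND, max(x, y) with OR, and 1 − x with NOT.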
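The operators in the table can be sketched in a few lines of Python. The sketch checks that on the crisp values 0 and 1 they reproduce the Boolean truth tables, and then evaluates the example query "fuzzy AND (retrieval OR sets)" on a small index. The membership degrees and the names `f_and`, `f_or`, `f_not`, `mu` and `query` are hypothetical, chosen only for illustration; this is not necessarily the implementation presented in this report.

```python
from itertools import product


# Zadeh operators on membership degrees in [0, 1]
def f_and(x, y):
    return min(x, y)


def f_or(x, y):
    return max(x, y)


def f_not(x):
    return 1 - x


# On crisp values {0, 1} the Zadeh operators reproduce the Boolean truth tables.
for x, y in product((0, 1), repeat=2):
    assert f_and(x, y) == int(bool(x) and bool(y))
    assert f_or(x, y) == int(bool(x) or bool(y))
for x in (0, 1):
    assert f_not(x) == int(not x)

# Hypothetical membership degrees mu(term, doc), for illustration only.
index = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.4},
    "d2": {"fuzzy": 0.6, "retrieval": 0.7},
    "d3": {"fuzzy": 0.2, "sets": 0.8},
}


def mu(doc, term):
    """Membership of term in doc; absent terms have membership 0."""
    return index[doc].get(term, 0.0)


def query(doc):
    # Example query: fuzzy AND (retrieval OR sets)
    return f_and(mu(doc, "fuzzy"), f_or(mu(doc, "retrieval"), mu(doc, "sets")))


# Graded scores make it possible to rank the documents.
ranking = sorted(index, key=query, reverse=True)
print(ranking)
```

If every nonzero membership were treated as 1, all three documents would satisfy the Boolean query and would be returned without any order. The fuzzy scores (0.6, 0.4 and 0.2) instead rank them d2, d1, d3, which addresses disadvantage c) from the introduction.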