Fuzzy Information Retrieval pt5
The portion on the left is a process that is performed only once. Documents are split into tokens,
stopwords are removed, stemming is performed, and quantifiers are identified. Then the inverted index is
built, and for each document a fuzzy set µ_{d_j} over the index terms is computed. Each index term in the document
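The one-time indexing step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the `naive_stem` suffix stripper, and the use of normalized term frequency as the membership degree are all assumptions made for the example, and quantifier identification is left out.

```python
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def naive_stem(token):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Tokenize, drop stopwords, and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def build_index(docs):
    """Return (inverted index, per-document fuzzy sets over index terms).

    Membership degree is assumed to be term frequency divided by the
    frequency of the most common term in the same document.
    """
    inverted = defaultdict(set)
    memberships = {}
    for doc_id, text in docs.items():
        counts = Counter(index_terms(text))
        max_count = max(counts.values(), default=1)
        memberships[doc_id] = {t: c / max_count for t, c in counts.items()}
        for term in counts:
            inverted[term].add(doc_id)
    return inverted, memberships
```

Calling `build_index` on a dictionary that maps document ids to raw text returns both structures, so the query stage never has to touch the original text again.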
The portion on the right, together with the one at the bottom, is carried out every time the user submits a
query. Each “predicate” of the query is treated like a document and then merged with the other parts using
fuzzy operators. The result is a fuzzy set representing the query, with weights associated with the query
terms. At this point, the similarity between each document fuzzy set and the query fuzzy set is computed.
The result of the query is a ranked list of documents, optionally limited by a maximum number of results or
by a similarity threshold.
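The query-time steps can be illustrated with a short sketch. The text does not say which fuzzy operators or which similarity measure the system uses, so the choices here are assumptions: predicates are merged with the standard max operator for fuzzy union, and similarity is a fuzzy Jaccard ratio (sum of minima over sum of maxima). The function names are hypothetical.

```python
def fuzzy_union(a, b):
    """Merge two fuzzy sets (dict: term -> degree) with the max operator."""
    return {t: max(a.get(t, 0.0), b.get(t, 0.0)) for t in a.keys() | b.keys()}


def similarity(doc, query):
    """Fuzzy Jaccard similarity between a document and a query fuzzy set."""
    terms = doc.keys() | query.keys()
    num = sum(min(doc.get(t, 0.0), query.get(t, 0.0)) for t in terms)
    den = sum(max(doc.get(t, 0.0), query.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0


def rank(docs, query, max_results=None, threshold=0.0):
    """Rank documents by similarity, applying the optional limits."""
    scored = [(doc_id, similarity(fs, query)) for doc_id, fs in docs.items()]
    scored = [(d, s) for d, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if max_results is None else scored[:max_results]
```

Here `max_results` and `threshold` correspond to the two ways of limiting the ranked list mentioned above; leaving both at their defaults returns every document with its score.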
The authors of the paper tested this IR system on a university database composed of 1500 papers. Precision
and recall were used to measure the system's performance. They reported better precision and recall than
the Boolean model on the same set of queries. However, this improvement in performance comes at the cost
of increased computation time.
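For reference, precision and recall for a single query can be computed as in the sketch below. The function name and the document ids used to exercise it are made up for illustration and are not taken from the paper's evaluation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / documents retrieved
    recall    = relevant documents retrieved / relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging these two values over a fixed set of queries gives one way to compare two retrieval models on the same collection.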
Bibliography
Oussalah, M., Khan, S. and Nefti-Meziani, S. (2008). Personalized information retrieval system in the
framework of fuzzy logic. Expert Syst. Appl. 35, 423–433.
Cross, V. (1994). Fuzzy information retrieval. J. Intell. Inf. Syst. 3, 29–56.
Kraft, D. et al. (1994). An extended fuzzy linguistic approach to generalize Boolean information retrieval.
Information Sciences - Applications 2, 119–134.
Kraft, D. and Buell, D. (1983). Fuzzy sets and generalized Boolean retrieval systems. Int. J. Man Mach.
Stud. 19, 45–56.