
Volume 4 Issue 6 June 2012

Editor’s Desk

Field Directed Subject Search Options in J-Gate
The advanced search module of J-Gate offers several field-directed search options in the drop-down menu of each search window (see the screenshot at the end). Three of them, namely ‘Title only’, ‘Keyword only’ and ‘Abstract only’ (plus all three together), are field-directed subject search options. In addition, advanced search has an option to search on ‘All fields’, which is similar to Quick search. These field-directed subject search options appear simple and clear. Are they really?

Users often enter simple terms or phrases for their subject search, with or without the Boolean OR or AND offered in the drop-down menu beside the adjacent search box. When it comes to choosing a field (‘Title’, ‘Keyword’ or ‘Abstract’) for a subject search, most users make an ad hoc decision based on a hunch or resort to trial and error. It is important to note that ‘Title’ is usually not sufficiently rich in the required terms, which causes low recall but often better precision. The ‘Keyword’ field is slightly richer, leading to an increase in recall. The serious trap is the ‘Abstract’ field, which is usually over-rich in the required terms and hence may retrieve all relevant items, but along with a very high proportion of irrelevant ones. In other words, field-directed searches on ‘Title’, ‘Keyword’ and ‘Abstract’ are in increasing order of recall and decreasing order of precision.

Obviously, searching in all three fields together may make the result worse and less likely to serve a precise purpose, unless the term is a rare one like GSLV in the table below. Generally, a free-text metadata search restricted to Title will not retrieve most of the relevant items, but will produce a low proportion of non-relevant items. On the other hand, a search restricted to the free-text Abstract retrieves most of the relevant items, but in the process produces a very high proportion of non-relevant items.
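The trend described above can be illustrated with a small sketch. Everything in it is hypothetical: the six records, their relevance judgements and the helper names `search` and `recall_precision` are invented for illustration and say nothing about how J-Gate itself matches terms. It simply runs the same term against one field at a time and reports recall and precision as percentages.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    keyword: str
    abstract: str
    relevant: bool  # hypothetical relevance judgement for the query "indexing"


# Invented records, used only to illustrate the trend; not J-Gate data.
RECORDS = [
    Record("Indexing languages for retrieval", "indexing; thesaurus",
           "Compares indexing languages.", True),
    Record("Thesaurus construction", "indexing; vocabulary control",
           "Describes subject indexing with a thesaurus.", True),
    Record("Subject access in digital libraries", "metadata; subject access",
           "Reviews indexing practice in repositories.", True),
    Record("Stock market trends", "finance; markets",
           "Uses a price indexing method for inflation.", False),
    Record("Web crawler design", "crawling; indexing pipeline",
           "Pages are fetched before indexing begins.", False),
    Record("Soil chemistry", "soil; nitrogen",
           "Measures nitrogen uptake.", False),
]


def search(records, term, fields):
    """Return records whose chosen fields contain the term (case-insensitive)."""
    term = term.lower()
    return [r for r in records
            if any(term in getattr(r, f).lower() for f in fields)]


def recall_precision(records, retrieved):
    """Recall and precision, as percentages, for one retrieved set."""
    relevant_total = sum(r.relevant for r in records)
    hits = sum(r.relevant for r in retrieved)
    recall = 100 * hits / relevant_total if relevant_total else 0.0
    precision = 100 * hits / len(retrieved) if retrieved else 0.0
    return round(recall, 1), round(precision, 1)


# Same term, one field at a time.
for field in ("title", "keyword", "abstract"):
    found = search(RECORDS, "indexing", [field])
    print(field, len(found), recall_precision(RECORDS, found))
```

With this toy collection, recall rises and precision falls as the search moves from title to keyword to abstract. Real collections will not behave so neatly, but the direction is the one described in the text.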

Some sample field-directed subject searches (number of records retrieved)

Field/s                      Indexing    “mobile communication”    "Geosynchronous Satellite Launch Vehicle" OR GSLV
Title                        2481        571                       1
Keyword                      3057        1089                      4
Abstract                     6357        1838                      13
T+K+A                        8608        2793                      26
All fields (Quick search)    8622        2875                      28
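To see how quickly the result set grows from one field to the next, the counts in the sample searches can be expressed as multiples of the Title-only count. The sketch below does only that arithmetic. The assignment of each run of five numbers to a query is a reading of the sample data above, and the names `COUNTS` and `growth_over_title` are made up for this illustration. Note that a larger result set says nothing by itself about recall or precision, since those need relevance judgements.

```python
# Record counts per query and field, as read from the sample searches above.
COUNTS = {
    "Indexing": {
        "Title": 2481, "Keyword": 3057, "Abstract": 6357,
        "T+K+A": 8608, "All fields": 8622,
    },
    "mobile communication": {
        "Title": 571, "Keyword": 1089, "Abstract": 1838,
        "T+K+A": 2793, "All fields": 2875,
    },
    '"Geosynchronous Satellite Launch Vehicle" OR GSLV': {
        "Title": 1, "Keyword": 4, "Abstract": 13,
        "T+K+A": 26, "All fields": 28,
    },
}


def growth_over_title(counts):
    """Express each field's count as a multiple of the Title-only count."""
    base = counts["Title"]
    return {field: round(n / base, 2) for field, n in counts.items()}


for query, counts in COUNTS.items():
    print(query, growth_over_title(counts))
```

For the first two queries the T+K+A count is smaller than the sum of the three single-field counts, which suggests that many records match in more than one field.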


To better understand the above, we need to know the two basic and most frequently used measures of information retrieval systems:
1. Recall, which measures the proportion of relevant items that are retrieved, and
2. Precision, which measures the proportion of retrieved items that are relevant.
The formulae, expressed as percentages, are:
R = [a/(a+c)] × 100
P = [a/(a+b)] × 100
where a, b, c and d are taken from the R-P matrix:

Records         Retrieved    Not-retrieved     Total
Relevant        a (hits)     c (misses)        a+c
Not-relevant    b (noise)    d (rejections)    b+d
Total           a+b          c+d               a+b+c+d
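As a worked illustration of the two formulae, the sketch below computes R and P from the cells of the R-P matrix. The cell values are hypothetical, chosen only to make the arithmetic easy to follow, and the function names are not from any particular library.

```python
def recall(a, c):
    """R = [a/(a+c)] x 100: share of relevant items that were retrieved."""
    return 100 * a / (a + c)


def precision(a, b):
    """P = [a/(a+b)] x 100: share of retrieved items that are relevant."""
    return 100 * a / (a + b)


# Hypothetical R-P matrix for one query (invented numbers).
a = 40   # hits: relevant and retrieved
b = 60   # noise: non-relevant but retrieved
c = 10   # misses: relevant but not retrieved
d = 890  # rejections: non-relevant and not retrieved (not used by R or P)

print("Recall    =", recall(a, c))     # 40 of the 50 relevant items retrieved
print("Precision =", precision(a, b))  # 40 of the 100 retrieved items relevant
```

In this example the search finds most of the relevant items (recall 80%) but more than half of what it returns is noise (precision 40%), the pattern the text associates with searching the Abstract field.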

Two measures complementary to recall and precision are fallout and generality (see their definitions in the note below). The principal factor controlling recall is the exhaustivity of indexing, and the principal factor controlling precision is the specificity of the indexing language; both can be explored in coming issues.

M S Sridhar sridhar@informindia.co.in

1. Fallout is the proportion of non-relevant items in the collection that are retrieved.
2. Generality is the proportion of relevant items (for a given query) in the collection.
F = [b/(b+d)] × 100
G = [(a+c)/(a+b+c+d)] × 100
Since an increase in recall generally causes a decrease in precision, a cut-off is made through the document collection to distinguish retrieved items from non-retrieved ones:
Cut-off = [(a+b)/(a+b+c+d)] × 100
Note that in large systems, establishing true recall may not be possible. Instead, one has to be satisfied with the best possible ‘recall estimate’.
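The three measures in this note can be computed from the same R-P matrix cells. The following is a minimal sketch with hypothetical cell values. `RPMatrix` and the function names are invented for this illustration.

```python
from typing import NamedTuple


class RPMatrix(NamedTuple):
    a: int  # hits: relevant and retrieved
    b: int  # noise: non-relevant but retrieved
    c: int  # misses: relevant but not retrieved
    d: int  # rejections: non-relevant and not retrieved


def fallout(m):
    """F = [b/(b+d)] x 100: share of non-relevant items that were retrieved."""
    return 100 * m.b / (m.b + m.d)


def generality(m):
    """G = [(a+c)/(a+b+c+d)] x 100: share of the collection that is relevant."""
    return 100 * (m.a + m.c) / sum(m)


def cutoff(m):
    """Cut-off = [(a+b)/(a+b+c+d)] x 100: share of the collection retrieved."""
    return 100 * (m.a + m.b) / sum(m)


# Hypothetical matrix for one query over a 1000-item collection.
m = RPMatrix(a=40, b=60, c=10, d=890)
print("Fallout    =", round(fallout(m), 2))
print("Generality =", generality(m))
print("Cut-off    =", cutoff(m))
```

Because generality needs a+c, the total number of relevant items in the collection, it runs into the same difficulty as true recall in large systems: the count of misses, c, can usually only be estimated.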
