
Artificial Intelligence Review

https://doi.org/10.1007/s10462-021-09972-4

Pseudo‑relevance feedback based query expansion using boosting algorithm

Imran Rasheed1 · Haider Banka1 · Hamaid Mahmood Khan2

© The Author(s), under exclusive licence to Springer Nature B.V. part of Springer Nature 2021

Abstract
Retrieving relevant documents from a large collection using the original query alone is a formidable
challenge. A generic approach to improving the retrieval process is realized using pseudo-relevance
feedback techniques. This technique expands the original query with conducive keywords that
return the most relevant documents corresponding to the original query. In this paper, five
different hybrid techniques were tested utilizing traditional query expansion methods. Later,
the boosting query term method was proposed to reweigh and strengthen the original query.
The query-wise analysis revealed that the proposed approach effectively identified the most
relevant keywords, even for short queries. The potency of all the proposed methods was
evaluated on three different datasets: Roshni, Hamshahri1, and FIRE2011. Compared to the
traditional query expansion methods, the proposed methods improved the mean average precision
values of the Urdu, Persian, and English datasets by 14.02%, 9.93%, and 6.60%, respectively.
The obtained results were also validated using analysis of variance and post-hoc analysis.

Keywords  Term-selection method · Pseudo-relevance feedback · Rank aggregation method · Query formulation · Information retrieval · Urdu language

1 Introduction

Retrieving relevant documents that satisfy user requirements is a prime challenge in an
information retrieval (IR) system. Initially, computer-supported IR centered on
methods such as probabilistic or language modeling (Craswell et al. 2005; Zaragoza et al.

* Imran Rasheed
imranrasheed@cse.ism.ac.in
Haider Banka
haider.banka@gmail.com
Hamaid Mahmood Khan
hamaid.khan@gmail.com
1 Department of Computer Science and Engineering, Indian Institute of Technology (ISM) Dhanbad, Dhanbad, India
2 Aluminum Test Training and Research Center (ALUTEAM), Fatih Sultan Mehmet Vakif University, Beyoglu, Istanbul, Turkey


2004), personalized search (Sieg et al. 2007), query classification (Kang and Kim 2003),
and query modification (Croft et al. 2001; Lee et al. 2008) to increase the performance of
retrieval systems. The main objective of query modification practices is to improve
retrieval efficiency by refining the primary query. One way to do that is through query
expansion. After the initial retrieval of documents, the primary query is reformulated
by adding and re-weighing some extra terms, a process also known as pseudo-relevance feedback
(PRF) (Lee et al. 2008; Bendersky and Croft 2008; Robertson and Jones 1976; Carpineto
and Romano 2012). For ranking and weighing query terms, a variety of weighing schemes
are available, such as term frequency-inverse document frequency (TF-IDF) (Carpineto
and Romano 2012), Rocchio's weight (Rocchio 1971), the binary independence model (BIM)
(Robertson and Jones 1976), chi-square (CHI) (Zia et al. 2015), the Robertson selection value
(RSV) (Walker et al. 1996), Kullback–Leibler divergence (KLD), Bose–Einstein1 (Bo1),
and Bose–Einstein2 (Bo2) (Amati and Van Rijsbergen 2002), to name a few. It is well
established that inconsistency in PRF can lead to inaccurate retrieval of top documents
(Xu et al. 2009). Therefore, the performance of a query expansion (QE) system depends
significantly on the efficacy of its term-selection methods. Term selection is
addressed either through term association or corpus statistics. Term-association measures, such as
mutual information (Church and Hanks 1990) and co-occurrence information (Van Rijsbergen 1977),
estimate each term's usefulness according to its occurrence in the top documents.
On the other hand, corpus statistics (KLD, BIM, and RSV) evaluate each term's potency
based on its distribution in the corpus and its appearance in the top documents.
Developing a new term-selection technique for QE that outperforms existing methods remains
a challenge. However, given the several existing QE term-selection methods, a
performance enhancement can be achieved by combining them. Unfortunately, such an
approach has not been adequately evaluated for the Urdu language, although it is widely studied
for many European and other South Asian languages. The available studies on QE for
the Urdu language include the work of Rasheed and Banka (2018), who applied
three traditional query expansion models (KLD, Bo1, and Bo2) for retrieval enhancement,
as well as the development of concept search (Riaz 2008) and ontology formulation
for the Urdu language (Thaker and Goel 2015). In this paper, a new approach is proposed
to boost the query terms. First, automatic feedback was used to retrieve an initial set of
top documents. Then, three different term-selection methods (Kullback–Leibler divergence,
information gain (IG), and Bose–Einstein1) were combined to select the most
appropriate terms. The terms were then ranked using rank aggregation methods such
as Condorcet and Borda count, and a boosting technique was finally applied to modify the
primary query. As a result, the proposed methodology significantly improved automatic QE
performance on the available Urdu dataset in terms of MAP and F1-measure. The
present work is the first of its kind on the Urdu language for IR-based studies.
The major contributions of this work are summarized as follows:

1. First, the KLD, IG, and Bo1 term-selection methods were proposed for automatic
query expansion (AQE) based on the pseudo-relevance feedback system. Here, the
experimental analysis shows that combining multiple term-selection methods was better
than using the individual ones.
2. Second, the terms obtained from step (1) were further filtered using the Condorcet and
Borda techniques.
3. Third, a new approach was proposed to boost the query terms obtained from step (2),
which results in the required expansion of the original query.


4. Finally, ANOVA and Tukey post-hoc tests were carried out on all the proposed methods.

The paper is organized as follows: Sect. 1 introduces Urdu query expansion,
while Sect. 2 presents the motivation. Section 3 outlines the state-of-the-art techniques
related to query expansion. The proposed methodology and an overview of the salient features of
the term-selection methods are presented in Sect. 4, followed by the results
and discussion and the conclusions in Sects. 5 and 6, respectively.

2 Motivation

The primary aim of the present study was to enhance the usefulness of the Urdu information
retrieval system, given the limited work reported in this area. Although Urdu IR
has existed for a long time, it is not well studied owing to the unavailability of large datasets.
Different forums, such as the Text Retrieval Conference (TREC)1, the Conference
and Labs of the Evaluation Forum (CLEF, formerly the Cross-Language Evaluation Forum)2,
the NII Testbeds and Community for Information Access Research (NTCIR)3, and the
Forum for Information Retrieval Evaluation (FIRE)4, support high-level
research on languages such as English, other European languages, and South Asian languages.
However, such forums are missing for the Urdu language. Therefore, we constructed a public
dataset to carry out much-needed IR-based experiments on the Urdu language.

3 Related work

In the last decade, a substantial amount of work has been reported on information retrieval
(IR) in different languages. In IR, the searched query is an integral part
of the system. When a query is made, the entered keywords can differ from those present
in the documents. This word disparity is a fundamental challenge in IR that
needs serious consideration. When queries are longer, the probability of matching words in
both the queries and the text is high, boosting IR efficiency by retrieving the most relevant
documents. However, queries are often short, measuring only two or three words, as
observed in World Wide Web applications (Gabrilovich et al. 2009). Such short queries
result in an incongruity between the primary query and the available corpus. Thus, to
resolve this vocabulary mismatch and improve the retrieval of the top documents, the QE
technique can expand the primary query with more relatable terms present in the main
dataset. These added terms can be synonyms, plurals, or modifiers, depending on the user
feedback or query reformulation (Robertson 1977; Salton and Buckley 1990).
So far, several automatic QE methods are available that generate and rank the added
terms to increase the retrieval system's efficiency by formulating a better query with
relevance feedback (Xu and Croft 2017). Moreover, the use of synonyms for words present
in the primary query is another common linguistic approach to strengthen the IR system
1 www.trec.nist.gov/ Last visited: 28-10-2020.
2 www.clef-initiative.eu/ Last visited: 28-10-2020.
3 http://research.nii.ac.jp/ntcir/index-en.html Last visited: 28-10-2020.
4 www.fire.irsi.res.in/ Last visited: 28-10-2020.


(Voorhees 1994). Raza et al. (2019) reported some statistics-based QE methods such
as document analysis, search and browse log analyses, and web knowledge analyses.
Pedronette et al. (2014) provided a fresh approach to the re-ranking issue that
depends on the similarity of top-k lists arising from well-organized indexing structures.
Xu and Benaroch (2005) established a mix of IR procedures combining data fusion and query
expansion based IR using relevance feedback. Research by Pal et al. (2014) showed an
overall improvement of the retrieval system when using query expansion on the standard
TREC collections. Karisani et al. (2016) suggested selecting more succinct query terms and
re-weighting them with the help of the PRF method. Some optimization techniques, such as
genetic algorithms (GA), particle swarm optimization (PSO), and the bat algorithm (BA), were
also used recently to resolve QE issues in IR (Khennak and Drias 2017, 2018; Gupta and
Saini 2017; Khennak et al. 2016).
Recently, more attention has been paid to neural models for supervised query refinement
tasks (Han et al. 2019; Li et al. 2019). Such approaches require
high-quality training data so that the model can learn to translate a query into a better
modified query. In addition, BERT (Devlin et al. 2018) and its variants have achieved state-of-
the-art performance in various natural language processing (NLP) tasks. Since then,
several BERT-based re-ranking models for IR have been developed (Nogueira and Cho 2019;
Nogueira et al. 2019; Yilmaz et al. 2019). The first application
of BERT to text ranking was reported by Nogueira and Cho (2019) in January 2019 on
the MS MARCO5 passage retrieval test collection (Bajaj et al. 2016), where the task is to
rank passages (paragraph-length extracts) from web pages with respect to users' natural
language queries from Bing query logs. Kuzi et al. (2020) combined a deep
neural network model and a lexical model for the retrieval phase, leveraging both the
semantic (deep neural network-based) and lexical (keyword match-based) retrieval models
for the ad-hoc document retrieval task. The main idea of these approaches is to treat the query and
document in BERT as two consecutive sentences and to use feed-forward layers on top
of BERT's classification layer to compute the relevance scores. This approach was used
for re-ranking passages (Nogueira and Cho 2019) and, more recently, for re-ranking
newswire documents (Yilmaz et al. 2019).
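To make the re-ranking idea concrete, the hedged sketch below scores query-passage pairs with a publicly available cross-encoder in the monoBERT style described above; the checkpoint name is an illustrative assumption, not the model trained in the cited works.

```python
# A minimal sketch of BERT-style passage re-ranking (monoBERT-like).
# The checkpoint is an illustrative stand-in, not the cited authors' model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def rerank(query, passages):
    # Query and passage are encoded as a sentence pair; the classification
    # head on top of BERT yields one relevance score per pair.
    inputs = tokenizer([query] * len(passages), passages,
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)
    return sorted(zip(passages, scores.tolist()),
                  key=lambda pair: pair[1], reverse=True)
```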
One of the first studies on re-weighing query terms was performed by Robertson and
Jones (1976), based on the probabilistic search model. The basic concept was the
existence of a database containing all the required documents. The authors also made a
preliminary assumption about the weights of the terms in the primary query to obtain the initial
set of top documents. Later, using a contingency table for the top documents, the weights
of the query terms were refined to obtain the final set. Here, no probabilistic structure was
applied; instead, the information within the top documents was utilized.

4 Proposed system architecture

This section describes the methodologies involved in this study and their combination
strategies. First, the data was standardized through tokenization, removal of stop words
and diacritics, and stemming (Rasheed et al. 2018). Second, the concept of query expansion
based on PRF using the TF-IDF retrieval measure (Ramos et al. 2003) was used to retrieve
5 https://microsoft.github.io/msmarco/ Last visited: 28-10-2020.


Fig. 1  Overview of the proposed system architecture of automatic query expansion model

the relevant documents from the given query. Some of the top relevant documents were
selected as PRF documents for constructing the candidate term pool with unique terms, as
illustrated using block architecture in Fig. 1.
Expansion models such as KLD, Bo1, and IG were used to weigh the candidate terms
of Block I. These candidate terms were then further weighted using the boosting
query term method. As a result, the terms that weighed higher were used to expand the
initial query. These types of query expansion are called the Kullback–Leibler Divergence
Boosting Query Term method (KLDBQT), the Bose–Einstein1 Boosting Query Term method
(Bo1BQT), and the Information Gain Boosting Query Term method (IGBQT).
The rank combination methods Borda and Condorcet were used to combine the
multiple terms obtained from Block I. Furthermore, some of the top-ranked terms were
also used to expand the original query. Finally, the concept of boosting query terms was
used to find the ideal mixture of good candidate terms for query reformulation. These
methods are therefore named the Borda Boosting Query Term (BBQT) method and the
Condorcet Boosting Query Term (CBQT) method. The reformulated query is then submitted to the
search engine to obtain the relevant, highly ranked top documents.
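Putting the two blocks together, a minimal sketch of the overall pipeline is given below. All component names are injected placeholders for the parts described above (TF-IDF retrieval, the KLD/Bo1/IG scorers, Borda or Condorcet aggregation, and the boosting step of Sect. 4.1.3); this is a structural sketch, not the authors' implementation.

```python
def automatic_query_expansion(query, retrieve, scorers, aggregate, boost,
                              n_docs=10, n_terms=30):
    """One pass of the proposed AQE pipeline with injected components.

    retrieve: query -> ranked documents (e.g. TF-IDF over the index)
    scorers: term-selection functions (KLD, Bo1, IG) producing ranked terms
    aggregate: rank fusion (Borda or Condorcet) over the three rankings
    boost: re-weighting of original plus candidate terms (Eqs. 4-7)
    """
    prf_docs = retrieve(query)[:n_docs]              # pseudo-relevance set
    rankings = [score(query, prf_docs) for score in scorers]   # Block I
    candidates = aggregate(rankings)[:n_terms]       # Block II rank fusion
    expanded = boost(query, candidates, prf_docs)    # boosted query terms
    return retrieve(expanded)                        # second retrieval pass
```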

4.1 Description of weighting and ranking of query expansion terms based on similarity score

The main objective of using the PRF-based QE method is to measure the similarity
between the initial set of documents and a suitable standard for selecting expansion terms.
For the implementation of the proposed model, the steps are defined in Algorithm 1.


4.1.1 Term‑selection expansion method

Three different term-selection methods (KLD, Bo1, and IG) were used to assign a score to
each term in the term pool in order to select the candidate terms. The term pool was created
by collecting the top-ranked retrieved documents using the TF-IDF retrieval model. These
methods select the top-scored candidate terms, which are later used to reformulate the user query.
Equations (1) to (3) present these methods mathematically.
$$w_{KLD}(t) = P_n(t) \times \log_2 \frac{P_n(t)}{P_m(t)} \tag{1}$$

$$w_{Bo1}(t) = \sum_{d \in n} tf(t, d) \times \log_2 \frac{1 + P_c}{P_c} + \log_2 (1 + P_c) \tag{2}$$

$$IG(t) = -\sum_{i=1}^{|c|} P(c_i) \log P(c_i) + P(t) \sum_{i=1}^{|c|} P(c_i \mid t) \log P(c_i \mid t) + P(\bar{t}) \sum_{i=1}^{|c|} P(c_i \mid \bar{t}) \log P(c_i \mid \bar{t}) \tag{3}$$

where $P_n(t)$ and $P_m(t)$ are the probabilities of the term $t$ in the pseudo-relevant documents $n$ and in the entire collection, respectively; $\sum_{d \in n} tf(t, d)$ is the frequency of the query term in the top-ranked documents; $P_c$ is given by $P_c = \frac{\sum_{d \in m} tf(t, d)}{N}$, where $\sum_{d \in m} tf(t, d)$ is the term frequency of the query term in the collection and $N$ is the total number of documents in the collection. $P(t)$ denotes the probability of the occurrence of term $t$, while $\bar{t}$ denotes the absence of term $t$, so $P(\bar{t}) = 1 - P(t)$. $P(c_i)$ is the probability of the $i$th class value, and $P(c_i \mid t)$


Table 1  Terms extracted from methods A, B, C, D, and E

Expansion terms extracted from method A: t2, t1, t3, t4
Expansion terms extracted from method B: t1, t2, t3, t4
Expansion terms extracted from method C: t2, t1, t3
Expansion terms extracted from method D: t2, t1, t4, t3
Expansion terms extracted from method E: t1, t3

Table 2  Borda term score

Term score (t1) = 3 + 4 + 3 + 3 + 4 = 17
Term score (t2) = 4 + 3 + 4 + 4 + 1.5 = 16.5
Term score (t3) = 2 + 2 + 2 + 1 + 3 = 10
Term score (t4) = 1 + 1 + 1 + 2 + 1.5 = 6.5

and $P(c_i \mid \bar{t})$ are the conditional probabilities of the $i$th class value when $t$ occurs and does not occur, respectively.
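As an illustration of Eq. (1), the hedged sketch below scores candidate terms with KLD under simple maximum-likelihood probability estimates; the smoothing constant and data layout are assumptions, since the paper does not specify them.

```python
import math
from collections import Counter

def kld_scores(prf_docs, collection, eps=1e-9):
    """Eq. (1): w(t) = P_n(t) * log2(P_n(t) / P_m(t)) for each term t.

    prf_docs and collection are lists of tokenized documents; probabilities
    are maximum-likelihood estimates, with eps guarding against log(0)
    (an assumed smoothing choice).
    """
    prf_counts = Counter(t for doc in prf_docs for t in doc)
    coll_counts = Counter(t for doc in collection for t in doc)
    n_prf = sum(prf_counts.values())
    n_coll = sum(coll_counts.values())

    scores = {}
    for term, freq in prf_counts.items():
        p_n = freq / n_prf                        # P_n(t): PRF documents
        p_m = coll_counts[term] / n_coll + eps    # P_m(t): whole collection
        scores[term] = p_n * math.log2(p_n / p_m)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```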

4.1.2 Rank aggregation method

In this section, Borda and Condorcet were individually applied to the terms obtained
from the three different term-selection methods to form the ranked list of candidate terms.
Both of these aggregation methods assigned weights to each candidate term, and the most
heavily weighted candidate terms were then selected for further assessment. The
reformulated query with the boosting terms was then re-submitted to the search engine,
and the list of top-ranked documents was retrieved again (Diaz 2016). The details of the
methods are as follows:
Borda ranking method The Borda methodology is a rank aggregation method in which a
voter ranks the candidate list (Fraenkel and Grofman 2014). According to Borda, the first-
ranked term is always assigned the highest value, followed by lower values for the
subsequent terms. For example, out of $n$ cases, the first-ranked term is assigned a
value of $n$, followed by $n-1$, $n-2$, and so on for the other terms. Alternatively, each term
has its weighting vector as $(n, n-1, n-2, n-3, \ldots)$. The total of all the scores received
from each voter decides the final score for each candidate term.
In the present case, four terms were extracted from five QE methods. The first-ranked term
was assigned a value of 4, followed by 3, 2, and 1 for the subsequent terms. Table 2 shows the
Borda term scores from all the combined methods.
When candidate terms are without a rank, such as t2 and t4, which are missing in method E,
they can only be assigned the values 2 or 1. Therefore, their net value was the average of
1 and 2, i.e., 1.5. Similarly, t4 was the only term missing in method C, so it could only
occupy the last position and was thus assigned a value of 1. Finally, the candidate with the
highest score wins the test (Felsenthal and Nurmi 2019). The Borda method is illustrated
in Tables 1 and 2.
Thus, the final ranked list t1 , t2 , t3 , t4 selected by the Borda method was used to expand
the user query.
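A hedged sketch of this Borda aggregation, including the averaged score for unranked terms from the worked example, might look as follows.

```python
def borda_aggregate(rankings):
    """Borda count over ranked term lists (best term first).

    Terms missing from a list share the leftover bottom positions, so each
    receives the average of the unassigned scores, as t2 and t4 do for
    method E in Table 1.
    """
    universe = {t for ranking in rankings for t in ranking}
    n = len(universe)
    totals = {t: 0.0 for t in universe}
    for ranking in rankings:
        for pos, term in enumerate(ranking):
            totals[term] += n - pos                # first rank scores n
        missing = universe - set(ranking)
        if missing:
            leftover = [n - p for p in range(len(ranking), n)]
            share = sum(leftover) / len(missing)   # e.g. (2 + 1) / 2 = 1.5
            for term in missing:
                totals[term] += share
    return sorted(totals, key=totals.get, reverse=True)

# Applied to the five lists of Table 1, this returns [t1, t2, t3, t4],
# matching the scores 17, 16.5, 10, and 6.5 of Table 2.
print(borda_aggregate([["t2", "t1", "t3", "t4"], ["t1", "t2", "t3", "t4"],
                       ["t2", "t1", "t3"], ["t2", "t1", "t4", "t3"],
                       ["t1", "t3"]]))
```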
Condorcet ranking method According to the Condorcet ranking method, term selection
depends on a candidate term's victory against all of its opponents in head-to-head
elections. The term that wins the maximum number of times is the winner (Wei et al. 2014). The


Table 3  The pair-wise comparison matrix for the four candidate terms

Candidate terms   W       X       Y       Z
W                 –       2:5:0   3:4:0   4:2:1
X                 5:2:0   –       3:4:0   7:0:0
Y                 4:3:0   4:3:0   –       5:2:0
Z                 2:4:1   0:7:0   2:5:0   –

Table 4  The win, lose, and tie scores for the four candidate terms

Candidate terms   Win score   Lose score   Tie score
W                 1           2            0
X                 2           1            0
Y                 3           0            0
Z                 0           3            0

retrieved candidate terms from the term-selection methods (KLD, Bo1, and IG) were
first tested against each other in a pairwise competition, as shown in Table 3. The three
numbers separated by colons correspond to the win, loss, and tie results
of the pairwise competition. For example, between the terms W and X, the term 'W' won
twice and lost five times against 'X' without a tie. Overall, seven head-to-head contests were
conducted for each pair of terms, and their net results are presented in Table 4.
For the final results, a simple rule was adopted. The candidate with the maximum
wins was the winner, whereas the candidate with the minimum losses was the winner in case
of a tie. If the losing score was also tied, the two candidates were held joint winners.
According to Tables 3 and 4, the candidates with the maximum scores were found in
the following order: Y, X, W, and Z. Finally, these high-ranked candidates were then
used to expand the query.
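A hedged sketch of the Condorcet aggregation follows; the handling of unranked terms (they lose to ranked ones) is an assumption made for completeness.

```python
from itertools import combinations

def condorcet_aggregate(rankings):
    """Order terms by head-to-head victories across the input rankings.

    In each ranking, a term beats an opponent if it holds a better position;
    unranked terms lose to ranked ones (an assumption). Final order: most
    pairwise wins first, ties broken by fewer losses, as in the text.
    """
    universe = {t for ranking in rankings for t in ranking}
    wins = {t: 0 for t in universe}
    losses = {t: 0 for t in universe}
    for a, b in combinations(universe, 2):
        a_votes = b_votes = 0
        for ranking in rankings:
            pos_a = ranking.index(a) if a in ranking else len(ranking)
            pos_b = ranking.index(b) if b in ranking else len(ranking)
            if pos_a < pos_b:
                a_votes += 1
            elif pos_b < pos_a:
                b_votes += 1
        if a_votes > b_votes:
            wins[a] += 1
            losses[b] += 1
        elif b_votes > a_votes:
            wins[b] += 1
            losses[a] += 1
    return sorted(universe, key=lambda t: (-wins[t], losses[t]))
```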

4.1.3 Boosting query terms method

The candidate terms obtained in Sects. 4.1.1 and 4.1.2 can include both strong and
weak terms that can affect the final query. Therefore, to select the optimal combination
of expansion terms, the boosting query term method is used.
According to this method, the initially retrieved documents were first assigned a spe-
cific weight according to the user’s feedback or relevance. Moreover, the query terms
were also assigned a weight based on the weighted documents. The main steps used in
the boosting scheme for the computation of weights for each term are as follows:

1. The initial query is given by the vector $T = (t_1, t_2, t_3, \ldots)$, where $t_i$ denotes the $i$th query term. The total weight of each query term $t_i$ is computed as:

$$\lambda_{t_i} = \sum_{j=1}^{n} \lambda_{t_i}^{d_j} \tag{4}$$


In Eq. (4), $\lambda_{t_i}$ denotes the total weight of $t_i$, $n$ is the number of documents chosen from the initially retrieved top documents, and $\lambda_{t_i}^{d_j}$ signifies the weight of $d_j$, the $j$th retrieved document, with respect to the query term $t_i$.
2. The theory is that $\lambda_{t_i}^{d_j}$ is valid only if $d_j$ is strictly applicable to the user's requirement. Although the retrieval engine has retrieved $d_j$ as a relevant document, there is a possibility that $d_j$ is not as relevant as it could be. To address this issue, $\lambda_{t_i}^{d_j}$, the modified weight of the term $t_i$ in the document $d_j$, is defined as the product of the weight of the term $t_i$ in the document $d_j$ ($\alpha_{t_i}^{d_j}$) and the relevance of $d_j$ ($\beta_{d_j}$) to the user's requirement: $\lambda_{t_i}^{d_j} = \alpha_{t_i}^{d_j} \times \beta_{d_j}$.
3. The traditional weighting model (TF-IDF) can be used to compute the weight $\alpha_{t_i}^{d_j}$ of each query term $t_i$ based on its occurrence in the document $d_j$. The $\alpha_{t_i}^{d_j}$ value increases proportionally with the number of times the term $t_i$ appears in the document $d_j$ and decreases with the number of documents in the collection that contain the term. This implies that a high $\alpha_{t_i}^{d_j}$ value is obtained by a term $t_i$ that has a high frequency in a document $d_j$ and a low document frequency in the collection.
4. To compute the retrieved document's prominence with respect to the primary query, i.e., $\beta_{d_j}$ in Step 2, the initially retrieved pseudo-relevance documents were considered a perfect match for the user's requirement. Thus, each document's relevance was evaluated using the distance between documents in the retrieved set.
5. The equation below was used to calculate the similarity between each top document and the other documents in the retrieved set:

$$\beta_{d_j} = \frac{\sum_{m=1,\, m \neq j}^{n} \mathrm{SimCos}(\vec{d}_m, \vec{d}_j)}{n - 1} \tag{5}$$

Here, $\vec{d}_m$ and $\vec{d}_j$ are the Euclidean vectors of the $m$th and $j$th documents in the retrieved set, $\mathrm{SimCos}$ is the cosine similarity used to assess the match between $\vec{d}_m$ and $\vec{d}_j$, and $n$ is the number of top documents chosen from the initially retrieved set.
The total weight $\lambda_{t_i}$ can then be written as in Eq. (6):

$$\lambda_{t_i} = \sum_{j=1}^{n} f_{t_i}^{d_j} \times idf_{t_i} \times \beta_{d_j} \tag{6}$$

6. Lastly, log normalization was applied to smooth the calculated values:

$$\lambda_{t_i} = \sum_{j=1}^{n} \log\left(1 + f_{t_i}^{d_j} \times idf_{t_i} \times \beta_{d_j}\right) \tag{7}$$

In the above equation, the value one is added to avoid zero values in the logarithmic function. The weight of the query terms is further modified using Eq. (7). $\lambda_{t_i}$ is related to the occurrence of the terms of $T$ in the top documents and the inverse document frequency of those terms in the entire collection. Through $\beta_{d_j}$, it is also sensitive to the documents most similar to the other top documents.
7. $\beta_{d_j}$ assumes that the point nearest to all the top documents is the optimum point for a document's weight. Thus, the closer a document is to the cluster center, the greater the document's weight. Grouping similar documents into clusters reinforces the prospects of retrieving relevant documents. The document cluster is formed from documents containing candidate terms similar to the initial query. The hypothesis states that if a document from the cluster is relevant to a search query, then it is likely that other documents from the


Table 5  Collection attributes

Attributes                         Roshni    Hamshahri1   FIRE2011
Number of documents                85,304    166,453      392,577
Number of queries                  1–52      100          126–175
Number of unique terms             198,365   493,389      62,756,468
Average document length (terms)    345.37    237.16       273
Encoding type                      UTF-8     UTF-8        UTF-8

same cluster are also relevant. Thus, clustering enables retrieving documents that do not
contain the initial query terms, thereby increasing the relevant-document retrieval
efficiency.
This study's proposed boosting methods are KLDBQT, Bo1BQT, IGBQT, CBQT, and
BBQT; the weighting scheme of Eqs. (4) to (7) is sketched below.
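The sketch below implements the per-term weight of Eq. (7), assuming TF-IDF document vectors and term statistics are already available; the data representations are assumptions made for illustration.

```python
import math
import numpy as np

def cosine(u, v):
    """Cosine similarity between two document vectors (SimCos in Eq. 5)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def boost_weights(terms, doc_vecs, tf, idf):
    """Compute lambda_{t_i} of Eq. (7) for each query/candidate term.

    doc_vecs: TF-IDF vectors of the n chosen top documents;
    tf[j][t]: frequency of term t in document j; idf[t]: inverse document
    frequency of t in the collection (assumed representations).
    """
    n = len(doc_vecs)
    # beta_{d_j}: mean cosine similarity of d_j to the other top documents (Eq. 5).
    beta = [sum(cosine(doc_vecs[m], doc_vecs[j])
                for m in range(n) if m != j) / (n - 1)
            for j in range(n)]
    # Eq. (7): log-smoothed sum of tf * idf * beta over the chosen documents.
    return {t: sum(math.log(1 + tf[j].get(t, 0) * idf.get(t, 0.0) * beta[j])
                   for j in range(n))
            for t in terms}
```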

5 Experimental study

5.1 Datasets

This section evaluates the term-selection methods on three datasets in different languages:
Urdu, Persian, and English. For the Urdu language, the Roshni dataset contains
85,304 news articles from different news domains. It also includes a set of 52 queries
(Rasheed and Banka 2018).
For the Persian language, version 1 of the Hamshahri dataset was used in this study
(AleAhmad et al. 2009). The collection is composed of 166,453 news articles from the
Hamshahri newspaper6 covering the period 1996 to 2003. The corpus also includes 100
standard queries (from CLEF2008 and CLEF2009) along with relevance judgments.
For the English language, the FIRE dataset (Majumder et al. 2010) was used, containing
medium-sized newswire articles from two distinct sources (The Telegraph and BD
News24) supplied by ISI Kolkata, India.
Each of the datasets includes three sets:

1. A set of documents.
2. A set of queries (usually referred to as TREC topics) that can be answered by retrieving
several documents.
3. The anticipated results for the queries, known as the relevance judgments.

Table 5 highlights the comprehensive descriptions of datasets.

6 https://www.hamshahrionline.ir.


5.2 Evaluation metrics

To evaluate the results, the trec_eval7 program was used. The following sub-section
highlights the measures undertaken in this study.
Average precision (AP) For systems that return a ranked sequence of documents, it is
preferable to consider the order in which the returned documents are presented. This allows
averaging the precision values at the rank positions where a relevant document is retrieved.

$$AP = \frac{\sum_{i=1}^{d_r} \frac{i}{rank_i}}{d_r} \tag{8}$$

where $d_r$ is the number of relevant documents for that query and $i/rank_i = 0$ if document $i$ was not retrieved.
retrieved.
Mean average precision (MAP) It summarizes rankings from multiple queries by averaging the average precision:

$$MAP = \frac{\sum_{i=1}^{Q} AP_i}{Q} \tag{9}$$

where $Q$ is the number of queries in a collection.
F1-measure It is the weighted harmonic mean of precision and recall, a single measure that trades off precision against recall.

$$F_1 = \frac{2 \times P_r \times R_c}{P_r + R_c} \tag{10}$$

R-precision R-precision is the precision after dr documents have been retrieved, where dr is
the number of relevant documents for the query.
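A hedged sketch of these measures for a single run, assuming binary relevance judgments, is given below; in the experiments themselves, trec_eval computes them.

```python
def average_precision(ranked_ids, relevant_ids):
    """Eq. (8): average of the precision values at ranks of relevant hits."""
    relevant_ids = set(relevant_ids)
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank            # precision at this rank
    return total / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """Eq. (9): runs is a list of (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, q) for r, q in runs) / len(runs)

def f1_measure(precision, recall):
    """Eq. (10): harmonic mean of precision and recall."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```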

5.3 Experimental setup or tool used

The Terrier8 tool was used for initial indexing and document retrieval in all the experiments.
Stopwords were removed during indexing. For Urdu documents and
queries, the Assas-band stemmer (Naseer et al. 2009) was used for stemming; likewise,
a hybrid Persian stemmer was used for Persian documents and queries (Taghi-Zadeh
et al. 2017). We selected 50 queries from the CLEF2008 version for the experimental
analysis of the Persian dataset, while the Porter stemmer was used to stem the queries and documents
of the FIRE-2011 collection for the English language. However, only the title section of the FIRE
queries was used for the experimental assessment. Only single terms in the documents
and queries were indexed. We used the TF-IDF term-weighting methodology in all the
experiments.
Moreover, parameters were set to Terrier's default values. We also used the Origin
tool to measure the ANOVA-based statistical significance between the proposed method
and the other existing methods. Table 6 presents the essential attributes for all the languages.

7 https://trec.nist.gov/trec_eval.
8 http://terrier.org/.


Table 6  Description of the basic attributes used in query expansion

Attribute                        Dataset1                     Dataset2                     Dataset3
Dataset name                     Roshni                       Hamshahri1                   FIRE2011
Language                         Urdu                         Persian                      English
Retrieval model                  TF-IDF                       TF-IDF                       TF-IDF
Baseline (MAP)                   0.3203                       0.1641                       0.1516
No. of queries used to retrieve  Only 52                      Only 50 (CLEF2008)           50 (126–175)
Query field used to retrieve     <title>, <desc> and <narr>   <title>, <desc> and <narr>   Only <title>

Table 7  Mean average precision on different sets of documents for the Roshni dataset (columns: number of expansion terms)

#Docs    Methods   5        10       15       30       50
5 docs KLDBQT 0.3370 0.3507 0.3543 0.3563 0.3562
Bo1BQT 0.3506 0.3513 0.3522 0.3526 0.3534
IGBQT 0.3369 0.3509 0.3543 0.3564 0.3560
CBQT 0.3354 0.3377 0.3421 0.3520 0.3508
BBQT 0.3548 0.3553 0.3508 0.3524 0.3580
10 docs KLDBQT 0.3527 0.3541 0.3546 0.3569 0.3572
Bo1BQT 0.3415 0.3426 0.3461 0.3479 0.3504
IGBQT 0.3416 0.3534 0.3571 0.3626 0.3649
CBQT 0.3426 0.3569 0.3548 0.3586 0.3538
BBQT 0.3411 0.3540 0.3576 0.3629 0.3652
15 docs KLDBQT 0.3395 0.3419 0.3516 0.3542 0.3536
Bo1BQT 0.3423 0.3480 0.3508 0.3545 0.3521
IGBQT 0.3396 0.3481 0.3480 0.3537 0.3535
CBQT 0.3400 0.3500 0.3513 0.3535 0.3522
BBQT 0.3507 0.3508 0.3531 0.3569 0.3583

Bold values are the maximum among all expansion models for a specific set of documents

In the current approach, the expansion models were tested with three different document sets
and five different groups of top-ranked terms. The chosen document sets were 5, 10,
and 15, while the selected terms were 5, 10, 15, 30, and 50; the findings of the study are
presented in Tables 7, 8, and 9.

5.4 Experimental outcomes of boosting method

After the initial document retrieval, the boosting approach was applied to the Urdu
dataset. The maximum performance improvement of 11.26% over the baseline
was observed at two points [i.e. (#D10,#T30) and (#D15,#T50)], and the minimum
improvement of 5.03% was noticed at (#D5,#T5). On the other hand, for the Persian dataset,
the maximum improvement of 8.39% was found at (#D15,#T50) and the minimum
of -0.49% at (#D15,#T5), while the FIRE dataset showed a maximum

13
Pseudo‑relevance feedback based query expansion using boosting…

performance improvement of 5.14% at (#D10,#T30) and a minimum of -0.16% at (#D5,#T5), as shown in Fig. 2.

Fig. 2  Percentage-wise improvements in MAP of the boosting method over baseline on three datasets at different ranks of documents

5.5 Experimental results with the Urdu language

5.5.1 Innovative outcomes of the individual term-selection QE methods

Figures 3a–c depict the improvement in MAP of the three basic QE term-selection
methods compared to the base TF-IDF retrieval model at three different sets of documents,
i.e., 5, 10, and 15. Of all the tested methods, the KLD method showed superior
performance across the selected document sets.
After the extraction of the term pool in Block I, the boosting method was applied
to enhance the term ranking in the given query, as shown in Fig. 1. For example, after
obtaining the term pool from the KLD method, boosting was applied; this is called the
KLDBQT method. Similarly, boosting was also applied to the other term pools, yielding
Bo1BQT and IGBQT. It was found that, compared to the standard expansion term-selection
methods, all the proposed methods (KLDBQT, Bo1BQT, and IGBQT) enhanced the
retrieval performance, as demonstrated in Fig. 4. Moreover, among the proposed methods,
the performance of IGBQT was the maximum in the top 10 documents category.

5.5.2 Experimental outcomes of rank aggregation methods

In Block II of Fig. 1, term-pool 4 was first created from the query terms obtained from
term-pools 1 to 3. Later, rank aggregation
13
I. Rasheed et al.

Fig. 3  Percentage-wise improvements in MAP for the basic term-selection methods over baseline on the Roshni dataset at different ranks of documents

Fig. 4  Percentage-wise improvements in MAP over baseline, using basic expansion methods with or without the boosting query term method, with top 10 documents

methods such as Borda and Condorcet were applied for term ranking. Finally, the boosting
method was applied to the ranked terms to boost the original query; the resulting methods
are called BBQT and CBQT. Among the boosting methods (KLDBQT, Bo1BQT, IGBQT,
BBQT, and CBQT), the BBQT method showed the best outcome in terms of mean
average precision (MAP), as shown in Table 7; the MAP values for the BBQT method
yielded a maximum increment of 14.02% (#D10,#T50) over the baseline across all the
document sets, as shown in Fig. 5. Moreover, the performance of all the boosting methods
generally improved with an increasing number of terms.

5.6 Experimental results with the Persian language

Table 8 presents the mean average precision results. Among all the proposed methods
(KLDBQT, Bo1BQT, IGBQT, CBQT, and BBQT), the BBQT method at the document-term
pair (#D15,#T30) showed the best retrieval performance, with a 9.93% increment over
the baseline across all the different document sets (5, 10, and 15). The retrieval outcome
in terms of percentage is shown in Fig. 6.


Fig. 5  Percentage-wise improvements in MAP using the boosting query term method over baseline on the Roshni dataset at different ranks of documents

Table 8  Mean average precision on different sets of documents for the Hamshahri dataset (columns: number of expansion terms)

#Docs    Methods   5        10       15       30       50
5 docs KLDBQT 0.1678 0.1693 0.1755 0.1716 0.1709
Bo1BQT 0.1682 0.1688 0.1699 0.1717 0.1704
IGBQT 0.1679 0.1749 0.1788 0.1726 0.1742
CBQT 0.1703 0.1747 0.1786 0.1754 0.1740
BBQT 0.1693 0.1731 0.1793 0.1759 0.1746
10 docs KLDBQT 0.1718 0.1748 0.1750 0.1754 0.1772
Bo1BQT 0.1705 0.1743 0.1753 0.1744 0.1736
IGBQT 0.1724 0.1726 0.1768 0.1758 0.1755
CBQT 0.1716 0.1743 0.1760 0.1750 0.1747
BBQT 0.1722 0.1750 0.1766 0.1756 0.1750
15 docs KLDBQT 0.1629 0.1743 0.1759 0.1746 0.1784
Bo1BQT 0.1616 0.1709 0.1777 0.1751 0.1758
IGBQT 0.1654 0.1764 0.1755 0.1786 0.1794
CBQT 0.1659 0.1761 0.1729 0.1795 0.1709
BBQT 0.1702 0.1757 0.1781 0.1804 0.1737

Bold values are the maximum among all expansion models for a specific set of documents

5.7 Experimental results with the English language

Table 9 presents the mean average precision results. Among all the proposed
methods, the BBQT method at the document-term pair (#D10,#T30) showed

Table 9  Mean average precision on different sets of documents for the FIRE dataset (columns: number of expansion terms)

#Docs    Methods   5        10       15       30       50
5 docs KLDBQT 0.1508 0.1521 0.1533 0.1545 0.1547
Bo1BQT 0.1506 0.1517 0.1532 0.1534 0.1544
IGBQT 0.1510 0.1521 0.1560 0.1575 0.1570
CBQT 0.1523 0.1539 0.1547 0.1555 0.1565
BBQT 0.1533 0.1553 0.1589 0.1591 0.1588
10 docs KLDBQT 0.1573 0.1575 0.1579 0.1585 0.1566
Bo1BQT 0.1561 0.1566 0.1570 0.1572 0.1565
IGBQT 0.1565 0.1581 0.1593 0.1603 0.1595
CBQT 0.1575 0.1599 0.1600 0.1607 0.1601
BBQT 0.1569 0.1591 0.1599 0.1616 0.1602
15 docs KLDBQT 0.1574 0.1577 0.1582 0.1576 0.1571
Bo1BQT 0.1564 0.1568 0.1571 0.1569 0.1555
IGBQT 0.1573 0.1587 0.1598 0.1602 0.1602
CBQT 0.1587 0.1600 0.1601 0.1604 0.1601
BBQT 0.1580 0.1595 0.1608 0.1609 0.1606

Bold values are the maximum among all expansion models for a specific set of documents

Fig. 6  Percentage-wise improvements in MAP using the boosting query term method over baseline on the Hamshahri dataset at different ranks of documents

the best retrieval performance, with a 6.60% increment over the baseline across all the
different document sets (5, 10, and 15). The retrieval outcome in terms of percentage
is shown in Fig. 7.


Fig. 7  Percentage-wise improvements in MAP using the boosting query term method over baseline on the FIRE dataset at different ranks of documents

Among all three language datasets (Urdu, Persian, and English), the BBQT method
outperformed all the other proposed methods. The maximum increment in MAP values
was observed for the Urdu language.

5.8 Query wise retrieval effectiveness

In this section, the F1-measure was calculated to measure the performance of each of the
52 queries. The outcomes were evaluated and compared with the baseline results for all the
proposed methods. F1-measure values were determined at three document sets of the top
5, 10, and 15 documents for the three standard language datasets. Query-wise analyses for
the Roshni, Hamshahri, and FIRE datasets are shown in Figs. 8a–c, respectively.
Figure 8a shows the comparison of the F1-measure obtained on the Roshni dataset (Urdu)
by the different proposed methods for the top 10 documents and 50 terms (#D10,#T50). As
observed, the BBQT method outperforms all the other proposed methods across all
52 queries. Similarly, for the Hamshahri (Persian) and FIRE (English) datasets, at
(#D15,#T30) and (#D10,#T30) respectively, the BBQT method was again found to perform
better than all the other proposed methods, as shown in Fig. 8.

5.9 ANOVA test

Analysis of variance (ANOVA) is a statistical methodology for measuring the difference between
two or more means. The ANOVA test statistic is referred to as the F-statistic. The
null hypothesis states that the averages of the proposed methods
KLDBQT, Bo1BQT, IGBQT, CBQT, and the best-performing proposed BBQT method

Fig. 8  Comparison of F1-measure at top-rank documents and terms

are equivalent, i.e., $H_0: \mu_{KLDBQT} = \mu_{Bo1BQT} = \mu_{IGBQT} = \mu_{CBQT} = \mu_{BBQT}$.
On the other hand, the alternative hypothesis states that the averages of the
five methods are not all identical, i.e., $H_1$: at least one mean differs.
We carried out the ANOVA analysis for 10 documents and 50 terms on the
Urdu dataset. The significance level was set to $\alpha = 0.05$. The results of


the ANOVA for 10 documents and 50 terms are presented in Table 10. It is worth mentioning
that the ANOVA test was conducted on 20 samples.
In general, if the calculated F-statistic in the ANOVA test exceeds its critical value,
the null hypothesis can be rejected. Analyzing the results displayed in Table 10b, we
reject the null hypothesis because F-statistic > F-critical. This shows that the means of the five
expansion methods are not the same, as the calculated p-value (the probability that
measures the evidence against the null hypothesis) is far less than the assumed significance
level ($\alpha = 0.05$). Hence, there are statistically significant differences among these five
methods for the top 10 documents and top 50 terms. However, the ANOVA alone cannot
say which groups (BBQT-KLDBQT, BBQT-Bo1BQT, BBQT-IGBQT, and BBQT-CBQT)
differ from each other. Therefore, we conducted the most popular post hoc analysis, the
Tukey HSD (Honestly Significant Difference) test, on the results of the ANOVA, and
compared the mean difference of each group at the 95% confidence level. A post hoc study
offers information on the specific differences between means. Here, the F-statistic is
sufficiently large, and the pairwise comparisons indicate that the BBQT model performed
better than all the other proposed methods.
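A hedged sketch of the same analysis with SciPy and statsmodels is shown below; the paper used the Origin tool, so the libraries and the synthetic per-query samples here are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

methods = ["KLDBQT", "Bo1BQT", "IGBQT", "CBQT", "BBQT"]
rng = np.random.default_rng(0)
# Synthetic stand-ins for the 20 MAP samples per method used in the paper.
samples = {m: rng.normal(0.35, 0.02, 20) for m in methods}

# One-way ANOVA: reject H0 (equal means) when p-value < alpha = 0.05.
f_stat, p_value = f_oneway(*samples.values())
print(f"F = {f_stat:.3f}, p = {p_value:.3g}")

# Tukey HSD post hoc test: which method pairs differ at the 95% level.
values = np.concatenate(list(samples.values()))
labels = np.repeat(methods, 20)
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```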

5.10 Discussions

The efficiency of an information retrieval system relies on two primary variables. First,
the original query should contain at least one term carrying more information than the other
terms. Second is the number of relevant documents retrieved in response to the original
query. With the help of these variables, query expansion methods reformulate the original
query with terms obtained from the previously retrieved documents to improve the
overall document retrieval efficiency. In this study, query expansion methods were proposed
to extract terms from the retrieved documents to modify the original query. The
extracted terms were weighted according to their proximity to the original query terms.
A sample of initial queries (Query Nos. 7, 9, 10, and 16) with their English
versions is shown in Table 11. Table 12 shows the modified form of the original queries after
applying the BBQT methodology. Here, additional terms were added to boost the
retrieval efficiency, and all the terms were then re-weighted.
In Table 12, the first two expansion terms extracted using BBQT for query 7 were
“Chauhan” and “Ashok”, which received weights of 0.8628 and 0.3326, respectively. On
the other hand, with CBQT, the expansion terms “Chauhan” and “Ashok” received
lower weights of 0.8417 and 0.3308, respectively. Moreover, the term “Mumbai”
held the 15th position with a weight of 0.1376 in the expansion term list of the BBQT method,
whereas the same word occupied the 19th position with a weight of 0.1219 in the CBQT method,
signifying different rankings as well as term weights in the two selected methods. This
analysis clearly shows that, compared to the CBQT method, the term weights in
the BBQT method had slightly better values owing to its better rank aggregation approach.
Moreover, compared to all the other proposed methods, the combination of the Borda and
boosting methods (BBQT) resulted in the maximum MAP values. This indicates that the
selection of the highest-ranking terms depends on the term rank aggregation, the TF-IDF
values, and the cosine similarity among the chosen documents, which is sensitive, through
$\beta_{d_j}$, to the documents with the highest similarity to the other top documents.
The highest performance improvement in this experimental analysis was 14.02% on the
Urdu dataset, followed by Persian (9.93%) and English (6.60%). Moreover, between the
proposed methodologies of the two blocks of Fig. 1, the Block II methodologies (BBQT

Table 10  ANOVA test and Tukey post hoc analysis

(a) Summary of input
Groups    Count   Sum      Average    Variance
KLDBQT    20      6.4731   0.323655   0.000444567
Bo1BQT    20      6.3661   0.318305   0.000264691
IGBQT     20      7.0615   0.353075   0.000312674
CBQT      20      7.3269   0.366345   0.000183770
BBQT      20      7.3868   0.369340   0.000380727

(b) Summary of output
Source of variation   Sum of squares   df   Mean square    F-statistic   p-value       F-critical
Between groups        0.045498772      4    0.011374693    35.8500113    3.09526E−18   2.467493623
Within groups         0.030142134      95   0.000317286
Total                 0.075640906      99

(c) Tukey post hoc analysis
Comparison (I−J)   Mean difference (I−J)   Standard error   Lower bound   Upper bound
BBQT-KLDBQT        0.0456850*              0.0056328        0.030021      0.061349
BBQT-Bo1BQT        0.0510350*              0.0056328        0.035371      0.066699
BBQT-IGBQT         0.0162650*              0.0056328        0.000601      0.031929
BBQT-CBQT          0.0029950               0.0056328        −0.012669     0.018659

*Mean difference is significant at the 0.05 level



Table 11  Initial query terms

Table 12  List of top 15 candidate terms of different queries on Roshni dataset using BBQT method

and CBQT) outperformed all the methods of Block I (KLDBQT, Bo1BQT, and IGBQT),
signifying better query expansion performance. In summary, the proposed methods showed
a significant improvement in recall and precision values owing to the use of a small but
semantically related set of terms in the expansion process; the addition of irrelevant
expansion terms can harm the retrieval process, resulting in poor document retrieval
performance. Moreover, all the results were found to be statistically significant using both the
ANOVA test and the Tukey post-hoc analysis.

6 Conclusion and future work

The effectiveness of the selected query terms significantly affects the performance of an
information retrieval system. Therefore, query expansion methods are used to expand
and adjust the primary query to enhance the relevance of the retrieved documents. This paper has
explored different sets of term-selection methods for query expansion to enhance the
information retrieval system's effectiveness by using an automatic query expansion technique
on Urdu, Persian, and English news benchmark datasets. To this end, the BBQT method was
proposed along with two distinct ranking methods and a boosting query term method. The

main aim was to establish a correlation between the query terms and the document terms
through query boosting, which is significant in closing the gap between the query issued and
the retrieved documents. The proposed methodology enabled the effective retrieval of
relevant documents by adding proper expansion terms to the primary query. We found that
the BBQT method, in synergy with the ranking method and the boosting query term method,
outperformed all the other proposed approaches in terms of mean average precision.
Moreover, the performance improvement of all the proposed methods was better
on the Urdu dataset than on the Persian and English datasets, with BBQT scoring a maximum
improvement of 14.02%. The proposed expansion method's statistical significance was also
established using the ANOVA test, followed by Tukey post hoc analysis.

References
AleAhmad A, Amiri H, Darrudi E, Rahgozar M, Oroumchian F (2009) Hamshahri: a standard persian text
collection. Knowl-Based Syst 22(5):382–387
Amati G, Van Rijsbergen CJ (2002) Probabilistic models of information retrieval based on measuring the
divergence from randomness. ACM Trans Inf Syst 20(4):357–389
Bajaj P, Campos D, Craswell N, Deng L, Gao J, Liu X, Majumder R, McNamara A, Mitra B, Nguyen T et al
(2016) Ms marco: a human generated machine reading comprehension dataset. arXiv preprintarXiv​
:1611.09268​
Bendersky M, Croft WB (2008). Discovering key concepts in verbose queries. In: Proceedings of the 31st
annual international ACM SIGIR conference on research and development in information retrieval, pp
491–498
Carpineto C, Romano G (2012) A survey of automatic query expansion in information retrieval. ACM Comput Surv 44(1):1–50
Church K, Hanks P (1990) Word association norms, mutual information, and lexicography. Comput Lin-
guist 16(1):22–29
Craswell N, Robertson S, Zaragoza H, Taylor M (2005). Relevance weighting for query independent evi-
dence. In: Proceedings of the 28th annual international ACM SIGIR conference on research and devel-
opment in information retrieval, pp 416–423
Croft WB, Cronen-Townsend S, Lavrenko V (2001) Relevance feedback and personalization: a language
modeling perspective. In DELOS, Citeseer
Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Diaz F (2016) Pseudo-query reformulation. In: European conference on information retrieval. Springer, pp
521–532
Felsenthal DS, Nurmi H (2019) 20 voting procedures designed to elect a single candidate. In: Voting proce-
dures under a restricted domain. Springer, pp 5–16
Fraenkel J, Grofman B (2014) The Borda count and its real-world alternatives: comparing scoring rules in
Nauru and Slovenia. Aust J Polit Sci 49(2):186–205
Gabrilovich E, Broder A, Fontoura M, Joshi A, Josifovski V, Riedel L, Zhang T (2009) Classifying search
queries using the web as a source of knowledge. ACM Trans Web 3(2):1–28
Gupta Y, Saini A (2017) A novel fuzzy-PSO term weighting automatic query expansion approach using
combined semantic filtering. Knowl-Based Syst 136:97–120
Han FX, Niu D, Chen H, Lai K, He Y, Xu Y (2019) A deep generative approach to search extrapolation and
recommendation. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge
discovery and data mining, pp 1771–1779
Kang I-H, Kim G (2003) Query type classification for web document retrieval. In: Proceedings of the 26th
annual international ACM SIGIR conference on research and development in informaion retrieval, pp
64–71
Karisani P, Rahgozar M, Oroumchian F (2016) A query term re-weighting approach using document simi-
larity. Inf Process Manag 52(3):478–489
Khennak I, Drias H (2017) An accelerated PSO for query expansion in web information retrieval: applica-
tion to medical dataset. Appl Intell 47(3):793–808

13
Pseudo‑relevance feedback based query expansion using boosting…

Khennak I, Drias H (2018) Data mining techniques and nature-inspired algorithms for query expansion.
In: Proceedings of the international conference on learning and optimization algorithms: theory
and applications, pp 1–6
Khennak I, Drias H, Kechid S (2016) A new modeling of query expansion using an effective bat-inspired
optimization algorithm. IFAC-PapersOnLine 49(12):1791–1796
Kuzi S, Zhang M, Li C, Bendersky M, Najork M (2020) Leveraging semantic and lexical matching to improve the recall of document retrieval systems: a hybrid approach. arXiv preprint arXiv:2010.01195
Lee KS, Croft WB, Allan J (2008) A cluster-based resampling method for pseudo-relevance feedback.
In: Proceedings of the 31st annual international ACM SIGIR conference on research and develop-
ment in information retrieval, pp 235–242
Li R, Li L, Wu X, Zhou Y, Wang W (2019) Click feedback-aware query recommendation using adver-
sarial examples. In: The World Wide Web conference, pp 2978–2984
Majumder P, Mitra M, Pal D, Bandyopadhyay A, Maiti S, Pal S, Modak D, Sanyal S (2010) The fire
2008 evaluation exercise. ACM Trans Asian Lang Inf Process 9(3):1–24
Naseer A, Hussain S, et al (2009) Assas-band, an affix-exception-list based Urdu stemmer. In: Proceed-
ings of the 7th workshop on Asian language resources (ALR7), pp 40–47
Nogueira R, Cho K (2019) Passage re-ranking with BERT. arXiv preprint arXiv:1901.04085
Nogueira R, Yang W, Cho K, Lin J (2019) Multi-stage document ranking with BERT. arXiv preprint arXiv:1910.14424
Pal D, Mitra M, Datta K (2014) Improving query expansion using wordnet. J Assoc Inf Sci Technol
65(12):2469–2478
Pedronette DCG, Almeida J, Torres RDS (2014) A scalable re-ranking method for content-based image
retrieval. Inf Sci 265:91–104
Ramos J et al (2003) Using tf-idf to determine word relevance in document queries. In: Proceedings of
the first instructional conference on machine learning. New Jersey, USA, vol 242, pp 133–142
Rasheed I, Banka H (2018) Query expansion in information retrieval for Urdu language. In: 2018 fourth international conference on information retrieval and knowledge management (CAMP). IEEE, pp 1–6
Rasheed I, Gupta V, Banka H, Kumar C (2018) Urdu text classification: a comparative study using
machine learning techniques. In: 2018 thirteenth international conference on digital information
management (ICDIM). IEEE, pp 274–278
Raza MA, Mokhtar R, Ahmad N (2019) A survey of statistical approaches for query expansion. Knowl
Inf Syst 61:1–25
Riaz K (2008) Concept search in Urdu. In: Proceedings of the 2nd PhD workshop on information and
knowledge management, pp 33–40
Robertson SE (1977) The probability ranking principle in IR. J Doc
Robertson SE, Jones KS (1976) Relevance weighting of search terms. J Am Soc Inf Sci 27(3):129–146
Rocchio J (1971) Relevance feedback in information retrieval. The smart retrieval system-experiments in
automatic document processing, pp 313–323
Salton G, Buckley C (1990) Improving retrieval performance by relevance feedback. J Am Soc Inf Sci
41(4):288–297
Sieg A, Mobasher B, Burke R (2007) Web search personalization with ontological user profiles. In: Pro-
ceedings of the sixteenth ACM conference on conference on information and knowledge manage-
ment, pp 525–534
Taghi-Zadeh H, Sadreddini MH, Diyanati MH, Rasekh AH (2017) A new hybrid stemming method for
Persian language. Digital Scholarsh Hum 32(1):209–221
Thaker R, Goel A (2015) Domain specific ontology based query processing system for Urdu language.
Int J Comput Appl 121(13):20–23
Van Rijsbergen CJ (1977) A theoretical basis for the use of co-occurrence data in information retrieval.
J Doc 32:106–199
Voorhees EM (1994) Query expansion using lexical-semantic relations. In: SIGIR’94. Springer, pp 61–69
Walker S, Robertson S, Boughanem M (1996) Okapi at trec-6: automatic ad hoc, vlc, routing and filter-
ing. In: Proceedings of the fifth text retrieval conference. Gaithersburg, pp 500–240
Wei Z, Gao W, El-Ganainy T, Magdy W, Wong K-F (2014) Ranking model selection and fusion for
effective microblog search. In: Proceedings of the first international workshop on social media
retrieval and analysis, pp 21–26
Xu J, Croft WB (2017) Query expansion using local and global document analysis. ACM SIGIR Forum 51:168–175
Xu Y, Benaroch M (2005) Information retrieval with a hybrid automatic query expansion and data fusion
procedure. Inf Retr 8(1):41–65

13
I. Rasheed et al.

Xu Y, Jones GJ, Wang B (2009) Query dependent pseudo-relevance feedback based on Wikipedia. In: Pro-
ceedings of the 32nd international ACM SIGIR conference on research and development in informa-
tion retrieval, pp 59–66
Yilmaz ZA, Yang W, Zhang H, Lin J (2019) Cross-domain modeling of sentence-level evidence for docu-
ment retrieval. In: Proceedings of the 2019 conference on empirical methods in natural language pro-
cessing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP),
pp 3481–3487
Zaragoza H, Craswell N, Taylor MJ, Saria S, Robertson SE (2004) Microsoft Cambridge at TREC 13: web
and hard tracks. In: TREC, vol 4, p 1
Zia T, Akhter MP, Abbas Q (2015) Comparative study of feature selection approaches for Urdu text catego-
rization. Malays J Comput Sci 28(2):93–109

Publisher’s Note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
