Apweb 2014

Structural-based relevance feedback in XML retrieval

Inès Kamoun Fourati, Mohamed Tmar, and Abdelmajid Ben Hamadou

Multimedia Information systems and Advanced Computing Laboratory, Higher Institute of Computer Science and Multimedia, University of Sfax, Tunisia

Abstract. Contrary to classical information retrieval systems, systems that handle structured documents include the structural dimension when comparing documents and queries. Thus, relevant results are the document fragments that match the user need rather than the whole document. In this case, the document and query structure should be taken into account in the retrieval process as well as during reformulation: query reformulation should also include the structural dimension. In this paper, we propose an approach of query reformulation based on structural relevance feedback. We start from the original query on one hand and the fragments judged relevant by the user on the other. Structure hints analysis allows us to identify nodes that match the user query and to rebuild it during the relevance feedback step. The main goal is to show the impact of structural hints on relevance feedback [...]. Some experiments have been undertaken on the dataset provided by INEX¹ to show the effectiveness of our approach.

Keywords: relevance feedback, XML, INEX, line of descent matrix

1 Introduction
The goal of information retrieval systems (IRS) is to satisfy the information needs of a user. This need is expressed by a query to be matched against all the documents in the corpus in order to select those that could answer the user need.
Because of the ambiguity and the incompleteness of the query, the user is, in most cases, not satisfied with the returned results. To overcome this problem, alternatives to the initial query can be generated so as to improve the results. Among the most popular approaches in information retrieval [11], we cite relevance feedback (RF), which is based on relevance judgments on the documents found by the IRS and is intended to re-express the information need from the initial query in an effort to find more relevant documents. But with the standardization of [...] the Web to XML schemas, new problems arise and hence new needs for customized information access. However, traditional IRS do not exploit this structure of documents, including in the RF function. Indeed, the user can express his need by a set of keywords, as in traditional IRS, and can add structural constraints to better target the sought semantics. Thus, information retrieval systems handling structured documents must take into account the structure of the documents and that of the query in the feedback process.

Many relevance feedback initiatives have been proposed to rewrite the user query. The majority of these approaches are content-based, which means that only the query terms are updated and relatively reweighted to improve the result. Only a few approaches modify the query structure. In this paper, we propose an approach of structure-based relevance feedback. We assume that the query structure could be reformulated based on the structure of the document elements judged relevant.

¹ INitiative for the Evaluation of XML retrieval, an evaluation forum that aims at promoting retrieval capabilities on XML documents.
This paper is organized as follows: in the second section, we give a survey of related work on XML relevance feedback. In the third section, we present our approach of query reformulation based on structural relevance feedback. In the fourth section, we present the experiments and the obtained results. The fifth section concludes.

2 Related work
Many initiatives for XML query reformulation have been proposed. In recent years, RF approaches have been adapted in order to take into account the structural dimension. Villatoro-Tello et al. describe in [12] a system developed by the Language and Reasoning Group of UAM for the Relevance Feedback track of INEX 2012. The system focuses on the problem of ranking documents according to their relevance. It is mainly based on hypotheses such as: current IR machines are able to retrieve relevant documents for most general queries, but they cannot generate a pertinent ranking; and focused feedback could provide more and better elements for the ranking process than isolated query terms. The authors aim to demonstrate that, by using relevance feedback, it is possible to improve the final ranking of the retrieved documents. Balog et al. propose a general probabilistic framework for entity search to evaluate and provide insight into the many ways of using [...] as input for query modelling [1]. They focus on the use of category information and demonstrate the effectiveness of category-based expansion using example entities. Schenkel and Theobald describe two approaches which focus on the incorporation of structural aspects in the feedback process. The first approach re-ranks results returned by an initial keyword-based query using structural features derived from results with known relevance. Their second approach involves expanding traditional keyword queries into content-and-structure queries. Official results, evaluated using the INEX 2005 [5] assessment method based on rank freezing, show that re-ranking outperforms the query expansion method on this data.

Among these approaches, only a few consider that RF on the query structure is necessary. It is common to rewrite the query based on its structure, as in [3], [9] and [15], but modification of the query structure itself is not addressed. In our approach, we consider that, in addition to the content of the relevant elements, structural RF is necessary, particularly if the XML retrieval system takes into account the structural dimension in the matching process. Since we use an XML retrieval system that matches the structure in addition to the content, we assume that structure reformulation could improve retrieval performance.

3 Structural-based relevance feedback: our approach
A structured document (i.e. an XML document) is characterized by a content and a structure. This structure possibly completes the semantics expressed by the content and becomes a constraint with which the IRS must comply in order to satisfy the user information need. Our approach focuses essentially on the structure of the original query and that of the fragments deemed relevant to the user. Indeed, this study allows us to reinforce the importance of these structures in the reformulated query to better identify the fragments most relevant to the user needs. The analysis of structures allows us to identify the most relevant nodes and the relationships involved. The content of these fragments and that of the initial query are also taken into account: their analysis allows us to select the most relevant terms to inject in the new query.

Our approach is based on two major phases. The first aims at representing the query and the relevant fragments in a single representative structure. The second is focused on query rewriting.

3.1 Query and relevant fragments representation
In most relevance feedback approaches, the new query is constructed by building a representative pattern for relevant objects and another for irrelevant ones, and then finding a representation close to the first and far from the second. For example, Rocchio's method [8] represents a document set by its centroid: a linear combination of the original query and the centroids of the relevant and irrelevant documents can be assumed to be potentially suitable for the user need.

Although simplistic, Rocchio's method is the most widespread. This simplicity is due to the nature of the manipulated objects: Rocchio's method is adapted to the case where documents are full text, in which each document is expressed by a vector (generally a vector of weighted terms). Where documents embody structural relations, the vector representation becomes simplistic: it results in a significant loss of structural content, and therefore the reconstruction of a unified structure becomes impossible. As for us, we believe that the structure is an additional dimension. A single dimension (a one-dimensional vector) is not enough to encode the structural information, so we need to encode all documents in two dimensions, using matrices rather than vectors. This reasoning has led us to translate the documents and the query into a matrix format instead of a weighted term vector.
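For reference, the vector-space Rocchio reformulation that this section starts from can be sketched as follows. This is a minimal Python illustration, not part of the authors' method. Vectors are plain term-to-weight dictionaries. The default values of alpha, beta and gamma are common textbook choices, not values from this paper. Dropping non-positive weights is a usual but optional convention.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight


def centroid(docs: List[Vector]) -> Vector:
    """Mean term-weight vector of a document set (empty set -> empty vector)."""
    if not docs:
        return {}
    acc: Dict[str, float] = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            acc[term] += weight
    return {term: weight / len(docs) for term, weight in acc.items()}


def rocchio(query: Vector, relevant: List[Vector], non_relevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """q' = alpha * q + beta * centroid(relevant) - gamma * centroid(non_relevant)."""
    c_rel, c_non = centroid(relevant), centroid(non_relevant)
    new_query: Vector = {}
    for term in set(query) | set(c_rel) | set(c_non):
        weight = (alpha * query.get(term, 0.0)
                  + beta * c_rel.get(term, 0.0)
                  - gamma * c_non.get(term, 0.0))
        if weight > 0:  # drop terms pushed to zero or below
            new_query[term] = weight
    return new_query


q = rocchio({"xml": 1.0},
            relevant=[{"xml": 0.5, "retrieval": 1.0}, {"retrieval": 0.5}],
            non_relevant=[{"sql": 1.0}])
# q == {"xml": 1.1875, "retrieval": 0.5625}; "sql" gets -0.15 and is dropped
```

Each document collapses into a single term vector, so any parent/child relationship between elements is lost. This is the limitation that motivates the matrix representation.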
These matrices are enriched by values calculated from a transitive relationship function. Then, the representative structure of the query and the judged relevant fragments (which we call S) is constructed.

Line of descent matrix. For each document, we build a matrix called the line of descent matrix (LDM), which must show all existing relationships between different nodes. This representation should also reflect the positions of the various nodes in the fragments, as these positions also matter in structural relevance feedback. For an XML tree (or sub-tree) A, we associate the matrix M_A defined by:

$$M_A[n, n'] = \begin{cases} P & \text{if } n \text{ is the parent of } n' \text{ in } A \\ 0 & \text{otherwise} \end{cases}$$

where P is a constant value which represents the weight of the descent relationship, and n and n' are two nodes of the tree A.

As for us, we represent each of the relevant fragments and the initial query in the LDM form². The value of the constant P for the query LDM construction is greater than that used for the construction of the other LDMs (which represent the relevant fragments), to strengthen the weight of the initial query. This follows the principle of Rocchio's method, which uses reformulation parameters having different effects (α for the initial query, β for the centroid of the relevant documents and γ for the centroid of the non-relevant documents).

Content integration in LDM. The content of each element represented in the LDM must be taken into account: in RF for XML retrieval, we aim to rewrite both the query's structure and the set of terms of this query. So, we propose to integrate the terms of each element in the LDM. Each element node n_i in the LDM is characterized by a tag name and a set of weighted terms:

$$n_i = \left(tag_i, \{(t_1, w_i(t_1)), (t_2, w_i(t_2)), \ldots, (t_k, w_i(t_k))\}\right)$$

where tag_i is the tag name of element n_i, t_j is a term in n_i, and w_i(t_j) is the weight of term t_j in element n_i, based on its frequency in n_i and the total number of elements that contain it [2].
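The line of descent matrix and the element content representation can be sketched with Python's standard xml.etree.ElementTree parser. This sketch makes two simplifications of its own, which are not the paper's. First, nodes are identified by tag name, so repeated tags share one row and column, without the content-similarity check the authors apply. Second, the term weight w_i(t_j) is instantiated as tf × log(E / ef(t)). This is a hypothetical tf-ief style choice, where E is the number of elements and ef(t) the number of elements whose own text contains t.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, Tuple

# Sparse matrix: ldm[n][n2] holds M_A[n, n2]; missing entries mean 0.
Matrix = Dict[str, Dict[str, float]]


def build_ldm(xml_text: str, p: float) -> Matrix:
    """Line of descent matrix: M_A[n, n'] = p if n is the parent of n', 0 otherwise.

    Assumption of this sketch: a node is identified by its tag name.
    """
    root = ET.fromstring(xml_text)
    ldm: Matrix = {}
    for parent in root.iter():  # iter() visits every element, leaves included
        row = ldm.setdefault(parent.tag, {})
        for child in parent:  # direct children only; no transitive links yet
            row[child.tag] = p
    return ldm


def element_terms(xml_text: str) -> List[Tuple[str, Dict[str, float]]]:
    """Represent each element n_i as (tag_i, {t_j: w_i(t_j)}), in document order.

    Hypothetical weighting: w_i(t) = tf(t, n_i) * log(E / ef(t)).
    Only the element's own text (not its descendants' text) is indexed.
    """
    root = ET.fromstring(xml_text)
    elements = list(root.iter())
    counts = [Counter(re.findall(r"\w+", (e.text or "").lower())) for e in elements]
    # Number of elements containing each term (each Counter lists a term once).
    ef = Counter(term for c in counts for term in c)
    total = len(elements)
    return [
        (e.tag, {t: tf * math.log(total / ef[t]) for t, tf in c.items()})
        for e, c in zip(elements, counts)
    ]


# The query LDM gets a larger P than the fragment LDMs, as the text prescribes;
# the values 1.0 and 0.5 are illustrative only.
query_ldm = build_ldm("<article><sec><p/></sec></article>", p=1.0)
fragment_ldm = build_ldm("<sec><p/><fig/></sec>", p=0.5)
```

Storing the matrix sparsely keeps only the non-zero parent/child entries. A term that occurs in every element receives weight 0 under this hypothetical weighting.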
² The matrix size grows with the [...] number of elements. This complexity is not a concern here because of the low number of judged documents compared to the corpus size. In our experiments, we perform relevance feedback in a pseudo-feedback way on the top 20 ranked fragments resulting from the first retrieval round. On the other hand, in INEX'05 the matrix size cannot exceed about 190 over the whole collection and is about 3 for a single fragment.

Let us consider that the element A appears in three positions in an XML tree, A[1], A[2] and A[3], each with its own set of weighted terms [term weights illegible in the source]. A[1] and A[2] have roughly the same content, which differs from that of A[3]. In the LDM, A will therefore appear twice, with two different contents. Content similarity is computed by the inner product, as in the vector model. We assume that A[1] and A[2] are similar since they have the same tag name (A) and A[1] · A[2] = 0.926 > Th, which is not the case for A[1] · A[3] = 0.051 < Th and A[2] · A[3] < Th (Th is an experimental threshold; in this example, we take Th = 0.5). A[1] and A[2] are thus aggregated and represented in the LDM by a single node A whose content is the centroid of A[1] and A[2], while A[3] keeps its own node.

Setting relationships between a node and its descendants. XML retrieval is usually done in a vague way [11]: the XML retrieval system has to answer the query with tolerated differences (a few missing or additional elements) between the query structure and the document. Consequently, we believe that the most effective way to bring this tolerance is to ensure that an element is connected not only to its child nodes, but to all its descendants. A relationship between nodes in the same line of descent is weighted by their distance in the XML tree. So, we use the Transitive Relationship function TR, defined as follows:

$$\forall (n_1, n_2, n_3) \in N^3, \quad M_A[n_1, n_3] = M_A[n_1, n_3] + TR\left(M_A[n_1, n_2], M_A[n_2, n_3]\right)$$

where N is the set of all different nodes in the tree A, M_A is its LDM, and TR combines the weights of the two links [the exact definition of TR is illegible in the source].

Matrix S construction. The new query structure is built from the obtained LDMs. Let F = {f_1, f_2, ..., f_k} be the fragments judged relevant and Q_old the initial query. The query structure is built from the accumulated LDMs:

$$\forall (n, n') \in N^2, \quad S[n, n'] = \alpha \cdot M_{Q_{old}}[n, n'] + \beta \cdot \sum_{i=1}^{k} M_{f_i}[n, n']$$

The constants α and β are the same as in the line of descent matrix construction; they strengthen the weight of the old query's terms. Row n of S describes the descendants of node n, and column n describes its ancestors. If column n contains mostly low values, the node will tend to appear as the root of the reformulated query. If, on the contrary, row n contains mostly low values, the node will tend to be a leaf node in the reformulated query, provided that the corresponding column also contains several high values; otherwise, the node will tend to appear as an internal node. Thus, in order to build the new query structure, we first need to determine the new root.

3.2 Query rewriting
Root identification. The construction of the query structure starts by identifying its root. The root is characterized by a high number of child nodes and a low number of parents. To find the root, we return the element R which has the greatest weight in its row of the matrix S and the lowest weight in its column. The root R is then such that:

$$R = \arg\max_{n \in N} \; \sum_{n' \in N} S[n, n'] \times \log \frac{\sum_{n' \in N} \sum_{n'' \in N} S[n', n'']}{\sum_{n' \in N} S[n', n]}$$

The argument to maximize reflects that the candidate root nodes should have values as high as possible in their row (Σ_{n'} S[n, n']) and values as low as possible in their column (Σ_{n'} S[n', n]). We weigh the column sum against the total sum of the matrix values. This is inspired by the tf × idf factor (term frequency × inverse document frequency) commonly used in traditional information retrieval [7]. That factor assigns importance to a term for a document in proportion to its frequency in the document (term frequency). It is inversely proportional to the number of documents in the collection where the term appears at least once (inverse document frequency).

Building the new query structure. Once the root R is established from the matrix S, we proceed to the recursive development of the tree representing the structure of the new query. The development starts from the root R by determining all its child nodes; the same operation is then performed recursively for the child nodes of R until the leaf nodes are reached. Each element n is developed by attributing to it its potential child nodes n' (n' ≠ n) whose value S[n, n'] exceeds a threshold. This threshold is calculated from the mean μ_n and the standard deviation σ_n of the values of its candidate child nodes. Indeed, the mean and the standard deviation indicate the probability that a node is an actual child of the current node n. The threshold depends on a parameter λ [its exact expression is illegible in the source], with:

$$\mu_n = \frac{1}{|N|} \sum_{n' \in N} S[n, n'] \qquad \sigma_n = \sqrt{\frac{1}{|N|} \sum_{n' \in N} \left(S[n, n'] - \mu_n\right)^2}$$

If the value of λ is relatively high, the resulting tree will tend to be shallow and ramified, and vice versa. The value of λ thus determines, for each element n, the estimated number of child nodes. The objective of this interval is to reconstruct a tree as wide and deep as the XML fragments from which the query is inferred. This value is defined experimentally.

4 Experiments and results
To carry out our experiments, we use the INEX'05 dataset and we only consider the VVCAS query type (topics whose relevance vaguely depends on the structural constraints).
Indeed, the need for reformulation of the query structure suits this task. We also use the metrics proposed by INEX, which are based on the extended cumulated gain (XCG) [6]. For a given rank i, the value of nxCG[i] reflects the relative gain the user has accumulated up to that rank. We only present the results of the generalized quantization function, which is the most suitable for VVCAS queries (10 queries proposed by INEX).

Table 1 shows the results obtained from XIVIR, a research system based on tree matching. This table compares the values obtained before (BRF) and after RF (ARF). AI is the absolute improvement of the relevance feedback run over the original base run proposed by INEX.

Table 1. Comparative results before (BRF) and after (ARF) structural RF
[table values illegible in the source]

In our experiments, we assume that the top k [...]. Table 2 shows the results obtained for different numbers of relevant fragments.

Table 2. Results for different numbers of relevant fragments
[table values illegible in the source]

Our experiments show that our RF approach significantly improves the results. Note that during these experiments we reformulated only the query structures, without changing their content. We therefore believe that reformulating the content as well could accentuate the improvement brought by this reformulation.

5 Conclusions and Future Work
In this paper, we have proposed our approach to structural relevance feedback in XML retrieval.
We proposed a representation of the original query and the relevant fragments in matrix form. Through processing and analysis of this matrix, we were able to identify the most relevant nodes and the relationships that connect them. The obtained results show that structural relevance feedback contributes to the improvement of XML retrieval. The [...]