
Information retrieval systems seen previously, such as the VSM and LSI, cannot be applied directly to the Web: the computational and storage costs involved would be enormous. Some webpage authors, moreover, intentionally try to deceive IR systems [42]. For example, there are papers instructing webpage authors on methods for "increasing one's ranking" on various IR systems [54]. Ideally, an IR system should be impervious to such spamming. The tendencies of Web users also present additional challenges to Web IR systems. Web users generally input very short queries, rarely make use of feedback to revise the search, seldom use any advanced features of the system and almost always view only the top 10-20 retrieved documents [5, 31]. Such user tendencies put a high priority on the speed and accuracy of the IR system. The final feature, and the most important one for this paper, is the Web's unique hyperlink structure. This inherent structure provides extra, and as will become clear later, very valuable information that can be exploited to improve IR methods.

This hyperlink structure is exploited by the three most frequently cited Web IR methods:

• HITS (Hypertext Induced Topic Search) [36]
• PageRank [14, 15]
• SALSA [41]

HITS was developed in 1997 by Jon Kleinberg. Soon after, Sergey Brin and Larry Page developed their now famous PageRank method. SALSA was developed in 2000 in reaction to the pros and cons of HITS and PageRank. This paper is organized as follows. Sections 2-4 cover HITS and PageRank and their connections, followed by SALSA in section 5. Other Web IR methods are briefly overviewed in section 6, and section 7 contains predictions for the future of Web IR.

Introduction

HITS

2. Two Theses for Exploiting the Hyperlink Structure of the Web. Each page/document on the Web is represented as a node in a very large graph. The directed arcs connecting these nodes represent the hyperlinks between the documents.

The graph of a sample Web is depicted in Figure 2.1.


In the figure, six webpages are interconnected by directed arcs. The HITS IR method defines two kinds of nodes, authorities and hubs:

• An authority is a document with several inlinks.
• A hub is a document with several outlinks.

See Figure 2.2. The HITS thesis is that good hubs point to good authorities and good authorities are pointed to by good hubs. HITS assigns both a hub score and an authority score to each webpage.

Fig. 2.1. Hyperlink Structure of a 6-node sample Web


Fig. 2.2. An authority node and a hub node

For example, the higher the authority score of a particular page, the more authoritative that document is.

For a given node i, define its authority score as x_i and its hub score as y_i. Let E be the set of all directed edges in the Web graph and let e_ij represent the directed edge from node i to node j. Given that each page has somehow been assigned an initial authority score x_i^{(0)} and hub score y_i^{(0)}, HITS successively refines these scores by computing

(3.1)   x_i^{(k)} = \sum_{j:e_{ji} \in E} y_j^{(k-1)}   and   y_i^{(k)} = \sum_{j:e_{ij} \in E} x_j^{(k)},   for k = 1, 2, 3, ....

These equations can be written in matrix form with the help of the adjacency matrix L of the directed Web graph:

L_{ij} = 1, if there exists an edge from node i to node j,
L_{ij} = 0, otherwise.

For example, consider the small graph in Figure 3.1 with corresponding adjacency matrix L.

Fig. 3.1. Node-link graph for small 4-node Web


The adjacency matrix L is the following:

L =       d1  d2  d3  d4
     d1 [  0   1   1   0 ]
     d2 [  1   0   1   0 ]
     d3 [  0   1   0   1 ]
     d4 [  0   1   0   0 ]

In matrix notation, the equations in (3.1) assume the form

x^{(k)} = L^T y^{(k-1)}   and   y^{(k)} = L x^{(k)}.

This leads to the following iterative algorithm for computing the ultimate authority and hub scores x and y.

1. Initialize: y^{(0)} = e, where e is a column vector of all ones. Other positive starting vectors may be used. (See section 3.2.)
2. Until convergence, do:
   x^{(k)} = L^T y^{(k-1)}
   y^{(k)} = L x^{(k)}
   k = k + 1
   Normalize x^{(k)} and y^{(k)} (see section 3.2).

Note that in step 2 of this algorithm, the two equations

x^{(k)} = L^T y^{(k-1)}
y^{(k)} = L x^{(k)}

can be simplified by substitution to

x^{(k)} = L^T L x^{(k-1)}
y^{(k)} = L L^T y^{(k-1)}.

These two new equations define the iterative power method for computing the dominant eigenvector of the matrices L^T L and LL^T. Since the matrix L^T L determines the authority scores, it is called the authority matrix, and LL^T is known as the hub matrix. L^T L and LL^T are symmetric positive semidefinite matrices. Computing the authority vector x and the hub vector y can be viewed as finding dominant right-hand eigenvectors of L^T L and LL^T, respectively.
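Under the same assumption (the 4-node matrix reconstructed above), one can check numerically that the substituted iteration really is the power method on the authority matrix L^T L, i.e. that it converges to the dominant eigenvector of L^T L:

```python
import numpy as np

L = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 1, 0, 0]], dtype=float)

# Power method on the authority matrix: x^(k) = (L^T L) x^(k-1).
x = np.ones(4)
for _ in range(200):
    x = L.T @ L @ x
    x /= np.linalg.norm(x)       # keep the iterate at unit length

# Dominant eigenvector of the symmetric PSD matrix L^T L via eigh.
# numpy.linalg.eigh returns eigenvalues in ascending order, so the
# dominant eigenvector is the last column of V.
w, V = np.linalg.eigh(L.T @ L)
dominant = V[:, -1] * np.sign(V[:, -1].sum())   # fix an overall sign

# x and dominant now agree up to floating-point tolerance.
```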

3.1. HITS Implementation. The implementation of HITS involves two main steps. First, a neighborhood graph N related to the query terms is built. Second, the authority and hub scores for each document in N are computed, and two ranked lists of the most authoritative documents and most "hubby" documents are presented to the IR user. In short:

1. Build the neighborhood graph N relative to the query.
2. Compute the hub and authority scores for each page in N.
3. Present two rankings, one ordered by hub score and one by authority score.

Since the second step was described in the previous section, we focus on the first step. All documents containing references to the query terms are put into the neighborhood graph N. There are various ways to determine these documents. One simple method consults the inverted term-document file. This file might look like:

• aardvark: term 1 - doc 3, doc 117, doc 3961
• aztec: term 10 - doc 15, doc 3, doc 101, doc 19, doc 1199, doc 673
• baby: term 11 - doc 56, doc 94, doc 31, doc 3
...
• zymurgy: term m - doc 223

For each term, the documents mentioning that term are stored in list form. Thus, a query on terms 10 and 11 would pull documents 15, 3, 101, 19, 1199, 673, 56, 94, and 31 into N. Next, the graph around the subset of nodes in N is expanded by adding nodes that either point to nodes in N or are pointed to by nodes in N. This expansion allows some latent semantic associations to be made. That is, for the query term car, with the expansion around documents containing car, some documents containing automobile may now be added to N, hopefully resolving the problem of synonyms. However, the set N can become very large due to the expansion process, because a document containing the query terms may possess a huge indegree or outdegree. Thus, in practice, the maximum number of inlinking nodes and outlinking nodes to add for a particular node in N is fixed, at say 100. For example, only the first 100 outlinking nodes of a document containing a query term are added to N. The process of building the neighborhood graph is strongly related to building level sets in information filtering, which reduces a sparse matrix to a much smaller, more query-relevant matrix [63].

Once the set N is built, the adjacency matrix L corresponding to the nodes in N is formed. The order of L is much smaller than the total number of nodes/documents on the Web. Therefore, computing authority and hub scores using the dominant eigenvectors of L^T L and LL^T is far cheaper than it would be on the full Web graph.
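A toy sketch of step 1 (building N) under stated assumptions: the inverted file below reuses terms 10 and 11 from the example above, while the hyperlink structure and the helper name `neighborhood` are invented purely for illustration.

```python
# Inverted term-document file (terms 10 and 11 from the example above).
inverted_file = {
    10: [15, 3, 101, 19, 1199, 673],   # aztec
    11: [56, 94, 31, 3],               # baby
}

# Hypothetical hyperlink structure: doc id -> docs it links to.
outlinks = {3: {15, 42}, 15: {101}, 42: {3}, 7: {3, 56}}

def neighborhood(query_terms, max_expand=100):
    # Seed N with every document containing a query term.
    N = set()
    for t in query_terms:
        N.update(inverted_file.get(t, []))
    # Expand N: docs pointed to by nodes of N, and docs pointing
    # into N, capping the number of added out-neighbors per node.
    added = set()
    for d in list(N):
        added.update(sorted(outlinks.get(d, set()))[:max_expand])
    for d, outs in outlinks.items():
        if outs & N:
            added.add(d)
    return N | added

N = neighborhood([10, 11])
```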

Convergence of HITS

As seen above, the HITS algorithm is simply the power method applied to the matrices L^T L and LL^T. Since L^T L and LL^T are symmetric, nonnegative and positive semidefinite, the power method converges whenever the dominant eigenvalue is simple (algebraic multiplicity 1).

Strengths and weaknesses of HITS

The elements of the hub matrix LL^T measure co-reference: in the example of Figure 3.1, the (1,2)-element means that nodes 1 and 2 share a common outlink node, node 3, while the zero (4,2)-element implies that nodes 4 and 2 do not share a common outlink node. Similarly, the authority matrix L^T L measures co-citation. Ding et al. use these relationships between authority and co-citation and between hubs and co-reference to claim that simple inlink ranking provides a decent approximation to the HITS authority score and simple outlink ranking provides a decent approximation to hub ranking [23, 22].

Advantages:
• HITS produces two rankings, one by hub score and one by authority score.
• The hub and authority matrices are much smaller than the total number of documents on the Web.

Disadvantages:
• Dependence on the query.
• Susceptibility to spamming.
• The topic drift problem.

We close this section by noting that HITS has been incorporated into the CLEVER project at IBM Almaden Research Center [1]. HITS is also part of the underlying ranking technology used by the search engine Teoma.

4. PageRank. In 1998 (the same year that HITS was born), Google founders Larry Page and Sergey Brin formulated their PageRank concept and made it the basis for their search engine [15]. As stated on the Google web site, "The heart of our software is PageRank(tm) ... [it] provides the basis for all of our web search tools." By avoiding the inherent weaknesses of HITS, PageRank has been responsible for elevating Google to the position of the world's preeminent search engine. After web pages retrieved by robot crawlers are indexed and cataloged, PageRank values are assigned prior to query time according to perceived importance, so that at query time a ranked list of pages related to the query terms can be presented to the user almost instantaneously.

PageRank importance is determined by "votes" in the form of links from other pages on the web. The idea is that votes (links) from important sites should carry more weight than votes (links) from less important sites, and that the significance of a vote (link) from any source should be tempered (or scaled) by the number of sites the source is voting for (linking to). These notions are encapsulated by defining the rank r(P) of a given page P to be

r(P) = \sum_{Q \in B_P} \frac{r(Q)}{|Q|},

where B_P = {all pages pointing to P} and |Q| = the number of outlinks from Q.

This is a recursive definition, so computation necessarily requires iteration: to compute the ranks at step j, the ranks of the pages Q at step j-1 must already be available. If there are n pages, P_1, P_2, ..., P_n, arbitrarily assign each page an initial ranking, say, r_0(P_i) = 1/n, then successively refine the ranking by computing

r_j(P_i) = \sum_{Q \in B_{P_i}} \frac{r_{j-1}(Q)}{|Q|},   for j = 1, 2, 3, ....

The notation |P_i| is the number of outlinks from page P_i. Again, this is the power method. The vector of the ranks of all the pages, called the PageRank vector, is obtained by setting \pi_j^T = (r_j(P_1), r_j(P_2), ..., r_j(P_n)) and iteratively computing

(4.1)   \pi_j^T = \pi_{j-1}^T P,   where P is the matrix with p_ij = 1/|P_i| if P_i links to P_j, and 0 otherwise.

If the limit exists, the PageRank vector is defined to be \pi^T = \lim_{j \to \infty} \pi_j^T, and the ith component \pi_i is the PageRank of P_i. This is the raw idea, but for both theoretical and practical reasons (e.g., ensuring convergence, customizing rankings, and adjusting convergence rates), the matrix P must be adjusted; how the adjustments are made is described in section 4.3.
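The power iteration (4.1) can be sketched as follows. The 4-page link structure reuses the small example graph of section 3 (itself a reconstruction in this summary), and the fixed iteration count is arbitrary:

```python
import numpy as np

# Page i -> pages it links to (0-indexed; the 4-node example graph).
links = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [1]}
n = len(links)

# Build P with p_ij = 1/|P_i| if P_i links to P_j, and 0 otherwise.
P = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        P[i, j] = 1.0 / len(outs)

# (4.1): pi_j^T = pi_{j-1}^T P, starting from the uniform ranking 1/n.
pi = np.full(n, 1.0 / n)
for _ in range(100):
    pi = pi @ P
```

Here every page has at least one outlink and the chain happens to be irreducible and aperiodic, so the iteration converges; section 4.3 deals with the general case.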


4.1. Markov Model of the Web. The "raw" Google matrix P is nonnegative with row sums equal to one or zero. Zero row sums correspond to pages that have no outlinks; such pages are called dangling nodes.

4.3. Adjusting P. There are a few problems with strictly using the hyperlink structure of the Web to build a transition probability matrix that will adequately define PageRank.

First, as noted earlier, the raw Google matrix P can fail to be a stochastic matrix, because P has a row of all zeros for each node with zero outdegree. Pages with no outlinks are called "dangling nodes" in the literature. This is easily remedied by replacing each zero row of P with eT/n, where n is the order of the matrix. Call this new, genuinely stochastic matrix P̄. But this alone doesn't fix all the problems.

Another, greater difficulty can (and usually does) arise: P̄ may be a reducible matrix, because the underlying chain is reducible. Reducible chains are those that contain sets of states in which the chain eventually becomes trapped; i.e., by a reordering of states, the transition matrix of a reducible chain can be made to have the canonical form

                  S1    S2
(4.3)    P =  S1 ( T11  T12 )
              S2 (  0   T22 )

Once a state in the set S2 has been reached, the chain never returns to the states of S1. For example, if web page Pi contains only a link to page Pj, and Pj contains only a link to Pi, then Google's random surfer who hits either Pi or Pj is trapped into bouncing between these two pages endlessly, which is the essence of reducibility.

An irreducible Markov chain is one in which every state is eventually reachable from every other state; that is, there exists a path from node i to node j for all i, j. Irreducibility is a desirable property because it is precisely the feature that guarantees that a Markov chain possesses a unique (and positive) stationary distribution vector; it's the Perron-Frobenius theorem at work [45, Chapt. 8].

The structure of the world wide web makes P̄ almost certainly reducible, so a further adjustment is necessary in order to ensure irreducibility. Brin and Page force irreducibility into the picture by making every state directly reachable from every other state. They originally did so by adding a perturbation matrix E = eeT/n to form

P̿ = αP̄ + (1 − α)E,

where α is a scalar between 0 and 1. (Other ideas for filling in the entries of E have been suggested [62, 52].) The Google reasoning was that this stochastic matrix models a web surfer's "teleportation" tendency to randomly jump to a new page by entering a URL on the command line, and it assumes that each URL has an equal likelihood of being selected.

Computing PageRank. The computation of the PageRank vector reduces to finding the dominant left-hand eigenvector of P̿, i.e., to solving either the eigenvector problem πT P̿ = πT, πT e = 1, or, equivalently, the homogeneous linear system πT(I − P̿) = 0. This may sound like a simple task; on the contrary, the size of the problem (Google's database currently numbers over 4,300,000,000 pages) severely limits the algorithms that can be used efficiently to compute πT, and the task has been called "the world's largest matrix computation" [47]. Direct methods and standard eigensolvers can't handle the overwhelming size (and sparsity) of the problem, and variants of the power method seem to be the only practical choices. Google has reported that computing the PageRank vector requires several days.


Later Google adopted a more realistic and less democratic stance by using a better (and more flexible) perturbation matrix

E = evT,

where the "personalization" vector vT > 0 is a probability vector that allows nonuniform probabilities for teleporting to particular pages. More importantly, from a business viewpoint, taking the perturbation to be of the form E = evT permits "intervention": by fiddling with vT, PageRank values can be adjusted up or down according to commercial considerations [60]. Other perturbation terms may be used as well but, in any case, P̿ = αP̄ + (1 − α)E is a convex combination of the two stochastic matrices P̄ and E, such that P̿ is stochastic and irreducible; hence P̿ possesses a unique stationary distribution vector πT. It's the matrix P̿ that is now generally called "the Google matrix," and its stationary distribution vector πT is the real PageRank vector.

1 Clicking BACK or entering a URL on the command line is excluded in this model.

Google later decided to use a different perturbation matrix:

E = evT,

where vT is called the "personalization vector" and satisfies vT > 0. This particular matrix lets Google's engineers intervene to "correct" the PageRank vector, for example to promote a sponsored page or to penalize a page that tries to inflate its own score.

PageRank Implementation

Note that PageRank assigns each web page a score based on the page's importance, not on its relevance to the query. Its implementation involves two fundamental steps:
1. Determine the set of nodes (pages) that contain the terms entered in the query.
2. Sort the returned nodes by their PageRank score and return them to the user.

Several other ways of forcing irreducibility have been suggested and analyzed [39, 10, 59], but this extreme teleportation to every page seems to be the favored approach. In particular, the PageRank vector is computed as follows:
I. specify a value for the tuning parameter α;
II. set πT(0) = eT/n, where n is the size of P;
III. iterate

(4.5)    πT(k+1) = α πT(k) P̄ + (1 − α)vT

until the desired degree of convergence is attained.

Unlike HITS, there are now no issues with uniqueness of the ranking vector, and any positive probability vector can be used to start the iteration.
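The three steps above can be sketched as follows (a toy illustration, assuming P̄ is already the stochastic fix of the raw matrix and v is uniform):

```python
# Power iteration pi_{k+1}^T = alpha * pi_k^T * Pbar + (1 - alpha) * v^T.
def pagerank(Pbar, alpha=0.85, v=None, tol=1e-12, max_iter=1000):
    n = len(Pbar)
    if v is None:
        v = [1.0 / n] * n              # uniform personalization vector
    pi = [1.0 / n] * n                 # step II: pi_0^T = e^T / n
    for _ in range(max_iter):          # step III: iterate to convergence
        new = [alpha * sum(pi[i] * Pbar[i][j] for i in range(n))
               + (1 - alpha) * v[j] for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Toy 3-page web, already made stochastic.
Pbar = [[0.0, 1.0, 0.0],
        [0.5, 0.0, 0.5],
        [1/3, 1/3, 1/3]]
pi = pagerank(Pbar)
```

Note that only the sparse matrix P̄ is ever multiplied; the dense rank-one teleportation term collapses to the vector (1 − α)vT, which is why the fudge factor does not destroy sparsity.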
Convergence

The matrix P̿ is stochastic and irreducible, so it has a unique dominant eigenvalue equal to 1, while all the other eigenvalues have modulus strictly less than 1 [45]. This guarantees that the power method applied to P̿ converges to the dominant left-hand eigenvector, the PageRank vector πT, and that this vector is unique.

The implementation (4.5) has a couple of things going for it. First, adding the fudge factor matrix E = evT does not destroy the sparsity inherent in P̄: the main vector-matrix multiplication πT(k) P̄ requires only sparse inner products, and these sparse inner products are easily implemented in parallel. Using parallelism is imperative for a problem of this size, on the order of billions of pages, since, unlike HITS, PageRank operates on Google's version of the full web. More advanced iterative system/eigensolver methods fail on such immense matrices because of their storage requirements and increased computational complexity.

The crucial issue is therefore not convergence itself but its speed, i.e., the enormous number of computations that must be performed. As alluded to earlier, the asymptotic rate of convergence of (4.5) is governed by the rate at which λ2^k → 0, where λ2 is the subdominant eigenvalue of P̿. It can be shown that if the spectrum of the stochastic matrix P̄ is σ(P̄) = {1, μ2, ..., μn}, then the spectrum of P̿ = αP̄ + (1 − α)evT is σ(P̿) = {1, αμ2, ..., αμn}, i.e., λk = αμk for k = 2, 3, ..., n. The link structure of the web makes it very likely that μ2 = 1 (or at least μ2 ≈ 1), so the rate of convergence boils down to how fast α^k → 0: the smaller α is, the faster the convergence. In other words, Google engineers can dictate the rate of convergence by how small they choose α to be.

Consequently, Google engineers are forced to perform a delicate balancing act: the smaller α is, the faster the convergence, but the less the true hyperlink structure of the web is used to determine page importance, and slightly different values of α can produce very different PageRanks. Moreover, as α → 1, not only does convergence slow drastically, but sensitivity issues begin to surface as well [39]. Google originally reported using α = .85; since (.85)^114 < 10^-8, roughly 114 iterations of the power method deliver an accuracy on the order of 10^-8, and Brin and Page reported success using only 50 power iterations on a P of order n = 322,000,000.

There have been several recent advances in PageRank computation, largely by researchers at Stanford. Arasu et al. [2] suggest using the Gauss-Seidel method [56] in place of the simple power method. Kamvar et al. have developed several modifications of the power method: an extrapolation technique similar to Aitken's Δ² process, with reported speedups of 50-300% over the basic power method at minimal additional overhead [35]; an adaptive method that monitors the convergence of individual elements of the PageRank vector and no longer recomputes elements once they have converged, so that the work per iteration decreases throughout the iteration history, with empirical evidence showing speedier convergence by about 17% [33]; and a BlockRank algorithm that uses aggregation methods to exploit the block structure of the web, achieving speedups of a factor of 2 in some cases [34]. Empirical evidence also shows that elements of the PageRank vector follow a power law, so much of the remaining work concentrates in a small section of the tail of the vector.

Accuracy in PageRank

Since πT is a probability vector, each of its components is a number between 0 and 1. If the vector has dimensions 1 x 4 billion, an accuracy of at least 10^-9 would be required to distinguish its individual components, especially those in the tail of the vector.

PageRank Updating

Google has stated that the PageRank vector is recomputed once every 2-3 weeks. In the meantime the structure of the web clearly changes considerably. The PageRank vector from an earlier period cannot simply be reused to compute the updated one, because pages and the links between them are added and modified. Updating is therefore a very important problem for Google's engineers and is still an area of active research.

Strengths and weaknesses of PageRank

Advantages:
• Query-independence of the scores
• Resistance to spamming, since PageRank is a global measure
• Flexibility of personalization through the vector vT

Disadvantages:
• Topic drift, since the score associated with each page is based on the page's importance, not on its relevance to the information the user is looking for.

SALSA


SALSA was created with the aim of combining the strengths of HITS and PageRank, in order to overcome the main limitations of both Web retrieval techniques. The basic idea is to associate two scores, a hub score and an authority score, with each node, using Markov chains.

A bipartite undirected graph, denoted G, is built. G is defined by three sets: Vh, Va, E, where Vh is the set of hub nodes (all nodes in N with outdegree > 0), Va is the set of authority nodes (all nodes in N with indegree > 0), and E is the set of directed edges in N. Note that a node in N may be in both Vh and Va.

SALSA Implementation

First, as in HITS, a neighborhood graph N associated with the query is built. Then the bipartite undirected graph G is constructed from:

Vh = { hub nodes }
Va = { authority nodes }
E = { directed edges }

Nodes in Vh are listed on the hub side and nodes in Va on the authority side; every directed edge in E is represented by an undirected edge in G. For the example with neighborhood graph given by Figure 3.2,

Vh = {1, 2, 3, 6, 10},    Va = {1, 3, 5, 6}.
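For this example the two sets can be formed directly from the directed edge list of N (the edge list below is my reading of the Figure 3.2 example, so treat it as an assumption):

```python
# Directed edges of the neighborhood graph N in the running example.
edges = [(1, 3), (1, 6), (2, 1), (3, 6), (6, 3), (6, 5), (10, 5), (10, 6)]

Vh = sorted({i for i, _ in edges})   # hub nodes: outdegree > 0
Va = sorted({j for _, j in edges})   # authority nodes: indegree > 0
```

Node 1 appears in both sets, illustrating that a node of N may sit on both sides of the bipartite graph G.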

[Figure omitted. Fig. 5.1. G: bipartite graph for SALSA, with hub side {1, 2, 3, 6, 10} and authority side {1, 3, 5, 6}.]

Next, two Markov chains are formed from G: a hub Markov chain with transition probability matrix H,

and an authority Markov chain with matrix A. Reference [41] contains a formula Fig. 5.1. G: bipartite graph for SALSA for computing the elements of H and A, but we feel a more instructive approach to building H and A clearly reveals SALSA’s connection to both HITS and PageRank. chains are formed from G, a hub Markov chain with transition probability matrix H, Recall that L is the adjacency matrix of N used by HITS. HITS computes authority Si costruiscono a partire matrix L, while PageRank computes a measure contains a formula and an authority Markov chain with matrix matrice di Adiacenza L, successivamente a and hub scores using the unweighted dal grafo G prima una A. Reference [41] partire authority score using a row-weighted matrix r we feelsiffatte: instructive approach to for computing the elements of H and A, but ed Lc a uses analogous to anda L si costruiscono altre due matrici LP. SALSA moreboth building H to A clearly hub and authority scores. Let r be HITS and PageRank. row and column weightingand compute itsreveals SALSA’s connection toLboth L Recall L = ognithe adjacency è Lc be L N used somma lungo la riga i-esima that L is its row sum and diviso per la by HITS. HITS with each nonzero row rdivided byelemento lij matrix of with each nonzero column computes authority divided by itsand Lc =sum. Then H, SALSA’sdiviso per consists of the nonzero computes a j-esima column scores elemento lij è hub matrix la somma lungo la colonna measure hub ogni using the unweighted matrix L, while PageRank METHODS FOR WEB INFORMATION RETRIEV EIGENVECTOR rows and columns of Lr LT and A is the nonzero rows and columns of LT Lmatrix P. SALSA uses both analogous cto an authority score using a row-weighted r . For our c running example (from Figure 3.2), row and column weighting to compute its hub and authority scores. 
For our running example (from Figure 3.2), with the nodes ordered 1, 2, 3, 5, 6, 10,

$$
L = \begin{pmatrix}
0 & 0 & 1 & 0 & 1 & 0\\
1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix},
\qquad
L_r = \begin{pmatrix}
0 & 0 & \tfrac12 & 0 & \tfrac12 & 0\\
1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & \tfrac12 & \tfrac12 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix},
\qquad
L_c = \begin{pmatrix}
0 & 0 & \tfrac12 & 0 & \tfrac13 & 0\\
1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & \tfrac13 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & \tfrac12 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & \tfrac13 & 0
\end{pmatrix}.
$$

Hence

$$
L_r L_c^T = \begin{pmatrix}
\tfrac{5}{12} & 0 & \tfrac16 & 0 & \tfrac14 & \tfrac16\\
0 & 1 & 0 & 0 & 0 & 0\\
\tfrac13 & 0 & \tfrac13 & 0 & 0 & \tfrac13\\
0 & 0 & 0 & 0 & 0 & 0\\
\tfrac14 & 0 & 0 & 0 & \tfrac34 & 0\\
\tfrac13 & 0 & \tfrac13 & 0 & 0 & \tfrac13
\end{pmatrix}
\qquad\text{and}\qquad
L_c^T L_r = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & \tfrac12 & \tfrac14 & \tfrac14 & 0\\
0 & 0 & \tfrac12 & \tfrac12 & 0 & 0\\
0 & 0 & \tfrac16 & 0 & \tfrac56 & 0\\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
$$

Taking just the nonzero rows and columns of $L_r L_c^T$ (nodes 1, 2, 3, 6, 10) to form H gives

$$
H = \begin{pmatrix}
\tfrac{5}{12} & 0 & \tfrac16 & \tfrac14 & \tfrac16\\
0 & 1 & 0 & 0 & 0\\
\tfrac13 & 0 & \tfrac13 & 0 & \tfrac13\\
\tfrac14 & 0 & 0 & \tfrac34 & 0\\
\tfrac13 & 0 & \tfrac13 & 0 & \tfrac13
\end{pmatrix}.
$$

Similarly, taking the nonzero rows and columns of $L_c^T L_r$ (nodes 1, 3, 5, 6) to form A gives

$$
A = \begin{pmatrix}
1 & 0 & 0 & 0\\
0 & \tfrac12 & \tfrac14 & \tfrac14\\
0 & \tfrac12 & \tfrac12 & 0\\
0 & \tfrac16 & 0 & \tfrac56
\end{pmatrix}.
$$
If G is connected, then H and A are both irreducible Markov chains, and $\pi_h^T$, the stationary vector of H, gives the hub scores for the query with neighborhood graph N, while $\pi_a^T$ gives the authority scores. If G is not connected, then H and A contain multiple irreducible components. In this case, the global hub and authority scores must be pasted together from the stationary vectors of each individual irreducible component. (Reference [41] contains the justification for these two if-then statements.)

From the structure of the H and A matrices for our example, it is easy to see that G is not connected, so H and A contain multiple connected components. H contains two connected components, C = {2} and D = {1, 3, 6, 10}, while A's connected components are E = {1} and F = {3, 5, 6}. Also clear from the structure of H and A is the periodicity of the Markov chains: all irreducible components of H and A contain self-loops, implying that the chains are aperiodic. The stationary vectors for the two irreducible components of H are

$$
\pi_h^T(C) = \begin{pmatrix} 1 \end{pmatrix}\ \text{(node 2)},
\qquad
\pi_h^T(D) = \begin{pmatrix} \tfrac13 & \tfrac16 & \tfrac13 & \tfrac16 \end{pmatrix}\ \text{(nodes 1, 3, 6, 10)},
$$

while the stationary vectors for the two irreducible components of A are

$$
\pi_a^T(E) = \begin{pmatrix} 1 \end{pmatrix}\ \text{(node 1)},
\qquad
\pi_a^T(F) = \begin{pmatrix} \tfrac13 & \tfrac16 & \tfrac12 \end{pmatrix}\ \text{(nodes 3, 5, 6)}.
$$
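These component stationary vectors can be sanity-checked numerically with a short power iteration on each irreducible block (a sketch, not part of the paper; the transition probabilities below are those of hub component D and authority component F from the example):

```python
import numpy as np

def stationary(P, iters=2000):
    """Stationary vector of an irreducible, aperiodic Markov chain,
    by power iteration starting from the uniform distribution."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi

# Hub chain restricted to component D = {1, 3, 6, 10}
H_D = np.array([[5/12, 1/6, 1/4, 1/6],
                [1/3,  1/3, 0,   1/3],
                [1/4,  0,   3/4, 0  ],
                [1/3,  1/3, 0,   1/3]])
pi_D = stationary(H_D)   # ≈ (1/3, 1/6, 1/3, 1/6)

# Authority chain restricted to component F = {3, 5, 6}
A_F = np.array([[1/2, 1/4, 1/4],
                [1/2, 1/2, 0  ],
                [1/6, 0,   5/6]])
pi_F = stationary(A_F)   # ≈ (1/3, 1/6, 1/2)
```

Because the components are aperiodic (every diagonal block has self-loops), the plain power iteration converges without any damping.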

Proposition 6 of [41] contains the method for pasting the hub and authority scores for the individual components into global scoring vectors. Their suggestion is simple and intuitive. Since the hub component C only contains 1 of the 5 total hub nodes, its stationary hub vector should be weighted by 1/5, while D, containing 4 of the 5 total hub nodes, has its stationary vector weighted by 4/5.

The last step, then, is to paste the component authority vectors into a single global authority vector, and likewise the component hub vectors into a single global hub vector. The mechanism for combining them is as follows: each entry of a component's stationary vector is multiplied by x/n, where x is the number of nodes in that component and n is the total number of hub (respectively, authority) nodes.
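The weighting rule just described can be sketched as follows (a hypothetical helper, applied to the hub components C and D of the running example):

```python
def paste(components, n_total):
    """Combine per-component stationary vectors into one global score
    vector: each component's scores are weighted by |component| / n_total."""
    global_scores = {}
    for comp in components:          # comp maps node label -> stationary prob.
        weight = len(comp) / n_total
        for node, p in comp.items():
            global_scores[node] = weight * p
    return global_scores

# Hub components from the example: C = {2} and D = {1, 3, 6, 10},
# with 5 hub nodes in total.
hub_scores = paste([{2: 1.0},
                    {1: 1/3, 3: 1/6, 6: 1/3, 10: 1/6}], n_total=5)
# hub_scores: node 1 -> 4/15, node 2 -> 1/5, node 3 -> 2/15, ...
```

Node 5, which has no out-links and hence no hub component, simply receives a global hub score of 0.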

N.B. When computing the stationary vectors, a disconnected graph is actually desirable in SALSA, since it reduces the computational cost: the per-component vectors are much smaller than a single vector over the entire graph.

Strengths and weaknesses of SALSA

Advantages:
- Does not suffer from the topic drift problem
- Less susceptible to spamming, since the hub and authority scores are harder to influence than in HITS
- Retains a dual ranking based on importance, as in PageRank
- Inexpensive when the graph G is disconnected

Disadvantages:
- Query-dependent
- If the neighborhood graph is reducible, the resulting stationary vectors may not be unique, and the method may fail to converge.
