Professor
Student Name
Student Number
Assignment 1
Question-1:
1. (25) Please prove the following equalities and inequalities:
$10n \ne \Theta(n^2)$.
$100n = \Theta(n)$.
$2^{2n} \ne \Theta(2^n)$.
Answer:
$10n \ne \Theta(n^2)$
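Formula of $\Theta$-notation:
$\Theta(g(n)) = \{ f(n) : \exists$ positive constants $c_1, c_2, n_0$ such that $\forall n \ge n_0$, $0 \le c_1 g(n) \le f(n) \le c_2 g(n) \}$.
Let us assume, for contradiction, that $10n = \Theta(n^2)$. Then there are three positive constants $c_1$, $c_2$, and $n_0$ such that, for all $n \ge n_0$:
$c_1 n^2 \le 10n \le c_2 n^2$
Dividing the left-hand inequality by $c_1 n$ gives $n \le 10/c_1$, which is false for every $n > \max(n_0, 10/c_1)$. Hence no such constants exist, and $10n \ne \Theta(n^2)$.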
$100n = \Theta(n)$
We need three positive constants $c_1$, $c_2$, and $n_0$ such that, as per the $\Theta$-notation formula, for all $n \ge n_0$:
$c_1 n \le 100n \le c_2 n$
This holds for every $n \ge 1$ whenever $c_1 \le 100 \le c_2$. For example, with $c_1 = 99$, $c_2 = 101$, and $n_0 = 1$, we have $99n \le 100n \le 101n$ for every $n \ge 1$.
Hence proved.
$2^{2n} \ne \Theta(2^n)$.
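Let us assume, for contradiction, that $2^{2n} = \Theta(2^n)$. Then there are three positive constants $c_1$, $c_2$, and $n_0$.
As per the formula: $\Theta(g(n)) = \{ f(n) : \exists$ positive constants $c_1, c_2, n_0$ such that $\forall n \ge n_0$, $0 \le c_1 g(n) \le f(n) \le c_2 g(n) \}$, for all $n \ge n_0$:
$c_1 2^n \le 2^{2n} \le c_2 2^n$
We take the right-hand side of the inequality:
$2^{2n} \le c_2 2^n$
$2^n \le c_2$
$n \le \log_2 c_2$
So the inequality holds only when $n \le \log_2 c_2$; it fails for every $n > \max(n_0, \log_2 c_2)$, which contradicts the assumption.
Hence $2^{2n} \ne \Theta(2^n)$.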
Question-2:
(15) In Fig. 2, we show a network, in which each node stands for a page and each
arc for a link from a page to another. Please give the transition matrix for the
network. Also, explain why the solution to the equation:
A = MA can be used as the estimation of page importance, where A is a vector of n
variables and M is an n × n transition matrix.
Answer:
Transition Matrix:
[Transition matrix M for the network in Fig. 2: rows and columns P1 to P5]
Web navigation described by this transition matrix can be modeled as a random walker moving from page to page along links.
Let M have entry $s_{xy}$ in row x and column y, where:
1. $s_{xy} = 1/r$ if page y has a link to page x, and page y has a total of r out-links;
2. $s_{xy} = 0$ otherwise.
After a large number of moves, the distribution of the walker's possible locations stops changing from one step to the next. This limiting distribution A satisfies A = MA, i.e., A is an eigenvector of M with eigenvalue 1. Therefore the solution to A = MA can be used as the estimation of page importance, where A is a vector of n variables and M is an n × n transition matrix.
Each entry of A is the long-run fraction of time the random walker spends at the corresponding page, and this fraction is used as the measure of the page's importance.
In practice, after 50 to 100 iterations of A ← MA, the computed fraction of time spent on each page is very close to this limiting solution.
Advanced Algorithm Design
Assignment 1
So, solving A = MA gives the fraction of time the walker spends on each page, and this fraction can be used as the estimation of page importance.
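Because the network of Fig. 2 is not reproduced here, the computation can only be sketched on a hypothetical four-page graph (pages A to D, links chosen for illustration). The sketch builds M with the rule $s_{xy} = 1/r$ and repeats A ← MA from the uniform vector; the helper names `transition_matrix` and `power_iteration` are made up for this example.

```python
from typing import Dict, List

Matrix = List[List[float]]

def transition_matrix(links: Dict[str, List[str]], pages: List[str]) -> Matrix:
    """M[x][y] = 1/r if page y links to page x and y has r out-links, else 0."""
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    M = [[0.0] * n for _ in range(n)]
    for y, outs in links.items():
        for x in outs:
            M[index[x]][index[y]] = 1.0 / len(outs)
    return M

def power_iteration(M: Matrix, iterations: int = 100) -> List[float]:
    """Repeat A <- M A starting from the uniform vector."""
    n = len(M)
    A = [1.0 / n] * n  # the walker starts at a uniformly random page
    for _ in range(iterations):
        A = [sum(M[x][y] * A[y] for y in range(n)) for x in range(n)]
    return A

# Hypothetical graph (not Fig. 2): A -> B, C, D; B -> A, D; C -> A; D -> B, C.
PAGES = ["A", "B", "C", "D"]
LINKS = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A"], "D": ["B", "C"]}

if __name__ == "__main__":
    A = power_iteration(transition_matrix(LINKS, PAGES))
    print([round(a, 4) for a in A])  # [0.3333, 0.2222, 0.2222, 0.2222]
```

Here A converges to (3/9, 2/9, 2/9, 2/9), so page A would be estimated as the most important page of this small graph.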
Question-3:
(10) Explain why the following equation (for estimating the importance of pages) works in the presence of spider traps and dead ends.
$P_{new} = \beta M P_{old} + (1 - \beta) T$
Answer:
When a walker enters a set of pages with no links leading outside the set, it is called a spider trap.
When a walker reaches a page with no out-links at all, it is called a dead end.
In both scenarios the walker gets stuck: in a spider trap it can never leave the set, and at a dead end the walk ends.
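If we apply the relaxation (the plain iteration $P_{new} = M P_{old}$) to the transition matrix of a Web graph with spider traps, it can result in a limiting distribution where all probabilities outside a spider trap are 0. With dead ends, probability leaks out at every step, and the distribution can tend to 0 at every page.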
To limit this, the random walker is allowed to wander at random only part of the time: with probability β (normally 0.8 to 0.9) the walker follows a random out-link, and with probability 1 − β (called the taxation rate) we remove that walker and deposit a new walker at a randomly chosen Web page.
Using the above strategy:
i. If the walker gets stuck in a spider trap, after a few time steps it disappears and is replaced by a new walker.
ii. If the walker reaches a dead end and disappears, a new walker takes over shortly.
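Let $P_{new}$ and $P_{old}$ be the new and old distributions of the walker's location. After one iteration, with β = 0.8, the relationship between them is:
$P_{new} = 0.8\, M P_{old} + 0.2\, T$
where M is the transition matrix, 0.8 = β, 0.2 = 1 − β, and T gives the probability that a new walker is deposited at each page (1/n per page for a uniformly random choice among n pages).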
Based on the above equation, with probability 0.8 the walker follows an out-link according to the transition matrix M, and with probability 0.2 a walker is started from a random page, which lets it come out of a dead end or spider trap.
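This is why $P_{new} = \beta M P_{old} + (1 - \beta) T$ works in the presence of dead ends and spider traps: the teleport term keeps moving walkers out of these situations, so a trap cannot absorb all of the probability and every page keeps a nonzero share of it.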
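The effect of the taxation term can be checked numerically. The sketch below uses two hypothetical four-page graphs, one where C links only to itself (a spider trap) and one where C has no out-links (a dead end), with T uniform (1/n per page) and β = 0.8; calling the same function with β = 1 performs the plain iteration. All names here are illustrative.

```python
from typing import Dict, List

Matrix = List[List[float]]

def transition_matrix(links: Dict[str, List[str]], pages: List[str]) -> Matrix:
    """M[x][y] = 1/r if page y links to page x and y has r out-links; a dead end gives a zero column."""
    index = {p: i for i, p in enumerate(pages)}
    M = [[0.0] * len(pages) for _ in pages]
    for y, outs in links.items():
        for x in outs:
            M[index[x]][index[y]] = 1.0 / len(outs)
    return M

def taxed_iteration(M: Matrix, beta: float = 0.8, iterations: int = 200) -> List[float]:
    """Repeat P_new = beta * M P_old + (1 - beta) * T with T uniform."""
    n = len(M)
    T = [1.0 / n] * n  # a new walker is deposited at a uniformly random page
    P = [1.0 / n] * n
    for _ in range(iterations):
        P = [beta * sum(M[x][y] * P[y] for y in range(n)) + (1 - beta) * T[x]
             for x in range(n)]
    return P

PAGES = ["A", "B", "C", "D"]
# Hypothetical graphs: C is a spider trap in the first and a dead end in the second.
TRAP_LINKS = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["C"], "D": ["B", "C"]}
DEAD_END_LINKS = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": [], "D": ["B", "C"]}

if __name__ == "__main__":
    trap = transition_matrix(TRAP_LINKS, PAGES)
    print("trap, beta=1.0:", [round(p, 3) for p in taxed_iteration(trap, beta=1.0)])
    print("trap, beta=0.8:", [round(p, 3) for p in taxed_iteration(trap)])
    dead = transition_matrix(DEAD_END_LINKS, PAGES)
    print("dead end, beta=1.0, total:", round(sum(taxed_iteration(dead, beta=1.0)), 3))
    print("dead end, beta=0.8, total:", round(sum(taxed_iteration(dead)), 3))
```

With β = 1 the trap ends up holding essentially all of the probability and, with the dead end, the total tends to 0. With β = 0.8 every page keeps at least (1 − β)/n = 0.05, the total never drops below 1 − β = 0.2, and the trap graph settles at (15/148, 19/148, 95/148, 19/148).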
Question-4:
(20) Fig. 3 shows a tree encoding. The quadruples can be stored as a sequence
sorted by LeftPos values by using the depth-first search. Design an algorithm to
transform it into another sequence sorted by RightPos values.
[Fig. 3: tree T]
Answer:
Algorithm:
Let X(i) be the data streams of quadruples, each sorted by LeftPos.
Let R(i) be the new data streams, each sorted by RightPos, and let Stack be initially empty.
Begin
repeat until every X(i) becomes empty
{
    identify i such that the first element v of X(i) has the minimal LeftPos value;
    remove v from X(i);
    while Stack is not empty and Stack.top() is not v's ancestor
    do
    {
        d ← Stack.pop();
        let d = (j, u);
        put u at the end of R(j);
    }
    Stack.push((i, v));
}
while Stack is not empty do
{
    (j, u) ← Stack.pop();
    put u at the end of R(j);
}
End
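A minimal Python sketch of the algorithm above. It assumes each quadruple is (DocID, LeftPos, RightPos, LevelNum), that every stream X(i) is sorted by (DocID, LeftPos), and that u is an ancestor of v exactly when u's interval encloses v's in the same document; the function name `left_to_right_order` and the sample tree are hypothetical.

```python
import heapq
from typing import List, Tuple

# Assumed layout of a quadruple: (DocID, LeftPos, RightPos, LevelNum).
Quad = Tuple[int, int, int, int]

def is_ancestor(u: Quad, v: Quad) -> bool:
    """u is an ancestor of v iff both are in the same document and u's interval encloses v's."""
    return u[0] == v[0] and u[1] < v[1] and v[2] < u[2]

def left_to_right_order(streams: List[List[Quad]]) -> List[List[Quad]]:
    """Turn streams X(i) sorted by (DocID, LeftPos) into streams R(i) sorted by (DocID, RightPos)."""
    result: List[List[Quad]] = [[] for _ in streams]
    stack: List[Tuple[int, Quad]] = []  # entries (j, u): node u taken from stream j
    # Repeatedly take the element with the minimal LeftPos over all streams.
    merged = heapq.merge(*[[(q[0], q[1], i, q) for q in s] for i, s in enumerate(streams)])
    for _, _, i, v in merged:
        # A node that does not enclose v has no descendants left to come: emit it.
        while stack and not is_ancestor(stack[-1][1], v):
            j, u = stack.pop()
            result[j].append(u)
        stack.append((i, v))
    # The remaining nodes form one root-to-node path; popping emits it bottom-up.
    while stack:
        j, u = stack.pop()
        result[j].append(u)
    return result

# Hypothetical tree in document 1: A encloses B and D; B encloses C; D encloses E.
A, B, C = (1, 1, 10, 1), (1, 2, 5, 2), (1, 3, 4, 3)
D, E = (1, 6, 9, 2), (1, 7, 8, 3)

if __name__ == "__main__":
    print(left_to_right_order([[A, C, E], [B, D]]))
    # [[(1, 3, 4, 3), (1, 7, 8, 3), (1, 1, 10, 1)], [(1, 2, 5, 2), (1, 6, 9, 2)]]
```

The nodes leave the stack in post-order, which is exactly increasing RightPos order, so each R(i), being a subsequence of that output, is sorted by RightPos.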
Question-5:
(15) In the following table, we show the key words of five documents, as well as
the key word sequences sorted by frequencies. Please construct a trie for the sorted
sequences and a header table for all the key words to speed up the evaluation of
conjunctive queries of form $word_1 \wedge word_2 \wedge \dots \wedge word_i$. Also, show how a
conjunctive query is evaluated by using the trie.
[Table: DocID, Items, and Sorted item sequence for documents 1 to 5]
Answer:
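The frequency of each word w is the fraction of documents that contain it:
$af(w) = \frac{\text{number of documents containing } w}{\text{total number of documents}}$
For example, $af(j) = 1/5$.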
[Figure: trie for the sorted sequences, each node carrying its set of document identifiers, with a header table (Items, Links) linking each key word to its nodes in the trie]
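Let the query be $word_{i_1} \wedge word_{i_2} \wedge \dots \wedge word_{i_k}$, with its words sorted in increasing order of frequency.
1. Find each node in the trie labeled $word_{i_1}$ (through the header table links).
2. If the path from the root to that node contains all $word_{i_j}$ (j = 1, ..., k), return the document identifiers associated with the node.
The check can be done by searching the path bottom-up, starting from $word_{i_1}$. In this process, we first try to find $word_{i_2}$, then $word_{i_3}$, and so on.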
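A small Python sketch of the trie, the header table, and the bottom-up check, run on hypothetical documents because the original table is not recoverable. It assumes the words of each document are sorted by decreasing af(w) with ties broken alphabetically, so the least frequent query word is the deepest one on a matching path; `build` and `evaluate` are made-up names.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List, Optional, Set, Tuple

class Node:
    """A trie node; docs holds the IDs of documents whose sequence passes through it."""
    def __init__(self, word: Optional[str], parent: Optional["Node"]) -> None:
        self.word = word
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        self.docs: Set[int] = set()

def sort_key(af: Dict[str, Fraction], w: str) -> Tuple[Fraction, str]:
    # Assumed order inside a sequence: decreasing af(w), ties broken alphabetically.
    return (-af[w], w)

def build(docs: Dict[int, List[str]]):
    """Return (root, header table, af) for the documents' sorted key-word sequences."""
    vocab = {w for words in docs.values() for w in words}
    # af(w) = number of documents containing w / number of documents
    af = {w: Fraction(sum(w in words for words in docs.values()), len(docs)) for w in vocab}
    root = Node(None, None)
    header: Dict[str, List[Node]] = defaultdict(list)  # word -> every trie node labeled with it
    for doc_id, words in docs.items():
        node = root
        for w in sorted(set(words), key=lambda x: sort_key(af, x)):
            if w not in node.children:
                node.children[w] = Node(w, node)
                header[w].append(node.children[w])
            node = node.children[w]
            node.docs.add(doc_id)
    return root, header, af

def evaluate(header: Dict[str, List[Node]], af: Dict[str, Fraction], query: List[str]) -> Set[int]:
    """Evaluate a conjunctive query by a bottom-up search from its least frequent word."""
    if not query or any(w not in af for w in query):
        return set()
    # Increasing frequency (per sort_key): the first word is the deepest on a matching path.
    order = sorted(set(query), key=lambda x: sort_key(af, x), reverse=True)
    first, rest = order[0], order[1:]
    answer: Set[int] = set()
    for node in header[first]:  # header-table links to every node labeled `first`
        k, ancestor = 0, node.parent
        while ancestor is not None and k < len(rest):
            if ancestor.word == rest[k]:  # next query word found; look for the one after it
                k += 1
            ancestor = ancestor.parent
        if k == len(rest):
            answer |= node.docs
    return answer

# Hypothetical documents (not the assignment's table).
DOCS = {1: ["f", "a", "c", "m", "p"], 2: ["a", "b", "c", "f", "m"], 3: ["b", "f"],
        4: ["b", "c", "p"], 5: ["a", "f", "c", "m", "p"]}

if __name__ == "__main__":
    _, header, af = build(DOCS)
    print(af["c"], af["b"], af["f"])              # 4/5 3/5 4/5
    print(evaluate(header, af, ["c", "b", "f"]))  # {2}
```

On these made-up documents the query c ∧ b ∧ f is ordered as b, f, c, and only the b node of document 2 has both f and c above it.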
Example:
We have a query, say: $c \wedge b \wedge f$
The frequency of each query word:
af(c) = 4/5
af(b) = 3/5
af(f) = 4/5
After sorting the words in increasing order of frequency, we have: b, f, c.
So we start from each node labeled b and search its path bottom-up for f and then c.
[Figure: the trie and header table used to evaluate $c \wedge b \wedge f$]
Question-6:
(20) The following is a directed graph G. Please find a spanning tree of it and then
label the nodes in the spanning tree by intervals. Also, construct an interval
sequence for each node, which can be used to check the reachability queries with
respect to G.
Answer:
Spanning Tree:
a[0,13)
  b[1,6)
    c[2,5)
      p[3,5)
        k[4,5)
    d[5,6)
  r[6,10)
    e[7,10)
      f[8,9)
      g[9,10)
  h[10,13)
    i[11,12)
    j[12,13)
Topological order: a, b, h, j, r, e, i, f, g, c, p, d, k
Reverse topological order: k, d, p, c, g, f, i, e, r, j, h, b, a
L(k) = [4,5)
L(i) = [4,5)[5,6)[8,9)[11,12)
L(d) = [4,5)[5,6)
L(e) = [7,10)
L(p) = [3,5)
L(r) = [2,5)[6,10)
L(c) = [2,5)
L(j) = [2,5)[6,10)[12,13)
L(g) = [4,5)[5,6)[9,10)
L(h) = [4,5)[5,6)[7,10)[10,13)
L(f) = [4,5)[5,6)[8,9)
L(b) = [1,6)
L(a) = [0,13)
Reachability Query Check:
Let u and v be two nodes of G.
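u is a descendant of v (i.e., u is reachable from v) if and only if there exists an interval [α, β) in L(v) such that $\alpha_u \in [\alpha, \beta)$, where $\alpha_u$ is the first number of u's interval in the spanning tree.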
Example:
Is f reachable from h? The interval of f in the spanning tree is [8,9), so $\alpha_f = 8$.
L(h) = [4,5)[5,6)[7,10)[10,13)
Since 8 ∈ [7,10), which is an interval in L(h), f is a descendant of node h.
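The labeling and the check can be sketched in Python (3.9+, for `graphlib`) on a small hypothetical DAG, since the graph G is not reproduced here. The sketch assumes, consistent with the labels above, that each tree interval is [preorder number, 1 + largest preorder number in the subtree), and that L(v) is built in reverse topological order by merging v's own interval with L(u) for each edge v → u and dropping subsumed intervals. All names are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)

def label_tree(tree: Dict[str, List[str]], root: str) -> Dict[str, Interval]:
    """Label v with [pre(v), 1 + largest preorder number in v's subtree)."""
    labels: Dict[str, Interval] = {}
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in tree.get(v, []):
            visit(child)
        labels[v] = (start, counter)

    visit(root)
    return labels

def drop_subsumed(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and drop any interval contained in one already kept."""
    kept: List[Interval] = []
    for s, e in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        if kept and e <= kept[-1][1]:  # tree intervals are nested or disjoint
            continue
        kept.append((s, e))
    return kept

def interval_sequences(graph: Dict[str, List[str]],
                       labels: Dict[str, Interval]) -> Dict[str, List[Interval]]:
    """L(v) = v's own interval plus L(u) for every edge v -> u."""
    # Passing successors as "predecessors" makes static_order() yield
    # children before parents, i.e. a reverse topological order.
    L: Dict[str, List[Interval]] = {}
    for v in TopologicalSorter(graph).static_order():
        merged = [labels[v]]
        for u in graph.get(v, []):
            merged.extend(L[u])
        L[v] = drop_subsumed(merged)
    return L

def reaches(L: Dict[str, List[Interval]], labels: Dict[str, Interval], v: str, u: str) -> bool:
    """v reaches u iff u's preorder number falls in some interval of L(v)."""
    alpha_u = labels[u][0]
    return any(s <= alpha_u < e for s, e in L[v])

# Hypothetical DAG (not the assignment's G); c -> d is the only non-tree edge.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["e", "d"], "d": [], "e": []}
TREE = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}

if __name__ == "__main__":
    labels = label_tree(TREE, "a")
    L = interval_sequences(GRAPH, labels)
    print(L["c"], reaches(L, labels, "c", "d"), reaches(L, labels, "b", "e"))
    # [(2, 3), (3, 5)] True False
```

The query scans L(v) linearly for clarity; since each L(v) is sorted by start, a binary search over it would also work.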
END