INVERTED INDEX
Chapter 4
LET’S WALK THROUGH A SAMPLE TEXT
<d2, The roof is a tin roof. There is a tin can on the roof.>
<d3, Cat kicks the can. It rolls on the roof and falls on the next roof.>
Still, the input to the reducer is sorted not by the value but by the key
<this, <d1,1>> <is, <d1,1>> <cat, <d1,1>> <cat, <d1,1>> <sits, <d1,1>> <on, <d1,1>> <roof, <d1,1>>
<this, <d2,1>> <roof, <d2,1>> <is, <d2,1>> <tin, <d2,1>> <can, <d2,1>> <on, <d2,1>> <roof, <d2,1>>
<cat, <d3,1>> <kicks, <d3,1>> <can, <d3,1>> <it, <d3,1>> <rolls, <d3,1>> <on, <d3,1>> <roof, <d3,1>> <and, <d3,1>>
<falls, <d3,1>> <on, <d3,1>> <next, <d3,1>> <roof, <d3,1>>
<this, <d4,1>> <cat, <d4,1>> <rolls, <d4,1>> <too, <d4,1>> <it, <d4,1>> <sits, <d4,1>> <on, <d4,1>> <cam, <d4,1>>
<this, [<d1,1>, <d2,1>, <d4,1>]>
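The baseline behind this walk-through can be sketched in a few lines of Python. This is only an in-memory illustration, not Hadoop code: the helper names (`tokenize`, `baseline_map`, `baseline_reduce`, `run_baseline`) are hypothetical, the tokenizer keeps every word (no stop-word removal), and the mapper emits one posting per distinct term with its term frequency, in the (term t, posting <docid, f>) form. Only d2 and d3 are used, since their text appears above. The point to notice is that the reducer has to buffer and sort the postings itself.

```python
import re
from collections import Counter, defaultdict

# Documents d2 and d3 from the walk-through above.
DOCS = {
    "d2": "The roof is a tin roof. There is a tin can on the roof.",
    "d3": "Cat kicks the can. It rolls on the roof and falls on the next roof.",
}

def tokenize(text):
    """Lowercase and keep alphabetic runs (assumed tokenizer, no stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())

def baseline_map(docid, text):
    """Emit (term, (docid, tf)) for every distinct term in one document."""
    for term, tf in Counter(tokenize(text)).items():
        yield term, (docid, tf)

def baseline_reduce(term, postings):
    """Values arrive in arbitrary order, so the reducer buffers and sorts them."""
    return term, sorted(postings)

def run_baseline(docs):
    grouped = defaultdict(list)  # stands in for the shuffle: group values by key
    # Feed documents in reverse order to mimic values arriving unsorted.
    for docid, text in reversed(list(docs.items())):
        for term, posting in baseline_map(docid, text):
            grouped[term].append(posting)
    # Keys (terms) are processed in sorted order, as the framework would do.
    return dict(baseline_reduce(t, p) for t, p in sorted(grouped.items()))

index = run_baseline(DOCS)
print(index["roof"])  # [('d2', 3), ('d3', 2)]
print(index["can"])   # [('d2', 1), ('d3', 1)]
```

The `sorted(postings)` call in `baseline_reduce` is the step that requires the whole postings list of a term to fit in the reducer's memory.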
IMPLEMENTATION
• From Baseline to an improved version
• Observe the sort done by the Reducer. Is there any way to push this into the MR runtime?
• Instead of emitting (term t, posting <docid, f>),
• emit (tuple <t, docid>, tf f)
• This is our previously studied value-to-key conversion design pattern
• This switch ensures that the keys arrive at the reducer sorted by term and then by docid
• Small memory footprint; less buffer space needed at the reducer
• See Fig. 4.4
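A minimal Python sketch of the value-to-key conversion follows. It simulates the shuffle in memory and is not Hadoop code. The helper names (`tokenize`, `map_value_to_key`, `partition`, `shuffle`) are hypothetical. It also assumes a partitioner that routes on the term alone, which the slide does not mention but which is needed so that all postings of one term reach the same reducer.

```python
import re
from collections import Counter

DOCS = {
    "d2": "The roof is a tin roof. There is a tin can on the roof.",
    "d3": "Cat kicks the can. It rolls on the roof and falls on the next roof.",
}

def tokenize(text):
    """Lowercase and keep alphabetic runs (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def map_value_to_key(docid, text):
    """Emit ((term, docid), tf): the docid moves from the value into the key."""
    for term, tf in Counter(tokenize(text)).items():
        yield (term, docid), tf

def partition(key, num_reducers):
    """Route on the term only (assumption), ignoring the docid part of the key."""
    term, _docid = key
    return sum(term.encode("utf-8")) % num_reducers

def shuffle(pairs, num_reducers):
    """Simulated shuffle: partition the pairs, then sort each partition by key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return [sorted(bucket) for bucket in buckets]

# Feed documents in reverse order to show that the sort fixes the order.
pairs = [kv
         for docid, text in reversed(list(DOCS.items()))
         for kv in map_value_to_key(docid, text)]

buckets = shuffle(pairs, num_reducers=2)
roof = [(k, v) for bucket in buckets for k, v in bucket if k[0] == "roof"]
print(roof)  # [(('roof', 'd2'), 3), (('roof', 'd3'), 2)]
```

Because the composite key is sorted by the simulated framework, the postings of each term reach the reducer already ordered by docid, and the reducer no longer needs to sort anything.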
MODIFIED MAPPER
method Reduce(tuple <t, n>, tf [f1, ...])
    if (t ≠ tprev) ∧ (tprev ≠ ∅)          (this means a new term)
    {   emit(term tprev, postings P);
        reset P; }
    P.add(<n, f>)
    tprev ← t
method Close
    emit(term tprev, postings P)
What are Initialize and Close?
Observe that the reducer's emit(term, postings P) gets you the terms in the format you need for answering queries.
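The reducer logic above can be sketched in Python as follows. The class and function names (`PostingsReducer`, `run_reducer`) are hypothetical, and the `output` list stands in for `emit`. The sketch assumes that Initialize is called once before the first key and Close once after the last key, and that keys arrive sorted by (term, docid). It emits the buffered list under the previous term, since that is the term the buffered postings belong to. The sample input is taken from the walk-through's mapper output for "cat" and "this", with the two occurrences of "cat" in d1 counted as tf 2.

```python
class PostingsReducer:
    """In-memory stand-in for the modified reducer (illustrative only)."""

    def __init__(self):
        self.output = []  # stands in for emit(term, postings)

    def initialize(self):
        """Assumed to run once, before the first call to reduce."""
        self.t_prev = None
        self.postings = []

    def reduce(self, key, tfs):
        term, docid = key
        # A change of term means the buffered postings list is complete.
        if term != self.t_prev and self.t_prev is not None:
            self.output.append((self.t_prev, self.postings))
            self.postings = []
        for f in tfs:  # one tf per (term, docid) key in this design
            self.postings.append((docid, f))
        self.t_prev = term

    def close(self):
        """Assumed to run once, after the last call to reduce: flush the last term."""
        if self.t_prev is not None:
            self.output.append((self.t_prev, self.postings))


def run_reducer(sorted_pairs):
    reducer = PostingsReducer()
    reducer.initialize()
    for key, tfs in sorted_pairs:
        reducer.reduce(key, tfs)
    reducer.close()
    return reducer.output


sorted_pairs = [
    (("cat", "d1"), [2]), (("cat", "d3"), [1]), (("cat", "d4"), [1]),
    (("this", "d1"), [1]), (("this", "d2"), [1]), (("this", "d4"), [1]),
]
result = run_reducer(sorted_pairs)
print(result[1])  # ('this', [('d1', 1), ('d2', 1), ('d4', 1)])
```

Without the flush in `close`, the postings of the last term would never be emitted, which is why the reducer needs a hook that runs after the final key.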
IMPROVED MR FOR II 10/27/2021
class Mapper
    method Map(docid n; doc d)
        H = new AssociativeArray
class Reducer
    method Initialize
        tprev = ∅; P = new PostingsList
    method Reduce(tuple <t, n>; tf [f])
    method Close
        emit(term tprev, postings P)
ADVANTAGE OF MODIFIED INVERTED INDEX
• Pseudo-code of a scalable inverted indexing algorithm in MapReduce.
• By applying the value-to-key conversion design pattern, the execution framework is exploited to sort postings so that they arrive sorted by document id in the reducer.
• Also, removing the sort from the reducer makes the reducer more efficient.
OTHER MODIFICATIONS