
echizen_tm

Sep. 16, 2012


Wavelet Matrix

(1)

(2)

1
1

(=)
()



2000

ID1,000,000ID5,000,000
20

5000

20005000
500

()2


(1)
FM-Index
(Suffix Array)

BWT (Burrows-Wheeler transform)
RBWT(Burrows Wheeler)

RBWT


(2)
gwt
tb_yasu

tb_yasu

http://d.hatena.ne.jp/tb_yasu/20120909/1347196146

SPIRE2012


echizen_tm
Sep. 30, 2012


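Self-introduction (1 slide)
Introduction (1 slide)
Preliminaries (9 slides)
Building the wavelet matrix (11 slides)
access (6 slides)
rank (9 slides)
Summary (1 slide)
Results from [1] (1 slide)
References (2 slides)
rank2 (7 slides)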


ID: echizen_tm
Blog: EchizenBlog-Zwei
: web
:
()

:
:



Oct. 21-25, 2012
SPIRE 2012

(1/9)

()


(OK)

(2/9)
3


LOUDS

(3/9)

access(i): the bit at position i

rank(b, i): the number of b's before position i

select(b, i): the position of the i-th b

(4/9)
access(i)

For the bit vector 0110:
access(0) = 0, access(1) = 1

access(2) = 1, access(3) = 0

(5/9)
rank(b, i)

For the bit vector 0110:
rank(0, 1) = 1, rank(0, 2) = 1

rank(0, 3) = 1, rank(0, 4) = 2
rank(1, 1) = 0, rank(1, 2) = 1
rank(1, 3) = 2, rank(1, 4) = 2
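A minimal Python sketch of the three bit-vector operations, using the conventions the examples above imply (positions counted from 0, and rank(b, i) counting positions [0, i)). The class name `BitVector` and the prefix-count array are illustrative assumptions, not a succinct implementation: a real succinct bit vector answers rank in O(1) time with only o(n) extra bits, whereas this sketch stores one integer per position. The 1-based k in select is also an assumption, since the slides give no select example.

```python
class BitVector:
    """Naive bit vector: rank via a prefix-count array (one int per position).

    Succinct bit vectors answer the same queries with only o(n) extra bits;
    this sketch only pins down the definitions used on the slides.
    """

    def __init__(self, bits):
        self.bits = list(bits)
        self.prefix = [0]                  # prefix[i] = number of 1s in bits[0:i]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)

    def access(self, i):
        """The bit at position i (0-based)."""
        return self.bits[i]

    def rank(self, b, i):
        """Number of b's in positions [0, i)."""
        ones = self.prefix[i]
        return ones if b == 1 else i - ones

    def select(self, b, k):
        """Position of the k-th b; k is 1-based here (an assumption)."""
        seen = 0
        for pos, bit in enumerate(self.bits):
            if bit == b:
                seen += 1
                if seen == k:
                    return pos
        raise ValueError(f"fewer than {k} occurrences of {b}")


bv = BitVector([0, 1, 1, 0])
print([bv.access(i) for i in range(4)])      # [0, 1, 1, 0]
print([bv.rank(0, i) for i in range(1, 5)])  # [1, 1, 1, 2]
print([bv.rank(1, i) for i in range(1, 5)])  # [0, 1, 2, 2]
```

rank(0, i) is derived as i - rank(1, i), so only the count of 1s needs to be stored.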

(6/9)

access(i): the character at position i

rank(c, i): the number of c's before position i

select(c, i): the position of the i-th c

(7/9)
access(i)

For the string abbc:
access(0) = a, access(1) = b
access(2) = b, access(3) = c

(8/9)
rank(c, i)

For the string abbc:
rank(a, 1) = 1, rank(a, 2) = 1

rank(a, 3) = 1, rank(a, 4) = 1
rank(b, 1) = 0, rank(b, 2) = 1
rank(b, 3) = 2, rank(b, 4) = 2
rank(c, 1) = 0, rank(c, 2) = 0
rank(c, 3) = 0, rank(c, 4) = 1
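The same definitions for a string, as a naive Python sketch that reproduces the abbc examples. Each call scans the string, so rank and select take time proportional to the scanned length; answering them in O(log σ) bit-vector operations instead is what the wavelet matrix is for. The function names mirror the slides, and the 1-based k in select is again an assumption.

```python
def access(s: str, i: int) -> str:
    """The character at position i (0-based)."""
    return s[i]


def rank(s: str, c: str, i: int) -> int:
    """Number of occurrences of c in positions [0, i)."""
    return s.count(c, 0, i)


def select(s: str, c: str, k: int) -> int:
    """Position of the k-th occurrence of c; k is 1-based here (an assumption)."""
    pos = -1
    for _ in range(k):
        pos = s.index(c, pos + 1)  # raises ValueError if there is no k-th c
    return pos


s = "abbc"
for c in "abc":
    print(c, [rank(s, c, i) for i in range(1, 5)])
# a [1, 1, 1, 1]
# b [0, 1, 2, 2]
# c [0, 0, 0, 1]
```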

(9/9)

DSIRNLP#2

access, rank

(OK)


(1/11)

access
rank
(access)

rank


(2/11)

2


(3/11)


(4/11)
()
1


(5/11)
Level 2: first the elements whose level-1 bit is 0 (left)


(6/11)
Level 2: then the elements whose level-1 bit is 1 (right)


(7/11)
Left: values in [0, 3]
Right: values in [4, 7]


(8/11)
Level 3: first the elements whose level-2 bit is 0 (left)


(9/11)
Level 3: then the elements whose level-2 bit is 1 (right)


(10/11)

The values are now split into 4 groups: [0,1], [2,3], [4,5], [6,7]


(11/11)
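The construction steps can be sketched in Python as follows. Assumptions: the input is the 12-element sequence 4 7 6 5 3 2 1 0 1 4 1 7 with values in [0, 7] (3 bits), which appears to be the running example of [1] and reproduces every number in the access and rank walkthroughs that follow; `build_wavelet_matrix` is a hypothetical helper name. Each level stores one bit of every element (most significant bit first), then stably moves the elements with a 0 bit to the left and those with a 1 bit to the right before building the next level.

```python
def build_wavelet_matrix(seq, n_bits):
    """Return (levels, zeros).

    levels[d] is the bit row stored on level d (d = 0 is the top level and
    holds the most significant bit); zeros[d] is the number of 0s in that row.
    """
    levels, zeros = [], []
    cur = list(seq)
    for d in range(n_bits):
        shift = n_bits - 1 - d
        row = [(v >> shift) & 1 for v in cur]
        levels.append(row)
        zeros.append(row.count(0))
        # Stable partition for the next level: elements whose bit is 0 go to
        # the left, elements whose bit is 1 go to the right, order preserved.
        left = [v for v, b in zip(cur, row) if b == 0]
        right = [v for v, b in zip(cur, row) if b == 1]
        cur = left + right
    return levels, zeros


seq = [4, 7, 6, 5, 3, 2, 1, 0, 1, 4, 1, 7]  # assumed example, values in [0, 7]
levels, zeros = build_wavelet_matrix(seq, 3)
for d, row in enumerate(levels, start=1):
    print(f"level {d}: {''.join(map(str, row))}  zeros = {zeros[d - 1]}")
# level 1: 111100000101  zeros = 6
# level 2: 110000011001  zeros = 7
# level 3: 101101010101  zeros = 5
```

Only the bit rows and the per-level zero counts are kept; the reordered values themselves are discarded.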


access(1/6)
access(0)


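Level 1: the bit at position 0 is 1
Move to level 2:
rank(1, 0) = 0
Level 2, position 0 + 6 = 6 (the 1s come after the 6 zeros of level 1)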


access(2/6)
access(0)


Level 2: the bit at position 6 is 0
Move to level 3:
rank(0, 6) = 4
Level 3, position 4


access(3/6)
access(0)
Level 3: the bit at position 4 is 0, so the bits read are 100 (binary)
100 in binary is 4, so access(0) = 4


access(4/6)
access(5)


Level 1: the bit at position 5 is 0
Move to level 2:
rank(0, 5) = 1
Level 2, position 1


access(5/6)
access(5)


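Level 2: the bit at position 1 is 1
Move to level 3:
rank(1, 1) = 1
Level 3, position 1 + 7 = 8 (the 1s come after the 7 zeros of level 2)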


access(6/6)
access(5)
Level 3: the bit at position 8 is 0, so the bits read are 010 (binary)
010 in binary is 2, so access(5) = 2
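The access walkthrough can be sketched in Python as below. It rebuilds the same assumed 3-bit example so it runs on its own. On each level it reads the bit at the current position and appends it to the answer; a 0 moves to rank(0, i) on the next level, and a 1 moves to (number of 0s on this level) + rank(1, i). Returning the visited positions is an illustrative choice, and rank is computed naively by summing a prefix, so this shows the index arithmetic, not the speed.

```python
def build_wavelet_matrix(seq, n_bits):
    """Bit rows per level (top level = most significant bit) and their 0-counts."""
    levels, zeros, cur = [], [], list(seq)
    for d in range(n_bits):
        row = [(v >> (n_bits - 1 - d)) & 1 for v in cur]
        levels.append(row)
        zeros.append(row.count(0))
        cur = [v for v, b in zip(cur, row) if b == 0] + [v for v, b in zip(cur, row) if b == 1]
    return levels, zeros


def access(levels, zeros, i):
    """Value at position i, plus the position visited on each level."""
    value, path = 0, []
    for d, row in enumerate(levels):
        path.append(i)
        bit = row[i]
        value = (value << 1) | bit      # append this level's bit to the answer
        ones_before = sum(row[:i])      # naive rank(1, i) on this level
        if bit == 0:
            i -= ones_before            # rank(0, i): 0s keep to the left
        else:
            i = zeros[d] + ones_before  # 1s sit after all the 0s of this level
    return value, path


seq = [4, 7, 6, 5, 3, 2, 1, 0, 1, 4, 1, 7]  # assumed example sequence
levels, zeros = build_wavelet_matrix(seq, 3)
print(access(levels, zeros, 0))  # (4, [0, 6, 4])
print(access(levels, zeros, 5))  # (2, [5, 1, 8])
```

The visited positions (0, 6, 4) and (5, 1, 8) match the slides.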

rank(1/9)
rank(4, 10)

rank(2/9)
rank(4, 10)

Level 1: the top bit of 4 (100 in binary) is 1

rank(3/9)

rank(4, 10)


Level 1: follow bit 1
Move to level 2:
rank(1, 0) = 0
Level 2, position 0 + 6 = 6

rank(4/9)

rank(4, 10)


Level 1: follow bit 1
Move to level 2:
rank(1, 10) = 5
Level 2, position 5 + 6 = 11

rank(5/9)
rank(4, 10)

Level 2: the middle bit of 4 is 0

rank(6/9)

rank(4, 10)


Level 2: follow bit 0
Move to level 3:
rank(0, 6) = 4
Level 3, position 4

rank(7/9)

rank(4, 10)


Level 2: follow bit 0
Move to level 3:
rank(0, 11) = 7
Level 3, position 7

rank(8/9)
rank(4, 10)

Level 3: the lowest bit of 4 is 0

rank(9/9)

rank(4, 10)


Level 3: follow bit 0 and count the 0s before positions 4 and 7
rank(0, 4) = 1
rank(0, 7) = 3
3 - 1 = 2, so 4 occurs twice
rank(4, 10) = 2
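A Python sketch of rank(c, i) following the same walk: two positions start at 0 and at i, both follow the bits of c from the top, and the answer is their difference on the last level. The names `begin` and `end` match the pseudocode used later in the rank2 slides; the example sequence is the same assumption as before, and rank on each level is computed naively.

```python
def build_wavelet_matrix(seq, n_bits):
    """Bit rows per level (top level = most significant bit) and their 0-counts."""
    levels, zeros, cur = [], [], list(seq)
    for d in range(n_bits):
        row = [(v >> (n_bits - 1 - d)) & 1 for v in cur]
        levels.append(row)
        zeros.append(row.count(0))
        cur = [v for v, b in zip(cur, row) if b == 0] + [v for v, b in zip(cur, row) if b == 1]
    return levels, zeros


def rank(levels, zeros, c, i):
    """Occurrences of c in positions [0, i), plus the (begin, end) pair per level."""
    n_bits = len(levels)
    begin, end = 0, i
    trace = [(begin, end)]
    for d, row in enumerate(levels):
        bit = (c >> (n_bits - 1 - d)) & 1                 # d-th bit of c, from the top
        ones_b, ones_e = sum(row[:begin]), sum(row[:end])  # naive rank(1, .)
        if bit == 0:
            begin, end = begin - ones_b, end - ones_e      # rank(0, .)
        else:
            begin, end = zeros[d] + ones_b, zeros[d] + ones_e
        trace.append((begin, end))
    return end - begin, trace


seq = [4, 7, 6, 5, 3, 2, 1, 0, 1, 4, 1, 7]  # assumed example sequence
levels, zeros = build_wavelet_matrix(seq, 3)
count, trace = rank(levels, zeros, 4, 10)
print(count)  # 2
print(trace)  # [(0, 10), (6, 11), (4, 7), (1, 3)]
```

Each level costs two rank calls on a bit vector (one for begin, one for end), which is the cost the rank2 slides set out to halve.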

access, rank, select

access, rank, select


[1] The Wavelet Matrix

WT: wavelet tree
WTNP: wavelet tree without pointers (levelwise)
WM: wavelet matrix
RG: bit vector (BV) by R. Gonzalez
RRR: bit vector (BV) by R. Raman, V. Raman, S. S. Rao

References (1/2)
[1] Claude & Navarro, "The Wavelet Matrix", SPIRE 2012
http://www.dcc.uchile.cl/~gnavarro/ps/spire12.4.pdf

[2] Blog of takeda25
http://d.hatena.ne.jp/takeda25/

[3] EchizenBlog-Zwei (echizen_tm)
http://d.hatena.ne.jp/echizen_tm/

References (2/2)

libcds: by F. Claude
https://github.com/fclaude/libcds

wavelet-matrix-cpp: by takeda25 (wat-array / rank)
https://github.com/hiroshi-manabe/wavelet-matrix-cpp

shellinford: by echizen_tm (FM-Index / rank)
https://code.google.com/p/shellinford/


rank2(1/7)
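In The Wavelet Matrix [1], rank calls bv.rank
2 * log(σ) times (σ: alphabet size);
for bytes, 2 * log(256) = 16.

This can be reduced to 1 * log(σ):
bv.rank is called only log(256) = 8 times,
using an extra table of 256 * 64bit = 2KB.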
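The arithmetic on this slide written out, for a byte alphabet (σ = 256, so log σ = 8 levels) and assuming one 64-bit precomputed value per character:

```latex
\begin{aligned}
\text{original: } & 2\log_2 256 = 2 \times 8 = 16 \text{ calls to bv.rank} \\
\text{improved: } & \log_2 256 = 8 \text{ calls to bv.rank} \\
\text{extra table: } & 256 \times 64\,\text{bit} = 16384\,\text{bit} = 2048\,\text{byte} = 2\,\text{KB}
\end{aligned}
```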


rank2(2/7)
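The rank pseudocode calls bv.rank twice per level:

rank(c, i):
depth = 0, begin = 0, end = i
while (depth < log(σ)) {
    bit = the depth-th bit of c (from the top)
    // zeros[depth]: number of 0s on level depth; the 1s are placed after them
    begin = bv[depth].rank(bit, begin) + (bit == 1 ? zeros[depth] : 0)
    end = bv[depth].rank(bit, end) + (bit == 1 ? zeros[depth] : 0)
    depth++;
}
return end - begin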


rank2(3/7)
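rank(c, i):
depth = 0, begin = 0, end = i
while (depth < log(σ)) {
    bit = the depth-th bit of c (from the top)
    // zeros[depth]: number of 0s on level depth
    begin = bv[depth].rank(bit, begin) + (bit == 1 ? zeros[depth] : 0)
    end = bv[depth].rank(bit, end) + (bit == 1 ? zeros[depth] : 0)
    depth++;
}
return end - begin

bv.rank is called twice per level: once for begin, once for end.
But begin depends only on c (at most 256 values).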


rank2(4/7)
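In rank, bv.rank is called twice per level, for begin and for end,
but begin depends only on c (at most 256 values).

So precompute the final begin for every c in a table.
Then bv.rank is needed only for end:
1 * log(σ) calls instead of 2 * log(σ), and the answer is still end - begin.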


rank2(5/7)
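Before: rank(c, i)
depth = 0, begin = 0, end = i
while (depth < log(σ)) {
    bit = the depth-th bit of c (from the top)
    // zeros[depth]: number of 0s on level depth
    begin = bv[depth].rank(bit, begin) + (bit == 1 ? zeros[depth] : 0)
    end = bv[depth].rank(bit, end) + (bit == 1 ? zeros[depth] : 0)
    depth++;
}
return end - begin

After: rank(c, i)
depth = 0, begin = precomputed begin of c, end = i
while (depth < log(σ)) {
    bit = the depth-th bit of c (from the top)
    end = bv[depth].rank(bit, end) + (bit == 1 ? zeros[depth] : 0)
    depth++;
}
return end - begin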
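A Python sketch comparing the two versions on the assumed 3-bit example (4 7 6 5 3 2 1 0 1 4 1 7), counting bv.rank calls. `CountingBitVector`, `WaveletMatrix`, `begin_table` and `count_calls` are hypothetical names; the bit vector uses a plain prefix-count array instead of a succinct rank structure, so only the call counts (2 per level vs 1 per level) are meaningful here, not the timings.

```python
class CountingBitVector:
    """Bit vector with a naive prefix-count rank that counts how often rank is called."""

    def __init__(self, bits):
        self.prefix = [0]
        for b in bits:
            self.prefix.append(self.prefix[-1] + b)
        self.calls = 0

    def rank(self, bit, i):
        """Number of `bit`s in positions [0, i)."""
        self.calls += 1
        ones = self.prefix[i]
        return ones if bit == 1 else i - ones


class WaveletMatrix:
    def __init__(self, seq, n_bits):
        self.n_bits = n_bits
        self.bv, self.zeros = [], []
        cur = list(seq)
        for d in range(n_bits):
            row = [(v >> (n_bits - 1 - d)) & 1 for v in cur]
            self.bv.append(CountingBitVector(row))
            self.zeros.append(row.count(0))
            cur = [v for v, b in zip(cur, row) if b == 0] + [v for v, b in zip(cur, row) if b == 1]
        # begin depends only on c, so compute its final value once per character.
        self.begin_table = [self._final_begin(c) for c in range(1 << n_bits)]
        for bv in self.bv:
            bv.calls = 0  # reset: the precomputation above is not counted

    def _bit(self, c, depth):
        return (c >> (self.n_bits - 1 - depth)) & 1

    def _step(self, depth, bit, pos):
        # Position on the next level: rank on this level, shifted past the 0s if bit is 1.
        return self.bv[depth].rank(bit, pos) + (self.zeros[depth] if bit == 1 else 0)

    def _final_begin(self, c):
        begin = 0
        for depth in range(self.n_bits):
            begin = self._step(depth, self._bit(c, depth), begin)
        return begin

    def rank(self, c, i):
        """Original loop: updates begin and end, 2 bv.rank calls per level."""
        begin, end = 0, i
        for depth in range(self.n_bits):
            bit = self._bit(c, depth)
            begin = self._step(depth, bit, begin)
            end = self._step(depth, bit, end)
        return end - begin

    def rank2(self, c, i):
        """Precomputed begin: only end is updated, 1 bv.rank call per level."""
        begin, end = self.begin_table[c], i
        for depth in range(self.n_bits):
            end = self._step(depth, self._bit(c, depth), end)
        return end - begin

    def total_calls(self):
        return sum(bv.calls for bv in self.bv)


def count_calls(wm, method, *args):
    """Run method(*args) and report (result, number of bv.rank calls it made)."""
    before = wm.total_calls()
    result = method(*args)
    return result, wm.total_calls() - before


seq = [4, 7, 6, 5, 3, 2, 1, 0, 1, 4, 1, 7]  # assumed example sequence
wm = WaveletMatrix(seq, 3)
print(count_calls(wm, wm.rank, 4, 10))   # (2, 6)
print(count_calls(wm, wm.rank2, 4, 10))  # (2, 3)
```

The table has one entry per possible character (2^n_bits entries here); for a byte alphabet that is the 256-entry table from the first rank2 slide.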


rank2(6/7)
begin
2
begin
begin
bv.rank(0, begin), bv.rank(1, begin)
rank(0, i)
rank(1, i)

rank(4, i)
rank(5, i)

rank(2, i) rank(6, i)
rank(3, i) rank(7, i)


rank2(7/7)

takeda25 / wat-array
100,000,000
1,000

(): 1.49 micro sec (53%)


(wat-array): 2.51 micro sec (100%)

echizen_tm's FM-Index

(/)
4,000,000
256

(): 144 micro sec (67%)


: 190 micro sec (89%)
: 214 micro sec (100%)
