(Wavelet Matrix)
(1)
(2)
1
1
(=)
()
2000
ID1,000,000ID5,000,000
20
5000
20005000
500
()2
(1)
FM-Index
(Suffix Array)
BWT (Burrows-Wheeler transform)
RBWT (reverse Burrows-Wheeler transform)
RBWT
(2)
gwt
tb_yasu
tb_yasu
http://d.hatena.ne.jp/tb_yasu/20120909/1347196146
SPIRE2012
echizen_tm
Sep. 30, 2012
(1 slide)
(1 slide)
(9 slides)
(11 slides)
access (6 slides)
rank (9 slides)
(1 slide)
(1 slide)
(2 slides)
rank2 (7 slides)
ID: echizen_tm
Blog: EchizenBlog-Zwei
: web
:
()
:
:
2012102125
SPIRE
()
(1/9)
()
(OK)
(2/9)
3
LOUDS
(3/9)
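access(i): returns the i-th element (positions start at 0)
rank(b, i): returns the number of bits equal to b
among the first i positions
select(b, i): returns the position of
the i-th occurrence of bit b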
()
(4/9)
access(i)
Bit sequence: 0110
access(0) = 0, access(1) = 1
access(2) = 1, access(3) = 0
(5/9)
rank(b, i)
Bit sequence: 0110
rank(0, 1) = 1, rank(0, 2) = 1
rank(0, 3) = 1, rank(0, 4) = 2
rank(1, 1) = 0, rank(1, 2) = 1
rank(1, 3) = 2, rank(1, 4) = 2
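The access and rank values listed above for the bit sequence 0110 can be reproduced with a small sketch. This is a plain, non-succinct bit vector that answers rank from a table of prefix counts. The class name `BitVector` is hypothetical, and `select` counts occurrences from 1, which is an assumption because the slides give no select example.

```python
class BitVector:
    """Plain bit vector; rank is answered from precomputed prefix counts."""

    def __init__(self, bits):
        self.bits = list(bits)
        # ones[i] = number of 1 bits among positions [0, i)
        self.ones = [0]
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)

    def access(self, i):
        """Bit stored at position i (0-indexed)."""
        return self.bits[i]

    def rank(self, b, i):
        """Number of bits equal to b among the first i positions."""
        return self.ones[i] if b == 1 else i - self.ones[i]

    def select(self, b, k):
        """Position of the k-th occurrence of b (k counts from 1: an assumption)."""
        for pos, bit in enumerate(self.bits):
            if bit == b:
                k -= 1
                if k == 0:
                    return pos
        raise ValueError("not enough occurrences of b")


bv = BitVector([0, 1, 1, 0])
print([bv.access(i) for i in range(4)])      # [0, 1, 1, 0]
print([bv.rank(0, i) for i in range(1, 5)])  # [1, 1, 1, 2]
print([bv.rank(1, i) for i in range(1, 5)])  # [0, 1, 2, 2]
```

A succinct implementation would replace the full prefix table with sampled block counts plus popcount, but the query interface stays the same.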
(6/9)
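access(i): returns the i-th element (positions start at 0)
rank(c, i): returns the number of occurrences of character c
among the first i positions
select(c, i): returns the position of
the i-th occurrence of character c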
()
(7/9)
access(i)
Character sequence: abbc
access(0) = a, access(1) = b
access(2) = b, access(3) = c
(8/9)
rank(c, i)
Character sequence: abbc
rank(a, 1) = 1, rank(a, 2) = 1
rank(a, 3) = 1, rank(a, 4) = 1
rank(b, 1) = 0, rank(b, 2) = 1
rank(b, 3) = 2, rank(b, 4) = 2
rank(c, 1) = 0, rank(c, 2) = 0
rank(c, 3) = 0, rank(c, 4) = 1
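The same three operations extend to sequences over a larger alphabet. The sketch below is a naive linear-scan version, useful only as a reference to check the values above for abbc. The function names are hypothetical, and `select` again counts occurrences from 1 as an assumption.

```python
def access(seq, i):
    """Element at position i (0-indexed)."""
    return seq[i]


def rank(seq, c, i):
    """Number of occurrences of c among the first i positions."""
    return sum(1 for x in seq[:i] if x == c)


def select(seq, c, k):
    """Position of the k-th occurrence of c (k counts from 1: an assumption)."""
    seen = 0
    for pos, x in enumerate(seq):
        if x == c:
            seen += 1
            if seen == k:
                return pos
    raise ValueError("not enough occurrences of c")


text = "abbc"
print([access(text, i) for i in range(4)])       # ['a', 'b', 'b', 'c']
print([rank(text, "b", i) for i in range(1, 5)])  # [0, 1, 2, 2]
print([rank(text, "c", i) for i in range(1, 5)])  # [0, 0, 0, 1]
```

Each rank call here costs time proportional to i. The wavelet matrix answers the same query with one bit-vector rank pair per bit of the alphabet.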
(9/9)
DSIRNLP#2
access, rank
(OK)
(1/11)
access
rank
(access)
rank
(2/11)
2
(3/11)
(4/11)
()
1
(5/11)
2()
1()0
(6/11)
2()
1()1
(7/11)
[0, 3]
[4, 7]
,,,
,,,
(8/11)
3()
2()0
(9/11)
3()
2()1
(10/11)
[0,1] [2,3] [4,5] [6,7]: 4 groups
(11/11)
access(1/6)
access(0)
Level 1: the bit at position 0 is 1
Move to level 2
rank(1, 0) = 0
Level 2 position: 0 + (number of 0s in level 1: 6) = 6
access(2/6)
access(0)
Level 2: the bit at position 6 is 0
Move to level 3
rank(0, 6) = 4
Level 3 position: 4
access(3/6)
access(0)
The bits read are 1, 0, 0: 100 in binary
100 in binary is 4 in decimal, so access(0) = 4
access(4/6)
access(5)
Level 1: the bit at position 5 is 0
Move to level 2
rank(0, 5) = 1
Level 2 position: 1
access(5/6)
access(5)
Level 2: the bit at position 1 is 1
Move to level 3
rank(1, 1) = 1
Level 3 position: 1 + (number of 0s in level 2: 8) = 9
access(6/6)
access(5)
The bits read are 0, 1, 0: 010 in binary
010 in binary is 2 in decimal, so access(5) = 2
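The access walkthrough can be sketched in code. The sequence `SEQ` below is hypothetical: the slides' own input is not shown in this transcript, so it was chosen only because it reproduces the numbers above (6 zeros in level 1, 8 zeros in level 2, access(0) = 4, access(5) = 2). The names `build`, `rank_bit`, and `access` are illustrative, and rank on each bit row is done by a linear scan to keep the sketch short.

```python
def build(seq, nbits):
    """Return (rows, zeros): one bit row per level (most significant bit
    first) and the number of 0 bits in each row."""
    rows, zeros = [], []
    cur = list(seq)
    for depth in range(nbits):
        shift = nbits - 1 - depth
        row = [(v >> shift) & 1 for v in cur]
        rows.append(row)
        zeros.append(row.count(0))
        # Stable partition for the next level: 0-bit values first, then 1-bit values.
        cur = ([v for v, b in zip(cur, row) if b == 0]
               + [v for v, b in zip(cur, row) if b == 1])
    return rows, zeros


def rank_bit(row, b, i):
    """Number of bits equal to b among the first i positions of a row."""
    return sum(1 for x in row[:i] if x == b)


def access(rows, zeros, i):
    """Read one bit per level and follow the element down the levels."""
    value = 0
    for row, z in zip(rows, zeros):
        bit = row[i]
        value = (value << 1) | bit
        # 1-bit elements are placed after all 0-bit elements in the next level.
        i = rank_bit(row, bit, i) + (z if bit == 1 else 0)
    return value


SEQ = [4, 7, 6, 5, 3, 2, 1, 0, 1, 4, 1, 5]  # hypothetical input
rows, zeros = build(SEQ, 3)
print(zeros[:2])               # [6, 8]
print(access(rows, zeros, 0))  # 4
print(access(rows, zeros, 5))  # 2
```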
rank(1/9)
rank(4, 10)
rank(2/9)
rank(4, 10)
1()
1
rank(3/9)
rank(4, 10)
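Level 1: the first bit of 4 (100 in binary) is 1
Move to level 2
There are 0 ones before position 0
Start of the range in level 2: 0 + (number of 0s in level 1: 6) = 6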
rank(4/9)
rank(4, 10)
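Level 1: the first bit of 4 (100 in binary) is 1
Move to level 2
End of the range in level 2: 5 + (number of 0s in level 1: 6) = 11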
rank(5/9)
rank(4, 10)
2()
rank(6/9)
rank(4, 10)
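Level 2: the second bit of 4 (100 in binary) is 0
Move to level 3
Level 2 position 6: rank(0, 6) = 4
Start of the range in level 3: 4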
rank(7/9)
rank(4, 10)
2()0
3()
rank(8/9)
rank(4, 10)
3()
rank(9/9)
rank(4, 10)
3()
4051
rank(0, 4) = 1: one 0 before the start of the range
rank(0, 7) = 3: three 0s before the end of the range
3 - 1 = 2: the value 4 occurs 2 times
rank(4, 10) = 2
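The rank walkthrough maps a range [begin, end) down the levels and takes the difference at the end. The sketch below follows that idea. As before, `SEQ` is a hypothetical sequence picked so that the intermediate values match the slides (6 and 11 in level 2, 4 and 7 in level 3, final answer 2), and the helper names are illustrative rather than taken from any library.

```python
def build(seq, nbits):
    """Bit rows (most significant bit first) and the 0-count of each row."""
    rows, zeros = [], []
    cur = list(seq)
    for depth in range(nbits):
        row = [(v >> (nbits - 1 - depth)) & 1 for v in cur]
        rows.append(row)
        zeros.append(row.count(0))
        # Stable partition: 0-bit values first, then 1-bit values.
        cur = ([v for v, b in zip(cur, row) if b == 0]
               + [v for v, b in zip(cur, row) if b == 1])
    return rows, zeros


def rank_bit(row, b, i):
    """Number of bits equal to b among the first i positions of a row."""
    return sum(1 for x in row[:i] if x == b)


def rank(rows, zeros, nbits, c, i, trace=None):
    """Occurrences of value c among the first i elements of the sequence."""
    begin, end = 0, i
    for depth, (row, z) in enumerate(zip(rows, zeros)):
        bit = (c >> (nbits - 1 - depth)) & 1
        offset = z if bit == 1 else 0  # 1-bit elements follow the 0-bit ones
        begin = rank_bit(row, bit, begin) + offset
        end = rank_bit(row, bit, end) + offset
        if trace is not None:
            trace.append((begin, end))
    return end - begin


SEQ = [4, 7, 6, 5, 3, 2, 1, 0, 1, 4, 1, 5]  # hypothetical input
rows, zeros = build(SEQ, 3)
steps = []
print(rank(rows, zeros, 3, 4, 10, steps))  # 2
print(steps)                               # [(6, 11), (4, 7), (1, 3)]
```

Each level costs two bit-vector rank calls, one for begin and one for end, which is the cost the rank2 slides set out to halve.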
[1] The Wavelet Matrix
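WT: Wavelet Tree
WTNP: Levelwise Wavelet Tree
WM: Wavelet Matrix
RG: BV implementation by R. Gonzalez
RRR: BV implementation by R. Raman, V. Raman, S. S. Rao
(BV = bit vector)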
(1/2)
[1] The Wavelet Matrix
[2] (takeda25)
http://d.hatena.ne.jp/takeda25/
[3] EchizenBlog-Zwei(echizen_tm)
http://d.hatena.ne.jp/echizen_tm/
(2/2)
libcds: (F. Claude)
https://github.com/fclaude/libcds
wavelet-matrix-cpp: takeda25
wat-arrayrank()
https://github.com/hiroshi-manabe/wavelet-matrix-cpp
shellinford: echizen_tm
FM-Indexrank
https://code.google.com/p/shellinford/
rank2(1/7)
The Wavelet Matrix
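rank in the paper
calls bv.rank
2 * log2(alphabet size) times
For 256 symbols: 2 * log2(256) = 16 calls
This can be reduced to 1 * log2(alphabet size) calls
to bv.rank
For 256 symbols: log2(256) = 8 calls
Extra memory: 256 * 64 bit = 2 KB
bv.rank is then called 8 times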
rank2(2/7)
rank
2
bv.rank
(rank(c, i))
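depth = 0, begin = 0, end = i
while (depth < log2(alphabet size)) {
  bit = depth-th bit of c
  // when bit is 1, the number of 0s in bv[depth] is added to both results
  begin = bv[depth].rank(bit, begin)
  end = bv[depth].rank(bit, end)
  depth++;
}
// answer: end - begin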
rank2(3/7)
(rank(c, i))
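depth = 0, begin = 0, end = i
while (depth < log2(alphabet size)) {
  bit = depth-th bit of c
  // when bit is 1, the number of 0s in bv[depth] is added to both results
  begin = bv[depth].rank(bit, begin)
  end = bv[depth].rank(bit, end)
  depth++;
}
// answer: end - begin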
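bv.rank is called 2 times per level:
once for begin and once for end
begin depends only on the character c
(only 256 possible values)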
rank2(4/7)
rank
2
beginend
begin
(256)
2
begin
bv.rank
begin
*log()
beginend
beginend
rank2(5/7)
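rank(c, i)
depth = 0, begin = 0, end = i
while (depth < log2(alphabet size)) {
  bit = depth-th bit of c
  // when bit is 1, the number of 0s in bv[depth] is added to both results
  begin = bv[depth].rank(bit, begin)
  end = bv[depth].rank(bit, end)
  depth++;
}
// answer: end - begin

rank(c, i)
depth = 0, begin = precomputed begin of c, end = i
while (depth < log2(alphabet size)) {
  bit = depth-th bit of c
  // when bit is 1, the number of 0s in bv[depth] is added to the result
  end = bv[depth].rank(bit, end)
  depth++;
}
// answer: end - begin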
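The two versions of rank(c, i) above can be compared in a runnable sketch. Because begin depends only on c, it can be computed once per symbol and stored in a table, so each query only tracks end. The input `SEQ`, the 3-bit alphabet, and all function names are hypothetical. The table here is filled by brute force with one pass per symbol, not by the shared computation the slides describe.

```python
def build(seq, nbits):
    """Bit rows (most significant bit first) and the 0-count of each row."""
    rows, zeros = [], []
    cur = list(seq)
    for depth in range(nbits):
        row = [(v >> (nbits - 1 - depth)) & 1 for v in cur]
        rows.append(row)
        zeros.append(row.count(0))
        # Stable partition: 0-bit values first, then 1-bit values.
        cur = ([v for v, b in zip(cur, row) if b == 0]
               + [v for v, b in zip(cur, row) if b == 1])
    return rows, zeros


def rank_bit(row, b, i):
    """Number of bits equal to b among the first i positions of a row."""
    return sum(1 for x in row[:i] if x == b)


def descend(rows, zeros, nbits, c, pos):
    """Map a position through every level along the bits of c."""
    for depth, (row, z) in enumerate(zip(rows, zeros)):
        bit = (c >> (nbits - 1 - depth)) & 1
        pos = rank_bit(row, bit, pos) + (z if bit == 1 else 0)
    return pos


def precompute_begin(rows, zeros, nbits):
    """begin for every symbol: the image of position 0 after the last level."""
    return [descend(rows, zeros, nbits, c, 0) for c in range(1 << nbits)]


def rank_fast(rows, zeros, nbits, begin_table, c, i):
    """rank(c, i) with one bit-vector rank per level instead of two."""
    return descend(rows, zeros, nbits, c, i) - begin_table[c]


SEQ = [4, 7, 6, 5, 3, 2, 1, 0, 1, 4, 1, 5]  # hypothetical input
rows, zeros = build(SEQ, 3)
begin_table = precompute_begin(rows, zeros, 3)
print(rank_fast(rows, zeros, 3, begin_table, 4, 10))  # 2
```

With 256 symbols and 64-bit entries the table takes 256 * 64 bit = 2 KB, matching the figure on the earlier slide.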
rank2(6/7)
begin
2
begin
begin
bv.rank(0, begin), bv.rank(1, begin)
rank(0, i)
rank(1, i)
rank(4, i)
rank(5, i)
rank(2, i) rank(6, i)
rank(3, i)
rank(7, i)
rank2(7/7)
takeda25wat-array
100,000,0001,000
echizen_tmFM-Index
(/)
4,000,000256