
ALGORITHM 1: Two-stage matching method (DRK-2S)

for each warp w parallel do
    for each thread t within w parallel do
        Search for subpattern of length m̄ < m using DRK
        S_t ← potential matches
        R_t ← beginning indices of S_t
    end
    for each index r ∈ ⋃_{t∈w} {R_t} do
        for 0 ≤ k < ⌈m/32⌉ do
            Compute F ← F_p(y_{r+32k} … y_{r+31+32k})
            if F is not equal to F_p(x_{32k} … x_{32k+31}) then
                r is not a match
            end
        end
        if all above k fingerprints matched then
            r is a match
        end
    end
end

reduce the memory bandwidth requirements compared to DRK, and as we will see in the next
section, achieve large speedups in practice.
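
To make the control flow of Algorithm 1 concrete, the following is a minimal sequential sketch in Python, not the authors' GPU implementation. It assumes a polynomial fingerprint with an arbitrarily chosen radix and modulus (BASE, PRIME), takes the subpattern to be the first m̄ characters of the pattern, and replaces the DRK search of the first stage with a plain rolling-hash scan. The function names (fingerprint, stage_one, two_stage_match) are hypothetical. The last block is truncated when m is not a multiple of 32, a case the pseudocode does not spell out.

```python
from math import ceil

BASE = 256               # assumed radix of the fingerprint
PRIME = 1_000_000_007    # assumed modulus p
BLOCK = 32               # block width used in Algorithm 1


def fingerprint(s: bytes) -> int:
    """Polynomial fingerprint of s modulo PRIME (stand-in for F_p)."""
    f = 0
    for c in s:
        f = (f * BASE + c) % PRIME
    return f


def stage_one(text: bytes, sub: bytes) -> list:
    """Rolling-hash scan for sub; returns candidate start indices (the R_t)."""
    n, mb = len(text), len(sub)
    if mb == 0 or mb > n:
        return []
    target = fingerprint(sub)
    high = pow(BASE, mb - 1, PRIME)      # weight of the outgoing character
    f = fingerprint(text[:mb])
    candidates = []
    for r in range(n - mb + 1):
        if f == target:
            candidates.append(r)
        if r + mb < n:                   # slide the window by one character
            f = ((f - text[r] * high) * BASE + text[r + mb]) % PRIME
    return candidates


def two_stage_match(text: bytes, pattern: bytes, m_bar: int) -> list:
    """Stage 1 finds subpattern candidates; stage 2 verifies 32-wide blocks."""
    m, n = len(pattern), len(text)
    if not 0 < m_bar < m:
        raise ValueError("m_bar must satisfy 0 < m_bar < m")
    blocks = ceil(m / BLOCK)
    # Fingerprints of the pattern blocks x_{32k} ... x_{32k+31}.
    pattern_fps = [fingerprint(pattern[BLOCK * k:BLOCK * (k + 1)])
                   for k in range(blocks)]
    matches = []
    for r in stage_one(text, pattern[:m_bar]):
        if r + m > n:                    # pattern would run past the text
            continue
        ok = all(
            fingerprint(text[r + BLOCK * k:min(r + BLOCK * (k + 1), r + m)])
            == pattern_fps[k]
            for k in range(blocks)
        )
        if ok:
            matches.append(r)
    return matches
```

As with any fingerprint-based method, a reported match is correct up to hash collisions; a caller needing exact results could add a direct comparison for each reported index.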

2.8 Performance Evaluation


In this section, we experimentally assess the performance of all our proposed algorithms. Unless
otherwise stated (Section 2.8.5), we use randomly generated texts and patterns with arbitrary
alphabet sizes.⁴ All our parallel methods are run on an NVIDIA Tesla K40c GPU and compiled
with NVIDIA's nvcc compiler (version 7.5.17). All our sequential codes are run on an Intel
Xeon E5-2637 v2 3.50 GHz CPU with 16 GB of DDR3 DRAM. We used OpenMP
v3.0 to run our CPU codes in parallel over 8 cores (2 threads per core). All parallel GPU
implementations are by the authors, while all CPU codes are from the SMART library [32] and run
in a divide-and-conquer fashion over our multi-core platform.
⁴ Because the RK algorithm is independent of the content of the text or pattern, the runtime for any text or pattern
for a fixed (m, n) will be identical.
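
The text does not specify how the CPU codes partition the input for the divide-and-conquer runs. One common scheme, shown here purely as an assumption, splits the valid start positions evenly among workers and extends each chunk by m − 1 characters so that matches straddling a boundary are not lost. The names chunk_bounds and search_chunks are hypothetical, the sequential loop stands in for the parallel workers, and Python's built-in bytes.find stands in for whichever matcher each worker would run.

```python
from math import ceil


def chunk_bounds(n: int, m: int, parts: int) -> list:
    """Split start positions 0..n-m into at most `parts` chunks.

    Each returned (lo, hi) slice extends m - 1 characters past its last
    start position, so every match is found by exactly one chunk.
    """
    starts = n - m + 1
    if starts <= 0 or parts <= 0:
        return []
    size = ceil(starts / parts)
    bounds = []
    for lo in range(0, starts, size):
        last_start = min(lo + size, starts)   # exclusive
        bounds.append((lo, last_start + m - 1))
    return bounds


def search_chunks(text: bytes, pattern: bytes, parts: int) -> list:
    """Search each chunk independently and merge the global match indices."""
    m = len(pattern)
    hits = []
    for lo, hi in chunk_bounds(len(text), m, parts):
        chunk = text[lo:hi]
        pos = chunk.find(pattern)
        while pos != -1:
            hits.append(lo + pos)             # translate to a global index
            pos = chunk.find(pattern, pos + 1)
    return hits
```

Because each chunk owns a disjoint range of start positions, the merged result needs no deduplication.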

