
“Rabin Karp Algorithm Implementation With Java”

Written by:

Gema Hafizh Maulidi (2220010013)
Jascon Johanest Kembuan (2220010015)
Muhammad Rajiful Haq Gea (2220010032)

Faculty:
Indah Ayu Yuliani, ST, MM.

Class:
2SE1

CEP CCIT - Fakultas Teknik Universitas Indonesia Gedung Engineering Center Lt. 1,
Kampus Baru UI Depok 16424
PREFACE

We give praise to God Almighty, whose abundant grace and blessings allowed this paper,
"Rabin Karp Algorithm Implementation With Java", to be completed on time. This paper was
prepared to fulfill the assignment of the Information and Communication Technology course.

We would like to express our gratitude to the lecturers of the Java Programming subject who
have allowed us to compile this paper. We are aware that this paper is far from perfect. For
this reason, we welcome suggestions and criticism that can improve the next paper.

Thank you, and hopefully this paper can make a positive contribution to all of us.

Depok, May 2023

The Authors

TABLE OF CONTENTS

PREFACE
TABLE OF CONTENTS
TABLE OF FIGURES
CHAPTER I
INTRODUCTION
1.1 Background
1.2 Writing Objective
1.3 Problem Domain
1.4 Writing Methodology
1.5 Writing Framework
CHAPTER II
BASIC THEORY
2.1 Algorithm
2.2 Rabin Karp's algorithm
2.3 How Rabin-Karp Algorithm Works
2.4 Compare Rabin Karp Algorithm with Another Algorithm
2.5 Advantages and disadvantages of the Rabin Karp Algorithm
CHAPTER III
PROBLEM ANALYSIS
3.1 Rabin-Karp Implementation on Java Programming
3.2 Dry Run Table
CHAPTER IV
CONCLUSION AND SUGGESTION
4.1 Conclusion
4.2 Suggestion
BIBLIOGRAPHY

TABLE OF FIGURES

Figure 3.1 Code 1
Figure 3.2 Code 2
Figure 3.3 Code 3
Figure 3.4 Code 4
Figure 3.5 Run of Code

CHAPTER I

INTRODUCTION

1.1 Background
With advances in information technology, some work, such as data processing, can be done
more easily with the help of computers. Data processed by computer is handled more
effectively and efficiently and yields the desired information. However, conveniences such
as the ability to copy digital files can also harm groups and individuals, one example being
document plagiarism. One way to detect copied documents is to compare the suspected journal
with its source by calculating the percentage of similar words between them. The Rabin-Karp
algorithm is one algorithm that can be used for this comparison.

In computer science, the Rabin-Karp algorithm or Karp-Rabin algorithm is a string search
algorithm created by Richard M. Karp and Michael O. Rabin (1987) that uses hashing to find
exact matches of a string pattern in text. It uses a rolling hash to quickly filter out text
positions that do not match the pattern, and then checks for matches at the remaining
positions.

A practical application of the algorithm is detecting plagiarism. Given source material, the
algorithm can rapidly search through a paper for instances of sentences from the source
material, ignoring details such as case and punctuation. Because of the abundance of the
sought strings, single-string searching algorithms are impractical.

1.2 Writing Objective

The purpose of this paper is to provide information about the Rabin-Karp algorithm, its
principles, and how it works in the applications that use it.

1.3 Problem Domain

This ISAS discusses the implementation of the Rabin-Karp algorithm, including its
definition, principles, and function, an example application that uses it, and a comparison
of the Rabin-Karp algorithm with one other algorithm.

1.4 Writing Methodology

The writing method applied by the authors is simple: search for information from official
and reliable sources, then research, discuss, and analyze all of the material.

1.5 Writing Framework

To keep the discussion focused, this paper is organized as follows:

1. Chapter I Introduction
This chapter describes the background of the problem, the problem domain, the
purpose of the writing, the writing methodology used, and the framework of
the paper.
2. Chapter II Basic Theory
This chapter describes what an algorithm is in general, what the Rabin-Karp
algorithm is, how it works, and how it compares with one other algorithm.
3. Chapter III Problem Analysis
This chapter discusses in more depth an implementation of the Rabin-Karp
algorithm in Java, with a line-by-line explanation of the code and a dry run
table.
4. Chapter IV Conclusion and Suggestion
This chapter contains the authors' conclusions, based on the experience gained
during the research, together with suggestions.

CHAPTER II

BASIC THEORY
2.1 Algorithm
Several experts define an algorithm as follows:

1. An algorithm is a logically and systematically arranged sequence of operations that
solves a problem and produces a certain output (Kani, 2020).
2. An algorithm is a computational procedure that takes a value or set of values as
input and produces a value or set of values as output. In other words, an algorithm
is a sequence of computational steps that converts input into output
(Cormen, Leiserson, Rivest, & Stein, 2009).
3. An algorithm can be interpreted as a sequence of steps to solve a problem (Munir &
Leony, 2016).
4. Algorithms are the systematic logic, methods, and stages used to solve a problem
(Wahono, 2008).

From these definitions, it can be concluded that an algorithm is a series of systematic
(sequential) steps for solving a problem. These steps are not written in a programming
language; they are steps that will later be converted into a programming language.

2.2 Rabin Karp's algorithm

The Rabin-Karp algorithm is a string matching algorithm that uses a hash function to
compare the pattern being sought (of length m) with substrings of the text (of length n).
If the two hash values are the same, the characters are compared once more to confirm the
match. If they are not the same, the substring window shifts one position to the right.
The shift is carried out at most (n-m) times. How efficiently the hash value is
recalculated at each shift determines the performance of this algorithm.

Steps in the Rabin-Karp algorithm:

1. Remove punctuation and convert the source text and the word you want to
search for to lowercase.
2. Divide the text into grams of the length given by the k-gram value.
3. Compute the hash value of each gram with the hash function.
4. Look for hash values that are the same in the two texts.
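
The four steps above can be sketched in Java as follows. This is a minimal illustration
rather than a reference implementation: the class name KGramSketch, the radix 26, the
modulus 10007, and the choice to keep only letters and digits are all assumptions made for
the example.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class KGramSketch {
    static final int BASE = 26;    // assumed radix for the hash
    static final int MOD = 10007;  // assumed prime modulus (hypothetical choice)

    // Step 1: convert to lowercase and drop everything except letters and digits.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Steps 2 and 3: split the text into k-grams and hash each one.
    static Set<Integer> fingerprints(String text, int k) {
        String s = normalize(text);
        Set<Integer> hashes = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            int h = 0;
            for (int j = 0; j < k; j++) {
                h = (BASE * h + s.charAt(i + j)) % MOD;
            }
            hashes.add(h);
        }
        return hashes;
    }

    // Step 4: count the hash values that appear in both texts.
    static int sharedHashes(String a, String b, int k) {
        Set<Integer> common = fingerprints(a, k);
        common.retainAll(fingerprints(b, k));
        return common.size();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Rabin-Karp, 1987!")); // rabinkarp1987
        System.out.println(sharedHashes("abcd", "abxy", 2));
    }
}
```

Two texts that differ only in case and punctuation normalize to the same string, so every
k-gram hash of one also appears in the other.
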

2.3 How Rabin-Karp Algorithm Works
1. Hash Functions: The Rabin-Karp algorithm uses a hash function to calculate the hash
value of the searched pattern and of the window that shifts through the text.
This hash function must be deterministic, meaning that the same input always
produces the same output. An efficient hash function is also very important for
the performance of the algorithm.
2. Initialization: The first step in the Rabin-Karp algorithm is to calculate the
hash value of the searched pattern and of the first window in the text. The
first window has the same length as the pattern. The pattern hash is calculated
only once and is reused in every later comparison.
3. Hash Comparison: After initialization, the Rabin-Karp algorithm compares the hash
value of the searched pattern with the hash value of the current window in the text.
If these hash values are the same, there is a possible match. However, there may be
false positives, which occur when two different strings produce the same hash
value. Therefore, after the hash values match, it is necessary to do a
character-by-character comparison to confirm the actual match.
4. Slide Window: If no match occurs in the previous step, the window is shifted to
the right by one character. The hash value of the new window is calculated from
the hash value of the previous window, the character that leaves the window, and
the new character that enters it. This avoids recalculating the hash of the
overlapping part of the substring.
5. Steps 3 and 4 are repeated until the window reaches the end of the text or a match
for the pattern is found. If a match occurs, we can take appropriate action, such
as noting the position of the match, or stopping the search if we only want to
know whether the pattern is present in the text.
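
A minimal Java sketch of steps 1 to 5 is shown below, assuming the caller supplies the
radix (base) and a prime modulus (mod). The names RollingSearchSketch and findAll are
hypothetical, and the method collects every match position instead of stopping at the
first one.

```java
import java.util.ArrayList;
import java.util.List;

final class RollingSearchSketch {
    // Returns the 0-based start index of every occurrence of pattern in text.
    static List<Integer> findAll(String pattern, String text, int base, int mod) {
        List<Integer> hits = new ArrayList<>();
        int m = pattern.length();
        int n = text.length();
        if (m == 0 || m > n) {
            return hits;
        }

        // Weight of the leftmost character in a window: base^(m-1) mod mod.
        long high = 1;
        for (int i = 0; i < m - 1; i++) {
            high = (high * base) % mod;
        }

        // Initialization: hash of the pattern and of the first window.
        long p = 0;
        long t = 0;
        for (int i = 0; i < m; i++) {
            p = (base * p + pattern.charAt(i)) % mod;
            t = (base * t + text.charAt(i)) % mod;
        }

        for (int i = 0; i <= n - m; i++) {
            // Hash comparison, then a character check to rule out false positives.
            if (p == t && text.regionMatches(i, pattern, 0, m)) {
                hits.add(i);
            }
            // Slide the window: remove text[i], append text[i + m].
            if (i < n - m) {
                t = (base * (t - text.charAt(i) * high) + text.charAt(i + m)) % mod;
                if (t < 0) {
                    t += mod;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("aba", "ababa", 256, 101)); // [0, 2]
    }
}
```
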

2.4 Compare Rabin Karp Algorithm with Another Algorithm

In this paper, we compare the Rabin-Karp algorithm with the Knuth-Morris-Pratt (KMP)
algorithm in general. There are several differences:

1. Rabin-Karp processes the text characters one by one, contiguously, but its
comparison step (the hash key calculation) is relatively easy: with Horner's rule,
each hash key can be calculated from the previous hash key. Knuth-Morris-Pratt
instead "jumps" over several characters of the text after processing the fringe
(a border, that is, a prefix that is also a suffix) of the pattern, which is
relatively more difficult to compute.
2. Rabin-Karp does not compare favorably with Knuth-Morris-Pratt in worst-case
complexity, O(nm) against O(n + m), which implies a longer string matching time in
the worst case.
3. Rabin-Karp needs almost no extra memory, whereas Knuth-Morris-Pratt needs extra
memory to store the fringe (prefixes and suffixes).
So, it can be seen that the Rabin-Karp algorithm and the KMP algorithm each have their own
advantages and disadvantages, and we can choose between them according to the needs of the
program we want to create.
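
To make the memory difference concrete, the sketch below builds the table of fringe
lengths that Knuth-Morris-Pratt keeps for the pattern. It is the standard construction,
shown only for contrast; the names KmpTableSketch and buildTable are hypothetical. The
table holds one integer per pattern character, whereas Rabin-Karp keeps only a few
integers regardless of the pattern length.

```java
import java.util.Arrays;

final class KmpTableSketch {
    // table[i] is the length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] buildTable(String pattern) {
        int[] table = new int[pattern.length()];
        int len = 0; // length of the fringe currently being extended
        int i = 1;
        while (i < pattern.length()) {
            if (pattern.charAt(i) == pattern.charAt(len)) {
                len++;
                table[i] = len;
                i++;
            } else if (len > 0) {
                // Fall back to the next shorter fringe and retry the same i.
                len = table[len - 1];
            } else {
                table[i] = 0;
                i++;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildTable("ababaca"))); // [0, 0, 1, 2, 3, 0, 1]
    }
}
```
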

2.5 Advantages and disadvantages of the Rabin Karp Algorithm

The Rabin-Karp algorithm has several advantages: its calculation process is relatively
easy, and it can be used to search for long patterns. Its disadvantage is that it has many
processing stages, so it takes a rather long time, and its detection accuracy depends
heavily on the position of the sentence.

CHAPTER III

PROBLEM ANALYSIS
3.1 Rabin-Karp Implementation on Java Programming

Figure 3.1 Code 1

Line by line, here is an explanation for each piece of code:

1. public class Rabinkarp
This is a class declaration with the name "Rabinkarp". The class name must be the
same as the name of the Java file where this code is stored.
2. public final static int d = 26
Declares a static constant with the name "d" and an integer data type. Its value is 26.
This constant is used as the base when calculating the hash.
3. static void search(String pattern, String txt, int q)
Defines a static method named "search" with three parameters: "pattern" (String data
type), "txt" (String data type), and "q" (integer data type). This method looks
for a given pattern in a text using the Rabin-Karp algorithm.
4. int m = pattern.length()
Declares a local variable "m" with an integer data type. The value is the length of the
string "pattern". This variable stores the length of the pattern to be searched.
5. int n = txt.length()
Declares a local variable "n" with an integer data type. The value is the length of the
string "txt". This variable stores the length of the text.
6. int i, j
Declares two local variables "i" and "j" with an integer data type. These variables
are used as loop variables in later iterations.
7. int p = 0
Declares a local variable "p" with an integer data type and gives an initial value of 0.
This variable stores the hash value of the pattern.
8. int t = 0
Declares the local variable "t" and gives an initial value of 0. This variable
stores the hash value of the current text window as the iteration progresses.
9. int h = 1
Declares a local variable "h" with an integer data type and gives an initial value of 1.
This variable is the multiplier used to remove the leading character when the text
hash is updated.
10. for (i = 0; i < m - 1; i++)
Loop that prepares the multiplier "h". This loop runs from 0 to m-2, that is, m-1
times, where m is the length of the pattern.
11. h = (h * d) % q
Updates the value of the variable "h" by multiplying it by "d" and then taking the
remainder of the division by "q". After the loop, h equals d^(m-1) mod q, the weight
of the leftmost character in a window.
12. for (i = 0; i <= n - m; i++)
Loop over each possible position of the pattern in the text. This loop runs from 0 to
n-m, where n is the length of the text.
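
The loop in items 10 and 11 can be isolated to see what h holds. The sketch below is an
illustration under assumed names (MultiplierSketch, multiplier); the modulus 101 is a
hypothetical alternative, while 13 is the value of q that the main method passes to search.

```java
final class MultiplierSketch {
    // Same loop as items 10 and 11: returns d^(m-1) mod q.
    static int multiplier(int d, int m, int q) {
        int h = 1;
        for (int i = 0; i < m - 1; i++) {
            h = (h * d) % q;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(multiplier(26, 3, 101)); // 70, since 26^2 = 676 and 676 mod 101 = 70
        System.out.println(multiplier(26, 3, 13));  // 0, since 26 is a multiple of 13
    }
}
```

Because 26 is a multiple of 13, h is 0 for every pattern longer than one character when
d = 26 and q = 13, so the outgoing character no longer influences the rolled hash. The
search still reports correct positions thanks to the character-by-character check, but a
modulus that does not divide d filters out far more windows before that check.
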

Figure 3.2 Code 2

1. for (i = 0; i < m; i++)
This loop runs once for each character in the pattern. It runs from 0 to m-1, where m
is the length of the pattern.
2. p = (d * p + pattern.charAt(i)) % q
In each iteration of the loop, the hash value of the pattern (p) is updated. This value
is calculated using the Rabin-Karp hash formula: p = (d * p + ASCII(pattern[i])) % q.
The variable "d" is the constant (26 in this case) that multiplies the previous
value, "p" is the hash value of the pattern so far, "pattern.charAt(i)" is the current
character in the pattern, and "q" is the modulus that keeps the hash value
within the range 0 to q-1.
3. t = (d * t + txt.charAt(i)) % q
In each iteration of the loop, the text hash value (t) is updated. This value is
calculated by the same formula as the pattern hash: t = (d * t + ASCII(txt[i])) % q.
The variable "t" is the hash value of the text window so far, "txt.charAt(i)" is the
current character in the text, and "q" is the modulus that keeps the hash value
within the range 0 to q-1.
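
The initial hash of items 2 and 3 can be tried on small strings. In the sketch below, the
names InitialHashSketch and hash are hypothetical, and 101 is an assumed alternative
modulus used only for comparison with q = 13.

```java
final class InitialHashSketch {
    // Horner's rule, as in items 2 and 3: characters are processed left to right.
    static int hash(String s, int d, int q) {
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            value = (d * value + s.charAt(i)) % q;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(hash("abc", 26, 101)); // 44
        System.out.println(hash("abc", 26, 13));  // 8
        System.out.println(hash("xyc", 26, 13));  // 8
    }
}
```

"abc" and "xyc" are different strings with the same hash under q = 13. This is the kind of
hash collision that makes the later character-by-character check necessary.
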

Figure 3.3 Code 3

1. if (p == t)
Checks whether the hash value of the pattern (p) is equal to the hash value of the text
(t) at the current position. If they are equal, it indicates a possible pattern match at
the current position in the text.
2. for (j = 0; j < m; j++)
Loop to check the pattern against the text character by character at the current
position. This loop runs from 0 to m-1, where m is the length of the pattern.
3. if (txt.charAt(i + j) != pattern.charAt(j))
Checks whether the character in the text at position i + j differs from the
character in the pattern at position j. If the characters differ, the
loop is terminated using a 'break' statement.
4. if (j == m)
Checks whether the previous loop ran to the end of the pattern (j == m). If
yes, the entire pattern matches the text at the current position. In this
case, the pattern is found in the text.
5. System.out.println("Pattern is found at position: " + (i + 1))
Prints a message indicating that the pattern was found at the current position in the
text. Because i + 1 is printed, the reported position is counted from 1.
6. if (i < n - m)
Checks whether another window remains in the text. If i is less than n-m, there is
still a character to the right of the current window, so the hash of the next window
must be calculated.
7. t = (d * (t - txt.charAt(i) * h) + txt.charAt(i + m)) % q
In this formula, "t" on the right-hand side is the hash value of the current window,
"txt.charAt(i)" is the character removed from the sliding window, "h" is the
multiplier corresponding to the removed character, "txt.charAt(i + m)" is the new
character inserted into the sliding window, and "d" and "q" are constants.
8. if (t < 0)
Checks whether the text hash value (t) became negative after the previous operation. If
yes, t is replaced by t + q so that the hash value stays within the range 0 to q-1.
Figure 3.4 Code 4

In this section we use a Scanner to read the text and the pattern from the user. The
program then displays each position in the text where the pattern occurs.

Figure 3.5 Run of Code

After the text and pattern are entered through the Scanner, the program shows the positions
in the text that match the pattern.

3.2 Dry Run Table

The table below shows the steps of the Rabin-Karp implementation, line by line:

| Line | Code | Explanation |
|------|------|-------------|
| 3 | public final static int d = 26; | Define the constant value for the character range |
| 5 | static void search(String pattern, String txt, int q) { | Define the search method |
| 6 | int m = pattern.length(); | Get the length of the pattern |
| 7 | int n = txt.length(); | Get the length of the text |
| 8 | int i, j; | Declare loop variables |
| 9 | int p = 0; | Initialize pattern hash |
| 10 | int t = 0; | Initialize text hash |
| 11 | int h = 1; | Initialize the value of h |
| 13 | for (i = 0; i < m - 1; i++) | Calculate h for rolling hash |
| 14 | h = (h * d) % q; | Multiply h by d and modulo q |
| 17 | for (i = 0; i < m; i++) { | Calculate hash for pattern and text |
| 18 | p = (d * p + pattern.charAt(i)) % q; | Calculate pattern hash |
| 19 | t = (d * t + txt.charAt(i)) % q; | Calculate text hash |
| 23 | for (i = 0; i <= n - m; i++) { | Iterate through the text |
| 24 | if (p == t) { | Compare pattern and text hash |
| 25 | for (j = 0; j < m; j++) { | Check for pattern match |
| 26 | if (txt.charAt(i + j) != pattern.charAt(j)) break; | Compare characters; break the loop if characters don't match |
| 29 | if (j == m) | Check if the entire pattern matches |
| 30 | System.out.println("Pattern is found at position: " + (i + 1)); | Print the position of the match |
| 34 | if (i < n - m) { | If another window remains, update text hash and handle negative values |
| 35 | t = (d * (t - txt.charAt(i) * h) + txt.charAt(i + m)) % q; | Update text hash value |
| 36 | if (t < 0) | Handle negative hash values |
| 37 | t = (t + q); | Add q to the negative hash value |
| 40 | public static void main(String[] args) { | Entry point of the program |
| 41 | Scanner scanner = new Scanner(System.in); | Create a Scanner object for input |
| 42 | System.out.print("Enter the text: "); | Prompt for text input |
| 43 | String txt = scanner.nextLine(); | Read the text input |
| 44 | System.out.print("Enter the pattern to search: "); | Prompt for pattern input |
| 45 | String pattern = scanner.nextLine(); | Read the pattern input |
| 47 | int q = 13; | Define the prime number |
| 48 | scanner.close(); | Close the scanner |
| 50 | search(pattern, txt, q); | Call the search method to find the pattern in the text |
| 52 | } | Close the main method |
| 54 | } | Close the class |

CHAPTER IV

CONCLUSION AND SUGGESTION

4.1 Conclusion
The conclusion of this paper is that the Rabin-Karp algorithm uses hashing with a
predetermined formula to detect similarities: the two texts being compared are transformed
into series of numbers based on the ASCII table. If the hash values of the pattern and of a
text window are the same and their characters also match, the pattern occurs in the text.
The greater the number of characters or sentences in the text, the longer it takes to
detect the degree of similarity, because the Rabin-Karp algorithm examines the characters
one by one.

Overall, the Rabin-Karp algorithm is a powerful tool for string matching, especially when
dealing with large texts and multiple pattern searches. However, remember to consider the
potential limitations of the algorithm, such as collisions in the hash function and the
need for efficient hash calculations.

4.2 Suggestion
The hashing method of the Rabin-Karp algorithm can be used in keyword searches: a program
can be created to search for words that appear often in a text or that are unique. The
resulting program will certainly have advantages and disadvantages, but its shortcomings
can be overcome through further development of the program.

BIBLIOGRAPHY

[1] Programiz. Rabin-Karp Algorithm. (Accessed from
https://www.programiz.com/dsa/rabin-karp-algorithm on March 16 2023)

[2] Piptools. Rabin-Karp Algorithm. (Accessed from
https://piptools.net/algoritma-rabin-karp/ on March 16 2023)

[4] Putra, N. P., & Sularno, S. (2019). Penerapan Algoritma Rabin-Karp Dengan
Pendekatan Synonym Recognition Sebagai Antisipasi Plagiarisme Pada
Penulisan Skripsi. Jurnal Teknologi Dan Sistem Informasi Bisnis, 1(2), 130-140.

[5] Yusuf, B., Vivianie, S., Marsya, J. M., & Sofyan, Z. (2019, November). Analisis
Perbandingan Algoritma Rabin-Karp dan Ratcliff/Obershelp untuk Menghitung
Kesamaan Teks dalam Bahasa Indonesia. In SEMINAR NASIONAL APTIKOM
(SEMNASTIK) (pp. 61-69).

[6] Andres, N., & Christopher, H. S. Penelaahan Algoritma Rabin-Karp dan
Perbandingan Prosesnya dengan Algoritma Knut-Morris-Pratt.

[7] Alamsyah, N. (2017). Perbandingan Algoritma Winnowing Dengan Algoritma
Rabin Karp Untuk Mendeteksi Plagiarisme Pada Kemiripan Teks Judul
Skripsi. Technologia: Jurnal Ilmiah, 8(3), 124-134.

[8] Unida. Pengertian Algoritma. (Accessed from
https://unida.ac.id/teknologi/artikel/pengertian-algoritma.html on March 16 2023)

[9] Lede, P. A. R. L., Fanggidae, A., & Polly, Y. T. (2016). Implementasi Algoritma
Rabin-Karp Untuk Mendeteksi Dugaan Plagiarisme Berdasarkan Tingkat
Kemiripan Kata Pada Dokumen Teks. Jurnal Komputer dan Informatika
(JICON), 2(1), 50-64.

[10] Mujahidin, Z. (2013). Implementasi Metode Rabin Karp Untuk Mendeteksi
Tingkat Kesamaan Dua Dokumen. (Doctoral dissertation, UNIVERSITAS
ISLAM NEGERI SULTAN SYARIEF KASIM RIAU)
