Chapter I - 5 Edited 24-03-19

CHAPTER 1
INTRODUCTION
Software use is steadily increasing around the world. People use specific software for educational and personal purposes, including bank transactions and e-commerce. Software saves a great deal of money, time, and human resources; therefore, most companies computerize their business operations.
A computer holds so much information that whenever a user asks for a piece of it, the computer has to search its files and memory to find the data and make it accessible to the user. The computer has its own procedures for searching its memory quickly.
According to a Google study reported by Search Engine Journal, roughly 40% of people search only on a smartphone. More people are searching Google on smartphones than ever before, the company says, with the most prevalent categories revolving around health, parenting, and beauty. Other findings from Google's study are that 80% of people who search Google use a smartphone, 67% use a desktop computer, 16% use a tablet, 57% use more than one type of device, 27% use a smartphone only, and 14% use a desktop computer only.
https://www.searchenginejournal.com/mobile-search-rise-almost-half-people-search-smartphones-study/175544/
In an IDC white paper published in 2012, the authors report that a worldwide study of 1,200 data specialists and IT experts found that they spend an average of 4.5 hours a week trying to find documents on their computers. These are the people who need to find things the most and who ought to be the best at finding them. Instead, they spend half of those 4.5 hours looking for, and not finding, the records they require, and then spend the other half recreating what they could not find.
Nowadays, search engines are essential for writing school papers, and most people who use a search engine do so for research purposes. People are mostly looking for answers, or at least for data with which to make a decision. Searching is one of the simplest things to do on the internet, and the same is true of finding files on a computer.
Moreover, because of modernization, the computer is one of the most important tools of this era, and searching is one of the most straightforward tasks people can perform on a modern computer. People do their jobs on their personal computers and store their finished work there, and some download large files from the internet and save them to the computer's storage. Digital storage is therefore essential today.
According to makeuseof.com, many people save their files on their desktop, and for an understandable reason: it provides instant access with a single click. These people may not know the risks, such as hackers gaining access to desktop files. People who keep saving files on their desktop sometimes become confused about the data they have saved. Some sort the data, while others use the search bar to find the information they need. The search bar is significant because it lets users easily find what they are looking for.
A search algorithm is fundamental to searching for files; it is the step-by-step process used to locate specific data among a collection of data, and it is considered a fundamental procedure of computing. When searching for data, the difference between a fast application and a slower one often lies in the use of the proper search algorithm. Search algorithms are classified by their mechanism of searching. Linear search algorithms check each record in turn for the one associated with a target key.
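As an illustration of the linear search just described, the following minimal Java sketch checks each record in turn for a target key. The class and method names (LinearSearch, indexOf) and the sample file names in main are hypothetical and are not part of the application developed in this study.

```java
final class LinearSearch {
    /** Returns the index of the first record equal to key, or -1 if it is absent. */
    static int indexOf(String[] records, String key) {
        for (int i = 0; i < records.length; i++) {
            if (records[i].equals(key)) {
                return i; // found: stop at the first matching record
            }
        }
        return -1; // every record was checked without a match
    }

    public static void main(String[] args) {
        String[] files = {"Sample1.docx", "Sample3.docx", "Sample7.docx"};
        System.out.println(indexOf(files, "Sample3.docx")); // prints 1
        System.out.println(indexOf(files, "Sample9.docx")); // prints -1
    }
}
```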
String-searching algorithms, also called string-matching algorithms, are an essential class of string algorithms that try to find the places where one or several strings (also known as patterns) occur within a larger string or text.
Naïve String Matching is one of the fast string search algorithms when it comes to matching short patterns. The application developed in this study can help people who cannot fully manage their computer desktops, so that they no longer need to spend a long time searching for a file. It can search for a word inside an MS-Word (.docx) document in a short time. To find a string, the naïve string matching algorithm can be used: it requires no preprocessing, and its matching time is Θ((n − m + 1)m) in the worst case, where n is the length of the text and m is the length of the pattern.
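The cost of the naïve approach can be made concrete by counting character comparisons. The sketch below is only an illustration: the class name NaiveCost and the sample inputs are assumptions, not taken from the application. It shows that a text of repeated characters forces m comparisons at each of the n − m + 1 alignments, while a text with early mismatches needs only one comparison per alignment.

```java
final class NaiveCost {
    /** Counts the character comparisons the naive matcher performs on txt and pat. */
    static long countComparisons(String txt, String pat) {
        int n = txt.length();
        int m = pat.length();
        long comparisons = 0;
        for (int s = 0; s <= n - m; s++) {       // each candidate shift
            for (int j = 0; j < m; j++) {
                comparisons++;
                if (txt.charAt(s + j) != pat.charAt(j)) {
                    break;                        // mismatch: move to the next shift
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Worst case: every alignment compares all 3 pattern characters.
        System.out.println(countComparisons("aaaaaa", "aab")); // prints 12 = (6 - 3 + 1) * 3
        // Easy case: every alignment fails on the first character.
        System.out.println(countComparisons("abcdef", "xyz")); // prints 4 = 6 - 3 + 1
    }
}
```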
PURPOSE AND DESCRIPTION
Search applications are very prevalent nowadays. Unorganized files and the lack of proper file management are among the most common problems users face. The purpose of this study is to apply the naïve string matching algorithm to search for files containing the text being sought, and to add features that report how many matches were found in each file and when the text, word, or phrase was last searched. These features are not present in the current search engine of the Windows OS.
This study includes the creation of an application that applies the Naïve String Matching Algorithm to search for text inside MS-Word (.docx) documents. While the program is running, the user selects the folder to explore with the browse button and enters the desired word in the search box. Once the search button is clicked, the application immediately searches the MS-Word documents and displays the results in the rich text box.
Furthermore, after the search, the user clicks one of the listed MS-Word documents, and the application extracts the matching text to its panel box, showing the occurrences of the searched word within the documents in the selected path.
OBJECTIVES:
The objectives of this study are the following:
• To develop a search engine that can search the contents of MS-Word documents using the naïve string matching algorithm and display the matching documents.
• To determine the accuracy of the naïve string matching algorithm in searching for files containing the text, word, or phrase being searched, by testing it on several sets of test data.
SCOPE AND LIMITATION
SCOPE
The study covers the concept of the Naïve string matching algorithm as applied to searching for text in MS-Word documents. The study also includes the creation of a simulator that has the following functionalities:
• Can search the content of MS-Word documents.
• Can generate a search log containing each recently searched word with the time and date of the search.
• Can search MS-Word documents through sub-folders.
• Can exclude or include sub-directories in searching.
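The option to include or exclude sub-directories can be sketched with the standard java.nio.file API. This is a minimal illustration under stated assumptions, not the application's own code (which uses a different file library); the names WordFileLister, list, and isWordFile are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WordFileLister {
    /** Lists .doc/.docx files under root; includeSubDirs toggles the sub-folder search. */
    static List<Path> list(Path root, boolean includeSubDirs) throws IOException {
        // Depth 1 visits only the files directly inside root.
        int maxDepth = includeSubDirs ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(WordFileLister::isWordFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** True when the file name ends in .doc or .docx, ignoring case. */
    static boolean isWordFile(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".doc") || name.endsWith(".docx");
    }
}
```

Calling list(root, false) returns only the MS-Word files directly inside the chosen folder, while list(root, true) also descends into every sub-folder.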
LIMITATION
• The simulation can search for patterns with a maximum length of 100 characters, composed of upper- and lower-case letters of the alphabet (a, b, c, A, B, C).
• The simulation program cannot read PDF, XLSM, PPTX, and other document file formats.
• Files other than .doc and .docx are not included.
CHAPTER 2
REVIEW OF RELATED LITERATURE
According to Christian Charras and Thierry Lecroq of Université de Rouen (2015), string-matching algorithms are basic components used in implementations of practical software existing under most operating systems. Moreover, they emphasize programming methods that serve as paradigms in other fields of computer science (system or software design).
Furthermore, Yanggon Kim and Sun Kim of Towson University (1999), the authors of A Fast Multiple String-pattern Matching Algorithm, proposed a simple and efficient multiple string pattern matching algorithm based on a compact encoding scheme. Their algorithm scans the text from left to right while encoding characters in the document based on the alphabet that occurs in the input patterns. They conclude that their algorithm can handle a vast number of patterns simultaneously and runs faster than grep in many cases. Hashing techniques are used in other multiple-string matching algorithms to handle a large number of patterns.
Also, in Algorithms for String Searching, Ricardo A. Baeza-Yates surveys several algorithms for searching for a string in a piece of text. The survey includes several theoretical and empirical results, as well as the actual algorithms. The author concludes that the choice of string matching algorithm depends on the alphabet size and the pattern size. If the pattern is small (1 to 3 characters long), it is better to use the naive algorithm. If the alphabet size is large, then the Knuth-Morris-Pratt algorithm is a good choice. In all other cases, in particular for long texts, the Boyer-Moore algorithm is better. Finally, the Horspool version of the Boyer-Moore algorithm is the best algorithm, according to execution time, for almost all pattern lengths. The shift-or algorithm has a running time similar to that of the KMP algorithm; however, the main advantage of the shift-or algorithm is that it can search for more general patterns.
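For comparison with the naive algorithm, the Horspool variant mentioned in this survey can be sketched in Java as follows. This is a minimal illustration written for this review, not code from the cited paper; the class and method names (Horspool, findAll) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Horspool {
    /** Returns every index at which pat occurs in txt, using Horspool's shift rule. */
    static List<Integer> findAll(String txt, String pat) {
        List<Integer> hits = new ArrayList<>();
        int n = txt.length();
        int m = pat.length();
        if (m == 0 || m > n) {
            return hits;
        }
        // Shift table: for each character in pat[0..m-2], the distance from its
        // last occurrence to the end of the pattern. Other characters shift by m.
        Map<Character, Integer> shift = new HashMap<>();
        for (int j = 0; j < m - 1; j++) {
            shift.put(pat.charAt(j), m - 1 - j);
        }
        int s = 0;
        while (s <= n - m) {
            int j = m - 1;
            // Compare right to left at the current alignment.
            while (j >= 0 && txt.charAt(s + j) == pat.charAt(j)) {
                j--;
            }
            if (j < 0) {
                hits.add(s);
            }
            // Shift according to the text character under the pattern's last position.
            s += shift.getOrDefault(txt.charAt(s + m - 1), m);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abracadabra", "abra")); // prints [0, 7]
    }
}
```

Unlike the naive algorithm, this variant needs a preprocessing pass to build the shift table, which is the trade-off the survey points to for very short patterns.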
Furthermore, in Evaluation of String Matching Algorithms, Simon Wahlström (2013) presents an evaluation of five string searching algorithms: Brute Force, Boyer-Moore, Knuth-Morris-Pratt, Karp-Rabin, and Horspool. Each algorithm is accompanied by an explanation of how it works. The algorithms examined each have their own weaknesses and strengths. For a small pattern and alphabet, the brute-force (naïve) algorithm is an excellent choice since it is easy to implement.
Moreover, in TARA: An Algorithm for Fast Searching of Multiple Patterns on Text Files, M. Oguzhan Külekci (2007) introduced a new multi-pattern matching algorithm, named TARA, that searches for fixed-length strings in text files very quickly by benefiting from bit-parallelism. Bounded gaps, as well as character classes in keywords, are also supported. The experimental results on natural language text indicate that, for a small number of patterns, the unoptimized implementation of the algorithm is approximately 1.5 times faster than the grep software and 5 times faster than its nearest successor among the AC and CW variants. The author believes that, for practical use, TARA is a very convenient way of searching because of its simplicity and speed, which on modern computers are sufficient for everyday problems.
Moreover, according to Rahul B. Diwate and Prof. Satish J. Alaspurkar (2013), the authors of Study of Different Algorithms for Pattern Matching, every search engine uses different search algorithms for handling different types of data. A full search algorithm speeds up the pattern matching process. The paper analyzes and compares different algorithms for full-search-equivalent pattern matching in terms of complexity, efficiency, and the techniques they use. The authors conclude that each algorithm has its own characteristics. The Boyer-Moore and Knuth-Morris-Pratt algorithms are the more useful for searching. Focusing on the complexity of each algorithm, they note that the Knuth-Morris-Pratt algorithm has lower time complexity and the Boyer-Moore algorithm has lower preprocessing time complexity. The Fast DTW algorithm is best for image, audio, and video pattern processing; it has linear time and space complexity. The time performance of exact string pattern matching can be significantly improved if an efficient algorithm is used.
According to Akhtar Rasool, Dr. Nilay Khare, Himanshu Arora, Amit Varshney, and Gaurav Kumar of Maulana Azad National Institute of Technology (2012), the authors of Multithreaded Implementation of Hybrid String Matching Algorithm, the hybrid pattern matching algorithm combines the KMP and Boyer-Moore string searching algorithms into a new algorithm. It is also a pattern searching algorithm that searches for a pattern from left to right in the string. In an effort to reduce processing time in the worst and average cases, the goal is to combine the best- and average-case advantages of the Boyer-Moore algorithm with the worst-case guarantees of KMP. The comparison results show that the hybrid algorithm significantly improves matching efficiency. The main drawback of Boyer-Moore-type algorithms is the preprocessing time and space required, which depend on the alphabet size and the pattern size. For this reason, if the pattern is small (1 to 3 characters long), it is better to use the naive algorithm.
According to Kapil Kumar Soni, Rohit Vyas, and Amit Singhal, the authors of Importance of String Matching in Real-World Problems, string matching searches for pattern strings within a larger string or text. Let the pattern string be "p" and the text string be "S". The string matching problem is to find whether the pattern "p" occurs in "S" and, if it does, to report the positions in "S" where "p" occurs. There are two types of string matching: exact string matching and approximate string matching. String matching has greatly influenced the field of computer science and will play an essential role in various real-world problems. As time goes on, more and more efficient string matching algorithms will be used. Since 1950, many single- and multiple-pattern string matching algorithms have been proposed. There are many more possible areas in which string matching can play a crucial role.
CHAPTER 3
THEORETICAL BACKGROUND
Finding all occurrences of a pattern in a text is a problem that frequently arises in text-editing programs. Typically, the text is a document being edited, and the pattern searched for is a particular word supplied by the user. Efficient algorithms for this problem can greatly aid the responsiveness of the text-editing program. The idea of the naive solution is simply to compare, character by character, the text T[s .. s + m − 1] with the pattern P[0 .. m − 1] for all s ∈ {0, ..., n − m}. It returns all the valid shifts found.
The naive algorithm finds all valid shifts using a loop that checks the condition P[1 .. m] = T[s + 1 .. s + m] for each of the n − m + 1 possible values of s.
NAIVE-STRING-MATCHER (T, P)
n ← length[T]
m ← length[P]
for s ← 0 to n − m
    do if P[1 .. m] = T[s + 1 .. s + m]
        then print "Pattern occurs with shift" s
According to Zvi Galil and Joel Seiferas in Time-Space-Optimal String Matching (1987), earlier string-matching algorithms follow a single general scheme. That scheme considers prospective positions p for the pattern x in the text y in increasing order, and it maintains the length q ≥ 0 of a pattern prefix known to match the text starting after position p ([0, q]_x = [p, p + q]_y). For appropriately calculated p' > p and q', the algorithms repeatedly increment q in a while-loop for as long as y(p + q + 1) = x(q + 1) holds, and then replace (p, q) with (p', q').
Each time q reaches the pattern length |x|, a full instance of the pattern has been found following position p in the text (x = [p, p + |x|]_y); the search can be continued by dropping out of the while-loop. (We consider y(p + q + 1) = x(q + 1) to be false whenever p + q + 1 > |y| or q + 1 > |x|, so this will be automatic.) Of course, the algorithms should halt when the end of the text is reached (p = |y|).
The previous algorithms differ only in how they calculate p' and q'. The naive algorithm conservatively calculates p' = p + 1 and q' = 0. Since [0, q]_x = [p, p + q]_y, however, consideration of p' = p + shift is futile unless [0, q − shift]_x = [shift, q]_x; so the Knuth-Morris-Pratt algorithm calculates p' = p + shift_x(q), where
shift_x(q) = min{shift > 0 | [shift, q]_x = [0, q − shift]_x}.
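The general scheme described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not code from the cited paper: it uses zero-based indexing, stops as soon as the pattern no longer fits in the remaining text, and the names ShiftScheme, borders, and search are hypothetical. With the flag off it behaves as the naive algorithm (p' = p + 1, q' = 0); with the flag on it uses the Knuth-Morris-Pratt shift, computed as q minus the length of the longest border of the matched prefix.

```java
import java.util.ArrayList;
import java.util.List;

final class ShiftScheme {
    /** border[q] = length of the longest proper prefix of x[0..q) that is also its suffix. */
    static int[] borders(String x) {
        int m = x.length();
        int[] border = new int[m + 1]; // border[0] = border[1] = 0
        int k = 0;
        for (int q = 2; q <= m; q++) {
            while (k > 0 && x.charAt(k) != x.charAt(q - 1)) {
                k = border[k];
            }
            if (x.charAt(k) == x.charAt(q - 1)) {
                k++;
            }
            border[q] = k;
        }
        return border;
    }

    /** Returns all zero-based positions p at which x occurs in y. */
    static List<Integer> search(String y, String x, boolean useKmpShift) {
        List<Integer> hits = new ArrayList<>();
        int n = y.length();
        int m = x.length();
        if (m == 0) {
            return hits;
        }
        int[] border = borders(x);
        int p = 0; // current candidate position in the text
        int q = 0; // length of the pattern prefix known to match at p
        while (p + m <= n) {
            while (q < m && y.charAt(p + q) == x.charAt(q)) {
                q++;
            }
            if (q == m) {
                hits.add(p);
            }
            if (useKmpShift && q > 0) {
                int shift = q - border[q]; // smallest shift that can still match
                p += shift;
                q -= shift;                // the surviving prefix has length border[q]
            } else {
                p += 1;                    // naive choice: p' = p + 1, q' = 0
                q = 0;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("abababa", "aba", false)); // prints [0, 2, 4]
        System.out.println(search("abababa", "aba", true));  // prints [0, 2, 4]
    }
}
```

Both choices of (p', q') report the same occurrences; they differ only in how many text characters are re-examined after a mismatch.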
In TARA: An Algorithm for Fast Searching of Multiple Patterns on Text Files (2007), M. Oguzhan Külekci reports that the algorithm performs very fast in practice. Experiments were conducted to compare the performance of the proposed algorithm with the widely used GNU grep file search utility and with nine variants of the Aho-Corasick and Commentz-Walter algorithms on natural language text.
The TARA algorithm is set up as follows. Let P = {p0, p1, ..., pm−1} be the set of m patterns to be searched for in the text T[0 .. n − 1] of n characters, and let LP = {lp0, lp1, ..., lpm−1} be the corresponding lengths of the patterns in P. The alphabet is denoted by Σ. The maximum and minimum values of LP are stored in the variables maxlen and minlen. The algorithm is explained on an example where it is assumed that P = {bal, peynir, re[cç]el}, LP = {3, 6, 5}, maxlen = 6, and minlen = 3.
In Multithreaded Implementation of Hybrid String Matching Algorithm by Akhtar Rasool, Dr. Nilay Khare, Himanshu Arora, Amit Varshney, and Gaurav Kumar (2012), the algorithm came into existence by combining the KMP and Boyer-Moore string searching algorithms into a new algorithm. It is also a pattern searching algorithm that searches for a pattern from left to right in the text. In an effort to reduce processing time in the worst and average cases, the goal is to combine the best- and average-case advantages of the Boyer-Moore algorithm with the worst-case guarantees of KMP. According to the experiments the authors conducted, the new algorithm is among the fastest in practice for computing all occurrences of a pattern p = p[1..m] in a text string s = s[1..n] on an alphabet of size n, giving a time complexity of O(m + n). [6,7]
Given a string S and a pattern P of size m and n respectively. Step 1: Given a string S of size m, break that string into two parts (i.e., S1 and S2).
CHAPTER 4
METHODOLOGY
This chapter presents the design of the application, which was developed in the NetBeans IDE. It also discusses the functions used and the algorithm applied in the application. The overall flow of the simulator is presented here to give an accurate understanding.
The diagram below illustrates the primary process of the simulator; it also gives the reader an idea of the functions used in the application.
Fig. 1. Use Case Diagram
ACTIVITY DIAGRAM
The figure below presents the workflow of the application graphically.
Fig. 2. Activity Diagram
SEQUENCE DIAGRAM
The diagram below shows how the objects interact with each other and the order of those interactions. The process is represented vertically, and the interactions are shown as arrows.
Fig. 3. Sequence Diagram
NAÏVE STRING MATCHING ALGORITHM FLOWCHART
Figure 4. Naïve String Matching Algorithm Flowchart
Source Code:
public static void search(String txt, String pat) {
    int M = pat.length();
    int N = txt.length();
    // Try every shift i at which the pattern still fits inside the text.
    for (int i = 0; i <= N - M; i++) {
        int j;
        // Compare the pattern with the text character by character.
        for (j = 0; j < M; j++) {
            if (txt.charAt(i + j) != pat.charAt(j)) {
                break;
            }
        }
        if (j == M) { // pat[0..M-1] == txt[i..i+M-1]
            System.out.println("Pattern found at index " + i);
        }
    }
}

// Call site: the extracted document text is lower-cased before matching.
String txt = ex.getText();
String pat = TextFinder.textSearchText.getText();
search(txt.toLowerCase(), pat);
NAÏVE STRING MATCHING PSEUDOCODE
NAIVE-STRING-MATCHER (T, P)
1. n ← length[T]
2. m ← length[P]
3. for s ← 0 to n − m
4.     do if P[1 .. m] = T[s + 1 .. s + m]
5.         then print "Pattern occurs with shift" s
CHAPTER 5
RESULT AND DISCUSSION, CONCLUSION AND RECOMMENDATION
This section shows the results of the study. The discussion, conclusion, and recommendation are found at the end of this chapter.
RESULT AND DISCUSSION
The following table shows the results of the Naïve String Matching Algorithm in searching MS-Word document content.
Table 1. MS-word String Match
Sub-directories in searching MS-word document
MS-word Documents pattern    Number of Matched patterns    MS-word Filename from selected path
sunflower 6 Sample3.docx
shallow 16 Sample7.docx
tell 4 Sample7.docx
Sample3.docx
nevertheless 4 Sample3.docx
needless 4 Sample6.docx
Sample1.docx
Sample3.docx
bad 15
Sample6.docx
Sample7.docx
deep 2 Sample7.docx
flood 5 Sample9.docx
Figure 5. Text “nevertheless” result
Figure 6. Text “sunflower” results
Figure 7. Text “needless” result
Figure 8. Text “deep” result
Figure 9. Text “flood” result
Table 1 presents the results of using the naïve string matching algorithm to search MS-Word content. As the images above show, each searched string has many pattern indexes, which means that several words inside the MS-Word documents match the pattern.
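The way a count of matched patterns can be obtained from pattern indexes is sketched below. This is an illustration only: the class name MatchCounter and the sample sentence are assumptions, and the documents used in the tests above are not reproduced here. Each index at which the naive comparison succeeds adds one to the count, and both strings are lower-cased first so that the search is case-insensitive.

```java
import java.util.Locale;

final class MatchCounter {
    /** Counts the indexes at which pattern occurs in text, ignoring case. */
    static int count(String text, String pattern) {
        String t = text.toLowerCase(Locale.ROOT);
        String p = pattern.toLowerCase(Locale.ROOT);
        int m = p.length();
        if (m == 0) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i <= t.length() - m; i++) {
            int j = 0;
            while (j < m && t.charAt(i + j) == p.charAt(j)) {
                j++;
            }
            if (j == m) {
                count++; // one more pattern index found
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "Deep", "deep" and the prefix of "deepest" each contribute one index.
        System.out.println(count("Deep water is deep; the deepest part.", "deep")); // prints 3
    }
}
```

Note that a count obtained this way includes occurrences inside longer words, since the algorithm matches character sequences rather than whole words.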
Conclusion
After analyzing the text searches, the following findings were made:
The MS-Word content search is the method used by the researcher to search for text inside MS-Word documents using the Naïve String Matching algorithm. When the search runs, it matches text by detecting pattern indexes. These indexes determine how many occurrences of the pattern were matched within a particular MS-Word document.
Recommendation
Based on the results after running the test data in the application, the researcher recommends that future researchers add the following feature:
The application should be able to search across a local area network, that is, search MS-Word documents in a specific shared path or folder on another computer through the Local Area Network.
BIBLIOGRAPHY
Page Title: WHY PEOPLE USE SEARCH ENGINES: RESEARCH, SHOPPING, AND ENTERTAINMENT
Address: https://www.dummies.com/web-design-development/search-engine-optimization/why-people-use-search-engines-research-shopping-and-entertainment/
Page Title: Multithreaded Implementation of Hybrid String Matching Algorithm
Address: http://www.enggjournals.com/ijcse/doc/IJCSE12-04-03-032.pdf
Page Title: Evaluation of String Matching Algorithms
Address: https://pdfs.semanticscholar.org/8afc/6c601aa4ae2e0878c943735e75935e995b58.pdf
Page Title: Tara: An algorithm for fast searching of multiple patterns on text files
Address: https://www.researchgate.net/publication/4321078_Tara_An_algorithm_for_fast_searching_of_multiple_patterns_on_text_files
Page Title: We assume that the text is an array T 1n of length n
Address: https://www.coursehero.com/file/p5d1hjv/We-assume-that-the-text-is-an-array-T-1n-of-length-n-and-that-the-pattern-is-an/
Page Title: TR 87 February 1981 - urresearch.rochester.edu
Address: https://urresearch.rochester.edu/fileDownloadForInstitutionalItem.action?itemId=10186&itemFileId=22371
Page Title: Omegaexpression is the set of functions that grow
Address: https://www.coursehero.com/file/p2kq6vr/Omegaexpression-is-the-set-of-functions-that-grow-faster-than-or-at-the-same/
Page Title: A very fast string matching algorithm for small alphabets and long patterns
Address: https://www.researchgate.net/publication/225725083_A_very_fast_string_matching_algorithm_for_small_alphabets_and_long_patterns
Page Title: A Fast Multiple String-Pattern Matching Algorithm
Address: https://www.google.com/search?q=of+A+Fast+Multiple+String-pattern+Matching+Algorithm,+Yangon+Kim+and+Sun+Kim+of+Towson+University+(1999)&spell=1&sa=X&ved=0ahUKEwj66YyU5I_hAhUNcCsKHYi0DrcQBQgpKAA&biw=1517&bih=730
Page Title: Algorithms for string searching Ricardo A. Baeza-Yates
Address: https://www.semanticscholar.org/paper/Algorithms-for-String-Searching%3A-A-Survey-Baeza-Yates/bc2f8507f00a419aebe9d9ccb56a68919cc19b46
Page Title: Time-space-optimal string matching
Address: https://dl.acm.org/citation.cfm?id=802463
APPENDIX A
Source Code:
MS-word search
package thesis;
import de.schlichtherle.io.File;
import java.awt.event.KeyAdapter;
import java.awt.event.KeyEvent;
import java.awt.event.KeyListener;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.DefaultListModel;
import javax.swing.SwingUtilities;
import org.apache.commons.io.FilenameUtils;
import javax.swing.JFileChooser;
import javax.swing.JOptionPane;
import static thesis.DOCXParser.search;
public class TextFinder extends javax.swing.JFrame {

    public String query;
    public Connection con;
    public Statement state;

    public TextFinder() {
        initComponents();
        CurrentDate();
        // Convert typed upper-case letters to lower case so the search text is always lower case.
        textSearchText.addKeyListener(new KeyAdapter() {
            public void keyTyped(KeyEvent e) {
                char keyChar = e.getKeyChar();
                if (Character.isUpperCase(keyChar)) {
                    e.setKeyChar(Character.toLowerCase(keyChar));
                }
            }
        });
        // Open the connection to the search-log database.
        try {
            Class.forName("com.mysql.jdbc.Driver");
            con = DriverManager.getConnection("jdbc:mysql://localhost:3306/searchlog", "root", "");
            state = con.createStatement();
            // JOptionPane.showMessageDialog(this, "connected");
        } catch (Exception dummy) {
            JOptionPane.showMessageDialog(null, dummy);
        }
    }
    public void CurrentDate() {
        Calendar cal = new GregorianCalendar();
        int month = cal.get(Calendar.MONTH);
        int year = cal.get(Calendar.YEAR);
        int day = cal.get(Calendar.DAY_OF_MONTH);
        // Calendar.MONTH is zero-based, hence the + 1.
        date_text.setText(day + "/" + (month + 1) + "/" + year);
    }

    public void update() {
        try {
        } catch (Exception e) {
        }
    }
    public void check() {
        String word = textSearchText.getText();
        try {
            Statement stmt = con.createStatement();
            // NOTE: building SQL by concatenating user input is open to SQL injection;
            // a PreparedStatement would be safer.
            String selectquery = "SELECT * FROM logs where word = '" + word + "'";
            System.out.println(selectquery);
            ResultSet rs = stmt.executeQuery(selectquery);
            // Call rs.next() only once: every call advances the cursor by one row.
            boolean found = rs.next();
            System.out.println(found);
            if (found) {
                // infoMessage("already word added","arlet!!");
                // JOptionPane.showMessageDialog(this, "already added ");
            } else {
                int s = 1;
                int searched_count = 1;
                String insertquery = "INSERT INTO dupword (word) VALUES ('" + word + "')";
                // int x = stmt.executeUpdate(insertquery);
                // System.out.println(x);
                stmt.execute(insertquery);
                // String updatequery = ("INSERT INTO dupword (searched_count) VALUES('') where searched_count = " + searched_count + ");
                // infoMessage("word added","arlert!!");
                // JOptionPane.showMessageDialog(this, " added");
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
    // ADD data
    public void insert() {
        ResultSet rs = null;
        Date date = new Date();
        try {
            // query = "SELECT * FROM logs WHERE word ='sunflower'";
            // Log the searched word together with the current time (H:M:S) and date.
            query = "INSERT INTO logs (word,time,date) VALUES ('"
                    + textSearchText.getText() + "','"
                    + date.getHours() + ":" + date.getMinutes() + ":" + date.getSeconds() + "','"
                    + date_text.getText() + "')";
            // JOptionPane.showMessageDialog(null, "inserted", "data_saved", JOptionPane.INFORMATION_MESSAGE);
            // time_text.setText("");
            // date_text.setText("");
            state.executeUpdate(query);
        } catch (Exception dummy) {
            JOptionPane.showMessageDialog(null, dummy);
        }
    }
    private void buttonStopActionPerformed(java.awt.event.ActionEvent evt) {
        worker.interrupt();
        buttonStop.setEnabled(false);
    }
    private void buttonSearchActionPerformed(java.awt.event.ActionEvent evt) {
        // first try to make sure the value in textSearchPath is a valid directory
        try {
            matches = new HashMap(); // clear any old results
            listResults.setModel(new DefaultListModel()); // clear any old results
            // jLabel6.getText()).setTitle("search result (0)");
            // jLabel6.setText("Result (0)");
            File searchDir = new File(textSearchPath.getText());
            if (!searchDir.isDirectory()) {
                throw new Exception("The Search Path value does not appear to be a valid directory.");
            }
            if (textSearchText.getText().length() == 0) {
                throw new Exception("Please enter text to search for in the Containing field.");
            }
            // if (!checkPDF.isSelected() && !checkPlainText.isSelected() && !checkPowerPoint.isSelected() && !checkWord.isSelected())
            //     throw new Exception("Please select at least one file type to search for text.");
        } catch (Exception e) {
            // Finder.error(e);
            // Assumed fix: report the problem and stop instead of starting the search.
            JOptionPane.showMessageDialog(this, e.getMessage());
            return;
        }
        // valid search criteria, start the search
        worker = new Thread() {
            private String fileNamePattern = null;
            private boolean interrupted = false;

            public void run() {
                // Disable the input controls while the search is running.
                SwingUtilities.invokeLater(new Runnable() {
                    public void run() {
                        setEnableStates(true);
                    }
                });
                fileNamePattern = textFileName.getText();
                searchDirectory(textSearchPath.getText());
                // Re-enable the controls once the search has finished.
                SwingUtilities.invokeLater(new Runnable() {
                    public void run() {
                        setEnableStates(false);
                        labelSearching.setText("");
                    }
                });
                // check();
                insert();
            }
            public void searchDirectory(String directory) {
                File currentDir = new File(directory);
                File[] files = (de.schlichtherle.io.File[]) currentDir.listFiles();
                for (int i = 0; i < files.length && !interrupted; i++) {
                    // update the search location visual cue
                    final String fileName = files[i].getAbsolutePath();
                    SwingUtilities.invokeLater(new Runnable() {
                        public void run() {
                            labelSearching.setText(fileName);
                            labelSearching.setToolTipText(fileName);
                        }
                    });
                    if (files[i].isDirectory() && !files[i].isArchive() && checkRecursive.isSelected()) {
                        searchDirectory(files[i].getAbsolutePath());
                    } else if (files[i].isDirectory() && files[i].isEntry()) {
                        searchDirectory(files[i].getAbsolutePath());
                    } else if (!files[i].isDirectory()) { // a plain file: check it for matches
                        checkIfMatch(files[i]);
                    } else {
                        // file is a normal directory, and recursion is off, ignore
                    }
                    if (Thread.interrupted()) {
                        interrupted = true;
                    }
                }
            }
            private void checkIfMatch(final File file) {
                if (FilenameUtils.wildcardMatchOnSystem(file.getName(), fileNamePattern)) {
                    // filename match hit, now try to parse the file according to check-boxes
                    List matchingLines = null;
                    // parse all files as plain-text first regardless of extension (if enabled)
                    if (matchingLines == null) { // plain-text extraction failed
                        // parse file by extension type
                        // if (checkWord1.isSelected()) {
                        //     matchingLines = PlainTextParser.findMatches(file, textSearchText.getText());
                        // }
                        if (checkWord.isSelected() && file.getAbsolutePath().toLowerCase().endsWith(".doc")) {
                            matchingLines = MSWordParser.findMatches(file, textSearchText.getText());
                            // matchingLines = DOCXParser.findMatches(file, textSearchText.getText());
                        } else if (checkWord.isSelected() && file.getAbsolutePath().toLowerCase().endsWith(".docx")) {
                            matchingLines = DOCXParser.findMatches(file, textSearchText.getText());
                        }
                    }
                    if (matchingLines != null) {
                        synchronized (matches) { // could be performing a concurrent read operation
                            matches.put(file.getAbsolutePath(), matchingLines);
                        }
                        // Add the matching file to the result list on the Swing event thread.
                        SwingUtilities.invokeLater(new Runnable() {
                            public void run() {
                                DefaultListModel listModel = (DefaultListModel) listResults.getModel();
                                listModel.addElement(file.getAbsolutePath());
                                // jLabel6.getText()).setTitle("search result (" + listModel.size() + "):");
                                // jLabel6.setText("Result (" + listModel.size() + "):");
                            }
                        });
                    }
                }
            }
        };
        worker.start();
        // System.out.println("a=" + elapsedTime);
    }
    public void jButtonAction(java.awt.event.ActionEvent evt) {
        buttonSearchActionPerformed(evt);
    }

    private void buttonBrowseActionPerformed(java.awt.event.ActionEvent evt) {
        fileChooser.setDialogTitle("Select Directory Search Path");
        fileChooser.setFileSelectionMode(javax.swing.JFileChooser.DIRECTORIES_ONLY);
        if (fileChooser.showOpenDialog(this) == JFileChooser.APPROVE_OPTION) {
            textSearchPath.setText(fileChooser.getSelectedFile().getAbsolutePath());
        }
    }
    private void listResultsValueChanged(javax.swing.event.ListSelectionEvent evt) {
        // Show the matching lines of the document selected in the result list.
        if (listResults.getSelectedIndex() >= 0) {
            List lines = null;
            Iterator it = null;
            synchronized (matches) { // guard against concurrent access to the HashMap
                lines = (List) matches.get((String) listResults.getSelectedValue());
            }
            if (lines != null) {
                it = lines.iterator();
                StringBuffer text = new StringBuffer("");
                while (it.hasNext()) {
                    text.append((String) it.next());
                    text.append("\n");
                }
                textLines.setText(text.toString());
            } else {
                textLines.setText("");
            }
        } else {
            textLines.setText("");
        }
    }
private void checkWordActionPerformed(java.awt.event.ActionEvent evt) {
}

private void checkWord1ActionPerformed(java.awt.event.ActionEvent evt) {
}

// Opens the search log window.
private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {
    logsearch search = new logsearch();
    search.setVisible(true);
}

private void textSearchPathActionPerformed(java.awt.event.ActionEvent evt) {
}
// Disables the input controls while a search is running and enables only the Stop button.
private void setEnableStates(boolean searching) {
    textFileName.setEnabled(!searching);
    textSearchPath.setEnabled(!searching);
    textSearchText.setEnabled(!searching);
    checkRecursive.setEnabled(!searching);
    checkArchives.setEnabled(!searching);
    checkPDF.setEnabled(!searching);
    // checkPlainText.setEnabled(!searching);
    checkWord1.setEnabled(!searching);
    checkWord.setEnabled(!searching);
    buttonBrowse.setEnabled(!searching);
    buttonSearch.setEnabled(!searching);
    buttonStop.setEnabled(searching);
    // Finder.getInstance().setBusy(searching);
}
DOCX parse
package thesis;

import de.schlichtherle.io.FileInputStream;
import java.io.File;
import java.util.LinkedList;
import java.util.List;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DOCXParser {
    TextFinder f = new TextFinder();

    // Naive string matching: try the pattern at every position of the text.
    public static void search(String txt, String pat) {
        int M = pat.length();
        int N = txt.length();
        long startTime = System.nanoTime() / 1000000;
        //NAIVE STRING ALGORITHM START
        for (int i = 0; i <= N - M; i++) {
            int j;
            /* For current index i, check for pattern match */
            for (j = 0; j < M; j++) {
                if (txt.charAt(i + j) != pat.charAt(j)) {
                    break;
                }
            }
            if (j == M) { // pat[0...M-1] == txt[i, i+1, ...i+M-1]
                System.out.println("Word found at index " + i);
            }
        }
        long endTime = System.nanoTime() / 1000000;
        long duration = (endTime - startTime); // elapsed time in milliseconds
        System.out.println(duration);
    }
    // Returns a short context snippet for every occurrence of text in the file, or null if none.
    public static List findMatches(File file, String text) {
        // DOCXParser d = new DOCXParser();
        List matchingLines = new LinkedList();
        XWPFDocument doc = null;
        XWPFWordExtractor ex = null;
        String docText = null;
        String line = null;
        try {
            doc = new XWPFDocument(new FileInputStream(file));
            ex = new XWPFWordExtractor(doc);
            docText = ex.getText(); // + " " + header.getText() + " " + footer.getText(); // + " " + ex.getHeaderText() + " " + ex.getFooterText();
            //docText = docText.replaceAll("\\s", " ");
            //NAIVE STRING MATCHING ALGORITHM END
            // lower-case both sides so the comparison is case-insensitive
            String haystack = docText.toLowerCase();
            String needle = text.toLowerCase();
            int index = haystack.indexOf(needle);
            while (index >= 0) {
                int start = index >= 20 ? index - 20 : 0;
                int end = index + 20 < docText.length() ? index + 20 : docText.length();
                line = docText.substring(start, end);
                matchingLines.add(line);
                index = haystack.indexOf(needle, index + needle.length());
            }
        } catch (Exception e) {
            //fall through to return, could be because file is not UTF-8 readable, or some other IOException
        } finally {
            //no cleanup
        }
        return matchingLines.size() > 0 ? matchingLines : null;
    }

    public static void main(String[] args) {
    }
}
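The naive matching loop in the listing above can be tried on its own, without Apache POI or any Word document. The following is a minimal sketch and is not part of the thesis source: the class name `NaiveSearchDemo` and the method `findAll` are hypothetical, and the sketch returns the match positions as a list instead of printing them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the thesis source.
class NaiveSearchDemo {

    // Returns every index at which pat occurs in txt, using the naive scan:
    // for each start position i, compare the pattern character by character.
    static List<Integer> findAll(String txt, String pat) {
        List<Integer> hits = new ArrayList<>();
        int n = txt.length();
        int m = pat.length();
        if (m == 0) {
            return hits; // assumption: an empty pattern is treated as "no match"
        }
        for (int i = 0; i <= n - m; i++) {
            int j = 0;
            while (j < m && txt.charAt(i + j) == pat.charAt(j)) {
                j++;
            }
            if (j == m) { // all m characters matched at position i
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("AABAACAADAABAABA", "AABA")); // prints [0, 9, 12]
    }
}
```

In the worst case the inner comparison runs m times for each of the n - m + 1 start positions, which is where the Θ(nm) bound for the naive algorithm comes from.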
DOC parse
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package thesis;
/**
* @author Lenovo
*/
import de.schlichtherle.io.FileInputStream;
import java.io.File;
import java.util.LinkedList;
import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import static thesis.DOCXParser.search;

public class MSWordParser {
    TextFinder f = new TextFinder();

    // Naive string matching: try the pattern at every position of the text.
    public static void search(String txt, String pat) {
        int M = pat.length();
        int N = txt.length();
        //NAIVE STRING ALGORITHM START
        for (int i = 0; i <= N - M; i++) {
            int j;
            /* For current index i, check for pattern match */
            for (j = 0; j < M; j++) {
                if (txt.charAt(i + j) != pat.charAt(j)) {
                    break;
                }
            }
            if (j == M) { // pat[0...M-1] == txt[i, i+1, ...i+M-1]
                System.out.println("Word found at index " + i);
            }
        }
    }
    // Returns a short context snippet for every occurrence of text in the file, or null if none.
    public static List findMatches(File file, String text) {
        List matchingLines = new LinkedList();
        HWPFDocument doc = null;
        WordExtractor ex = null;
        String docText = null;
        String line = null;
        try {
            doc = new HWPFDocument(new FileInputStream(file));
            ex = new WordExtractor(doc);
            docText = ex.getText(); // + " " + ex.getHeaderText() + " " + ex.getFooterText();
            // docText = docText.replaceAll("\\s", " ");
            //NAIVE STRING MATCHING ALGORITHM END
            // lower-case both sides so the comparison is case-insensitive
            String haystack = docText.toLowerCase();
            String needle = text.toLowerCase();
            int index = haystack.indexOf(needle);
            while (index >= 0) {
                int start = index >= 20 ? index - 20 : 0;
                int end = index + 20 < docText.length() ? index + 20 : docText.length();
                line = docText.substring(start, end);
                matchingLines.add(line);
                index = haystack.indexOf(needle, index + needle.length());
            }
        } catch (Exception e) {
            //fall through to return, could be because file is not UTF-8 readable, or some other IOException
        } finally {
            //no cleanup
        }
        return matchingLines.size() > 0 ? matchingLines : null;
    }
}
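Both parsers build their result list the same way: each occurrence of the search text is turned into a short snippet that starts up to 20 characters before the match and ends 20 characters after the match position. The sketch below isolates that step on a plain string so it can be checked without a Word file. The class name `SnippetDemo`, the method `snippets`, and the constant `CONTEXT` are hypothetical names, and the sketch assumes that lower-casing does not change the length of the text (true for ASCII, not guaranteed for every Unicode character).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of the thesis source.
class SnippetDemo {

    // Same window size the parser listings use around each match position.
    static final int CONTEXT = 20;

    // Returns one context snippet per case-insensitive occurrence of query in docText.
    static List<String> snippets(String docText, String query) {
        List<String> out = new ArrayList<>();
        if (query.isEmpty()) {
            return out; // guard: an empty query would otherwise never advance the index
        }
        // Assumption: lower-casing keeps the string length, so indices line up with docText.
        String haystack = docText.toLowerCase(Locale.ROOT);
        String needle = query.toLowerCase(Locale.ROOT);
        int index = haystack.indexOf(needle);
        while (index >= 0) {
            int start = Math.max(0, index - CONTEXT);
            int end = Math.min(docText.length(), index + CONTEXT);
            out.add(docText.substring(start, end)); // snippet keeps the original casing
            index = haystack.indexOf(needle, index + needle.length());
        }
        return out;
    }
}
```

Calling `SnippetDemo.snippets("The quick brown fox", "QUICK")` yields a single snippet containing the whole sentence, because the text is shorter than the window on both sides of the match. As in the listings, the window is measured from the start of the match, so a query longer than 20 characters would be cut off in its own snippet.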
Database Connections
package databes;

import java.sql.Connection;
import java.sql.DriverManager;
import javax.swing.*;

public class DatabaseConnection {
    Connection con = null;

    // Opens a connection to the local "searchlog" MySQL database; returns null on failure.
    public static Connection ConnecrDb() {
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection con = DriverManager.getConnection("jdbc:mysql://localhost/searchlog", "root", "");
            // JOptionPane.showMessageDialog(null, "Connected");
            return con;
        } catch (Exception e) {
            JOptionPane.showMessageDialog(null, e);
            return null;
        }
    }
}
Search Log
package thesis;

import java.sql.*;
import javax.swing.*;
import net.proteanit.sql.DbUtils;
import databes.DatabaseConnection;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class logsearch extends javax.swing.JFrame {
    Connection con = null;
    ResultSet rs = null;
    PreparedStatement pst = null;

    public logsearch() {
        initComponents();
        con = DatabaseConnection.ConnecrDb();
        update_table();
        CurrentDate();
    }
    // Reloads the log table, newest entries first.
    private void update_table() {
        try {
            String sql = "select * from logs order by time desc";
            pst = con.prepareStatement(sql);
            rs = pst.executeQuery();
            logtable.setModel(DbUtils.resultSetToTableModel(rs));
        } catch (Exception e) {
            JOptionPane.showMessageDialog(null, e);
        }
    }

    public void CurrentDate() {
        Calendar cal = new GregorianCalendar();
        int month = cal.get(Calendar.MONTH); // zero-based, hence month + 1 below
        int year = cal.get(Calendar.YEAR);
        int day = cal.get(Calendar.DAY_OF_MONTH);
        SLdate.setText(day + "/" + (month + 1) + "/" + year);
    }

    // Deletes the log entries for the word currently shown in jLabel1.
    private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {
        String sql = "delete from logs where word=?";
        try {
            pst = con.prepareStatement(sql);
            pst.setString(1, jLabel1.getText());
            pst.execute();
            // JOptionPane.showMessageDialog(null, "Log History Deleted");
            update_table();
        } catch (Exception e) {
            JOptionPane.showMessageDialog(null, e);
        }
    }

    private void logtableMouseClicked(java.awt.event.MouseEvent evt) {
        try {
            int row = logtable.getSelectedRow();
            String Table_click = (logtable.getModel().getValueAt(row, 0).toString());
            // bind the clicked value as a parameter instead of concatenating it into the SQL
            String sql = "select * from logs where word=?";
            pst = con.prepareStatement(sql);
            pst.setString(1, Table_click);
            rs = pst.executeQuery();
            if (rs.next()) {
                String add = rs.getString("word");
                jLabel1.setText(add);
            }
        } catch (Exception e) {
            JOptionPane.showMessageDialog(null, e);
        }
    }
}
APPENDIX B
SCREENSHOTS
The following figures show the screen layouts of the designed program.
Main Screen
Browsing Path
Display of Pattern Matches in an MS-Word Document
Display of the Document Containing the Pattern Match
Search Log
APPENDIX C
SOFTWARE SPECIFICATION
SOFTWARE SPECIFICATION
-Windows 10 Operating System, 64-bit
-Java
-NetBeans 8.0.1
-XAMPP
HARDWARE SPECIFICATION
-1TB HDD
-4GB DDR4 RAM
-Intel(R) Core(TM) i3-6006U CPU @ 2.00GHz
-Intel(R) HD Graphics 520 (2112 MB)
RAMON MAGSAYSAY MEMORIAL COLLEGES

Office of the Program Director
INFORMATION TECHNOLOGY EDUCATION PROGRAM
General Santos City, Philippines
Document Type: Document No. : DAP–03-01- 29- B
THESIS / CAPSTONE PROJECT REPORTS Issue No.: SY20 Revision No.:
Document Title: Effective Date: June 01, 2017
Adviser-Advisee MOU Page 1 of 1
Adviser – Advisee Memorandum of Understanding
All students in thesis programs must complete this form contingent with
submission of a thesis topic for approval. The signatures of student and
adviser indicate that they intend to abide by the terms and provisions of
this agreement. A copy of the signed Memorandum should be submitted to
the Program Director.
Date MARCH 18, 2019

Student’s Name RUDJIE Q. CARILLO
Adviser’s Name HANZEL GRACE L. JARIOL
Degree MASTER IN INFORMATION TECHNOLOGY
Title of Project:
AN APPLICATION OF NAÏVE STRING MATCHING ALGORITHM IN SEARCHING MS-
WORD DOCUMENT CONTENT IN WINDOWS PLATFORM
Role and Responsibilities of the Adviser

All advisers are expected to have good knowledge of the research discipline.
The Thesis Adviser has the overall responsibility for guiding the student through
the process of the successful completion of a thesis that fulfills the expectations of
scholarly work at the appropriate level as well as meets the requirements of the
Department and the School. The following conditions have been read and agreed
upon by student and adviser:
• be able and willing to assume principal responsibility for advising the
student;
• have adequate time available for this work and be accessible to the student;
• provide adequate and timely feedback to both the student and the
Committee regarding student progress toward degree completion;
• guide and provide continuing feedback on the student's development of a
research project by providing input on the intellectual appropriateness of the
proposed activities, the reasonableness of project scope, acquisition of
necessary resources and expertise, necessary laboratory or computer
facilities, etc.;
• establish key academic milestones and communicate these to the student
and appropriately evaluate the student on meeting these milestones.
• Ensures that the study proposed by the student conforms to the standard of
the College and has immediate or potential impact on the research thrust of
the school.
• Guides the students in their Research / Capstone Project in the following
tasks while in the proposal stage:
o Defining the research problems/objectives in clear specific terms
o Building a working bibliography for the research
o Identifying variables and formulating hypothesis, if any
o Determining research design, population to be studied, research
environment, instruments to be used and the data collection
procedures
• Meets the advisee regularly (at least twice a month, NOTE: the researcher
must seek proper appointment) to answer questions and help resolve
impasses and conflicts.
• Points out errors in the development work, in the analysis, or in the
documentation. The adviser must remind the Proponents/Researchers to do
their work properly.
• Reviews thoroughly all deliverables at every stage of the Research /
Capstone Project, to ensure that they meet the department's standards. The
adviser may also require his/her Proponents/Researchers to submit progress
reports regularly.
• Recommends the Proponents/Researchers for Proposal Hearing and Oral
Defense. The adviser should not sign the Proposal Hearing Notice and the
Oral Defense Notice if he/she believes that the Proponents/Researchers are
not yet ready for Proposal Hearing and Oral Defense, respectively. Thus, if
the Proponents/Researchers fail to appear in the Proposal Hearing or Oral
Defense, it is partially the adviser's fault.
• Clarifies points during the Proposal Hearing and Oral Defense.
• Ensures that all required revisions are incorporated into the appropriate
documents and/or software.
• Keeps informed of the schedule of Research / Capstone Project activities,
required deliverables and deadlines.
• Recommends to the Proposal Hearing and Oral Defense panel the
nomination of his/her advisee’s Research / Capstone Project for an award.
Role and Responsibilities of the Student/Advisee

While it is expected that students receive guidance and support from their
adviser and all members of the Thesis Committee, the student is responsible for
actually defining and carrying out the program approved by the Thesis Committee
and completing the thesis/capstone project. As such, it is expected that the student
assumes a leadership role in defining and carrying out all aspects of his/her degree
program and thesis/capstone project. Within this context, students have the
following responsibilities:
• Keep informed of the Capstone Project Guidelines and Policies.
• Keep informed of the schedule of Research / Capstone Project activities,
required deliverables and deadlines posted by Adviser and Dean.
• Submit on time all deliverables specified in this document as well as those to
be specified by the Adviser and Dean.
• Submit on time all requirements identified by the Capstone Project Oral
Defense Panel during the Oral Defense.
• Submit on time the requirements identified by the adviser throughout the
duration of the Capstone Project.
• Schedule regular meetings (at least once a week) with the Adviser
throughout the duration of the Capstone Project. The meetings serve as a
venue for the proponent to report the progress of their work, as well as raise
any issues or concerns.
• Schedule regular meetings (at least once in a semester) with the Dean
throughout the duration of the Capstone Project.
• Pay the monetary obligation promptly, namely the adviser’s fee of
P1,000.00 per semester, effective school year 2014-2015.
• Failure to comply with the deliverables required by the adviser subjects
the advisee to exclusion from the research project; the capstone
he/she has initiated will then no longer qualify for oral defense.
In addition, the student and adviser should discuss/define:
• Ownership and use of data
• A plan for presentations and publications based on the thesis
• Authorship protocols for presentations and publications
CONFORME:
Signatures
RUDJIE Q. CARILLO MARCH 20, 2019

Student Date
HANZEL GRACE JARIOL, MIT MARCH 20, 2019

Adviser Date
Document Type: Document No. : DAP–03-01- 29- D
Project Working Title Page 1 of 1
PROJECT WORKING TITLE FORM

Proponent/Researcher:
RUDJIE Q. CARILLO
Proposed Project Title:
AN APPLICATION OF NAÏVE STRING MATCHING ALGORITHM IN SEARCHING MS-WORD DOCUMENT
CONTENT IN WINDOWS PLATFORM
Submitted by: Noted:
RUDJIE Q. CARILLO HANZEL GRACE JARIOL, MIT

(Signature of Researcher over printed name) (Signature adviser over printed name)
Date: ______________________ Date: ______________________
Recommending Approval: Approved:
JIM JAMERO, MIT ETHEL L. OCZON, MSIS

(Panelist Signature over printed name) (Signature the Dean over printed name)
Date: ______________________ Date: ______________________

Document Type: Document No. : DAP–03-01- 29- C
Research / Capstone Title Hearing Notice Page 1 of 1
RESEARCH / CAPSTONE TITLE HEARING NOTICE
Date filed: ____________________

Ref. Code: ____________________
Date: ____________________
Time: ____________________
Venue: ____________________
COLLEGE/ INSTITUTE/ DEPARTMENT: College of Information Technology Education
Research Title:
AN APPLICATION OF NAÏVE STRING MATCHING ALGORITHM IN SEARCHING MS-WORD DOCUMENT CONTENT

IN WINDOWS PLATFORM
Proponent:
RUDJIE Q. CARILLO___________________________________________________
CERTIFICATION
The undersigned members comprising the panel for oral examination hereby agree
to the schedule of hearing for the above research.
HANZEL GRACE L. JARIOL, MIT JIM JAMERO, MIT

RESEARCH ADVISER PANEL MEMBER 1
ERECKJUN E. CASTAÑO ETHEL L. OCZON, MSIS

PANEL MEMBER 2 PANEL CHAIR
CERTIFICATION
This is to certify that the undersigned has edited the manuscript of RUDJIE
Q. CARILLO entitled “AN APPLICATION OF NAÏVE STRING MATCHING
ALGORITHM IN SEARCHING MS-WORD DOCUMENT CONTENT IN WINDOWS
PLATFORM” as to its content and grammar.
Done this 21st day of March 2019.
This certification is issued upon the request of the student mentioned above
for whatever purpose it may serve.
VANGELINE O. ERUM, PhD

English Critic
CURRICULUM VITAE
PERSONAL INFORMATION
NAME: Rudjie Q. Carillo
City Address: Sarangani Homes P-1 Prk. Malakas Brgy San Isidro, General Santos
City
Contact Number: 09354061270
Email Address: redjexcarillo11@gmail.com
Gender: Male
Age: 21 years old
Birthday: October 10, 1997
Place of Birth: General Santos City
EDUCATIONAL BACKGROUND
COLLEGE: Ramon Magsaysay Memorial Colleges Pioneer Avenue, General
Santos City
Course: Bachelor of Science in Computer Science
SECONDARY: Lagao National High School Aparente St. General Santos City
ELEMENTARY: Dadiangas West Central Elementary School, General Santos City