INTRODUCTION
People use specific software for educational and personal purposes, including
bank transactions and e-commerce. Such software saves a great deal of money and
time and streamlines business operations.
So much information is held on a computer that, whenever a user asks for a
piece of it, the computer has to search through its files to find the data or
smartphone than ever before, the company says, with the most prevalent
categories revolving around health, parenting, and beauty. Other findings from
Google’s study are that 80% of people who search Google use a smartphone, 67%
use a desktop computer, and 16% use a tablet; 57% use more than one type of
device, 27% use a smartphone only, and 14% use a desktop computer only. In a
later IDC white paper, published in 2012, the authors report that a worldwide
study of 1,200 data specialists and IT experts found that they spend an average
of 4.5 hours a week trying to find documents on their computers. These are the
people who need to find things the most and who ought to be the best at finding
them. Instead, they spend half of those 4.5 hours looking for, and not finding,
the records they require. They then spend the other half recreating what they
could not find.
https://www.searchenginejournal.com/mobile-search-rise-almost-half-people-search-smartphones-study/175544/
and most people who use a search engine do so for research
purposes. People are mostly looking for answers, or at least for data with which
in this era, and searching is the most straightforward thing that people can do in the
basically, they store their finished work on their computer; some people
download large files from the internet and save them in the computer's storage,
access with a single click. These people are not aware of risks such as hackers
gaining access to desktop files. Sometimes, people who keep saving files to their
desktop become confused about the data they have saved; some sort the data, while
others use the search bar to find the information they need. The search
bar is significant because it lets users easily find what they are
seeking. for
fashion.
Naïve String Matching is a simple string search algorithm that performs well
when matching short patterns. This application can help people who
cannot fully organize their computer desktops, so that they no longer have
to spend a long time searching for a file. It can search for a word inside an
MS Word (.docx) document in a short time. To seek a string, a naïve string
algorithm can be used. Naïve string matching algorithm is the fastest in string
users, the purpose of this study is to apply the naïve string matching algorithm in
searching for files containing the text being sought and to add features such as how
many matching words were found in the files and when the text, word, or phrase
was last searched. These features are not present in the current search
Matching Algorithm in searching text inside the (.docx) document. During the
execution of the program, the user locates the folder to explore with the
browse button and then enters the desired word in the
search box provided. Once the search button is clicked, the program immediately searches
Furthermore, after the search, the user clicks one of the listed MS Word
documents, and the application extracts the matching words to its panel box,
showing the words within the documents in the selected path that the user put
OBJECTIVES:
document.
SCOPE
The study covers the concept of the Naïve string matching algorithm as applied
to searching text in an MS Word document. The study also includes the
The application can generate a search log containing the recently searched word
and the time and date of the search.
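The search log described above can be pictured with a minimal in-memory sketch. The class and method names (`SearchLog`, `record`, `lastSearched`) are hypothetical and are not taken from the application, which keeps its log in a database; the sketch only illustrates what one log row holds and how the time of the most recent search for a word could be looked up.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory search log; the thesis application stores its log in a database instead. */
class SearchLog {
    /** One log row: the searched word and the date and time of the search. */
    static class Entry {
        final String word;
        final LocalDateTime searchedAt;

        Entry(String word, LocalDateTime searchedAt) {
            this.word = word;
            this.searchedAt = searchedAt;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Adds a row at the front so the newest search is listed first. */
    void record(String word, LocalDateTime when) {
        entries.add(0, new Entry(word, when));
    }

    /** Read-only view of the log, newest first. */
    List<Entry> entries() {
        return Collections.unmodifiableList(entries);
    }

    /** Returns when the word was last searched, or null if it was never searched. */
    LocalDateTime lastSearched(String word) {
        for (Entry e : entries) {
            if (e.word.equals(word)) {
                return e.searchedAt;
            }
        }
        return null;
    }
}
```

Keeping the newest entry first means the most recent search for a word is the first match found when scanning the list.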
LIMITATION
The simulation can search for text with a maximum length of 100 characters
composed of letters of the alphabet in upper and lower case (a, b, c, A, B, C).
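The limitation above could be enforced with a small input check before a search starts. This is only a sketch: `SearchInputValidator` and `isValid` are hypothetical names, and it assumes that "composed of the alphabet" means the letters a-z and A-Z only, with no spaces or digits.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; the names are illustrative and not taken from the thesis code. */
class SearchInputValidator {
    // Maximum search-text length, taken from the stated limitation.
    static final int MAX_LENGTH = 100;

    // Assumption: only upper- and lower-case English letters are accepted.
    private static final Pattern LETTERS_ONLY = Pattern.compile("[A-Za-z]+");

    /** Returns true if the text is non-empty, at most MAX_LENGTH long, and letters only. */
    static boolean isValid(String text) {
        return text != null
                && text.length() <= MAX_LENGTH
                && LETTERS_ONLY.matcher(text).matches();
    }
}
```

For example, `SearchInputValidator.isValid("flood")` is accepted, while an empty string or a string containing a digit is rejected.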
CHAPTER II
Algorithm, Yanggon Kim and Sun Kim of Towson University (1999) they
based on a compact encoding scheme. The algorithm they used scans text
from left to right while encoding characters in the document based on the
alphabet that occurs in the input patterns. They conclude that their
simultaneously and runs faster than five grep and are in many cases. The
Also according to the author of Algorithms for string searching Ricardo A.
piece of text. The authors include several theoretical and empirical results, as
well as the actual algorithm. The authors conclude that string matching
algorithms depend on the alphabet size and pattern size. If the pattern is small
all the other cases, in particular, for long texts, Boyer-Moore's algorithm is
better. Finally, the Horspool version of the Boyer-Moore algorithm is the best
algorithm, according to the execution time, for almost all pattern lengths. The
shift-or algorithm has a running time similar to the KMP algorithm. However, the
main advantage of the shift-or algorithm is that it can search for more general
patterns.
Karp-Rabin, and the Horspool algorithm. They discussed that the string search
how the algorithms work. The algorithms that have been researched and
is easy to implement.
Moreover, according to TARA: An Algorithm for Fast Searching of
fixed-length strings on text files very fast by benefiting from bit-parallelism. The
algorithm is given the name TARA. Bounded gaps, as well as character classes
in text files, the experimental results on language text indicate that for small
approximately 1.5 times faster than the grep software and 5 times faster than its nearest
successor among the AC and CW variants. The TARA algorithm that the author used
searching with the simplicity of speed of the algorithm as the modern computers
search engine uses different search algorithms for handling different types of
data. Full search algorithm increases the pattern matching process. In the
algorithms for full search equivalent pattern matching like complexity, efficiency,
and techniques. The author concludes that each algorithm has its
Morris–Pratt algorithm having less time complexity and the Boyer-Moore algorithm
having less preprocessing time complexity. The Fast DTW algorithm is best for all
image, audio, and video pattern processing. Fast DTW has linear time and
space complexity. The time performance of exact string pattern matching can
Matching Algorithm by Akhtar Rasool, Dr. Nilay Khare, Himanshu Arora, Amit
Hybrid pattern matching algorithm is made after combining KMP & Boyer-Moore
algorithm that searches a pattern from left to right in the string. In an effort to
reduce the processing time required in the worst and average cases, the goal is
to combine the best/average-case advantages of the algorithm with the worst-case
guarantees of KMP. The results of the comparison show that the hybrid algorithm
significantly improves the matching efficiency. The main drawback of the
Boyer-Moore type algorithms is the pre-processing time and the space required, which
depend on the alphabet size and the pattern size. For this reason, if the pattern is
Problems by Kapil Kumar Soni, Rohit Vyas, Amit Singhal. In string matching, pattern
strings are searched for within a larger string or text. Let us assume a pattern string
“p” and a text string “S”. The problem of string matching deals with finding whether a
pattern “p” occurs in “S” or not. If “p” occurs, then its position in “S” should
be reported. There are two types of string matching: exact
string matching and approximate string matching. String matching has dramatically
influenced the field of computer science and will play an essential role in various
real-world problems. As time goes on, more and more efficient string matching
algorithms will be used. Since 1950, many single- and multiple-pattern string
matching algorithms have been suggested. There are many more possible areas in
Chapter 3
THEORETICAL BACKGROUND
and the pattern searched for is a particular word supplied by the user. Efficient
algorithms for this problem can greatly aid the responsiveness of the text-
editing program. The idea of the naive solution is just to make a comparison
and the pattern P[0...m − 1]. It returns all the valid shifts found.
The naive algorithm finds all valid shifts using a loop that checks the
of s.
NAIVE-STRING-MATCHER (T, P)
n ← length[T]
m ← length[P]
for s ← 0 to n – m
do if P[1 . . m] = T[s + 1 . . s + m]
then print “Pattern occurs with shift” s
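The pseudocode above can be sketched in Java as follows. The class and method names (`NaiveStringMatcher`, `findShifts`) are illustrative assumptions, and Java strings are 0-indexed, so the comparison P[1..m] = T[s+1..s+m] becomes a comparison of `pattern.charAt(j)` with `text.charAt(s + j)` for j from 0 to m − 1. Instead of printing, the sketch collects every valid shift in a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the naive matcher; names are not taken from the thesis code. */
class NaiveStringMatcher {
    /** Returns every 0-based shift s at which pattern occurs in text. */
    static List<Integer> findShifts(String text, String pattern) {
        List<Integer> shifts = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        // Try each candidate shift s from 0 to n - m.
        for (int s = 0; s <= n - m; s++) {
            int j = 0;
            // Advance while the pattern agrees with the text window at s.
            while (j < m && text.charAt(s + j) == pattern.charAt(j)) {
                j++;
            }
            // All m characters matched, so s is a valid shift.
            if (j == m) {
                shifts.add(s);
            }
        }
        return shifts;
    }
}
```

In the worst case every shift compares all m pattern characters, so the running time is O((n − m + 1) · m). Note that an empty pattern matches at every shift in this sketch.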
single general scheme. That scheme considers prospective positions p for the
pattern in the text in increasing order, and it maintains the length $q \ge 0$ of a
pattern prefix known to match the text following position p
($[0, q]_x = [p, p + q]_y$). For appropriately calculated $p' > p$ and $q'$, the
algorithms search as follows:
Each time q reaches the pattern length $|x|$, a full instance of the pattern has
been found following position p in the text ($x = [p, p + |x|]_y$); the search can be
be false whenever $p + q + 1 > |y|$ or $q + 1 > |x|$, so this will be automatic.) Of
course, the algorithms should halt when the end of the text is reached ($p = |y|$).
The previous algorithms differ only in how they calculate $p'$ and $q'$. The naive
algorithm conservatively calculates $p' = p + 1$ and $q' = 0$. Since $[0, q]_x = [p, p + q]_y$,
Searching Multiple Patterns on Text files (2007) the algorithm performs very fast in
algorithm with widely used GNU grep file search utility and also with nine variants of
The TARA algorithm executes as follows: let P = {p0, p1, . . . , pm−1} be the set of m
assumed that P = {bal, peynir, re[cç]el}, LP = {3, 6, 5}, maxlen = 6, and minlen = 3.
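The values in this example can be reproduced with a short sketch. It assumes that LP holds the length of each pattern and that a bracketed character class such as [cç] counts as a single position, which is consistent with re[cç]el having length 5. `PatternLengths` is a hypothetical helper and not part of the TARA algorithm itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper reproducing LP, maxlen, and minlen from the example pattern set. */
class PatternLengths {
    /** Length of one pattern; a bracketed class like [ab] is assumed to count as one position. */
    static int length(String pattern) {
        int len = 0;
        for (int i = 0; i < pattern.length(); i++) {
            if (pattern.charAt(i) == '[') {
                int close = pattern.indexOf(']', i);
                if (close < 0) {
                    throw new IllegalArgumentException("Unclosed character class: " + pattern);
                }
                i = close; // skip to the end of the class
            }
            len++;
        }
        return len;
    }

    /** Computes the list LP of pattern lengths, in the order the patterns are given. */
    static List<Integer> lengths(List<String> patterns) {
        List<Integer> lp = new ArrayList<>();
        for (String p : patterns) {
            lp.add(length(p));
        }
        return lp;
    }
}
```

With this helper, maxlen and minlen are simply the largest and smallest values in the returned list, for example via `Collections.max` and `Collections.min`.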
In the study Multithreaded Implementation of Hybrid String Matching
Algorithm by Akhtar Rasool, Dr. Nilay Khare, Himanshu Arora, Amit Varshney, and
Gaurav Kumar (2012), the algorithm came into existence by combining KMP &
pattern searching algorithm that searches a pattern from left to right in the series.
processing time, the goal is to combine the best/average case advantages of the
algorithm with the worst case guarantees of KMP. According to the experiments we
have conducted, the new algorithm is among the fastest in practice for the
provided a String S of size m, break that string into two parts (i.e., S1 and S2).
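The idea of breaking the string into two parts and searching them concurrently can be sketched as below. This is not the authors' hybrid algorithm: a naive scan stands in for the KMP/Boyer-Moore hybrid, and the names `TwoPartSearch`, `scan`, and `search` are hypothetical. The sketch divides the candidate start positions between the two parts, so an occurrence that straddles the boundary between S1 and S2 is still found by the first task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Illustrative two-part concurrent search; a naive scan stands in for the hybrid matcher. */
class TwoPartSearch {
    /** Finds matches whose start index lies in [from, to); a match may extend past 'to'. */
    static List<Integer> scan(String text, String pat, int from, int to) {
        List<Integer> hits = new ArrayList<>();
        int m = pat.length();
        for (int s = from; s < to && s + m <= text.length(); s++) {
            if (text.regionMatches(s, pat, 0, m)) {
                hits.add(s);
            }
        }
        return hits;
    }

    /** Splits the start positions into two halves (S1 and S2) and scans them in parallel. */
    static List<Integer> search(String text, String pat) {
        int mid = text.length() / 2;
        CompletableFuture<List<Integer>> s1 =
                CompletableFuture.supplyAsync(() -> scan(text, pat, 0, mid));
        CompletableFuture<List<Integer>> s2 =
                CompletableFuture.supplyAsync(() -> scan(text, pat, mid, text.length()));
        // Join in order so the combined result stays sorted by position.
        List<Integer> all = new ArrayList<>(s1.join());
        all.addAll(s2.join());
        return all;
    }
}
```

Splitting by start position rather than cutting the text into two independent substrings avoids losing matches at the boundary, at the cost of the first task reading a few characters into the second part.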
Chapter 4
METHODOLOGY
This chapter presents the design of the application, which was developed using
the NetBeans IDE. It also includes a discussion of the functions used and the
algorithm applied in the application. At this point, the overall progress of the
Below is the diagram that elaborates on the primary process of the
simulator; it also gives the reader an idea of the functions used in the
application.
Fig. 1. Use Case Diagram
ACTIVITY DIAGRAM
graphical way.
Fig. 2. Activity Diagram
SEQUENCE DIAGRAM
The diagram below shows how the objects interact with each other and
the order of those interactions. The process is represented vertically, and the
Fig. 3. Sequence Diagram
Figure 4. Naïve String Matching Algorithm Flowchart
Source Code:
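int M = pat.length();
int N = txt.length();
// Slide the pattern over the text; i is the candidate starting index.
for (int i = 0; i <= N - M; i++) {
    int j;
    for (j = 0; j < M; j++) {
        if (txt.charAt(i + j) != pat.charAt(j)) break;
    }
    // j == M here means the pattern occurs in txt at index i.
}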
NAIVE-STRING-MATCHER (T, P)
1. n ← length[T]
2. m ← length[P]
3. for s ← 0 to n – m
4. do if P[1 . . m] = T[s + 1 . . s + m]
5. then print “Pattern occurs with shift” s
CHAPTER 5
This section shows the result of the study. Discussion, determination, and
The tables below show the elapsed time of the Naïve String
sunflower 6 Sample3.docx
shallow 16 Sample7.docx
tell 4 Sample7.docx
Sample3.docx
nevertheless 4 Sample3.docx
needless 4 Sample6.docx
Sample1.docx
Sample3.docx
bad 15
Sample6.docx.
Sample7.docx
deep 2 Sample7.docx
flood 5 Sample9.docx
Figure 5. Text “nevertheless” result
Figure 7. Text “needless” result
Figure 9. Text “flood” result
Table 1 presents the elapsed time in nanoseconds of the process using
shown above, the searched string has many pattern indexes, which means that
several words inside the MS Word documents match the pattern.
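A measurement of this kind could be taken as in the sketch below, which counts the pattern indexes found by a naive scan and times the scan with `System.nanoTime()`. The class name `TimedSearch`, the method `countMatches`, and the sample text are assumptions for illustration; the sketch does not reproduce the study's data or its document-parsing step.

```java
/** Illustrative timing of a naive scan; names and sample text are not from the thesis. */
class TimedSearch {
    /** Counts the pattern indexes (match positions) of pat in txt using a naive scan. */
    static int countMatches(String txt, String pat) {
        int count = 0;
        int n = txt.length();
        int m = pat.length();
        for (int i = 0; i <= n - m; i++) {
            int j = 0;
            while (j < m && txt.charAt(i + j) == pat.charAt(j)) {
                j++;
            }
            if (j == m) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a document.
        String txt = "a flood came, and the flood went";
        long start = System.nanoTime();
        int matches = countMatches(txt, "flood");
        long elapsed = System.nanoTime() - start;
        System.out.println(matches + " pattern indexes found in " + elapsed + " ns");
    }
}
```

Elapsed times measured this way vary from run to run, so a single measurement should be read as indicative rather than exact.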
Conclusion
After the analysis of the text that was searched, the following findings
were made:
The MS Word content search is the method used by the researcher to search
for text inside an MS Word document using the Naïve String Matching algorithm.
When the MS Word content search runs, the algorithm matches
text by detecting pattern indexes. These indexes determine how many patterns have
After the analysis of the directories that have been searched, the following
Based on the results after running the test data in the application, the
The application could be extended to search across a local area network, so that
MS Word documents in a specific shared path or folder can be searched from one
computer to another through the Local Area Network.
BIBLIOGRAPHY
Page Title: WHY PEOPLE USE SEARCH ENGINES: RESEARCH, SHOPPING, AND ENTERTAINMENT
Address: https://www.dummies.com/web-design-development/search-engine-optimization/why-people-use-search-engines-research-shopping-and-entertainment/
Address: http://www.enggjournals.com/ijcse/doc/IJCSE12-04-03-032.pdf
Page Title: Tara: An algorithm for fast searching of multiple patterns on text files
Address: https://www.researchgate.net/publication/4321078_Tara_An_algorithm_for_fast_searching_of_multiple_patterns_on_text_files
Address: https://www.coursehero.com/file/p5d1hjv/We-assume-that-the-text-is-an-array-T-1n-of-length-n-and-that-the-pattern-is-an/
Address: https://urresearch.rochester.edu/fileDownloadForInstitutionalItem.action?itemId=10186&itemFileId=22371
Page Title: Multithreaded Implementation of Hybrid String Matching ...
Address: http://www.enggjournals.com/ijcse/doc/IJCSE12-04-03-032.pdf
Address: https://www.coursehero.com/file/p2kq6vr/Omegaexpression-is-the-set-of-functions-that-grow-faster-than-or-at-the-same/
Page Title: A very fast string matching algorithm for small alphabets and long patterns
Address: https://www.researchgate.net/publication/225725083_A_very_fast_string_matching_algorithm_for_small_alphabets_and_long_patterns
Address: https://www.google.com/search?q=of+A+Fast+Multiple+String-pattern+Matching+Algorithm,+Yangon+Kim+and+Sun+Kim+of+Towson+University+(1999)&spell=1&sa=X&ved=0ahUKEwj66YyU5I_hAhUNcCsKHYi0DrcQBQgpKAA&biw=1517&bih=730
Address: https://www.semanticscholar.org/paper/Algorithms-for-String-Searching%3A-A-Survey-Baeza-Yates/bc2f8507f00a419aebe9d9ccb56a68919cc19b46
Address: https://pdfs.semanticscholar.org/8afc/6c601aa4ae2e0878c943735e75935e995b58.pdf
Page Title: Time-space-optimal string matching
Address: https://dl.acm.org/citation.cfm?id=802463
Address: www.enggjournals.com/ijcse/doc/IJCSE12-04-03-032.pdf
APPENDIX A
Source Code:
MS-word search
package thesis;
import de.schlichtherle.io.File;
import java.awt.event.KeyAdapter;
import java.awt.event.KeyEvent;
import java.awt.event.KeyListener;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.DefaultListModel;
import javax.swing.SwingUtilities;
import org.apache.commons.io.FilenameUtils;
import javax.swing.JFileChooser;
import javax.swing.JOptionPane;
public TextFinder() {
initComponents();
CurrentDate();
textSearchText.addKeyListener(new KeyAdapter() {
if (Character.isUpperCase(keyChar)) {
e.setKeyChar(Character.toLowerCase(keyChar));
});
try {
Class.forName("com.mysql.jdbc.Driver");
con =
DriverManager.getConnection("jdbc:mysql://localhost:3306/searchlog", "root", "");
state = con.createStatement();
// JOptionPane.showMessageDialog(this, "connected");
JOptionPane.showMessageDialog(null, dummy);
try{
}catch(Exception e){
}
try
System.out.println(selectquery);
// System.out.println(rs.next()); // disabled: calling next() here advances the cursor before the check below
if (rs.next())
else
int s = 1;
int searched_count = 1;
// int x = stmt.executeUpdate(insertquery);
//// System.out.println(x);
stmt.execute(insertquery);
// infoMessage("word added","arlert!!");
} catch (Exception e) {
System.out.println(e);
//ADD data
ResultSet rs = null;
Date date = new Date();
try {
// JOptionPane.showMessageDialog(null,
"inserted","data_saved",JOptionPane.INFORMATION_MESSAGE);
//time_text.setText("");
//date_text.setText("");
state.executeUpdate(query);
JOptionPane.showMessageDialog(null, dummy);
worker.interrupt();
buttonStop.setEnabled(false);
try {
// jLabel6.setText("Result (0)");
if (!searchDir.isDirectory()) {
if (textSearchText.getText().length() == 0) {
// if(!checkPDF.isSelected() && !checkPlainText.isSelected() && !
checkPowerPoint.isSelected() && !checkWord.isSelected()) throw new
Exception("Please select at least one file type to search for text.");
} catch (Exception e) {
// Finder.error(e);
// return;
SwingUtilities.invokeLater(new Runnable() {
setEnableStates(true);
});
fileNamePattern = textFileName.getText();
searchDirectory(textSearchPath.getText());
SwingUtilities.invokeLater(new Runnable() {
setEnableStates(false);
labelSearching.setText("");
});
// check();
insert();
SwingUtilities.invokeLater(new Runnable() {
labelSearching.setText(fileName);
labelSearching.setToolTipText(fileName);
});
searchDirectory(files[i].getAbsolutePath());
searchDirectory(files[i].getAbsolutePath());
checkIfMatch(files[i]);
} else {
if (Thread.interrupted()) {
interrupted = true;
if (FilenameUtils.wildcardMatchOnSystem(file.getName(),
fileNamePattern)) {
//filename match hit, now try to parse the file according to check-boxes
if (matchingLines == null) { //plain-text extraction failed
// if(checkWord1.isSelected()) {
// matchingLines = PlainTextParser.findMatches(file,
textSearchText.getText());
// }
if (checkWord.isSelected() &&
file.getAbsolutePath().toLowerCase().endsWith(".doc")) {
matchingLines = MSWordParser.findMatches(file,
textSearchText.getText());
// matchingLines = DOCXParser.findMatches(file,
textSearchText.getText());
matchingLines = DOCXParser.findMatches(file,
textSearchText.getText());
if (matchingLines != null) {
matches.put(file.getAbsolutePath(), matchingLines);
SwingUtilities.invokeLater(new Runnable() {
DefaultListModel listModel = (DefaultListModel)
listResults.getModel();
listModel.addElement(file.getAbsolutePath());
} });
} }
};
worker.start();
//System.out.println("a="+elapsedTime);
buttonSearchActionPerformed(evt);
fileChooser.setFileSelectionMode(javax.swing.JFileChooser.DIRECTORIES_ONLY
);
if (fileChooser.showOpenDialog(this) == fileChooser.APPROVE_OPTION) {
textSearchPath.setText(fileChooser.getSelectedFile().getAbsolutePath());
if (listResults.getSelectedIndex() >= 0) {
Iterator it = null;
if (lines != null)
it = lines.iterator();
while (it.hasNext()) {
text.append((String) it.next());
text.append("\n");
textLines.setText(text.toString());
} else {
textLines.setText("");
} else {
textLines.setText("");
search.setVisible(true);
textFileName.setEnabled(!searching);
textSearchPath.setEnabled(!searching);
textSearchText.setEnabled(!searching);
checkRecursive.setEnabled(!searching);
checkArchives.setEnabled(!searching);
checkPDF.setEnabled(!searching);
// checkPlainText.setEnabled(!searching);
checkWord1.setEnabled(!searching);
checkWord.setEnabled(!searching);
// checkWord1.setEnabled(!searching);
buttonBrowse.setEnabled(!searching);
buttonSearch.setEnabled(!searching);
buttonStop.setEnabled(searching);
// Finder.getInstance().setBusy(searching);
DOCX parse
package thesis;
import de.schlichtherle.io.File;
import de.schlichtherle.io.FileInputStream;
import java.util.LinkedList;
import java.util.List;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
int M = pat.length();
int N = txt.length();
int j;
match */
if (txt.charAt(i + j) != pat.charAt(j)) {
break;
System.out.println(duration);
XWPFWordExtractor ex = null;
try {
ex = new XWPFWordExtractor(doc);
search(txt.toString().toLowerCase(), pat.toString());
matchingLines.add(line);
} catch (Exception e) {
//fall through to return, could be because file is not UTF-8 readable, or some
other IOException
} finally {
//no cleanup
DOC parse
/*
* To change this license header, choose License Headers in Project Properties.
*/
package thesis;
/**
* @author Lenovo
*/
import de.schlichtherle.io.File;
import de.schlichtherle.io.FileInputStream;
import java.util.LinkedList;
import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
int M = pat.length();
int N = txt.length();
int j;
match */
if (txt.charAt(i + j) != pat.charAt(j)) {
break;
List matchingLines = new LinkedList();
WordExtractor ex = null;
try {
ex = new WordExtractor(doc);
search(txt.toString().toLowerCase(), pat.toString());
matchingLines.add(line);
index = docText.toLowerCase().indexOf(text, index + text.length());
} catch (Exception e) {
//fall through to return, could be because file is not UTF-8 readable, or some
other IOException
} finally {
//no cleanup
Database Connections
package databes;
import java.sql.Connection;
import javax.swing.*;
import java.sql.DriverManager;
try{
Class.forName("com.mysql.jdbc.Driver");
Connection con =
DriverManager.getConnection("jdbc:mysql://localhost/searchlog","root","");
// JOptionPane.showMessageDialog(null, "Connected");
return con;
} catch(Exception e){
JOptionPane.showMessageDialog(null, e);
return null;
Search Log
package thesis;
import java.sql.*;
import javax.swing.*;
import net.proteanit.sql.DbUtils;
import databes.DatabaseConnection;
import java.util.Calendar;
import java.util.GregorianCalendar;
ResultSet rs = null;
public logsearch() {
initComponents();
con = DatabaseConnection.ConnecrDb();
update_table();
CurrentDate();
try {
String sql = "select * from logs order by time desc";
pst = con.prepareStatement(sql);
rs = pst.executeQuery();
logtable.setModel(DbUtils.resultSetToTableModel(rs));
} catch (Exception e) {
JOptionPane.showMessageDialog(null, e);
try {
pst = con.prepareStatement(sql);
pst.setString(1, jLabel1.getText());
pst.execute();
// JOptionPane.showMessageDialog(null, "Log History Deleted");
} catch (Exception e) {
JOptionPane.showMessageDialog(null, e);
update_table();
try {
pst = con.prepareStatement(sql);
rs = pst.executeQuery();
if (rs.next()) {
jLabel1.setText(add);
} } catch (Exception e) {
JOptionPane.showMessageDialog(null, e);
APPENDIX B
SCREENSHOTS
The following figures show the screen layouts of the designed program.
Main Screen
Browsing Path
Search Log
APPENDIX C
SOFTWARE SPECIFICATION
SOFTWARE SPECIFICATION
-Java
-Netbeans 8.0.1
-XAMPP
HARDWARE SPECIFICATION
-1TB HDD
-4GB DDR4 RAM
All students in thesis programs must complete this form contingent with
submission of a thesis topic for approval. The signatures of student and
adviser indicate that they intend to abide by the terms and provisions of
this agreement. A copy of the signed Memorandum should be submitted to
the Program Director.
Title of Project:
AN APPLICATION OF NAÏVE STRING MATCHING ALGORITHM IN SEARCHING MS-
WORD DOCUMENT CONTENT IN WINDOWS PLATFORM
adviser may also require his/her Proponents/Researchers to submit progress
reports regularly.
Recommends the Proponents/Researchers for Proposal Hearing and Oral
Defense. The adviser should not sign the Proposal Hearing Notice and the
Oral Defense Notice if he/she believes that the Proponents/Researchers are
not yet ready for Proposal Hearing and Oral Defense, respectively. Thus, if
the Proponents/Researchers fail to appear in the Proposal Hearing or Oral
Defense, it is partially the adviser's fault.
Clarifies points during the Proposal Hearing and Oral Defense.
Ensures that all required revisions are incorporated into the appropriate
documents and/or software.
Keeps informed of the schedule of Research / Capstone Project activities,
required deliverables and deadlines.
Recommends to the Proposal Hearing and Oral Defense panel the
nomination of his/her advisee’s Research / Capstone Project for an award.
In addition, the student and adviser should discuss/define:
Ownership and use of data
A plan for presentations and publications based on the thesis
Authorship protocols for presentations and publications
CONFORME:
Signatures
RUDJIE Q. CARILLO
Proposed Project Title:
AN APPLICATION OF NAÏVE STRING MATCHING ALGORITHM IN SEARCHING MS-WORD DOCUMENT
CONTENT IN WINDOWS PLATFORM
COLLEGE/ INSTITUTE/ DEPARTMENT: College of Information Technology Education
Research Title:
RUDJIE Q. CARILLO___________________________________________________
CERTIFICATION
The undersigned members comprising the panel for oral examination hereby agree
to the schedule of hearing for the above research.
CERTIFICATION
This is to certify that the undersigned had edited the manuscript of “RUDJIE
This certification is issued upon the request of the student mentioned earlier
CURRICULUM VITAE
PERSONAL INFORMATION
City Address: Sarangani Homes P-1 Prk. Malakas Brgy San Isidro, General Santos
City
Gender: Male
EDUCATIONAL BACKGROUND
Santos City
SECONDARY: Lagao National High School Aparente St. General Santos City