
IDEA PLAGIARISM DETECTOR

Submitted in partial fulfillment of the requirements


of the degree of

B. E. Computer Engineering
By

ANURAG UPADHYAY 38
JATAN RATHOD 63
MANAN SATRA 67

Under the Guidance of
Mr. JERIN THANKAPPAN
Assistant Professor
Dept. of Computer Engineering

Department of Computer Engineering


St. Francis Institute of Technology

(Engineering College)
University of Mumbai
2016-2017
Abstract

Plagiarism is the act of taking someone else's information or ideas and claiming them as
your own. Essentially, it reproduces existing information in a modified form. It has become a
serious issue in every field of education, and various techniques and tools have been developed
to detect it. Several approaches to plagiarism detection exist, including text matching, idea
plagiarism detection, copy-paste detection and grammar-based methods. We find a lot of
plagiarized content in students' assignments and papers, because many students are unaware of
academic requirements, which leads them to copy and paste others' work and pass it off as
their own.

In this project, we use text matching to check the submitted document for plagiarism. The
checker looks for keywords shared between the submitted and the original documents and then
determines whether the submission is copied or original work. As mentioned above, plagiarism
can also take the form of idea plagiarism: even though the words are different, they convey
the same idea. Identifying that kind of plagiarism is the main function of this project.

Contents

Chapter No.  Contents

1  INTRODUCTION
   1.1 Description
   1.2 Problem Formulation
   1.3 Motivation
   1.4 Proposed Solution
   1.5 Scope of the Project
2  REVIEW OF LITERATURE
3  SYSTEM ANALYSIS
   3.1 Functional Requirements
   3.2 Non-Functional Requirements
   3.3 Specific Requirements
   3.4 Use-Case Diagrams and Description
4  ANALYSIS MODELING
   4.1 Data Modeling
   4.2 Activity Diagrams
   4.3 Functional Modeling
   4.4 Timeline Chart
5  DESIGN
   5.1 Architectural Design
   5.2 User Interface Design
6  IMPLEMENTATION
   6.1 Hardware and Software Used
   6.2 Algorithms / Methods Used
   6.3 Working of Project
7  TESTING
   7.1 Type of Testing Used
   7.2 Test Cases
8  CONCLUSION AND FUTURE WORK
   8.1 Conclusion
   8.2 Future Work

APPENDIX
LITERATURE CITED
ACKNOWLEDGEMENTS

List of Figures

Fig. No.  Figure Caption

3.4.1     Use case diagram
4.1.1     E-R diagram
4.2.1     Teacher activity diagram
4.2.2     Student activity diagram
4.3.1     DFD Level 0
4.3.2     DFD Level 1
4.4.1     Timeline Chart 1
4.4.2     Timeline Chart 2
4.4.3     Timeline Chart 3
4.4.4     Timeline Chart 4
5.1.1     Architectural Design
5.2.1     Sample GUI
7.2.1     Test Case 1
7.2.2     Test Case 2
7.2.3     Test Case 3
7.2.4     Test Case 4

List of Tables

Table No.  Table Title

3.4.1      Use case description
7.2.1      Test Cases for System Testing

List of Abbreviations

Sr. No.  Abbreviation  Expanded form

1        IEEE          Institute of Electrical and Electronics Engineers
2        OS            Operating System
3        UML           Unified Modeling Language
4        ER            Entity-Relationship (diagram)
5        DFD           Data Flow Diagram
6        GUI           Graphical User Interface
7        API           Application Programming Interface

Chapter 1

Introduction

1.1 Description

Plagiarism is the act of taking someone else's information or ideas and claiming them as
your own. Essentially, it reproduces existing information in a modified form. It has become a
serious issue in every field of education, and various techniques and tools have been developed
to detect it. Detection approaches include text matching, copy-paste detection and
grammar-based methods. The purpose of this project is to help identify whether a paper is
plagiarized or not.

We find a lot of plagiarized content in students' assignments and papers, because many
students are unaware of academic requirements, which leads them to copy and paste others'
work and pass it off as their own.

In this project, we use text matching to check the submitted document for plagiarism. The
given document is compared line by line with related published papers from international
publishers such as IEEE, Springer, Elsevier and others. The checker looks for keywords shared
between the given and the original documents and then determines whether the submission is
copied or original work.

The problem addressed is thus copying someone's material and presenting it as your own, or
not giving proper credit for it. A plagiarism checker helps identify such copy leaks. Our
plagiarism checker, the IDEA plagiarism checker, not only checks for copied content but goes
one step further and also detects whether the idea has been copied while the wording has been
altered.

1.2 Problem Formulation


In the current Indian education system, plagiarism checkers are not yet in routine use.
As a result, a large amount of material is copied without credit to its authors in the
field of research. At the institution or university level, there is no plagiarism checker
to detect whether students have copied documents from other students of their batch or
department, or from students of previous years, i.e. their seniors. This stifles students'
innovative ideas; it hinders their creativity and deprives them of the chance to tackle
problems and arrive at optimal solutions. Even where a plagiarism checker is in place,
students cleverly change the wording, and the result is still a copy leak.

To avoid such issues, the concept of idea plagiarism detection came into the picture: it
fingerprints the documents and checks for plagiarism on a larger scale using syntax-based
checking.

This checker can also be extended to paper publications, to check whether the paper under
consideration has been copied from a previous paper and whether its main idea is the same,
or, if material is taken from a previously certified source, whether appropriate credit has
been given to the author of the work used.

1.3 Motivation

Plagiarism is harmful because it denies the original author recognition for their work,
prevents the plagiarizing student from learning, and deceives the person who is marking the
work. In addition, plagiarism suggests that the student in question is of bad character, as
they are willing to lie about their work.

Beyond that, plagiarism should be disallowed because it gives false credit and recognition
to undeserving individuals. Discouraging plagiarism encourages original work among students
as well as researchers. With these intentions we move forward with this project.

1.4 Proposed Solution

The problem addressed is copying someone's material and presenting it as your own, or not
giving proper credit for it. A plagiarism checker helps identify such copy leaks. Our
plagiarism checker, the IDEA plagiarism checker, not only checks for copied content but goes
one step further and also detects whether the idea has been copied while the wording has been
altered.

1.5 Scope of the Project


People usually associate plagiarism only with academic papers, journals, research or the
assignments given to students. But plagiarism is prevalent in various fields other than
education. Web content writing is also hit by plagiarism, as many websites copy content from
other websites and pass it off as their own. Plagiarism is present in journalism as well:
newspapers sometimes copy news or key details from other newspapers and publish them without
citation. Although such textual plagiarism can be detected with ease, plagiarism involving
images, audio/video files etc. cannot be detected as easily.

Chapter 2

Review of Literature

Plagiarism has long been treated as a serious offence abroad, especially in the field of
education. Many universities use plagiarism checkers to determine whether a student has copied
any of the content of his/her assignments from others, and many plagiarism detectors have been
built for this purpose. A few of the existing plagiarism detection tools are [1]:

1. Duplichecker (www.duplichecker.com)
o To check for plagiarism, the user pastes or uploads the content file and clicks the
  "search" button. The tool compares the content with online sources and produces a report.
o It is very useful because all the source websites from which the content was copied are
  displayed, and a researcher can use this service to give correct references.
o However, it can check a maximum of 1500 words at a time and accepts only Word (.doc) and
  text (.txt) files.

2. Plagiarism Checker (http://smallseotools.com/plagiarism-checker/)
o The user pastes the content in the given box and clicks the big green button
  "Check for plagiarism!"
o The tool compares the content with the sources available on the internet and produces a
  report.
o It can identify the original source of plagiarized content that was copied from the
  internet.
o It accepts a maximum of 1500 words per search.
o It also accepts .doc files.

3. Plagiarism Software (https://www.plagiarismsoftware.net/)
o The user copies and pastes the text, or uploads a file, in the given box and then clicks
  the "search" button. The tool compares the content with online sources and produces a
  report.
o It can check results line by line.
o It can check text documents only.
o There is no facility to check .pdf files.

The domain of plagiarism detection tools can be divided into two categories: text documents
and program source code. Though programs can be regarded as text strings, they have a special
structure that is not found in natural-language documents. Some detection tools report the
names of their detection techniques and their main domains of application. Some tools, such as
SNITCH, Turn-it-in and CHECK, can also work with a database system to detect plagiarized
copies. CHECK additionally works with web search engines to compare documents with web
documents. [2]

The problem of plagiarism, or copying, is growing very rapidly because of the digital
resources available on the World Wide Web (WWW). Plagiarism of digital documents is a serious
problem in today's era. Plagiarism refers to the use of someone's data, language and writing
without proper acknowledgment of the original source. Plagiarism of another author's original
work is one of the biggest problems in publishing, science and education. Plagiarism in text
documents can take several forms: text may be copied one-to-one, passages may be modified to
a greater or lesser extent, or they may be translated. In every case it is the act of claiming
to be the author of information that someone else actually wrote. Plagiarism can be defined as
"the unauthorized use or close imitation of the ideas and language of someone else". [3]

In source code, plagiarism mainly happens through copy-and-paste of the code, renaming of
functions or variables, reordering of statements, type redefinition, and so on. At present,
there are three kinds of software similarity detection technology on the market: text-based
similarity detection, token-based similarity detection and syntax-structure-based similarity
detection. Token-based similarity detection can find plagiarism by copy-and-paste of the code,
renaming of functions or variables and reordering of statements, but not type redefinition. [4]

No two people, whatever language they use and however similar their thoughts, write exactly
the same text. Thus, written text that stems from different authors should differ to some
extent, except for cited portions. If proper referencing is abandoned, problems of plagiarism
and intellectual property arise. The existence of academic dishonesty has led most, if not
all, academic institutions and publishers to set regulations against the offence. Borrowed
content of any form requires direct or indirect quotation, in-text referencing, and citation
of the original author in the list of references. [5]

Chapter 3

System Analysis

This chapter first presents both functional and nonfunctional requirements of the system.

3.1 Functional requirements:

1. Teachers can find plagiarized assignments:
o The system reads the submitted assignments and passes them to the algorithm to find the
  degree of similarity between them.

2. Viewing visually aided cheating (similarity) reports:
o Teachers can display a cheating (plagiarism) report, which contains all submitted
  assignments and the percentage of similarity of each assignment with the others.
o The main functions, such as registration, login and course creation, shall be provided for
  ease of access.

3.2 Non-Functional requirements:

1. Compatibility
The system should be compatible with the Java Runtime Environment because it will be
implemented in Java.

2. Ease of use
Teachers will interact with the system to generate a plagiarism report through a
user-friendly graphical user interface. Furthermore, the generated reports will contain both
textual and visual (bars, charts, etc.) representations of the results.

3.3 Specific Requirements:


Hardware Requirements:
1. Processor: Pentium 1 or above.
2. RAM: 1 GB or above.

Software Requirements:
1. Operating System: Windows Vista or above / Mac OS X.
2. JDK version 1.7 or above.
3. NetBeans 8.0.2 / Eclipse Mars installed.

3.4 Use Case
The use case diagram is a UML diagram used to describe the system as a whole. As shown in
Fig 3.4.1, the system has a database and two user types, namely teacher and student.

[Use case diagram: actors Student, Teacher and Database; use cases Select Subject, Select
Assignment, Upload Assignment and Check Plagiarism, with <<include>> relationships]

Fig 3.4.1: Use Case Diagram

USE CASE TEMPLATE:

Use case name: Check Plagiarism.

Super use case: Select Subject.

Actor(s): Teacher and Database.

Brief description: The teacher can check whether the submitted assignment is plagiarized or
not.

Preconditions: The student must have uploaded the assignment beforehand.

Post-conditions: After the plagiarism result is displayed, the teacher can either accept or
discard the assignment.

Flow of events:
1. Firstly, the student uploads the assignment on the portal.
2. Then the teacher shall upload the assignment on the plagiarism detector software.
3. After that, the plagiarism result shall be displayed.

Alternative flows and exceptions: There are currently no assignments uploaded.

Priority: This use case has the highest priority.

Non-behavioral requirements: Stable internet connection and a desktop.

Assumptions: The teacher has uploaded the correct assignment.

Table 3.4.1: Use Case Description Table.

Chapter 4

Analysis Modeling

4.1 Data Modeling -- E-R model

Fig 4.1.1: E-R diagram

4.2 Activity Diagram

1. Teacher activity diagram

Fig 4.2.1: Teacher Activity Diagram

2. Student activity diagram.

Fig 4.2.2: Student Activity Diagram

4.3 Functional modeling -- Data flow diagrams

DFD level 0:

Fig 4.3.1: DFD Level 0

DFD level 1:

Fig 4.3.2: DFD Level 1

4.4 Timeline Chart

Fig 4.4.1: Timeline Chart 1

Fig 4.4.2: Timeline Chart 2

Fig 4.4.3: Timeline Chart 3

Fig 4.4.4: Timeline Chart 4

Chapter 5

Design

5.1 Architectural Design

Fig 5.1.1: Architectural Design

5.2 User Interface Design

Sample GUI:

Fig 5.2.1: Sample GUI

Chapter 6

Implementation

6.1 Hardware and Software used

Hardware used:

o Processor: Intel Core i3 (4th gen) or above.
o RAM: 2 GB or above.
o HDD: 10 GB or above.
o Working internet connection.

Software used:
o Operating System: Windows 7/8/10 or Ubuntu 14.04 or above.
o JDK v1.7 or above.
o NetBeans/Eclipse Java IDE installed.
o File readers for different file formats, i.e. .txt, .pdf, .doc.

6.2 Algorithms/Methods used:

Thesaurus API:
We used an API to find the synonyms of any word for which the document is to be checked
for similar occurrences. This is essential because we are detecting the underlying idea of the
document: with the help of synonyms, plagiarism can be detected even when different words are
used.

Algorithm used:
Knuth-Morris-Pratt algorithm for string comparison:

KMP algorithm:
algorithm kmp_search:
    input:
        an array of characters, S (the text to be searched)
        an array of characters, W (the word sought)
    output:
        an integer (the zero-based position in S at which W is found)

    define variables:
        an integer, m ← 0 (the beginning of the current match in S)
        an integer, i ← 0 (the position of the current character in W)
        an array of integers, T (the table, computed elsewhere)

    while m + i < length(S) do
        if W[i] = S[m + i] then
            if i = length(W) - 1 then
                return m
            let i ← i + 1
        else
            if T[i] > -1 then
                let m ← m + i - T[i], i ← T[i]
            else
                let m ← m + 1, i ← 0

    (if we reach here, we have searched all of S unsuccessfully)
    return the length of S
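
The pseudocode above leaves the table T "computed elsewhere". The following is a minimal Java
sketch of one common way to build such a table (with T[0] = -1), together with a direct
translation of the search loop. The class and method names (KmpSketch, buildTable, search) are
illustrative assumptions and are not part of the project code; the project's own matcher,
listed in section 6.3, uses a slightly different table convention.

```java
final class KmpSketch {

    // Builds the partial-match table T for pattern w.
    // T[0] = -1 signals "no fallback: advance the match start by one".
    static int[] buildTable(String w) {
        int[] t = new int[w.length()];
        if (w.isEmpty()) {
            return t;
        }
        t[0] = -1;
        int pos = 2; // next table entry to fill (t[1] stays 0)
        int cnd = 0; // length of the prefix currently being extended
        while (pos < w.length()) {
            if (w.charAt(pos - 1) == w.charAt(cnd)) {
                cnd++;
                t[pos] = cnd;
                pos++;
            } else if (cnd > 0) {
                cnd = t[cnd]; // fall back to a shorter candidate prefix
            } else {
                t[pos] = 0;
                pos++;
            }
        }
        return t;
    }

    // Returns the zero-based index of the first occurrence of w in s,
    // or s.length() if w does not occur (same convention as the pseudocode).
    static int search(String s, String w) {
        if (w.isEmpty()) {
            return 0;
        }
        int[] t = buildTable(w);
        int m = 0; // start of the current match attempt in s
        int i = 0; // position of the current character in w
        while (m + i < s.length()) {
            if (w.charAt(i) == s.charAt(m + i)) {
                if (i == w.length() - 1) {
                    return m;
                }
                i++;
            } else if (t[i] > -1) {
                m = m + i - t[i];
                i = t[i];
            } else {
                m++;
                i = 0;
            }
        }
        return s.length();
    }
}
```

For instance, KmpSketch.search("ABC ABCDAB ABCDABCDABDE", "ABCDABD") finds the word at
position 15, and a word that does not occur yields the length of the text.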

6.3 Working of the project:

IPlagiarism.java:

package iplagiarism;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;
import javax.swing.SwingUtilities;

public class IPlagiarism {

    public static void main(String[] args) throws FileNotFoundException {
        // Redirect everything written to System.out into Output.txt.
        PrintStream out = new PrintStream(new FileOutputStream("Output.txt"));
        System.setOut(out);

        // Build and show the GUI on the Swing event dispatch thread.
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                GUI gui = new GUI();
                gui.Display();
            }
        });
    }
}

The IPlagiarism.java file contains the main method of the system, from where execution of
the project starts. It first redirects standard output to the file Output.txt and then creates
and displays the GUI, which is defined in the GUI.java file.

KMPMatcher.java:

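package iplagiarism;

import java.util.ArrayList;
import java.util.List;

public class KMPMatcher {

    // Number of distinct match positions recorded so far.
    int count = 0;
    // Start indices (in the text) of the matches found so far.
    List<Integer> position = new ArrayList<>();
    // Pattern recorded for each match, parallel to 'position'.
    List<String> matchedWords = new ArrayList<>();

    // Searches txt for every occurrence of pat and records each new match position.
    public void KMPSearch(String pat, String txt) {
        int M = pat.length();
        int N = txt.length();

        // An empty pattern, or one longer than the text, cannot match.
        if (M == 0 || M > N) {
            return;
        }

        int[] lps = new int[M];
        int j = 0; // index into pat
        computeLPSArray(pat, M, lps);

        int i = 0; // index into txt
        while (i < N) {
            if (pat.charAt(j) == txt.charAt(i)) {
                j++;
                i++;
            }
            if (j == M) {
                // Full match ending at i - 1; a position is recorded only once.
                int pos = i - j;
                if (!position.contains(pos)) {
                    position.add(pos);
                    count++;
                    matchedWords.add(pat);
                }
                j = lps[j - 1];
            } else if (i < N && pat.charAt(j) != txt.charAt(i)) {
                if (j != 0) {
                    // Reuse the longest prefix that is also a suffix of the matched part.
                    j = lps[j - 1];
                } else {
                    i = i + 1;
                }
            }
        }
    }

    // Fills lps[i] with the length of the longest proper prefix of pat[0..i]
    // that is also a suffix of pat[0..i].
    void computeLPSArray(String pat, int M, int[] lps) {
        int len = 0; // length of the previous longest prefix-suffix
        int i = 1;
        lps[0] = 0;

        while (i < M) {
            if (pat.charAt(i) == pat.charAt(len)) {
                len++;
                lps[i] = len;
                i++;
            } else {
                if (len != 0) {
                    len = lps[len - 1];
                } else {
                    lps[i] = len;
                    i++;
                }
            }
        }
    }
}
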
The KMPMatcher.java file is the heart of the system, where the matching of strings between
the documents occurs. We have used the Knuth-Morris-Pratt string matching algorithm for the
system. It takes two inputs, the pattern and the text: the pattern is the string, taken from
the document being checked for plagiarism, that is to be found, and the text is the string
supplied from the sample or original document.
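
As a usage illustration, the sketch below shows the same longest-prefix-suffix (LPS) technique
in a self-contained form that returns the match positions instead of storing them in fields.
The class KeywordScan and its methods are hypothetical names introduced only for this example;
they mirror the logic of KMPMatcher but are not part of the project.

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordScan {

    // LPS table: lps[i] is the length of the longest proper prefix of
    // pat[0..i] that is also a suffix of pat[0..i].
    static int[] lps(String pat) {
        int[] table = new int[pat.length()];
        int len = 0;
        int i = 1;
        while (i < pat.length()) {
            if (pat.charAt(i) == pat.charAt(len)) {
                len++;
                table[i] = len;
                i++;
            } else if (len != 0) {
                len = table[len - 1];
            } else {
                table[i] = 0;
                i++;
            }
        }
        return table;
    }

    // Returns the start index of every (possibly overlapping) occurrence of pat in txt.
    static List<Integer> findAll(String pat, String txt) {
        List<Integer> hits = new ArrayList<>();
        if (pat.isEmpty()) {
            return hits;
        }
        int[] table = lps(pat);
        int i = 0; // index into txt
        int j = 0; // index into pat
        while (i < txt.length()) {
            if (pat.charAt(j) == txt.charAt(i)) {
                i++;
                j++;
                if (j == pat.length()) {
                    hits.add(i - j);
                    j = table[j - 1];
                }
            } else if (j != 0) {
                j = table[j - 1];
            } else {
                i++;
            }
        }
        return hits;
    }
}
```

With this sketch, a keyword from the submitted document would play the role of pat and the
content of the original document the role of txt; for example, findAll("ab", "abcabab")
reports matches at positions 0, 3 and 5.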

Thesaurus.java
package iplagiarism;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Thesaurus {

    // Words whose synonyms are to be looked up.
    ArrayList<String> wordList;
    // Cache: word -> list of synonyms scraped for that word.
    HashMap<String, ArrayList<String>> synonyms;

    public Thesaurus(String[] words) {
        this.synonyms = new HashMap<>();
        this.wordList = new ArrayList<>();

        for (String word : words) {
            wordList.add(word);
        }
    }

    public Thesaurus(String word) {
        this.synonyms = new HashMap<>();
        this.wordList = new ArrayList<>();

        wordList.add(word);
    }

    // Fetches synonyms for every word on first use; later calls return the cached map.
    public HashMap<String, ArrayList<String>> getSynonyms() throws IOException,
            InterruptedException {
        if (synonyms.isEmpty()) {
            for (String word : wordList) {
                try {
                    // Download the thesaurus page for this word.
                    Document doc = Jsoup.connect("http://www.thesaurus.com/browse/" + word)
                            .userAgent("Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0")
                            .timeout(3000)
                            .get();
                    // Pick out the synonym entries by their CSS classes.
                    Elements block = doc.getElementsByClass("relevancy-block");
                    Elements list = block.select(".relevancy-list");
                    Elements text = list.select(".text");
                    ArrayList<String> synonymList = new ArrayList<>();
                    for (int j = 0; j < text.size(); j++) {
                        synonymList.add(text.get(j).text());
                    }
                    synonyms.put(word, synonymList);
                } catch (IOException e) {
                    // Lookup failed (e.g. no connection): report it and skip this word.
                    System.err.println("Synonym lookup failed for \"" + word + "\": " + e.getMessage());
                }
            }
        }
        return synonyms;
    }
}

Thesaurus.java returns the synonyms of the words provided as input. We use synonyms to find
whether the idea of the document has been plagiarized or not. The class fetches the thesaurus
page for each word from the URL built into the class and extracts the synonyms listed on it.
It therefore requires an active internet connection for the system to find synonyms using the
above class.
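
The report does not spell out how the synonym map is combined with string matching, so the
following is only a sketch under stated assumptions: every synonym is replaced by the word it
was looked up for, after which two differently worded sentences can be compared directly. The
class SynonymMatchSketch, the method normalize and the hard-coded synonym lists are
hypothetical and stand in for the map returned by Thesaurus.getSynonyms().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SynonymMatchSketch {

    // Lower-cases the sentence, strips punctuation, and replaces every known
    // synonym with the word it belongs to (the key of the synonym map).
    static String normalize(String sentence, Map<String, ? extends List<String>> synonyms) {
        // Reverse index: synonym -> canonical word.
        Map<String, String> canonical = new HashMap<>();
        for (Map.Entry<String, ? extends List<String>> entry : synonyms.entrySet()) {
            canonical.put(entry.getKey(), entry.getKey());
            for (String synonym : entry.getValue()) {
                canonical.put(synonym, entry.getKey());
            }
        }

        StringBuilder out = new StringBuilder();
        for (String word : sentence.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty token for leading punctuation
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            String replacement = canonical.get(word);
            out.append(replacement != null ? replacement : word);
        }
        return out.toString();
    }
}
```

Under these assumptions, "The large dog is fast." and "The big dog is quick" normalize to the
same string when "big" maps to ["large", "huge"] and "quick" maps to ["fast", "rapid"], so a
plain string matcher would then treat them as the same content.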

Chapter 7

Testing

Testing is the final verification and validation activity within the organization itself.
During testing, the major activities are centred on the examination and modification of the
source code. Testing is the process of executing a program with the intent of finding an
error. A good test is one that uncovers an as-yet-undiscovered error.

7.1 Type of testing used:

Black Box Testing
Black-box testing is a method of software testing that examines the functionality of an
application without peering into its internal structures or workings. This method of testing
can be applied to virtually every level of software testing: unit, integration, system and
acceptance. It typically comprises most if not all higher-level testing, but can also embody
unit testing.

Functional Testing
Functional testing is a quality assurance (QA) process and a type of black-box testing that
bases its test cases on the specifications of the software component under test. Functions are
tested by feeding them input and examining the output, and internal program structure is
rarely considered (unlike white-box testing). Functional testing usually describes what the
system does. It does not imply that you are testing a function (method) of your module or
class; rather, functional testing tests a slice of functionality of the whole system.

Functional testing typically involves six steps:
o The identification of functions that the software is expected to perform
o The creation of input data based on the function's specifications
o The determination of output based on the function's specifications
o The execution of the test case
o The comparison of actual and expected outputs
o The check of whether the application works as per the customer's needs

7.2 Test Cases:

Test Case  Input                                      Expected result    Observed result      Pass/Fail

1          Files having same content                  100% plagiarized   100% plagiarized     Pass
2          Files having nothing in common             0% plagiarized     7.25% plagiarized    Fail
3          Files with different words but same idea   100% plagiarized   73.91% plagiarized   Pass
4          Empty directory                            Error              Error                Pass
5          .pdf file                                  No error           No error             Pass
6          .doc file                                  No error           No error             Pass
7          .docx file                                 No error           No error             Pass
8          .txt file                                  No error           No error             Pass
9          Incompatible file type                     Error              Error                Pass
10         File having image                          Error              Error                Pass
11         Only 1 file provided                       Error              Error                Pass
12         No internet connection                     Error              Error                Pass

Table 7.2.1: Test cases for system testing
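
The report does not state how the "% plagiarized" figure in the table is calculated. Purely
as an illustration of how such a score could be defined, the sketch below assumes the measure
is the share of words in the submitted text that also occur in the original text. The class
SimilaritySketch and its methods are hypothetical and are not taken from the project.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SimilaritySketch {

    // Splits text into lower-case words, dropping punctuation and empty tokens.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }

    // Assumed measure: percentage of words in 'submitted' that also appear in 'original'.
    static double percentPlagiarized(String submitted, String original) {
        List<String> submittedWords = words(submitted);
        if (submittedWords.isEmpty()) {
            return 0.0;
        }
        Set<String> originalWords = new HashSet<>(words(original));
        int hits = 0;
        for (String word : submittedWords) {
            if (originalWords.contains(word)) {
                hits++;
            }
        }
        return 100.0 * hits / submittedWords.size();
    }
}
```

Under this assumed measure, identical files score 100% and files with no shared words score
0%. Unrelated files that merely share common words would score slightly above 0%, which is
one possible, unconfirmed reading of a small non-zero result such as the one observed in test
case 2.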


Fig 7.2.1: Test Case-1

Fig 7.2.2: Test Case-2

Fig 7.2.3: Test Case-3

Fig 7.2.4: Test Case-4

Chapter 8

Conclusion and Future Work

8.1 Conclusion

Plagiarism is a worm in the current education system: it eats away at students' ability to
think and drastically reduces their imagination and performance. Unlike institutes abroad, we
currently have no plagiarism tools to check documents submitted at the college level. We have
therefore implemented a plagiarism detector that not only detects documents plagiarized word
by word or sentence by sentence but also detects whether the overall idea of the document has
been plagiarized or is unique. In future, this checker can be deployed at the college level
and for international paper publication.

8.2 Future Work

Future work includes better accuracy, a higher file-handling rate, faster results and
analysis, tracking and saving records for each student and teacher, and a centralized server
application that can be accessed by all users simultaneously. We look forward to improving
and updating our system as and when the need arises. User feedback will also have a crucial
impact on future versions of the system.

Appendix

Knuth-Morris-Pratt (KMP Algorithm):

The Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for occurrences
of a "word" (W) within a main "text string" (S) by employing the observation that when a
mismatch occurs, the word itself embodies sufficient information to determine where the next
match could begin, thus bypassing re-examination of previously matched characters.
The KMP algorithm has a better worst-case performance than the straightforward algorithm.
Writing n for the length of W and k for the length of S, KMP spends a little time
precomputing a table, O(n), and then uses that table to search the string efficiently in O(k).

Analysis:

Since the two portions of the algorithm have complexities of O(n) and O(k) respectively, the
complexity of the overall algorithm is O(n + k).

These complexities are the same, no matter how many repetitive patterns are in W or S.

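
To make the difference in worst-case behaviour concrete, the sketch below counts
text-versus-pattern character comparisons for the straightforward algorithm and for a KMP
search on a highly repetitive input. The class ComparisonCount and its methods are
hypothetical names used only for this illustration; the KMP count covers the search phase
only, not the table construction, and the pattern is assumed to be non-empty.

```java
final class ComparisonCount {

    // Straightforward search: restart the comparison at every position of txt.
    static long naive(String pat, String txt) {
        long comparisons = 0;
        for (int m = 0; m + pat.length() <= txt.length(); m++) {
            for (int i = 0; i < pat.length(); i++) {
                comparisons++;
                if (pat.charAt(i) != txt.charAt(m + i)) {
                    break;
                }
            }
        }
        return comparisons;
    }

    // KMP search using a longest-prefix-suffix table; counts search comparisons only.
    static long kmp(String pat, String txt) {
        int[] lps = new int[pat.length()];
        int len = 0;
        int p = 1;
        while (p < pat.length()) {
            if (pat.charAt(p) == pat.charAt(len)) {
                len++;
                lps[p] = len;
                p++;
            } else if (len != 0) {
                len = lps[len - 1];
            } else {
                lps[p] = 0;
                p++;
            }
        }

        long comparisons = 0;
        int i = 0; // index into txt
        int j = 0; // index into pat
        while (i < txt.length()) {
            comparisons++;
            if (pat.charAt(j) == txt.charAt(i)) {
                i++;
                j++;
                if (j == pat.length()) {
                    j = lps[j - 1];
                }
            } else if (j != 0) {
                j = lps[j - 1];
            } else {
                i++;
            }
        }
        return comparisons;
    }

    // Java 7 compatible helper: a string of n copies of ch.
    static String repeat(char ch, int n) {
        char[] chars = new char[n];
        java.util.Arrays.fill(chars, ch);
        return new String(chars);
    }
}
```

For a text of k = 1000 letters "a" and the word "aaaaaaaaab" (n = 10), the straightforward
search performs (k - n + 1) * n = 9910 comparisons, whereas the KMP search loop never needs
more than 2k comparisons, in line with the O(k) bound stated above.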
Literature Cited

[1] Tripathi, Richa, Puneet Tiwari, and K. Nithyanandam. "Avoiding plagiarism in research
through free online plagiarism tools." Emerging Trends and Technologies in Libraries and
Information Services (ETTLIS), 2015 4th International Symposium on. IEEE, 2015.

[2] Ryu, Chang-Keon, et al. "Detecting and tracing plagiarized documents by reconstruction
plagiarism-evolution tree." Computer and Information Technology, 2008. CIT 2008. 8th IEEE
International Conference on. IEEE, 2008.

[3] Agarwal, Juhi, et al. "Intelligent plagiarism detection mechanism using semantic
technology: A different approach." Advances in Computing, Communications and Informatics
(ICACCI), 2013 International Conference on. IEEE, 2013.

[4] Han, Lifang, et al. "Type redefinition plagiarism detection of token-based
comparison." 2010 International Conference on Multimedia Information Networking and
Security. IEEE, 2010.

[5] Alzahrani, Salha M., Naomie Salim, and Ajith Abraham. "Understanding plagiarism
linguistic patterns, textual features, and detection methods." IEEE Transactions on Systems,
Man, and Cybernetics, Part C (Applications and Reviews) 42.2 (2012): 133-149.

Acknowledgements

Firstly, we would like to express our thankfulness to our head of department, Dr. Kavita
Sonawane, without whom we would not have had the opportunity to make this project.
We also thank our project coordinators, Mr. Shamsuddin Khan, Ms. Vincy Joseph and Ms. Nidhi
Gaur, for providing us with this exciting opportunity.
We would like to thank our professor and project guide, Mr. Jerin Thankappan, for providing
the guidance we needed to make this project. His constructive criticism helped us make this
project better.
No task is successfully completed without the blessings of parents and the support of
friends. It is because of the blessings of our parents that we have accomplished the task of
completing this dissertation. We are also thankful to all our friends who helped and
supported us during difficult times.
