Professional Documents
Culture Documents
2.1 Definition: Plagiarism Concepts
2.1 Definition: Plagiarism Concepts
2.1 Definition
9
Chapter 2
6. Incomplete citing.
8. Providing proper citations, but using the same words and syntax.
10
Plagiarism Concepts
11
Chapter 2
Example:
12
Plagiarism Concepts
13
Chapter 2
Insertion
Direct deletion
Literal
copy
Substitution
Reordering words
Change syntax
Plagiarism
paraphrasing
Intelligent summarising
Modified
copy translation
Semantic change
14
Plagiarism Concepts
15
Chapter 2
Based on whether the source documents used for the comparison are
from the internet or from the local system, detection approaches are further
classified as online or offline. Online plagiarism detection systems compare
a suspicious document with a reference collection, which is a set of
documents available on the web whereas an offline plagiarism detection
16
Plagiarism Concepts
17
Chapter 2
18
Plagiarism Concepts
19
Chapter 2
Table 2.1 Summary of text features and tools required for their
implementation
Text Feature
Works Tools required
features examples
Yerra, R. and Ng, Y.K., (2005). Character Feature selector
n-grams
Lexical Kasprzak, J. Et al., (2009) Word n- Tokenizer
Koberstein, J. and Ng, Y.K., (2006). grams
Alzahrani,S. And Salim,N., (2010)
Scherbinin, V. and Butakov, S., chunks Tokenizer, POS
(2009) tagger, text
chunker
Elhadi, M. and Al-Tobi, A., (2009) POS and Sentence splitter,
Ceska, Z. and Fox, C., 2011. phrase Tokenizer, POS
structure tagger
Syntactic Li, Y., McLean, D., Bandar, Z.A., Word order Sentence splitter,
O'shea, J.D. and Crockett, K., Tokenizer,
(2006). compressor
Yerra, R. and Ng, Y.K., (2005) sentence Sentence splitter,
Alzahrani, S.M. and Salim, N., Tokenizer, POS
(2008) tagger, text
chunker , partial
parser
Yerra, R. and Ng, Y.K., (2005) Synonyms Tokeniser, POS
Alzahrani,S, (2008) tagger, Thesaurus
Y. Li,D.McLean, Z.A. Bandar, J. D. Semantic Sentence splitter,
Semantic O’Shea, andK. Crockett, (2006) dependencies Tokenizer, POS
tagger, text
chunker , partial
parser, semantic
parser
H. Zhang and T. W. S. Chow, (2011) Block HTML parser,
specific specialised parsers
Structural S. Alzahrani, (2012) Content Tokeniser ,
specific specialised
dictionaries.
20
Plagiarism Concepts
i. Source retrieval
21
Chapter 2
iii. Post-processing
22
Plagiarism Concepts
23
Chapter 2
24
Plagiarism Concepts
retrieved relevant
precision (2.1)
retrived
retrieved relevant
recall (2.2)
relevant
25
Chapter 2
Actual Class
Document set (D)
Relevant Not-Relevant
Retrieved True Positive False Positive
Predicted (correct result) (irrelevant result found)
Class Not- False Negative True Negative
Retrieved (Missing result) (irrelevant result not found)
True negatives are not returned by the system and are considered
irrelevant by the human expert.
False negatives are documents relevant to the query which are not
found by the system. The value of precision ranges from 0 to 1, where 0
means that no relevant document is retrieved and 1 means that all the
retrieved documents are relevant. The value of recall also ranges from 0 to
1, where 0 means no relevant documents have been retrieved and 1 means
26
Plagiarism Concepts
all relevant documents have been retrieved. Very high recall often results in
low precision , which is not desirable. Combining precision and recall is
proposed to overcome this problem .
(1 2 ). p.r
F (2.3)
2. p r
2. p.r
F1 (2.4)
pr
27
Chapter 2
…….…….
28