By,
Group No: 12
Guided By
Prof. Archana Banait
Dept. of Computer Engineering
Architecture
Modules
Algorithm
Conclusion
References
Design & Analysis of Duplication Detection Tool for Articles
OVERVIEW
Large-volume public comment campaigns and web portals
that encourage the public to customize articles produce
many duplicate documents, which increases processing and
storage costs.
Filtering near-duplicates out of a collection is thus important
and is particularly challenging in applications that require
them to be filtered out in real-time with high precision.
Our proposed system uses a hashing technique with a hash
index and a similarity measure to detect article duplication.
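As a rough illustration of how a hash index and a similarity measure can work together, the following Python sketch looks up candidate articles through hashed word shingles and then confirms them with Jaccard similarity. The class name HashIndex, the shingle size k=3, the 0.5 threshold, and the choice of Jaccard similarity are all assumptions made for this example; the slides do not specify them.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (k is an assumed value)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class HashIndex:
    """Hypothetical index: shingle hash -> ids of articles containing it."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.buckets = {}   # shingle hash -> set of article ids
        self.articles = {}  # article id -> shingle set

    @staticmethod
    def _hash(shingle):
        return hashlib.md5(shingle.encode("utf-8")).hexdigest()

    def add(self, article_id, text):
        """Store an article and register each of its shingle hashes."""
        sh = shingles(text)
        self.articles[article_id] = sh
        for s in sh:
            self.buckets.setdefault(self._hash(s), set()).add(article_id)

    def find_duplicates(self, text):
        """Return ids of stored articles whose similarity meets the threshold."""
        sh = shingles(text)
        candidates = set()
        for s in sh:
            candidates |= self.buckets.get(self._hash(s), set())
        return sorted(
            aid for aid in candidates
            if jaccard(sh, self.articles[aid]) >= self.threshold
        )
```

Only articles that share at least one shingle hash with the query are compared in full, which is the point of keeping the index: most unrelated articles are never touched.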
INTRODUCTION
Electronic media is developing rapidly, so many articles
are now produced online, and duplication detection is
therefore needed.
The proposed system focuses on detecting article duplication.
It uses a fingerprinting technique with a hash index to
detect duplicated articles.
It will also conduct an empirical study and summarize a few
of the most commonly used plagiarism patterns in plagiarized articles.
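To make the idea of fingerprinting concrete, here is a minimal Python sketch that hashes every k-character window of a cleaned article and keeps a subset of those hashes as the fingerprint. The "0 mod p" selection rule, the window size k=5, the modulus p=4, and the use of CRC32 are assumptions chosen for illustration; the slides do not state which fingerprinting variant the system uses.

```python
import zlib


def kgram_hashes(text, k=5):
    """Hash every k-character window of the cleaned text."""
    # Keep only letters and digits so case and punctuation do not matter.
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum())
    return [
        zlib.crc32(cleaned[i:i + k].encode("utf-8"))
        for i in range(len(cleaned) - k + 1)
    ]


def fingerprint(text, k=5, p=4):
    """Keep only the hashes divisible by p (the '0 mod p' selection)."""
    return {h for h in kgram_hashes(text, k) if h % p == 0}


def overlap(fp_a, fp_b):
    """Fraction of the smaller fingerprint that also occurs in the other."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / min(len(fp_a), len(fp_b))
```

Two articles that differ only in case, spacing, or punctuation produce the same fingerprint here, while an article that copies a passage shares the selected hashes that fall inside that passage.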
No. | Approach | Author (year) | Technique | Limitation
4. | Locality-Sensitive Hashing | Charikar (2002) | Near Duplication | It relies on the probability that the hash values of vectors match; if these values are not equal, it gives a false result.
5. | I-Match | Chowdhury et al. (2002) | Fingerprinting by hashing all the significant tokens | This method is sensitive to very slight changes in the document.
7. | Near-duplicate Detection | Henzinger (2006) | Shingling and Locality Sensitive Hashing | Because it uses both the Shingling method and Locality Sensitive Hashing, it is costly, and false results remain possible because the method is probabilistic.
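The two hashing styles surveyed above can be contrasted in a short Python sketch. The first function is a simplified SimHash in the spirit of Charikar's locality-sensitive hashing, where similar texts tend to differ in few bits; the second is an I-Match-style signature, where a single hash covers all significant tokens. The 64-bit width, the MD5 token hash, and the small stopword list standing in for I-Match's collection-statistics filter are assumptions for this example, not details taken from the cited works.

```python
import hashlib

# Assumed stand-in for I-Match's "significant token" filter.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "is", "in"})


def _hash64(token):
    """Derive a 64-bit integer from a token."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text, bits=64):
    """Simplified SimHash: each token votes +1 or -1 on every bit position."""
    counts = [0] * bits
    for token in text.lower().split():
        h = _hash64(token)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    value = 0
    for i in range(bits):
        if counts[i] > 0:
            value |= 1 << i
    return value


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def imatch(text):
    """I-Match-style signature: one hash over the sorted significant tokens."""
    tokens = sorted({t for t in text.lower().split() if t not in STOPWORDS})
    return hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()
```

With SimHash, two articles are treated as near-duplicates when the Hamming distance between their hashes is below a chosen cutoff, so the decision is probabilistic. With the I-Match-style signature, changing even one significant token changes the whole hash, which matches the sensitivity noted in the table.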
Figure 1. [1]
[Diagram blocks: Input Articles; Data Extraction; Data Pre-processing; Normalization; Hashing algorithm]
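The stages named in the diagram can be sketched as a small Python pipeline. The stage order (extraction, pre-processing, normalization, then hashing), the specific cleaning rules, and the use of SHA-256 are assumptions made for illustration; the function names are hypothetical and the slides do not describe what each stage does in detail.

```python
import hashlib
import html
import re
import unicodedata


def extract_text(raw):
    """Data extraction: drop markup tags and decode HTML entities."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return html.unescape(no_tags)


def preprocess(text):
    """Pre-processing: lowercase and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", text.lower())


def normalize(text):
    """Normalization: unify Unicode forms and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


def article_hash(raw):
    """Run all stages in order, then hash the normalized article text."""
    cleaned = normalize(preprocess(extract_text(raw)))
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
```

Because the hash is computed only after cleaning, two inputs that differ in markup, case, punctuation, or spacing map to the same value, which is what lets a hash lookup catch them as duplicates.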
Figure 4. ER Diagram
Software Requirement:
IDE: Visual Studio
Database: Dataset
Language: C# .NET
OS: Windows 7 and above