Putro - Physical Document Validation With Perceptual PDF
Abstract—Electronic validation is not needed only for electronic documents. For specific needs, physical documents must also be validated electronically. The problem is that a physical document produces a different hash value each time it is digitized. This research examines whether a perceptual hash can be used for electronic validation of physical documents. The study concludes that a perceptual hash can be used and can detect all modifications made to the main information in a document.

Keywords—electronic validation; perceptual hash; physical document

I. INTRODUCTION

Validation of data authenticity is a process that we carry out in every information transaction. According to the Oxford dictionary, validation is "the action of checking or proving the validity or accuracy of something." Document validation used to be done by looking at two documents and comparing them visually. With this method, the accuracy of validation depends on the visual ability of the person performing it. Other factors that affect the accuracy of the result are the objectivity of the evidence and the quality of the data being validated. This is why validating physical paper documents often takes longer: the documents are frequently not well maintained.

Developments in information technology offer electronic documents as a solution. Validating electronic documents is faster than validating physical documents and carries only a small risk of losing the evidence used for validation. The accuracy of electronic document validation is also very high because it involves cryptographic hash function algorithms and digital signatures. With a hash function, the slightest difference in the document will be detected.

Hash functions are functions that produce a fingerprint of the input data. Such a function produces a value that is, in practice, unique for each input string, and the value changes if the input data changes. A hash function produces digital evidence from digital data; it therefore works well only if the document to be validated has already been digitized.

Although electronic business is growing rapidly, not all administrative processes run fully electronically. Some activities still involve physical documents with visual validation, and some institutions provide validation for both physical and electronic documents. A physical document cannot be validated electronically because electronic validation mechanisms would need to apply hash functions to digitized physical documents. Because a digitized physical document generates a different hash value each time, it is not possible to validate physical documents using hash values.

This study attempts to determine whether a perceptual hash can be used for electronic validation of physical documents. The research validates ten documents with six types of modification. As a tool, we developed a Java application using one of the published perceptual hash algorithms [1].

II. PERCEPTUAL HASH

A perceptual hash is a fingerprint of a multimedia file derived from various features of its content. Perceptual hashing considers two fingerprints similar if both files have similar features. This differs from cryptographic hash functions, which rely on the avalanche effect: small changes in the input lead to drastic changes in the output. As shown in Fig. 1, a perceptual image hashing system consists of four stages: the Transformation stage, the Feature Extraction stage, the Quantization stage, and the Compression and Encryption stage [2].

Fig. 1. Perceptual Hash Stages

Step 1 Transformation Stage. The transformation stage performs a spatial transformation of the input image, involving the Discrete Cosine Transform (DCT) or the Discrete Wavelet Transform (DWT). Possible transformations include color transformation, smoothing, affine transformations, and frequency transformations. The principal aim of these transformations is to make all extracted features depend upon the image pixel values or their frequency coefficients in the frequency space. Conducting DWT in perceptual hashing schemes takes just the LL (low-low)
subband into the process because it is a coarse version of the original image and contains all of the perceptual information.

Step 2 Feature Extraction Stage. In this stage, the perceptual hashing algorithm extracts image features from the transformed image to generate a feature vector of L features, where L << M x N. This stage yields L x p floats because there are L features, each of which can contain p elements of type float. However, it remains an open question which mappings from DCT or DWT coefficients keep the essential information about an image for hashing and/or mark embedding purposes. Some research adds a further feature selection at this stage to keep only the most pertinent features [1]. The selected features can be presented as an intermediate hash vector of K x p floats, where K < L. These additions make the algorithm statistically more resistant to specific allowed manipulations such as the addition of noise, JPEG compression, and filtering.

Modifying the feature extraction stage is considered necessary because the visual features are usually publicly known and can therefore be modified. This might threaten security, but this paper relies on it, even though the hash value could then be adjusted maliciously to match that of another image, including an image with minor modifications.

Step 3 Quantization Stage. The quantization stage yields a quantized intermediate perceptual hash vector containing K x p components of type byte. Uniform quantization is applied to quantize each component of the continuous perceptual hash vector. Uniform quantization differs from adaptive quantization: uniform quantization is based on the interval length of the hash values, while adaptive quantization partitions based on the probability density function (pdf) of the hash values.

Step 4 Compression and Encryption Stage. The compression and encryption stage is the final step of a perceptual hashing system. This stage guarantees both the security of the system and the fixed length of the final perceptual hash. The binary intermediate perceptual hash vector from the previous stage is compressed and encrypted into a short perceptual hash of fixed size l bytes, where l << K x p. The resulting group of bytes allows image verification and authentication with the perceptual hash. The compression and encryption stage can be realized with cryptographic hash functions, e.g. the SHA series, which generate a final hash of fixed size (160 bits in the case of SHA-1).

After that, we compute the average value of the AC coefficients in our 1/16th image. We can do this simply by summing all values of our image except the first, then dividing the result by the size of the image. We ignore the first (or DC) coefficient during this calculation because it would most likely distort our average.

Perceptual hashes make it possible to compare the similarity of two images quickly. Although this hashing technique will not detect similarity under large structural changes, it proves useful for certain applications such as reverse image search and other approximate comparisons where the hashes of images in a database can be computed as an offline process [3]. This inability to detect similarity under large structural changes is the main reason for using a perceptual hash for validation of physical documents.

Perceptual hash functions can be divided into two categories: unkeyed perceptual hash functions and keyed perceptual hash functions. An unkeyed perceptual hash function generates a hash value from an arbitrary string input. A keyed perceptual hash function generates a hash value h from an arbitrary string input and a secret key [2].

III. DESCRIPTION

In recent years, there has been a growing body of research on perceptual image hashing. Most existing studies focus on the feature extraction stage, because extracting a set of robust features that resist, and remain relatively constant under, content-preserving manipulations while still detecting content-changing manipulations is regarded as the most important goal of a perceptual image hashing system [3]. A different line of research, by Zauner, compares perceptual hash methods [1].

Zauner compares the results of four content identification functions using a single hash creation function. This study uses a Java class created by Zauner and implements it in a simple application as a research tool. The application outputs hash values in binary and hexadecimal form. We use the hexadecimal value to compare the original with the modified version, and the binary value to measure how large the difference is. The simple application built on Zauner's Java class is shown in Fig. 2.

Fig. 2. Simple Application of Zauner java class

To prove that a perceptual hash can be used for document validation, the validation process must be performed on both original and modified documents. The expected result is that the original document yields the same perceptual hash value, while a modified document yields a different perceptual hash value. To be processed with a perceptual hash algorithm, the physical document must be scanned into an image.
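
Two steps described above can be sketched in Java: the AC-coefficient average from Section II and the hexadecimal/binary comparison of hash values used by the application. This is a minimal sketch, not Zauner's class or the application from this study. The class name HashComparison, its method names, and the sample hash values are hypothetical, and measuring "how large the difference is" as a Hamming distance over the binary strings is an assumption about how the binary output is used.

```java
// Hypothetical helper; names are illustrative and not taken from Zauner's code.
class HashComparison {

    // Sum every coefficient except the first (DC) one and divide by the
    // block size, as the text describes. Dividing by (length - 1) instead
    // would give the strict mean of the AC terms.
    static double acAverage(double[] coefficients) {
        if (coefficients.length < 2) {
            throw new IllegalArgumentException("Need a DC term and at least one AC term");
        }
        double sum = 0.0;
        for (int i = 1; i < coefficients.length; i++) {
            sum += coefficients[i];
        }
        return sum / coefficients.length;
    }

    // Expand a hexadecimal hash string into binary, four bits per digit.
    static String hexToBinary(String hex) {
        StringBuilder bits = new StringBuilder(hex.length() * 4);
        for (char c : hex.toCharArray()) {
            int nibble = Character.digit(c, 16);
            if (nibble < 0) {
                throw new IllegalArgumentException("Not a hexadecimal digit: " + c);
            }
            String b = Integer.toBinaryString(nibble);
            bits.append("0000", 0, 4 - b.length()).append(b); // left-pad to 4 bits
        }
        return bits.toString();
    }

    // Count the positions at which two equal-length bit strings differ.
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Bit strings must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Two scans validate as the same document only if the hex hashes are identical.
    static boolean sameHash(String hexA, String hexB) {
        return hexA.equalsIgnoreCase(hexB);
    }

    public static void main(String[] args) {
        String original = "ff00a5"; // made-up hash values for illustration
        String modified = "ff01a5";
        System.out.println(sameHash(original, modified)); // prints false
        System.out.println(hammingDistance(hexToBinary(original), hexToBinary(modified))); // prints 1
    }
}
```

The equality check gives the pass/fail validation result, while the bit-level distance gives a rough measure of how far a modified scan has drifted from the original.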
2017 3rd International Conference on Science in Information Technology (ICSITech)
The study was conducted in two phases. The first phase compares the hash of the original image with the hash of each modified image. This phase examines 10 images with 6 types of modification. The modifications used are:

1. Content wiping of 5%,
2. Content wiping of 25%,
3. Cropping,
4. Adding text content of 5%,
5. Adding text content of 25%, and
6. A combination of content adding and content wiping.

TABLE I. LIST OF IMAGES
No   Sample Name   Information Type
1    Image 1       Picture
2    Image 2       Picture
3    Image 3       Text
4    Image 4       Text
5    Image 5       Text with charts
6    Image 6       Text with charts
7    Image 7       Text with picture
8    Image 8       Text with picture
9    Image 9       Certificate
10   Image 10      Certificate
As the object of research, we use JPEG images containing five kinds of information: picture, text document, text document with chart, text document with picture, and certificate. These kinds of information were selected because the purpose of this study is validating physical documents, and in daily practice these five kinds of documents are still used in physical form.

Each image studied is stored as a JPG file and subjected to six kinds of modification. The modifications were made with Adobe Photoshop CC 2017 using simple existing features such as the marquee tool, eraser tool, paint bucket tool, horizontal type tool, the Clear menu command, and crop. The modified results are also stored as JPEG files.

Fig. 4. Image1.jpg
The second phase of the study focuses on the types of modification that still produce the same hash value. The modification is repeated at several different locations in the image, such as in empty space, over text, and over graphical data.
Figure (research phases): Phase 1, testing 6 modifications; Phase 2, testing 1 modification at 8 different places.
Fig. 8. Image5.jpg
REFERENCES

[1] C. Zauner, "Implementation and benchmarking of perceptual image hash functions," University of Applied Sciences Hagenberg, Hagenberg.
[2] Hadmi, A. Puech, W.A.E. Said, B.A. Ouahman and Abdellah, "A robust and secure perceptual hashing system based on a quantization step analysis," Signal Processing: Image Communication, vol. 28, no. 8, pp. 929-948, 2013.
[3] Jie and Zeng, "A Novel Block-DCT and PCA Based Image Perceptual Hashing Algorithm," Watermarking, vol. 10, no. 1, pp. 399-403, 2013.
[4] Hadmi, A. Puech, W. Said, B.A.E. Ouahman and A. Ait, "Perceptual image hashing," Watermarking, vol. 2, pp. 17-42, 2012.
[5] V. Monga, "Perceptually Based Methods for Robust Image Hashing," University of Texas, Austin, 2005.
[6] V. Monga and B.L. Evans, "Perceptual image hashing via feature points: performance evaluation and tradeoffs," IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3452-3465, 2006.
[7] S.S. Kozat, R. Venkatesan, and M.K. Mihçak, "Robust perceptual image hashing via matrix invariants," in Image Processing, 2004. ICIP'04. 2004 International Conference on, vol. 5, pp. 3443-3446, IEEE, 2004.
[8] M.K. Mıhçak and R. Venkatesan, "New iterative geometric methods for robust perceptual image hashing," in ACM Workshop on Digital Rights Management, pp. 13-21, Springer, Berlin, Heidelberg, 2001.