Captcha-Image Processing

International Conference on Science, Technology, Education, Arts, Management and Social Sciences
iSTEAMS Research Nexus Conference, Afe Babalola University, Ado-Ekiti, Nigeria, May, 2014.
USING IMAGE DEGRADATION AND HUMAN SIGNATURE TO FORTIFY CAPTCHA SYSTEMS
Uyinomen O. Ekong & Victor E. Ekong

Department of Computer Science
University of Uyo
Uyo, Akwa Ibom State, Nigeria
uyiekong@yahoo.com
Stella C. Chiemeke
University of Benin
Benin City, Nigeria
Olumide B. Longe
Adeleke University
Ede, Osun State, Nigeria
Corresponding Author - uyiekong@yahoo.com
ABSTRACT
User authentication has gained considerable attention over the years as a mechanism for securing web resources from unwanted
access. Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) are tests used to control
access to web services by automated programs (bots). They verify whether the party trying to access a website's resources is
a human or an automated bot. CAPTCHAs are meant to be easy for humans to solve but difficult for automated bots. The test
requires a user to respond correctly to a question or perform some task to prove that they are human. Different CAPTCHAs have
been developed over the years to keep unwanted programs from accessing web resources. However, some of them have been broken
with success rates of about 100%, which calls for new CAPTCHA schemes that strengthen the existing ones. This paper presents
Signature CAPTCHA (Sig-CAPTCHA), a new image degradation scheme that places handwritten signatures on document images to
produce a CAPTCHA test that is hard to break. The system was tested with a state-of-the-art optical character recognition (OCR)
program and a human user study to determine the recognizability gap between humans and automated programs. Both the OCR
program and the human participants were used to recognize 100 randomly rendered Sig-CAPTCHA images. The results showed that
96.12% of the 103 human users passed the test, while the OCR program recognized only 14.0% of the 100 Sig-CAPTCHA tests
presented. Sig-CAPTCHA therefore provides an effective way to degrade character images for CAPTCHA design. This helps to curb
automated programs and secure web services and applications.
Keywords: CAPTCHA, Sig-CAPTCHA, Recognition, Degradation, Security
1. INTRODUCTION
Automated script attacks, known as bots, are a central threat to computer security (Xu et al., 2003; Longe et al., 2009).
Rui and Liu (2010) define a bot as any type of autonomous software that operates as an agent for a user or a program, or
that simulates a human activity. Examples include spambots, shopbots, spiderbots, chatbots and knowbots. Bots become a
problem when they are used to access web users' accounts without authorization. Malicious code such as viruses, spyware and
other malware has been developed to eavesdrop on users, destroy information, and transfer vital information to unknown
destinations on the web, threatening the services the web provides. Completely Automated Public Turing tests to tell
Computers and Humans Apart (CAPTCHAs) are tests used to control automated bot access to web services. They verify whether
the party requesting a website's resources is a human or an automated bot. CAPTCHAs are meant to be easy for humans to solve
but difficult for automated bots. A CAPTCHA may be based on text, audio, images or video, usually presented in twisted
and/or distorted form. Existing CAPTCHA algorithms have proven inherently vulnerable to attacks. Some have been broken with
success rates of almost 100% because degradation types were not used properly (Ahmad and Yan, 2010).
Annotations such as comments, seals, marks, highlights and signatures placed directly on top of the characters of a paper
document are considered a source of noise that significantly degrades document images (Sandhya et al., 2012; Lins, 2009).
OCR engines fail to recognize the characters beneath such annotations because the annotations vary in orientation and
shape. Sig-CAPTCHA combines common forms of clutter of different orientations and shapes in a single image. These include
lines, dots, circles, arcs and rectangles of varying thickness, length and width. Of all the common document image defect
models used in CAPTCHA design, human annotations have rarely been explored.
This study explores this type of noise and uses it to produce a new technique for developing CAPTCHAs that are
unrecognizable to automated bots. The paper is divided into five main sections. Section two discusses related work,
Section three presents the methodology, and Section four presents the results obtained from the tests. Section five
concludes the paper and makes suggestions for further study.
2. RELATED LITERATURE
A number of text-based CAPTCHAs have been developed over the years. CAPTCHA design requires that some degradation
types be applied to images to make them resistant to automated bot programs. The EZ-Gimpy and Gimpy CAPTCHA tests
were developed at Carnegie Mellon University (CMU) in 2000. Yahoo used them for its free e-mail service and to protect
chat rooms from spammers (Chellapilla et al., 2005; Banday and Shah, 2009). This type of CAPTCHA was widely adopted on
many websites until it was broken by Greg Mori and Jitendra Malik of the Computer Vision Group at the University of
California, Berkeley. Using generic object recognition techniques and exploiting the small size of its dictionary
(860 words), they achieved a 92% success rate on EZ-Gimpy and 33% on Gimpy (Basso and Bergadano, 2010). Gimpy CAPTCHAs
are an improvement on EZ-Gimpy. Gimpy renders words in various free type fonts and shows five pairs of overlapping words,
three of which the user must identify. Degradation (background grids and gradients, non-linear deformations, blurring,
occlusions and additive pixel noise) is applied using the Gimpy tools (Chew and Baird, 2003).
BaffleText is another text-based CAPTCHA that served as an enhanced version of Gimpy, with about 2758 character strings.
Chew and Baird (2003) and Basso and Bergadano (2010) describe it as a test based on the psychophysics of human reading. It
uses random masking to degrade images of pronounceable, non-English character strings in order to defend against
restoration attacks. BaffleText is considered highly secure but has problems with human legibility.
reCAPTCHA, another text-based CAPTCHA, was developed in 2008 by von Ahn et al. (2008). It is considered one of the most
popular CAPTCHAs in use today (Azad and Jain, 2013), and Facebook, Twitter and other social networking sites use it to
protect millions of users. It is considered more secure than earlier schemes. It uses words that were first scanned for
book digitization projects. Each test is based on two words taken from digitized texts: one that OCR software could not
recognize, and one whose answer is known. Both words are inserted into an image, which is then visually distorted, and the
user is required to type both words in an input field. If the user solves the word whose answer is known, the system
assumes that the other answer is also correct. The same image is then presented to other users in different CAPTCHAs to
increase confidence in the correct answer (Basso and Bergadano, 2010).
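The two-word check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not reCAPTCHA's actual implementation. The class and method names, the case- and whitespace-insensitive comparison, and the three-vote agreement threshold are all hypothetical.

```python
from collections import Counter, defaultdict


class ControlWordChecker:
    """Hypothetical sketch of reCAPTCHA-style two-word verification."""

    def __init__(self, min_votes: int = 3):
        self.min_votes = min_votes           # assumed agreement threshold
        self.votes = defaultdict(Counter)    # unknown-word id -> reading counts

    @staticmethod
    def _norm(text: str) -> str:
        # Assumption: ignore case and collapse whitespace before comparing.
        return " ".join(text.split()).lower()

    def check(self, control_answer: str, control_typed: str,
              unknown_id: str, unknown_typed: str) -> bool:
        # The user passes only if the word with a known answer is typed correctly.
        if self._norm(control_typed) != self._norm(control_answer):
            return False                     # failed: the other reading is not trusted
        # Passed: count this user's reading of the OCR-unreadable word as one vote.
        self.votes[unknown_id][self._norm(unknown_typed)] += 1
        return True

    def accepted_reading(self, unknown_id: str):
        # Accept the most common reading once enough users agree on it.
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, n = counts.most_common(1)[0]
        return word if n >= self.min_votes else None
```

A reading is only adopted after several independent users, each of whom proved themselves on the control word, agree on it. This reflects the repeated presentation of the same image described in the text.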
ScatterType challenges are images of machine-printed text whose characters are cut into pieces that then drift apart, in an
attempt to frustrate segment-then-recognize computer vision attacks (Baird et al., 2005). This type of CAPTCHA was proposed
to defeat segmentation attacks. It visually shatters each letter into pieces with horizontal and vertical cuts and overlaps
the pieces randomly. Its security level is very high; however, its major problem is a human legibility rate of only about
53%. Pessimal Print is a text-based CAPTCHA that simulates dirty scans of printed text and has proven extremely difficult to
break. Coates, Baird and Fateman developed the model at the University of California, Berkeley (Coates et al., 2001). They
used a model of document image degradation that approximates ten aspects of the physics of machine printing and imaging of
text. These include spatial sampling rate and error, affine spatial deformations, jitter, speckle, blurring, thresholding
and symbol size (Baird, 1993; Baird and Popat, 2002).
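To make the idea of a document image defect model concrete, the sketch below applies three of the aspects listed above to a small binary glyph: blurring, additive per-pixel noise standing in for speckle, and thresholding. It is a simplified illustration, not Baird's model itself. The 3x3 box blur replaces a proper point-spread function, and the noise level and threshold are arbitrary assumed values.

```python
import random


def box_blur(img):
    """3x3 mean filter; img is a list of rows of numbers in [0, 1] (1 = ink)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average over the neighbourhood that lies inside the image.
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def add_noise(img, sigma, rng):
    """Add independent Gaussian noise to every pixel (a stand-in for speckle)."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]


def threshold(img, t):
    """Binarize: pixels at or above t become ink (1), the rest paper (0)."""
    return [[1 if v >= t else 0 for v in row] for row in img]


def degrade(img, sigma=0.15, t=0.5, seed=None):
    """Blur, add noise, then threshold, as a scanner-like degradation."""
    rng = random.Random(seed)
    return threshold(add_noise(box_blur(img), sigma, rng), t)
```

Even with the noise switched off (`sigma=0.0`), blurring followed by thresholding changes glyph shapes. For example, it fills the hole of a small ring, which is one way such degradations confuse OCR engines.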
Rusu et al. (2010) developed a handwriting-based CAPTCHA. It takes existing United States city names or synthetically
generated US city names and applies a number of degradation types to produce the test. These include lines, grids,
fragmentation, gaps, arcs, waves, stretching, rotation, strokes, compression, blur, occlusion and noise. To preserve human
recognizability, Gestalt and Geon theories of human visual perception were used to determine the proper placement of the
various degradation types. The handwritten CAPTCHA test was able to defeat computer recognizers. However, the variation in
handwriting still posed considerable challenges to human recognition. Users needed a considerable amount of time and focus
to recognize the CAPTCHA image, despite the attention paid to human visual perception in its design. Also, restricting the
answers to US city names made the scheme vulnerable to bot attacks. Knowledge of all US city names is all an attacker needs
to gain access to the web resources using a simple random guessing algorithm. Figure 1 presents samples of the different
CAPTCHAs discussed.
Figure 1: Variations of Text Based CAPTCHA Test (panels: EZ-Gimpy, Gimpy CAPTCHA, reCAPTCHA, Baffletext CAPTCHA, Scattertext CAPTCHA, PessimalPrint CAPTCHA, Handwritten CAPTCHA)
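Two weaknesses discussed above come from a small, known answer space: EZ-Gimpy's 860-word dictionary and the US city names of Rusu et al. (2010). The sketch below illustrates one way such an answer space can be exploited. An attacker with only a rough OCR reading can snap it to the nearest known answer. Edit-distance matching is an assumed strategy chosen for illustration (the text itself mentions random guessing), and the function names and the city list in the usage note are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only one previous row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def snap_to_dictionary(noisy_guess: str, dictionary):
    """Replace a partly wrong OCR reading with the closest known answer."""
    return min(dictionary,
               key=lambda word: edit_distance(noisy_guess.lower(), word.lower()))
```

For example, `snap_to_dictionary("bost0n", ["Austin", "Boston", "Denver"])` returns `"Boston"`, even though one character was misread. With a small answer list, an imperfect recognizer is often enough.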

3. METHODOLOGY
A website was developed to implement the Sig-CAPTCHA test. To determine the recognizability gap between humans and
automated programs, the test was carried out in two parts:
a. A controlled user study was conducted to collect challenge responses from human users, in order to measure how
recognizable Sig-CAPTCHA images are to people.
b. To test the efficacy of the scheme against machines, it was attacked with an OCR program, ABBYY FineReader 12, to
ascertain the level of recognition achievable by automated programs.
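The paper does not state exactly how a response was judged correct, so the sketch below assumes one plausible rule for scoring both parts of the test. A challenge counts as solved only when the whole string matches, ignoring case and whitespace. The normalization rule and the function names are assumptions.

```python
def normalize(text: str) -> str:
    """Assumed rule: ignore letter case and all whitespace."""
    return "".join(text.split()).upper()


def is_solved(expected: str, response: str) -> bool:
    """Solved only if the whole string matches, in the spirit of the
    'fully recognized' criterion applied to the OCR output."""
    return normalize(expected) == normalize(response)


def recognition_rate(pairs) -> float:
    """pairs: iterable of (expected, response); returns the percentage solved."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no responses to score")
    solved = sum(is_solved(expected, response) for expected, response in pairs)
    return 100.0 * solved / len(pairs)
```

The same function can score human responses collected through the website and strings returned by the OCR program, so both parts of the test are measured on an equal footing.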
Human handwritten signatures are scanned to files with a scanner. The signatures are stored in a database, randomly
selected, and applied to previously generated character images. Distortions and clutter are added as an anti-segmentation
mechanism to make the CAPTCHA image even harder for automated bots to recognize. The resulting Sig-CAPTCHAs contain marks of
different orientations and shapes, comprising lines, dots, circles, arcs and rectangles of varying thickness, length and
width. Signatures that could impair human readability and recognition were screened out. These include signatures
containing numbers or characters, signatures with high stroke density, and signatures that are heavily scrambled in
concentrated areas. Figure 2 shows the components that make up Sig-CAPTCHA.
Figure 2: Components of Sig-CAPTCHA (randomly generated characters; human handwritten signatures, scanned or collected from the web in digital form; apply signature; deform the character; cognitive aspect; weakness of OCR; Signature CAPTCHA application)
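The generation steps described above are sketched below on small binary images: random characters, screening of signatures, and overlaying a randomly chosen signature. This is an illustrative sketch, not the authors' implementation. It assumes the character string has already been rendered to a 0/1 bitmap (font rendering is omitted). The alphabet, the density thresholds and the 4x4 window are hypothetical values, and screening for signatures that contain numbers or letters is left out because it needs a recognizer or a human reviewer.

```python
import secrets

# Assumed character set: visually ambiguous glyphs (0/O, 1/I) left out.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def random_text(length=6):
    """Randomly generated challenge string for the character image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def ink_ratio(img, y0=0, x0=0, h=None, w=None):
    """Fraction of ink pixels (value 1) inside a window of a binary image."""
    h = len(img) - y0 if h is None else h
    w = len(img[0]) - x0 if w is None else w
    cells = [img[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    return sum(cells) / len(cells)


def passes_screening(sig, max_global=0.25, max_local=0.6, win=4):
    """Reject signatures that are too dense overall, or too scrambled in one
    concentrated area. All three thresholds are hypothetical."""
    if ink_ratio(sig) > max_global:
        return False
    for y in range(len(sig) - win + 1):
        for x in range(len(sig[0]) - win + 1):
            if ink_ratio(sig, y, x, win, win) > max_local:
                return False
    return True


def overlay(text_img, sig, dy=0, dx=0):
    """Stamp the signature onto a copy of the character image (pixelwise OR);
    signature pixels that fall outside the image are clipped."""
    out = [row[:] for row in text_img]
    for y, row in enumerate(sig):
        for x, v in enumerate(row):
            ty, tx = y + dy, x + dx
            if v and 0 <= ty < len(out) and 0 <= tx < len(out[0]):
                out[ty][tx] = 1
    return out


def make_sig_captcha(text_img, signature_db, rng=None):
    """Pick a random screened signature and place it at a random offset,
    allowing it to hang partly off the top or left edge."""
    rng = rng or secrets.SystemRandom()
    usable = [s for s in signature_db if passes_screening(s)]
    if not usable:
        raise ValueError("no signature in the database passed screening")
    sig = rng.choice(usable)
    dy = rng.randrange(-(len(sig) // 2), len(text_img))
    dx = rng.randrange(-(len(sig[0]) // 2), len(text_img[0]))
    return overlay(text_img, sig, dy, dx)
```

For brevity, screening here runs on every call. In the described system it would more naturally happen once, when signatures are added to the database.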
4. DISCUSSION AND RESULTS

Sig-CAPTCHA was implemented and tested online through a website. Figure 3 shows some of the sample CAPTCHAs that were
produced. The characters were randomly generated. The signatures placed in the background or foreground of each image were
either scanned or synthetically generated. A total of 103 users took part in the test, using the CAPTCHA website to enter
their responses to the challenges presented. The users comprised 44 females and 59 males aged between 18 and 65 years.
Eighty-one (81) users had normal or corrected-to-normal vision, while twenty-two (22) wore glasses. The participants
included undergraduate students (43), postgraduate students (2), civil servants (22), businessmen (13) and others (23). All
participants had Internet experience; 80 had experience with some form of online security mechanism and 23 had none.
96.12% of the human users (99 of 103) passed the Sig-CAPTCHA test, while 3.88% (4 users) were unable to recognize it.
One hundred (100) randomly selected Sig-CAPTCHA images were processed with a state-of-the-art OCR program, ABBYY
FineReader version 12, in an attempt to solve the challenges. Figure 4 (a) to (c) shows some example runs of ABBYY
FineReader on Sig-CAPTCHA images. In these tests, human signatures were used as background clutter with no other form of
degradation applied. Only 14 of the tested Sig-CAPTCHAs (14%) were fully recognized, with their individual characters
correctly segmented by the OCR program, while 86% were not recognized.
This indicates that, even without other defects, Sig-CAPTCHA resists preprocessing, segmentation, defect removal and
recognition. Sig-CAPTCHA is therefore a good candidate for limiting the recognition ability of automated bots, whether used
alone or in combination with other distortion and clutter types. Adding signatures as background clutter lowers the
accuracy of recognizers because the signature strokes alter the image pixels and adversely affect the characters they
overlap. Table 1 and Figure 5 show the recognition and non-recognition rates of Sig-CAPTCHA for human users and the OCR
program.
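The anti-segmentation effect described above can be illustrated with a toy experiment. Suppose a naive segment-then-recognize attacker isolates characters as connected groups of ink pixels. It finds three blobs in clean text, but a single signature stroke crossing the characters merges them into one. This is an assumed, simplified model of segmentation, not how ABBYY FineReader works internally. The bar-shaped "characters" and the straight "signature" stroke are hypothetical stand-ins.

```python
from collections import deque


def count_components(img):
    """Count 4-connected groups of ink pixels, the way a naive
    segment-then-recognize attack might isolate candidate characters."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1                      # new blob found: flood-fill it
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Three separated vertical bars stand in for three rendered characters.
chars = [[1 if x in (1, 4, 7) and 1 <= y <= 5 else 0 for x in range(9)] for y in range(7)]

# A single horizontal stroke stands in for a signature crossing the text.
with_signature = [row[:] for row in chars]
for x in range(9):
    with_signature[3][x] = 1
```

Here `count_components(chars)` gives 3 and `count_components(with_signature)` gives 1. Real signatures add curves, loops and varying thickness, which makes such merged blobs harder still to split back into characters.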
Figure 3: Some Sample Sig-CAPTCHA.

Figure 4: ABBYY FineReader Recognition on Sig-CAPTCHA (panels a to c)
Table 1: Ability Gap between Human Users and OCR Programs on Sig-CAPTCHA
Sig-CAPTCHA          Recognized   Percentage (%)   Unrecognized   Percentage (%)
Automated Program    14           14.00            86             86.00
Human Users          99           96.12            4              3.88
(Counts are Sig-CAPTCHA images for the automated program, out of 100, and participants for human users, out of 103.)
Figure 5: Graph showing the recognizability gap between human users and OCR programs on Sig-CAPTCHA (plotted values: human users 96.12% recognized and 3.88% unrecognized; OCR program 14% recognized and 86% unrecognized)
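The percentages in Table 1 follow directly from the raw counts. The gap on the last line is the difference between the two recognition rates, computed here for illustration rather than reported as a separate result. The two rates use different denominators (103 participants versus 100 images).

```latex
\begin{aligned}
\text{Human pass rate} &= \frac{99}{103} \times 100 \approx 96.12\%, \qquad \frac{4}{103} \times 100 \approx 3.88\% \text{ failed}\\
\text{OCR recognition rate} &= \frac{14}{100} \times 100 = 14\%, \qquad \frac{86}{100} \times 100 = 86\% \text{ unrecognized}\\
\text{Gap} &= 96.12\% - 14\% = 82.12 \text{ percentage points}
\end{aligned}
```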
5. CONCLUSION
In this study, we have explored a novel CAPTCHA scheme that uses handwritten signatures as an additional degradation type,
improving on the conventional defect types employed in CAPTCHA development. Handwritten signatures combine a variety of
common defect types, such as circles, lines and curves, and so automatically produce defects of varying length, width and
density. We have shown that Sig-CAPTCHA can degrade character images and thus serves as a new mechanism for obfuscating text
against automated programs. However, relying on signatures alone to resist automated bots is not advised. Further research
should combine the signature degradation type with other defect types to produce CAPTCHA schemes that are more resistant to
automated bot attacks, thereby enhancing web security.
REFERENCES
1. von Ahn, L., Maurer, B., McMillen, C., Abraham, D. and Blum, M. (2008). reCAPTCHA: Human-Based Character
Recognition via Web Security Measures, Science, Vol. 321, No. 5895, pp. 1465-1468.
2. Ahmad, A.S. and Yan, J. (2010). Colour, Usability and Security: A Case Study, Technical Report Series, Newcastle
University, England.
3. Azad, S. and Jain, K. (2013). CAPTCHA: Attacks and Weaknesses against OCR Technology, Global Journal of
Computer Science and Technology, Vol. 13, No. 3, version 110, ISSN 0975-4172.
4. Baird, H.S. (1993). Document Image Defect Models and Their Uses, Proceedings of the International Conference on
Document Analysis and Recognition, Tsukuba Science City, Japan, pp. 62-66.
5. Baird, H.S. and Popat, K. (2002). Web Security and Document Image Analysis, In Web Document Analysis:
Challenges and Opportunities, Antonacopoulos, A. and Hu, J. (eds.), World Scientific Publishing Co., pp. 257-272.
6. Baird, H.S., Moll, M.A. and Wang, S. (2005). ScatterType: A Legible but Hard-to-Segment CAPTCHA, In
Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR05).
7. Banday, M.T. and Shah, N.A. (2009). Image Flip CAPTCHA, The ISC International Journal of Information
Security, ISSN 2008-2045 and 2008-3076, Tehran, Iran, Vol. 1, No. 2, pp. 103-121. Available online at
http://www.isecure-journal.org.
8. Basso, A. and Bergadano, F. (2010). Anti-bot Strategies Based on Human Interactive Proofs, In Handbook of
Information and Communication Security, Stavroulakis, P. and Stamp, M. (eds.), DOI 10.1007/978-1-84882-2684-7, ISBN 978-3-642-04116-7, Springer-Verlag Berlin Heidelberg, pp. 273-291.
9. Chellapilla, K., Larson, K., Simard, P. and Czerwinski, M. (2005). Designing Human Friendly Human Interaction
Proofs (HIPs), CHI 2005: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
Portland, Oregon, USA, pp. 711-720.
10. Chew, M. and Baird, H.S. (2003). BaffleText: A Human Interactive Proof, Proc. of the 10th SPIE/IS&T Document
Recognition and Retrieval Conference, Vol. 4670, SPIE, Santa Clara.
11. Coates, A.L., Baird, H.S. and Fateman, R.J. (2001). Pessimal Print: A Reverse Turing Test, Proc. of the 6th
International Conference on Document Analysis and Recognition (ICDAR 2001), Seattle, WA, USA.
12. Lins, R.D. (2009). A Taxonomy of Noise in Images of Paper Documents: The Physical Noise, Image Analysis
and Recognition, pp. 844-854, Springer.
13. Longe, O.B., Robert, A.B.C. and Ugochukwu, O. (2009). Double CAPTCHA Response System: An Enhanced
Authentication Scheme for Checking Internet Masquerading, Intl. Conf. on Adaptive Science and Technology. Available
online at: http://www.edictech.com/ICAST09/ICAST2009Program.pdf.
14. Rui, Y. and Liu, Z. (2010). System and method for devising a human interactive proof that determines whether a
remote client is a human or a computer program, Available online at: http://www.google.com/patents/US7725395
15. Rusu, A., Rebecca, D. and Rusu, A. (2010). Leveraging Cognitive Factors in Securing WWW with CAPTCHA,
Proceedings of the 2010 USENIX Conference on Web Application Development, Boston, MA, pp. 5-15.
16. Sandhya, N., Krishnan, R. and Babu, D.R. (2012). A Language Independent Characterization of Document Image
Noise in Historical Scripts, International Journal of Computer Applications (0975-8887), Vol. 50, No. 9, pp. 11-18.
17. Xu, J., Lipton, R., Essa, I. and Sung, M. (2003). Mandatory Human Participation: A New Authentication Scheme
for Building Secure Systems, Proceedings of the 12th International Conference on Computer
Communications and Networks, pp. 547-552.
AUTHORS BIOGRAPHIES
Uyinomen O. Ekong is a Lecturer at the Department of Computer Science, University of Uyo,
Akwa Ibom State. She obtained a B.Sc degree in Computer Science from Ambrose Alli University,
Edo State Nigeria in 2002, and an M.Sc. degree in Management Information Systems (MIS)
specializing in mobile computing from Covenant University, Ota, Nigeria in 2006. She is currently
working towards the Ph.D degree in the Department of Computer Science, University of Benin,
Benin City, Nigeria. Her current research interests include the application of Artificial Intelligence
methods and techniques to network security, Human Interactive Proofs, Mobile Computing, E-commerce and E-government. She is a member of the Nigeria Computer Society (NCS) and the Institute
of Electrical and Electronics Engineers (IEEE). She can be reached by phone on +2348051035870
and through e-mail at uyinomenekong@uniuyo.edu.ng or uyiekong@yahoo.com .
Professor (Mrs.) Stella C. Chiemeke is a Professor of Computer Science and currently the
Director of UNIBEN ICT Centre, University of Benin, Benin City, Nigeria. She received her B.Sc.
and M.Sc. in Computer Science from University of Lagos in 1986 and 1992 respectively. She also
obtained her PhD in Computer Science from the Federal University of Technology, Akure, Nigeria
in 2004. Prof. (Mrs.) Chiemeke joined the services of the University of Benin as an Assistant
Lecturer in the Department of Computer Science in the year 1994 and then rose to the current
position of Professor in 2009. Her teaching and research interests range from software
engineering to the industrial application of ICT. She has authored and co-authored over sixty-five (65)
articles in reputable local, national and international journals. She is a member of the International Association of Engineers
(IAENG), Computer Professionals of Nigeria (CPN), Nigeria Computer Society (NCS) and the International Network of Women
Engineers and Scientists (INWES) Canada, among others. She has served in various administrative capacities within and outside the
University community, including Acting Head of the Department of Computer Science and Assistant Dean of the Faculty of
Physical Sciences. She can be reached by phone on +2348023158911 and through e-mail at schiemeke@uniben.edu
Dr. Longe Olumide is an Associate Professor of Computing & Information Security at the
Department of Computer Science and Information Systems, Adeleke University, Ede, State of
Osun, Nigeria. He obtained a BSc Computer Science at the University of Benin, Benin City,
Nigeria in 1998, a Master of Technology Degree in Computer Science at the Federal University of
Technology, Akure in 2005 and a PhD Degree in Computer Science from the University of Benin,
Benin City, Nigeria in 2010. A recipient of several International and National awards and
recognitions, his research is focused on using social theories, machine learning and computer
security models to design cyber security systems and explain cyber victimization. Dr. Longe is a
distinguished Fulbright Scholar and was listed in Marquis Who's Who in the World in 2014. He
can be reached by phone on +18572078409 and through E-mail at longeolumide@fulbrightmail.org.
Victor Eshiet Ekong is a Lecturer in the Department of Computer Science, University of Uyo,
Akwa Ibom State. He obtained a B.Sc degree in Computer Science from University of Uyo in 1998
and M.Sc. degree in Computer Science from University of Benin in 2003. He is currently working
towards obtaining a Ph.D degree in Computer Science in the Department of Computer Science,
University of Benin, Benin City. His research interests include Artificial Intelligence, Software
Engineering, Applied Computational intelligence and Cognitive Science, Design theories for
medical informatics and e-Health systems. He is a member of the Nigerian Computer Society
(NCS), Computer Professionals (Registration Council) of Nigeria (CPN), Institute of Electrical and Electronics Engineers
(IEEE) and International Association of Engineers (IAENG). He can be reached by phone on +2348056043359 and through
e-mail at victoreekong@uniuyo.edu.ng.