iSTEAMS Research Nexus Conference, Afe Babalola University, Ado-Ekiti, Nigeria, May, 2014.
ABSTRACT
User authentication has gained considerable attention over the years as a mechanism for securing web resources from unwanted
access. Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) are tests used to control
automated program (bot) access to web services. They are security mechanisms used to verify whether a human or an
automated bot is gaining access to a website's resources. CAPTCHAs are meant to be easily solved by humans while remaining
difficult for automated bots to solve. The test requires users to respond correctly to a question or perform some task
to prove that they are human. Different CAPTCHAs have been developed over the years to keep unwanted programs from
accessing web resources. However, some of them have been broken with success rates of about 100%, which creates
the need for a new CAPTCHA scheme to enhance the existing ones. This paper presents Signature CAPTCHA (Sig-CAPTCHA),
a new image degradation scheme that places handwritten signatures on document images to
produce a hard-to-break CAPTCHA test. The system was tested with a state-of-the-art optical character recognition (OCR)
program and a human user study to determine the recognizability gap between humans and automated programs. Both the OCR
program and the human users were asked to recognize 100 randomly rendered Sig-CAPTCHA images. The results
showed that 96.12% of the 103 human users passed the test, while the OCR program recognized 14.0%
of the 100 Sig-CAPTCHA tests presented. Sig-CAPTCHA therefore provides an effective way to degrade character images for
CAPTCHA designs, helping to curb automated programs and to secure web services and
applications.
Keywords: CAPTCHA, Sig-CAPTCHA, Recognition, Degradation, Security
1. INTRODUCTION
Automated script attacks, known as bots, are the central threat to computer security (Xu et al., 2003; Longe et al., 2009). A
bot, as defined by Rui and Liu (2010), is any type of autonomous software that operates as an agent for a user or a program, or
simulates a human activity. Examples of bots include spambots, shopbots, spiderbots, chatbots, knowbots and many
others. Bots become a problem when they are used to gain unauthorized access to a web user's account. Malicious code such as
viruses, spyware and other malware has been developed to eavesdrop on, destroy and transfer vital information to
unknown destinations, threatening the web and the services it
renders. CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are tests used to control
automated bot access to web services. They are security mechanisms used to verify whether a human or an automated bot
is gaining access to website resources. CAPTCHAs are meant to be easily solved by humans while remaining difficult
for automated bots. A CAPTCHA is based on text, audio, image or video that appears in twisted and/or distorted form.
Existing CAPTCHA algorithms have been vulnerable to attacks, and some have been broken with success rates of almost 100%
because degradation types were not properly used (Ahmad and Yan, 2010).
Annotations such as comments, seals, marks, highlights and signatures placed directly on top of the characters of a paper
document are considered a source of noise that significantly degrades document images (Sandhya et al., 2012; Lins,
2009). OCR machines fail to recognize the characters beneath such annotations because the annotations vary in orientation and
shape. Sig-CAPTCHA combines, in a single image, common clutter features such as
lines, dots, circles, arcs and rectangles of varying orientation, shape, thickness, length and width. Of all the
common document image defect models used in CAPTCHA design, human annotations have rarely been explored.
International Conference on Science, Technology, Education, Arts, Management and Social Sciences
This study explores this type of noise and uses it to produce a new technique for CAPTCHA development that is
unrecognizable to automated bots. The paper is divided into five main sections. Section two discusses related work.
Section three presents the methodology, while Section four shows the results obtained from the tests. Section five concludes
the paper and makes suggestions for further studies.
2. RELATED LITERATURE
A number of text-based CAPTCHAs have been developed over the years. CAPTCHA design requires that degradations
be applied to images to make them resistant to automated bot programs. The EZ-Gimpy and Gimpy CAPTCHA tests
were developed at Carnegie Mellon University (CMU) in 2000 and used by Yahoo for free e-mail services and to protect
chat rooms from spammers (Chellapilla et al., 2005; Banday and Shah, 2009). This type of CAPTCHA was extensively
adopted on many web sites until 2003, when it was broken by Greg Mori and Jitendra Malik of the Computer Vision Group at the
University of California, Berkeley. Treating the challenge as a generic object recognition problem and exploiting its
limited dictionary (860 words), they achieved a 92% success rate on EZ-Gimpy and a 33% success rate on Gimpy
(Basso and Bergadano, 2010). Gimpy
CAPTCHAs are an improvement on EZ-Gimpy. They are rendered in various FreeType fonts and show five pairs of
overlapping words, three of which the user must identify. Degradation (background grids and gradients, non-linear
deformations, blurring, occlusions and additive pixel noise) is performed using the Gimpy tools (Chew and Baird, 2003).
BaffleText is another text-based CAPTCHA that served as an enhanced version of the Gimpy CAPTCHAs, with about 2758
character strings. Chew and Baird (2003) and Basso and Bergadano (2010) describe it as a test based on
the psychophysics of human reading that uses random masking to degrade images of non-English pronounceable
character strings to defend against restoration attacks. BaffleText is considered highly secure, but its
rate of human legibility remains a problem. Another text-based CAPTCHA, developed in 2008, is reCAPTCHA, which is
considered one of the most popular CAPTCHAs in use today (Azad and Jain, 2013). It is widely used by
Facebook, Twitter and other social networks, with their millions of users, to protect their sites. Developed by
von Ahn et al. (2008), it is considered a more secure CAPTCHA. It uses words that were first scanned for book digitization projects. The test is based on
two different words taken from digitized texts: one is not recognizable by OCR software, while the answer for the other is
known. Both words are inserted in an image, which is then visually distorted. The user is required to type both words in
an input field. If the user solves the one for which the answer is known, the system assumes that the other answer is also correct.
The same image is then presented to other users in different CAPTCHAs to increase confidence in the correct answer
(Basso and Bergadano, 2010).
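The two-word check described above can be sketched in a few lines. This is a minimal illustration and not reCAPTCHA's actual implementation: the function names `check_response` and `best_guess`, the in-memory `votes` store and the case-insensitive comparison are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown word id -> tally of answers typed by users.
votes = defaultdict(Counter)


def check_response(control_answer, typed_control, unknown_id, typed_unknown):
    """Pass the user if the control word matches; record the other word as a vote."""
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        # The answer to the unknown word is assumed correct and kept as one vote.
        votes[unknown_id][typed_unknown.strip().lower()] += 1
    return passed


def best_guess(unknown_id):
    """Return (answer, vote_count) for the most frequent answer, or None if no votes."""
    if not votes[unknown_id]:
        return None
    return votes[unknown_id].most_common(1)[0]
```

Showing the same unknown word to several users and keeping the most frequent answer is what raises confidence in the transcription; a real deployment would also need a vote threshold, which is left out here.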
ScatterType challenges are images of machine-printed text whose characters are cut into pieces that then drift apart,
in an attempt to frustrate segment-then-recognize computer vision attacks (Baird et al., 2005). This type of CAPTCHA was
proposed to defeat segmentation attacks by shattering each letter with horizontal and vertical cuts
and letting the pieces overlap randomly. Its security level is very high; its major problem is that the human legibility
rate is only about 53%. Pessimal Print is a text-based CAPTCHA that simulates dirty scans of printed text and has
proven extremely difficult to break. Coates, Baird and Fateman developed the model at UCB (Coates et al., 2001). They
used a model of document image degradations that approximates ten aspects of the physics of machine printing and imaging
of text, including spatial sampling rate and error, affine spatial deformations, jitter, speckle, blurring, thresholding and
symbol size (Baird, 1993; Baird and Popat, 2002).
Rusu et al. (2010) developed a handwritten text-based CAPTCHA. This form of CAPTCHA collects existing
or synthetically generated United States city names and applies degradations
such as lines, grids, fragmentation, gaps, arcs, waves, stretching, rotation, strokes, compression, blur, occlusion and noise to
build the CAPTCHA test. To preserve human recognizability, the Gestalt and Geon theories of human visual
perception were used to determine the proper placement of the various degradation types. Although the handwritten
CAPTCHA was able to obfuscate computer recognizers because of the variation in handwriting, the scheme still
posed many challenges to human recognition. Human users needed considerable time and focus to
recognize the CAPTCHA image, despite the attention paid to human visual perception
in its design. In addition, using only US city names made the scheme vulnerable to bot attack, since knowledge of
all US city names is all an attacker requires in order to gain access to the web resources using a simple random guessing
algorithm. Figure 1 presents samples of the different CAPTCHAs discussed.
[Figure 1: Samples of the CAPTCHAs discussed. Panels: EZ-Gimpy, Gimpy CAPTCHA, reCAPTCHA, BaffleText CAPTCHA, ScatterType CAPTCHA, PessimalPrint CAPTCHA, Handwritten CAPTCHA]
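The dictionary weakness noted for EZ-Gimpy and for the city-name scheme can be made concrete with a small calculation. The sketch below assumes an attacker who recognizes nothing and simply guesses uniformly at random from a known word list, with each attempt independent; under that assumption the chance of at least one hit in k attempts against a list of N words is 1 - (1 - 1/N)^k. The function name and the attempt counts are illustrative, and the size of the US city-name list is not given in the text, so only the 860-word figure quoted above is used.

```python
def guess_success_probability(dictionary_size, attempts):
    """Chance that at least one of `attempts` uniform random guesses is correct."""
    if dictionary_size < 1 or attempts < 0:
        raise ValueError("dictionary_size must be >= 1 and attempts >= 0")
    miss = 1.0 - 1.0 / dictionary_size  # probability that a single guess is wrong
    return 1.0 - miss ** attempts


# 860 is the EZ-Gimpy dictionary size quoted above; the attempt counts are illustrative.
for attempts in (1, 100, 1000):
    print(attempts, round(guess_success_probability(860, attempts), 4))
```

Because a bot can submit attempts cheaply, even a small per-attempt probability becomes significant, which is why a fixed and publicly known word list is a liability.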
[Figure (caption not recovered): Sig-CAPTCHA design diagram. Box labels: Randomly generated characters; Deform the character; Apply signature; Cognitive Aspect; Weakness of OCR; Signature CAPTCHA; Signature CAPTCHA application]
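The stages named in the diagram labels (randomly generated characters, deform the character, apply signature) could be sketched roughly as follows. This is not the authors' implementation, which is not reproduced in this text. It is a pure-Python stand-in under several assumptions: images are lists of rows of 0/1 pixels, the deformation is a simple sine-wave row shift, and the "signature" is a synthetic scribble rather than a real handwritten signature. All function names are hypothetical.

```python
import math
import random
import string


def random_challenge(rng, length=6):
    """Stage 1: pick random characters (the alphabet and length are assumptions)."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def blank(width, height):
    """Create an empty bitmap as a list of rows of 0/1 pixels."""
    return [[0] * width for _ in range(height)]


def shear_rows(bitmap, rng, max_shift=2):
    """Stage 2: deform the characters by shifting each row sideways along a sine wave."""
    height, width = len(bitmap), len(bitmap[0])
    phase = rng.uniform(0, 2 * math.pi)
    out = blank(width, height)
    for y, row in enumerate(bitmap):
        shift = round(max_shift * math.sin(phase + y / 3.0))
        for x, pixel in enumerate(row):
            if pixel and 0 <= x + shift < width:
                out[y][x + shift] = 1
    return out


def signature_strokes(width, height, rng, segments=5):
    """Stage 3a: synthetic scribble of connected, bent strokes of varying thickness."""
    ink = blank(width, height)
    x, y = 0.0, height / 2
    for _ in range(segments):
        nx = min(width - 1, x + width / segments)
        ny = rng.uniform(0, height - 1)
        thickness = rng.randint(1, 2)
        steps = int(4 * (abs(nx - x) + abs(ny - y))) + 1
        for i in range(steps + 1):
            t = i / steps
            px = x + t * (nx - x)
            py = y + t * (ny - y) + 1.5 * math.sin(t * math.pi)  # bend the line into an arc
            for dy in range(thickness):
                xx, yy = int(round(px)), int(round(py)) + dy
                if 0 <= xx < width and 0 <= yy < height:
                    ink[yy][xx] = 1
        x, y = nx, ny
    return ink


def overlay(text_bitmap, ink):
    """Stage 3b: place the signature ink directly on top of the character pixels."""
    return [[a | b for a, b in zip(row_t, row_i)] for row_t, row_i in zip(text_bitmap, ink)]
```

Rendering the challenge string to pixels is left to whatever font renderer is available; the functions only assume a bitmap of the same size as the ink layer. Passing a seeded `random.Random` instance makes a generated image reproducible for testing.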
Table 1: Ability Gap between Human Users and OCR Programs on Sig-CAPTCHA
Sig-CAPTCHA          Recognized    Percentage (%)    Unrecognized    Percentage (%)
                     Characters                      Characters
Automated Program    14            14%               86              86%
Human Users          99            96.12%            4               3.88%
Figure 5: Graph showing the recognizability gap between human users and OCR programs on Sig-CAPTCHA (bar labels: human users 96.12% recognized, 3.88% unrecognized; OCR program 14% recognized, 86% unrecognized)
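The percentages reported in Table 1 follow directly from the counts, and can be checked with a short calculation. The sketch below takes the recognizability gap to be the simple difference between the two recognition rates in percentage points; the paper does not define the gap formally, so that definition and the function name are assumptions.

```python
def recognition_rate(recognized, total):
    """Percentage of challenges (or users) that were recognized correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * recognized / total


human_rate = recognition_rate(99, 103)  # 99 of 103 human users passed
ocr_rate = recognition_rate(14, 100)    # 14 of 100 images read by the OCR program
gap = human_rate - ocr_rate             # assumed definition: difference in points

print(f"human {human_rate:.2f}%  OCR {ocr_rate:.2f}%  gap {gap:.2f} points")
```

With the counts from Table 1 this gives 96.12% for human users, 14.00% for the OCR program, and a gap of 82.12 percentage points.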
5. CONCLUSION
In this study, we have explored a novel CAPTCHA scheme that uses handwritten signatures as an additional degradation type
to improve on the conventional defect types employed in CAPTCHA development. Handwritten signatures
combine a variety of common defect types such as circles, lines, curves and many more to automatically generate defects
of varying length, width and density. We have shown that Sig-CAPTCHA can degrade character images, thereby
serving as a new mechanism to obfuscate automated programs. However, using signatures alone to resist automated bots
may not be advisable. Further research should be geared towards combining the signature degradation type with other defect types to
produce CAPTCHA schemes that are more resistant to automated bot attacks, thereby enhancing web security.
REFERENCES
1. von Ahn, L., Maurer, B., McMillen, C., Abraham, D. and Blum, M. (2008). reCAPTCHA: Human-Based Character
Recognition via Web Security Measures, In Science Express, Vol. 321, No. 5895, pp. 1465-1468.
2. Ahmad, A.S. and Yan, J. (2010). Colour, Usability and Security: A Case Study. Technical Report Series, Newcastle
University, England.
3. Azad, S. and Jain, K. (2013). CAPTCHA: Attacks and Weaknesses against OCR Technology, Global Journal of
Computer Science and Technology, Vol. 13, No. 3, Version 1.0, ISSN 0975-4172.
4. Baird, H.S. (1993). Document Image Defect Models and Their Uses, Proceedings of the International Conference on
Document Analysis and Recognition, Tsukuba Science City, Japan, pp. 62-66.
5. Baird, H.S. and Popat, K. (2002). Web Security and Document Image Analysis, In Web Document Analysis:
Challenges and Opportunities, Antonacopoulos, A. and Hu, J. (eds.), World Scientific Publishing Co., pp. 257-272.
6. Baird, H.S., Moll, M.A. and Wang, S. (2005). ScatterType: A Legible but Hard-to-Segment CAPTCHA, In
Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR05).
7. Banday, M.T. and Shah, N.A. (2009). Image Flip CAPTCHA, The ISC International Journal of Information
Security, ISSN 2008-2045 and 2008-3076, Tehran, Iran, Vol. 1, No. 2, pp. 103-121. Available online at
http://www.isecure-journal.org.
8. Basso, A. and Bergadano, F. (2010). Anti-bot Strategies Based on Human Interactive Proofs, In Handbook of
Information and Communication Security, Stavroulakis, P. and Stamp, M. (eds.), DOI 10.1007/978-1-84882-2684-7, ISBN 978-3-642-04116-7, Springer-Verlag Berlin Heidelberg, pp. 273-291.
9. Chellapilla, K., Larson, K., Simard, P. and Czerwinski, M. (2005). Designing Human Friendly Human Interaction
Proofs (HIPs), CHI 2005: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
Portland, Oregon, USA, pp. 711-720.
10. Chew, M. and Baird, H.S. (2003). BaffleText: A Human Interactive Proof, Proc. of the 10th SPIE/IS&T Document
Recognition and Retrieval Conference, Vol. 4670, SPIE, Santa Clara.
11. Coates, A.L., Baird, H.S. and Fateman, R.J. (2001). Pessimal Print: A Reverse Turing Test, Proc. of the 6th
International Conference on Document Analysis and Recognition (ICDAR 2001), Seattle, WA, USA.
12. Lins, R.D. (2009). A Taxonomy of Noise in Images of Paper Documents: The Physical Noise, Image Analysis
and Recognition, pp. 844-854, Springer.
13. Longe, O.B., Robert, A.B.C. and Ugochukwu, O. (2009). Double CAPTCHA Response System: An Enhanced
Authentication Scheme for Checking Internet Masquerading, Intl Conf. on Adaptive Science and Tech. Available
online at: http://www.edictech.com/ICAST09/ICAST2009Program.pdf.
14. Rui, Y. and Liu, Z. (2010). System and method for devising a human interactive proof that determines whether a
remote client is a human or a computer program. Available online at: http://www.google.com/patents/US7725395
15. Rusu, A., Rebecca, D. and Rusu, A. (2010). Leveraging Cognitive Factors in Securing WWW with CAPTCHA,
Proceedings of the 2010 USENIX Conference on Web Application Development, Boston, MA, pp. 5-15.
16. Sandhya, N., Krishnan, R. and Babu, D.R. (2012). A Language Independent Characterization of Document Image
Noise in Historical Scripts, International Journal of Computer Applications (0975-8887), Vol. 50, No. 9, pp. 11-18.
17. Xu, J., Lipton, R., Essa, I. and Sung, M. (2003). Mandatory Human Participation: A New Authentication Scheme
for Building Secure Systems, Proceedings of the 12th International Conference on Computer
Communications and Networks, pp. 547-552.
AUTHORS' BIOGRAPHIES
Uyinomen O. Ekong is a Lecturer at the Department of Computer Science, University of Uyo,
Akwa Ibom State. She obtained a B.Sc degree in Computer Science from Ambrose Alli University,
Edo State Nigeria in 2002, and an M.Sc. degree in Management Information Systems (MIS)
specializing in mobile computing from Covenant University, Ota, Nigeria in 2006. She is currently
working towards the Ph.D degree in the Department of Computer Science, University of Benin,
Benin City, Nigeria. Her current research interests include the application of Artificial Intelligence
methods and techniques for network security, Human Interactive Proofs, Mobile Computing, E-commerce and E-government. She is a member of the Nigeria Computer Society (NCS) and the Institute
of Electrical and Electronics Engineers (IEEE). She can be reached by phone on +2348051035870
and through e-mail at uyinomenekong@uniuyo.edu.ng or uyiekong@yahoo.com .
Professor (Mrs.) Stella C. Chiemeke is a Professor of Computer Science and currently the
Director of UNIBEN ICT Centre, University of Benin, Benin City, Nigeria. She received her B.Sc.
and M.Sc. in Computer Science from University of Lagos in 1986 and 1992 respectively. She also
obtained her PhD in Computer Science from the Federal University of Technology, Akure, Nigeria
in 2004. Prof. (Mrs.) Chiemeke joined the services of the University of Benin as an Assistant
Lecturer in the Department of Computer Science in the year 1994 and then rose to the current
position of Professor in the year 2009. Her teaching and research interests span from software
engineering to industrial applications of ICT. She has authored and co-authored over sixty-five (65)
articles in reputable local, national and international journals. She is a member of the International Association of Engineers
(IAENG), Computer Professionals of Nigeria (CPN), Nigeria Computer Society (NCS) and International Network of Women
Engineers and Scientists (INWES), Canada, among others. She has served in various administrative capacities within and outside the
university community, including Acting Head of the Department of Computer Science and Assistant Dean of the Faculty of
Physical Sciences. She can be reached by phone on +2348023158911 and through e-mail at schiemeke@uniben.edu.
Dr. Longe Olumide is an Associate Professor of Computing & Information Security at the
Department of Computer Science and Information Systems, Adeleke University, Ede, State of
Osun, Nigeria. He obtained a BSc Computer Science at the University of Benin, Benin City,
Nigeria in 1998, a Master of Technology Degree in Computer Science at the Federal University of
Technology, Akure in 2005 and a PhD Degree in Computer Science from the University of Benin,
Benin City, Nigeria in 2010. A recipient of several International and National awards and
recognitions, his research is focused on using social theories, machine learning and computer
security models to design cyber security systems and explain cyber victimization. Dr. Longe is a
distinguished Fulbright Scholar and was listed in Marquis Who's Who in the World in 2014. He
can be reached by phone on +18572078409 and through e-mail at longeolumide@fulbrightmail.org.
Victor Eshiet Ekong is a Lecturer in the Department of Computer Science, University of Uyo,
Akwa Ibom State. He obtained a B.Sc degree in Computer Science from University of Uyo in 1998
and M.Sc. degree in Computer Science from University of Benin in 2003. He is currently working
towards obtaining a Ph.D degree in Computer Science in the Department of Computer Science,
University of Benin, Benin City. His research interests include Artificial Intelligence, Software
Engineering, Applied Computational Intelligence and Cognitive Science, and design theories for
medical informatics and e-Health systems. He is a member of the Nigerian Computer Society
(NCS), Computer Professionals (Registration Council) of Nigeria (CPN), Institute of Electrical and Electronics Engineers
(IEEE) and International Association of Engineers (IAENG). He can be reached by phone on +2348056043359 and through
e-mail at victoreekong@uniuyo.edu.ng.