Professional Documents
Culture Documents
INTRODUCTION
1.1 Overview
Completely Automated Public Turing Tests [1] to Tell Computers and Humans Apart
[2] (CAPTCHAs) are standard security mechanisms for defending against undesirable
and malicious bot programs on the Internet (especially those bots that can sign up for
thousands of accounts a minute with free email service providers, send out thousands
of spam messages in an instant, or post numerous comments in blogs pointing both
readers and search engines to irrelevant sites). CAPTCHAs generate and grade tests
that most humans can pass but current computer programs cannot. Such tests often
called CAPTCHA challenges are based on hard, open artificial intelligence.
Use of INTERNET has remarkably increased all over the world in the past two
decades and Financial dealing are made over the internet ,so the need of security
increases.
To sign up for a free email service offered by Gmail or Yahoo we need to submit web
application, everyone first have to pass a test. It's not a hard test in fact, that's the
point. For humans, the test should be simple and straightforward. But for a computer,
the test should be almost impossible to solve.
This sort of test is a CAPTCHA. They're also known as a type of Human
Interaction Proof (HIP). CAPTCHA tests are easily seen on lots of Web sites. The
most common form of CAPTCHA is an image of several distorted letters. We have to
type the correct series of letters into a form. If letters match the ones in the distorted
image, we pass the test.
CAPTCHAs are short for Completely Automated Public Turing test to tell
Computers and Humans Apart. The term "CAPTCHA" was coined in 2000 by Luis
Von Ahn, Manuel Blum, Nicholas J. Hopper (all of Carnegie Mellon University,
and John Langford (then of IBM). They are challenge-response tests to ensure that the
users are indeed human. The purpose of a CAPTCHA is to block form submissions
from spam bots automated scripts that harvest email addresses from publicly
available web forms. A common kind of CAPTCHA used on most websites requires
the users to enter the string of characters that appear in a distorted form on the screen.
1
CAPTCHAs are used because of the fact that it is difficult for the computers to extract
the text from such a distorted image, whereas it is relatively easy for a human to
understand the text hidden behind the distortions. Therefore, the correct response to a
CAPTCHA challenge is assumed to come from a human and the user is permitted into
the website.
Why would anyone need to create a test that can tell humans and computers apart? It's
because of people trying to game the system, they want to exploit weaknesses in the
computers running the site. While these individuals probably make up a minority of
all the people on the Internet, their actions can affect millions of users and Web sites.
For example, a free e-mail service might find itself bombarded by account requests
from an automated program. That automated program could be part of a larger
attempt to send out spam mail to millions of people. The CAPTCHA test helps
identify which users are real human beings and which ones are computer program
Spammers are constantly trying to build algorithms that read the distorted text
correctly. So strong CAPTCHAs have to be designed and built so that the efforts of
the spammers are thwarted.
1.2 Motivation:
The proliferation of the publicly available services on the Web is a boon for
the community at large. But unfortunately it has invited new and novel abuses.
Programs (bots and spiders) are being created to steal services and to conduct
fraudulent transactions. Some examples:
Free online accounts are being registered automatically many times and
Online polls are attacked by bots and are susceptible to ballot stuffing.
This gives unfair mileage to those that benefit from it.
In light of the above listed abuses and much more, a need was felt for a facility that
checks users and allows access to services to only human users. It was in this
direction that such a tool like CAPTCHA was created.
1.3 Background:
The need for CAPTCHAs rose to keep out the website/search engine abuse by
bots. In 1997, AltaVista sought ways to block and discourage the automatic
submissions of URLs into their search engines. Andrei Broder, Chief Scientist of
AltaVista, and his colleagues developed a filter. Their method was to generate a
printed text randomly that only humans could read and not machine readers. Their
approach was so effective that in a year, spam-add-ons were reduced by 95% and a
patent was issued in 2001.
In 2000, Yahoos popular Messenger chat service was hit by bots which
pointed advertising links to annoying human users of chat rooms. Yahoo, along with
Carnegie Mellon University, developed a CAPTCHA called EZ-GIMPY, which chose
a dictionary word randomly and distorted it with a wide variety of image occlusions
and asked the user to input the distorted word.
In November 1999, slashdot.comreleased a poll to vote for the best CS
College in the US. Students from the Carnegie Mellon University and the
Massachusetts Institute of Technology created bots that repeatedly voted for their
respective colleges. This incident created the urge to use CAPTCHAs for such online
polls to ensure that only human users are able to take part in the polls.
removing image noise and clutter items, and isolating individual characters within the
captcha in order to allow optical character recognition (OCR) technologies a
maximised opportunity for success. Attacking heuristics often have a parameterised
design, so that their behaviour may be adjusted to attack several different but related
forms of captcha.
Mechanical Turk attacks
Here, the problem of solving the captcha is automatically outsourced to a paid
human agent. They immediately solve the challenge and quickly return the answer to
the automated agent in real time. The automated agent then presents the human
provided answer, and is able to programatically exploit the online resource (Barr and
Cabrera, 2006; Bursztein, et al., 2010; Economist, 2008; Websense Security Labs,
2008b). A human Turk agent working full time to support such attacks can solve
thousands of captchas per hour, depending on the type of captcha. There is little that
can be done to defend against such attacks, other than to perhaps raise the
inconvenience of the captcha for all users in order to reduce the economic viability of
this attack.
Trivial guessing attacks
If there is an unlimited range of challenges, but a very limited range of possible
answers (e.g., which of these 10 choices is correct?), a high success rate may be
achieved by an attacking program by merely guessing randomly from the available
answers. Particularly, any graphical captcha that requires the user to select a correct
position within an image but which has a wide error tolerance for user inaccuracy may
be vulnerable to a trivial guessing attack.
Brute force attacks
If there is a somewhat limited range of possible answers e.g., a numerical 4digit
captcha would have 10,000 possible answers then it is possible for a distributed
group of automated agents to attack the captcha by exhaustively trying answers at
random or according to a selected sequence. This differs from the trivial guessing
attack, in that it relies upon having access to a large number of attacking agents i.e., a
botnet (Websense Security Labs, 2008b; 2009) rather than relying upon having
access to a poorly designed captcha.
6
Hybrid attacks
It is possible to combine these attacks. For example, if a signal processing attack can
estimate five of six captcha characters with a high degree of confidence, a guess may
be made on the remaining character, yielding a success rate of between 1.5% (mixed
case alphanumerical characters) and 10% (numerical digits). For example, the
QuestionBased captcha (ShiraliShahreza and ShiraliShahreza, 2007b) presents a
mathematical problem, which can be broken by an attacker who uses OCR to
recognise the numerical digits mentioned in the puzzle, combined with a random
guess of one of the few possible ways in which the numbers may be combined
arithmetically.
1.6 Thesis Contribution
1. To study various types of existing captcha system such as Text based Captcha and
picture based Captcha.
2. To proposes a solution for improving web security from Web Bots (Robots) by
implementing WORD GROUPING CAPTCHA.
3. To discuss various types of evaluation parameter: Consistency, Ease of generation,
Entropy and Implementation.
4. To compare existing and proposed Captcha system based on different parameters.
1.7 Thesis Outline
Thesis discusses existing captcha system, weakness of existing captcha system. A new
captcha approach Word grouping captcha is proposed. Chapter 2 provides an
overview of existing Captcha techniques including Text Captcha, Graphic Captcha,
Audio Captcha, reCAPTCHA and book digitization. It also discusses different
applications of Captcha. Chapter 3 discusses the Literature Review. Chapter 4
discusses the project description. Chapter 5 presents various Captcha evaluation
parameters and comparing Text based captcha; Picture based captcha and Word
grouping captcha. Chapter 6 presents experimental result and analysis. Chapter 7
concludes providing suggestion for future work to be done in this area.
CHAPTER 2
CAPTCHA OVERVIEW
2.1.2 Ez Gimpy:
This is a simplified version of the Gimpy CAPTCHA, adopted by Yahoo in
their signup page. Ez Gimpy randomly picks a single word from a dictionary and
applies distortion to the text. The user is then asked to identify the text correctly.
finans
XTNM5YRE
L9D28229B
Fig 2.4 MSN Passport CAPTCHA
2.2.1 Bongo:
Bongo, Another example of a CAPTCHA is the program we call BONGO.
BONGO is named after M.M. Bongard, who published a book of pattern recognition
problems in the 1970s. BONGO asks the user to solve a visual pattern recognition
problem. It displays two series of blocks, the left and the right. The blocks in the left
series differ from those in the right, and the user must find the characteristic that sets
them apart. A possible left and right series is shown in Figure 2.5
10
11
2.2.2.2. Asirra
Another Image CAPTCHA is Asirra Stands for Animal Species Image
Recognition for Restricting Access is a cat or dog labelling based CAPTCHA design.
In this test user has to select all the pictures of cat. Asirra is randomly choosing
images from petfinder.com. Snapshot of Asirra is shown in figure 2.7.
is
as
service
paid
is
Shows
nine
images in 3 by 3 grid and user is asked to choose all the images of cat one by one until
images become dogs images. A snapshot of it is shown in Figure 2.8.The dog is
randomly placed among nine cats and the process is repeated for three times.
the image and the user is asked to select a relevant text label. A snapshot of Multi
Model CAPTCHA is shown in Figure 2.9
Fig
Multi
2.9
Model
Captcha
2.2.2.5
Dynamic
Image Based
CAPTCHA
Another improved image CAPTCHA is Dynamic Image Based CAPTCHA
(DIBC).In this CAPTCHA system user is required to recognize the exact matching
image or images to pass the Turing test. An image is selected randomly from image
database and is placed in a grid of six images random number of times. User is
supposed to submit all the correct version of the filtered image for clearing the Turing
test in maximum of 5 attempts.It is shown in Figure 2.10
13
14
15
Fig
2.13
(reCAPTCHA) first line shows scanned text, second line shows text read by OCR
2.5 Applications
CAPTCHAs are used in various Web applications to identify human users and
to restrict access to them. Some of them are:
1. Online Polls [5]: As mentioned before, bots can wreak havoc to any
unprotected online poll. They might create a large number of votes which
would then falsely represent the poll winner in spotlight. This also results in
decreased faith in these polls. CAPTCHAs can be used in websites that have
embedded polls to protect them from being accessed by bots, and hence bring
up the reliability of the polls.
2. Protecting Web Registration [6]: Several companies offer free email and
other services. Until recently, these service providers suffered from a serious
16
problem bots. These bots would take advantage of the service and would
sign up for a large number of accounts. This often created problems in account
management and also increased the burden on their servers. CAPTCHAs can
effectively be used to filter out the bots and ensure that only human users are
allowed to create accounts.
3. Preventing comment spam [7]: Most bloggers are familiar with programs
that submit large number of automated posts that are done with the intention of
increasing the search engine ranks of that site. CAPTCHAs can be used before
a post is submitted to ensure that only human users can create posts. A
CAPTCHA won't stop someone who is determined to post a rude message or
harass an administrator, but it will help prevent bots from posting messages
automatically.
4. Search engine bots: It is sometimes desirable to keep web pages unindexed to
prevent others from finding them easily. There is an html tag to prevent search
engine bots from reading web pages. The tag, however, doesn't guarantee that
bots won't read a web page; it only serves to say "no bots, please." Search
engine bots, since they usually belong to large companies, respect web pages
that don't want to allow them in. However, in order to truly guarantee that bots
won't enter a web site, CAPTCHAs are needed.
5. E-Ticketing: Ticket brokers like Ticketmaster also use CAPTCHA
applications. These applications help prevent ticket scalpers from bombarding
the service with massive ticket purchases for big events. Without some sort of
filter, it's possible for a scalper to use a bot to place hundreds or thousands of
ticket orders in a matter of seconds. Legitimate customers become victims as
events sell out minutes after tickets become available. Scalpers then try to sell
the tickets above face value. While CAPTCHA applications don't prevent
scalping; they do make it more difficult to scalp tickets on a large scale.
6. Email spam: CAPTCHAs also present a plausible solution to the problem of
spam emails. All we have to do is to use a CAPTCHA challenge to verify that
an indeed a human has sent the email.
7. Preventing Dictionary Attacks [8, 9, 10]: CAPTCHAs can also be used to
prevent dictionary attacks in password systems. The idea is simple: prevent a
17
computer from being able to iterate through the entire space of passwords by
requiring it to solve a CAPTCHA after a certain number of unsuccessful
logins. This is better than the classic approach of locking an account after a
sequence of unsuccessful logins, since doing so allows an attacker to lock
accounts at will.
8. As a tool to verify digitized books: This is a way of increasing the value of
CAPTCHA as an application. An application called reCAPTCHA harnesses
users responses in CAPTCHA fields to verify the contents of a scanned piece
of paper. Because computers arent always able to identify words from a
digital scan, humans have to verify what a printed page says.
Then its possible for search engines to search and index the contents of a
scanned document. This is how it works: The application already recognizes
one of the words. If the visitor types that word into a field correctly, the
application assumes the second word the user types is also correct. That
second word goes into a pool of words that the application will present to
other users. As each user types in a word, the application compares the word to
the original answer. Eventually, the application receives enough responses to
verify the word with a high degree of certainty. That word can then go into the
verified pool.
18
CHAPTER 3
Literature Review
are shown and corresponding to each pic there is drop down list having few
options.
Weakness of Picture based Captcha:
It creates a problem to users having low vision or learning disability [15].
Most of the time object Recognition becomes cumbersome due to the
ambiguity present in image objects. Instead of Turing test it has become
almost an IQ test
20
importantly, what really matters for visually impaired users is that the html image
alternative text attached to any of the above symbol should clearly indicate the need to
solve an audio CAPTCHA.
When embedded in web pages, audio CAPTCHAs can also cause compatibility
issues. For example, many such schemes require JavaScript to be enabled. However,
some users might prefer to disable JavaScript in their browsers. Some other schemes
can be even worse. For example, we found that one audio scheme requires Adobe
Flash support. With this scheme, vision-impaired users will not even notice that such
a CAPTCHA challenge exist in the page, unless Flash is installed in their computers apparently, no text alternative is attached to the speaker-like Flash object, either.
22
24
25
26
CHAPTER 4
PROJECT DESCRIPTION
pointing both readers and search engines to irrelevant sites). CAPTCHAs generate
and grade tests that most humans can pass but current computer programs cant. Such
tests are often called CAPTCHA challenges that are based on hard, open artificial
intelligence problems.
To date, the most commonly used CAPTCHAs are text-based, in which the challenge
appears as an image of distorted text that the user must decipher and retype. These
schemes typically exploit the difficulty for state-of-the-art computer programs to
recognize distorted text. Well-known examples include EZ-Gimpy, Gimpy, and
Gimpy-r, all developed at Carnegie Mellon University. Google, Microsoft, and Yahoo
have also developed and deployed their own text CAPTCHAs. A good CAPTCHA
must not only be human friendly but also robust enough to resist computer programs
that attackers write to automatically pass CAPTCHA tests. However, designing
CAPTCHAs that exhibit both good robustness and usability is much harder than it
might seem. The current collective understanding of this topic is very limited, as
suggested by the fact that many well-known schemes break. In particular, we recently
found that we could break a widely deployed CAPTCH which is carefully designed
and tuned by Microsoft with a success rate of higher than 60 percent, even though its
design goal was that automated attacks shouldnt achieve a success rate of higher than
0.01 percent. We expect that CAPTCHA will go through the same process of
evolutionary development as cryptography, digital watermarking, and the like, with an
iterative process in which successful attacks lead to the development of more robust
systems.
4.3. System Specification
4.3.1 Hardware Requirements
Processors
Speed
RAM
Hard Disk
Monitor
Key Boards
Mouse
:
:
:
:
:
:
:
28
Pentium IV
2.4GHz
128 MB
20GB
Color Monitor
104 Keys
3 buttons
4.3.2
Software Requirements
Operating System
Language
Database
Server
:
:
:
:
Windows XP/2000
Java (Jdk 1.6)
Oracle 10g, Heidi SQL
Apache Tomcat 7.0.29
The new system requirements are defined in as much details as possible. This
usually involves interviewing a number of users representing all the external or
internal users and other aspects of the existing system.
A first prototype of the new system is constructed from the preliminary design.
This is usually a scaled-down system, and represents an approximation of the
characteristics of the final product.
29
At the customer option, the entire project can be aborted if the risk is deemed too
great.
miscalculation, or any other factor that could, in the customers judgment, result
in a less-than-satisfactory final product.
The existing prototype is evaluated in the same manner as was the previous
prototype, and if necessary, another prototype is developed from it according to
the fourfold procedure outlined above.
The preceding steps are iterated until the customer is satisfied that the refined
prototype represents the final product desired.
30
31
Open URL
Correct Code
User Login
Incorrect code
Verification failed
Sequence diagram are an easy and intuitive way of describing the behavior of a system
by viewing the interaction between the system and its environment. A Sequence diagram
shows an interaction arranged in a time sequence. A sequence diagram has two dimensions:
vertical dimension represents time; the horizontal Dimension represents different objects. The
vertical line is called is the objects life line. The lifeline represents the objects existence
during the interaction.
Sequence Diagram
Browser
Website
LoginController
.java
WorkLogin
.jsp
validatelogin
.java
With
1. Connect
2. Redirect
3. Connects to loginController.java
4. Forward Request
33
34
CHAPTER 5
CAPTCHA EVALUATION PARAMETERS [23]
5.1 Parameters
5.1.1 Consistency
When presented with the same Captcha, how reproducible is a user's answer? The
level of consistency will clearly vary across different Captcha, and the
acceptable level will vary by application (some may be more lenient than
others).
5.1.2 Entropy
By entropy, we mean two things: First do different people answer the same
captcha in the same way? Second how hard is it for an adversary to guess the
users answers?
5.1.3 Ease of generationHow difficult is it to generate a given Captcha? Can it be generated given only
randomness, or does it require a precomputed/pregenerated corpus.
5.1.4 Implementation
Finally, how easy is it to implement? Does it require complex and elaborate
graphics, or can it be implemented for a text-only system? How accessible is
it?
35
5.2.2
Picture Captcha
36
CAPTCHA
SYSTEM
CONSISTENCY
ENTROPY
EASE OF
GENERATION
IMPLEMENTATION
Text Based
Captcha
High
Easy to generate ,
Require text to
rendered and
some randomness
Easy
Picture Based
Medium, Can be
high if user can
recognize picture
correctly
User may
confuse with
z/x, o/a,
C/G,I/l, Q/O,
h/b
Less,
depends on
quality and
difficulty
level of
Picture
Uses large
database of
pictures, so
require large
collection of
pictures.
Word
Grouping
26 possible
outputs,
Large amount
of variation
Limitations,
cannot become
too large, either
user will not able
to recognize some
words
Some implementations
use only a small fixed
pool of CAPTCHA
images. Test can be
broken by simply
looking up solutions in a
table, based on a hash of
the challenge image.
Only require text based
interface
37
CHAPTER 6
EXPERIMENTAL RESULTS AND ANALYSIS
6.1 Text based Captcha
After opening the webpage user will have an option to choose one captcha
among word grouping, picture based captcha and text based captcha. Word
grouping captcha is set as default. If user chooses text based captcha then he
needs to recognize the text given in image. The text will be generated among
letters a to z and numbers 0 to 9. User has to recognize those texts correctly
and will have to write in the box of Enter verification text. After writing those
texts user has to click on verify button.
If user will not write those texts correctly and clicks on verify button then a
message will appear Text Captcha Verification Failed. Please solve it again.
38
41
6.4. Authentication
After successfully solving any one among three captchas the user will be
directed to login page. User has to write user name and assigned password at
the designated place.
Fig.6.8
42
CHAPTER 7
CONCLUSION AND FUTURE WORK
CONCLUSION
Creating a captcha that is so secure that no human can solve it or so user friendly that
it is a trivial task for captcha breaking software is very easy to accomplish. A
successful captcha by its definition is able to tell humans and computers a part. The
goal is to add security features whenever possible as long as they do not significantly
or unnecessarily decrease the accuracy of human solvers. Text based captcha are now
breakable. If we will increase distortion, blurring and other factors, then it will be
hard for human beings also to read those texts while our goal is to differentiate
between humans and computers. Word grouping captcha proposes a solution for this
problem. Its hard to break and user may not find any difficulty in dividing those
words in two subgroups. Since user has to just click on radio buttons, so it is less time
consuming also.
FUTURE WORK
It is not possible to develop a system that makes all the requirements of user. User
requirements keep changing as the system is being used. Some of the future
enhancements that can be done to this system are
It is based on object oriented design, any further changes can be easily adaptable.
Based on the future security issues, security can be improved using emerging
technologies.
In word grouping captcha, in place of clicking radio buttons, user may be asked to
type those six words in two subcategory.
More research work needs to be done by researcher to create captcha based on
different AI problems, hopefully they will become more user-friendly with disabilities
43
REFRENCES
1.Moni Naor Verification of a human in the loop or identification via the Turing
test.Unpublished Manuscript,1997. A preliminary draft available on
170.
11. http://firstmonday.org/ojs/index.php/fm/article/viewArticle/3630/3145
12.http://www.informationweek.com/news/internet/webdev/showArticle.jhtml?
articleID=205900620
44
13. http://www.theregister.co.uk/2008/02/25/gmailcaptchacrack/
14. Rizwan Ur Rahman Survey on Captcha systems In Journal of Global
Research in Computer Science, Volume 3, No. 6, June 2012, Pages. 55-57
15.Moin Mahmud Tanvee, Mir Tafseer Nayeem, Md. Mahmudul Hasan RafeeMove
& Select: 2-Layer CAPTCHA based on Cognitive Psychology for securing web
services. Taken from ijens.org, available at
http://www.ijens.org/Vol_11_I_05/117005-8383-IJVIPNS-IJENS.pdf
16.Elie Bursztein, Matthieu Martin, John C. Mitchell Text-based CAPTCHA
Strengths and Weaknesses in ACM Computer and Communication security 2011
(CSS2011) Pages 11-12
17.E. Bursztein and S. Bethard, 2009. Decaptcha: Breaking 75% of eBay audio
CAPTCHAs, Proceedings of WOOT 09: Third USENIC Workshop on Offensive
Technologies (10 August Montreal, Canada), at
http://www.usenix.org/event/woot09/tech/full_papers/bursztein.pdf 10 February
2012. Page-3
18.Jennifer Tam, Jiri Simsa, Sean Hyde, Luis Von Ahn Breaking Audio
CAPTCHAs Carnegie Mellon University page -5
www.captcha.net/Breaking_Audio_CAPTCHAs.pdf
19.Andrew Kirillov. aforge framework. http://www.aforgenet.com/framework/
20.C.Cortes and V. Vapnik. Support vector networks.
Machine learning,
45