Body Report

CHAPTER 1
INTRODUCTION
1.1 Overview
Completely Automated Public Turing Tests [1] to Tell Computers and Humans Apart
[2] (CAPTCHAs) are standard security mechanisms for defending against undesirable
and malicious bot programs on the Internet (especially those bots that can sign up for
thousands of accounts a minute with free email service providers, send out thousands
of spam messages in an instant, or post numerous comments in blogs pointing both
readers and search engines to irrelevant sites). CAPTCHAs generate and grade tests
that most humans can pass but current computer programs cannot. Such tests often
called CAPTCHA challenges are based on hard, open artificial intelligence.
Use of INTERNET has remarkably increased all over the world in the past two
decades and Financial dealing are made over the internet ,so the need of security
increases.
To sign up for a free email service offered by Gmail or Yahoo we need to submit web
application, everyone first have to pass a test. It's not a hard test in fact, that's the
point. For humans, the test should be simple and straightforward. But for a computer,
the test should be almost impossible to solve.
This sort of test is a CAPTCHA. They're also known as a type of Human
Interaction Proof (HIP). CAPTCHA tests are easily seen on lots of Web sites. The
most common form of CAPTCHA is an image of several distorted letters. We have to
type the correct series of letters into a form. If letters match the ones in the distorted
image, we pass the test.
CAPTCHAs are short for Completely Automated Public Turing test to tell
Computers and Humans Apart. The term "CAPTCHA" was coined in 2000 by Luis
Von Ahn, Manuel Blum, Nicholas J. Hopper (all of Carnegie Mellon University,
and John Langford (then of IBM). They are challenge-response tests to ensure that the
users are indeed human. The purpose of a CAPTCHA is to block form submissions
from spam bots automated scripts that harvest email addresses from publicly
available web forms. A common kind of CAPTCHA used on most websites requires
the users to enter the string of characters that appear in a distorted form on the screen.
1
CAPTCHAs are used because of the fact that it is difficult for the computers to extract
the text from such a distorted image, whereas it is relatively easy for a human to
understand the text hidden behind the distortions. Therefore, the correct response to a
CAPTCHA challenge is assumed to come from a human and the user is permitted into
the website.
Why would anyone need to create a test that can tell humans and computers apart? It's
because of people trying to game the system, they want to exploit weaknesses in the
computers running the site. While these individuals probably make up a minority of
all the people on the Internet, their actions can affect millions of users and Web sites.
For example, a free e-mail service might find itself bombarded by account requests
from an automated program. That automated program could be part of a larger
attempt to send out spam mail to millions of people. The CAPTCHA test helps
identify which users are real human beings and which ones are computer program
Spammers are constantly trying to build algorithms that read the distorted text
correctly. So strong CAPTCHAs have to be designed and built so that the efforts of
the spammers are thwarted.
1.2 Motivation:
The proliferation of the publicly available services on the Web is a boon for
the community at large. But unfortunately it has invited new and novel abuses.
Programs (bots and spiders) are being created to steal services and to conduct
fraudulent transactions. Some examples:
Free online accounts are being registered automatically many times and
are being used to distribute stolen or copyrighted material.

Recommendation systems are vulnerable to artificial inflation or deflation
of rankings. For example, EBay, a famous auction website allows users to
rate a product. Abusers can easily create bots that could increase or
decrease the rating of a specific product, possibly changing peoples
perception towards the product.

Spammers register themselves with free email accounts such as those
provided by Gmail or Hotmail and use their bots to send unsolicited mails
to other users of that email service.
2
Online polls are attacked by bots and are susceptible to ballot stuffing.
This gives unfair mileage to those that benefit from it.
In light of the above listed abuses and much more, a need was felt for a facility that
checks users and allows access to services to only human users. It was in this
direction that such a tool like CAPTCHA was created.
1.3 Background:
The need for CAPTCHAs rose to keep out the website/search engine abuse by
bots. In 1997, AltaVista sought ways to block and discourage the automatic
submissions of URLs into their search engines. Andrei Broder, Chief Scientist of
AltaVista, and his colleagues developed a filter. Their method was to generate a
printed text randomly that only humans could read and not machine readers. Their
approach was so effective that in a year, spam-add-ons were reduced by 95% and a
patent was issued in 2001.
In 2000, Yahoos popular Messenger chat service was hit by bots which
pointed advertising links to annoying human users of chat rooms. Yahoo, along with
Carnegie Mellon University, developed a CAPTCHA called EZ-GIMPY, which chose
a dictionary word randomly and distorted it with a wide variety of image occlusions
and asked the user to input the distorted word.
In November 1999, slashdot.comreleased a poll to vote for the best CS
College in the US. Students from the Carnegie Mellon University and the
Massachusetts Institute of Technology created bots that repeatedly voted for their
respective colleges. This incident created the urge to use CAPTCHAs for such online
polls to ensure that only human users are able to take part in the polls.
1.4 CAPTCHAs and the Turing Test:

CAPTCHA technology has its foundation in an experiment called the Turing
Test [2]. Alan Turing, sometimes called the father of modern computing, proposed the
test as a way to examine whether or not machines can think or appear to think like
humans. The classic test is a game of imitation. In this game, an interrogator asks two
participants a series of questions. One of the participants is a machine and the other is
a human. The interrogator can't see or hear the participants and has no way of
knowing which is which. If the interrogator is unable to figure out which participant is
a machine based on the responses, the machine passes the Turing Test.
Of course, with a CAPTCHA, the goal is to create a test that humans can pass
easily but machines can't.
It's also important that the CAPTCHA application is able to present different
CAPTCHAs to different users. If a visual CAPTCHA presented a static image that
was the same for every user, it wouldn't take long before a spammer spotted the form,
deciphered the letters, and programmed an application to type in the correct answer
automatically.
Most, but not all, CAPTCHAs rely on a visual test. Computers lack the
sophistication that human beings have when it comes to processing visual data. We
can look at an image and pick out patterns more easily than a computer. The human
mind sometimes perceives patterns even when none exist, a quirk we call pareidolia.
Ever see a shape in the clouds or a face on the moon? That's your brain trying to
associate random information into patterns and shapes.
But not all CAPTCHAs rely on visual patterns. In fact, it's important to have
an alternative to a visual CAPTCHA. Otherwise, the Web site administrator runs the
risk of disenfranchising any Web user who has a visual impairment. One alternative to
a visual test is an audible one. An audio CAPTCHA usually presents the user with a
series of spoken letters or numbers. It's not unusual for the program to distort the
speaker's voice, and it's also common for the program to include background noise in
the recording. This helps thwart voice recognition programs.
1.5 Forms of attacks [3]

Whether a captcha is based on pictures, text, sound, or puzzle solving, certain
similarities can be seen in terms of how captcha are attacked by malicious users.
Typical attack models seen to date include:
Bypass attacks
Any attack that circumvents the need to solve the captcha at all. For example,network
replay attacks, or cases where the captcha solution is exposed accidentally, perhaps
through HTML or CGI form parameter values. Generally, any system that sends the
decoded form of the captcha to a client program as part of the data stream is
vulnerable to such an attack. Such attacks are not always a weakness of the captcha
itself; they may instead be a weakness of the service using the captcha.
Challenge replay attacks
If the captcha system can produce only a limited number of unique challenges, then
the automated agent may record all or most of the possible challenges. A human
associate provides a library of correct answers for the challenges. The automated
agent can then replay the correct answer whenever it is faced with a particular
challenge for which it knows the correct solution. Some image based captchas are
vulnerable to this weakness, particularly those based upon a finite library of
photographs (e.g., the KittenAuth captcha (Reimer, 2006) used a challenge library of
42 images).
Signal processing attacks
The noise and perturbations that are commonly used to obfuscate captcha images or
sounds are intended to be oneway; a computer should be able to add them, but not
reverse them easily. In principle, only a humans flexible image and sound recognition
capabilities should be capable of conveniently reversing the transformations and
recovering the original message. In practice, captcha researchers and malicious
attackers have both proven highly capable of reversing captcha transformations
approximately via mathematical heuristics and machine learning approaches
(Chellapilla and Simard, 2005; Hocevar, 2004; Huang, et al., 2008; Mori and Malik,
2003; Yan and El Ahmad, 2007, 2008a). For textbased captchas, this is achieved by
5
removing image noise and clutter items, and isolating individual characters within the
captcha in order to allow optical character recognition (OCR) technologies a
maximised opportunity for success. Attacking heuristics often have a parameterised
design, so that their behaviour may be adjusted to attack several different but related
forms of captcha.
Mechanical Turk attacks
Here, the problem of solving the captcha is automatically outsourced to a paid
human agent. They immediately solve the challenge and quickly return the answer to
the automated agent in real time. The automated agent then presents the human
provided answer, and is able to programatically exploit the online resource (Barr and
Cabrera, 2006; Bursztein, et al., 2010; Economist, 2008; Websense Security Labs,
2008b). A human Turk agent working full time to support such attacks can solve
thousands of captchas per hour, depending on the type of captcha. There is little that
can be done to defend against such attacks, other than to perhaps raise the
inconvenience of the captcha for all users in order to reduce the economic viability of
this attack.
Trivial guessing attacks
If there is an unlimited range of challenges, but a very limited range of possible
answers (e.g., which of these 10 choices is correct?), a high success rate may be
achieved by an attacking program by merely guessing randomly from the available
answers. Particularly, any graphical captcha that requires the user to select a correct
position within an image but which has a wide error tolerance for user inaccuracy may
be vulnerable to a trivial guessing attack.
Brute force attacks
If there is a somewhat limited range of possible answers e.g., a numerical 4digit
captcha would have 10,000 possible answers then it is possible for a distributed
group of automated agents to attack the captcha by exhaustively trying answers at
random or according to a selected sequence. This differs from the trivial guessing
attack, in that it relies upon having access to a large number of attacking agents i.e., a
botnet (Websense Security Labs, 2008b; 2009) rather than relying upon having
access to a poorly designed captcha.
6
Hybrid attacks
It is possible to combine these attacks. For example, if a signal processing attack can
estimate five of six captcha characters with a high degree of confidence, a guess may
be made on the remaining character, yielding a success rate of between 1.5% (mixed
case alphanumerical characters) and 10% (numerical digits). For example, the
QuestionBased captcha (ShiraliShahreza and ShiraliShahreza, 2007b) presents a
mathematical problem, which can be broken by an attacker who uses OCR to
recognise the numerical digits mentioned in the puzzle, combined with a random
guess of one of the few possible ways in which the numbers may be combined
arithmetically.
1.6 Thesis Contribution
1. To study various types of existing captcha system such as Text based Captcha and
picture based Captcha.
2. To proposes a solution for improving web security from Web Bots (Robots) by
implementing WORD GROUPING CAPTCHA.
3. To discuss various types of evaluation parameter: Consistency, Ease of generation,
Entropy and Implementation.
4. To compare existing and proposed Captcha system based on different parameters.
1.7 Thesis Outline
Thesis discusses existing captcha system, weakness of existing captcha system. A new
captcha approach Word grouping captcha is proposed. Chapter 2 provides an
overview of existing Captcha techniques including Text Captcha, Graphic Captcha,
Audio Captcha, reCAPTCHA and book digitization. It also discusses different
applications of Captcha. Chapter 3 discusses the Literature Review. Chapter 4
discusses the project description. Chapter 5 presents various Captcha evaluation
parameters and comparing Text based captcha; Picture based captcha and Word
grouping captcha. Chapter 6 presents experimental result and analysis. Chapter 7
concludes providing suggestion for future work to be done in this area.
CHAPTER 2
CAPTCHA OVERVIEW
CAPTCHAs are classified based on what is distorted and presented as a

challenge to the user. They are:
2.1 Text CAPTCHAs:
The most commonly used CAPTCHAs are text-based where distorted
text is displayed. To solve the CAPTCHA, users must recognize the distorted
characters and correctly enter them in a designated space.

The various implementations are:
2.1.1 Gimpy:
Gimpy is a very reliable text CAPTCHA built by CMU in collaboration with
Yahoo for their Messenger service. Gimpy is based on the human ability to read
extremely distorted text and the inability of computer programs to do the same.
Gimpy works by choosing ten words randomly from a dictionary, and displaying them
in a distorted and overlapped manner. Gimpy then asks the users to enter a subset of
the words in the image. The human user is capable of identifying the words correctly,
whereas a computer program cannot.
Fig 2.1. Gimpy CAPTCHA
2.1.2 Ez Gimpy:
This is a simplified version of the Gimpy CAPTCHA, adopted by Yahoo in
their signup page. Ez Gimpy randomly picks a single word from a dictionary and
applies distortion to the text. The user is then asked to identify the text correctly.
Fig 2.2 Yahoos Ez Gimpy CAPTCHA
2.1.3 Baffle Text:

This was developed by Henry Baird at University of California at
Berkeley. This is a variation of the Gimpy. This doesnt contain dictionary words, but
it picks up random alphabets to create a nonsense but pronounceable text. Distortions
are then added to this text and the user is challenged to guess the right word.
This technique overcomes the drawback of Gimpy CAPTCHA because,
Gimpy uses dictionary words and hence, clever bots could be designed to check the
dictionary for the matching word by brute-force.
finans
Fig 2.3 Baffle Text examples
2.1.4 MSN Captcha:

Microsoft uses a different CAPTCHA for services provided under MSN
umbrella. These are popularly called MSN Passport CAPTCHAs. They use eight
characters (upper case) and digits. Foreground is dark blue, and background is grey.
Warping is used to distort the characters, to produce a ripple effect, which makes
computer recognition very difficult.
XTNM5YRE
L9D28229B
Fig 2.4 MSN Passport CAPTCHA
2.2 Graphic CAPTCHAs:

Graphic CAPTCHAs are challenges that involve pictures or objects that have
some sort of similarity that the users have to guess. They are visual puzzles, similar
to Mensa tests. Computer generates the puzzles and grades the answers, but is itself
unable to solve it.
2.2.1 Bongo:
Bongo, Another example of a CAPTCHA is the program we call BONGO.
BONGO is named after M.M. Bongard, who published a book of pattern recognition
problems in the 1970s. BONGO asks the user to solve a visual pattern recognition
problem. It displays two series of blocks, the left and the right. The blocks in the left
series differ from those in the right, and the user must find the characteristic that sets
them apart. A possible left and right series is shown in Figure 2.5
10
Fig 2.5 Bongo CAPTCHA

These two sets are different because everything on the left is drawn with thick lines
and those on the right are in thin lines. After seeing the two blocks, the user is
presented with a set of four single blocks and is asked to determine to which group the
each block belongs to. The user passes the test if s/he determines correctly to which
set the blocks belong to. We have to be careful to see that the user is not confused by a
large number of choices.
2.2.2 Image Based Captcha
Image based CAPTCHA: In this scheme the user is required to identify some image
recognition task.
2.2.2.1 ESP Pix
ESP Pix is first image CAPTCHA and it was developed at Carnegie Mellon
University. A snapshot of ESP Pix CAPTCHA is shown in Figure 2.6. In ESP Pix the
user has given four images and in order to pass this test the user has to select word
related to those four images from drop down list of given choices.
Fig. 2.6 ESP Pix
11
2.2.2.2. Asirra
Another Image CAPTCHA is Asirra Stands for Animal Species Image
Recognition for Restricting Access is a cat or dog labelling based CAPTCHA design.
In this test user has to select all the pictures of cat. Asirra is randomly choosing
images from petfinder.com. Snapshot of Asirra is shown in figure 2.7.
Fig. 2.7 Asirra

Image
2.2.2.3 CAPTCHA
the dog
One
CAPTCHA
available
is
as
service
paid
is
CAPTCHA the dog.

It
Shows
nine
images in 3 by 3 grid and user is asked to choose all the images of cat one by one until
images become dogs images. A snapshot of it is shown in Figure 2.8.The dog is
randomly placed among nine cats and the process is repeated for three times.
Fig. 2.8 Captcha the Dog Image

2.2.2.4 Multi Model CAPTCHA
It combines text and image based System together. In this end user is shown
an image and four text labels associated with the image. Text labels are embedded in
12
the image and the user is asked to select a relevant text label. A snapshot of Multi
Model CAPTCHA is shown in Figure 2.9
Fig
Multi
2.9
Model
Captcha
2.2.2.5
Dynamic
Image Based
CAPTCHA
Another improved image CAPTCHA is Dynamic Image Based CAPTCHA
(DIBC).In this CAPTCHA system user is required to recognize the exact matching
image or images to pass the Turing test. An image is selected randomly from image
database and is placed in a grid of six images random number of times. User is
supposed to submit all the correct version of the filtered image for clearing the Turing
test in maximum of 5 attempts.It is shown in Figure 2.10
Fig 2.10 Dynamics image based captcha

2.2.2.6 IdentiPic[4]
It is photo based CAPTCHA system where user has to identify picture .Three
pictures are shown and corresponding to each pic there is drop down list having ten
options. A snapshot of Identipic is shown in Figure 2.11
13
Fig 2.11 IdentiPic

2.2.2.7. Puzzle based CAPTCHA:
It is also referred as question based CAPTCHA. In this test, a small
mathematical problem is generated according to some predefined rules. The problem
then rendered by the server to the user answer of which is already known to server.
Solving of this problem requires an ability of understanding text of question, only a
human user can answer this question. Figure 2.12 illustrates the GUI of Question
based CAPTCHA
Fig 2.12 Puzzle

based captcha
14
2.3 Audio CAPTCHAs:

The final example we offer is based on sound. The program picks a word or a
sequence of numbers at random, renders the word or the numbers into a sound clip
and distorts the sound clip; it then presents the distorted sound clip to the user and
asks users to enter its contents. This CAPTCHA is based on the difference in ability
between humans and computers in recognizing spoken language. Nancy Chan of the
City University in Hong Kong was the first to implement a sound-based system of this
type. The idea is that a human is able to efficiently disregard the distortion and
interpret the characters being read out while software would struggle with the
distortion being applied, and need to be effective at speech to text translation in order
to be successful. This is a crude way to filter humans and it is not so popular because
the user has to understand the language and the accent in which the sound clip is
recorded.
2.4 reCAPTCHA and book digitization:

To counter various drawbacks of the existing implementations, researchers at
CMU developed a redesigned CAPTCHA aptly called the reCAPTCHA. About 200
million CAPTCHAs are solved by humans around the world every day. In each case,
roughly ten seconds of human time are being spent. Individually, that's not a lot of
time, but in aggregate these little puzzles consume more than 150,000 hours of work
each day. What if we could make positive use of this human effort? reCAPTCHA
does exactly that by channelling the effort spent solving CAPTCHAs online into
"reading" books.
To archive human knowledge and to make information more accessible to the
world, multiple projects are currently digitizing physical books that were written
before the computer age. The book pages are being photographically scanned, and
then transformed into text using "Optical Character Recognition" (OCR). The
transformation into text is useful because scanning a book produces images, which are
difficult to store on small devices, expensive to download, and cannot be searched.
The problem is that OCR is not perfect.
15
reCAPTCHA improves the process of digitizing books by sending words that

cannot be read by computers to the Web in the form of CAPTCHAs for humans to
decipher. More specifically, each word that cannot be read correctly by OCR is placed
on an image and used as a CAPTCHA. This is possible because most OCR programs
alert you when a word cannot be read correctly.
But if a computer can't read such a CAPTCHA, how does the system know the
correct answer to the puzzle? Here's how: Each new word that cannot be read
correctly by OCR is given to a user in conjunction with another word for which the
answer is already known. The user is then asked to read both words. If they solve the
one for which the answer is known, the system assumes their answer is correct for the
new one. The system then gives the new image to a number of other people to
determine, with higher confidence, whether the original answer was correct.
Currently, reCAPTCHA is employed in digitizing books as part of the Google Books
Project.
Fig
2.13
(reCAPTCHA) first line shows scanned text, second line shows text read by OCR
2.5 Applications
CAPTCHAs are used in various Web applications to identify human users and
to restrict access to them. Some of them are:
1. Online Polls [5]: As mentioned before, bots can wreak havoc to any
unprotected online poll. They might create a large number of votes which
would then falsely represent the poll winner in spotlight. This also results in
decreased faith in these polls. CAPTCHAs can be used in websites that have
embedded polls to protect them from being accessed by bots, and hence bring
up the reliability of the polls.
2. Protecting Web Registration [6]: Several companies offer free email and
other services. Until recently, these service providers suffered from a serious
16
problem bots. These bots would take advantage of the service and would
sign up for a large number of accounts. This often created problems in account
management and also increased the burden on their servers. CAPTCHAs can
effectively be used to filter out the bots and ensure that only human users are
allowed to create accounts.
3. Preventing comment spam [7]: Most bloggers are familiar with programs
that submit large number of automated posts that are done with the intention of
increasing the search engine ranks of that site. CAPTCHAs can be used before
a post is submitted to ensure that only human users can create posts. A
CAPTCHA won't stop someone who is determined to post a rude message or
harass an administrator, but it will help prevent bots from posting messages
automatically.
4. Search engine bots: It is sometimes desirable to keep web pages unindexed to
prevent others from finding them easily. There is an html tag to prevent search
engine bots from reading web pages. The tag, however, doesn't guarantee that
bots won't read a web page; it only serves to say "no bots, please." Search
engine bots, since they usually belong to large companies, respect web pages
that don't want to allow them in. However, in order to truly guarantee that bots
won't enter a web site, CAPTCHAs are needed.
5. E-Ticketing: Ticket brokers like Ticketmaster also use CAPTCHA
applications. These applications help prevent ticket scalpers from bombarding
the service with massive ticket purchases for big events. Without some sort of
filter, it's possible for a scalper to use a bot to place hundreds or thousands of
ticket orders in a matter of seconds. Legitimate customers become victims as
events sell out minutes after tickets become available. Scalpers then try to sell
the tickets above face value. While CAPTCHA applications don't prevent
scalping; they do make it more difficult to scalp tickets on a large scale.
6. Email spam: CAPTCHAs also present a plausible solution to the problem of
spam emails. All we have to do is to use a CAPTCHA challenge to verify that
an indeed a human has sent the email.
7. Preventing Dictionary Attacks [8, 9, 10]: CAPTCHAs can also be used to
prevent dictionary attacks in password systems. The idea is simple: prevent a
17
computer from being able to iterate through the entire space of passwords by
requiring it to solve a CAPTCHA after a certain number of unsuccessful
logins. This is better than the classic approach of locking an account after a
sequence of unsuccessful logins, since doing so allows an attacker to lock
accounts at will.
8. As a tool to verify digitized books: This is a way of increasing the value of
CAPTCHA as an application. An application called reCAPTCHA harnesses
users responses in CAPTCHA fields to verify the contents of a scanned piece
of paper. Because computers arent always able to identify words from a
digital scan, humans have to verify what a printed page says.
Then its possible for search engines to search and index the contents of a
scanned document. This is how it works: The application already recognizes
one of the words. If the visitor types that word into a field correctly, the
application assumes the second word the user types is also correct. That
second word goes into a pool of words that the application will present to
other users. As each user types in a word, the application compares the word to
the original answer. Eventually, the application receives enough responses to
verify the word with a high degree of certainty. That word can then go into the
verified pool.
18
CHAPTER 3
Literature Review
The purpose of literature review is to establish a theoretical framework for topic.

The need for CAPTCHAs rose to keep out the website/search engine abuse by bots. In
1997, AltaVista sought ways to block and discourage the automatic submissions of
URLs into their search engines. Andrei Broder, Chief Scientist of AltaVista, and his
colleagues developed a filter. Their method was to generate a printed text randomly
that only humans could read and not machine readers. Their approach was so effective
that in a year, spam-add-ons were reduced by 95% and a patent was issued in 2001.
3.1 Existing System

A. Text based Captcha: The most commonly used CAPTCHAs are textbased where distorted text is displayed. To solve the CAPTCHA, users must
recognize the distorted characters and correctly enter them in a designated
space.
Weakness of Text based Captcha
The number of classes of characters and digits are very small. When the noise and
distortion is added to the text based captcha they often create a problem in
recognizing them. Although some alphabets and digits have very different shapes, but
when they are distorted, it is become difficult to recognize them. For example, 7 may
look like 1, cl can be confused with d, nn with m.
In January 2008 article published in informationweek.com claiming Yahoos captcha
[11] security had been broken. In February 2008 in www.theregister.co.uk claiming
that Googles captcha [12] had been broken by spammers. In May, Microsofts
captcha [13] security had been broken.
B.Picture based Captcha [14]: In picture recognition the motivation here is

that humans are much better at recognizing picture than computers are, and
that perhaps we can use that advantage to make a good CAPTCHA. IdentiPic
is photo based CAPTCHA system where user has to identify picture. Pictures
19
are shown and corresponding to each pic there is drop down list having few
options.
Weakness of Picture based Captcha:
It creates a problem to users having low vision or learning disability [15].
Most of the time object Recognition becomes cumbersome due to the
ambiguity present in image objects. Instead of Turing test it has become
almost an IQ test
3.1.1 Issues with CAPTCHAs

There are many issues with CAPTCHAs, primarily because they distort text
and images in such a way that, sometimes it gets difficult for even humans to read.
Even the simplest, but effective CAPTCHA, like a mathematical equation What is
the sum of three and five? can be a pain for cognitively disabled people.
3.1.1.1 Usability issues with text based CAPTCHAs:

Are text CAPTCHAs like Gimpy, userfriendly? Sometimes the text is
distorted to such an extent, that even humans have difficulty in understanding it.
Some of the issues are listed in table 3.1
Table 3.1. Usability issues in text based CAPTCHAs
20
Distortion becomes a problem when it is done in a very haphazard way. Some

characters like d can be confused for cl or m with rn. It should also be easily
understandable to those who are unfamiliar with the language.
Content is an issue when the string length becomes too long or when the string is not
a dictionary word. Care should be taken not to include offensive words.
Presentation should be in such a way as to not confuse the users. The font and colour
chosen should be user friendly.
3.1.1.2 Usability of audio CAPTCHAs:

In audio CAPTCHAs, letters are read aloud instead of being displayed in an
image. Typically, noises are deliberately added to prevent such audio schemes from
being broken by current speech recognition technologies.
Distortion: Background noises effectively distort sounds in audio CAPTCHAs. There
is no rigorous study of what kind of background noises will introduce acceptable
sound distortion. However, it is clear that distortion methods and levels, just as in text
based CAPTCHAs, can have a significant impact on the usability of audio
CAPTCHAs. For example, an early test in 2003 showed that the distorted sound in an
audio CAPTCHA that was deployed at Microsofts Hotmail service was unintelligible
to all (four) journalists, with good hearing, that were tested. Due to sound distortion,
confusing characters can also occur in audio CAPTCHAs. For example, we observed
that it is hard to tell apart p and b; g and j, and a and 8. Whether a scheme is
friendly to non-native speakers is another usability concern for audio CAPTCHAs.
Content: Content materials used in audio CAPTCHAs are typically language
specific. Digits and letters read in a language are often not understandable to people
who do not speak the language. Therefore, unlike text-based schemes, localisation is a
major issue that audio CAPTCHAs face.
Presentation: The use of colour is not an issue for audio CAPTCHAs, but the
integration with web pages is still a concern. For example, there is no standard
graphical symbol for representing an audio CAPTCHA on a web page, although many
schemes such as Microsoft and reCAPTCHA use a speaker symbol. More
21
importantly, what really matters for visually impaired users is that the html image
alternative text attached to any of the above symbol should clearly indicate the need to
solve an audio CAPTCHA.
When embedded in web pages, audio CAPTCHAs can also cause compatibility
issues. For example, many such schemes require JavaScript to be enabled. However,
some users might prefer to disable JavaScript in their browsers. Some other schemes
can be even worse. For example, we found that one audio scheme requires Adobe
Flash support. With this scheme, vision-impaired users will not even notice that such
a CAPTCHA challenge exist in the page, unless Flash is installed in their computers apparently, no text alternative is attached to the speaker-like Flash object, either.
3.1.2. Decaptcha [16]

In this section we present our captcha breaker, Decaptcha, which is able to break
many popular captchas including eBay [17], Wikipedia and Digg [18]. Then we
discuss the rationale behind its five stage pipeline. Decaptcha uses the aForge
framework [19] and the Accord framework that provide easy access to image
manipulation filters, and standard machine learning algorithms such as SVM [20].
Decaptcha pipeline stages are:
1. Pre-processing: In this first stage, the captchas background is removed using
several algorithms and the captcha is binarized (represented in black and white) and
stored in a matrix of binary values. Transforming the captcha into a binary matrix
makes the rest of the pipeline easier to implement, as the remaining algorithm works
on a well-defined abstract object. The downside of using a binary representation is
that we lose the pixel intensity. However in practice this was never an issue.
2. Segmentation: In this stage Decaptcha attempts to segment the captchas using
various segmentation techniques, the most common being CFS [21] (Colour Filling
Segmentation) which uses a paint bucket flood filling algorithm [22] This is the
default segmentation technique because it allows us to segment the captcha letters
even if they are tilted, as long as they are not contiguous.
22
3. Post-Segmentation: At this stage the segments are processed individually to make

the recognition easier. During this phase the segments sizes are always normalized.
4. Recognition: In training mode, this stage is used to teach the classifier what each
letter looks like after the captcha has been segmented. In testing mode, the classifier is
used in predictive mode to recognize each character.
5. Post-processing: During this stage the classifiers output is improved when
possible. For example, spell checking is performed on the classifiers output for
Slashdot because we know that this captcha scheme uses dictionary words. Using
spellchecking allows us to increase our precision on Slashdot from 24% to 35% .
3.2 Proposed System:

A: Word Grouping Captcha: The user is presented with six words, and asked
to divide the group into two subsets. Any categorization is possible, for example: do
they sound same? Whether three words belongs to games while other three are
computer gadgets. There will be three column, in which first column will show words,
second column will show radio buttons of set A and third column will show radio
buttons of set B. There will be six rows which will represent all six words. User just
needs to click on radio buttons by dividing those six words in two groups as set A and
set B.
Fig. 3.1 Word Grouping Captcha (Unsolved)

23
Fig. 3.2 Word Grouping Captcha (Solved)

Advantage over existing Captcha system
No readability issue with word grouping captcha. Words are easily
understandable, no confusion in recognising them. We can add large number of word
group sets in our database. No problem with people having colour vision problem.
User just needs to divide the words in two subgroups. It only requires text based
interface. As it is new in comparison with existing captcha system so attacks are less
vulnerable.
3.3 Feasibility Study
A feasibility study is a high-level capsule version of the entire System analysis
and Design Process. The study begins by classifying the problem definition.
Feasibility is to determine if its worth doing. Once an acceptance problem definition
has been generated, the analyst develops a logical model of the system. A search for
alternatives is analysed carefully. There are 3 parts in feasibility study.
1. Economical Feasibility
2. Operational Feasibility
3. Technical Feasibility
24
3.3.1 Economical Feasibility

Economic feasibility attempts 2 weigh the costs of developing and implementing a
new system, against the benefits that would accrue from having the new system in
place. This feasibility study gives the top management the economic justification for
the new system.A simple economic analysis which gives the actual comparison of
costs and benefits are much more meaningful in this case. In addition, this proves to
be a useful point of reference to compare actual costs as the project progresses. There
could be various types of intangible benefits on account of automation. These could
include increased customer satisfaction, improvement in product quality better
decision making timeliness of information, expediting activities, improved accuracy
of operations, better documentation and record keeping, faster retrieval of
information, better employee morale.
3.3.2 Operational Feasibility
Proposed project is beneficial only if it can be turned into information systems that
will meet the organizations operating requirements. Simply stated, this test of
feasibility asks if the system will work when it is developed and installed. Are there
major barriers to Implementation? Here are questions that will help test the
operational feasibility of a project:
Is there sufficient support for the project from management from users? If
the current system is well liked and used to the extent that persons will not be able to
see reasons for change, there may be resistance.
Are the current business methods acceptable to the user? If they are not,
Users may welcome a change that will bring about a more operational and useful
systems.
Since the proposed system was to help to reduce the chances of breaking of captcha
system by implementing Word grouping captcha. Since this system is new so it is less
vulnerable to attacks. In the existing manual system, the new system was considered
to be operational feasible.
25
3.3.3 Technical Feasibility

Evaluating the technical feasibility is the trickiest part of a feasibility study.
This is because, at this point in time, not too many detailed design of the system,
making it difficult to access issues like performance, costs on (on account of the kind
of technology to be deployed) etc. A number of issues have to be considered while
doing a technical analysis.Understand the different technologies involved in the
proposed system before commencing the project we have to be very clear about what
are the technologies that are to be required for the development of the new system.
Find out whether the organization currently possesses the required technologies. Is the
required technology available with the organization?
This new system just requires Java 1.6, Java Eclipse and oracle database to run on
existing system.
26
CHAPTER 4
PROJECT DESCRIPTION
4.1 Problem Definition

Completely Automated Public Turing Tests to Tell Computers and Humans Apart
(CAPTCHAs) are now an almost standard security mechanisms for defending against
undesirable and malicious bot programs on the Internet (especially those bots that can
sign up for thousands of accounts a minute with free email service providers, send out
thousands of spam messages in an instant, or post numerous comments in blogs
pointing both readers and search engines to irrelevant sites). CAPTCHAs generate
and grade tests that most humans can pass but current computer programs cant.
.
4.2 Overview of the Project

The first step in developing anything is to state the requirements. This applies just as
much to leading edge research as to simple programs and to personal programs, as
well as to large team efforts. Being vague about your objective only postpones
decisions to a later stage where changes are much more costly.
The problem statement should state what is to be done and not how it is to be done. It
should be a statement of needs, not a proposal for a solution. A user manual for the
desired system is a good problem statement. The requestor should indicate which
features are mandatory and which are optional, to avoid overly constraining design
decisions. The requestor should avoid describing system internals, as this restricts
implementation flexibility. Performance specifications and protocols for interaction
with external systems are legitimate requirements. Software engineering standards,
such as modular construction, design for testability, and provision for future
extensions, are also proper.
Completely Automated Public Turing Tests to Tell Computers and Humans Apart
(CAPTCHAs) are now an almost standard security mechanisms for defending against
undesirable and malicious bot programs on the Internet (especially those bots that can
sign up for thousands of accounts a minute with free email service providers, send out
thousands of spam messages in an instant, or post numerous comments in blogs
27
pointing both readers and search engines to irrelevant sites). CAPTCHAs generate
and grade tests that most humans can pass but current computer programs cant. Such
tests are often called CAPTCHA challenges that are based on hard, open artificial
intelligence problems.
To date, the most commonly used CAPTCHAs are text-based, in which the challenge
appears as an image of distorted text that the user must decipher and retype. These
schemes typically exploit the difficulty for state-of-the-art computer programs to
recognize distorted text. Well-known examples include EZ-Gimpy, Gimpy, and
Gimpy-r, all developed at Carnegie Mellon University. Google, Microsoft, and Yahoo
have also developed and deployed their own text CAPTCHAs. A good CAPTCHA
must not only be human friendly but also robust enough to resist computer programs
that attackers write to automatically pass CAPTCHA tests. However, designing
CAPTCHAs that exhibit both good robustness and usability is much harder than it
might seem. The current collective understanding of this topic is very limited, as
suggested by the fact that many well-known schemes break. In particular, we recently
found that we could break a widely deployed CAPTCH which is carefully designed
and tuned by Microsoft with a success rate of higher than 60 percent, even though its
design goal was that automated attacks shouldnt achieve a success rate of higher than
0.01 percent. We expect that CAPTCHA will go through the same process of
evolutionary development as cryptography, digital watermarking, and the like, with an
iterative process in which successful attacks lead to the development of more robust
systems.
4.3. System Specification
4.3.1 Hardware Requirements
Processors
Speed
RAM
Hard Disk
Monitor
Key Boards
Mouse
:
:
:
:
:
:
:
28
Pentium IV
2.4GHz
128 MB
20GB
Color Monitor
104 Keys
3 buttons
4.3.2
Software Requirements
Operating System
Language
Database
Server
:
:
:
:
Windows XP/2000
Java (Jdk 1.6)
Oracle 10g, Heidi SQL
Apache Tomcat 7.0.29
4.4. Development Environment

The document play a vital role in the development of life cycle (SDLC) as it describes
the complete requirement of the system. It means for use by developers and will be
the basic during testing phase. Any changes made to the requirements in the future
will have to go through formal change approval process.
SPIRAL MODEL was defined by Barry Boehm in his 1988 article, A spiral Model
of Software Development and Enhancement. This model was not the first model to
discuss iterative development, but it was the first model to explain why the iteration
models.
As originally envisioned, the iterations were typically 6 months to 2 years long. Each
phase starts with a design goal and ends with a client reviewing the progress thus far.
Analysis and engineering efforts are applied at each phase of the project, with an eye
toward the end goal of the project.
The steps for Spiral Model can be generalized as follows:
The new system requirements are defined in as much details as possible. This
usually involves interviewing a number of users representing all the external or
internal users and other aspects of the existing system.
A preliminary design is created for the new system.
A first prototype of the new system is constructed from the preliminary design.
This is usually a scaled-down system, and represents an approximation of the
characteristics of the final product.
29
A second prototype is evolved by a fourfold procedure:

1. Evaluating the first prototype in terms of its strengths, weakness, and
risks.
2. Defining the requirements of the second prototype.
3. Planning and designing the second prototype.
4. Constructing and testing the second prototype.
At the customer option, the entire project can be aborted if the risk is deemed too
great.
Risk factors might involve development cost overruns, operating-cost
miscalculation, or any other factor that could, in the customers judgment, result
in a less-than-satisfactory final product.
The existing prototype is evaluated in the same manner as was the previous
prototype, and if necessary, another prototype is developed from it according to
the fourfold procedure outlined above.
The preceding steps are iterated until the customer is satisfied that the refined
prototype represents the final product desired.
The final system is thoroughly evaluated and tested. Routine maintenance is

carried on a continuing basis to prevent large scale failures and to minimize down
time.
Why Spiral model?

1) Spiral Life Cycle Model is one of the most flexible SDLC models in
place. Development phases can be determined by the project manager,
according to the complexity of the project.
2) Project monitoring is very easy and effective. Each phase, as well as
each loop, requires a review from concerned people. This makes the model
more transparent and better than other models.
3) Risk management is one of the in-built features of the model, which
makes it extra attractive compared to other models.
4) Changes can be introduced later in the life cycle as well. And coping
with these changes is not a very big headache for the project manager.
5) Project estimates in terms of schedule, cost etc. become more and more
realistic as the project moves forward and loops in spiral get completed.
30
The following diagram shows how a spiral model acts like:
Fig 4.1-Spiral Model

4.5 Design techniques:
Design is a meaningful engineering representation of something that is to be
built. Software design is a process through which the requirements are translated into
a representation of the software. Design is the place where quality is fostered in
software engineering. Design is the perfect way to accurately translate a customers
requirement in to a finished software product. Design creates a representation or
model, provides detail about software data structure, architecture, interfaces and
components that are necessary to implement a system. This chapter discusses about
the design part of the project. Here in this document the various UML diagrams that
are used for the implementation of the project are discussed.
31
4.5.1 Activity Diagram:

The purpose of activity diagram is to provide a view of flows and what is going
on inside a use case or among several classes. Activity diagram can also be used to
represent a classs method implementation. A token represents an operation. An
activity is shown as a round box containing the name of the operation. An outgoing
solid arrow attached to the end of activity symbol indicates a transition triggered by
the completion.
Open URL
Choose type of Captcha
Enter CAPTCHA Security code for text Captcha

Select correct option for Picture Based Captcha
Divide the word group in two subgroups
Correct Code
User Login
Incorrect code
Verification failed
Fig. 4.2 Activity Diagram

4.5.2 Sequence Diagram:
32
Sequence diagram are an easy and intuitive way of describing the behavior of a system
by viewing the interaction between the system and its environment. A Sequence diagram
shows an interaction arranged in a time sequence. A sequence diagram has two dimensions:
vertical dimension represents time; the horizontal Dimension represents different objects. The
vertical line is called is the objects life line. The lifeline represents the objects existence
during the interaction.
Sequence Diagram
Browser
Website
LoginController
.java
WorkLogin
.jsp
validatelogin
.java
With
1. Connect
2. Redirect
3. Connects to loginController.java
4. Forward Request
5.Login form(id, password, Captcha image) is sent to the browser

6.Browser submits id, passwords, Captcha random num to ValidateLogin.Java
7. Checks
8. Sends the response to website
9. User Authorisation
10. Redirect
11. Browser redirects to the original URL in step 1

Fig. 4.3 Sequence Diagram
33
Step wise Description of sequence diagram

1. Connect to a protected URL
2. Website ask the browser to redirect to loginController.java
3. Browser connects to loginController.java
4. Login Controller requests for a new captcha image, put the random number in the
HTTP session and forward the request to workLogin.jsp
5. Login form(id, password, Captcha image) is sent to the browser
6. Browser submits id, passwords, Captcha random number to ValidateLogin.Java.
7. ValidateLogin.java checks id, passwords and captcha random number
8. After successful validation ValidateLogin.java sends the response to website
9. Website performs the user authorization.
10. Website ask the browser to redirect to the original URL in step1
11. Browser redirects to the original URL in step 1.
34
CHAPTER 5
CAPTCHA EVALUATION PARAMETERS [23]
5.1 Parameters
5.1.1 Consistency
When presented with the same Captcha, how reproducible is a user's answer? The
level of consistency will clearly vary across different Captcha, and the
acceptable level will vary by application (some may be more lenient than
others).
5.1.2 Entropy
By entropy, we mean two things: First do different people answer the same
captcha in the same way? Second how hard is it for an adversary to guess the
users answers?
5.1.3 Ease of generationHow difficult is it to generate a given Captcha? Can it be generated given only
randomness, or does it require a precomputed/pregenerated corpus.
5.1.4 Implementation
Finally, how easy is it to implement? Does it require complex and elaborate
graphics, or can it be implemented for a text-only system? How accessible is
it?
35
5.2 Captcha Evaluation

5.2.1 Text based captcha
Consistency For Text based captcha Consistency is high.
Entropy Nearly everyone provide the same solutions for all of the Text based
CAPTCHAs, there is essentially no variation. In some cases user may confused a z
with an x and an o with an a, but all of the other answers chances are same. While
there was basically no inter-user variation. There are 44 possible letters in each
position (upper and lowercase, with commonly-confused pairs like C/G, I/l, Q/O, h/b
removed), and five possible positions, yielding 445= 227 possible puzzle answers.
Ease of generation Text based CAPTCHAs are, by design, fairly easy to generate
they simply require the text to be rendered and some randomness.
Implementation- Implementation is easy in comparison with other image based or
word grouping Captcha.
5.2.2
Picture Captcha
Consistency Consistency is good, if user is able to recognize the picture correctly.

Entropy- Entropy is less than text based captcha, since it depends on the quality of picture,
difficulty level of picture and users ability to recognize pictures. Image recognition is a hard
problem
Ease of generation- It uses a larger database of photographs and animated images of
everyday object.
Implementation -some implementations use only a small fixed pool of CAPTCHA images.
Eventually, when enough image solutions have been collected by an attacker over a period of
time, the test can be broken by simply looking up solutions in a table, based on a hash of the
challenge image.
36
5.2.3 Word Grouping

Consistency-Word Grouping seems somewhat memorable without much practice. We
suspect that this rate can be boosted with a tiny bit of practice.
Entropy- In principle, each word grouping has 2^6 possible outputs and we expect to
see a large amount of variation.
Ease of generation-Word Grouping has some of the limitations on its corpus, it
cannot become too large or users will not recognize some of the words.
Implementation-The implementation is straightforward, and has several desirable
properties. The task is easy and the user interface is simple and accessible which
means that it can work on screen readers or in a text-only environment like a login
prompt.
CAPTCHA
SYSTEM
CONSISTENCY
ENTROPY
EASE OF
GENERATION
IMPLEMENTATION
Text Based
Captcha
High
Easy to generate ,
Require text to
rendered and
some randomness
Easy
Picture Based
Medium, Can be
high if user can
recognize picture
correctly
User may
confuse with
z/x, o/a,
C/G,I/l, Q/O,
h/b
Less,
depends on
quality and
difficulty
level of
Picture
Uses large
database of
pictures, so
require large
collection of
pictures.
Word
Grouping
Less, but can be

boosted with tiny
bit of practice
26 possible
outputs,
Large amount
of variation
Limitations,
cannot become
too large, either
user will not able
to recognize some
words
Some implementations
use only a small fixed
pool of CAPTCHA
images. Test can be
broken by simply
looking up solutions in a
table, based on a hash of
the challenge image.
Only require text based
interface
Table 5.1 Comparison Between different Captchas based on Parameters
37
CHAPTER 6
EXPERIMENTAL RESULTS AND ANALYSIS
6.1 Text based Captcha
After opening the webpage user will have an option to choose one captcha
among word grouping, picture based captcha and text based captcha. Word
grouping captcha is set as default. If user chooses text based captcha then he
needs to recognize the text given in image. The text will be generated among
letters a to z and numbers 0 to 9. User has to recognize those texts correctly
and will have to write in the box of Enter verification text. After writing those
texts user has to click on verify button.
Fig. 6.1 Snapshot Text Based Captcha
If user will not write those texts correctly and clicks on verify button then a
message will appear Text Captcha Verification Failed. Please solve it again.
38
Fig. 6.2 Snapshot of Invalid Text Captcha

6.2. Picture Based Captcha
In Picture based captcha user will get a picture along with a drop down menu.
Drop down menu will have ten options, out of which only one is correctly
matches with given picture. User has to recognize the picture, select the
correct option from drop down menu, and then click on the verify button. If
user has selected the correct option then, the login page will appear.
Fig 6.3 Snapshot of Picture Based Captcha

In Case of incorrect option selected a message will appear Picture Captcha
Verification Failed. Please solve it again.
39
Fig 6.4 Snapshot of Invalid Picture based Captcha

6.3 Word Grouping Captcha
Word grouping Captcha is set as default on the web page. User will get a group
of six words. User needs to divide the words into two subcategories of three
words each. There will be six rows and three columns. Each row will represent
different word. In first column there will be words, in second column radio
buttons of set A and in third column radio buttons of set B. User has to click
on the radio buttons to divide words in two sub category.
Fig. 6.5 Snapshot of Word grouping Captcha

If user will not be able to divide those words in correct groups a message will
appear that Word grouping verification Captcha failed.
40
Fig 6.7 Snapshot of Invalid Word grouping Captcha
41
6.4. Authentication
After successfully solving any one among three captchas the user will be
directed to login page. User has to write user name and assigned password at
the designated place.
Fig.6.8
Snapshot of Login Page

After writing correct username and password user will be authenticated.
Fig. 6.9 Snapshot of Authentication page
42
CHAPTER 7
CONCLUSION AND FUTURE WORK
CONCLUSION
Creating a captcha that is so secure that no human can solve it or so user friendly that
it is a trivial task for captcha breaking software is very easy to accomplish. A
successful captcha by its definition is able to tell humans and computers a part. The
goal is to add security features whenever possible as long as they do not significantly
or unnecessarily decrease the accuracy of human solvers. Text based captcha are now
breakable. If we will increase distortion, blurring and other factors, then it will be
hard for human beings also to read those texts while our goal is to differentiate
between humans and computers. Word grouping captcha proposes a solution for this
problem. Its hard to break and user may not find any difficulty in dividing those
words in two subgroups. Since user has to just click on radio buttons, so it is less time
consuming also.
FUTURE WORK
It is not possible to develop a system that makes all the requirements of user. User
requirements keep changing as the system is being used. Some of the future
enhancements that can be done to this system are
It is based on object oriented design, any further changes can be easily adaptable.
Based on the future security issues, security can be improved using emerging
technologies.
In word grouping captcha, in place of clicking radio buttons, user may be asked to
type those six words in two subcategory.
More research work needs to be done by researcher to create captcha based on
different AI problems, hopefully they will become more user-friendly with disabilities
43
REFRENCES
1.Moni Naor Verification of a human in the loop or identification via the Turing
test.Unpublished Manuscript,1997. A preliminary draft available on
http://www.wisdom.weizmann.ac.il/~naor/PAPERS/human.pdf Pages. 2-3.

2.Luis Von Ahn , Manuel Blum , and Jo n Langford Telling humans and computer
apart automatically. In communications of the ACM Vol. 47, No.2, February 2004,
Pages. 57-58
3.Graeme Baxter Bell Strengthening Captcha based web security ,First Monday,
Volume 17, Number 2 - 6 February 2012 Pages 2-3
4. identiPic CAPTCHA available at http://identipic.com
5.Luis Von Abn, Manual Blum, Nichlas Hoper and John Langford CAPTCHA:Using
Hard AI Problems For Security.Computer Science Dept., Carnegie Mellon
University, Pittsburgh PA 15213, USA
6.H. S. Baird and K. Popat. Human interactive proofs and document image
analysis. Proc. of 5th IAPR Int. Workshop on Document Analysis Systems (DAS
2002), vol. 2423 of LNCS, pages 507518.
7. The Official CAPTCHA site located on the internet at http://www.captcha.net/
8. Dictionary Attack, available at http://en.wikipedia.org/wiki/Dictionary_attack
9. S.Chakrabarti and M. Singhal. Password-based authentication: Preventing
dictionary attacks. Computer, 40(6): June 2007, pages 68-74.
10. B. Pinkas and T. Sander. Securing passwords against dictionary attacks. Proc.
of 9th
Conf. on Computer and Communications Security, Nov. 2002, pages 161-
170.
11. http://firstmonday.org/ojs/index.php/fm/article/viewArticle/3630/3145
12.http://www.informationweek.com/news/internet/webdev/showArticle.jhtml?
articleID=205900620
44
13. http://www.theregister.co.uk/2008/02/25/gmailcaptchacrack/
14. Rizwan Ur Rahman Survey on Captcha systems In Journal of Global
Research in Computer Science, Volume 3, No. 6, June 2012, Pages. 55-57
15.Moin Mahmud Tanvee, Mir Tafseer Nayeem, Md. Mahmudul Hasan RafeeMove
& Select: 2-Layer CAPTCHA based on Cognitive Psychology for securing web
services. Taken from ijens.org, available at
http://www.ijens.org/Vol_11_I_05/117005-8383-IJVIPNS-IJENS.pdf
16.Elie Bursztein, Matthieu Martin, John C. Mitchell Text-based CAPTCHA
Strengths and Weaknesses in ACM Computer and Communication security 2011
(CSS2011) Pages 11-12
17.E. Bursztein and S. Bethard, 2009. Decaptcha: Breaking 75% of eBay audio
CAPTCHAs, Proceedings of WOOT 09: Third USENIC Workshop on Offensive
Technologies (10 August Montreal, Canada), at
http://www.usenix.org/event/woot09/tech/full_papers/bursztein.pdf 10 February
2012. Page-3
18.Jennifer Tam, Jiri Simsa, Sean Hyde, Luis Von Ahn Breaking Audio
CAPTCHAs Carnegie Mellon University page -5
www.captcha.net/Breaking_Audio_CAPTCHAs.pdf
19.Andrew Kirillov. aforge framework. http://www.aforgenet.com/framework/
20.C.Cortes and V. Vapnik. Support vector networks.
Machine learning,
20(3):1995, pages 273-295.

21. J.Yan and A.S. El Ahmad. A Low-cost Attack on a Microsoft CAPTCHA. In
Proceedings of the 15th ACM conference on Computer and communications
security, ACM,2008, pages 543-554.
22.Wikipedia. Flood fill algorithm. http://en.wikipedia.org/wiki/Flood_fill
23.Waseem Daher POSH: A Generalized Captcha with security application In
AISec ,Proceedings of the 1st ACM
2008,Pages.1-2.
45
workshop on Workshop on AISec ,

Body Report

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Body Report

Uploaded by

Copyright:

Available Formats

CHAPTER 1

are being used to distribute stolen or copyrighted material.

perception towards the product.

1.4 CAPTCHAs and the Turing Test:

1.5 Forms of attacks [3]

CAPTCHAs are classified based on what is distorted and presented as a

characters and correctly enter them in a designated space.

Fig 2.1. Gimpy CAPTCHA

Fig 2.2 Yahoos Ez Gimpy CAPTCHA

2.1.3 Baffle Text:

Fig 2.3 Baffle Text examples

2.1.4 MSN Captcha:

2.2 Graphic CAPTCHAs:

Fig 2.5 Bongo CAPTCHA

Fig. 2.6 ESP Pix

Fig. 2.7 Asirra

CAPTCHA the dog.

Fig. 2.8 Captcha the Dog Image

Fig 2.10 Dynamics image based captcha

Fig 2.11 IdentiPic

Fig 2.12 Puzzle

2.3 Audio CAPTCHAs:

2.4 reCAPTCHA and book digitization:

reCAPTCHA improves the process of digitizing books by sending words that

The purpose of literature review is to establish a theoretical framework for topic.

3.1 Existing System

B.Picture based Captcha [14]: In picture recognition the motivation here is

3.1.1 Issues with CAPTCHAs

3.1.1.1 Usability issues with text based CAPTCHAs:

Table 3.1. Usability issues in text based CAPTCHAs

Distortion becomes a problem when it is done in a very haphazard way. Some

3.1.1.2 Usability of audio CAPTCHAs:

3.1.2. Decaptcha [16]

3. Post-Segmentation: At this stage the segments are processed individually to make

3.2 Proposed System:

Fig. 3.1 Word Grouping Captcha (Unsolved)

Fig. 3.2 Word Grouping Captcha (Solved)

3.3.1 Economical Feasibility

3.3.3 Technical Feasibility

4.1 Problem Definition

4.2 Overview of the Project

4.4. Development Environment

The steps for Spiral Model can be generalized as follows:

A preliminary design is created for the new system.

A second prototype is evolved by a fourfold procedure:

Risk factors might involve development cost overruns, operating-cost

The final system is thoroughly evaluated and tested. Routine maintenance is

Why Spiral model?

The following diagram shows how a spiral model acts like:

Fig 4.1-Spiral Model

4.5.1 Activity Diagram:

Choose type of Captcha

Enter CAPTCHA Security code for text Captcha

Fig. 4.2 Activity Diagram

5.Login form(id, password, Captcha image) is sent to the browser

11. Browser redirects to the original URL in step 1

Step wise Description of sequence diagram

5.2 Captcha Evaluation

Consistency Consistency is good, if user is able to recognize the picture correctly.

5.2.3 Word Grouping

Less, but can be

Table 5.1 Comparison Between different Captchas based on Parameters

Fig. 6.1 Snapshot Text Based Captcha