

• Definition
• Background
• Types
• Applications
• Constructing CAPTCHAs
• Breaking CAPTCHAs
• Issues with CAPTCHAs
• Conclusion

Intro
• CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
• Invented at CMU by Luis von Ahn, Manuel Blum, et al.
• A program that is a challenge-response test to separate humans from computer programs

• Generic CAPTCHAs distort letters and numbers
• Distorted characters are presented to the user
• The user has to recognize the distorted letters
• If the guessed letters are correct, the user is inferred to be a human and allowed access
• Else, the user is inferred to be a bot and denied access

• Humans can read the distorted and noisy text
• Current OCR programs cannot read it

Background
• Why was CAPTCHA needed?
  • Sabotage of online polls
  • Spam emails
  • Abuse of free online accounts
  • Tampering with rankings on recommendation systems (like eBay, Amazon)

• Altavista first used a crude CAPTCHA on its sites
• This resulted in a 95% spam reduction
• Yahoo partnered with CMU to counter these threats in its Messenger chat service
• Luis von Ahn and Manuel Blum of CMU trademarked CAPTCHA in 2000

• What is a Turing test?
  o Proposed by Alan Turing
  o Tests a machine's level of intelligence
  o A human judge asks questions to two participants; one is a machine, and the judge doesn't know which is which
  o If the judge can't tell which is the machine, the machine passes the test
  o CAPTCHA employs a reverse Turing test: judge = CAPTCHA program, participant = user
  o If the user passes the CAPTCHA, the user is human; if the user fails, it is a machine

Types of CAPTCHAs
• Text based:
• Simple, normal language questions:
    ▪ What is the sum of three and thirty-five?
    ▪ If today is Saturday, what is the day after tomorrow?
    ▪ Which of mango, table, water is a fruit?
  o Very effective, but needs a large question bank
  o Cognitively challenged users find it hard

• Gimpy:
  o Designed by Yahoo and CMU
  o Picks 10 random words from a dictionary, distorts them, and fills the image with noise
  o The user has to recognize at least 3 words
  o If the user is correct, the user is admitted
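
The Gimpy acceptance rule above (10 words shown, at least 3 recognized) can be sketched roughly as follows. This is only an illustration of the rule as the slide states it: the word list and the names `pick_challenge_words` and `is_admitted` are hypothetical, and the distortion and noise step is left out.

```python
import random

# Hypothetical stand-in for a real dictionary file.
DICTIONARY = [
    "apple", "bridge", "candle", "desert", "engine", "forest",
    "garden", "harbor", "island", "jacket", "kettle", "ladder",
]

def pick_challenge_words(dictionary, count=10, rng=None):
    """Pick `count` distinct random words to render into the image."""
    rng = rng or random.Random()
    return rng.sample(dictionary, count)

def is_admitted(challenge_words, guesses, required=3):
    """Admit the user if at least `required` distinct shown words were typed."""
    shown = {word.lower() for word in challenge_words}
    correct = {guess.strip().lower() for guess in guesses} & shown
    return len(correct) >= required
```

Guesses are collected into a set, so typing the same word three times counts only once.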

• EZ-Gimpy:
  o A modified version of Gimpy
  o Yahoo used this version in Messenger
  o Has only 1 random string of characters
  o Not a dictionary word, so not prone to dictionary attack
  o Not a good implementation; already broken by OCRs

• MSN's Passport service CAPTCHAs:
  o Provided for Microsoft's MSN services
  o Use 8 characters
  o Warping is used to distort the characters
  o Very strong implementation; hasn't been broken
  o It is segmentation-resistant

• Graphic based CAPTCHAs:
• BONGO:
  o Named after M. M. Bongard, pattern recognition expert
  o The user has to solve a pattern recognition problem
  o The user has to tell the distinguishing characteristic between two sets of figures
  o Then tell which set a given figure belongs to

• PIX:
  o Uses a large database of labelled images
  o It shows a set of images; the user has to recognize the common feature among them
  o E.g. pick the common characteristic among four pictures: "Aeroplane"

• Audio CAPTCHAs:
  o Consist of a downloadable audio clip
  o The user listens and enters the spoken word
  o Helps visually disabled users
  o Not popular
  o Example: Google's audio-enabled CAPTCHA

Applications
• Protect online polls
• Prevent Web registration abuse; protect passwords from brute-force attacks
• Prevent comment spam and spam emails
• E-ticketing: prevent scalping

• Verify digitized books: reCAPTCHA
  o Used in the Google Books Project
  o Two words are shown; the program knows the first word
  o If the user enters the first word correctly, it assumes that the second, unknown word will also be entered correctly
  o The second word becomes "known"
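
The two-word check above might look roughly like this in code. It is a minimal sketch of the logic as the slide describes it, not reCAPTCHA's actual implementation; the function `submit`, the `known_words` mapping, and the image ids are all hypothetical.

```python
def normalize(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower()

def submit(known_words, control_id, unknown_id, control_answer, unknown_answer):
    """Check the known (control) word; on success, record the unknown word.

    known_words maps a word-image id to its accepted transcription.
    """
    if normalize(control_answer) != known_words[control_id]:
        return False  # control word wrong: reject, learn nothing
    # Control word right: trust the transcription of the unknown word too.
    known_words[unknown_id] = normalize(unknown_answer)
    return True

# Usage: image "img-1" is already known, "img-2" is still unread.
known = {"img-1": "morning"}
passed = submit(known, "img-1", "img-2", " Morning ", "overlooks")
```

After the call, `known` also holds an entry for `"img-2"`, so that word can itself serve as a control word later.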

• Help advance AI knowledge
• CAPTCHAs are called Hard-AI problems
• A win-win scenario:
  o If a CAPTCHA is broken by a bot, a Hard-AI problem is solved
  o If it's not yet broken, then the current implementation is able to withstand attacks
• Thus AI knowledge is advanced if CAPTCHAs are broken

Constructing CAPTCHAs
• Things to keep in mind:
  o Don't store the CAPTCHA solution in the Web page's metadata
  o A CAPTCHA is no good if it doesn't distort
  o Need a large database of different CAPTCHA questions
  o Avoid repetition of questions

• CAPTCHA Logic:
• Generate the question
• Persist the correct answer
• Present the question to the user
• Evaluate the answer; if incorrect, start again and generate a different CAPTCHA
• If correct, allow access to the user
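
The generate / persist / present / evaluate loop above could be sketched as below. This is an illustration under assumptions, not a production design: `session` stands for any server-side per-user store (a plain dict here), the function names are hypothetical, and rendering the text as a distorted image is omitted.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

def generate_challenge(session, length=6):
    """Generate a random string and persist it server-side, not in the page."""
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    session["captcha_answer"] = answer
    return answer  # would be rendered as a distorted image for the user

def evaluate(session, user_input):
    """Check the answer once; on failure, issue a different CAPTCHA."""
    expected = session.pop("captcha_answer", None)
    typed = user_input.strip().upper().encode("utf-8")
    if expected is not None and secrets.compare_digest(typed, expected.encode("utf-8")):
        session["verified"] = True  # allow access
        return True
    generate_challenge(session)  # start again with a new question
    return False
```

Popping the stored answer before comparing means each challenge can be attempted only once, which also avoids repeating a question after a wrong guess.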

• Embeddable CAPTCHAs:
  o Available freely, e.g. from www.recaptcha.net
  o Just embed the code into the Web page's HTML
  o No maintenance
• Custom CAPTCHAs:
  o Fit the theme of the page
  o Better protected from spammers
  o Can be written in any language: Perl, ASP.NET, JavaScript

• Guidelines:
  o Accessibility
  o Image security
  o Script security
  o Security after widespread adoption
  o Custom implementation or a general CAPTCHA?

Breaking CAPTCHAs
• Cracking CAPTCHAs through programs
  o Convert the CAPTCHA into greyscale
  o Detect patterns in the image corresponding to characters
  o Or, read the session files of that user to learn the CAPTCHA word
    ▪ Solution: Only store a hash of the CAPTCHA word in session files
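
Storing only a hash of the CAPTCHA word, as suggested above, might be sketched like this. The salted SHA-256 scheme, the session keys, and the function names are assumptions made for illustration; the slide does not specify a hash function.

```python
import hashlib
import hmac
import os

def store_answer(session, answer):
    """Keep a salted hash of the answer in the session, never the word itself."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + answer.lower().encode("utf-8")).hexdigest()
    session["captcha_salt"] = salt.hex()
    session["captcha_hash"] = digest

def verify_answer(session, user_input):
    """Hash the user's input with the same salt and compare the digests."""
    salt = bytes.fromhex(session["captcha_salt"])
    typed = user_input.strip().lower().encode("utf-8")
    digest = hashlib.sha256(salt + typed).hexdigest()
    return hmac.compare_digest(digest, session["captcha_hash"])
```

Because CAPTCHA words are short, an attacker who can read the session file could still try every candidate word against the hash, so this raises the cost of the attack rather than removing it.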

• Greg Mori and Jitendra Malik have broken text CAPTCHAs, e.g. EZ-Gimpy
  o To break this CAPTCHA:
    ▪ Segmentation: Locate possible letters in the image
    ▪ Construct a graph of consistent letters
    ▪ Find plausible words from the graph; use scores to rank them, e.g. roll=11.94, profit=9.42 (better match)
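
The final ranking step above can be illustrated with the two scores from the slide. This sketch assumes, as the slide's example implies, that a lower score means a better match; it covers only the ranking, not the segmentation or graph construction, and `rank_words` is a hypothetical name.

```python
# Candidate words with the scores quoted in the slide's example.
candidates = {"roll": 11.94, "profit": 9.42}

def rank_words(scores):
    """Return (word, score) pairs, best match first (lowest score)."""
    return sorted(scores.items(), key=lambda item: item[1])

best_word, best_score = rank_words(candidates)[0]
print(best_word)  # profit
```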

• Social engineering to break CAPTCHAs:
  o A spammer encounters a CAPTCHA
  o That CAPTCHA is copied to another site
  o Humans are baited, e.g. with free MP3s
  o To get those MP3s, users are told to solve the copied CAPTCHA
  o The solution is routed to the spammer
    ▪ Solution: Fix a time-to-live period for a question
• CAPTCHA cracking as a business:
  o Firms offer CAPTCHA cracking services in exchange for money
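
The time-to-live fix for relayed CAPTCHAs could look roughly like the sketch below: an answer that arrives after the deadline is rejected even if it is correct. The 120-second default, the session keys, and the function names are assumptions for illustration; the `now` parameter exists only to make the example testable.

```python
import time

def issue_challenge(session, answer, ttl_seconds=120, now=None):
    """Store the answer together with the moment it stops being valid."""
    now = time.time() if now is None else now
    session["captcha_answer"] = answer
    session["captcha_expires_at"] = now + ttl_seconds

def check_answer(session, user_input, now=None):
    """Accept only a correct answer that arrives before the deadline."""
    now = time.time() if now is None else now
    answer = session.pop("captcha_answer", None)  # single use
    expires_at = session.pop("captcha_expires_at", 0.0)
    if answer is None or now > expires_at:
        return False
    return user_input.strip().lower() == answer.lower()
```

A short time-to-live leaves less time to copy the question to another site and wait for a baited human to solve it.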

Issues with CAPTCHAs
• Usability issues:
  o W3C mandates that the Web be accessible to all people
  o Some CAPTCHAs are inaccessible to visually impaired or cognitively challenged people
• Compatibility issues:
  o JavaScript may need to be activated in browsers
  o Some may need the Adobe Flash plugin installed

Summary
• CAPTCHAs are an effective way to counter bots and reduce spam
• They serve a dual purpose: they also help advance AI knowledge
• Applications are varied, from stopping bots to character recognition and pattern matching
• Some issues with current implementations represent challenges for future improvements