CAPTCHA Origins
1997: Andrei Broder at AltaVista wanted to prevent bots from automatically submitting sites for indexing, so he decided to add a test to the submission page. He reversed the OCR optimization techniques recommended for Brother scanners, producing text that OCR software reads poorly.
2000: Luis von Ahn, Manuel Blum, and John Langford at CMU coined the term CAPTCHA.
H-CAPTCHA
Printed CAPTCHA
Printed CAPTCHA is difficult to break. Many algorithms are available to generate these. They are not very easy for humans to read either. The two major types are BaffleText and Pessimal Print.
Handwritten CAPTCHA
Less frequently used, because humans can identify handwriting more easily than text images. Uses transformations such as added lines, arcs, and circles.
GIMPY
Randomly chooses 7 words from a dictionary. Distorts the words using a variety of techniques. The human must correctly type 3 of the words to pass the test. In the real world, most applications test only a single word (EZ-Gimpy).
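The selection and pass rule above can be sketched in a few lines of Python. This is only an illustration: the word list, the function names `new_challenge` and `grade`, and the rule that repeated words count once are assumptions, and the image distortion step is left out.

```python
import random

# Hypothetical word list; a real system would draw from a large dictionary.
DICTIONARY = ["apple", "river", "stone", "cloud", "table", "green",
              "music", "paper", "light", "horse", "bread", "chair"]

def new_challenge(rng, n_words=7):
    """Pick n_words distinct words; these would then be distorted and rendered."""
    return rng.sample(DICTIONARY, n_words)

def grade(challenge, typed_words, required=3):
    """Pass if at least `required` distinct challenge words were typed correctly."""
    shown = {w.lower() for w in challenge}
    correct = {w.strip().lower() for w in typed_words} & shown
    return len(correct) >= required
```

Setting `n_words=1` and `required=1` gives the single-word variant.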
GIMPY Examples
EZ-GIMPY
R-GIMPY
BONGO
A visual recognition problem. Shows two sets of shapes, each with a distinguishing characteristic. The user must choose which set a given shape belongs to.
PIX
A database of labeled images of recognizable objects. Randomly chooses an object and displays N pictures of it. The user must correctly identify the object. The pictures are distorted.
KittenAuth
The Cutest Human Test. Shows a 3x3 matrix of cute animals. The user must choose the 3 kittens. The strategy is to use animals that look similar to kittens.
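A minimal sketch of how such a grid could be built and graded, assuming image identifiers are plain strings and that the kitten and non-kitten pools are supplied by the caller. The names `new_grid` and `grade` are hypothetical.

```python
import random

def new_grid(rng, kittens, others, size=9, n_kittens=3):
    """Return (grid, answer): shuffled image ids and the set of kitten positions."""
    cells = [(img, True) for img in rng.sample(kittens, n_kittens)]
    cells += [(img, False) for img in rng.sample(others, size - n_kittens)]
    rng.shuffle(cells)
    grid = [img for img, _ in cells]
    answer = {i for i, (_, is_kitten) in enumerate(cells) if is_kitten}
    return grid, answer

def grade(answer, selected):
    """Pass only if the selected positions are exactly the kitten positions."""
    return set(selected) == answer
```

The answer set stays on the server; only the grid is sent to the browser.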
Audio CAPTCHA
Pick a word or a sequence of numbers at random. Render it into an audio clip using TTS software. Distort the audio clip. Ask the user to identify and type the word or numbers.
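The steps other than speech synthesis can be sketched as follows. The sketch assumes the TTS engine (not shown) returns the clip as a list of float samples in [-1, 1]. The function names and the choice of additive uniform noise as the distortion are illustrative assumptions.

```python
import random

def new_digit_challenge(rng, length=6):
    """Pick a random digit sequence (use the `secrets` module in production)."""
    return "".join(rng.choice("0123456789") for _ in range(length))

def distort(samples, rng, noise_level=0.3):
    """Add uniform noise to each sample and clip the result to [-1, 1]."""
    return [max(-1.0, min(1.0, s + rng.uniform(-noise_level, noise_level)))
            for s in samples]

def grade(expected, typed):
    """Compare the typed answer with the expected digits, ignoring whitespace."""
    return "".join(typed.split()) == expected
```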
Logic Puzzles
Easy trivia questions. Example: Which of the following is a bird? Elephant, Tiger, or Robin.
Cons: Difficult to create a big enough database of these questions. Difficult for ESL users / international users.
Breaking CAPTCHA
Some CAPTCHAs were broken by relaying the tests to unsuspecting users to solve.
Uses of CAPTCHA
Online polls. Free e-mail services. Search engine bots. Protection against worms and spam. Preventing dictionary attacks, etc.
Properties
A CAPTCHA should be automatically generated and graded. The test can be taken quickly and easily by human users. The test accepts virtually all human users and rejects software agents. The test resists automatic attack for many years, despite advances in technology and prior knowledge of the algorithms.
Yahoo! Registration
Final Thoughts
They are crucial to preventing bot attacks. Hopefully, they will become more user-friendly to people with disabilities (visual, mental). CAPTCHAs are mainly produced with AJAX and PHP technology. Various algorithms are available. XML is also used.
Different CAPTCHAs
PHP
PHP originally stood for Personal Home Page; it now stands for PHP: Hypertext Preprocessor. It is a scripting language used to create dynamic web pages, with syntax borrowed from C, Java, Perl, etc. PHP code is embedded within HTML pages for server-side execution.
OCR
(Optical Character Recognition) The machine recognition of printed characters. OCR systems can recognize many different OCR fonts, as well as typewriter and computer-printed characters. Advanced OCR systems can recognize hand printing. When a text document is scanned into the computer, it is turned into a bitmap, which is a picture of the text. OCR software analyzes the light and dark areas of the bitmap in order to identify each alphabetic letter and numeric digit. When it recognizes a character, it converts it into ASCII text. Hand printing is much more difficult to analyze than machine-printed characters. Old, worn, and smudged documents are also difficult. Scanning documents and processing them with OCR is sometimes as much an art as it is a science.
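The idea of classifying light and dark areas can be illustrated with a toy template matcher. This is a sketch under strong assumptions: the 3x3 templates, the threshold of 128, and the function names are all hypothetical, and real OCR systems are far more elaborate.

```python
# Hypothetical 3x3 templates: 1 = dark (ink), 0 = light (paper).
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
}

def binarize(gray, threshold=128):
    """Map grayscale values (0..255) to 1 for dark pixels and 0 for light ones."""
    return tuple(tuple(1 if px < threshold else 0 for px in row) for row in gray)

def recognize(gray):
    """Return the template character with the fewest mismatching pixels."""
    bitmap = binarize(gray)

    def distance(template):
        return sum(a != b
                   for t_row, b_row in zip(template, bitmap)
                   for a, b in zip(t_row, b_row))

    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))
```

A smudged pixel raises the mismatch count without necessarily changing the best match, which is why CAPTCHA distortion has to be heavier than ordinary scanning noise.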
Segmentation
Segmentation is an image-processing step. Types: pixel-based segmentation, model-based segmentation, multi-scale segmentation, and semi-automatic segmentation.
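As one simple instance of the pixel-based approach, a binary bitmap can be cut into character candidates wherever a column contains no ink. The function name and the column-projection technique are assumptions chosen for illustration, not a method named in the slides.

```python
def segment_columns(bitmap):
    """Split a binary bitmap (rows of 0/1) into [start, end) column ranges with ink."""
    width = len(bitmap[0]) if bitmap else 0
    has_ink = [any(row[x] for row in bitmap) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new segment begins
        elif not ink and start is not None:
            segments.append((start, x))    # an empty column ends the segment
            start = None
    if start is not None:
        segments.append((start, width))    # segment runs to the right edge
    return segments
```

Lines and arcs drawn across the characters defeat this kind of splitting, because no column is left empty.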
Validators
Types of validators:
1) Markup validator: checks web documents in formats such as HTML and XHTML.
2) Link validator: checks hyperlinks; useful for finding broken links.
3) CSS validator: checks stylesheets.
4) RDF validator: checks RDF documents.
5) Feed validator: checks syndication feeds.
6) P3P validator: checks P3P (Platform for Privacy Preferences) policies.
Etc.
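A link validator can be sketched with Python's standard `html.parser` module. To keep the example free of network access, the reachability check is passed in as a caller-supplied function; `is_reachable`, `LinkCollector`, and `broken_links` are hypothetical names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def broken_links(html, is_reachable):
    """Return the links for which the caller-supplied check fails."""
    collector = LinkCollector()
    collector.feed(html)
    return [url for url in collector.links if not is_reachable(url)]
```

In practice `is_reachable` would issue an HTTP request and inspect the status code.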
Session Management
Session management is the process of keeping track of a user's activity across sessions of interaction with a computer system. When a user opens a web page and then does nothing on it, the session expires (e.g., a score-watch page on a web site). After a certain time, when the user logs in to the page again, the previously expired session is restored. E.g., if a user opens a Yahoo account in two windows and after some time logs off in one window, the account can no longer be used from the other window: the session has expired, and the user has to log in again.
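The expiry and logout behavior described above can be sketched as an in-memory session store. This is an illustration only: the class name, the 15-minute default timeout, and the injectable clock are assumptions, and restoring an expired session after re-login is not modeled.

```python
import secrets
import time

class SessionStore:
    """Track sessions by id and expire them after a period of inactivity."""

    def __init__(self, idle_timeout_s=900, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock
        self._sessions = {}  # session id -> (user, time of last activity)

    def login(self, user):
        sid = secrets.token_hex(16)
        self._sessions[sid] = (user, self.clock())
        return sid

    def touch(self, sid):
        """Return the user if the session is valid, else None (re-login needed)."""
        entry = self._sessions.get(sid)
        if entry is None:
            return None
        user, last = entry
        if self.clock() - last > self.idle_timeout_s:
            del self._sessions[sid]  # idle for too long: expire the session
            return None
        self._sessions[sid] = (user, self.clock())
        return user

    def logout(self, sid):
        self._sessions.pop(sid, None)
```

Two browser windows normally share one session id through the same cookie, so calling `logout` from one window makes `touch` fail in the other.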
Session Management
There are two types: 1) Desktop session management 2) Browser session management