
Text Extraction from Images

by Anchal Agarwal (0906331011), Reetika Shukla (0906331076), Shiv Kumar (0906331), Vimal Kumar (0906331)

Under the Guidance of Mr. Diwakar Agarwal

Submitted to the Department of Electronics & Communication in partial fulfillment of the requirements for the degree of Bachelor of Technology In Electronics & Communication Engineering

GLAITM, Gautam Buddh Technical University, December 2012

TABLE OF CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
SCOPE AND ORGANIZATION
CHAPTER 1 (INTRODUCTION TO IMAGE PROCESSING)
1.1. Image Basics
1.2. Types of Images
1.3. Colours
1.4. Image File Formats
1.5. Raster Formats
1.6. Other Formats
CHAPTER 2 (TEXT INFORMATION EXTRACTION)
2.1. Text Information Extraction (TIE)
2.2. Pre-Processing
2.3. What is Text Information Extraction (TIE)?
2.4. Text Extraction Techniques
2.5. Applications
2.6. Conclusion
REFERENCES

ACKNOWLEDGEMENT
It gives us a great sense of pleasure to present the report of the B.Tech project undertaken during our B.Tech final year. We owe a special debt of gratitude to Mr. Diwakar Agarwal, Department of Electronics & Communication Engineering, GLA University, Mathura, for his constant support and guidance throughout the course of our work. His sincerity, thoroughness and perseverance have been a constant source of inspiration for us; it is only because of his cognizant efforts that our endeavours have seen the light of day. We also take this opportunity to acknowledge the contribution of Professor T. N. Sharma, Head, Department of Electronics & Communication Engineering, GLA University, Mathura, for his full support and assistance during the development of the project. We would also like to acknowledge the contribution of all faculty members of the department for their kind assistance and cooperation during the development of our project. Last but not least, we acknowledge our friends for their contribution to the completion of the project.
Signature:
Name: Anchal Agarwal
Roll No.: 0906331011
Date:

Signature:
Name: Reetika Shukla
Roll No.: 0906331076
Date:

Signature:
Name: Shiv Kumar
Roll No.: 0906331
Date:

Signature:
Name: Vimal Kumar
Roll No.: 0906331
Date:

ABSTRACT
Text extraction from images has been developing rapidly since the 1990s and is an important research field in content-based information indexing and retrieval and in the automatic annotation and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex backgrounds, make automatic text extraction an extremely difficult and challenging job. A large number of techniques have been proposed to address this problem; the purpose of this report is to classify and review these techniques, discuss applications and performance evaluation, and identify promising directions for future research.

The amount of pictorial data has been growing enormously with the expansion of the WWW. Given this large number of images, it is very important for users to retrieve the images they require via an efficient and effective mechanism. To solve the image retrieval problem, many techniques have been devised addressing the requirements of different applications. Problems with the traditional methods of image indexing have led to a rise of interest in techniques for retrieving images on the basis of automatically derived features such as color, texture and shape, a technology generally referred to as Content-Based Image Retrieval (CBIR). After a decade of intensive research, CBIR technology is now beginning to move out of the laboratory and into the marketplace. However, the technology still lacks maturity and is not yet being used on a significant scale.

LIST OF TABLES
Table 1. Properties of text in images

LIST OF FIGURES
Fig 1. An image is an array, or a matrix, of pixels arranged in columns and rows
Fig 2. Each pixel has a value from 0 (black) to 255 (white); the possible range of pixel values depends on the colour depth of the image, here 8 bit = 256 tones or greyscales
Fig 3. A true-colour image assembled from three greyscale images coloured red, green and blue; such an image may contain up to 16 million different colours
Fig 4. Difference between a coloured image and the corresponding grayscale image
Fig 5. RGB cube
Fig 6. CMYK circle
Fig 7. Text images
Fig 8. Document images
Fig 9. Caption text images
Fig 10. Flowchart of preprocessing
Fig 11. Architecture of a TIE system
Fig 12. Stepwise result of text detection
Fig 13. Result of text extraction

INTRODUCTION
Extracting text from images is an important problem in many applications such as document processing and image indexing. Usually, the text embedded in an image or a frame captures important media context, such as a player's name, a title, a date, or a story introduction. The task can therefore provide various advantages for annotating an image, and thus improves the accuracy of a content-based indexing system in searching for desired media content. Moreover, when analyzing video audio, the recognition result of a text line can provide extra refinements for correcting the errors of speech recognition. Since the 1990s, with the rapid growth of available multimedia documents and the increasing demand for information indexing and retrieval, much effort has been devoted to text extraction from images. A large

number of approaches, such as region-based, edge-based, morphology-based and texture-based methods, have been proposed and have already obtained impressive performance. Documents in which text is embedded in complex colored backgrounds are increasingly common today, for example in magazines, advertisements and web pages. Robust detection of text from these documents is a challenging problem. Text extraction has a vast number of applications:

Text searches in images - Currently, image searches deliver inaccurate results because they do not search the image content. Text extraction would enable better searching by extracting the content of an image.

Content-based indexing - For the purpose of archiving and indexing documents, the content of the document is required in digital format. Knowledge about the text content of documents can help in building an intelligent system that archives and indexes printed documents.

Reading foreign-language text - One of the common problems faced by a person in a foreign land is that of communication: understanding road signs, signboards, etc. The proposed method aims to alleviate such problems by reading the text information from image scenes captured by a camera.

Archiving documents - Archives of paper documents in offices, or other printed material such as magazines and newspapers, can be electronically converted for more efficient storage and instant delivery to home or office computers.

Content-based image indexing refers to the process of attaching labels to images based on their content. Image content can be divided into two main categories: perceptual content and semantic content. Perceptual content includes attributes such as color, intensity, shape, texture, and their temporal changes, whereas semantic content means objects, events, and their relations. A number of studies on the use of relatively low-level perceptual content for image and video indexing have already been reported. Studies on semantic image content in the form of text, faces, vehicles, and human actions have also attracted recent interest. Among these, text within an image is of particular interest because (i) it is very useful for describing the contents of an image; (ii) it can be extracted easily compared to other semantic contents; and (iii) it enables applications such as keyword-based image search, automatic video logging, and text-based image indexing.

SCOPE AND ORGANIZATION
This paper presents a comprehensive survey of TIE from images. Page layout analysis is similar to text localization in images; however, most page layout analysis methods assume the characters to be black with a high contrast on a homogeneous background. In practice, text in images can have any color and be superimposed on a complex background. Although a few TIE surveys have already been published, they lack details on individual approaches and are not clearly organized. We organize the TIE algorithms into several categories according to their main idea and discuss their pros and cons.

It also reviews the various sub-stages of TIE and introduces approaches for text detection, localization, tracking, extraction, and enhancement. We also point out the ability of the individual techniques to deal with color, scene text, compressed images, etc. The important issue of performance evaluation is discussed in Section 3, along with sample public test data sets and a review of evaluation methods. Section 4 gives an overview of the application domains for TIE in image processing and computer vision. The final conclusions are presented in Section 5.

Chapter 1
Introduction to Image Processing
In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible; this chapter covers general techniques that apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.

An image defined in the real world is considered to be a function of two real variables, for example a(x,y), with a as the amplitude (e.g. brightness) of the image at the real coordinate position (x,y). An image may be considered to contain sub-images, sometimes referred to as regions-of-interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve color rendition.

Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:
* Image Processing: image in -> image out
* Image Analysis: image in -> measurements out
* Image Understanding: image in -> high-level description out
Image processing thus refers to the processing of a 2D picture by a computer.

Sequence of image processing: The basic requirement for image processing is that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample or pixel is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display.

Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, in which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans). In modern sciences and technologies, images also gain a much broader scope due to the ever-growing importance of scientific visualization (of often large-scale complex scientific/experimental data). Examples include microarray data in genetic research and real-time multi-asset portfolio trading in finance.

1.1 Image Basics

1.1.1 Image
An image is an array, or a matrix, of square pixels (picture elements) arranged in columns and rows.

Fig 1. An image is an array, or a matrix, of pixels arranged in columns and rows.

In an 8-bit greyscale image each picture element has an assigned intensity that ranges from 0 to 255. A greyscale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.

Fig 2. Each pixel has a value from 0 (black) to 255 (white). The possible range of the pixel values depends on the colour depth of the image, here 8 bit = 256 tones or greyscales.

A normal greyscale image has 8-bit colour depth = 256 greyscales. A true-colour image has 24-bit colour depth = 3 x 8 bits = 256 x 256 x 256 colours, approximately 16 million colours.

Fig 3. A true-colour image assembled from three greyscale images coloured red, green and blue. Such an image may contain up to 16 million different colours.
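These bit depths are easy to verify in code. A minimal Python/NumPy sketch (the array sizes and pixel values are made up for illustration):

    import numpy as np

    # An 8-bit greyscale image: one intensity value (0-255) per pixel.
    grey = np.zeros((4, 4), dtype=np.uint8)
    grey[0, 0] = 0        # black
    grey[0, 1] = 255      # white
    print(grey.dtype.itemsize * 8, "bits ->", 2 ** 8, "grey levels")

    # A 24-bit true-colour image: three 8-bit values (R, G, B) per pixel.
    rgb = np.zeros((4, 4, 3), dtype=np.uint8)
    rgb[0, 0] = (255, 0, 0)                      # a pure red pixel
    print(3 * 8, "bits ->", 2 ** 24, "colours")  # 16777216, ~16 million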

1.1.2 Pixel

Pixels are the picture elements that make up an image, similar to grains in a photograph or dots in a halftone. Each pixel can represent a number of different shades or colors, depending upon how much storage space is allocated for it.

1.2 TYPES OF IMAGES
A) Binary Image
A binary image is a two-dimensional array of binary pixels. If the value is 0, the pixel is black; if the value is 1, the pixel is white.

B) Greyscale Image
A greyscale image is a two-dimensional array of values indicating the brightness at each point. The brightness values are generally stored as a value between 0 (black) and 255 (white); values in between are different shades of grey.

C) Color Image
A color image can be viewed in two equivalent ways. The first is as a two-dimensional array of pixels, just like a greyscale image, but instead of a brightness value, each pixel has a specific color given by an (R,G,B) triple. The alternative view is that the image is composed of three separate 2D arrays of pixels (one for red, one for green, and one for blue), where each element in the three arrays contains the amount of only that layer's color present in the image at that point. Each of these 2D arrays is called a layer.

Fig 4. Difference between a colored image and the corresponding grayscale image.
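Both views of a colour image can be seen directly in code. A minimal NumPy sketch with a made-up 2x2 image:

    import numpy as np

    # A 2x2 colour image: each pixel holds an (R, G, B) triple...
    img = np.array([[[255, 0, 0], [0, 255, 0]],
                    [[0, 0, 255], [255, 255, 0]]], dtype=np.uint8)

    # ...or, equivalently, three separate 2D layers, one per channel.
    red, green, blue = img[..., 0], img[..., 1], img[..., 2]

    print(img[0, 0])  # (R, G, B) triple of the top-left pixel: [255 0 0]
    print(red)        # the red layer: how much red is present at each pixel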

D) Indexed Image
This is a practical way of representing color images. (In this course we will mostly work with grayscale images, but once you have learned how to work with a grayscale image you will also know in principle how to work with color images.) An indexed image stores an image as two matrices. The first matrix has the same size as the image and holds one number for each pixel. The second matrix is called the color map, and its size may be different from that of the image. The numbers in the first matrix indicate which entry of the color map matrix to use for each pixel.
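A small sketch of the indexed representation, using a made-up 2x2 image and a three-entry colour map with MATLAB-style values in the range 0-1:

    import numpy as np

    # First matrix: one colour-map index per pixel (same size as the image).
    index = np.array([[0, 1],
                      [1, 2]])

    # Second matrix: the colour map, one RGB row per entry.
    colormap = np.array([[1.0, 1.0, 1.0],   # entry 0: white
                         [1.0, 0.0, 0.0],   # entry 1: red
                         [0.0, 0.0, 0.0]])  # entry 2: black

    rgb = colormap[index]  # expand to a true-colour (2, 2, 3) array
    print(rgb[1, 0])       # pixel (1, 0) uses entry 1 -> [1. 0. 0.]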

1.3 Colours

For science communication, the two main colour spaces are RGB and CMYK.
A) RGB
Red, green, and blue are the three basic colors. By combining these three colors of light, any color can be produced. R, G, and B are specified as relative amounts which describe how much of each color to combine (e.g. [1, 0, 0] is pure red, [1, 1, 0] means combining red and green in equal quantities, etc.). These combinations can be represented as a cube.

Fig 5. RGB cube.

B) CMYK
Cyan, Magenta, Yellow, and blacK. With these four colors of ink any color can be produced. Since cyan, magenta and yellow are the exact inverse of the additive color model, the two systems can be interchanged with a simple transformation.

Black is not needed in theory: CMY should cover the entire range of possible colors. However, in practice it is much better to use a fourth color, black. Some reasons are as follows:
1) It is cheaper to apply one ink (black) than three inks (CMY).
2) The paper gets wet if too much ink is applied, which often happens when C, M, and Y are all applied. This is inefficient because it adds drying time to the printing process.
3) Text is often black. Since text requires very fine detail, it should be easy to produce this detail in black. If it were produced with CMY, the C, M, and Y print


heads would have to be very accurately aligned, which is much more difficult than simply using a fourth ink.

Fig 6. CMYK circle.
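The relationship between the two colour systems can be written as a short conversion routine. The sketch below uses the common naive formula (invert RGB, then pull the shared darkness into the K channel); real printing workflows use colour-managed conversions instead:

    def rgb_to_cmyk(r, g, b):
        """Convert RGB values in [0, 1] to CMYK using the naive formula."""
        c, m, y = 1 - r, 1 - g, 1 - b  # CMY is the inverse of RGB
        k = min(c, m, y)               # move shared darkness into black ink
        if k == 1.0:                   # pure black: avoid division by zero
            return 0.0, 0.0, 0.0, 1.0
        return (c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k

    print(rgb_to_cmyk(1.0, 0.0, 0.0))  # pure red -> (0.0, 1.0, 1.0, 0.0)
    print(rgb_to_cmyk(0.0, 0.0, 0.0))  # black    -> (0.0, 0.0, 0.0, 1.0)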


1.3.1 Number of colors
Images start with differing numbers of colors in them. The simplest images may contain only two colors, such as black and white, and need only 1 bit to represent each pixel. Many early PC video cards would support only 16 fixed colors. Later cards could display 256 colors simultaneously, any of which could be chosen from a pool of 2^24, or 16 million, colors. New cards devote 24 bits to each pixel and are therefore capable of displaying 2^24, or 16 million, colors without restriction. A few display even more. Since the eye has trouble distinguishing between similar colors, 24-bit color (16 million colors) is often called TrueColor.


1.4 Image file formats


Image file formats are standardized means of organizing and storing digital images. Image files are composed of digital data in one of these formats that can be rasterized for use on a computer display or printer. An image file format may store data in uncompressed, compressed, or vector formats. Once rasterized, an image becomes a grid of pixels, each of which has a number of bits to designate its color equal to the color depth of the device displaying it.

1.4.1 Major graphic file formats


Including proprietary types, there are hundreds of image file types. The PNG, JPEG, and GIF formats are most often used to display images on the Internet. These graphic formats are listed and briefly described below, separated into the two main families of graphics: raster and vector. In addition to straight image formats, Metafile formats are portable formats which can include both raster and vector information. Examples are application-independent formats such as WMF and EMF. The metafile format is an intermediate format. Most Windows applications open metafiles and then save them in their own native format. Page description language refers to formats used to describe the layout of a printed page containing text, objects and images. Examples are PostScript, PDF and PCL.

1.4.2 Digital Image File Types Explained


JPG, GIF, TIFF, PNG, BMP: what are they, and how do you choose? These and many other file types are used to encode digital images. The choices are simpler than you might think. Part of the reason for the plethora of file types is the need for compression: image files can be quite large, and larger file types mean more disk usage and slower downloads. Compression is a term used to describe ways of cutting the size of the file. Compression schemes can be lossy or lossless. Another reason for the many file types is that images differ in the number of colors they contain. If an image has few colors, a file type can be designed to exploit this as a way of reducing file size.


1.4.3 Image formats supported by MATLAB
The following image formats are supported by MATLAB: BMP, HDF, JPEG, PCX, TIFF.

1.4.4 Lossy vs. Lossless compression


You will often hear the terms "lossy" and "lossless" compression. A lossless compression algorithm discards no information: it looks for more efficient ways to represent an image, while making no compromises in accuracy. In contrast, lossy algorithms accept some degradation in the image in order to achieve smaller file sizes. A lossless algorithm might, for example, look for a recurring pattern in the file and replace each occurrence with a short abbreviation, thereby cutting the file size. In contrast, a lossy algorithm might store color information at a lower resolution than the image itself, since the eye is not very sensitive to changes in color over a small distance.
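The trade-off is easy to demonstrate: save the same picture losslessly and at two JPEG quality settings, then compare the file sizes. A sketch using the Pillow library, where photo.jpg stands in for any test photograph:

    import os
    from PIL import Image

    img = Image.open("photo.jpg").convert("RGB")  # placeholder test image

    img.save("out_lossless.png")         # lossless: every pixel preserved
    img.save("out_q90.jpg", quality=90)  # mild lossy compression
    img.save("out_q30.jpg", quality=30)  # aggressive lossy compression

    for name in ("out_lossless.png", "out_q90.jpg", "out_q30.jpg"):
        print(name, os.path.getsize(name), "bytes")

    # Reloading the PNG gives back identical pixels; the JPEGs do not.
    assert list(Image.open("out_lossless.png").getdata()) == list(img.getdata())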

1.4.5 Raster Image File Types and Formats


.bmp - Bitmap Image File
.gif - Graphical Interchange Format File
.jpg - JPEG Image File
.png - Portable Network Graphic
.psd - Adobe Photoshop Document
.pspimage - PaintShop Pro Image
.thm - Thumbnail Image File
.tif - Tagged Image File
.yuv - YUV Encoded Image File


1.5 RASTER FORMATS
A) JPEG/JFIF

JPEG (Joint Photographic Experts Group) is a compression method; JPEG-compressed images are usually stored in the JFIF (JPEG File Interchange Format) file format. JPEG compression is (in most cases) lossy compression. The JPEG/JFIF filename extension is JPG or JPEG. Nearly every digital camera can save images in the JPEG/JFIF format, which supports 8-bit grayscale images and 24-bit color images (8 bits each for red, green, and blue). JPEG applies lossy compression to images, which can result in a significant reduction of the file size. The amount of compression can be specified, and the amount of compression affects the visual quality of the result. When not too great, the compression does not noticeably detract from the image's quality, but JPEG files suffer generational degradation when repeatedly edited and saved. (JPEG also provides lossless image storage, but the lossless version is not widely supported.)
B) JPEG 2000

JPEG 2000 is a compression standard enabling both lossless and lossy storage. The compression methods used are different from the ones in standard JFIF/JPEG; they improve quality and compression ratios, but also require more computational power to process. JPEG 2000 also adds features that are missing in JPEG. It is not nearly as common as JPEG, but it is used currently in professional movie editing and distribution (some digital cinemas, for example, use JPEG 2000 for individual movie frames).
C) Exif

The Exif (Exchangeable image file format) format is a file standard similar to the JFIF format with TIFF extensions; it is incorporated in the JPEG-writing software used in most cameras. Its purpose is to record and to standardize the exchange of images with image metadata between digital cameras and editing and viewing software. The metadata are recorded for individual images and include such things as camera settings, time and date, shutter speed, exposure, image size, compression, name of camera, and color information. When images are viewed or edited by image editing software, all of this image information can be displayed. The actual Exif metadata may be carried within different host formats, e.g. TIFF, JFIF (JPEG) or PNG; IFF-META is another example.
D) TIFF

The TIFF (Tagged Image File Format) format is a flexible format that normally saves 8 bits or 16 bits per color (red, green, blue) for 24-bit and 48-bit totals, respectively, usually using either the TIFF or TIF filename extension. TIFF's flexibility can be both an advantage and a disadvantage, since a reader that reads every type of TIFF file does not exist. TIFFs can be lossy or lossless; some offer relatively good lossless compression for bi-level (black-and-white) images. Some digital cameras can save in TIFF format, using the LZW compression algorithm for lossless storage. The TIFF image format is not widely supported by web browsers.

TIFF remains widely accepted as a photograph file standard in the printing business. TIFF can handle device-specific color spaces, such as the CMYK defined by a particular set of printing press inks. OCR (Optical Character Recognition) software packages commonly generate some (often monochromatic) form of TIFF image for scanned text pages.
E) RAW

RAW refers to a family of raw image formats that are options available on some digital cameras. These formats usually use a lossless or nearly lossless compression, and produce file sizes much smaller than the TIFF formats of full-size processed images from the same cameras. Although there is a standard raw image format (ISO 12234-2, TIFF/EP), the raw formats used by most cameras are not standardized or documented, and differ among camera manufacturers.
6) GIF
GIF (Graphics Interchange Format) is limited to an 8-bit palette, or 256 colors. This makes the GIF format suitable for storing graphics with relatively few colors, such as simple diagrams, shapes, logos and cartoon-style images. The GIF format supports animation and is still widely used to provide image animation effects. It also uses a lossless compression that is more effective when large areas have a single color, and ineffective for detailed or dithered images.
7) BMP

The BMP file format (Windows bitmap) handles graphics files within the Microsoft Windows OS. Typically, BMP files are uncompressed, hence they are large; the advantage is their simplicity and wide acceptance in Windows programs.
8) PNG

The PNG (Portable Network Graphics) file format was created as the free, open-source successor to GIF. The PNG file format supports 8-bit paletted images (with optional transparency for all palette colors) and 24-bit truecolor (16 million colors) or 48-bit truecolor, with and without an alpha channel, while GIF supports only 256 colors and a single transparent color. Compared to JPEG, PNG excels when the image has large, uniformly colored areas. Thus the lossless PNG format is best suited for pictures still under edition, and the lossy formats, like JPEG, are best for the final distribution of photographic images, because in this case JPG files are usually smaller than PNG files. Some programs do not handle PNG gamma correctly, which can cause the images to be saved or displayed darker than they should be.
9) PPM, PGM, PBM, PNM and PFM

Netpbm format is a family including the portable pixmap file format (PPM), the portable graymap file format (PGM) and the portable bitmap file format (PBM). These are either pure ASCII files or raw binary files with an ASCII header that provide very basic functionality and serve as a lowest-common-denominator for converting pixmap, graymap, or bitmap files between different platforms. Several applications refer to them collectively as

PNM format (Portable Any Map). PFM was invented later in order to carry floating-point based pixel information (as used in HDR).
10) PAM

A late addition to the PNM family is the PAM format (Portable Arbitrary Map).
11) WebP

WebP is a new image format that uses lossy compression. It was designed by Google to reduce image file size to speed up web page loading: its principal purpose is to supersede JPEG as the primary format for photographs on the web. WebP is based on VP8's intra-frame coding and uses a container based on RIFF.
12) HDR raster formats

Most typical raster formats cannot store HDR data (32 bit floating point values per pixel component), which is why some relatively old or complex formats are still predominant here, and worth mentioning separately. Newer alternatives are showing up, though.
13) RGBE (Radiance HDR)

The classical representation format for HDR images, originating from Radiance and also supported by e.g. Adobe Photoshop.
14) TIFF

As TIFF can represent almost any kind of image data, it also can be used to hold HDR data. However, many TIFF readers do not support it.
15) IFF-RGFX

IFF-RGFX, the native format of SView5, provides a straightforward IFF-style representation of any kind of image data ranging from 1 to 128 bits (LDR and HDR), including common metadata like ICC profiles, XMP, IPTC or EXIF.
16) CGM

CGM (Computer Graphics Metafile) is a file format for 2D vector graphics, raster graphics, and text, and is defined by ISO/IEC 8632. All graphical elements can be specified in a textual source file that can be compiled into a binary file or one of two text representations. CGM provides a means of graphics data interchange for computer representation of 2D graphical information independent from any particular application, system, platform, or device. It has been adopted to some extent in the areas of technical illustration and professional design, but has largely been superseded by formats such as SVG and DXF.


17) Gerber Format (RS-274X)

RS-274X Extended Gerber Format[3] was developed by Gerber Systems Corp., now Ucamco. This is a 2D bi-level image description format. It is the de facto standard format used by printed circuit board or PCB software. It is also widely used in other industries requiring high-precision 2D bi-level images.
18) SVG

SVG (Scalable Vector Graphics) is an open standard created and developed by the World Wide Web Consortium to address the need (and attempts of several corporations) for a versatile, scriptable and all-purpose vector format for the web and otherwise. The SVG format does not have a compression scheme of its own, but due to the textual nature of XML, an SVG graphic can be compressed using a program such as gzip. Because of its scripting potential, SVG is a key component in web applications: interactive web pages that look and act like applications.

1.5.1 When should we use each?


TIFF

This is usually the best quality output from a digital camera. Digital cameras often offer around three JPG quality settings plus TIFF. Since JPG always means at least some loss of quality, TIFF means better quality. However, the file size is huge compared to even the best JPG setting, and the advantages may not be noticeable. A more important use of TIFF is as the working storage format as you edit and manipulate digital images. You do not want to go through several load, edit, save cycles with JPG storage, as the degradation accumulates with each new save. One or two JPG saves at high quality may not be noticeable, but the tenth certainly will be. TIFF is lossless, so there is no degradation associated with saving a TIFF file. Do NOT use TIFF for web images. They produce big files, and more importantly, most web browsers will not display TIFFs.
JPG

This is the format of choice for nearly all photographs on the web. You can achieve excellent quality even at rather high compression settings. I also use JPG as the ultimate format for all my digital photographs. If I edit a photo, I will use my software's proprietary format until finished, and then save the result as a JPG. Digital cameras save in a JPG format by default. Switching to TIFF or RAW improves quality in principle, but the difference is difficult to see. Shooting in TIFF has two disadvantages compared to JPG: fewer photos per memory card, and a longer wait between photographs as the image transfers to the card. I rarely shoot in TIFF mode.


Never use JPG for line art. On images such as these with areas of uniform color with sharp edges, JPG does a poor job. These are tasks for which GIF and PNG are well suited. See JPG vs. GIF for web images.
GIF

If your image has fewer than 256 colors and contains large areas of uniform color, GIF is your choice. The files will be small yet pixel-perfect.

Do NOT use GIF for photographic images, since it can contain only 256 colors per image.
PNG

PNG is of principal value in two applications:


1. If you have an image with large areas of exactly uniform color, but it contains more than 256 colors, PNG is your choice. Its strategy is similar to that of GIF, but it supports 16 million colors, not just 256.
2. If you want to display a photograph exactly, without loss, on the web, PNG is your choice. Later-generation web browsers support PNG, and PNG is the only lossless true-colour format that web browsers support.

PNG is superior to GIF. It produces smaller files and allows more colors. PNG also supports partial transparency. Partial transparency can be used for many useful purposes, such as fades and antialiasing of text. Unfortunately, Microsoft's Internet Explorer does not properly support PNG transparency, so for now web authors must avoid using transparency in PNG images.
1.6 Other formats

When using graphics software such as Photoshop or Paint Shop Pro, working files should be in the proprietary format of the software. Save final results in TIFF, PNG, or JPG. Use RAW only for in-camera storage, and copy or convert to TIFF, PNG, or JPG as soon as you transfer the images to your PC. You do not want your image archives to be in a proprietary format. Although several graphics programs can now read the RAW format for many digital cameras, it is unwise to rely on any proprietary format for long-term storage. Will you be able to read a RAW file in five years? In twenty? JPG is the format most likely to be readable in 50 years. Thus, it is appropriate to use RAW to store images in the camera and perhaps for temporary lossless storage on your PC, but be sure to create a TIFF, or better still a PNG or JPG, for archival storage.


Chapter 2

2.1 Text Information Extraction (TIE)
A variety of approaches to text information extraction (TIE) from images have been proposed for specific applications, including page segmentation, address block location, license plate location, and content-based image/video indexing. In spite of extensive studies, it is still not easy to design a general-purpose TIE system. This is because there are so many possible sources of variation when extracting text from a shaded or textured background, from low-contrast or complex images, or from images having variations in font size, style, color, orientation, and alignment. These variations make the problem of automatic TIE extremely difficult.

Fig 7. Text images.

Figures 7-9 show some examples of text in images. Page layout analysis usually deals with document images (Fig. 8). Readers may refer to papers on document segmentation/analysis [17, 18] for more examples of document images.

Fig 8. Document images.

Although images acquired by scanning book covers, CD covers, or other multi-colored documents have characteristics similar to document images, they cannot be directly dealt with using conventional document image analysis techniques. Accordingly, this survey distinguishes this category of images as multi-color document images, as opposed to other document images. Text in video images can be further classified into caption text, which is artificially overlaid on the image, or scene


text, which exists naturally in the image. Some researchers like to use the term graphics text for scene text, and superimposed text or artificial text for caption text.

Fig 9. Caption text.

It is well known that scene text is more difficult to detect, and very little work has been done in this area. In contrast to caption text, scene text can have any orientation and may be distorted by perspective projection. Text in images can exhibit many variations with respect to the following properties:

1. Geometry.
Size: although the text size can vary a lot, assumptions can be made depending on the application domain.
Alignment: the characters in caption text appear in clusters and usually lie horizontally, although sometimes they can appear as non-planar text as a result of special effects. This does not apply to scene text, which can be aligned in any direction and can have perspective and geometric distortions.
Inter-character distance: characters in a text line have a uniform distance between them.

2. Color: the characters in a text line tend to have the same or similar colors. This property makes it possible to use a connected-component-based approach for text detection. Most of the research reported to date has concentrated on finding text strings of a single color (monochrome). However, video images and other complex color documents can contain text strings with more than two colors (polychrome) for effective visualization, i.e., different colors within one word.

3. Motion: the same characters usually exist in consecutive frames of a video, with or without movement. This property is used in text tracking and enhancement. Caption text usually moves in a uniform way, horizontally or vertically; scene text can have arbitrary motion due to camera or object movement.

4. Edge: most caption and scene text is designed to be easily read, resulting in strong edges at the boundaries of text and background.

5. Compression: many digital images are recorded, transferred, and processed in a compressed format. Thus, a faster TIE system can be achieved if one can extract text without decompression.

Table 1. Properties of text in images.


2.2 Pre-Processing
A scaled image was taken as input and converted into a grayscale image; this formed the first stage of the pre-processing part. It was carried out by weighting the RGB color contents (R: 11%, G: 56%, B: 33%) of each pixel of the image and converting them to grayscale. The conversion of a colored image to a grayscale image was done for easier recognition of the text appearing in the images, since after grayscaling the image was converted to a black-and-white image containing black text, at higher contrast, on a white background. The second stage of pre-processing is line removal. The third stage of pre-processing is the removal of the discontinuities created in the second stage. In the final stage of pre-processing, the remaining disturbances, such as noise, are eliminated. This was carried out by again scanning each pixel from top left to bottom right and considering each pixel together with all its neighbouring pixels. If a pixel under consideration was black and all its neighbouring pixels were white, then that pixel was set to white, because the all-white neighbourhood indicated that the pixel under consideration was an unwanted dot.

Fig 10. Flowchart of preprocessing.
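A minimal sketch of this pre-processing pipeline, assuming a hypothetical input.png. The grayscale weights below are the common ITU-R 601 values rather than the percentages quoted above, and the fixed threshold of 128 is an illustrative simplification:

    import numpy as np
    from PIL import Image

    rgb = np.asarray(Image.open("input.png").convert("RGB"), dtype=np.float64)

    # Stage 1: grayscale conversion as a weighted sum of R, G and B.
    grey = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

    # Threshold to black text (0) on a white background (1).
    bw = (grey > 128).astype(np.uint8)

    # Final stage: flip isolated black pixels whose 8 neighbours are all white.
    clean = bw.copy()
    for i in range(1, bw.shape[0] - 1):
        for j in range(1, bw.shape[1] - 1):
            window = bw[i - 1:i + 2, j - 1:j + 2]
            if bw[i, j] == 0 and window.sum() == 8:  # a lone unwanted dot
                clean[i, j] = 1                      # set it to white

    Image.fromarray(clean * 255).save("preprocessed.png")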


2.3 What is Text Information Extraction (TIE)?
The problem of text information extraction needs to be defined more precisely before proceeding further. A TIE system receives an input in the form of a still image or a sequence of images. The images can be in gray scale or color, compressed or uncompressed, and the text in the images may or may not move. The TIE problem can be divided into the following sub-problems: (i) detection, (ii) localization, (iii) tracking, (iv) extraction and enhancement, and (v) recognition (OCR).

IMAGE -> Text detection -> Text localization -> Text tracking -> Text extraction -> Text enhancement -> Text recognition -> TEXT

Fig 11. Architecture of a TIE system.

A) TEXT DETECTION: In the text detection stage, since there is no prior information on whether or not the input image contains any text, the existence or non-existence of text in the image must first be determined. The text detection stage thus seeks to detect the presence of text in a given image.

Fig 12. Stepwise result of text detection.

In the case of video, however, the number of frames containing text is much smaller than the number of frames without text. To select a frame containing text from the shots chosen by video framing, very low threshold values were needed for scene-change detection, because the portion occupied by a text region relative to the whole image is usually small; the approach is therefore very sensitive to scene-change detection. It can nevertheless be a simple and efficient solution for video indexing applications that only need key words from video clips rather than the entire text.

B) TEXT LOCALIZATION: The localization stage involves localizing the text in the image after detection. In other words, the text present in the frame is tracked by identifying boxes or regions of similar pixel intensity values and returning them to the next stage for further processing. This stage used region-based methods for text localization. Region-based methods use the properties of the color or gray scale in a text region, or their differences from the corresponding properties of the background. In practice, most of the text lines are included in the initial text boxes, while some text boxes may include more than one text line as well as noise or non-text regions. This noise usually comes from non-text objects that connect to the text lines during the dilation process, and the low precision comes from detected bounding boxes which contain not text but objects with high vertical edge density. To increase the precision and reject the false alarms, a method based on horizontal and vertical projections is used. First, the horizontal edge projection of every box is computed; a horizontal projection is defined as the sum of the candidate pixels over each row (a small sketch of this projection-based refinement is given after the text extraction stage below).

C) TEXT TRACKING: The text tracking stage can serve to verify the text localization results. In addition, if text tracking could be performed in a shorter time than text detection and localization, this would speed up the overall system. In cases where text is occluded in different frames, text tracking can help recover the original image. Text tracking is performed to reduce the processing time for text

localization and to maintain the integrity of position across adjacent frames. Although the precise location of text in an image can be indicated by bounding boxes, the text still needs to be segmented from the background to facilitate its recognition. This means that the extracted text image has to be converted to a binary image and enhanced before it is fed into an OCR engine.

D) TEXT EXTRACTION: Text extraction segments these regions and generates binary images for recognition. There often exist many disturbances from the background in a text region; they share a similar intensity with the text, and consequently the binary image of the text region is unfit for direct recognition. After the text is localized, the text segmentation step deals with the separation of the text pixels from the background pixels. The output of this step is a binary image where black text characters appear on a white background. This stage includes extraction of the actual text regions by dividing pixels with similar properties into contours or segments and discarding the redundant portions of the frame.
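A minimal sketch of the projection-based refinement described under text localization above, in Python/NumPy; the min_fill threshold and the toy edge map are illustrative assumptions, not the system's actual parameters:

    import numpy as np

    def refine_box(edges, min_fill=0.1):
        """Trim rows/columns of a candidate box with too few edge pixels.

        edges: binary edge map of one candidate bounding box.
        Rows or columns whose projection falls below min_fill of the maximum
        are treated as background, rejecting noise and false alarms.
        """
        rows = edges.sum(axis=1)  # horizontal projection: sums over rows
        cols = edges.sum(axis=0)  # vertical projection: sums over columns
        keep_r = np.where(rows > min_fill * rows.max())[0]
        keep_c = np.where(cols > min_fill * cols.max())[0]
        if keep_r.size == 0 or keep_c.size == 0:
            return None  # no dense region: reject the box entirely
        return keep_r[0], keep_r[-1], keep_c[0], keep_c[-1]

    # Toy edge map: a dense band of edges (rows 2-3) over empty background.
    box = np.zeros((6, 10), dtype=np.uint8)
    box[2:4, 1:9] = 1
    print(refine_box(box))  # -> (2, 3, 1, 8): top, bottom, left, right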

Fig 13. Result of text extraction.

E) TEXT ENHANCEMENT: Enhancement of the extracted text components is required because the text region usually has low resolution and is prone to noise. Thereafter, the extracted text images can be transformed into plain text using OCR technology.

F) TEXT RECOGNITION: The result of recognition was a ratio between the number of correctly extracted characters and the total number of characters, evaluating what percentage of characters were extracted correctly from the background. Each extracted character was taken as correct if it did not miss the main strokes. The extraction results were then sent directly to an OCR engine; a commercial OCR engine was utilized for recognition. Another method was proposed for text extraction from a colored image with a complex background, in which the main idea was to first identify potential text line segments from horizontal scan lines. Text line segments were then expanded or merged with text line segments from adjacent scan lines to form text blocks. False text blocks were filtered based on their geometrical properties. The boundaries of the text blocks were then adjusted so that text pixels lying outside the initial text region were included. Text pixels within text blocks were then detected by using bi-color clustering and connected components analysis.

2.4 TEXT EXTRACTION TECHNIQUES


Text extraction from images includes five stages, among which text detection and text localization are closely related and are the more challenging stages, attracting the attention of most researchers. The goal of these two stages is to generate accurate bounding boxes of all text objects in images and video frames and to provide a unique identity to each text. In this section, recent techniques focused on text detection and localization are reviewed and their results discussed.

2.4.1 REGION-BASED TECHNIQUE
Region-based methods use the properties of the color or gray scale in a text region, or their differences from the corresponding properties of the background. This method uses a bottom-up approach, grouping small components into successively larger components until all regions are identified in the image. A geometrical analysis is needed to merge the text components using the spatial arrangement of the components, so as to filter out non-text components and mark the boundaries of the text regions (a connected-component sketch is given at the end of this subsection).

Leon [37] presented a method for caption text detection, included in a generic indexing system dealing with other semantic concepts which are to be automatically detected. To have a coherent detection system, the various object detection algorithms use a common image description. The author proposed a hierarchical region-based image model as this image description and introduced an algorithm for text detection. The algorithm is divided into three phases:
1. Text candidate spotting: an attempt to separate text from background.
2. Text characteristics verification: text candidate regions are grouped, in order to discard those regions wrongly selected.
3. Consistency analysis for output: regions representing text are modified to obtain a more useful character representation as input for an OCR.
This technique takes advantage of texture and geometric features to detect caption text. Texture features are estimated using wavelet analysis and are mainly applied for text candidate spotting. In turn, text characteristics verification is basically carried out relying on geometric features, which are estimated exploiting the region-based image model. Analysis of the region hierarchy provides the final caption text objects. The final step of consistency analysis for output is performed by a binarization algorithm that robustly estimates the thresholds on the caption text area of support.
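The bottom-up grouping and geometric filtering described above can be sketched with SciPy's connected-component labelling. The area and aspect-ratio thresholds below are arbitrary illustrative values, not those of Leon's method:

    import numpy as np
    from scipy import ndimage

    def region_based_candidates(binary, min_area=20, max_aspect=15.0):
        """Group candidate pixels into components and filter by geometry.

        binary: 2D array with candidate (text-coloured) pixels set to 1.
        Returns bounding boxes (top, bottom, left, right) of the survivors.
        """
        labels, n = ndimage.label(binary)        # bottom-up grouping
        boxes = []
        for sl in ndimage.find_objects(labels):  # one slice pair per component
            h = sl[0].stop - sl[0].start
            w = sl[1].stop - sl[1].start
            area = int(binary[sl].sum())
            # Geometric analysis: drop tiny blobs and extreme aspect ratios.
            if area >= min_area and max(h / w, w / h) <= max_aspect:
                boxes.append((sl[0].start, sl[0].stop - 1,
                              sl[1].start, sl[1].stop - 1))
        return boxes

    img = np.zeros((20, 40), dtype=np.uint8)
    img[5:12, 3:30] = 1                   # a text-like horizontal blob
    print(region_based_candidates(img))   # -> [(5, 11, 3, 29)]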


2.4.2 EDGE-BASED TECHNIQUE
Edges are a reliable feature of text regardless of color/intensity, layout, orientation, etc. Edge strength, density and orientation variance are three distinguishing characteristics of text embedded in images, which can be used as the main features for detecting text. Edge-based text extraction is a general-purpose method which can quickly and effectively localize and extract text from both document and indoor/outdoor images. Among the several textual properties in an image, edge-based methods focus on the high contrast between the text and the background. The edges of the text boundary are identified and merged, and then several heuristics are used to filter out the non-text regions. Usually, an edge filter (e.g., a Canny operator) is used for the edge detection, and a smoothing operation or a morphological operator is used for the merging stage (a combined sketch is given at the end of the next subsection).

2.4.3 MORPHOLOGY-BASED TECHNIQUE
Mathematical morphology is a topological and geometrical approach to image analysis. It provides powerful tools for extracting geometrical structures and representing shapes in many applications. Morphological feature extraction techniques have been efficiently applied to character recognition and document analysis, and are used to extract important text contrast features from the processed images. The feature is invariant against various geometrical image changes such as translation, rotation, and scaling; even after the lighting condition or text color is changed, the feature can still be maintained, so the method works robustly under different image alterations.

One such approach is a morphology-based text line extraction algorithm for extracting text regions from cluttered images. First of all, the method defines a novel set of morphological operations for extracting important contrast regions as possible text line candidates. In order to detect skewed text lines, a moment-based method is then used for estimating their orientation. According to the orientation, an x-projection technique can be applied to extract various text geometries from the text-analogue segments for text verification. However, due to noise, a text line region is often fragmented into different pieces of segments. Therefore, after the projection, a novel recovery algorithm is proposed for recovering a complete text line from its pieces of segments. After that, a verification scheme is proposed for verifying all extracted potential text lines according to their text geometries. In order to analyze the performance of this approach, an image database of 100 images was used for testing; these images have various appearance changes such as contrast changes, complex backgrounds, lightings, and different fonts and sizes, and the authors show results of text line detection in images with these different alterations.
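A combined sketch of the edge-based pipeline, using a Canny edge detector (Section 2.4.2) with a morphological dilation as the merging step (Section 2.4.3), written with OpenCV. The kernel size, the text-line heuristic and the file names are illustrative assumptions:

    import cv2

    grey = cv2.imread("sign.png", cv2.IMREAD_GRAYSCALE)  # placeholder input

    # 1. Edge detection: text produces dense, strong edges.
    edges = cv2.Canny(grey, 100, 200)

    # 2. Merging: dilate with a wide, flat kernel so the edges of adjacent
    #    characters fuse into one blob per text line.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
    merged = cv2.dilate(edges, kernel, iterations=1)

    # 3. Heuristic filtering: keep wide, shallow blobs, the typical shape
    #    of a text line (OpenCV 4 findContours signature).
    contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w > 2 * h and h > 8:  # crude text-line test
            cv2.rectangle(grey, (x, y), (x + w, y + h), 255, 2)

    cv2.imwrite("text_boxes.png", grey)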


2.4.4 TEXTURE-BASED TECHNIQUE
Texture-based methods use the observation that text in images has distinct textural properties that distinguish it from the background. Techniques based on Gabor filters, wavelets, the FFT, spatial variance, etc. can be used to detect the textural properties of a text region in an image. Chu Duc [44] presented a novel texture descriptor based on line-segment features for text detection in images and video sequences, which is applied to build a robust car license plate localization system. Unlike most existing approaches, which use low-level features (color, edges) for text/non-text discrimination, the aim is to exploit more accurate perceptual information. A scale- and rotation-invariant texture descriptor which describes the directionality, regularity, similarity, alignment and connectivity of a group of segments is proposed. An improved algorithm for feature extraction based on a local connective Hough transform has also been investigated.

2.5 APPLICATIONS
There are numerous applications of a text information extraction system, including document analysis, vehicle license plate extraction, technical paper analysis, and object-oriented data compression. In the following, we briefly describe some of these applications.

Wearable or portable computers: With the rapid development of computer hardware technology, wearable computers are now a reality. A TIE system involving a hand-held device and camera was presented as an application of a wearable vision system. Watanabe's [74] translation camera can detect text in a scene image and translate Japanese text into English after performing character recognition. Haritaoglu also demonstrated a TIE system on a hand-held device.

Content-based video coding or document coding: The MPEG-4 standard supports object-based encoding. When text regions are segmented from other regions in an image, this can provide higher compression rates and better image quality. Feng et al. [76] and Cheng et al. [77] apply adaptive dithering after segmenting a document into several different classes. As a result, they can achieve a higher-quality rendering of documents containing text, pictures, and graphics.

License/container plate recognition: There has already been a lot of work done on vehicle license plate and container plate recognition. Although container and vehicle license plates share many characteristics with scene text, many assumptions have been made regarding the image acquisition process (camera and vehicle position and direction, illumination, character types, and color) and the geometric attributes of the text. Cui and Huang [9] model the extraction of characters in license plates using a Markov random field. Meanwhile, Park et al. [44] use a learning-based approach for license plate extraction, which is similar to a texture-based text detection method [47, 49]. Kim et al. [88] use gradient information to extract license plates. Lee and Kankanhalli [34] apply a connected component-based method for cargo container verification.

Text-based image indexing: This involves automatic text-based video structuring methods using caption data [11, 78].

Texts in WWW images: The extraction of text from WWW images can provide relevant information on the Internet. Zhou and Lopresti use a CC-based method after color quantization.


Video content analysis: Extracted text regions, or the output of character recognition, can be useful in genre recognition. The size, position, frequency, and alignment of the text, as well as the OCR-ed results, can all be used for this.

Industrial automation: Part identification can be accomplished by using the text information on each part.

2.6 CONCLUSION
Text extraction from images, as an important research branch of content-based information retrieval and text-based image indexing, continues to be a topic of much interest to researchers. A large number of newly proposed approaches in the literature have contributed to impressive progress in text extraction techniques. Although many researchers have already investigated text localization, further work on text detection and tracking in images is required for use in real applications (e.g., mobile handheld devices with a camera, and real-time indexing systems). A text-image analysis is needed to enable a text information extraction system to be used for any type of image, including both scanned document images and real scene images captured by a video camera. Despite the many difficulties in using TIE systems in real-world applications, the importance and usefulness of this field continue to attract much attention.


References

1. Uvika et al., International Journal of Advanced Engineering Sciences and Technologies (IJAEST), Vol. 10, Issue 2, pp. 309-313.
2. K. Jung, K. I. Kim, A. K. Jain, "Text Information Extraction in Images and Video: A Survey".
3. U. Stilla, F. Rottensteiner, N. Paparoditis (Eds.), CMRT09, IAPRS, Vol. XXXVIII, Part 3/W4, Paris, France, 3-4 September 2009.
4. Character recognition overview, http://www.cs.berkeley.edu/~fateman/kathey/char_recognition.html
5. "Techniques and Challenges of Automatic Text Extraction in Complex Images: A Survey", Journal of Theoretical and Applied Information Technology, Vol. 35, No. 2, 31 January 2012.
6. www.wikipedia.org

