

CHAPTER 1

INTRODUCTION

1.1 COMPANY PROFILE

Datazone, founded in 1999, is a tech-savvy company with the power of a software giant and the direction and flow of a dot-com. In addition, Datazone is people-friendly, not just technology-oriented. Think of us as a powerful UNIX server with a user-friendly Windows front end. We understand that most of our customers are neither familiar with technology nor have the time and resources to devote to their technical needs. This is where we step in to lend a helping hand. We offer you solutions in a box. This means that you do not get a hodgepodge of piecemeal solutions that result in an unsatisfactory experience. Our solutions are well designed, keeping in mind all of the client's requirements and constraints. Most of our clients are repeat customers, providing strong testimony to the fact that our solutions speak for themselves. Our services include e-commerce website design and development, hardware solutions, software solutions, and e-commerce portals. Please browse through our offerings on this site.

If you like what you see, our marketing personnel will be glad to discuss your needs and provide you with more information. Datazone believes in an open-door policy: employees can always talk to anyone in management about any issue, though they are all assigned immediate superiors to whom they report.

Datazone has created a culture of trust and performance. The value for our clients is a knowledgeable and stable group of technical people upon whom they can depend. For our employees we offer career development, stability, and a true team environment.

1.2 OBJECTIVE

The text in a video file is detected and extracted using a novel approach. With the development of video editing technology, there is growing use of overlay text inserted into video content to provide viewers with better visual understanding. For example, headlines summarize the reports in news videos, and subtitles in documentary dramas help viewers understand the content. Sports videos also contain text describing the scores and team or player names. In general, text displayed in videos can be classified into scene text and overlay text. Most broadcast videos tend to increase the use of overlay text to convey a more direct summary of the semantics and deliver a better viewing experience.

Scene text occurs naturally in the background as a part of the scene, such as advertising boards, banners, and so on. In contrast, overlay text is superimposed on the video scene and used to help viewers' understanding. Since the overlay text is highly compact and structured, it can be used for video indexing and retrieval.

However, overlay text extraction for video optical character recognition (OCR) is more challenging than text extraction for OCR tasks on document images, due to the numerous difficulties resulting from a complex background, unknown text color, size, and so on.

There are two steps involved before the overlay text recognition is carried out, i.e., detection and extraction of overlay text. First, overlay text regions are roughly distinguished from background. The detected overlay text regions are refined to determine the accurate boundaries of overlay text strings. To generate a binary text image for video OCR, background pixels are removed from the overlay text strings in the extraction step.

CHAPTER 2

SYSTEM ANALYSIS
2.1 EXISTING SYSTEM

The content of the scene or the editor's intention can be well represented by using inserted text. Most of the previous approaches to extracting overlay text from videos are based on low-level features, such as edge, color, and texture information. However, existing methods experience difficulties in handling text with various contrasts or text inserted into a complex background.

2.1.1 Drawbacks

The existing methods are not robust to different character sizes, positions, contrasts, and colors. They are also language dependent.

2.2 PROPOSED SYSTEM

We propose a novel framework to detect and extract the overlay text from the video scene. Based on our observation that there exist transient colors between inserted text and its adjacent background, a transition map is first generated. Candidate regions are then extracted by a reshaping method, and the overlay text regions are determined based on the occurrence of overlay text in each candidate. The detected overlay text regions are localized accurately using the projection of overlay text pixels in the transition map, and the text extraction is finally conducted. Experiments are performed on diverse videos to confirm the efficiency of the proposed method.

2.2.1 Advantages

The proposed method is robust to different character sizes, positions, contrasts, and colors. It is also language independent. Overlay text region update between frames is also employed to reduce the processing time.

2.3 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential. Three key considerations involved in feasibility analysis are:

Economical Feasibility
Social Feasibility
Technical Feasibility

2.3.1 ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. The developed system is well within the budget, and this was achieved because most of the technologies used are freely available; only the customized products had to be purchased.

2.3.2 SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. It includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make some constructive criticism, which is welcomed, as he is the final user of the system.

2.3.3 TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.

CHAPTER 3

SYSTEM SPECIFICATION
3.1 HARDWARE REQUIREMENTS

System        : Pentium IV 2.4 GHz
Hard Disk     : 80 GB HDD
Floppy Drive  : 1.44 MB
Monitor       : 15 VGA Colour
Mouse         : Logitech
Keyboard      : Standard Keyboard
RAM           : 1 GB RAM

3.2 SOFTWARE REQUIREMENTS

Operating System : Windows XP Professional
Front End        : ASP.NET 2005
Coding Language  : C#
Back End         : MS SQL 2005

CHAPTER 4

SOFTWARE DESCRIPTION

4.1 FRONT END - MICROSOFT .NET FRAMEWORK


Microsoft .NET is a set of Microsoft software technologies for rapidly building and integrating XML Web services, Microsoft Windows-based applications, and Web solutions. The .NET Framework is a language-neutral platform for writing programs that can easily and securely interoperate. There is no language barrier with .NET: there are numerous languages available to the developer, including Managed C++, C#, Visual Basic, and JScript. The .NET Framework provides the foundation for components to interact seamlessly, whether locally or remotely on different platforms. Each version of the .NET Framework contains the common language runtime (CLR) as its core component, and includes additional components such as the base class libraries and other managed libraries. The version of the CLR on which an application is running can be determined by retrieving the value of the Environment.Version property.
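As a small illustration of the last point, the CLR version hosting an application can be read at run time through the Environment.Version property. The snippet below is a minimal sketch, not part of the project code.

    using System;

    class RuntimeVersionDemo
    {
        static void Main()
        {
            // Environment.Version returns the version of the common language
            // runtime that is executing the current application.
            Console.WriteLine("CLR version: " + Environment.Version);
        }
    }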

This section describes the key components of the .NET Framework versions, provides information about the underlying CLR versions and associated development environments, and identifies the versions that are installed by Windows. The framework standardizes common data types and communication protocols so that components created in different languages can easily interoperate.

.NET is also the collective name given to various software components built upon the .NET platform. These are both products (Visual Studio .NET and Windows .NET Server, for instance) and services (such as Passport, .NET My Services, and so on).

The .NET Framework has two main parts:
1. The Common Language Runtime (CLR).
2. A hierarchical set of class libraries.

The CLR is described as the execution engine of .NET. It provides the environment within which programs run. Its most important features are:

Conversion from a low-level assembler-style language, called Intermediate Language (IL), into code native to the platform being executed on.
Memory management, notably including garbage collection.
Checking and enforcing security restrictions on the running code.
Loading and executing programs, with version control and other such features.

The following features of the .NET Framework are also worth describing.


Figure: The .NET Framework architecture - ASP.NET (XML Web Services) and Windows Forms sit above the Base Class Libraries and the Common Language Runtime, which in turn sit above the Operating System.

The CLR is an application virtual machine, so that programmers need not consider the capabilities of the specific CPU that will execute the program. The CLR also provides other important services such as security, memory management, and exception handling. The class library and the CLR together constitute the .NET Framework.

4.1.1 FEATURES

JIT (JUST-IN-TIME)

The Just-in-Time (JIT) compiler compiles MSIL into native code that is specific to the OS and machine architecture being targeted. Only at this point can the OS execute the application. The just-in-time part of the name reflects the fact that MSIL code is compiled only as, and when, it is needed.


In the past, it was often necessary to compile your code into several applications, each of which targeted a specific operating system and CPU architecture. Often, this was a form of optimization.

GARBAGE COLLECTION (GC)

One of the most important features of managed code is the concept of garbage collection. This is the .NET method of making sure that the memory used by an application is freed up completely when the application is no longer in use. Prior to .NET this was mostly the responsibility of programmers, and a few simple errors in code could result in large blocks of memory mysteriously disappearing as a result of being allocated to the wrong place in memory. That usually meant a progressive slowdown of your computer followed by a system crash.

COMMON LANGUAGE RUNTIME

Many different languages and platforms provide a runtime, and the .NET Framework is no exception. You will find, however, that this runtime is quite different from most. The Common Language Runtime (CLR) in the .NET Framework manages the execution of the code and provides access to a variety of services that make the development process easier. The CLR has been developed to be far superior to previous runtimes, such as the VB runtime, by attaining the following:

Cross-language integration
Code access security
Object lifetime management
Debugging and profiling support


ASSEMBLIES

In the applications that you build within the .NET Framework, assemblies will always play an important role. Assemblies can be thought of as the building blocks of your applications. Without an associated assembly, code will not be able to compile from IL. When you are using the JIT compiler to compile your code from managed code to machine code, the JIT compiler will look for the IL code that is stored in a portable executable (PE) file along with the associated assembly manifest.

NAMESPACES

The .NET Framework is made up of hundreds of classes. Many of the applications that you build in .NET are going to take advantage of these classes in one way or another. Because the number of classes is so large and you will need to get at them in a logical fashion, the .NET Framework organizes these classes into a class structure called a namespace. There are a number of namespaces, and they are organized in an understandable and straightforward way.
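For illustration, the sketch below shows the two usual ways of reaching a class that lives in a namespace: through a using directive or through its fully qualified name. The types used are standard BCL classes; the example itself is not part of the project code.

    using System;               // brings the System namespace into scope
    using System.Text;          // brings System.Text into scope

    class NamespaceDemo
    {
        static void Main()
        {
            // Short form, thanks to the using directive above.
            StringBuilder sb = new StringBuilder();
            sb.Append("Hello from the BCL");

            // Fully qualified form, without relying on a using directive.
            System.Console.WriteLine(sb.ToString());
        }
    }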

INTEROPERABILITY

Because computer systems commonly require interaction between new and older applications, the .NET Framework provides means to access functionality that is implemented in programs that execute outside the .NET environment. Access to COM components is provided in the System.Runtime.InteropServices and System.EnterpriseServices namespaces of the framework; access to other functionality is provided using the P/Invoke feature.
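As a hedged example of the P/Invoke feature mentioned above (a textbook illustration, not part of the project code), the sketch below declares and calls the Win32 MessageBox function exported by user32.dll.

    using System;
    using System.Runtime.InteropServices;

    class PInvokeDemo
    {
        // Declaration of an unmanaged Win32 API function via P/Invoke.
        [DllImport("user32.dll", CharSet = CharSet.Unicode)]
        static extern int MessageBox(IntPtr hWnd, string text, string caption, uint type);

        static void Main()
        {
            // Calls the native MessageBox exported by user32.dll.
            MessageBox(IntPtr.Zero, "Hello from unmanaged code", "P/Invoke demo", 0);
        }
    }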


LANGUAGE INDEPENDENCE

The .NET Framework introduces a Common Type System, or CTS. The CTS specification defines all possible data types and programming constructs supported by the CLR and how they may or may not interact with each other, conforming to the Common Language Infrastructure (CLI) specification. Because of this feature, the .NET Framework supports the exchange of types and object instances between libraries and applications written using any conforming .NET language.

BASE CLASS LIBRARY

The Base Class Library (BCL), part of the Framework Class Library (FCL), is a library of functionality available to all languages using the .NET Framework. The BCL provides classes which encapsulate a number of common functions, including file reading and writing, graphic rendering, database interaction, XML document manipulation, and so on.
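A minimal sketch of the kind of functionality the BCL exposes, using the standard System.IO and System.Xml classes; the file names are placeholders, not part of the project.

    using System.IO;
    using System.Xml;

    class BclDemo
    {
        static void Main()
        {
            // File writing and reading through System.IO.
            File.WriteAllText("note.txt", "written through the BCL");
            string text = File.ReadAllText("note.txt");

            // XML document manipulation through System.Xml.
            XmlDocument doc = new XmlDocument();
            doc.LoadXml("<note><body>" + text + "</body></note>");
            doc.Save("note.xml");
        }
    }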

SIMPLIFIED DEPLOYMENT

The .NET Framework includes design features and tools that help manage the installation of computer software to ensure that it does not interfere with previously installed software, and that it conforms to security requirements.

PORTABILITY


The design of the .NET Framework allows it theoretically to be platform agnostic, and thus cross-platform compatible. That is, a program written to use the framework should run without change on any type of system for which the framework is implemented. While Microsoft has never implemented the full framework on any system except Microsoft Windows, the framework is engineered to be platform agnostic, and cross-platform implementations are available for other operating systems. Microsoft submitted the specifications for the Common Language Infrastructure, the C# language, and the C++/CLI language to both ECMA and the ISO, making them available as open standards. This makes it possible for third parties to create compatible implementations of the framework and its languages on other platforms.

COMMON LANGUAGE INFRASTRUCTURE (CLI)

The purpose of the Common Language Infrastructure is to provide a language-neutral platform for application development and execution, including functions for exception handling, garbage collection, security, and interoperability. By implementing the core aspects of the .NET Framework within the scope of the CLI, this functionality will not be tied to a single language but will be available across the many languages supported by the framework. Microsoft's implementation of the CLI is called the Common Language Runtime, or CLR.

SECURITY

.NET has its own security mechanism with two general features: Code Access Security (CAS), and validation and verification. Code Access Security is based on evidence that is associated with a specific assembly. Other code can demand that calling code is granted a specified permission.


The demand causes the CLR to perform a call stack walk: every assembly of each method in the call stack is checked for the required permission; if any assembly is not granted the permission, a security exception is thrown. However, the application has to be split into sub-domains by the developer; this is not done by the CLR.
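As an illustration of a code access security demand (a sketch only, using the classic .NET Framework permission classes rather than the project's own code), the snippet below demands read permission on a file before opening it; if any caller up the stack lacks that permission, the CLR's stack walk throws a SecurityException. The helper name and parameter are hypothetical.

    using System.IO;
    using System.Security;
    using System.Security.Permissions;

    class CasDemo
    {
        static string ReadConfig(string path)
        {
            // Demand triggers a stack walk: every caller must have been granted
            // read access to this file, or a SecurityException is thrown.
            new FileIOPermission(FileIOPermissionAccess.Read, Path.GetFullPath(path)).Demand();
            return File.ReadAllText(path);
        }
    }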

CLASS LIBRARY

The .NET Framework includes a set of standard class libraries. The class library is organized in a hierarchy of namespaces. Most of the built-in APIs are part of either the System.* or Microsoft.* namespaces. These class libraries implement a large number of common functions, such as file reading and writing, graphic rendering, database interaction, and XML document manipulation, among others. The .NET class libraries are available to all CLI-compliant languages. The .NET Framework class library is divided into two parts: the Base Class Library and the Framework Class Library.

The Base Class Library (BCL) includes a small subset of the entire class library and is the core set of classes that serve as the basic API of the Common Language Runtime. The classes in mscorlib.dll and some of the classes in System.dll and System.Core.dll are considered to be part of the BCL. The BCL classes are available in both the .NET Framework and its alternative implementations, including the .NET Compact Framework, Microsoft Silverlight, and Mono.


MEMORY MANAGEMENT

The .NET Framework CLR frees the developer from the burden of managing memory; it performs the memory management itself, even though there are no actual guarantees as to when the garbage collector will perform its work unless a collection is explicitly forced. To this end, the memory allocated to instantiations of .NET types (objects) is allocated contiguously from the managed heap, a pool of memory managed by the CLR. When there is no longer any reference to an object, and it cannot be reached or used, it becomes garbage; however, it still holds on to the memory allocated to it. The .NET Framework includes a garbage collector which runs periodically, on a separate thread from the application's thread, that enumerates all the unusable objects and reclaims the memory allocated to them.
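The sketch below illustrates the behaviour just described: an object is allocated from the managed heap, becomes garbage once it is unreachable, and is reclaimed when the garbage collector runs (forced here only to make the effect visible in a demo).

    using System;

    class GcDemo
    {
        static void Main()
        {
            long before = GC.GetTotalMemory(false);

            // Allocate a large object on the managed heap, then drop the reference.
            byte[] buffer = new byte[10 * 1024 * 1024];
            buffer[0] = 42;    // touch the buffer so the allocation is observable
            buffer = null;     // the array is now unreachable, i.e., garbage

            // Normally the collector runs on its own schedule; forcing it here
            // is only to make the reclamation visible in this demonstration.
            GC.Collect();
            GC.WaitForPendingFinalizers();

            long after = GC.GetTotalMemory(true);
            Console.WriteLine("Managed heap before: {0} bytes, after: {1} bytes", before, after);
        }
    }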

4.2 BACK END - SQL SERVER 2000

Microsoft SQL Server 2000 is a full-featured relational database management system (RDBMS) that offers a variety of administrative tools to ease the burdens of database development, maintenance, and administration. We'll cover six of the more frequently used tools: Enterprise Manager, Query Analyzer, SQL Profiler, Service Manager, Data Transformation Services, and Books Online.
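As an illustrative sketch only (not the project's actual data-access code), the snippet below shows how the C# front end might query the SQL Server back end through ADO.NET; the connection string, database name, and table name are hypothetical placeholders.

    using System;
    using System.Data.SqlClient;

    class SqlDemo
    {
        static void Main()
        {
            // Hypothetical connection string; the server, database and credentials
            // would come from the project's configuration.
            string connStr = "Data Source=localhost;Initial Catalog=VideoTextDb;Integrated Security=True";

            using (SqlConnection conn = new SqlConnection(connStr))
            using (SqlCommand cmd = new SqlCommand("SELECT COUNT(*) FROM ExtractedText", conn))
            {
                conn.Open();
                int rows = (int)cmd.ExecuteScalar();
                Console.WriteLine("Stored text records: " + rows);
            }
        }
    }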


4.2.1 FEATURES

INTERNET INTEGRATION

The SQL Server 2000 database engine includes integrated XML support. It also has the scalability, availability, and security features required to operate as the data storage component of the largest Web sites. The SQL Server 2000 programming model is integrated with the Windows DNA architecture for developing Web applications, and SQL Server 2000 supports features such as English Query and the Microsoft Search Service to incorporate user-friendly queries and powerful search capabilities in Web applications.

SCALABILITY AND AVAILABILITY

The same database engine can be used across platforms ranging from laptop computers running Microsoft Windows 98 through large, multiprocessor servers running Microsoft Windows 2000 Data Center Edition.

SQL Server 2000 Enterprise Edition supports features such as federated servers, indexed views, and large memory support that allow it to scale to the performance levels required by the largest Web sites.

ENTERPRISE-LEVEL DATABASE FEATURES

The SQL Server 2000 relational database engine supports the features required by demanding data processing environments. The database engine protects data integrity while minimizing the overhead of managing thousands of users concurrently modifying the database. SQL Server 2000 distributed queries allow you to reference data from multiple sources as if it were part of a SQL Server 2000 database, while at the same time the distributed transaction support protects the integrity of any updates of the distributed data. Replication also allows you to maintain multiple copies of data, while ensuring that the separate copies remain synchronized. You can replicate a set of data to multiple, mobile, disconnected users, have them work autonomously, and then merge their modifications back to the publisher.

EASE OF INSTALLATION, DEPLOYMENT, AND USE

SQL Server 2000 includes a set of administrative and development tools that improve the process of installing, deploying, managing, and using SQL Server across several sites. SQL Server 2000 also supports a standards-based programming model integrated with the Windows DNA, making the use of SQL Server databases and data warehouses a seamless part of building powerful and scalable systems. These features allow you to rapidly deliver SQL Server applications that customers can implement with a minimum of installation and administrative overhead.

CHAPTER 5
PROJECT DESCRIPTION

5.1 PROBLEM DEFINITION

Although many methods have been proposed to detect and extract video text, few methods can effectively deal with text of different colors and shapes or with multilingual text.


Most existing video text detection methods have been proposed on the basis of color-, edge-, and texture-based features. Color-based approaches assume that the video text is composed of a uniform color. In the approach by Agnihotri et al., the red color component is used to obtain high-contrast edges between text and background. In another approach, the uniform color blocks within high-contrast video frames are selected to correctly extract text regions. Kim et al. cluster colors based on Euclidean distance in the RGB space and use 64 clustered color channels for text detection. However, it is rarely true that the overlay text consists of a uniform color, due to degradation resulting from compression coding and low contrast between text and background. Edge-based approaches are also considered useful for overlay text detection, since text regions contain rich edge information. The commonly adopted method is to apply an edge detector to the video frame and then identify regions with high edge density and strength. This method performs well if there is no complex background, and it becomes less reliable as the scene contains more edges in the background. Lyu et al. use a modified edge map with strength for text region detection and localize the detected text regions using coarse-to-fine projection.

They also extract text strings based on local thresholding and inward filling. In another edge-based method, the authors consider the strokes of text in the horizontal, vertical, up-right, and up-left directions and generate an edge map along each direction. They then combine statistical features and use k-means clustering to classify the image pixels into background and text candidates. Liu et al. use a multiscale edge detector to detect the text regions. They compute the edge strength, density, and orientation variance to form the multiscale edge detector.
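As a rough illustration of the edge-density idea common to these methods (a generic sketch, not a reimplementation of any cited work), the snippet below counts strong edge pixels inside each block of an edge-strength map and flags blocks whose density exceeds a threshold as text candidates; the block size, strength cut-off, and density threshold are assumed values.

    // Flags blocks of an edge-strength map whose edge density exceeds a threshold.
    // edgeMap[y, x] holds edge strength; blockSize and minDensity are assumed values.
    static bool[,] FindDenseEdgeBlocks(float[,] edgeMap, int blockSize, float minDensity)
    {
        int h = edgeMap.GetLength(0), w = edgeMap.GetLength(1);
        bool[,] candidate = new bool[h / blockSize, w / blockSize];

        for (int by = 0; by + blockSize <= h; by += blockSize)
            for (int bx = 0; bx + blockSize <= w; bx += blockSize)
            {
                int strongEdges = 0;
                for (int y = by; y < by + blockSize; y++)
                    for (int x = bx; x < bx + blockSize; x++)
                        if (edgeMap[y, x] > 0.5f)   // assumed edge-strength cut-off
                            strongEdges++;

                float density = (float)strongEdges / (blockSize * blockSize);
                candidate[by / blockSize, bx / blockSize] = density > minDensity;
            }
        return candidate;
    }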


Texture-based approaches, such as salient point detection and the wavelet transform, have also been used to detect text regions. Bertini et al. detect corner points from the video scene and then detect the text region using the similarity of corner points between frames. Sato et al. apply an interpolation filter based on the vertical, horizontal, left-diagonal, and right-diagonal directions to enhance the performance of text extraction. Gllavata et al. employ the high-frequency wavelet coefficients and connected components to detect text regions. However, since it is almost impossible to detect text in a real video by using only one characteristic of text, some methods take advantage of combined features to detect video text.

5.2 OVERVIEW OF THE PROJECT

After the text detection step, the text extraction step, which can be classified into color-based and stroke-based methods, should be employed before OCR is applied. Since the color of text is generally different from that of the background, text strings can be extracted by thresholding.

The Otsu method is a widely used color-based text extraction method due to the simplicity and efficiency of the algorithm. However, the Otsu method is not robust for extracting text whose color is similar to the background, due to its use of global thresholding. To solve this problem, the detected text regions are divided into several blocks and the Otsu method is then applied locally to each block, as in the adaptive thresholding approach in which a dam point is defined to extract text strings from the background.
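A minimal sketch of the Otsu threshold computation on a single block of grayscale values; applying it block by block, as described above, would correspond to the local variant. The helper name and the byte-array input format are assumptions for illustration.

    // Computes the Otsu threshold (0..255) for a block of grayscale pixel values.
    static int OtsuThreshold(byte[] block)
    {
        int[] hist = new int[256];
        foreach (byte p in block) hist[p]++;

        int total = block.Length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long)i * hist[i];

        long sumB = 0; int wB = 0; double maxVar = -1; int best = 0;
        for (int t = 0; t < 256; t++)
        {
            wB += hist[t];                    // weight of the first (darker) class
            if (wB == 0) continue;
            int wF = total - wB;              // weight of the second class
            if (wF == 0) break;

            sumB += (long)t * hist[t];
            double meanB = (double)sumB / wB;
            double meanF = (double)(sumAll - sumB) / wF;
            double between = (double)wB * wF * (meanB - meanF) * (meanB - meanF);

            if (between > maxVar) { maxVar = between; best = t; }
        }
        return best;   // threshold maximizing the between-class variance
    }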


On the other hand, some filters based on the direction of strokes have also been used to extract text in the stroke-based methods. Four-direction character extraction filters are used to enhance stroke-like shapes and to suppress others. However, since the stroke filter is language dependent, some characters without an obvious stripe shape can also be suppressed.

In this project, we propose a new overlay text detection and extraction method using the transition region between the overlay text and the background. First, we generate the transition map based on our observation that there exist transient colors between overlay text and its adjacent background. Then the overlay text regions are roughly detected by computing the density of transition pixels and the consistency of texture around the transition pixels. The detected overlay text regions are localized accurately using the projection of the transition map, and an improved color-based thresholding method is applied to extract the text strings correctly.

5.3 MODULE DESCRIPTION

Overlay Text Region Detection
Overlay Text Region Determination
Overlay Text Extraction


5.3.1 OVERLAY TEXT REGION DETECTION

The proposed method is based on our observations that there exist transient colors between overlay text and its adjacent background, and that overlay texts have high saturation because they are inserted by using graphic components.

Transition Map Generation

As a rule of thumb, if the background of the overlay text is dark, then the overlay text tends to be bright. On the contrary, the overlay text tends to be dark if the background of the overlay text is bright. Therefore, there exist transient colors between the overlay text and its adjacent background due to color bleeding, and the intensities at the boundary of the overlay text are observed to change logarithmically.

Candidate Region Extraction

The transition map can be utilized as a useful indicator for the overlay text region. To generate the connected components, we first generate a linked map, as sketched after the next paragraph.

If a gap of consecutive pixels between two nonzero points in the same row is shorter than 5% of the image width, the gap is filled with 1s. If a connected component is smaller than a threshold value, it is removed. The threshold value is empirically selected by observing the minimum size of an overlay text region. Each connected component is then reshaped to have smooth boundaries. Since it is reasonable to assume that the overlay text regions are generally rectangular, a rectangular bounding box is generated by linking four points taken from the linked map.
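The following is a rough sketch of the gap-filling step described above, under the stated 5% gap assumption; the array layout and helper name are assumptions for illustration, not the exact implementation.

    // Fills short horizontal gaps between transition pixels in each row of the
    // transition map to build the linked map used for connected components.
    static int[,] BuildLinkedMap(int[,] transitionMap)
    {
        int h = transitionMap.GetLength(0), w = transitionMap.GetLength(1);
        int maxGap = (int)(0.05 * w);             // 5% of the image width
        int[,] linked = (int[,])transitionMap.Clone();

        for (int y = 0; y < h; y++)
        {
            int lastNonZero = -1;
            for (int x = 0; x < w; x++)
            {
                if (transitionMap[y, x] == 0) continue;
                if (lastNonZero >= 0 && x - lastNonZero - 1 < maxGap)
                {
                    for (int g = lastNonZero + 1; g < x; g++)
                        linked[y, g] = 1;          // fill the short gap with 1s
                }
                lastNonZero = x;
            }
        }
        return linked;
    }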


5.3.2 OVERLAY TEXT REGION DETERMINATION

The next step is to determine the real overlay text region among the boundary-smoothed candidate regions using some useful clues, such as the aspect ratio of the overlay text region. Since most overlay texts are placed horizontally in the video, the vertically longer candidates can be easily eliminated. The density of transition pixels is a good criterion as well. In this subsection, we introduce a texture-based approach for overlay text region determination. Based on the observation that the intensity variation around a transition pixel is large due to the complex structure of the overlay text, we employ the local binary pattern (LBP) to describe the texture around the transition pixel. LBP is a very efficient and simple tool to represent the consistency of texture using only the intensity pattern. LBP forms a binary pattern using the current pixel and all of its circular neighbor pixels, and this pattern can be converted into a decimal number.
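A minimal sketch of the basic 8-neighbor LBP operator referred to above; the circular neighborhood is approximated here by the eight 3x3 neighbors, and the helper name and array layout are assumptions for illustration.

    // Computes the 8-neighbor local binary pattern of the pixel at (x, y).
    // Each neighbor brighter than or equal to the center contributes a 1 bit.
    // (x, y) is assumed not to lie on the image border.
    static int LocalBinaryPattern(byte[,] gray, int x, int y)
    {
        int[] dx = { -1, 0, 1, 1, 1, 0, -1, -1 };
        int[] dy = { -1, -1, -1, 0, 1, 1, 1, 0 };

        int center = gray[y, x];
        int pattern = 0;
        for (int i = 0; i < 8; i++)
        {
            int neighbor = gray[y + dy[i], x + dx[i]];
            if (neighbor >= center)
                pattern |= 1 << i;   // set bit i; the result is a decimal code 0..255
        }
        return pattern;
    }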

Overlay Text Region Refinement

The overlay text region, or bounding box, obtained in the preceding subsection needs to be refined for more accurate text extraction. In this subsection, we use a modified projection of transition pixels in the transition map to perform the overlay text region refinement. First, the horizontal projection is performed to accumulate all the transition pixel counts in each row of the detected overlay text region, forming a histogram of the number of transition pixels. Then the null points, which denote the pixel rows without transition pixels, are removed and the separated regions are re-labeled. The projection is then conducted vertically and null points are removed once again. Compared to the coarse-to-fine projection proposed for the edge-based scheme, our projection method is applied to the detected overlay text regions only, making the process simpler.

Overlay Text Region Update

Once the overlay text regions are detected in the current frame, it is reasonable to take advantage of the continuity of overlay text between consecutive frames for the text region detection of the next frame. If the difference, which can be obtained by XOR of the current transition map and the previous transition map, is smaller than a predefined value, the overlay text regions of the previous frame are directly applied as the detection result without further refinement. A special situation that needs to be taken care of arises when an overlay text region appears gradually.
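Two short sketches of the operations just described: the horizontal projection of transition pixels within a detected region, and the XOR-based update test between consecutive transition maps. The array layouts, helper names, and the threshold parameter are assumptions for illustration.

    // Horizontal projection: number of transition pixels in each row of a
    // detected text region; rows with a zero count are the "null points".
    static int[] HorizontalProjection(int[,] transitionMap, int top, int bottom, int left, int right)
    {
        int[] histogram = new int[bottom - top + 1];
        for (int y = top; y <= bottom; y++)
            for (int x = left; x <= right; x++)
                if (transitionMap[y, x] != 0)
                    histogram[y - top]++;
        return histogram;
    }

    // Region update test: if the XOR difference between the current and the
    // previous transition map is small, reuse the previous detection result.
    static bool CanReusePreviousRegions(int[,] current, int[,] previous, int maxDifference)
    {
        int diff = 0;
        for (int y = 0; y < current.GetLength(0); y++)
            for (int x = 0; x < current.GetLength(1); x++)
                if ((current[y, x] != 0) ^ (previous[y, x] != 0))
                    diff++;
        return diff < maxDifference;   // maxDifference is the predefined (assumed) value
    }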

5.3.3 OVERLAY TEXT EXTRACTION

A. Color Polarity Computation

Depending on whether the overlay text is darker than the surrounding background or brighter than its neighbors, the binarized images obtained by simple thresholding represent the overlay text as either 1 (white) or 0 (black), respectively. Such inconsistent results complicate the following text extraction steps. Thus, our goal in this subsection is to check the color polarity and invert the pixel intensities if needed, so that the output text region of the module always contains bright text compared to its surrounding pixels.

B. Overlay Text Extraction

Since it is confirmed that the overlay text is always bright in each text region, it is safe to employ Lyu's method to extract characters from each overlay text region. First, each overlay text region is expanded wider by two pixels to utilize the continuity of the background. This expanded outer region is denoted as ER. Then the pixels inside the text region are compared to the pixels in ER so that pixels connected to the expanded region can be excluded. We denote the text region as TR and the expanded text region as ETR, i.e., ETR = TR ∪ ER. Next, sliding-window-based adaptive thresholding is performed in the horizontal and the vertical directions with different window sizes, respectively. Compared to Lyu's method, the height of the expanded text region is not normalized in our method.
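A rough sketch of the color polarity normalization described in part A; the average-based darker/brighter test is a simplification used only for illustration, and the helper name and array layout are assumptions rather than the project's exact implementation.

    // Ensures that the text region contains bright text on a darker background.
    // A simple average comparison stands in for the full polarity test.
    static void NormalizeColorPolarity(byte[,] region)
    {
        int h = region.GetLength(0), w = region.GetLength(1);

        long inner = 0, border = 0; int innerCount = 0, borderCount = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                bool isBorder = y < 2 || y >= h - 2 || x < 2 || x >= w - 2;
                if (isBorder) { border += region[y, x]; borderCount++; }
                else { inner += region[y, x]; innerCount++; }
            }

        // If the interior (text) is darker on average than the border (background), invert.
        if (innerCount > 0 && borderCount > 0 &&
            inner / innerCount < border / borderCount)
        {
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                    region[y, x] = (byte)(255 - region[y, x]);
        }
    }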
