Jinsik Kim, Taehun Kim, Jiexi Lin CS570 Artificial Intelligence Team Foxtrot KAIST, South Korea 305-701
jskim@ai.kaist.ac.kr, thkim@vivaldi.kaist.ac.kr, jesse@islab.kaist.ac.kr
Abstract
This is a term project report for the Artificial Intelligence course (CS570) in the Division of Computer Science at KAIST. In this report, we describe the implementation of a Video Text Detection System. We restrict our scope to artificial text in video frames of news broadcasts, which have little noise and distinct text regions and are therefore well suited to detection. Building on existing edge detection techniques, we propose several heuristic approaches for text detection: long line removal, horizontal stroke detection and vertical stroke detection, text area detection, bounding box detection and text validation.
Keywords: OCR, Edge Detection, Text localization
1. Introduction
For efficient indexing and retrieval in content-based multimedia databases, we need automatic extraction of descriptive features that are relevant to the subject material, such as images and video. Research has been done on low-level features such as color, texture and shape. To offer a precise idea of the image content, we need high-level features such as text and human faces [2]. Artificial text appearing in news broadcast video is usually closely related to the visual content and is a strong candidate for high-level semantic indexing for retrieval. It also has little noise and distinct text regions, which makes it especially appropriate for text detection research. Our approach for detecting the text region is based on existing edge detection techniques together with several heuristic methods such as long line removal, stroke region detection, bounding box detection and text validation. The remainder of this report is organized as follows. Section 2 gives an overview of existing text detection techniques. Section 3 describes the implementation of the system. Section 4 presents the experimental results. Section 5 gives the conclusion.
2. Overview
In this section, we take a look at the existing text detection techniques. Text detection requires us to locate the regions where text appears. After detection, machine learning could be used to improve the accuracy, and the detected regions could then be passed on to recognition. In this report, we focus only on text detection. Our future work may address text verification and recognition.
3. Implementation
In this section we describe the implementation of our system. The flow chart of the system is shown in Figure2.
Video Frame -> Canny Edge Det. -> Long Line Rem. -> Text Area Det. -> Bounding Box Det. -> Validation -> Detected Text Region
Figure2. System flow chart
From the input video frame, we first tried a continuous-frame detection method to filter out the complex background. Because of problems that occurred during the conversion from 24-bit true color images to 256-color BMP images, we skipped this step and applied Canny edge detection directly to the input video frame. As the system flow chart shows, we propose several heuristic approaches such as long line removal, horizontal stroke detection and vertical stroke detection, text area detection, bounding box detection and text validation.
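The long line removal step can be illustrated with a small sketch. The report does not state how long lines are identified, so the rule used here is an assumption: any horizontal or vertical run of edge pixels longer than a threshold is treated as a line (for example a banner border) rather than a character stroke and is erased. The function names and the `max_run` parameter are hypothetical, and the edge map is assumed to be a binary grid such as the output of Canny edge detection.

```python
def _long_runs(values, max_run):
    """Yield (start, end) index pairs of runs of 1s longer than max_run."""
    start = None
    # A trailing 0 acts as a sentinel so a run touching the border is closed.
    for i, v in enumerate(list(values) + [0]):
        if v and start is None:
            start = i
        elif not v and start is not None:
            if i - start > max_run:
                yield start, i
            start = None


def remove_long_lines(edges, max_run):
    """Return a copy of a binary edge map with long straight runs erased.

    edges   -- list of rows, each a list of 0/1 edge flags
    max_run -- assumed threshold: runs longer than this count as lines
    """
    height = len(edges)
    width = len(edges[0]) if height else 0
    cleaned = [row[:] for row in edges]

    # Horizontal runs, measured on the original map row by row.
    for y in range(height):
        for start, end in _long_runs(edges[y], max_run):
            for x in range(start, end):
                cleaned[y][x] = 0

    # Vertical runs, measured on the original map column by column.
    for x in range(width):
        column = [edges[y][x] for y in range(height)]
        for start, end in _long_runs(column, max_run):
            for y in range(start, end):
                cleaned[y][x] = 0

    return cleaned
```

Runs are measured on the original map rather than on the partially cleaned one, so erasing a horizontal line does not shorten the vertical runs that cross it. A real frame would need a threshold tuned to the expected character size.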
not. Figure5 shows the results of the horizontal stroke detection and the vertical stroke detection.
Figure5. Stroke detection: (Horizontal) and (Vertical)
Figure7. Process of getting the bounding box
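As a rough illustration of bounding box detection, the sketch below computes one box per connected region of a binary text-area mask. The procedure shown in Figure7 is not recoverable from this text, so this is a generic stand-in rather than the method used in the system: it assumes 4-connected regions, and the function name and the mask representation are hypothetical.

```python
from collections import deque


def bounding_boxes(mask):
    """Return (left, top, right, bottom) for each 4-connected region of 1s.

    Coordinates are inclusive pixel indices. Boxes are listed in the order
    in which a row-by-row scan first meets each region.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []

    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from the first pixel of a new region.
            left = right = x0
            top = bottom = y0
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))

    return boxes
```

In a pipeline like the one in this report, the resulting boxes would then be passed to the validation step, which decides whether each box really contains text.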
4. Experimental Results
We evaluate our system with two measures. The first is accuracy, the fraction of the detected text regions that are correct. The second is completeness, the fraction of all correct text regions that the detection actually captures. We tested our system on 30 different images, which is not a large enough sample set. On more than 20 of these 30 images we get perfect detection, meaning that the accuracy and the completeness are both 100%. The average accuracy over the 30 images is about 86% and the average completeness is about 98%. To obtain more evaluation results, we need more sample sets and more tests on them. The interface of our system is shown in Figure8. Figure9 ~ Figure15 show each step of getting the final detected text region. Figure15 shows the final detected text region.
Figure10. After long line removal
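The two measures can be written out as a short sketch. It assumes that each image is scored from three counts (regions reported, reported regions that are correct, and text regions actually present) and that the averages are taken per image, as the phrase "average accuracy over the 30 images" suggests. The function names are hypothetical, the counts in the usage note are illustrative values rather than data from our experiments, and scoring an image with nothing to detect as 100% is also an assumption.

```python
def accuracy_and_completeness(detected, correct, actual):
    """Score one image from three region counts.

    detected -- regions the detector reported
    correct  -- reported regions that really contain text
    actual   -- text regions present in the frame
    """
    # Assumed convention: an empty denominator scores 1.0 (nothing to get wrong).
    accuracy = correct / detected if detected else 1.0
    completeness = correct / actual if actual else 1.0
    return accuracy, completeness


def average_scores(per_image_counts):
    """Average per-image accuracy and completeness over a non-empty test set."""
    scores = [accuracy_and_completeness(*counts) for counts in per_image_counts]
    n = len(scores)
    mean_accuracy = sum(a for a, _ in scores) / n
    mean_completeness = sum(c for _, c in scores) / n
    return mean_accuracy, mean_completeness
```

For example, `accuracy_and_completeness(5, 4, 4)` describes an image where five regions were reported, four of them correct, and all four true text regions were found: the accuracy is 0.8 and the completeness is 1.0.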
Figure14. After bounding box detection
Figure15. After text validation (Final)
5. Conclusion
In this report, we presented the implementation of a Video Text Detection System. We detect the text region using the Canny edge algorithm together with several heuristic approaches that we propose: long line removal, horizontal stroke detection and vertical stroke detection, text area detection, bounding box detection and text validation. We tested our system on 30 sample video frames and obtained an average detection accuracy of 86% and an average completeness of 98%.
6. Acknowledgements
Special thanks go to our instructor, Prof. Jin Hyung Kim, who showed great passion and expert knowledge in teaching the course. By attending the course, we have grasped the basic ideas and specific techniques of AI, which we believe will greatly benefit our future research.
References
[1] Video OCR: A Survey and Practitioner's Guide, Rainer Lienhart
[2] Text detection and recognition in images and video frames, Datong Chen, Jean-Marc Odobez, Hervé Bourlard
[3] Automatic Text Detection In Video Frames Based on Bootstrap Artificial Neural Network And CED, Yan Hao, Zhang Yi, Hou Zeng-guang, Tan Min
[4] Automatic text segmentation and text recognition for video indexing, Rainer Lienhart and Wolfgang Effelsberg, Multimedia Systems 8 (2000) P69-P81
[5] Automatic Detection And Extraction of Artificial Text in Video, Jovanka Malobabi, Noel O'Connol, Noel Murphy and Sean
[6] Text detection and recognition in images and video frames, Datong Chen, Jean-Marc Odobez, Hervé Bourlard, Pattern Recognition 37 (2004) P595-P608
[7] A new robust algorithm for video text extraction, Edward K. Wong, Minya Chen, Pattern Recognition 36 (2003) P1397-P1406
[8] A localization/verification scheme for finding text in images and video frames based on contrast independent features and machine learning methods, Datong Chen, Jean-Marc Odobez, Jean-Philippe Thiran, Signal Processing: Image Communication 19 (2004) 205-217
[9] Text Area Detection from Video Frames, Xiangrong Chen, Hongjiang Zhang
[10] A Video Text Detection and Recognition System, Jie Xi, Xian-Sheng Hua, Xiang-Rong Chen, Liu Wenyin, Hong-Jiang Zhang
[11] Automatic Text Extraction and Recognition for Video Indexing and Retrieval, Laiyan Qing, Weiqiang Wang, Wen Gao
[12] Canny Edge Detection Tutorial, Bill Green, http://www.pages.drexel.edu/~weg22/can_tut.html
[13] Sobel Edge Detection Tutorial, Bill Green, http://www.pages.drexel.edu/~weg22/edge.html