
A Fast Block Extraction Method for Document Segmentation

*, **, ***
*
10520 . 0-2860-8886 E-mail: mr_phaisarn@yahoo.com
**
235 . 10163 wichian@siam.edu
***
10210 nucharee@dpu.ac.th

1.

Block Extraction
32*32 [1]
32*32





Block Extraction
6

Document Segmentation


3


1.2 (Bottom-Up Approach) [3]

Abstract
In the document segmentation process, the first black pixel must
be found and used as the starting point. This paper presents a new,
efficient method for finding that starting point. To speed up the
search, six types of windows are proposed and their efficiency is
compared. The experimental results show that the proposed method
significantly speeds up the search for the starting point.
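The idea of searching for the starting point with a window, instead of testing every pixel, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the image representation (a list of rows, 1 for black) and the diagonal sampling pattern are assumptions made for illustration, since the six window layouts themselves are not recoverable from the text.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]  # assumed representation: 1 = black pixel, 0 = white


def first_black_raster(img: Image) -> Optional[Tuple[int, int]]:
    """Baseline: examine every pixel in row-major (raster) order."""
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if value:
                return (y, x)
    return None


def diagonal_offsets(size: int) -> List[Tuple[int, int]]:
    """Hypothetical sampling pattern: both diagonals of a size x size window."""
    points = {(i, i) for i in range(size)}
    points |= {(i, size - 1 - i) for i in range(size)}
    return sorted(points)


def first_black_windowed(
    img: Image, size: int, offsets: Sequence[Tuple[int, int]]
) -> Optional[Tuple[int, int]]:
    """Tile the page with size x size windows and test only the sampled
    offsets inside each window. Returns the first sampled black pixel."""
    height = len(img)
    width = len(img[0]) if img else 0
    for top in range(0, height, size):
        for left in range(0, width, size):
            for dy, dx in offsets:
                y, x = top + dy, left + dx
                if y < height and x < width and img[y][x]:
                    return (y, x)
    return None


if __name__ == "__main__":
    page = [[0] * 64 for _ in range(64)]
    for y in range(40, 50):          # a 10 x 10 black square
        for x in range(36, 46):
            page[y][x] = 1
    print(first_black_raster(page))
    print(first_black_windowed(page, 32, diagonal_offsets(32)))
```

With a 32*32 window this pattern tests 64 of the 1024 pixels in each window. The trade-off is that a component lying entirely between sampled positions is skipped, so the pattern has to be dense enough for the smallest object of interest.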

top-down bottom-up
B. Kruatrachue P. Suthaphan [1]
1. Block Extraction (top-down) [1] 32*32

2. Multi-Column Block Detection and Segmentation (Bottom-up) [1]


1

32*32 [1]



32*32

26 (EECON-26) 6-7 2546 .


________________________________________________________________________________________________________________________________
1069

DS09


(block)

1.3 (Mixed Approach) [4-5]

: ,

Key words: document segmentation, window

1.1 (Top-Down Approach) [2]


32*32

6

2.


,
2

2.1.1

( 32*32)



Block Extraction




6 3.

2.1 Block Extraction (top-down)


Block Extraction

2.
Block Extraction 2

1.

3. A F

2. Block Extraction


-
- ()
-



(
)

()


(3. )
(noise)
. 1
5


5




4.

Block Extraction
(block) 1
32
Multi-Column Block Detection and
Segmentation


5. Block Extraction
raster

4.

2.1.2

32*32
(Chain Codes)
32*32
10


2.1.1

6. Block Extraction


5-6
raster 5.
6.
6.
5. 6.
5. (
32*32 )

2.2 Multi-Column Block Detection and Segmentation


(bottom - up)
Block Extraction 1
(block) 1 bottom-up
1 1
1
(bounding box)








(y)

()

3.
6

()

3.1
6
AD
EF
7.

3.2
6

Block Extraction

7. () E, F
() ()
Pentium 4 1.6 GHz, 256 MB, Windows XP, Visual Basic .NET 2002, A4 at 300 dpi
9.
1.


Table 1.

Window  Size   Window area (pixels)  Pixels checked  %       Time (sec)
A       18x18  324                   52              16.049  7.801
B       18x18  324                   32              9.877   7.39
C       11x11  121                   9               7.438   7.37
D       12x12  144                   14              9.72    7.43
E       16x16  256                   14              5.469   7.06
F       20x20  400                   14              3.5     6.77
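The percentage figures reported for windows A to F are consistent with dividing the second count given for each window by the window area (side squared). The short check below reproduces that arithmetic. Reading the second count as the number of pixels actually checked in each window is an assumption, and the names `WINDOWS` and `checked_ratio` are illustrative only.

```python
# Values copied from Table 1: window -> (side length, second count, reported %).
WINDOWS = {
    "A": (18, 52, 16.049),
    "B": (18, 32, 9.877),
    "C": (11, 9, 7.438),
    "D": (12, 14, 9.72),
    "E": (16, 14, 5.469),
    "F": (20, 14, 3.5),
}


def checked_ratio(side: int, checked: int) -> float:
    """Percentage of the window's pixels covered by the second count."""
    return 100.0 * checked / (side * side)


if __name__ == "__main__":
    for name, (side, checked, reported) in WINDOWS.items():
        computed = checked_ratio(side, checked)
        print(f"{name}: area={side * side} computed={computed:.4f} reported={reported}")
```

For window A, for instance, 52 / 324 is about 16.05 %, matching the reported 16.049.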

1 Block Extraction

6 6 - 8
raster 24.986

4.

8. Block Extraction

9. Block Extraction


Digital Image Processing

[1] B. Kruatrachue and P. Suthaphan, A Fast and Efficient Method for
Document Segmentation for OCR, Electrical and Electronic
Technology, 2001, Vol. 1, 19-22 Aug. 2001, pp. 381-383.
[2] T. Akiyama and N. Hagita, Automated entry system for printed
documents, Pattern Recognition 23, 1990, pp. 1141-1154.
[3] K. C. Fan, C. H. Liu, and Y. K. Wang, Segmentation and
classification of mixed text/graphics/image documents, Pattern
Recognition Letters 15, 1994, pp. 1201-1209.
[4] S. N. Srihari, T. Hong, and G. Srikantan, Machine printed Japanese
document recognition, Pattern Recognition 30, 1997, pp. 1301-1313.
[5] D. Wang and S. N. Srihari, Classification of Newspaper Image Blocks
using Texture Analysis, Computer Vision, Graphics, and Image
Processing, Vol. 47, 1989, pp. 327-352.



C


300dpi (
)
E (
C)
F
F (
F E )


C


4.

D E
F

Digital Image Processing


..
Waseda University, Tokyo

..
Waseda University, Tokyo

()

()
10. E
10. () E 6
() C E

