
Journal of Information, No. 11, 2008

*

The Study of Large-scale Web Term-pairs Extraction based on Regular Expressions


( 300222)

,
Web , Web ,
,
, , 66.7% , ,
100%
Web
TP391.3

( a z)
0 ( ) ,
Web HTML ,

, HTML , ,
, , Web
HTML , ,
, Web ,
Web , Web
,
/ 0 , 1
,
, 1.1 HTML
, <br>
,
</br>
<p></p> <td></td>

:
,
1.1.1 : <p>e1 c1 e2 c2 e3 c3, ..., en cn</p>
; , ,
,
e1 , c1 ,
, ,
, <p> <br> <td>

:
, [1~7]
,
<p>library , linkage to load , , , location
: ;
logger , loop machine language magnetic
, storage magnetic tape matrix memory message
, Web , , microcomputer </p>
1
, , (source: http://www.ddsic.com/blog/category/8/39)
[8] [9] 1.1.2
, : <p>e1</p><p>c1</p><p>e2</p>
, <p>c2</p>, ..., <p>en</p><p>cn</p>
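The alternating layout of 1.1.2, where each term and each translation sits in its own `<p>` element, can be illustrated with a minimal Python sketch. It is not the authors' pattern: the function name `pair_alternating_paragraphs` and the Chinese strings in the sample are hypothetical, since the translations in the source example did not survive extraction.

```python
import re

# Matches the inner text of each <p>...</p> element, case-insensitively.
P_TAG = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)


def pair_alternating_paragraphs(html):
    """Pair consecutive non-empty <p> contents as (e_i, c_i) tuples."""
    cells = []
    for inner in P_TAG.findall(html):
        text = inner.replace("&nbsp;", " ").strip()
        if text:
            cells.append(text)
    # Even positions hold the e_i entries, odd positions the c_i entries.
    return list(zip(cells[0::2], cells[1::2]))


# Hypothetical sample: the English terms come from the paper's example,
# the Chinese strings are placeholders chosen for illustration only.
sample = (
    "<P>&nbsp;&nbsp;&nbsp;valid and subsisting bill</p>"
    "<P>&nbsp;&nbsp;&nbsp;有效票据</p>"
    "<P>&nbsp;&nbsp;&nbsp;valid bilateral netting arrangement</p>"
    "<P>&nbsp;&nbsp;&nbsp;有效双边净额结算安排</p>"
)

pairs = pair_alternating_paragraphs(sample)
```

Pairing by position is fragile if a page drops one paragraph, which is one reason a pattern that ties each term to its translation inside a single match may be preferable.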

: / 0 ( : 20071303)
: , , 1981 , , ,


<P>&nbsp;&nbsp;&nbsp;valid and subsisting bill</p><P>&nbsp;&nbsp;&nbsp; </p><P>&nbsp;&nbsp;&nbsp;valid bilateral netting arrangement</p><P>&nbsp;&nbsp;&nbsp; </p>
2
(source: http://www.0350edu.com/Article/kind/account/200604/850.html)

1.1.3
: <p>e1 c1</p><p>e2 c2</p>, ..., <p>en cn</p>

<br>Gross Registered Tonnage (GRT) ( ) <br>Net Registered Tonnage (NRT) ( ) <br>Deadweight Tonnage (All Told) (DWT, or D.W.A.T) ( ) <br>Gross Dead Weight Tonnage <br>Dead Weight Cargo Tonnage (DWCT) <br>Light Displacement <br>Load (Loaded) Displacement
3
(source: http://www.zftrans.com/favorite/vocabulary/2006-1/9/2006010912345.html)

1 pattern1
[a-zA-Z\(\.]
[\s-.%-\.\(\/\[a-zA-Z0-9]*
[\)a-zA-Z0-9\.\]]
;{0,1}\s*(?:&nbsp)*[\(\|]*\s*;{0,1}
[\x80-\xff][^a-zA-Z]{1,}

:
$pattern2=/[\s]*[a-zA-Z\(\.][\s-.%-\.\(\/a-zA-z0-9\[]*[\)a-zA-Z0-9\.\]];{0,1}\s*(?:&nbsp)*[\(\|]*\s*;{0,1}([\x80-\xff][^a-zA-Z]{1,})<BR\s{0,1}\/{0,1}>/Ui
5 - (2)
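For the `<br>`-separated layout of example 3, the idea behind the paper's patterns (an English term, optional separators, then a run that starts with a non-ASCII byte and ends at `<br>`) can be sketched in Python as follows. This is a simplified adaptation and not the authors' exact expression: it captures both terms in one pattern, works on UTF-8 bytes so that `[\x80-\xff]` keeps its byte-level meaning, and replaces the ungreedy modifier with lazy quantifiers. The name `extract_pairs` and the Chinese strings in the sample are assumptions made for illustration.

```python
import re

# Simplified, byte-oriented adaptation of the English-then-Chinese patterns.
PAIR = re.compile(
    rb"\s*"
    rb"([a-zA-Z(.][\s\-.%(/\[a-zA-Z0-9]*?[)a-zA-Z0-9.\]])"  # English term
    rb";?\s*(?:&nbsp;?)*[(|]*\s*;?"                          # optional separators
    rb"([\x80-\xff][^a-zA-Z]+?)"                             # Chinese term
    rb"\s*<br\s?/?>",                                        # closing <br> or <br/>
    re.IGNORECASE,
)


def extract_pairs(html_bytes):
    """Return (English, Chinese) pairs found in UTF-8 encoded HTML."""
    return [
        (m.group(1).decode("utf-8"), m.group(2).decode("utf-8"))
        for m in PAIR.finditer(html_bytes)
    ]


# Hypothetical sample: English terms from example 3, Chinese strings
# supplied for illustration because the originals were not preserved.
sample = (
    "<br>Gross Registered Tonnage (GRT) 总吨位"
    "<br>Net Registered Tonnage (NRT) 净吨位<br>"
).encode("utf-8")

pairs = extract_pairs(sample)
```

Entries whose English side contains characters outside the first character classes, such as the comma in "(DWT, or D.W.A.T)", would not match this simplified pattern and would need a wider class.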
$pattern2 $pattern1
1.2
$pattern2 , ( )
, ,
( ) ,
, ,
$pattern1
<br> <br/> (
,
<p></p> <tr></tr> <td></td> ,
,
) ,
,
,
:

: a. ; b. ; c.

1.2.1
, URL ,
, 8192 ,
,
,

1.2.2
, , ,
, ,
, ,
:
$pattern1=/[\s]*([a-zA-Z\(\.][\s-.%-\.\(\/\[a-zA-Z0-9]*[\)a-zA-Z0-9\.\]]);{0,1}\s*(?:&nbsp)*[\(\|]*\s*;{0,1}[\x80-\xff][^a-zA-Z]{1,}<BR\s{0,1}\/{0,1}>/Ui
4 - (1)
$pattern1 , ,
[\s]* ( 0 )
( ) 1 pattern1
,
, UTF-8 ASCII
, 127 ,
, [^a-zA-Z] ,

$pattern3=/[\s]*[\x80-\xff][^a-zA-Z]{1,};{0,1}\s*(?:&nbsp)*[\(\|]*\s*;{0,1}([a-zA-Z\(\.][\s-.%-\.\(\/a-zA-z0-9\[]*[\)a-zA-Z0-9\.\]])<BR\s{0,1}\/{0,1}>/Ui
$pattern4=/[\s]*([\x80-\xff][^a-zA-Z]{1,});{0,1}\s*(?:&nbsp)*[\(\|]*\s*;{0,1}[a-zA-Z\(\.][\s-.%-\.\(\/a-zA-z0-9\[]*[\)a-zA-Z0-9\.\]]<BR\s{0,1}\/{0,1}>/Ui
6 -
,
$pattern3
$pattern4
Ui , U
,
, "aa" , "a+"
"a", "a"
, , i
, ,

1.2.3
, : a.
HTML , ; b.
; c. , (
) ; d. ,
7
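The two modifiers and the byte range used in the patterns can be checked with a short Python sketch. It assumes the `/Ui` suffix denotes PCRE-style modifiers, where `U` inverts quantifier greediness and `i` ignores case. Python's `re` module has no ungreedy flag, so the lazy quantifier `+?` stands in for it; the sample strings are illustrative.

```python
import re

# Greedy versus ungreedy matching of "a+" against "aa".
greedy = re.match(r"a+", "aa").group()      # takes as many "a" as possible
ungreedy = re.match(r"a+?", "aa").group()   # stops after the first "a"

# Case-insensitive matching lets one pattern cover <br>, <BR> and <BR />.
tag = re.search(r"<br\s?/?>", "term<BR />", re.IGNORECASE).group()

# In UTF-8, every byte of a Chinese character is above 127, which is why
# a byte class such as [\x80-\xff] can mark where the Chinese term starts.
utf8_bytes = "词".encode("utf-8")
all_high = all(b > 127 for b in utf8_bytes)

# Plain ASCII letters, by contrast, never exceed 127.
all_low = all(b <= 127 for b in "term".encode("ascii"))
```

Ungreedy matching matters here because a page holds many term pairs in a row: a greedy quantifier could run past the first `<br>` and swallow several entries into one match.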

3
,
- 822 99.04% 100%

- 9874 99.06% 98%
, , - (l-p) 3910 99.54% 100%
, , - (a) 32 21.3% 93.75%
- 37 88.1% 100%
, , / > 0
- 3460 97.25% 100%
, ; - (w) 159 92.44% 100%
, 360 - (v) 283 99.65% 100%
- (t) 231 99.57% 98.7%

2 ,
, ,
,
, ,

100% , /
0
,
66.7% ,
,
,


3 ,
,
,
: a.
,
,

,
,
1 3 ,
b.
7 / t matrix t
0/ t matrix t0 ,
2
/ 0 , t
2.1
: a. V, ; b.
A: 30 , 3
; c. P: , -
/ ; d. R: , ,
, / , ,
,
2.2 Web 30 ,
, 2 3 , ,
2
,
,
V       A       P       R
12.5s   66.7%   99.9%   88.44%
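The definitions of P and R are not legible in this copy. Assuming they denote precision and recall in the conventional sense (correct extracted pairs over all extracted pairs, and correct extracted pairs over all pairs present on the page), the two figures could be computed as in the sketch below. The function name and the toy data are hypothetical and are not taken from the paper's experiment.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for extracted pairs against reference pairs."""
    extracted, gold = set(extracted), set(gold)
    correct = extracted & gold
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Toy data: three extracted pairs, two of them correct, four reference pairs.
extracted = [("library", "库"), ("matrix", "矩阵"), ("memory", "存储")]
gold = [("library", "库"), ("matrix", "矩阵"), ("memory", "存储器"), ("message", "消息")]

p, r = precision_recall(extracted, gold)
```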
,



References

1 , , , . [J]. , 2000, 14(6): 33-39
2 , , . [J]. , 2003, 22(3): 310-314
3 Lars Ahrenberg, Mikael Andersson, Magnus Merkel. A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts [C]. In 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL'98), Montreal, 1998: 29-35
4 Jorg Tiedemann. Extraction of Translation Equivalents From Parallel Corpora [C]. In 11th Nordic Conference of Computational Linguistics, Copenhagen, Denmark, 1998: 120-128
5 D. Hiemstra, F. de Jong, W. Kraaij. A Domain Specific Lexicon Acquisition Tool for Cross-Language Information Retrieval [C]. In Proceedings of RIAO97, Montreal, Canada, 1997: 217-232
6 W. A. Gale, K. W. Church. Identifying Word Correspondences in Parallel Texts [C]. Proceedings of the 4th DARPA Workshop on Speech and Natural Language, 1991: 152-157
7 I. Dagan, K. W. Church, W. A. Gale. Robust Bilingual Word Alignment for Machine Aided Translation [C]. Proceedings of Workshop on Very Large Corpora, 1993: 1-8
8 Nagata. M, Saito. T, Suzuki. K. Using the Web as a Bilingual Dictionary [C]. Proceeding of Workshop on Data-driven Methods in Machine Translation, 2001: 95-102
9 Jian-Cheng Wu, Tracy Lin, Jason S. Chang. Learning Source-Target Surface Patterns for Web-Based Terminology Translation [C]. Proceedings of the ACL Interactive Poster and Demonstration Sessions, 2005: 37-40
