
BẢN TIN LIÊN HIỆP THƯ VIỆN (Library Consortium Newsletter)

November 2002

BUILDING OPEN DATA REPOSITORIES

FOR LIBRARY INFORMATION EXCHANGE


Dr. Hoang Le Minh
Software Technology Center (UniSoft), Vietnam National University, Ho Chi Minh City (VNU-HCM)
Abstract
Building library information repositories along open lines, so that resources on the Internet are shared as widely as possible, is one of the tasks of IT application development. The Software Technology Center of VNU-HCM (UniSoft) has recently deployed several solutions and software packages for library information management that are built on open source technologies. This article stresses the importance of applying open source technologies and open standards when building open data repositories to manage and share information on the Internet. We try to answer the following questions, which readers often ask:
1. Why build open data repositories that are described with metadata tags?
2. What is the Dublin Core Metadata standard (Z39.85-2001)?
3. How are open data repositories managed and accessed on the Internet?
I. Information and information repositories.
In the modern world, every human activity depends on information:
1. Information on prices and markets, customers and partners, and so on, in production, business, trade and finance.
2. Information on policy and law, taxation and currency, population and labor, land and the environment, in economic management and public administration.
3. Information on culture, science and education, especially in library and information services.
To obtain the necessary information and keep its content up to date, data repositories must be built. This is a complex and costly process with many stages: collecting, processing, analyzing, and organizing the storage of information according to some standard. The usual IT solution is to use database management systems together with interface software to manage the information and retrieve what is needed quickly.
Past and current master plans for IT development, at both central and local level, emphasize building data repositories, as shown by projects to build, manage and exploit information resources and by the data integration centers. In the library and information field alone there are a good many projects, funded by the state or by foreign loans and grants, to build digitized electronic libraries.
In the age of the information technology revolution, information repositories need to be linked to one another so that resources can be shared and services provided. However, not every database management system copes well with integrating and sharing information from distributed sources, which are sometimes not properly standardized. Choosing a feasible solution for exchanging data between databases and making them interoperate is very difficult.
In practice there are two approaches to the problem:
1. Agree on a single shared software package or a single database.
2. Build a data integration center that unifies data from many sources.


Sharing one software package and one database does reduce the cost of converting and re-entering data, but it is very hard to carry out on a large scale across many sites, because it requires a fairly uniform environment and a fairly uniform level of user skill. It is also hard to fund and deploy a high-quality shared software package on a wide scale.
Building data integration centers likewise takes considerable time and money, and in practice it can hardly resolve the problems described above. The education and training sector's use of IT in the 2002 university entrance examinations showed that integrating data from the universities and the provinces to produce common admission results did not succeed.
The question is how investment in shared data repositories can be made genuinely effective. The only way to do this is to bring information directly to users (over the Internet) while making it possible to supply and update information directly at the source, rather than waiting for it to be sent to an integration center before it is processed.
On that basis, UniSoft proposes a third model, which can help build shared data repositories simply and quickly:
3. Standardize data description in XML, based on metadata tags; use the Internet or an intranet to build open data repositories; and build the management software with open source technology, with data lookup based on search engines.
An open data repository is a collection of files that carry descriptive metadata tags, are stored in directories, and can be retrieved in full text from website addresses. These collections differ from ordinary web pages in three ways:
a. Every data file is accompanied by descriptive metadata tags, placed in the header of the file.
b. The management software reads the metadata in the header and presents the results (in brief or in full) as web pages.
c. Searching works on the same principle as a search engine: the metadata and the full text are scanned in advance to build indexes, so that users can find information quickly.
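The indexing idea in point (c) can be pictured with a small inverted index. This is only a minimal sketch under stated assumptions, not UniSoft's software: the document ids, the field names and the functions `tokenize`, `build_index` and `search` are hypothetical, and each document is assumed to have already been split into its metadata and its body text.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each token to the ids of documents whose metadata or body contains it.

    `documents` is a dict: doc_id -> {"metadata": {name: value}, "body": str}.
    """
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for value in doc["metadata"].values():
            for token in tokenize(value):
                index[token].add(doc_id)
        for token in tokenize(doc["body"]):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data, for illustration only.
documents = {
    "doc1.htm": {
        "metadata": {"DC.Title": "Open data repositories", "DC.Creator": "Minh"},
        "body": "Metadata tags describe each file.",
    },
    "doc2.htm": {
        "metadata": {"DC.Title": "Library news"},
        "body": "Weekly schedule for the library.",
    },
}
index = build_index(documents)
print(search(index, "metadata tags"))
```

Because the index is built ahead of time, a query only looks up a few sets and intersects them instead of rereading every file.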

With this technology, we believe that very large data repositories can be built quickly and at low cost, without the limitations that come with closed databases. Adding and updating data is also very simple and can be done with many familiar tools for authoring web pages and presenting data, such as MS FrontPage and MS Word. The most important components of the system are the software that presents the data and the software that searches it, both designed to run on a server and over the Internet.
The following sections look more closely at how documents (including full-text documents) are described according to the ANSI/NISO Z39.85-2001 Dublin Core Metadata standard for digitized electronic libraries on the Internet.
II. The Extensible Markup Language (XML) and Dublin Core Metadata tags.
The Extensible Markup Language (XML) is a suitable solution for exchanging data automatically between information repositories on the Internet.
XML (eXtensible Markup Language) has the same origin as the hypertext markup language HTML: both derive from SGML (Standard Generalized Markup Language), the general standard for structured document markup. Every XML document consists of tags that carry the names of elements. Unlike HTML, however, XML places no limit on the number or the names of elements.
XML documents that carry descriptive information, in the form of metadata tags, can be transferred over the Internet as easily as HTML files. In particular, these documents can be processed automatically by reading software (XML parsers) and presented on the web. Parsers play a role similar to that of web browsers, except that an XML parser lets data input and output be processed automatically instead of by a person. The idea of a document language such as XML that automates the input, output and exchange of data between repositories is very simple, yet the benefit is considerable: it helps resolve outstanding problems in exchanging information over the Internet.
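As a rough illustration of what an XML parser does, the sketch below reads a small XML document with Python's standard `xml.etree.ElementTree` module and returns its fields without any manual data entry. The element names (`document`, `title`, `author`, `issued`) and the function `read_fields` are made up for this example; XML itself does not prescribe them.

```python
import xml.etree.ElementTree as ET

# A hypothetical record; the element names are chosen freely by the author of the data.
xml_text = """<document>
  <title>Open data repositories</title>
  <author>Hoang Le Minh</author>
  <issued>2002-11</issued>
</document>"""


def read_fields(text):
    """Parse an XML document and return its child elements as (name, value) pairs."""
    root = ET.fromstring(text)
    return [(child.tag, (child.text or "").strip()) for child in root]


fields = read_fields(xml_text)
for name, value in fields:
    print(name, "=", value)
```

A receiving repository could run the same routine on every incoming file, which is the sense in which the parser replaces manual input and output.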
The standard for describing library information exchanged on the Internet that has been approved by the United States national standards organization is ANSI/NISO Z39.85-2001. It complements older standards such as the Z39.50 retrieval protocol rather than replacing them. The core of this description standard is a set of 15 data fields known as Dublin Core Metadata. These are the most common and most useful fields to accompany a digitized document exchanged on the Internet.
The United States National Library of Medicine has, since 2001, exchanged its information and data entirely in XML, after converting more than 10 million MARC records to XML. Libraries in France and Japan are also beginning to move to XML for cataloging and information exchange.
What is metadata?
Metadata describes an information resource. The prefix "meta" comes from a Greek word and denotes something of a more fundamental or higher nature; metadata is therefore data about other data. The cards of a traditional library catalog are metadata, and today the term is used most often for descriptions of Web resources.
A metadata record consists of the set of attributes or elements needed to describe a resource. For example, a typical library catalog contains metadata records that describe books by author, title, date of publication, subject heading, shelf number giving the book's position on the shelves, and so on.
A metadata record can be linked to its resource in one of two ways:
- The metadata elements are held in a separate record, apart from the document, as with the catalog records of a traditional library.
- The metadata is embedded in the document itself. In cataloging-in-publication as practiced by foreign publishers, for example, the description is printed on the page after the title page of the book. For digital libraries, embedding the cataloging metadata directly in the full-text document is a mandatory requirement.
Several widely used cataloging standards serve as metadata, for example MARC21/UNIMARC, ISO 2709 and Dublin Core Metadata. The description is usually attached to the head of each electronic document placed on a website, where search engines can easily find it and extract the metadata to assemble open data repositories, with no need for a traditional database management system. In other words, XML by itself supports building a full-text, unstructured database that is very convenient for searching and exchanging information.
What is Dublin Core Metadata?
Dublin Core Metadata is a standard for describing the content of records and data. It is much simpler than the MARC format and has only 15 elements. It was designed for use on the Internet, to describe specialized documents in libraries and documents on Web sites. MARC and Dublin Core data elements can be converted into one another according to a defined mapping when they are displayed to users. The elements are:
1. Title: the official name of the document
2. Creator: the name of the principal author or authors
3. Subject: the subject heading used to classify the document
4. Description: a brief account of the document's content
5. Publisher: the name of the body that issues the document
6. Contributor: the names of other contributing authors
7. Date: the date the document was issued
8. Type: the category of the document
9. Format: the physical or digital form of the document
10. Identifier: information that identifies the document
11. Source: the origin from which the document is derived
12. Language: the language of the document
13. Relation: references to related documents
14. Coverage: the spatial or temporal scope of the document's content
15. Rights: information on copyright and other rights
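The fifteen element names listed above can be kept in a small lookup so that a record is checked before it is stored. The following is a minimal sketch, assuming a record is represented as a dictionary from element name to a list of values (a list, because an element may occur more than once); the function `unknown_elements` and the sample record are hypothetical.

```python
# The fifteen element names of the Dublin Core element set, in the order listed above.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def unknown_elements(record):
    """Return, sorted, the names in the record that are not Dublin Core elements."""
    return sorted(name for name in record if name.lower() not in DC_ELEMENTS)


# Hypothetical record: every element is optional, and "creator" occurs twice.
record = {
    "title": ["Tuyên ngôn Đảng Cộng sản"],
    "creator": ["Marx, Karl", "Engels, Friedrich"],
    "date": ["1887"],
}
print(unknown_elements(record))
print(unknown_elements({"author": ["x"], "title": ["y"]}))
```

The check reports only unknown names; it deliberately does not require any element to be present.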

Every Dublin Core element is optional and may be repeated. Each element also has a limited set of qualifiers that refine its meaning. Dublin Core Metadata has the following characteristics:
- Easy to create and maintain: people who are not professional catalogers can create simple descriptive records for information resources, and those records can be retrieved easily in a networked environment.
- Semantics that are common and easy to understand: finding information across the Internet is often hampered by differences in terminology and in descriptive practice. Dublin Core helps non-specialist searchers find what they need by providing a common set of elements whose meaning is widely understood. For example, scientists look for articles by a well-known author and art researchers look for works by a well-known artist; both needs are met by the same Creator element.
- International scope: the Dublin Core element set was first developed in English, but later versions have been produced in many other languages. By November 1999 there were versions in more than 20 languages, including Finnish, Norwegian, Thai, Japanese, French, Portuguese, German, Greek, Indonesian and Spanish.
- Extensibility: the developers of Dublin Core provide a mechanism for extending the element set to meet additional resource discovery needs. Metadata elements from other element sets can be linked with Dublin Core metadata, so that different organizations can use Dublin Core elements to describe their information in a form suited to use on the Internet.


Dublin Core Metadata syntax:
HTML offers an easily understood format for expressing the basic Dublin Core concepts, but for complex applications RDF/XML (the Resource Description Framework expressed in XML) is more effective than HTML. HTML uses two tags, "<META>" and "<LINK>", to record metadata. When metadata is included, these tags must appear in the HEAD section of the HTML document. Every element is optional and may be repeated, and the metadata elements may appear in any order. For example:
<HTML>
<HEAD>
<TITLE>Tuyên ngôn Đảng cộng sản</TITLE>
<META NAME="DC.Creator" CONTENT="Marx, Karl">
<META NAME="DC.Creator" CONTENT="Engels, Friedrich">
<META NAME="DC.Title" CONTENT="Tuyên ngôn Đảng cộng sản" >
<META NAME="DC.Date" CONTENT="1887" >
<META NAME="DC.Type" CONTENT="document">
<META NAME="DC.Format" CONTENT="text/html">
<META NAME="DC.Identifier" CONTENT="http://www.nhandan.org.vn/tndcs.htm">
</HEAD>
<BODY>
<H1>Tuyên ngôn Đảng Cộng sản</H1>
<P>....</P>
</BODY>
</HTML>
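To show how such tags might be read back by software, here is a minimal sketch that uses Python's standard `html.parser` module to collect the `DC.*` entries found inside the HEAD section. The class `DublinCoreExtractor`, the function `extract_dc` and the shortened sample page are assumptions made for illustration; they are not part of the Dublin Core standard or of any UniSoft product.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect <META NAME="DC.x" CONTENT="..."> tags that appear inside <HEAD>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.fields = {}  # element name -> list of values (elements may repeat)

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = attributes.get("name") or ""
            if name.upper().startswith("DC."):
                self.fields.setdefault(name, []).append(attributes.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_dc(html_text):
    """Return the Dublin Core fields embedded in an HTML page."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.fields


# Shortened sample page in the style of the example above.
sample = """<HTML>
<HEAD>
<TITLE>Sample</TITLE>
<META NAME="DC.Creator" CONTENT="Marx, Karl">
<META NAME="DC.Creator" CONTENT="Engels, Friedrich">
<META NAME="DC.Date" CONTENT="1887" >
<META NAME="keywords" CONTENT="ignored">
</HEAD>
<BODY><P>....</P></BODY>
</HTML>"""
fields = extract_dc(sample)
print(fields)
```

Values are stored in lists so that a repeated element, such as the two creators here, is kept in full.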
III. Managing and accessing document repositories that contain metadata tags
When metadata tags are widely used to describe documents, information repositories can be built up quickly without the cost of data entry and processing. Management and search can be fully automated, relying on the way most website information systems already work. For example, software that reads, processes and presents such documents knows that the metadata tags begin after the "<HEAD>" line and end before the "</HEAD>" line. The metadata can therefore be extracted automatically, while web browsers can simply ignore it. Today's search engines can all read and use the information in the HTML <META> tags of web documents.
Managing these open data repositories therefore becomes very simple:
1. Adding new data to the repository is just a matter of copying in one more document file that already contains its descriptive metadata tags. This can be automated by uploading the files or by sending documents to the repository by e-mail.
2. Updating data means editing the information in the metadata tags with web authoring tools; no separate database client software is needed.
3. Deleting or moving data is simply deleting or moving files on a website or between websites.
With such simple operations, gathering data into repositories of information and documents that are useful to users is not complicated and is within the skills of most administrators of library and information systems today. Using and exchanging data between these open repositories is also very simple with the help of software for searching and managing information on the Internet.
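The three operations above amount to ordinary file handling, which the following sketch tries to make concrete. It is an illustration under assumptions, not the article's software: the repository is taken to be one directory of `.htm` files, the helper names (`add_document`, `update_field`, `remove_document`, `catalog`) are hypothetical, and the simple regular expression expects the NAME attribute before CONTENT, both in double quotes.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumed tag layout: <meta name="DC.x" content="...">, in either letter case.
META_RE = re.compile(r'<meta\s+name="(DC\.[^"]+)"\s+content="([^"]*)"', re.IGNORECASE)


def add_document(repo, source_file):
    """Adding a record is copying the file into the repository directory."""
    return Path(shutil.copy(source_file, repo))


def update_field(path, name, new_value):
    """Updating a record is editing the metadata tag inside the file."""
    pattern = re.compile(
        r'(<meta\s+name="%s"\s+content=")[^"]*(")' % re.escape(name), re.IGNORECASE
    )
    text = path.read_text(encoding="utf-8")
    text = pattern.sub(lambda m: m.group(1) + new_value + m.group(2), text)
    path.write_text(text, encoding="utf-8")


def remove_document(path):
    """Deleting a record is deleting the file."""
    path.unlink()


def catalog(repo):
    """Rebuild the listing by scanning the files; no database is involved."""
    return {
        path.name: META_RE.findall(path.read_text(encoding="utf-8"))
        for path in sorted(Path(repo).glob("*.htm"))
    }


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    repo.mkdir()
    incoming = Path(tmp) / "news1.htm"
    incoming.write_text(
        '<html><head><meta name="DC.Title" content="First draft"></head>'
        "<body>...</body></html>",
        encoding="utf-8",
    )
    stored = add_document(repo, incoming)
    after_add = catalog(repo)
    update_field(stored, "DC.Title", "Final title")
    after_update = catalog(repo)
    remove_document(stored)
    after_remove = catalog(repo)

print(after_add, after_update, after_remove)
```

The listing is recomputed from the files on every call, which mirrors the article's point that the files themselves are the repository.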


One of UniSoft's reasonably successful results in applying XML and Dublin Core Metadata descriptions to build open information repositories is iNews, news management software developed on the iPortal portal platform for the website information system of Vietnam National University, Ho Chi Minh City. iNews consists of three application modules:
iNewsReader is used to read news on the website. Because it is designed as a set of application channels within iPortal, it can serve several purposes at once: public information (for everyone, with no login or authentication) and internal information (login and authentication are required to reach the information reserved for each group of users). Internal information includes reports and the weekly and monthly work schedules of each unit and department, intended for managers and the individuals concerned. Because news is reviewed before publication and permissions are clearly assigned, quality and the other requirements of information management are assured. The iNewsReader news display in the browser has the same layout for every unit of the organization; however, to reach a restricted information system the user must be authenticated (logged in to the website).

News items are organized by topic (in the original illustration, under the topics Culture and Computing). Each entry shows the topic name, the title of the item, the author, the posting date, an accompanying picture if there is one, and the full text (when the title is clicked). iNews also archives items that have been posted and allows them to be searched again by date.


The iNewsEditor module is used to compose a new item for the iNewsReader system on the website. When the item is finished, the user saves it in a predefined directory on the website as a file containing metadata.
[Figure: iNewsEditor interface]

iNewsManager is the administration module of the website information system. It lets the administrator create new directories for storing information, edit and update items, and delete old items or move them to the archive (where they are not shown to users viewing the news). The interface and functions of this application channel are designed to resemble a directory management window in the Windows operating system, so that users can become familiar with it quickly.
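The article does not give the file format that iNews writes, so the following is only a guess at what "a file containing metadata" could look like when produced by an editor module: an HTML page whose HEAD carries Dublin Core META tags. The function `render_news_item`, the choice of fields and the sample values are all hypothetical.

```python
from html import escape


def render_news_item(title, creator, date, body_html):
    """Return an HTML page whose HEAD carries Dublin Core META tags for the item."""
    meta = [
        ("DC.Title", title),
        ("DC.Creator", creator),
        ("DC.Date", date),
        ("DC.Format", "text/html"),
    ]
    # Escape values so that quotes or angle brackets cannot break the tags.
    meta_lines = "\n".join(
        '<META NAME="%s" CONTENT="%s">' % (name, escape(value, quote=True))
        for name, value in meta
    )
    return (
        "<HTML>\n<HEAD>\n<TITLE>%s</TITLE>\n%s\n</HEAD>\n"
        "<BODY>\n<H1>%s</H1>\n%s\n</BODY>\n</HTML>\n"
        % (escape(title), meta_lines, escape(title), body_html)
    )


# Hypothetical news item; body_html is assumed to be trusted markup from the editor.
page = render_news_item(
    "Library opening hours", "Editorial board", "2002-11-15", "<P>...</P>"
)
print(page)
```

Saving the returned text into the predefined directory would then be the whole publishing step, in keeping with the file-based model described in section III.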
