
Table of contents

Foreword
Introduction
Some similar dictionaries
Background knowledge
Dictionary data
From ASCII to Unicode
Programming language and tools to prepare
Getting started with dictionary data files
Progressbar and thread
Dictionary features
Language-aware comparison and sorting
Pronunciation
Displaying formatted, colored data + word cross-links
Looking up words via the clipboard
Dictionary algorithms
Measuring algorithm speed
Some dictionary algorithms
The dict.org standard
Fast word list loading
Advanced search
The SPDict dictionary format
Unfinished solutions
Unfinished features
Linking online and offline

Foreword
Super Power Dict (SPDict) is a dictionary that is open in its source code (no measures of any kind are used to protect the code), its data and its algorithms. It is built in the spirit of sharing my own knowledge and experience with anyone interested in this topic. Starting with this version, however, I will no longer publish the source code for download and compilation the way many open source projects do. If you want the source, run a Java decompiler such as DJ Java Decompiler on the program files.
The reason is that I want to pass this knowledge on to people who really want to learn, who have some coding skill, and who can develop it further and contribute back to the open source community. It is not meant for people who only want things ready-made: who grab the code, change the name and show it off as their own work, sometimes without even knowing Java (T_T), or who can code but, because everything is handed to them, just open the project, compile it, run it and drop it into their own application without learning anything about dictionaries.
This document contains everything I want to share. I do not offer additional private support to anyone. Whatever is not covered here is either something I do not want to pass on, or something too easy and too basic to be worth explaining. So let me state up front: if you send me a private request for help, I will not act on it.
Please bear with me for being strict about this. The person asking for help has it easy, but few people understand how the person giving support feels (^_^). Some people, once they get what they asked for, wash their hands of it, criticize this and that, and dismiss what others worked hard to build. Others know so little that there is nothing to support; they are simply too lazy to learn and want everything ready-made. On top of that I lose time answering, because everybody wants private support, nobody bothers to search, and people add my nick and send spam mail all over the place. I once had to abandon a Yahoo nick for exactly these reasons.

Introduction
A few words about myself:

Full name: Bùi Đức Tiến
Year of birth: 1989
Web:
http://superpowerdict.googlepages.com/
http://tienlbhoc.vnbb.com/
Writing a dictionary with only the basic features is not very complicated, but polishing it into a complete product is hard. Many IT students choose this topic for their term projects, yet there are few open source or free programs of real quality. I want to build one. Ideally it would be like unikey: fewer features, but stable enough to replace vietkey.
There are many online dictionaries today. They need no installation, their word lists are updated regularly and they are free, but they have weaknesses that keep them from replacing an installed dictionary, so building this dictionary is still worthwhile:
+Not everyone goes online to look up words, not everyone has Internet access, and many people mind the cost of being online.
+Sometimes users want to create a dictionary of their own to record a topic or a specialty, for themselves or to share within a group. Online dictionaries currently cannot do that.
+Because they are web pages, features such as click and see or loading the complete word list are not available.
+A website can shut down at any time.
+Data added to a website cannot be taken back out, so you are tied to that site forever, much like Google not being able to chat with Yahoo. When users stop liking one site and move to another, they may have to re-enter all their earlier words.
That is why, instead of having users enter their words into an online dictionary, we need an open-data dictionary like SPDict.

Some similar dictionaries
Offline dictionaries are dictionaries that run on the local machine without an Internet connection.
The best known in the open source world is currently Stardict. It looks words up quite fast, is lightweight, and captures words from other applications (click and see) fairly well. Its format is an upgraded variant of the Dict standard, which most free and open source dictionary programs use today. It still carries some unresolved weaknesses of the Dict standard: the dictionary data must be spread over several files (4 files in Stardict), and there is no way to add or delete entries, which is a major limitation of this standard and its variants.
Besides open source software there is a sizable group of free dictionary programs. The one I consider best is lingoes. Its features are fairly similar to Stardict's, with the advantages that its word list runs continuously from the first entry to the last (Stardict shows only 30 words in its list) and that it pronounces words with Microsoft's text-to-speech technology. Its weakness is that the author does not provide a tool for converting dictionary data: you have to send your data, compiled as text, to the author to have it converted. As a result its dictionary collection covers few languages, and Stardict remains more popular.
Among free programs made in Vietnam there are Multidictionary (the most popular), powerclick and the recently released jtranslator.
None of the dictionaries above can add or delete words or create new dictionaries.
I hope this dictionary will fill that gap, and that its open data and open source approach will encourage more free and open source dictionaries, of better quality and with more features, to serve the community. In this age of integration, learning foreign languages is very important, so this work of mine is probably not pointless. That said, my own English is pretty bad (^_^).
While I am at it, here are some popular commercial dictionaries:
+Lạc Việt mtd: can add and delete words, has click and see lookup, runs stably, and its dictionary data is fairly complete.
+Just click n see: only offers click and see lookup (better than Lạc Việt's) and has very few features.
+Evtran 2.0: can add and delete words. It is really translation software rather than a dictionary. Version 3.0 cannot add or delete, and its click and see feature does not work on Vista. You can use vdict.com for free online translation.
+English Study 4.0: English grammar software + dictionary + listening practice.

+babylon: cannot add or delete words, but its click and see is among the best available today, and its approximate word search is excellent. It is very powerful.
+prodict and javidict come from the same company. Their distinguishing trait is the largest data set available today, covering many specialties, but without adding or deleting (I am not sure about javidict). Their click and see lookup is on a par with Lạc Việt's. The site http://tratu.baamboo.com/ has licensed the data, so you can look words up there online for free.
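Here is the comparison. For each program the columns are: data, click and see, search, pronunciation, data updating, data files, and word list loading.
+Stardict: data: many languages; search: several dictionaries at once, fuzzy queries, wildcard lookup (with * and ?); pronunciation: recorded human voice, large in size, English only; data updating: no adding or deleting, but there is a dictionary creation tool; data files: 4 files; list loading: 30 words, one list shared by all dictionaries.
+lingoes: search: several dictionaries at once; pronunciation: text to speech (TTS 4 and 5); data updating: no, data must be sent to the author to be built; data files: 1 installable file; list loading: complete, fast.
+Multidictionary: data: so-so; click and see: so-so; search: at most 3 dictionaries at once, weak smart search; pronunciation: TTS4; data files: 4 files; list loading: complete but slow.
+powerclick: data: very little; click and see: very limited; search: 1 dictionary, no smart search; pronunciation: TTS4, English only; data updating: no; data files: 3 files; list loading: complete, fast.
+jtranslator: data: average; click and see: lookup via the clipboard; search: 1 dictionary, so-so smart search; pronunciation: free Java TTS; data updating: no; data files: 3 files; list loading: complete, fast.
+mtd: data: a lot, good quality; click and see: average; data updating: yes, but no conversion tool; data files: 1 file; list loading: complete, fast.
+Just click n see: data: very little; click and see: fairly good; search: 1 dictionary, cross lookup, wildcards allowed; list loading: complete, fast.
+Evtran 2.0: data: average; click and see: no; search: 1 dictionary, wildcard lookup; pronunciation: no; data updating: yes; data files: many files; list loading: complete, fast.
+Evtran 3.0: data: average; click and see: average; search: 2 dictionaries; pronunciation: no; data updating: no; data files: many files; list loading: none.
+EStudy: data: rather little; click and see: average; search: 1 dictionary, weak smart search; pronunciation: recorded human voice, English only; data updating: yes, but no conversion tool; data files: many files; list loading: complete, fast.
+babylon7: data: many languages; click and see: very good; search: several dictionaries at once, very good smart search; pronunciation: TTS 4 + 5; data updating: no, but there is a conversion tool; data files: 1 installable file; list loading: many dictionaries, loaded gradually, shared list.
+Prodic: data: the largest data set; click and see: average; search: average, can search within both words and meanings but is slow; pronunciation: TTS; data updating: no, only online search is offered; data files: 1 file; list loading: complete, fast.
+SPDict: data: many languages; click and see: lookup via the clipboard; search: several dictionaries at once, smart search with wildcards, regular expressions and approximate matching; pronunciation: Java text to speech; data updating: yes, with a conversion tool; data files: 1 file; list loading: 2 modes, complete in spdict and shared in spdict small.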
Beyond the comparison above, SPDict has one more advantage: it is cross-platform. All it needs is a Java runtime, so it runs on both Windows and Linux.

Background knowledge


Dictionary data
Let's begin. To build a dictionary you need some programming skill, a fast search algorithm, a data structure and so on, but what you need most is the dictionary database. Typing in a database from scratch is exhausting, and the result tends to be small, poor in content and possibly wrong in meaning (we are only human, mistakes happen). Fortunately there is a very large Vietnamese-run source of dictionaries on the web, www.tudientiengviet.net. The collection there is rich, covers many languages, and is more than enough to get you started on dictionary software.
To get the database, download the stardict data (stardict is an open source dictionary that is popular, especially on Linux) from
http://www.tudientiengviet.net/data.html
To use stardict data, use the stardict-editor tool.
It converts stardict files to the dict.tab format and back.
The dict.tab file produced from the stardict data is the file we will read our dictionary data from, because its format is extremely simple and it has a few properties that suit dictionaries well.
Here is the format description, quoted verbatim:
Here is a example dict.tab file:
============
a	1\n2\n3
b	4\\5\n6
c	789
============
It means: write the search word first, then a Tab character, and the definition. If the definition contains
new line, just write \n, if contains \ character, just write \\.
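To make the quoted rule concrete, here is a minimal Java sketch of reading one dict.tab line. It only assumes what the quote states (headword, Tab, definition, with \n and \\ as the two escapes). The class and method names are made up for illustration and are not part of stardict-editor or SPDict.

```java
import java.util.AbstractMap;
import java.util.Map;

public class DictTabLine {
    /** Splits one dict.tab line at the first Tab into (headword, definition). */
    public static Map.Entry<String, String> parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("no Tab separator: " + line);
        }
        String word = line.substring(0, tab);
        String definition = unescape(line.substring(tab + 1));
        return new AbstractMap.SimpleEntry<>(word, definition);
    }

    /** Reverses the two escapes of the format: \n -> newline, \\ -> backslash. */
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                if (next == 'n') {
                    out.append('\n');
                } else if (next == '\\') {
                    out.append('\\');
                } else {
                    out.append(c).append(next); // unknown escape: keep it as written
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The "b" line of the quoted example, written as a Java string literal.
        Map.Entry<String, String> e = parse("b\t4\\\\5\\n6");
        System.out.println(e.getKey() + " -> " + e.getValue());
    }
}
```

For the "b" line the definition comes out as two lines of text: 4\5 followed by 6.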
That is a quick overview for this first article, so that everyone can get oriented. Let me say it up front, though: plain text is easy to understand, but you should never use it directly as the dictionary's storage, because lookups will be very slow. I tried it: a single small dictionary of about 30,000 words is still bearable, but looking up several large dictionaries at once (like babylon with 17 languages) is hopeless, and that is before considering users with low-end machines.
Notes:
+A stardict dictionary consists of 3 or 4 files; to decompile it, select the file with the .ifo extension.
+Sometimes the stardict conversion fails with output like this:
Building...
File not exist: D:\YViet\star_yviet.dict
Please rename somedict.dict.dz to somedict.dict.gz and use SevenZip to uncompress the somedict.dict.gz
file, then you can get the somedict.dict file.
Done!
This means the stardict file with the .dz extension is a compressed file (stardict can work directly with files compressed in the dictZip format, I believe). You can follow the instructions in the message, but I usually just use 7zip to extract the .dict file directly.
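The same extraction can also be done from Java. The sketch below assumes, as the error message itself suggests, that a .dict.dz file can be read as ordinary gzip data, so the standard GZIPInputStream is enough. The class name and the command-line usage are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DictUnzip {
    /** Decompresses gzip data read from in and writes the result to out. */
    public static void gunzip(InputStream in, OutputStream out) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    /** In-memory variant, convenient for small inputs. */
    public static byte[] gunzip(byte[] compressed) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            gunzip(new ByteArrayInputStream(compressed), out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses bytes with gzip; used here only to build sample input. */
    public static byte[] gzip(byte[] plain) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(plain);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Hypothetical usage: java DictUnzip somedict.dict.dz somedict.dict
            try (InputStream in = Files.newInputStream(Paths.get(args[0]));
                 OutputStream out = Files.newOutputStream(Paths.get(args[1]))) {
                gunzip(in, out);
            }
        } else {
            byte[] sample = "hello dict".getBytes(StandardCharsets.UTF_8);
            System.out.println(new String(gunzip(gzip(sample)), StandardCharsets.UTF_8));
        }
    }
}
```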

From ASCII to Unicode
This is an article I found on the web. Our dictionary is multilingual, so it uses Unicode rather than VNI or TCVN3, and you should know a little about it. This is one of the easiest articles on the subject that I have come across.

From ASCII to Unicode.
kpham2@erols.com
(Thanks to Minh Sơn in Ho Chi Minh City for translating this article from English into Vietnamese.)
This article is meant to help readers with average computer skills understand Unicode and UTF-8 more clearly. After reading it you will know the history of Unicode, what forms it comes in, what UTF-8 is and why it always goes hand in hand with Unicode.
To keep things simple I have left out several of the more complicated aspects of Unicode, such as combining versus precomposed characters. Please forgive any omissions. The article also does not discuss how to install or use Unicode fonts in operating systems or applications. For that, see Lê Hoàn's website or the Unicode discussion lists.
A few things to note:
In this article I use hexadecimal (base 16) for code values. For example, when I say the character "a" has code 61, I mean 61 in hexadecimal (97 in decimal). The reason is that code tables usually list codes in hexadecimal and rarely in decimal.
At the end of the article there is a table of Unicode codes for Vietnamese characters for reference. To see the full Unicode code charts (as PDF files), go to http://www.unicode.org and click "Code Charts"; you will see many code pages. All Vietnamese characters can be found on the pages Latin-1 Supplement, Latin Extended-A, Latin Extended-B and Latin Extended Additional. You can print those pages if you like.
Finally, you can skip the parts you already know and go straight to where I discuss UTF-16 and UTF-8. However, I believe that understanding the ASCII and ANSI code tables makes it easier to understand how Unicode came about and evolved.
Some useful definitions:
- Code table: a collection of characters. One example is the standard ASCII table (American Standard Code for Information Interchange), which contains 128 characters, mostly digits, English letters, and special and common symbols such as plus, minus and percent. Unicode is another standard code table, containing thousands of characters, English and international, including the Vietnamese ones. There are also a few (non-standard) Vietnamese code tables such as TCVN-ABC, VNI and VISCII, which hold at most 256 characters.
- Code: a non-negative integer that stands for a character in a code table. The code of a character depends on the code table. For example, in the Vietnamese TCVN-ABC table the character "ầ" has code C7. In the Vietnamese VISCII table "ầ" has code A5. In Unicode "ầ" has code 1EA7 (= 7847 decimal). Note that a character's code is its position in the code table: in Unicode, "ầ" sits at position 7847. Each Unicode character is assigned exactly one code, so you will not find "ầ" anywhere in Unicode other than at position 7847. Computers know a character only through its code. For example, when you type Unicode with a Vietnamese input method and want to enter "ầ", the input method sends the code 1EA7 (encoded in binary) to the computer's central processing unit.

- Unicode font: a font is called a Unicode font when it provides glyphs for characters in the Unicode table. A font file uses a character's code to locate the glyph for that character. For example, to show "ầ" on screen in Arial, the software looks up code 1EA7 in the font file Arial.ttf and finds the corresponding glyph. A font such as VNI-Times that does not support Unicode has no glyph for code 1EA7, because its largest code is FF (= 255 decimal). It therefore cannot display that character and is not called a Unicode font. Likewise, the Arial, Times New Roman and Tahoma fonts shipped with operating systems such as Windows 95 or Windows 98 have no glyphs for these Unicode characters, so you have to "update" them by downloading and installing Unicode fonts with the same names if you want to read mail or browse web sites that use Unicode fonts.
- Bit string: a sequence of binary digits, such as 01100001. Because computers only "read" binary numbers, data must be converted into bit strings before it is fed to the machine. Each digit of a hexadecimal number is always represented by 4 binary digits. For example, 6 = 0110, 1 = 0001, F = 1111, 7 = 0111, 61 = 01100001, 7F = 01111111.
- Encoding: the way a character is represented as a bit string. Depending on the encoding, the same character can be represented differently.
"UTF-16" is an encoding of Unicode characters in which each character (with a code up to FFFF) is represented as one 16-bit string equal to the value of its code. For example, in UTF-16 "ầ" is encoded as the 16-bit string 0001111010100111 (= 1EA7), which matches its code in the Unicode table.
"UTF-8" is another encoding of Unicode characters, in which each character is represented as ONE or MORE 8-bit strings that may NOT match the original code. For example, in UTF-8 "ầ" is encoded as three 8-bit strings (or, if you like, one 24-bit string) 111000011011101010100111 (= E1BAA7), which does not match the original code 1EA7. Why do we need UTF-8? We will see later.
- Decoding: after the operating system receives an encoded character (for example, read from a file), it must decode it to recover the character's original code in the code table before it can go to the font file, find the glyph and show the character on screen. A font file uses only the original codes, never the encoded forms.
- ASCII/ANSI systems: operating systems that use only the ASCII or ANSI code tables. For example, Windows 95 uses the ANSI table. ASCII and ANSI systems always use a data unit of 8 bits (1 byte).
HISTORY: from ASCII to ANSI to Unicode.
1. The ASCII table: 7-bit, allows 128 codes (2 to the power 7). Also known as ISO 646-IRV. ASCII dates from the early days of computing.
Allowed codes: 0 to 7F
Smallest code: 0, used for the NUL character (null: empty, nothing).
Largest code: 7F (= decimal 127, = binary 01111111), used for DEL (delete).
(Note: although the data unit is 8 bits, only the last 7 bits are used.)
Example: in the ASCII table the character "a" has code 61.
Drawback: only 128 characters are allowed. People needed more codes, especially after DOS and personal computers appeared. So the ANSI code set was devised.
2. The ANSI table: 8-bit, an extension of ASCII; allows 256 codes (2 to the power 8).
Other names: ISO-8859-1, LATIN-1.
Allowed codes: 0 to FF
Smallest code: 0, used for the NUL character.
Largest code: FF (= decimal 255, = binary 11111111).
(Note: all 8 bits of the data unit are used.)
Example: in the ANSI table the Vietnamese character "ô" has code F4. (The Vietnamese code sets are all based on ANSI, with many modifications.)
Note: the first 128 characters (codes 0...7F) are the same in ASCII and ANSI.
For example, the character "a" has code 61 in both ASCII and ANSI. In other words, ASCII is a subset of ANSI.
Advantage: the number of allowed codes rises to 256, so the table now has room for characters other than English ones.
Drawback: there is still not enough room for international characters (Chinese, Korean, Arabic, Hebrew... far too many!). So 16-bit Unicode was invented.
3. The 16-bit Unicode table: allows 65536 codes (2 to the power 16).
Other names: ISO 10646, UCS-2.
Allowed codes: 0 to FFFF
Smallest code: 0, used for NUL.
Largest code: FFFF (= decimal 65535, = binary 1111111111111111).
Example: in the Unicode table the Vietnamese character "ầ" has code 1EA7.
Note: the first 256 characters (codes 0...FF = 255) are the same in ANSI and Unicode. For example, the character "a" has code 61 in all three tables, ASCII, ANSI and Unicode. In other words, ANSI (and therefore ASCII) is a subset of Unicode.
Advantage: enough room for the characters of all the peoples of the world.
Drawback: most computers still use the ASCII code set, so they do not recognize codes greater than 7F. A bigger problem is that ASCII and ANSI systems, which process data in 8-bit units, get confused when handling Unicode characters encoded as 16 bits (UTF-16). They interpret ONE 16-bit Unicode character as TWO 8-bit characters. For example, the character "a" in 16-bit form is read as TWO characters: the first is NUL (00000000) and the second is the ASCII character "a" (01100001). So when you display the text "ABCDEF" encoded in UTF-16, you may well see " A B C D E F" on screen (the NUL characters may be shown as blanks or as boxes, depending on the machine).

This problem has to be solved. We still want to use the Unicode table, but we need to encode the characters in a way that ASCII systems can recognize. UTF-16 is clearly a problem for today's common operating systems, which still use the ASCII/ANSI standards. That is why UTF-8 was devised.
4. The principles of UTF-8 encoding:
- A Unicode character is encoded as one or more 8-bit strings so that ASCII or ANSI systems can recognize it.
- For compatibility with ASCII, Unicode characters that belong to the ASCII table (codes 0 to 7F) are encoded as a single 8-bit string equal to the binary value of the code. Since the ASCII table contains only English characters, this also means that ASCII systems can read English text written in Unicode UTF-8 without any conversion.
- All Unicode characters with codes greater than 7F (and up to FFFF) are encoded as TWO or THREE 8-bit strings (bytes) according to the rules in the tables below.
- In UTF-8 the first byte of a Unicode character tells how many bytes belong to that character. Once a system has read the first byte of a UTF-8 character, it knows how many more bytes follow for that character. This helps it decode the character (recover the Unicode code).
Below are the UTF-16 and UTF-8 encoding tables for Unicode characters. In the tables each "x", "y" or "z" can be a 0 bit or a 1 bit.
Table A: for codes from 0 to 7F (the ASCII characters):

code        UTF-16 (byte 1 byte 2)      UTF-8
0-7F        00000000 0xxxxxxx           0xxxxxxx

Table B: for codes from hex 80 upward:

code        UTF-16 (byte 1 byte 2)      UTF-8 (byte 1 byte 2 byte 3)
80-7FF      00000yyy yyxxxxxx           110yyyyy 10xxxxxx
800-FFFF    zzzzyyyy yyxxxxxx           1110zzzz 10yyyyyy 10xxxxxx

According to table A:
- If the code is LESS THAN or EQUAL TO 7F, it is encoded as 8 bits equal to the binary form of the code.
According to table B:
- If the code is GREATER THAN 7F and LESS THAN or EQUAL TO 7FF, it is encoded as 2 8-bit strings.
- If the code is GREATER THAN 7FF, it is encoded as 3 8-bit strings.
Example: encoding the Vietnamese Unicode character "ầ" (code = 1EA7) in UTF-8:
1) First write the code as a 16-bit string (UTF-16): 0001111010100111, which corresponds to 1EA7.
2) Cut the 16-bit string into two bytes: byte 1 is 00011110 and byte 2 is 10100111.
3) 1EA7 is greater than 7FF and less than FFFF. According to the table above, use the last row for the conversion, which means the UTF-8 form of "ầ" will have 3 8-bit strings (3 bytes).
4) Matching byte 1 and byte 2 against the last row of the UTF-16 column gives: zzzz = 0001; yyyyyy = 111010; and xxxxxx = 100111.
5) Filling these into the last row of the UTF-8 column gives the UTF-8 form:
byte 1 is: 1110zzzz = 11100001 (= E1)
byte 2 is: 10yyyyyy = 10111010 (= BA)
byte 3 is: 10xxxxxx = 10100111 (= A7)
Put together, the character "ầ" encoded in UTF-8 is: E1BAA7.
Note that you now have 3 bytes for the character, whereas the original code took only 2 bytes. If you follow the rules above, you can write UTF-8 encoders and decoders for your own system.
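As a minimal sketch of such an encoder, the Java method below applies tables A and B directly with bit operations. It is an illustration only: it covers codes 0 to FFFF as in the tables, does not reject the surrogate range D800 to DFFF, and in real programs Java's built-in String.getBytes(StandardCharsets.UTF_8) should be used instead. The class name is made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8ByHand {
    /** Encodes one code in the range 0..FFFF following tables A and B. */
    public static byte[] encode(int code) {
        if (code < 0 || code > 0xFFFF) {
            throw new IllegalArgumentException("only codes 0..FFFF are handled here");
        }
        if (code <= 0x7F) {
            // Table A: 0xxxxxxx
            return new byte[] {(byte) code};
        }
        if (code <= 0x7FF) {
            // Table B, first row: 110yyyyy 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (code >> 6)),
                (byte) (0x80 | (code & 0x3F))
            };
        }
        // Table B, second row: 1110zzzz 10yyyyyy 10xxxxxx
        return new byte[] {
            (byte) (0xE0 | (code >> 12)),
            (byte) (0x80 | ((code >> 6) & 0x3F)),
            (byte) (0x80 | (code & 0x3F))
        };
    }

    /** Formats bytes as two-digit uppercase hex values separated by spaces. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] byHand = encode(0x1EA7);
        byte[] builtIn = "\u1EA7".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(byHand));                    // prints "E1 BA A7"
        System.out.println(Arrays.equals(byHand, builtIn)); // prints "true"
    }
}
```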
A few more UTF-8 examples (all values in decimal):
character   code    UTF-8
a           97      97
Ạ           7840    225, 186, 160
Ằ           7856    225, 186, 176
Ề           7872    225, 187, 128
Ố           7888    225, 187, 144
Ỡ           7904    225, 187, 160
Unicode table for the Vietnamese characters (decimal codes).
á 225    à 224    ả 7843   ã 227    ạ 7841
ắ 7855   ằ 7857   ẳ 7859   ẵ 7861   ặ 7863
ấ 7845   ầ 7847   ẩ 7849   ẫ 7851   ậ 7853
ú 250    ù 249    ủ 7911   ũ 361    ụ 7909
ứ 7913   ừ 7915   ử 7917   ữ 7919   ự 7921
é 233    è 232    ẻ 7867   ẽ 7869   ẹ 7865
ế 7871   ề 7873   ể 7875   ễ 7877   ệ 7879
ó 243    ò 242    ỏ 7887   õ 245    ọ 7885
ố 7889   ồ 7891   ổ 7893   ỗ 7895   ộ 7897
ớ 7899   ờ 7901   ở 7903   ỡ 7905   ợ 7907
í 237    ì 236    ỉ 7881   ĩ 297    ị 7883
ý 253    ỳ 7923   ỷ 7927   ỹ 7929   ỵ 7925
ă 259    â 226    ư 432    ê 234    ơ 417    ô 244    đ 273
Á 193    À 192    Ả 7842   Ã 195    Ạ 7840
Ắ 7854   Ằ 7856   Ẳ 7858   Ẵ 7860   Ặ 7862
Ấ 7844   Ầ 7846   Ẩ 7848   Ẫ 7850   Ậ 7852
Ú 218    Ù 217    Ủ 7910   Ũ 360    Ụ 7908
Ứ 7912   Ừ 7914   Ử 7916   Ữ 7918   Ự 7920
É 201    È 200    Ẻ 7866   Ẽ 7868   Ẹ 7864
Ế 7870   Ề 7872   Ể 7874   Ễ 7876   Ệ 7878
Ó 211    Ò 210    Ỏ 7886   Õ 213    Ọ 7884
Ố 7888   Ồ 7890   Ổ 7892   Ỗ 7894   Ộ 7896
Ớ 7898   Ờ 7900   Ở 7902   Ỡ 7904   Ợ 7906
Í 205    Ì 204    Ỉ 7880   Ĩ 296    Ị 7882
Ý 221    Ỳ 7922   Ỷ 7926   Ỹ 7928   Ỵ 7924
Ă 258    Â 194    Ư 431    Ê 202    Ơ 416    Ô 212    Đ 272

Programming language and tools to prepare


Because our dictionary is multilingual, Unicode support is a must. Language-aware sorting is also worth having, though it is not strictly required.
I chose Java (SPDict versions before 6.0 were written in C#). C# and VB.NET work too; both satisfy the two requirements above, and languages built on a framework generally seem to. Otherwise you can use VB (with the Unicode controls from caulacbovb.net), Delphi or VC++, as you prefer. Language-aware sorting is optional because it only changes the order of the displayed word list slightly; it is mostly a matter of what users are used to.


To work with dict.tab files you also need a tool that can open text files of tens or even hundreds of MB quickly. Notepad cannot do this, and neither can Word. You need something like notepad2, notepad++, EmEditor or EditPlus.
notepad++ has more features but renders some Unicode characters as boxes, and the others are not free and need installing, so I settled on notepad2.
That is all; let's move on to the next articles.

Getting started with dictionary data files


To code a dictionary you must be fluent in reading and writing two kinds of files: binary and text.
For text files you need to know how to read Unicode data (SPDict stores text as UTF-8) and how to read sequentially, line by line and character by character.
For binary files, pay attention to:
+position: tells you where the file pointer currently is
+seek: jumps to a given position in the file
+setlength: changes the size of the file
+reading a byte array of a given length, and reading binary types such as short (2 bytes) and integer (4 bytes)
Progressbar and thread
For the conversion and advanced search processes a progress bar is essential. To use it, search Google or go to Sun's website, which has examples of how to use this Swing control. In addition, to keep the progress bar and the rest of the program from freezing while a search or conversion is running, that work has to run on a thread.
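The idea can be sketched without any GUI: do the long job on a worker thread and report the percentage through a callback. This is a minimal illustration assuming Java 8 or later (for lambdas); the class and method names are hypothetical. In a Swing program the callback would hand the value to the progress bar on the event dispatch thread, for example with SwingUtilities.invokeLater.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

public class BackgroundTask {
    /**
     * Runs unitOfWork `steps` times on a new thread and reports the
     * percentage completed after each step.
     */
    public static Thread start(int steps, Runnable unitOfWork, IntConsumer onProgress) {
        Thread worker = new Thread(() -> {
            for (int i = 1; i <= steps; i++) {
                unitOfWork.run();
                onProgress.accept(i * 100 / steps);
            }
        }, "convert-worker");
        worker.start();
        return worker;
    }

    /** Starts the work and blocks until it is finished (useful outside a GUI). */
    public static void runAndWait(int steps, Runnable unitOfWork, IntConsumer onProgress) {
        Thread worker = start(steps, unitOfWork, onProgress);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    public static void main(String[] args) {
        AtomicInteger chunks = new AtomicInteger();
        // In a Swing program the callback would be something like:
        //   percent -> SwingUtilities.invokeLater(() -> progressBar.setValue(percent))
        runAndWait(5, chunks::incrementAndGet, percent -> System.out.println(percent + "%"));
        System.out.println("chunks processed: " + chunks.get());
    }
}
```

A GUI program would call start rather than runAndWait, so that the window stays responsive while the worker runs.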

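Getting the program's directory (Java specific)


In C#, VB and VB.NET programs the application startup path gives you the directory directly, but Java is different. Java offers this call for the current directory:
    System.getProperty("user.dir");
That worked for me on Windows, but on Linux it always returned the home directory, which on my machine is:
    /home/tienlbhoc
The reason is that user.dir is the working directory the JVM was started from, not the directory the program is installed in.
To get the program's directory on both Windows and Linux, use the following code:
    // location of the current class (its jar file or classes directory) as a URL
    URL link = this.getClass().getProtectionDomain().getCodeSource().getLocation();
    // convert the URL to a File; toURI() can throw URISyntaxException
    File i = new File(link.toURI());
    // the containing directory as an ordinary path string
    duongDanChinh = i.getParent();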

Dictionary features
Language-aware comparison and sorting
Java supports sorting for many of the world's languages (Vietnamese included). To sort and compare, we create a class that implements the Comparator interface (Java's comparison interface):

    import java.text.Collator;
    import java.util.Comparator;
    import java.util.Locale;

    /**
     * Compares strings using the collation rules of one language.
     *
     * @author tien
     */
    public class LangComparator implements Comparator<String> {
        private final Collator collator;
        private final Locale locale;

        public LangComparator(String lang) {
            locale = new Locale(lang);
            collator = Collator.getInstance(locale);
        }

        // Used by the standard sorting routines.
        @Override
        public int compare(String emp1, String emp2) {
            return collator.compare(emp1, emp2);
        }

        // Null-safe comparison: null is treated as the empty string.
        public int SoSanh(String emp1, String emp2) {
            if (emp1 == null) {
                emp1 = "";
            }
            if (emp2 == null) {
                emp2 = "";
            }
            return collator.compare(emp1, emp2);
        }

        // Comparison that ignores upper/lower case.
        public int compareThuong(String emp1, String emp2) {
            return collator.compare(emp1.toLowerCase(locale), emp2.toLowerCase(locale));
        }
    }

To get the list of available sort languages, refer to the following code:

    String[] mangSort = Locale.getISOLanguages();
    // l1 is the language in which the language names are displayed
    Locale l, l1 = new Locale("vi");
    cbbMaSapXep.removeAllItems();
    for (int i = 0; i < mangSort.length; i++) {
        l = new Locale(mangSort[i]);
        cbbMaSapXep.addItem(l.getLanguage() + " : " + l.getDisplayLanguage(l1));
    }
    cbbMaSapXep.setSelectedItem("en : Tiếng Anh");
    // if l1 were "en", the item would read "en : English" instead of "en : Tiếng Anh"
For sorting arrays and linked lists with the LangComparator above, Java already provides ready-made classes. If you are really stuck and do not know how to use them, writing your own sort routine also works (while sorting, just compare two strings with the LangComparator class).
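As a small illustration of the ready-made route, the sketch below sorts a word list with a java.text.Collator, which itself implements Comparator and can therefore be passed straight to List.sort. The class name and sample words are made up, and the exact order for a given language depends on the collation data shipped with the Java runtime.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SortWords {
    /** Returns a copy of words sorted by the rules of the given language code. */
    public static List<String> sorted(List<String> words, String lang) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(lang));
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");

        // Plain String order compares character codes, so "Banana" comes first.
        List<String> plain = new ArrayList<>(words);
        plain.sort(null);
        System.out.println(plain);

        // Collator order looks at the letters first and at case only afterwards.
        System.out.println(sorted(words, "en"));

        // The same call with "vi" applies the runtime's Vietnamese rules.
        System.out.println(sorted(Arrays.asList("đá", "da", "dạ", "ăn", "an"), "vi"));
    }
}
```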

Pronunciation
Use FreeTTS, a free text-to-speech library for Java: http://freetts.sourceforge.net/. Download the latest library together with its source code and usage guide.

Displaying formatted, colored data + word cross-links


(in the meaning pane of the looked-up word)
Java has the JTextPane control, which supports HTML (only a limited subset, HTML 3.2 I think), but that is more than enough. To use it, set these two properties:
    jTextPane1.setContentType("text/html");
    jTextPane1.setEditable(false);
After that, to show formatted text just call setText. For example, to display the word Tiến in bold:
    jTextPane1.setText("<b>Tiến</b>");
    jTextPane1.setCaretPosition(0); // scroll back to the top after displaying
If you are not fluent in HTML you can use a web editor such as Microsoft Expression Web or Dreamweaver.
We can also use hyperlinks to jump from one word to another in the dictionary.
First add a listener for hyperlink click events:
    jTextPane1.addHyperlinkListener(new HyperlinkListener() {
        public void hyperlinkUpdate(HyperlinkEvent e) {
            if (e.getEventType() == HyperlinkEvent.EventType.ACTIVATED) {
                // e.getDescription() is the text of the link's href, i.e. the word to look up
                TraAllVaHienThi(e.getDescription());
            }
        }
    });

Now just display HTML of the following form; with the listener above, clicking the link yields the word "xin chào":


    <a href="xin chào" style="text-decoration: none">xin chào</a>

Looking up words via the clipboard


This is a utility for looking up words from other applications: just select the text and press Ctrl+C. The code is very short. The idea is to run a timer with an interval of 200 ms (smaller also works). Each time it fires, check whether the clipboard has changed. If it has, take the text; otherwise do nothing. Here is the code that creates the timer and reads the clipboard:
    // javax.swing.Timer: fires every 200 ms.
    // tk (a Transferable) and tuCu (the previous clipboard text) must be
    // declared as fields of the enclosing class.
    Timer t = new Timer(200, new ActionListener() {
        public void actionPerformed(ActionEvent e) {
            try {
                if (jcbClipboard.isSelected()) {
                    tk = Toolkit.getDefaultToolkit().getSystemClipboard().getContents(null);
                    if (tk != null && tk.isDataFlavorSupported(DataFlavor.stringFlavor)) {
                        String tuMoi = (String) tk.getTransferData(DataFlavor.stringFlavor);
                        // skip long selections and text that was already looked up
                        if (tuMoi.length() < 200 && !tuMoi.equals(tuCu)) {
                            jTextField1.setText(tuMoi);
                            tuCu = tuMoi;
                            jTextPane1.setText(frmMain.TraAll(jTextField1.getText()));
                            jTextPane1.setCaretPosition(0);
                        }
                    }
                }
            } catch (Exception exception) {
                // clipboard busy or unreadable: ignore and try again on the next tick
            }
        }
    });
    t.start();

Dictionary algorithms
Measuring algorithm speed
There are many ways to do this, including dedicated tools, but I find this one the easiest:
    long c = Calendar.getInstance().getTimeInMillis();
    // the code whose speed is being measured goes here
    c = Calendar.getInstance().getTimeInMillis() - c;
    // c now holds the running time of that code in milliseconds
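A variant of the same idea, wrapped in a reusable helper, is sketched below. It uses System.nanoTime(), which is intended for measuring elapsed time, instead of the calendar clock. The class name is made up, and a single run is only a rough indication: repeat the measurement a few times before drawing conclusions.

```java
import java.util.Arrays;
import java.util.Random;

public class Stopwatch {
    /** Runs the task once and returns the elapsed time in milliseconds. */
    public static long measureMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Sample workload: sorting one million random integers.
        int[] data = new Random(42).ints(1_000_000).toArray();
        long ms = measureMillis(() -> Arrays.sort(data));
        System.out.println("sorting took " + ms + " ms");
    }
}
```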

Some dictionary algorithms
Finding a good algorithm for my dictionary was quite a struggle. First there were the dictionary exercises written for DOS that use a binary tree. It is a fairly fast dictionary algorithm, and most IT students in their first years follow this route when writing a dictionary (probably because ready-made code exists). However, the tree becomes unbalanced as entries are added and deleted, and the code is somewhat complicated and error-prone. In my opinion, anyone building a tree-based dictionary should use balanced trees such as AVL trees or red-black trees. Search http://vi.wikipedia.com for basic, easy-to-follow introductions to these trees. I am not very good with trees, but a tree has one advantage over my format: adding and deleting is very fast. Its weakness is that jumping to an arbitrary position (a matter of life and death for the list-loading technique) needs a lot of extra work, and so far I do not know the fastest way to jump to position n.
Next comes the hash table. Searching the net you will find many people saying Lạc Việt uses this (a rumor, perhaps). The idea is to hash the keyword being looked up and use the hash to reach a list holding the position of the meaning. This jumps straight to the meaning and is extremely fast, but it depends heavily on the hash function and on the input data; how fast or slow it is depends on the hash function. Search Wikipedia for the details.
Another option is to use a database: Access, XML, SQLite (I hear SQLite is lightweight, cross-platform and more powerful than Access) and so on. The echip dictionary is built on Access. This route is very easy to code and gives you many features, but I still prefer having my own dictionary format because I can manage it more flexibly.
My dictionary's search algorithm is binary search; do not confuse it with a binary tree. Reading this ebook does not oblige you to go the way I did. In the past everybody kept assuming I was using a hash table.
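To show what binary search over a sorted word list looks like, and why it makes jumping to position n trivial, here is a minimal in-memory sketch. It is not SPDict's actual implementation (which works on a file format described later); the class name, the array-based storage and the use of a Collator are assumptions made for illustration.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class WordIndex {
    private final String[] words;    // sorted with the collator below
    private final Collator collator;

    public WordIndex(String[] unsorted, String lang) {
        collator = Collator.getInstance(Locale.forLanguageTag(lang));
        words = unsorted.clone();
        Arrays.sort(words, collator);
    }

    /** Index of the exact word, or a negative number if it is absent. */
    public int find(String word) {
        return Arrays.binarySearch(words, word, collator);
    }

    /**
     * Index of the first word that is not smaller than key. This is where a
     * word list would scroll to while the user is still typing.
     */
    public int lowerBound(String key) {
        int lo = 0;
        int hi = words.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (collator.compare(words[mid], key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Jumping to position n is a plain array access. */
    public String wordAt(int n) {
        return words[n];
    }

    public static void main(String[] args) {
        WordIndex index = new WordIndex(new String[] {"cherry", "Banana", "apple"}, "en");
        System.out.println(index.find("Banana"));                 // 1
        System.out.println(index.wordAt(index.lowerBound("b")));  // Banana
    }
}
```

Each lookup needs about log2(n) comparisons, so a list of a million words is searched in roughly 20 steps. The price is the one mentioned above: inserting or deleting a word means shifting part of the array.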

The dict.org standard
This is an article by Trần Bình An, the admin of tudientiengviet.net, the very site we take our data from. It is also thanks to this article that the dict.org standard became well known in Vietnam and that the dictionaries multidictionary, powerclick and jtranslator appeared. My own dictionary format, SPDict, is the result of three successive upgrades of this original format, although by now the SPDict format looks more like an array of pointers.
Building a simple dictionary application yourself
Learning foreign languages has become an indispensable need for a great many people, and the most necessary tool for it is a dictionary. Like you, I have to use a dictionary when I study a foreign language. And like me, you have probably found looking words up in a paper dictionary very tiring. A worthwhile solution is to use dictionary applications on the computer. There are many such applications nowadays, but being an IT person I decided to build a dictionary application of my own. I will guide you through making one for yourself, and you will surely enjoy the same happy feeling I get every time I use a dictionary I made myself.
I. The database:
The most important part of a dictionary application is not what the application itself can do but its database. The database must allow fast access, because dictionary data is usually quite large, up to tens of thousands of words. Fortunately, www.dict.org has defined a dictionary format that is very easy to use, and several individuals have already used it to build fairly large dictionaries. The dict format works as follows: the whole database is stored in two files, one holding the meanings of the words and one index file. Each entry of the index file consists of the word, the position in the meaning file where its meaning starts, and the length of the meaning. The start position and the length are written as base-64 numbers that use these 64 characters as digits:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

The letter A stands for 0, the letter B for 1, and so on. The word, the start position and the length are separated from one another by a tab character (ASCII 9). Each line of the index file holds the data of one word. Lines are separated by a line feed character (ASCII 10).
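For example, the index file of the German-Vietnamese dictionary contains the following line (<TAB> marks a tab character):

Abdeckung<TAB>kbpP<TAB>D3

So the meaning of the word Abdeckung starts at offset kbpP (in the 64-character encoding) in the meaning file and has length D3.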
Converting from base 64 to base 10 is done as follows:
For the start position kbpP: in base 64, k = 36 in base 10, b = 27, p = 41 and P = 15. Converted to base 10, kbpP is therefore 36*64^3 + 27*64^2 + 41*64^1 + 15*64^0 = 9550415.
For the length D3: D = 3 and the digit 3 = 55. Converted to base 10, D3 is therefore 3*64^1 + 55*64^0 = 247.
The index file is sorted to reduce lookup time. Writing the numbers in base 64 as above also makes the index file considerably smaller than it would be with plain decimal numbers.
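The conversion above can be sketched in Java as follows. This is only an illustrative sketch: the class name DictIndexLine and its methods are made up for this example and are not part of the article's code. It splits one index line on tabs and decodes the two base-64 numbers digit by digit.

```java
/** Sketch: decode one line of a dict.org-style index file (hypothetical helper). */
class DictIndexLine {
    // The 64 digits of the encoding: 'A' is 0, 'B' is 1, ... '/' is 63.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    final String word;
    final long offset;   // where the meaning starts in the meaning file
    final long length;   // how many bytes the meaning occupies

    DictIndexLine(String word, long offset, long length) {
        this.word = word;
        this.offset = offset;
        this.length = length;
    }

    /** Converts a base-64 number such as "kbpP" to its decimal value. */
    static long decode(String digits) {
        long value = 0;
        for (int i = 0; i < digits.length(); i++) {
            int digit = ALPHABET.indexOf(digits.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("Not a base-64 digit: " + digits.charAt(i));
            }
            value = value * 64 + digit;   // shift left by one base-64 digit, then add
        }
        return value;
    }

    /** Parses "word<TAB>offset<TAB>length". */
    static DictIndexLine parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 tab-separated fields: " + line);
        }
        return new DictIndexLine(fields[0], decode(fields[1]), decode(fields[2]));
    }

    public static void main(String[] args) {
        DictIndexLine entry = parse("Abdeckung\tkbpP\tD3");
        System.out.println(entry.word + " " + entry.offset + " " + entry.length);
        // prints: Abdeckung 9550415 247
    }
}
```

Multiplying the running value by 64 at each step avoids floating-point powers, so the result stays exact for any offset that fits in a long.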
The meaning file, in turn, is structured like this:
@headword
* part of speech (noun, verb...)
- definition 1
= example sentence for definition 1 + meaning of that sentence
- definition 2
= example sentence for definition 2 + meaning of that sentence
* part of speech
- definition 3
The meaning of each word is one block like the one above, and the blocks of all words follow one another without gaps.
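As a rough illustration of the block structure above, the following Java sketch labels each line of one entry by its leading marker character. The class DictEntryParser and the label names are assumptions made for this example only; the article itself handles the markers with string replacement in its ChangeStyle function.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: label the lines of one entry of the meaning file (hypothetical helper). */
class DictEntryParser {
    /** Returns a label for a line based on its first character. */
    static String classify(String line) {
        if (line.isEmpty()) {
            return "blank";
        }
        switch (line.charAt(0)) {
            case '@': return "headword";
            case '*': return "part-of-speech";
            case '-': return "definition";
            case '=': return "example";
            default:  return "text";
        }
    }

    /** Turns an entry into "label: content" lines, dropping the marker character. */
    static List<String> describe(String entry) {
        List<String> result = new ArrayList<>();
        for (String line : entry.split("\n")) {
            String kind = classify(line);
            boolean hasMarker = !kind.equals("text") && !kind.equals("blank");
            String body = hasMarker ? line.substring(1).trim() : line;
            result.add(kind + ": " + body);
        }
        return result;
    }

    public static void main(String[] args) {
        String entry = "@cat\n* noun\n- a small animal\n= the cat sleeps + example";
        for (String line : describe(entry)) {
            System.out.println(line);
        }
    }
}
```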

So you now understand the format and are fully able to build your own dictionaries. Entering the data, however, is anything but simple. But once again we are lucky: some friends have already put in the effort and entered several commonly used dictionaries for us. You can find them at:
http://www.ttdomain.net/ttdownload/, http://www.informatik.uni-leipzig.de/~duc/Dict/, http://huybien.vze.com.
There are also many other specialized dictionaries; see the addresses above or www.dict.org.
II. Building the program:
Here I will use Visual C++ 6.0 and the MFC library. You can easily do the same in other languages. Within the scope of one article I can only present the most essential parts; things like designing and laying out the user interface are left to your own creativity.
1. The basic interface components:
- Edit Box: used to enter the word. Member variable: Variable name: m_word, Category: Value, Type: CString
- WebBrowser: used to display the meaning of the word. The WebBrowser control is used only to show the meaning in a more visual and lively way through string processing (covered later); you can replace it with an Edit control. To add the Web Browser ActiveX control to your application, choose Project -> Add to Project -> Components And Controls, then pick Microsoft Web Browser in the Registered ActiveX Controls folder. Member variable: Variable name: m_wordmean;
- Listbox: used to display the word list. Member variable: Variable name: m_wordlist; Category: Control;
- Listbox: used to store the data of each word. Member variable: Variable name: m_worddata, Category: Control;
The program's interface will look like this:
[Figure: the program's user interface]

2. Program code:
- Loading the data into the list boxes: put this code in the dialog's OnInitDialog() handler (the WM_INITDIALOG message) so that the data is loaded as soon as the program starts. Replace the index file name with the file that matches the dictionary you use:
FILE *inFile = fopen("mydic.index", "r");
if (inFile == NULL)
{
    MessageBox("Cannot open index file");
}
else
{
    char lineBuf[512];   // one index line: word <TAB> offset <TAB> length
    m_wordlist.ResetContent();
    m_worddata.ResetContent();
    // Test the result of fgets() rather than feof(), so that the last
    // line is not processed twice.
    while (fgets(lineBuf, sizeof(lineBuf), inFile) != NULL)
    {
        CString line = lineBuf;
        line.TrimRight("\r\n");               // drop the line terminator
        int pos = line.Find('\t');
        if (pos < 1)
            continue;                         // skip blank or malformed lines
        // Both list boxes must have the Sort style turned off so that the
        // same index refers to the same word in each of them.
        m_wordlist.AddString(line.Left(pos));      // the word
        m_worddata.AddString(line.Mid(pos + 1));   // "offset<TAB>length"
    }
    fclose(inFile);      // only close a file that was actually opened
}
- The function that converts a base-64 number to base 10:
int GetDemicalValue (CString str)
{
    CString base64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    int decValue = 0;
    int len = str.GetLength();
    for (int i = 0; i < len; i++)
    {
        int pos = base64.Find(str.GetAt(i), 0);
        if (pos < 0)
            return -1;                  // not a valid base-64 digit
        // Integer arithmetic only: pow() works on doubles and can round.
        decValue = decValue * 64 + pos;
    }
    return decValue;
}
- The function that processes the meaning string. As mentioned above, I use the Web Browser control to display the meaning in a livelier way. This function adds HTML tags to the meaning string so that it is rendered more vividly, for example sub-meanings in bold red and examples in italic green.
CString ChangeStyle(CString wordmean)
{
    CString meaning = wordmean;
    if (meaning.IsEmpty())
        return meaning;
    // Drop the leading '@' of the headword line.
    meaning = meaning.Right(meaning.GetLength() - 1);
    // Put the headword (the first line) in bold.
    int pos = meaning.Find("\n", 1);
    if (pos < 0)
        pos = meaning.GetLength();      // the entry has only one line
    meaning.Insert(pos, "</b>");
    meaning = "<b>" + meaning;
    meaning.Replace("\n", "<br>");
    // Text in {...} and [...] is shown in bold red.
    meaning.Replace("{", "<font color=\"#FF0000\"><b>");
    meaning.Replace("}", "</b></font>");
    meaning.Replace("[", "<font color=\"#FF0000\"><b>");
    meaning.Replace("]", "</b></font>");
    meaning.Replace('+', ' ');
    return meaning;
}
- The function that fetches the meaning of a word: it reads the meaning of the selected word from the meaning file, processes the meaning string, and writes the result to the file temp.htm. It returns TRUE if the meaning was fetched successfully and FALSE otherwise. Replace the file name mydic.dict with the name of the matching dictionary file.
BOOL CXDictDlg::GetMeaning ()
{
    int sel = m_wordlist.GetCurSel();
    if (sel == LB_ERR)
        return FALSE;                       // no word is selected

    CFile f;
    CString meaning = "";
    if (f.Open("mydic.dict", CFile::modeRead) == FALSE)
    {
        meaning = "Can not open database file!";
    }
    else
    {
        // The data list box holds "offset<TAB>length" for the selected word.
        CString sOffLen;
        m_worddata.GetText(sel, sOffLen);
        int pos = sOffLen.Find("\t", 0);
        CString sOff = sOffLen.Left(pos);
        CString sLen = sOffLen.Right(sOffLen.GetLength() - pos - 1);
        int iOff = GetDemicalValue(sOff);
        int iLength = GetDemicalValue(sLen);
        f.Seek(iOff, CFile::begin);
        char buff[64];
        UINT nRead;
        // Read the meaning in chunks of at most 64 bytes.
        while (iLength > 0)
        {
            nRead = f.Read(buff, iLength >= 64 ? 64 : iLength);
            if (nRead == 0)
                break;                      // unexpected end of file
            // buff is not null-terminated, so pass the byte count explicitly.
            meaning += CString(buff, nRead);
            iLength -= nRead;
        }
        f.Close();
        // Only real dictionary data is styled, not the error message above.
        meaning = ChangeStyle(meaning);
    }
    CString strHtml("");
    strHtml += "<html>\n<head>\n";
    strHtml += "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n";
    strHtml += "</head>\n<body>\n";
    strHtml += meaning + "\n</body>\n</html>";

    CFile f2;
    if (f2.Open("temp.htm", CFile::modeCreate | CFile::modeWrite) == FALSE)
    {
        MessageBox("Cannot write meaning file!", "Error!");
        return FALSE;
    }
    f2.Write(strHtml, strHtml.GetLength());
    f2.Close();
    return TRUE;
}
- Displaying the meaning of a word: put this code in the DoubleClick event of the word list box.
BOOL gm = GetMeaning();
if (gm)
{
    m_wordlist.GetText(m_wordlist.GetCurSel(), m_word);
    UpdateData(FALSE);
    // Get the path of the current directory. The first argument is the
    // size of the buffer, so it must not be left uninitialized.
    TCHAR buffer[MAX_PATH];
    GetCurrentDirectory(MAX_PATH, buffer);
    CString str = buffer;
    str = "file://" + str + "\\temp.htm";
    // Display the meaning of the word
    m_wordmean.Navigate(str, NULL, NULL, NULL, NULL);
}
else
{
    MessageBox("Can't get the meaning of the word");
}
- Finding the word in the list box that matches what is being typed in the Edit control: put this code in the EN_CHANGE event of the Edit control. Each time you type a character into the Edit control, the program then automatically selects the closest matching word in the word list.
UpdateData(TRUE);
m_wordlist.SelectString(-1, m_word);
Once built successfully, the running program looks like this:
[Figure: the finished application at run time]
That is how to build a simple dictionary application. With some creativity and the knowledge you already have, you can add new capabilities to it so that it is in no way inferior to Click and See or MTD. In addition, with a dictionary format this simple, you can easily write another application that creates new dictionaries automatically. Here are some programs that use dictionaries in the www.dict.org standard, for further reference:
- The website of Hồ Ngọc Đức. Address: http://www.informatik.uni-leipzig.de/~duc/Dict/ . Here you can look words up online, or download the Java application and the dictionary files to run directly on your machine.
- PowerClick. Address: http://www.ttdomain.net/ttdownload. Featured in PCWorldVN 7-2004; supports Click And See style lookup in some applications.
- E-lexikon. Address: http://www.edusoft.com.vn. Featured in PCWorldVN 7-2004; runs on a client-server model.
- MultiDictionary. Address: http://huybien.vze.com/. A multi-dictionary application with a nice interface that can pronounce English, Russian, French and German.
I wish you success and hope you will be pleased with your own dictionary application.
Trần Bình An
Tự Động Hoá 3 - K46, Đại Học Bách Khoa Hà Nội.

A fast list-loading mechanism

The usual way, as in the dict standard tutorial above, is to load the entire word list into a listbox. With this approach all operations on the word list are very simple because the listbox handles everything. But what if the dictionary has a fairly large number of words? Some evidence:
The echip dictionary, whose home page is:
http://www.echip.com.vn/echiproot/html/tudienechip/quydinhsudung.html
and this is their dictionary with the largest data set (a Sino-Vietnamese dictionary with more than 60,000 words):
http://www.echip.com.vn/echiproot/html/tudienechip/echip_han_viet.rar
Start it and watch how long it takes to launch. It is hard to estimate the list-loading time this way because it is lumped into the program start-up time, so press the button to switch dictionaries and choose the very Sino-Vietnamese dictionary that is already running. The echip dictionary then reloads the list from scratch, and we can estimate the list-loading time (9 seconds on my machine).
Next is multidictionary by Huy Biên. This one has been improved, and its list loading is noticeably faster (about twice as fast), but it still does not reach the speed we need. Loading a dictionary the size of prodic, with 400000 words, easily takes a whole minute, and everything is loaded into RAM, which weighs the machine down. That is why multidictionary shows a splash image for a few seconds every time you switch tabs before it becomes usable.
If all free dictionary software is like that, we can hardly hope to replace the commercial products.
But I have found a way to load the words very quickly for any dictionary data, no matter how large. The method is to load only a few words for viewing. A typical example is stardict, which loads a short list of a few dozen words.
This seems to solve part of the problem but is still not ideal. How is it that Lạc Việt, prodic and english study let you drag the scrollbar from the first to the last of tens or even hundreds of thousands of words and still start up quickly?
I have found a solution to this. I still load only a few dozen words (just enough that the list box does not show its own scrollbar). Instead, I attach a separate VScrollbar next to it.
So what is it attached for, just for show? Not at all; otherwise I would not be writing this tutorial.
The vscrollbar has three properties you need to care about: Minimum, Maximum and Value.
Set Minimum to 1 and Maximum to the total number of words in the dictionary. Value represents the relative position of the slider on the scrollbar.
When the user scrolls the list with the vscrollbar, Value changes. For example, if Value = 20,000 (the 20,000th word), we jump to word number 20000 and replace the few dozen items in the listbox with the words at positions 20,000 -> 20,020, say.
To the user it feels exactly like the slow-starting approach, while our start-up is extremely fast.
Notes:
+To use this technique, the dictionary format must allow fast access to an arbitrary position in the dictionary.
+If you reload on every scroll event, the dictionary may not be able to keep up with the user's scrolling. There are several ways to handle this. spdict uses a timer: every 50 milliseconds it checks whether the old position (kept in a variable) differs from the vscrollbar's value, and reloads only if it does. That way the list is reloaded at most once every 50 milliseconds, which gives the load enough time to finish, and 50 milliseconds is so short that the user cannot tell the difference.
+When the form is resized, you must recompute how many words fit in the listbox. The formula is:
number of items = (listbox height - 4) / height of one item { the -4 subtracts the top and bottom borders of the listbox }
+Since this is not a real scrolling listbox, to make it behave like one you have to handle Home / End (jump to the start or end of the list), Page Up, Page Down, the mouse wheel, the up and down arrow keys and so on (see the spdict code).
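The bookkeeping behind the notes above can be sketched without any GUI code. The class below is a hypothetical illustration, not the SPDict source: it computes how many rows fit using the formula given, clamps the scrollbar value to a valid first word, and implements the timer check that decides whether the visible rows need to be reloaded.

```java
/** Sketch: the arithmetic behind a listbox driven by a separate scrollbar. */
class VirtualListWindow {
    private final int totalWords;        // scrollbar Maximum
    private int lastLoadedValue = -1;    // scrollbar Value at the last reload

    VirtualListWindow(int totalWords) {
        this.totalWords = totalWords;
    }

    /** Rows that fit: (listbox height - 4) / row height; the 4 is the border. */
    static int visibleRows(int listBoxHeight, int rowHeight) {
        return Math.max(0, (listBoxHeight - 4) / rowHeight);
    }

    /** 1-based number of the first word to show, clamped so the window stays full. */
    int firstWord(int scrollValue, int rows) {
        int lastStart = Math.max(1, totalWords - rows + 1);
        return Math.min(Math.max(1, scrollValue), lastStart);
    }

    /**
     * Meant to be called from a periodic timer (every 50 ms in the text).
     * Returns true only when the scrollbar moved since the last reload.
     */
    boolean needsReload(int scrollValue) {
        if (scrollValue == lastLoadedValue) {
            return false;
        }
        lastLoadedValue = scrollValue;
        return true;
    }

    public static void main(String[] args) {
        VirtualListWindow window = new VirtualListWindow(60000);
        int rows = visibleRows(304, 15);
        int first = window.firstWord(20000, rows);
        System.out.println("show words " + first + " to " + (first + rows - 1));
    }
}
```

Polling from a timer instead of reacting to every scroll event means a fast drag triggers at most one reload per tick, which is the point of the 50 ms check described above.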

I am not the only one; many other dictionaries do something similar:
This is a post I wrote on vnoss.org after trying their vdict. As of now I have seen that powerclick loads just enough words to fill the list, vdict loads 100 words and jtranslator loads 50 words:
Hmm, I wondered how it worked. This one loads only 100 words and probably places a vscrollbar right on top of the listbox's own scrollbar, so it does not flicker the way mine does when the form is resized. On the whole it is the same idea. I have not actually read the code, only tried the program: scrolling with the mouse wheel moves a little way and then stops (I counted 100 items). With the same trick I also found out that powerclick does something similar:
start powerclick
[Figure: the PowerClick window]
drag the list down as shown and something odd happens: the list comes apart
[Figure: the PowerClick word list after dragging]
It is not a glitch. What really happens is that it loads quickly because it also attaches a fake scrollbar and loads with a method almost identical to mine (great minds think alike).
At first I thought I was the only clever one (^_^) and that commercial software managed it because they had money, so I did not bother comparing. It later turned out that these programs use the same mechanism: powerclick, vdict and spdict all grew out of the dict standard (the article by peacemoon on Quản Trị Mạng).
@wasabi: http://forums.congdongcviet.com/showthread.php?t=3186
Those posts are all comments from veteran members over at cviet. Databases are powerful, but none of the commercial programs use one, simply because a general-purpose tool cannot match a specialized one, and besides it is hard to keep full control of and makes you dependent on it. It is just like how a computer can play music and show TV, yet people still buy those devices separately.

Advanced search for the dictionary

Three parts; read them all when you have time:
1. A decent dictionary needs a few nice features, call them advanced search, such as the spell check in babylon or the fuzzy query in stardict. We will use techniques that are really simple yet highly effective. After many days of searching I finally found this in JTranslator. At first I was stunned: how could it be so clever and yet so fast (almost instant, where even babylon takes a few seconds)?
[Figure: JTranslator suggestion list]
The technique turned out to be this: it simply loads the word in the list closest to what was typed, together with the 7 words before it and the 7 words after it (look at the list on the left and you will see that the closest word is since).
That is the whole trick. It is very fast, and also very useful: for past tenses with ed, forms with ing and so on, the word you want very often falls inside this list of near matches. The principle is extremely simple, yet it works; it is not just a fake imitation of babylon.
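A minimal Java sketch of this neighbourhood trick, assuming the word list is already in memory and sorted by plain String order (the helper name NearbyWords is made up for this example; JTranstator's actual code was not consulted):

```java
import java.util.Arrays;

/** Sketch: show the nearest word plus a few neighbours on each side. */
class NearbyWords {
    /**
     * Returns the word closest to 'typed' in sort order together with up to
     * 'radius' words before and after it. 'sortedWords' must be sorted by
     * String.compareTo order.
     */
    static String[] around(String[] sortedWords, String typed, int radius) {
        int pos = Arrays.binarySearch(sortedWords, typed);
        if (pos < 0) {
            pos = -pos - 1;                  // not found: use the insertion point
        }
        if (pos >= sortedWords.length) {
            pos = sortedWords.length - 1;    // typed text sorts after every word
        }
        int from = Math.max(0, pos - radius);
        int to = Math.min(sortedWords.length, pos + radius + 1);
        return Arrays.copyOfRange(sortedWords, from, to);
    }

    public static void main(String[] args) {
        String[] words = {"sin", "since", "sincere", "sing", "single", "sink"};
        // "sinced" is not in the list; its insertion point is next to "since".
        System.out.println(Arrays.toString(around(words, "sinced", 1)));
        // prints: [since, sincere, sing]
    }
}
```

The cost is one binary search plus copying 2 * radius + 1 entries, which is why the suggestions appear almost instantly.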
But spdict is a multilingual dictionary that looks words up in several dictionaries. If we copy the trick as is, it only finds smart matches in one dictionary (English-Vietnamese in the example above). What if the words in the Vietnamese-English dictionary are a closer match? So we apply the trick to every dictionary and then merge the results as follows:
1. Put them in one array of size 15 * number of dictionaries, by taking the 15 nearest words from each dictionary.
2. The words come sorted under different sort codes, so re-sort them under en-US (the most general code).
3. Filter out duplicates (the Vietnamese-English and Vietnamese-Vietnamese dictionaries, for instance, share a great many words).
4. Find the position of the word being looked up in the filtered list.
5. Display the 7 words before it, the nearest word and the 7 words after it, as above.

The same approach is also used for loading a combined word list, as in stardict or babylon (used in spdict small from version 4 onward).
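The five merging steps can be sketched as below. This is an assumption-laden illustration rather than the SPDict implementation: the class MergedSuggestions is hypothetical, each inner list stands for the near matches already taken from one dictionary, and a java.text.Collator for Locale.US plays the role of the en-US sort code.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Sketch: merge the near matches of several dictionaries into one list. */
class MergedSuggestions {
    static List<String> merge(List<List<String>> perDictionary, String typed, int radius) {
        Collator collator = Collator.getInstance(Locale.US);

        // Steps 1-3: pool the words, re-sort them under one collation and drop
        // duplicates. A TreeSet with the collator as comparator does all three.
        TreeSet<String> pool = new TreeSet<>(collator);
        for (List<String> words : perDictionary) {
            pool.addAll(words);
        }
        List<String> merged = new ArrayList<>(pool);

        // Step 4: find the typed word, or the place where it would be inserted.
        int pos = Collections.binarySearch(merged, typed, collator);
        if (pos < 0) {
            pos = -pos - 1;
        }
        if (pos >= merged.size()) {
            pos = merged.size() - 1;
        }

        // Step 5: keep 'radius' words on each side of that position.
        int from = Math.max(0, pos - radius);
        int to = Math.min(merged.size(), pos + radius + 1);
        return new ArrayList<>(merged.subList(from, to));
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
            Arrays.asList("sin", "since", "sing"),
            Arrays.asList("since", "sink"));
        System.out.println(merge(lists, "since", 1));
    }
}
```

Note that the collator treats strings it considers equal as duplicates, which is slightly broader than exact string equality.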

2. Searching with wildcards and regular expressions:

If that were all, it would be too poor to be called advanced search, so spdict also supports wildcards and regular expressions. Java only provides regular expressions, but regular expressions are a superset of wildcards (wildcards are simpler for users, so they are still needed and cannot be dropped). Here is the function that converts a wildcard string into a regex (so that a regex search returns the results a wildcard search would):
private static String replaceWildcards(String wild) {
    StringBuilder buffer = new StringBuilder();
    char[] chars = wild.toCharArray();
    for (int i = 0; i < chars.length; ++i) {
        if (chars[i] == '*') {
            buffer.append(".*");          // '*' matches any run of characters
        } else if (chars[i] == '?') {
            buffer.append(".");           // '?' matches exactly one character
        } else if ("+()^$.{}[]|\\".indexOf(chars[i]) != -1) {
            // Escape characters that have a special meaning in a regex.
            buffer.append('\\').append(chars[i]);
        } else {
            buffer.append(chars[i]);
        }
    }
    return buffer.toString();
}
How to use regular expressions themselves is left for you to look up.
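For completeness, here is one way a wildcard pattern could be applied to words with java.util.regex. It is a self-contained sketch with a made-up class name, and it deliberately uses Pattern.quote for the literal characters instead of a hand-written escape list, which is a different technique from the function above.

```java
import java.util.regex.Pattern;

/** Sketch: match dictionary words against a wildcard pattern. */
class WildcardSearch {
    /** Builds a regex Pattern from a wildcard string ('*' and '?'). */
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");                               // any run of characters
            } else if (c == '?') {
                regex.append('.');                                // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));   // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True if the whole word matches the wildcard pattern. */
    static boolean matches(String wildcard, String word) {
        return toPattern(wildcard).matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("s?n*", "since"));   // true
        System.out.println(matches("a.c", "abc"));      // false: '.' is literal here
    }
}
```

When scanning a whole dictionary, compile the Pattern once with toPattern and reuse it for every word rather than calling matches repeatedly.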

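3. Finding approximate matches:
This one is home-made. The idea is as follows:
+First compare the lengths of the two strings: if they differ by more than 30%, reject the candidate (you can raise or lower this percentage to get more or fewer matches).
+Next compare the two strings character by character. Whenever the characters differ, add 1 to the error count, then look at the next few characters of both strings. If a match is found within the allowed error range, adjust i or j (the indexes into the two strings) accordingly. This catches extra or missing characters. Then move on to the next character.
+Finally, when one of the two strings is exhausted, handle the leftover tail with: loi += s.Length - i + s1.Length - j; that is, whatever remains of either string is counted as errors.
If the error count is <= 30% the candidate passes, otherwise it fails. It is as simple as that.
Here is the code: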
public class ApproximatString {
    String s;      // the word the user typed
    int saiSo;     // allowed number of errors

    public ApproximatString(String nhap, float phantram) {
        s = nhap;
        saiSo = Math.round(s.length() * phantram);
    }

    // Returns true if s1 is close enough to the typed word.
    public boolean SoSanh(String s1) {
        // Reject candidates whose length differs by more than the allowed error.
        if (s1.length() < (s.length() - saiSo) || s1.length() > (s.length() + saiSo)) {
            return false;
        }
        int i = 0, j = 0, loi = 0;   // loi = number of errors found so far
        while (i < s.length() && j < s1.length()) {
            if (s.charAt(i) != s1.charAt(j)) {
                loi++;
                // Look ahead in either string to resynchronise after an
                // extra or a missing character.
                for (int k = 1; k <= saiSo; k++) {
                    if ((i + k < s.length()) && s.charAt(i + k) == s1.charAt(j)) {
                        i += k;
                        break;
                    } else if ((j + k < s1.length()) && s.charAt(i) == s1.charAt(j + k)) {
                        j += k;
                        break;
                    }
                }
            }
            i++;
            j++;
        }
        // Whatever is left of either string counts as errors.
        loi += s.length() - i + s1.length() - j;
        return loi <= saiSo;
    }
}
And here is the result:
[Figure: approximate search results]

The SPDict dictionary format
I have changed the format several times, and version 6.0 changed it once more for the better. I have now arrived at a dictionary format that solves the problems quite well and is also very easy (an experienced programmer will understand it all in 10 minutes). Below is an example dictionary file abc containing 2 words, a->aa and b->bb, opened in Notepad2. You can easily spot where the data of the two words sits in the picture.
[Figure: the file abc opened in Notepad2]
The spdict format can be divided into 4 parts (3 framed parts and 1 unframed part that appears to wrap onto a new line in the picture):

* Part 1 consists of the string 2SPDict at the start of the file (it marks the file as one created by spdict rather than some other file that was merely renamed to the same extension).
The next 4 bytes, which you see as NUL NUL NUL !, store the position of part 3 (the unframed part).
The following 4 bytes store the amount of wasted data produced while editing the dictionary (the file was just created, so it is 0: four NUL characters).
* Part 2 holds the dictionary data:
You will see NUL SOH before the a (the first word); these are 2 bytes, a short, storing the length of the word (value 1). After the word comes its meaning: its length stored in 4 bytes (NUL NUL NUL STX) and then aa. Next comes the word b with bb, where the length bytes are identical to those of the first entry a, aa.
* Part 3 is laid out like the word field of an entry in part 2 (2 bytes storing the length); the rest is the content, a string made of several substrings separated by a byte with value 0 (NUL):
+dictionary name (abc)
+sort code (en)
+pronunciation voice (kevin)
+font and size for the word and for the meaning (tahoma, 12, tahoma, 12)
+author (tienlbhoc)
+extra information (demo)
* Part 4 consists of 8 bytes, namely 2 integers (one per word), each storing the position of one word (a and b in part 2 of the dictionary). You could call this the position list, or the text pointers (kept sorted); this is where the search algorithm operates.
=====================================================================
How to read the dictionary file:
First read the first 7 characters and check that they are 2SPDict, read the next 4 bytes to get the integer position of the dictionary information, and 4 more bytes to get the amount of wasted data. Then jump to the position of part 3 (the information), read 2 bytes to get the length of the byte array holding the dictionary information, read that byte array, convert it to a string, and split it on the character '\0' to obtain the individual pieces of dictionary information.
Once part 3 has been read completely, the file pointer is at the start of part 4. Store that position and we are ready to search.
It is that simple, so what is clever about it? First, to get the total number of words, take the file size minus the position of part 4 and divide by 4 (each word position takes 4 bytes in part 4, so dividing the size of part 4 by 4 gives the number of words).
To jump to the n-th entry, just seek to: position of the array + 4*(n-1).
Since this is a sorted array (sorted while converting the data and while adding or deleting words), binary search needs little explanation; everyone knows it. The only difference is that with a normal array you type the index and get the content at once, for example a[3], whereas this is an array we defined ourselves: we have to seek to the element, read 4 bytes to get the position of the content, and seek there before we can read the word and its meaning.
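The reading procedure and the binary search over the pointer table can be sketched as follows. Treat this as a reconstruction from the description, not the SPDict source. The class name SpDictReader is invented, integers are assumed to be big-endian (which matches the NUL NUL NUL ! bytes in the example file), strings are assumed to be UTF-8, each pointer is assumed to point at the 2-byte length in front of a word, and words are compared with plain String.compareTo instead of the dictionary's sort code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Sketch: read an SPDict-style file and look a word up by binary search. */
class SpDictReader implements AutoCloseable {
    private final RandomAccessFile raf;
    private final long tablePos;   // start of part 4 (the pointer table)
    private final int wordCount;
    final String[] info;           // part 3: name, sort code, voice, fonts, ...

    SpDictReader(String path) throws IOException {
        raf = new RandomAccessFile(path, "r");
        byte[] magic = new byte[7];
        raf.readFully(magic);
        if (!"2SPDict".equals(new String(magic, StandardCharsets.US_ASCII))) {
            raf.close();
            throw new IOException("Not an SPDict file: " + path);
        }
        int infoPos = raf.readInt();        // position of part 3
        raf.readInt();                      // amount of wasted data (unused here)
        raf.seek(infoPos);
        info = readString(raf.readUnsignedShort()).split("\0");
        tablePos = raf.getFilePointer();    // part 4 follows part 3 directly
        wordCount = (int) ((raf.length() - tablePos) / 4);
    }

    private String readString(int byteLength) throws IOException {
        byte[] bytes = new byte[byteLength];
        raf.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Reads the n-th word (1-based); the file pointer ends up at its meaning. */
    private String wordAt(int n) throws IOException {
        raf.seek(tablePos + 4L * (n - 1));  // the n-th pointer
        raf.seek(raf.readInt());            // follow it into part 2
        return readString(raf.readUnsignedShort());
    }

    /** Returns the meaning of 'word', or null if the word is not present. */
    String lookup(String word) throws IOException {
        int low = 1, high = wordCount;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = wordAt(mid).compareTo(word);
            if (cmp == 0) {
                return readString(raf.readInt());   // 4-byte length, then the meaning
            }
            if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return null;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

Each probe of the binary search costs two seeks, so a dictionary of 100,000 words needs about 17 probes per lookup.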
To add or delete a word, for example to add a word at position n (so that the list stays in alphabetical order), we load the whole pointer array (let us call it that, since the term is familiar) into RAM, write the content of the new word at the end of the word data, write the part of the array from the start up to n-1 back to the file, add the position of the new element, write the rest of the array, and then set the file length again. This part is a bit awkward; if you do not follow, just read the code.
To delete a word, we overwrite the word and its meaning with bytes of value 0 (so that they do not show up in advanced search results) and remove only the position of that word from the pointer array. Just as in Access, this data becomes wasted data and is only cleaned out by something like the "compact and repair database" command. We have to build a tool with a similar function for the dictionary.
Editing and renaming words is left for you to code along the same lines.
As for updating the dictionary information (font, size, pronunciation...): since the information block is not placed at the start of the dictionary, an update only needs to load the position list into RAM, write the new information over the old, and write the position list back. In versions before 6 the information was stored at the start of the file, so for a 50 MB dictionary all 50 MB had to be shifted back before the information could be rewritten; from version 6.1 on only the position list has to be shifted, which is much faster.
Some of you will ask: does this not mean loading the whole array of hundreds of thousands of words (for a large dictionary), which would be fatal? My answer: do the arithmetic. 4 bytes * 100,000 is less than 400 KB. Imagine how fast copying 400 KB is; adding and deleting words is just as fast.
Note that the pointer array must always be at the end of the file so that what has to be loaded is always minimal (4 bytes per element). If the array were at the start of the file, loading it into RAM for an update would also mean loading all the content behind it (sometimes hundreds of MB), which would be a disaster.
Overall, adding and deleting is slower than with trees, but for a dictionary, where data is entered sequentially rather than changed constantly, this speed is entirely adequate. The user cannot tell whether adding a word takes 0.001 seconds or 0.1 seconds. The format suits adding and deleting dictionary words very well, and you can also use it for small, portable databases that do not demand too much.
Notes:
+For advanced search, do not use the jump-to-the-n-th-element method above over the whole dictionary (constantly seeking around the file is slow). Remember that the dictionary has 4 parts and that part 2 is simply the words and meanings one after another; just read the entries sequentially until the end of that part.
+Because each element is stored in 4 bytes, the dictionary data is limited to 2 GB, since a 4-byte signed integer can only hold positions up to that. This is more than enough; dictionaries never come close (they are usually a few MB or a few dozen MB), but I mention it for completeness.
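To read and manipulate this format, you mainly need to pay attention to these 2 pieces of code:
Jumping to an arbitrary position in the file with seek:
RandomAccessFile raf = new RandomAccessFile(s, "rw"); // s is the path of the dictionary file
raf.seek(1000); // jump to byte position 1000
Getting the size in bytes of a string, and converting bytes back to a string:
import java.io.*;
class CLab
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String s = "tin";
        byte[] b = s.getBytes("UTF-8");                      // convert to a byte array
        int doDaiString = b.length;                          // size of the string in bytes
        String tu = new String(b, 0, doDaiString, "UTF-8");  // convert the byte array back to a string
    }
}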

Unfinished solutions

Unfinished features
My skills are still limited. Although I tried hard to find remedies, a few features remain unfinished:
+Click and see: I hear this is an extremely hard technique. None of the domestic programs can capture text really well so far, especially in PDF. Among the ready-made libraries on the net there is only wordcaptureX; apart from not working in openoffice, it captures text almost as well as babylon. You can search for it and download its demo to try; if you intend to buy it (there is no crack yet), it costs 900 USD. A free alternative that also works well is the text capture in stardict, but I do not know how to port its source code. So for now SPDict only looks words up through the clipboard: it can capture text that can be selected, but not menus or labels.
+Advanced search (spell check, approximate matching): this is only home-made code and cannot be as good as the professional packages. I only spent a few days thinking about it, and I believe it can be developed further.
+Multimedia content: the stripped-down web browser had to drop basic features such as film, music, flash, script...; only images work.
+It would be better if the format could be improved so that adding words does not produce wasted data.

Linking online and offline

As I said in the introduction, online dictionaries have these drawbacks:
+Not everyone goes online to look words up and not everyone has internet access, not to mention that many people begrudge the cost of it
+Sometimes users just want to create something of their own, to keep notes on some topic or specialty for themselves or to share within a group. Online dictionaries currently cannot do that
+Because it is a web page, features such as click and see or loading the full word list are not possible
+The website may shut down at any time
+Data added to the website cannot be taken back out, so you are tied to it forever, just as Google cannot chat with Yahoo. When users stop liking one site and move to another, they may have to re-enter all the words they entered before.
But they also have advantages that offline dictionaries lack:
+No installation needed
+Frequent updates
+If your machine fails, the data is still stored on the server
+You can ask questions and discuss any word you are unsure about
....
Combining the two would be ideal:
For example, you add words to your own dictionary; once a certain number has accumulated, the software offers to send the data to the website, where an editorial team consolidates and updates it. When your machine breaks, you can download the data you uploaded together with the data contributed by many others. You put in the effort of entering 10 words and may get back a thousand (if 100 people do the same as you), not to mention that the data gets edited for consistency, corrected, and so on. That really is a very good development model. Antivirus programs and Windows already ask to send error reports online for processing, so applying the same idea to a dictionary is not unreasonable. Sadly, I have no web knowledge at all, so I cannot help with this myself, but I am writing it down so that whoever takes over the project can continue what I have not done.
