Professional Documents
Culture Documents
Croatian
Croatian
THE CROATIAN
LANGUAGE
IN THE
DIGITAL AGE
HRVATSKI
JEZIK U
DIGITALNOM
DOBU
Marko Tadi
Dunja Brozovi-Ronevi
Amir Kapetanovi
THE CROATIAN
LANGUAGE
IN THE
DIGITAL AGE
HRVATSKI
JEZIK U
DIGITALNOM
DOBU
Marko Tadi [1]
Dunja Brozovi-Ronevi
Amir Kapetanovi [2]
[2]
PREDGOVOR PREFACE
Ova bijela knjiga dio je niza koji promie jezine teh-
smjeru.
2020.
III
Izradba ove bijele knjige poduprta je od strane Sedmoga okvirnoga programa i ICT programa za podrku politici Europske
komisije u skladu s ugovorima T4ME (opi ugovor 249 119),
CESAR (opi ugovor 271 022), METANET4U (opi ugovor
270 893) i META-NORD (opi ugovor 270 899).
IV
SADRAJ CONTENTS
HRVATSKI JEZIK U DIGITALNOM DOBU
1 Saetak
2.1
2.2
2.3
2.4
2.5
2.6
3.1
Ope injenice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2
Hrvatska narjeja . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3
3.4
3.5
3.6
3.7
Jezik u obrazovanju . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.8
Meunarodni odnosi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.9
Hrvatski na Internetu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
23
5 O META-NET-u
42
43
45
2.1
2.2
2.3
2.4
2.5
2.6
50
3.1
General Facts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2
Croatian dialects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3
3.4
3.5
3.6
3.7
Language in education . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.8
International aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.9
65
5 About META-NET
83
A Bibliograja -- References
85
89
93
1
SAETAK
Informacijske tehnologije mijenjaju na svakodnevni i-
gramskoj podrci.
U dananjem informacijski usmjerenom drutvu, mogunost dostupa obavijestima na vlastitome jeziku sma-
budunosti.
//www.meta-net.eu.
2
JEZICI U OPASNOSTI: IZAZOV ZA JEZINE
TEHNOLOGIJE
U ovome trenutku svjedoimo digitalnoj revoluciji koja
cesa:
raunalna priprema za tisak zamijenila je tipkanje i
graki slog;
Microso PowerPoint zamijenio je projiciranje s
prozirnica;
Nakon Gutenbergova izuma pravi su proboji u komunikaciji i razmjeni znanja postignuti pothvatima kao to je
Lutherov prijevod Biblije na narodni jezik (ili u hrvatskome sluaju, glagoljiki prvotisak Misala iz 1483. kao
prve tiskanje knjige na hrvatskome jeziku). U nadolazeim stoljeima razvijeni su razni kulturni postupci koji
su omoguili obradbu jezika i razmjenu znanja:
ski je moda bio lingua anca www-a jer je veina sadraja na www-u tada bila na engleskome, meutim, da-
ranje?
bom za uenjem novih jezika, a kod razvijatelja aplikacija potrebom za stvaranjem novih tehnolokih aplikacija, ne bi li se osiguralo uzajamno razumijevanje i omo-
prevoenja.
Jezine tehnologije, o kojima se podrobnije govori u
ovoj bijeloj knjizi, ine sr buduih inovativnih aplikacija. Jezine su tehnologije uobiajena potporna tehnologija unutar vee aplikacije kao to su navigacijski sustav
nekim procjenama europsko trite prevoenja, tumaenja, lokalizacije programske podrke i prevoenja www-
skih potreba.
korisniko iskustvo.
jetko uz ogranienu kakvou. Pa ipak i za takve aplikacije postoje ogromne trine mogunosti u obrazovanju
i zabavnoj industriji s ukljuivanjem jezinih tehnologija u raunalne igre, obrazovne sustave, knjinice, simulacijske sustave i sustave za uvjebavanje. Mobilne
obavijesne usluge, strojno potpomognuti programi ue-
rije, bore se s mnogim potekoama kad su nam potrebni visokokvalitetni i potpuni prijevodi. Zahvaljujui sloenosti prirodnih jezika, njihovo modeliranje u
obliku raunalnih programa i provjera u stvarnome i-
votu, dugotrajan je i skup posao koji zahtijeva stalnu nancijsku potporu. Europa mora zadrati svoju vodeu
nome jeziku za treniranje npr. pravopisnoga provjernika, usporedni tekstovi na dva (ili vie) jezika potrebni
su za treniranje strojnoprevoditeljskih sustava. Algorit-
mima strojnoga uenja prepoznaju se obrasci kako se pojedine rijei, kratke fraze ili itave reenice prevode s jednoga jezika na drugi.
Meutim, za takve statistike pristupe potrebni su mili-
oscilira.
to strunjaci imaju mogunost istananijega upravljanja obradbom jezika. Zbog toga je mogue sustavno ispravljati pogrjeke u programskoj podrci i pruati ko-
brojem govornika.
boratorijima.
3
HRVATSKI JEZIK U EUROPSKOME
INFORMACIJSKOME DRUTVU
3.1 OPE INJENICE
je slubeni jezik vie od 4 milijuna stanovnika Republike Hrvatske, a uz bonjaki i srpski takoer je jedan
od tri slubena jezika Bosne i Hercegovine gdje ga govori oko 700.000 govornika. Takoer, hrvatskim jezikom govore i mnogi pripadnici nacionalnih manjina u
Hrvatskoj kao i autohtone hrvatske etnike i jezine manjine u Srbiji, Crnoj Gori, Sloveniji, Madarskoj, Aus-
10
11
3.3 STANDARDIZACIJA
HRVATSKOGA JEZIKA
skoj osnovici.
kim narjejem.
narjeju zapoela vrlo rano, narodno se jezino jedinstvo postiglo tek u vrijeme Ilirskoga narodnoga prepo-
rjeje.
12
13
noloki (nom. jd. sladak : gen. jd. slatkoga, nom. jd. dio :
gen. jd. dijela) i morfonoloki uvjetovane promjene
(nom. jd. majka : dat. jd. majci, nom. jd. junak :
meu ili d.
3.4.2 Morfologija
(djelo, odijelo).
komparaciji.
14
tur I., futur II.). Glagoli biti (esse) and htjeti (volere) u
su na slici 4.
tvorba.
leksikoga inventara. Njemaki i francuski nekad su takoer utjecali na hrvatski vokabular, a od druge polovice
15
imenina sklonidba
nom. mn.
opis, opisa
sunce, sunca
ena, ene
no, noi
opisi
sunca
ene
noi
pade
muki rod
srednji rod
enski rod
jednina
N
G
D
A
V
L
I
-i
-og(a) -eg(a)
-om(u/e) -em(u/e)
=N/=G
=N
-om(u/e) -em(u/e)
-im
-o -e
-og(a) -eg(a)
-om(u/e) -em(u/e)
=N
=N
-om(u/e) -em(u/e)
-im
-a
-e
-oj
-u
=N
=D
-om
mnoina
N
G
D
A
V
L
I
-i
-ih
-im(a)
-e
=N
=D
=D
-a
-ih
-im(a)
=N
=N
=D
=D
-e
-ih
-im(a)
=N
=N
=D
=D
16
3.4.5 Pravopis
nim hrvatskim (ponajvie viejezinim) rjenicima sastavljenim od 16. do 20. stoljea. U 19. stoljeu njemaki
i eki imali su iznimno jak utjecaj na hrvatsko nazivlje,
a engleski je danas preuzeo tu ulogu.
3.4.4 Sintaksa
Premda je povijest hrvatske kulture obiljeena uporabom triju pisama (glagoljica, irilica, latinica), latinica
u Hrvata prevladava od 16. stoljea. Hrvatska latinina
abeceda nije bila u cijelosti standardizirana do 1835. kad
joj Ljudevit Gaj daje dananji oblik. Sastoji se od 30
Johna, a ne John-a).
17
velika slova
A
L
B
LJ
C
M
NJ
D
O
D
P
E
S
G
T
H
U
I
V
J
Z
e
s
g
t
h
u
i
v
j
z
mala slova
a
l
b
lj
c
m
nj
d
o
d
p
3.4.6 Onomastika
Hrvatska imena predstavljaju vane spomenike jezinoga, kulturnoga i drutvenoga nasljea ljudi koji su ih
napravili. Stoga i osobna imena (antroponimi) i imena
mjesta (toponimi) ine vaan dio hrvatske jezine kulture. Ozemlje dananje Hrvatske, u grubo ogranieno
rijekom Dravom na sjeveru, rijekom Dunavom na istoku i Jadranskim morem na jugu, vrlo se ilustrativno
reektira u bogatom raslojavanu zemljopisnih imena.
rena nakon tridentinskoga koncila u 16. stoljeu, u vrijeme kad je velik dio hrvatskih zemalja bio pod turskom
vlau.
vatskome jeziku.
18
tih polaznih uvjeta (nepostojanje osnovnoga, zajednikoga standarda) i razliitih tradicija u jezinome kultiviranju i standardizaciji, zbog razjedinjenja neotokav-
publici Hrvatskoj;
skoga i srpskoga. Unato svim pokuajima da se slubeno prizna postojanje hrvatskoga kao zasebnoga jezika,
nametanje zajednikoga nazivlja, vokabulara, pravopisa
i drugih jezinih normi u Jugoslaviji, dovelo je jedino
do slubenoga prihvaanja jednoga zajednikoga standardnoga jezika (srpsko-hrvatskoga) s dvije varijante (istonom ili srpskom i zapadnom ili hrvatskom). Reakcija
iz Hrvatske dola je ubrzo u obliku Deklaracije o nazivu
i poloaju hrvatskog knjevnog jezika koja se otvoreno
tikim vremenima.
koslovne savjete kad je to potrebno. Institut za hrvatski jezik i jezikoslovlje [9] sredinja je hrvatska ustanova
za istraivanje hrvatskoga jezika, a jedan je od njegovih odjela (Odjel za hrvatski standardni jezik) posveen
19
nolokoga rada.
togodinjaci zauzimaju 26. mjesto na ljestvici svih zemalja svijeta i smjeteni su ispred uenika iz deset zemalja-
lanica EU i SAD.
U osnovnim i srednjim kolama uz hrvatski obvezatno je
uenje barem jednoga stranoga jezika od etvrtoga raz-
sku manjinu.
20
Najposjeenija hrvatska www-sjedita su: net.hr (portal za vijesti, sport, zabavu i zbivanja), index.hr (opi
ljem svijeta.
broju lanaka.
21
uporaba raunala
pristup Internetu
www-sjedite
uporaba nancijskih i bankovnih usluga
uporaba usluga e-uprave
2008
2009
2010
98
97
64
84
56
98
95
57
84
61
97
95
61
85
63
6: ICT u poduzeima
osobno raunalo
pristup Internetu
mobilni telefon
2008
2009
2010
53
45
81
55
50
82
60
57
7: ICT u kuanstvima
22
4
JEZINOTEHNOLOKA
PODRKA ZA HRVATSKI
Jezine se tehnologije koriste za razvoj sustava namije-
pretraga obavijesti
crpljenje obavijesti
saimanje teksta
odgovaranje na pitanja
prepoznavanje govora
generiranje govora
Jezine su tehnologije ve vrsto uspostavljeno zasebno
istraivako podruje s irokim rasponom uvodne literature. Zainteresirani itatelj se upuuje na sljedee referencije: [15, 16, 17, 18, 19].
Prije nego to krenemo prikazivati navedena podruja istraivanja, na kratko bismo opisali arhitekturu uobiajenoga jezinotehnolokoga sustava.
4.1 ARHITEKTURE
JEZINOTEHNOLOKIH
APLIKACIJA
ene, slika 9 pokazuje znatno pojednostavnjenu arhitekturu kakva se moe nai u sustavima za obradbu teksta.
provjeru pravopisa
noga teksta:
23
Govorne tehnologije
Multimedijske i
multimodalne
tehnologije
Jezine
tehnologije
Tehnologije znanja
Tekstne tehnologije
8: Jezine tehnologije
takoer nai na slici 15 na kraju poglavlja. Jezine tehnologije za hrvatski jezik usporeene su i s jezinim tehno-
knjiga.
Ulazni tekst
Predobradba
Izlaz
Gramatika analiza
Semantika analiza
Moduli za posebne
zadatke
24
mnogo ei niz dviju rijei nego jaz generacija. Statistiki se jezini model moe proizvesti automatski iz velike koliine (ispravnih) jezinih podataka (tj. iz korpusa). Do sada su se ova dva pristupa mahom razvila i
provjerila na jezinim podatcima za engleski jezik. Meutim, na hrvatski jezik takva rjeenja nisu izravno pri-
H. Zara (1992):
Za razrjeivanje ovakvih pogrjeaka u mnogim je sluajevima potrebna ralamba konteksta, npr. treba li u hrvatskome imenicu pisati velikim (ensko osobno ime) ili
malim (opa imenica) poetnim slovom kao u sluaju:
Slatka je ova vinja. [is cherry is sweet.]
Slatka je ova Vinja. [is Cherry is sweet.]
Takva ralamba zahtijeva bilo oblikovanje jezino posebnih gramatikih pravila, izradba kojih ukljuuje vi-
25
Ulazni tekst
Pravopisna provjera
Gramatika provjera
Prijedlozi ispravaka
preporuke.
liinama podataka i uinkovitim tehnikama njihova indeksiranja, preteito statistiki utemeljeni pristup moe
dovesti do zadovoljavajuih rezultata, no njihova kakvoa takoer ovisi i o samoj strukturi prirodnoga jezika
na kojem se pretrauje.
nice (iz-/na-/pre-/pro-/u-)guglati/(iz-/na-/pre-/pro-/u-
26
Mrene stranice
Predobradba
Semantika obradba
Indeksiranje
Pronalaenje
i
relevantnost
Predobradba
Analiza upita
Korisniki upit
Rezultati pretrage
urno priskrbiti i tako indeksirane dokumente. Za zadovoljavajui odgovor na ovo pitanje potrebna je pri-
mjena sintaktike ralambe (parsanja) kako bi se ralanila sintaktika struktura reenice i odredilo kako korisnik trai tvrtke koje je neka tvrtka preuzela, a ne tvrtke
27
sustava.
Godine 2009., kao rezultat zajenikoga amanskohrvatskoga projekta CADIAL [26], vladina agencija
HIDRA omoguila je javni www-pristup svim hrvatskim zakonskim i podzakonskim dokumentima putem
ektivno osjetljive trailice [27]. Ta trailica takoe
28
Govorni izlaz
Govorni ulaz
Govorna sinteza
Obradba signala
Fonetski odabir i
planiranje intonacije
Razumijevanje
prirodnoga jezika i
dijalog
Prepoznavanje
Premda je baza hrvatskih difona razvijena jo 1998. unutar projekta MBROLA [28] u kojem je sudjelovao Od-
sjek za fonetiku Filozofskoga fakulteta Sveuilita u Zagrebu, do danas jo uvijek ne postoji niti jedan komercijalni sustav za hrvatski ATS ili TTS razvijen u Hrvatskoj.
Istraivanja na ovome podruju provode se i na Fakul-
29
nologije za govornu interakciju. S jedne e strane i dugorono gledano potrebe za klasinim telefonskim VUI
zacijelo opadati, a s druge e strane uporaba govora kao
izvora ulaznih podataka za pametne telefone zacijelo
biti u porastu. Taj smjer razvoja takoer se moe prepoz-
telefona.
enje moglo biti lake izvedivo. Nerijetko sustavi temeljeni na pravilima (ili na jezinome znanju) ralanjuju
gove kakvoe.
skup posao.
S krajem 1980-ih, kad je porasla snaga raunala i kad su
ona postala dostupnija, vie se zanimanja poelo posve-
30
Izvorni tekst
Statistiko
strojno
prevoenje
Prijevodna pravila
Ciljni tekst
Generiranje teksta
enje temeljenih na znanju i sustava temeljenih na podatcima upravo komplementarno rasporeeni, danas se
Sveuilita u Zagrebu.
31
strojnim prevoenjem.
dova.
nski).
svoje www-stranice, kroz dodatak za popularne prebirnike kao i kroz integraciju u postojee sustave za strojno
potpomognuto prevoenje kako on-line, tako i o-line.
32
EN
BG
DE
CS
DA
EL
ES
ET
FI
FR
HU
IT
LT
LV
MT
NL
PL
PT
RO
SK
SL
SV
EN
61.3
53.6
58.4
57.6
59.5
60.0
52.0
49.3
64.0
48.0
61.0
51.8
54.0
72.1
56.9
60.8
60.7
60.8
60.8
61.0
58.5
BG
40.5
26.3
32.0
28.7
32.4
31.1
24.6
23.2
34.5
24.7
32.1
27.6
29.1
32.2
29.3
31.5
31.4
33.1
32.6
33.1
26.9
DE
46.8
38.7
42.6
44.1
43.1
42.7
37.3
36.0
45.1
34.3
44.3
33.9
35.0
37.2
46.9
40.2
42.9
38.5
39.4
37.9
41.0
CS
52.6
39.4
35.4
35.7
37.7
37.5
35.2
32.0
39.5
30.0
38.9
37.0
37.8
37.9
37.0
44.2
38.4
37.8
48.1
43.5
35.6
DA
50.0
39.6
43.1
43.6
44.5
44.4
37.8
37.9
47.4
33.0
45.8
36.8
38.5
38.9
45.4
42.1
42.8
40.3
41.0
42.6
46.6
EL
41.0
34.5
32.8
34.6
34.3
39.4
28.2
27.2
42.8
25.5
40.6
26.5
29.7
33.7
35.3
34.2
40.2
35.6
33.3
34.0
33.3
ES
55.2
46.9
47.1
48.9
47.5
54.0
40.4
39.7
60.9
34.1
26.9
21.1
8.0
48.7
49.7
46.2
60.7
50.4
46.2
47.0
46.6
ET
34.8
25.5
26.7
30.7
27.8
26.5
25.4
34.9
26.7
29.6
25.0
34.2
34.2
26.9
27.5
29.2
26.4
24.6
29.8
31.1
27.4
MT
39.8
25.9
19.8
26.3
21.1
23.8
24.6
20.5
19.4
25.3
18.1
24.6
22.2
23.3
22.0
27.9
24.8
28.7
28.5
30.0
23.7
NL
52.3
44.9
50.2
46.5
48.5
48.9
48.8
41.3
40.6
51.6
36.1
50.5
38.1
41.5
44.0
44.8
49.3
43.0
44.4
45.9
45.6
PL
49.2
35.1
30.2
39.2
34.3
34.2
33.9
32.0
28.8
35.7
29.8
35.2
31.6
34.4
37.1
32.0
34.5
35.8
39.0
38.2
32.2
PT
55.0
45.9
44.1
45.7
45.4
52.5
57.3
37.8
37.5
61.0
34.2
56.5
31.6
39.6
45.9
47.7
44.1
48.5
43.3
44.1
44.2
RO
49.0
36.8
30.7
36.5
33.9
37.2
38.1
28.0
26.5
43.8
25.7
39.3
29.3
31.0
38.9
33.0
38.2
39.4
35.3
35.8
32.7
SK
44.7
34.1
29.4
43.6
33.0
33.1
31.7
30.6
27.3
33.1
25.6
32.5
31.8
33.3
35.8
30.1
38.2
32.1
31.5
38.9
31.3
SL
50.7
34.1
31.4
41.3
36.2
36.3
33.9
32.9
28.2
35.6
28.2
34.7
35.3
37.1
40.0
34.6
39.8
34.4
35.1
42.6
33.5
SV
52.0
39.9
41.2
42.9
47.2
43.3
43.7
37.3
37.6
45.8
30.5
44.3
35.3
38.0
41.6
43.6
42.1
43.9
39.4
41.8
42.7
14: Kakvoa strojnoprevoditeljskoga prijevoda za sve parove izmeu 22 slubena jezika EU-a Machine translation between 22 EU-languages [34]
govara s itavim skupom potencijalno relevantnih do-
npr.:
teksta.
33
vih sustava.
lita u Zagrebu.
talnih, a katkada samo potpornih aplikacija, jesu saimanje teksta i generiranje teksta. Saimanje pokuava dati
sr duljega teksta u obliku kraega teksta, a postoji kao
ponuena funkcionalnost ve i u MS Wordu. Postupci
saimanja mahom su statistiki utemeljeni pri emu se
prvo identiciraju vane rijei u tekstu (npr. rijei koje
su visoko uestale u danome tekstu, ali su izrazito nisko
uestale u opoj jezinoj uporabi), a potom se pronalaze reenice u kojima se te rijei nalaze. Takve se reenice u dokumentu posebno obiljeavaju ili izdvajaju
ne bi li se od njih sastavio saetak. U tom najpopularnijem scenariju saimanje dokumenata vrsta je izdvajanja reenica: itav se tekst reducira na podskup svojih
reenica. Svi komercijalni sustavi za saimanje tekstova
koriste isti pristup. Alternativni pristup iskuava se u nekoliko istraivakih sredita i usmjeren je na sastavljanje
novih reenica, tj. na izgradnju saetka sastavljenog od
34
ljice, irilice i latinice), rjeavanje nestandardnih pravopisnih rjeenja, individualne varijacije u uporabi pojedi-
35
Izvori za hrvatsku batinu i hrvatski europski identitet [61] s projektima koji se bave digitalizacijom starih hrvatskih rjenika i izradbom Hrvatskoga valencijskoga rjenika [62];
nih jezinih podataka i izravno uveavaju broj dostupnih jezinih resursa za hrvatski jezik.
Takoer na Sveuilitu u Rijeci projekt Govorne tehnologije [63] napravio je znaajan napredak u razvoju temeljnih resursa i alata za obradbu hrvatskoga govora kao
to su Hrvatski govorni korpus i prototipovi sustava za
ATR i TTS na hrvatskome.
Ovi su istraivaki programi otvorili mogunost da ra-
voljnoj mjeri.
zirane funkcional-nosti.
36
4. Sporadina razvijenost
5. Slaba ili nikakva razvijenost
beni jezik.
bodova:
1. Izvrsna razvijenost
2. Dobra razvijenost
3. Umjerena razvijenost
37
Koliina
Dostupnost
Kakvoa
Pokrivenost
Zrelost
Odrivost
Prilagodljivost
Prepoznavanje govora
Sinteza govora
Gramatika analiza
1.5
3.5
Semantika analiza
0.3
0.3
0.67
0.3
Generiranje teksta
Strojno prevoenje
Tekstovni korpus
2.5
Govorni korpusi
Usporedni korpusi
2.5
3.5
3.5
3.5
2.5
2.5
Jezini resursi
Leksiki resursi
Gramatike
4.7 ZAKLJUCI
38
39
Izvrsna
podrka
Dobra
podrka
Engleski
Djelomina
podrka
Finski
Francuski
Nizozemski
Talijanski
Portugalski
panjolski
eki
Ruski
Sporadina
podrka
Baskijski
Bugarski
Danski
Estonski
Galicijski
Grki
Irski
Katalonski
Norveki
Poljski
Srpski
Slovaki
Slovenski
vedski
Maarski
Slaba podrka/
odsutnost podrke
Islandski
Hrvatski
Latvijski
Litavski
Malteki
Rumunjski
Izvrsna
podrka
Dobra
podrka
Engleski
Djelomina
podrka
Francuski
panjolski
Sporadina
podrka
Nizozemski
Talijanski
Katalonski
Poljski
Rumunjski
Maarski
Ruski
Slaba podrka/
odsutnost podrke
Baskijski
Bugarski
Danski
Estonski
Finski
Galicijski
Grki
Irski
Islandski
Hrvatski
Latvijski
Litavski
Malteki
Norveki
Portugalski
Srpski
Slovaki
Slovenski
vedski
eki
40
Izvrsna
podrka
Dobra
podrka
Engleski
Djelomina
podrka
Francuski
Nizozemski
Talijanski
panjolski
Ruski
Sporadina
podrka
Baskijski
Bugarski
Danski
Finski
Galicijski
Grki
Katalonski
Norveki
Poljski
Portugalski
Rumunjski
Slovaki
Slovenski
vedski
eki
Maarski
Slaba podrka/
odsutnost podrke
Estonski
Irski
Islandski
Hrvatski
Latvijski
Litavski
Malteki
Srpski
Izvrsna
podrka
Dobra
podrka
Engleski
Djelomina
podrka
Francuski
Nizozemski
Talijanski
Poljski
panjolski
vedski
eki
Maarski
Ruski
Sporadina
podrka
Baskijski
Bugarski
Danski
Estonski
Finski
Galicijski
Grki
Katalonski
Hrvatski
Norveki
Portugalski
Rumunjski
Srpski
Slovaki
Slovenski
Slaba podrka/
odsutnost podrke
Irski
Islandski
Latvijski
Litavski
Malteki
41
5
O META-NET-U
META-NET je mrea izvrsnosti koju podupire Europ-
notehnolokoj zajednici.
Peer-to-peer
mrea digitalnih repozitorija sadravat e jezine podatke, alate i web servise koji su dokumentirani visokokvalitetnim metapodatcima i organizirani u standardizirane kategorije. Resursi su odmah dostupni i jed-
META-RESEARCH.
jednice.
oce@meta-net.eu http://www.meta-net.eu
42
1
EXECUTIVE SUMMARY
Information technology changes our everyday lives. We
the ones that already exist and are being developed fur-
lms and TV stations that use it, but also on the pres-
soware applications.
tion in your mother tongue is considered to be the civilisational level necessary for overcoming the digital di-
French.
43
pus, the Croatian-English Parallel Corpus, the Croatian Morphological Lexicon, the Croatian Dependency
Treebank, etc.), it cant be assumed that the current status of language technologies is satisfactory. Beside the
nationally funded projects unfortunately, still only
few of them since 2008 started more substantial funding through ve EC projects: CLARIN, ACCURAT,
LetsMT!, ATLAS, XLike; but they are also mainly ori-
META-NETs vision is high-quality language technology for all languages that supports political and economic unity through cultural diversity. is technology
will help tear down existing barriers and build bridges
between Europes languages. is requires all stakeholders in politics, research, business, and society to unite
their eorts for the future.
meta-net.eu.
44
2
LANGUAGES AT RISK: A CHALLENGE FOR
LANGUAGE TECHNOLOGY
We are witnesses to a digital revolution that is dramat-
technology are sometimes compared to Gutenbergs invention of the printing press. What can this analogy tell
45
much public attention; yet, it raises a very pressing question: Which European languages will thrive in the networked information and knowledge society, and which
society will look like. However, there is a strong likelihood that the revolution in communication technology is bringing together people who speak dierent languages in new ways. is is putting pressure both on individuals to learn new languages and especially on developers to create new technology applications to ensure
mutual understanding and access to shareable knowledge. In the global economic and information space,
there is increasing interaction between dierent languages, speakers and content thanks to new types of media. e current popularity of social media (Wikipedia,
Facebook, Twitter, YouTube, and, recently, Google+) is
only the tip of the iceberg.
46
be able to achieve a really eective interactive, multimedia and multilingual user experience in the near future.
cussed primarily on language education and translation. According to one estimate, the European market
for translation, interpretation, soware localisation and
website globalisation was 8.4 billion in 2008 and is
expected to grow by 10% per annum [6]. Yet this gure covers just a small proportion of current and future
ready reasonably accurate in specic domains, and experimental applications provide multilingual informa-
47
lets look briey at the way humans acquire rst and sec-
systems work.
Looking even further ahead, innovative European multilingual language technology will provide a benchmark
48
49
3
THE CROATIAN LANGUAGE IN THE
EUROPEAN INFORMATION SOCIETY
3.1 GENERAL FACTS
e Croatian language belongs to the West-South Slavic
subgroup of the Slavic branch of the Indo-European linguistic family. Currently over 5.5 million people speak
Croatian as their native language. e Croatian language consists of the dialects and standard national language of the Croats, which is the ocial language of
more than 4 million people in the Republic of Croatia
and is, along with Bosnian and Serbian, one of three ofcial languages in Bosnia and Herzegovina, where it is
spoken by about 700,000 people. However, the Croatian language is also spoken by members of national minorities in Croatia as well as by autochthonous Croatian ethnic and linguistic minorities in Serbia, Montenegro, Slovenia, Hungary, Austria, Slovakia and Italy, who
either reside upon territories of former Croatian lands
or emigrated due to historically conditioned exoduses
throughout the centuries.
50
alphabet.
tion. e most numerous groups are the so-called Burgenland Croats in Austria (presumably about 50,000),
and about the same number of Croats live in Hungary.
In Austria, the Croats actively use Burgenland Croatian.
is variant of Croatian, which has been standardized
according to somewhat dierent rules than standard
Croatian, is one of Austrias ocial minority languages.
ere are a number of kindergartens and schools in Burgenland that use Burgenland Croatian. On the other
hand, the Croatian standard language is the ocial minority language in Hungary. In Italy at the moment live
about 3,000 Croats, who use a variant of Croatian called
Molise Croatian that is also taught in schools in three
communities inhabited by Croats in Molise. e number of Croats in Serbia, specically in the province of
Vojvodina where Croats are a national minority, is difcult to establish, since a number of ethnic Croats are
declared as so-called Bunjevci, mostly for political reasons. Although many Croats were expelled from Serbia
aer Croatia gained independence from Yugoslavia, it
is assumed that there are still more than 100,000 Croats
in Serbia. In other European countries, a Croatian autochthonous minority lives in Montenegro (7,000 to
10,000), the Czech Republic (less than 1,000), Slovakia
(4,000) and Romania (7,500). e number of Croats
in Slovenia is about 50,000, but only a small number of
them are an autochthonous minority, mostly in settlements along the border, and more of them are recent
on language usage.
51
52
as far North as the rivers Kupa and Sava, and as far east
dialectal group.
3.3 STANDARDISATION OF
CROATIAN LANGUAGE
e millennial history of the Croatian language is attested to by texts written as early as the end of the 10th
or the beginning of the 11th century, the period in
which the three Croatian dialects (akavian, tokavian,
Kajkavian) began to form. All three Croatian dialects
played an important part in the formation of the Croatian literary language (various dialectal stylizations) and
the moulding of the Croatian linguistic culture that
led to the Croatian standard language with a tokavian
foundation.
Dubrovnik region, but also by other South Slavic peoples. Croats in Burgenland (Austria, Hungary, Slovakia)
mostly speak akavian, and rarely the tokavian or Kajkavian dialects; Croats in the Italian province of Molise
speak an archaic tokavian dialect, and Karaevo Croats
53
54
works of Bartol Kai (15751650) and a ourishing of Renaissance and Baroque literature from tokavian Dubrovnik recognised the linguistic structure
of the tokavian dialect (rstly with the ikavian jat
reex, but later with the jekavian reex) as the best
there also exist the syllabic r (crn black) and the diph-
jelo).
lic, Latin), and the Latin script has been the foremost of
the three among the Croats since the 16th century. Its
modern-day form.
grd),
55
tokavian regions.
gender nouns).
non-dierentiation of and d.
3.4.2 Morphology
56
Noun declension
N and G singular
N plural
a-type masculine
a-type neuter
e-type feminine
i-type feminine
opis, opisa
sunce, sunca
ena, ene
no, noi
opisi
sunca
ene
noi
Case
Masculine
Neuter
Feminine
Singular
N
G
D
A
V
L
I
-i
-og(a) -eg(a)
-om(u/e) -em(u/e)
=N/=G
=N
-om(u/e) -em(u/e)
-im
-o -e
-og(a) -eg(a)
-om(u/e) -em(u/e)
=N
=N
-om(u/e) -em(u/e)
-im
-a
-e
-oj
-u
=N
=D
-om
Plural
N
G
D
A
V
L
I
-i
-ih
-im(a)
-e
=N
=D
=D
-a
-ih
-im(a)
=N
=N
=D
=D
-e
-ih
-im(a)
=N
=N
=D
=D
57
continental part.
3.4.4 Syntax
58
press both tense and modal meaning. Sentence organisation can be both coordinated and subordinated (with
the aid of conjunctions or without them). A relatively
new occurrence in the modern language is the common
use of the Slavonic genitive (Nije olio vina), genitive expressions of possession are avoided in favour of possesive
adjectives (majina kua instead of kua majke), and the
use of preterite tenses is reduced (imperfect, aorist and
pluperfect). In modern Croatian passive constructions
are rarer than in the older Croatian language.
3.4.6 Onomastics
Croatian names represent important linguistic monuments of the linguistic, cultural and social heritage of
the people who created them. us, both personal
names (anthroponyms) and place names (toponyms) are
an important segment of Croatian linguistic culture.
e territory of present-day Croatia, roughly bound by
the river Drava in the North, the river Danube in the
East and the Adriatic Sea in the South, is very picturesquely reected in its complex stratication of geographical names. e complex stratication of Croa-
3.4.5 Orthography
and Latin script), the Latin script has been the domi-
written heritage, the dual-characters d, lj and nj, are replaced by , and respectively. e characters q, x, y,
Croatian orthography is phonological-morphonological, Slavic nation to bear family names (since the 12th censince it presents a conuence of two orthographic
59
Capital letters
A
L
B
LJ
C
M
NJ
D
O
D
P
E
S
G
T
H
U
I
V
J
Z
g
t
h
u
i
v
j
z
Lowercase letters
a
l
b
lj
c
m
nj
d
o
d
p
e
s
in four ocial languages (Croatian, Macedonian, Serbian, Slovenian), but soon a lot of political eort was
Because of
them.
60
to promote the culture of the Croatian standard language in written and oral communication;
to tend to the status and role of the Croatian standard language in light of Croatias integration into
the European Union;
to make decisions on further standardisation processes of the Croatian standard language;
to take care of language issues and set principles for
61
universities. ere is a pronounced tendency in Croatia, especially in so-called hard sciences to teach in the
English language. ere were agreeable opinions that it
could be functional and useful, but also harmful and unacceptable not to teach in the Croatian language at universities. It would have devastating eects for the development of the Croatian scientic terminology and occupational phraseology. erefore e Croatian Language Council advised the Ministry to legally dene
the language usage at higher education.
In primary and secondary schools, Croatian language
and literature is taught as a subject, and takes up considerable space in the curriculum. As part of this subject,
Croatian grammar, vocabulary and literature is studied,
and written and verbal expression in Croatian is devel-
62
1999.
63
Computer usage
Internet access
Web site
Usage of nancial and banking services
E-government usage
2008
2009
2010
98
97
64
84
56
98
95
57
84
61
97
95
61
85
63
6: ICT in enterprises
Personal computer
Internet access
Mobile phone
2008
2009
2010
53
45
81
55
50
82
60
57
7: ICT in households
64
4
LANGUAGE TECHNOLOGY SUPPORT FOR
CROATIAN
Language technologies are used to develop soware
information retrieval
information extraction
text summarisation
While speech is the oldest and in terms of human evolution the most natural form of language communication, complex information and most human knowledge
is stored and transmitted through the written word.
question answering
speech recognition
speech synthesis
scape.
When we communicate, we combine language with
other modes of information and communication media
for example speaking can involve gestures and facial
expressions. Digital texts link to pictures and sounds.
Movies may contain language in spoken and written
form. In other words, speech and text technologies overlap and interact with other multimodal communication
and multimedia technologies.
In this section, we will discuss the main application
areas of language technology, i. e., language checking,
web search, speech interaction, and machine translation. ese applications and basic technologies include:
4.1 APPLICATION
ARCHITECTURES
Soware applications for language processing typically
consist of several components that mirror dierent aspects of language. While such applications tend to be
complex, Figure 2 shows a highly simplied architecture
of a typical text processing system. e rst three modules handle the structure and meaning of the text input:
1. Pre-processing: cleans the data, analyses or removes
formatting, detects the input languages, and so on.
2. Grammatical analysis: nds the verb, its objects,
spelling correction
authoring support
sentence structure.
65
Speech Technologies
Multimedia &
Multimodality
Technologies
Language
Technologies
Knowledge Technologies
Text Technologies
8: Language technologies
(p. 78) at the end of this chapter. is table lists all tools
readable way.
Aer analysing the text, task-specic modules can per-
in Croatia.
Input Text
Pre-processing
Output
Grammatical Analysis
Semantic Analysis
Task-specific Modules
66
Input Text
Spelling Check
Grammar Check
Correction Proposals
is type of analysis either needs to draw on languagespecic grammars laboriously coded into the soware
the words that precede and follow it). For example: jaz
izmeu (gap between) is much more probable word se-
67
ian, even though it has not made its way into printed
68
Web Pages
Pre-processing
Semantic Processing
Indexing
Matching
&
Relevance
Pre-processing
Query Analysis
User Query
Search Results
69
nologies:
1. Automatic speech recognition (ASR) determines
which words are actually spoken in a given sequence
of sounds uttered by a user.
2. Natural language understanding analyses the syntactic structure of a users utterance and interprets it according to the system in question.
3. Dialogue management determines which action to
take given the user input and system functionality.
70
Speech Output
Speech Input
Speech Synthesis
Signal Processing
Natural Language
Understanding &
Dialogue
Recognition
tomated translation.
71
Source Text
Statistical
Machine
Translation
Translation Rules
Target Text
Text Generation
72
to be aligned.
tional challenge.
Although the pioneering workshop on machine translation was organised at the University of Zagreb, Faculty of Humanities and Social Sciences by eljko Bujas
and Bulcs Lszl as early as 1959 [35], no serious research on MT for Croatian happened until the beginning of 21st century. e nationally funded project Information Technology in Translation and e-Learning
can be shared with other users who can exploit them further on. e translation services of the LetsMT! project
can be used in several ways: through the web portal,
through a widget provided for free inclusion in a webpage, through browser plug-ins, and through integration in computer-assisted translation (CAT) tools and
dierent online and oine applications.
able on-line.
73
the context.
guages that are very dierent from other languages (e. g.,
Hungarian, Maltese, Finnish).
For example:
Answer: 38.
74
portant words in a text (i. e., words that occur very fre-
4.3 EDUCATIONAL
PROGRAMMES
75
same source:
Treebank [57], Croatian Wordnet [58], hybrid tagger [59] and lemmatiser [15], dependency parser,
NERC system and other information extraction
tools [60] etc.);
Sources for Croatian Heritage and Croatian European Identity [61] with projects dealing with digiti-
tian.
76
ished in 2002.
formats.
project ATLAS.
a few others.
summed up as follows:
77
Coverage
Maturity
Sustainability
Adaptability
Speech synthesis
Grammatical analysis
1.5
3.5
0.3
0.3
0.67
0.3
Machine translation
uality
Availability
uantity
Speech recognition
Semantic analysis
2.5
Speech corpora
Parallel corpora
Lexical resources
2.5
3.5
3.5
3.5
2.5
2.5
Grammars
point scale:
1. Excellent support
in 2013.
2. Good support
3. Moderate support
4.6 CROSS-LANGUAGE
COMPARISON
4. Fragmentary support
teria:
5. Weak or no support
78
speech-based applications.
ment detailed in this report, immediate action must occur before any breakthroughs for the Croatian language
4.7 CONCLUSIONS
cooperation.
79
cultural diversity.
80
Excellent
support
Good
support
English
Moderate
support
Czech
Dutch
Finnish
French
German
Italian
Portuguese
Spanish
Fragmentary
support
Basque
Bulgarian
Catalan
Danish
Estonian
Galician
Greek
Hungarian
Irish
Norwegian
Polish
Serbian
Slovak
Slovene
Swedish
Weak/no
support
Croatian
Icelandic
Latvian
Lithuanian
Maltese
Romanian
15: Speech processing: state of language technology support for 30 European languages
Excellent
support
Good
support
English
Moderate
support
French
Spanish
Fragmentary
support
Catalan
Dutch
German
Hungarian
Italian
Polish
Romanian
Weak/no
support
Basque
Bulgarian
Croatian
Czech
Danish
Estonian
Finnish
Galician
Greek
Icelandic
Irish
Latvian
Lithuanian
Maltese
Norwegian
Portuguese
Serbian
Slovak
Slovene
Swedish
16: Machine translation: state of language technology support for 30 European languages
81
Excellent
support
Good
support
English
Moderate
support
Dutch
French
German
Italian
Spanish
Fragmentary
support
Basque
Bulgarian
Catalan
Czech
Danish
Finnish
Galician
Greek
Hungarian
Norwegian
Polish
Portuguese
Romanian
Slovak
Slovene
Swedish
Weak/no
support
Croatian
Estonian
Icelandic
Irish
Latvian
Lithuanian
Maltese
Serbian
17: Text analysis: state of language technology support for 30 European languages
Excellent
support
Good
support
English
Moderate
support
Czech
Dutch
French
German
Hungarian
Italian
Polish
Spanish
Swedish
Fragmentary
support
Basque
Bulgarian
Catalan
Croatian
Danish
Estonian
Finnish
Galician
Greek
Norwegian
Portuguese
Romanian
Serbian
Slovak
Slovene
Weak/no
support
Icelandic
Irish
Latvian
Lithuanian
Maltese
18: Speech and text resources: State of support for 30 European languages
82
5
ABOUT META-NET
META-NET is a Network of Excellence partially
across languages;
grants all Europeans equal access to information and
knowledge in any language;
builds upon and advances functionalities of networked information technology.
e network supports a Europe that unites as a single digital market and information space. It stimulates
and promotes multilingual technologies for all European languages. ese technologies support automatic
e peerto-
of the community.
stakeholder community that unites around a shared vision and a common strategic research agenda (SRA).
oce@meta-net.eu http://www.meta-net.eu
83
A
BIBLIOGRAFIJA REFERENCES
[1] Aljoscha Burchardt, Markus Egg, Kathrin Eichler, Brigitte Krenn, Jrn Kreutel, Annette Lemllmann, Georg Rehm,
Manfred Stede, Hans Uszkoreit, and Martin Volk. Die Deutsche Sprache im Digitalen Zeitalter e German Language
in the Digital Age. META-NET White Paper Series. Georg Rehm and Hans Uszkoreit (Series Editors). Springer, 2012.
[2] Aljoscha Burchardt, Georg Rehm, and Felix Sasaki. e Future European Multilingual Information Society Vision
Paper for a Strategic Research Agenda, 2011. http://www.meta-net.eu/vision/reports/meta-net-vision-paper.pdf.
[3] Directorate-General Information Society & Media of the European Commission. User Language Preferences Online,
2011. http://ec.europa.eu/public_opinion/flash/fl_313_en.pdf.
[4] European Commission. Multilingualism: an Asset for Europe and a Shared Commitment, 2008. http://ec.europa.eu/
languages/pdf/comm2008_en.pdf.
[5] Directorate-General of the UNESCO. Intersectoral Mid-term Strategy on Languages and Multilingualism, 2007. http:
//unesdoc.unesco.org/images/0015/001503/150335e.pdf.
[6] Directorate-General for Translation of the European Commission. Size of the Language Industry in the EU, 2009.
http://ec.europa.eu/dgs/translation/publications/studies.
[7] Narodne novine. Ustav Republike Hrvatske, 2001. http://narodne-novine.nn.hr/clanci/sluzbeni/232289.html.
[8] Mladen Klemeni. A Concise atlas of the Republic of Croatia & of the Republic of Bosnia and Hercegovina. Miroslav
Krlea Lexicographical Institute, 1993.
[9] Institut za hrvatski jezik i jezikoslovlje (Institute of Croatian Language and Linguistics). http://www.ihjj.hr.
[10] Institut za hrvatski jezik i jezikoslovlje. Jezini Savjeti (Language Advice Portal). http://savjetnik.ihjj.hr.
[11] Institut za hrvatski jezik i jezikoslovlje. Struna: Hrvatsko strukovno nazivlje (Struna: Croatian Professional Terminology). http://struna.ihjj.hr/o-programu/.
[12] Croaticum centar za hrvatski kao drugi i strani jezik (Croaticum e Center for Croatian as a Second and Foreign
Language). http://croaticum.ffzg.hr.
[13] Hrvatski jezik (Croatian Language). http://www.hrvatskijezik.eu.
[14] Jezine tehnologije za hrvatski jezik (HLT). http://jthj.ffzg.hr.
[15] eljko Agi, Marko Tadi, and Zdravko Dovedan. Evaluating Full Lemmatization of Croatian Texts. In M. Klopotek,
A. Przepiorkowski, S. Wierzchon, and K. Trojanowski, editors, Recent Advances in Intelligent Information Systems. Academic Publishing House EXIT, 2009.
[16] Daniel Jurafsky and James H. Martin. Speech and Language Processing (2nd Edition). Prentice Hall, 2009.
85
[17] Christopher D. Manning and Hinrich Schtze. Foundations of Statistical Natural Language Processing. MIT Press,
1999.
[18] Language Technology World (LT World). http://www.lt-world.org.
[19] Ronald Cole, Joseph Mariani, Hans Uszkoreit, Giovanni Battista Varile, Annie Zaenen, and Antonio Zampolli, editors.
Survey of the State of the Art in Human Language Technology (Studies in Natural Language Processing). Cambridge
University Press, 1998.
[20] Marko Tadi. Jezine tehnologije i hrvatski jezik (HLT and Croatian). Exlibris, 2003.
[21] Hrvatski akademski spelling checker (Hascheck). http://hacheck.tel.fer.hr.
[22] Spiegel Online. Google zieht weiter davon (Google is still leaving everybody behind), 2009. http://www.spiegel.de/
netzwelt/web/0,1518,619398,00.html.
[23] Juan Carlos Perez. Google Rolls out Semantic Search Capabilities, 2009. http://www.pcworld.com/businesscenter/
article/161869/google_rolls_out_semantic_search_capabilities.html.
[24] Hrvatski Morfoloki Leksikon (Croatian Morphological Lexicon). http://hml.ffzg.hr.
[25] MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. http://nl.ijs.si/
ME/.
[26] Cadial: Computer aided document indexing for accessing legislation. http://www.cadial.org.
[27] Cadial: Computer aided document indexing for accessing legislation. http://cadial.hidra.hr/search.php.
[28] e MBROLA Project. http://tcts.fpms.ac.be/synthesis/mbrola.html.
[29] Branimir Dropulji and Davor Petrinovi. Development of Acoustic Model for Croatian Language Using HTK. Automatika, 51(1):7988, 2010.
[30] Sanda Martini-Ipi, Miran Pobar, and Ivo Ipi. Croatian Large Vocabulary Automatic Speech Recognition. Automatika, 52(2):147157, 2011.
[31] CRO-SPEECHDAT (Baza govornih uzoraka i tekstova dostupna putem Interneta). http://www.inf.uniri.hr/~ivoi/
CROSPEECH/index.htm.
[32] Sanda Martini-Ipi and Ivo Ipi. Croatian HMM Based Speech Synthesis. Journal of Computing and Information
Technology, 14(4):299305, 2006.
[33] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a Method for Automatic Evaluation of Machine
Translation. In Proceedings of the 40th Annual Meeting of ACL, 2002.
[34] Philipp Koehn, Alexandra Birch, and Ralf Steinberger. 462 Machine Translation Systems for Europe. In MT Summit
XII, 2009.
[35] Svetozar Petrovi and Bulcs Lszl. Strojno prevoenje i statistika u jeziku. Nae teme, (6):105298, 1959.
[36] Information Technology in Translation and e-Learning of Croatian. http://rmjt.ffzg.hr/p4_en.html.
[37] Lets MT. https://www.letsmt.eu/Start.aspx.
[38] Accurat. http://www.accurat-project.eu.
86
[39] Inguna Skadia, Andrejs, Vasijevs Raivis Skadi, Robert Gaizauskas, Dan Tu, and Tatiana Gornostay. Analysis and
Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation. In Proceedings of the 3rd Workshop on Building and Using Comparable Corpora. European Language Resources Association (ELRA), 2010.
[40] Andreas Eisele and Jia Xu. Improving Machine Translation Performance Using Comparable Corpora. In Proceedings of
the 3rd Workshop on Building and Using Comparable Corpora, 2010.
[41] Andrejs Vasiljevs, Tatiana Gornostay, and Raivis Skadins. LetsMT! Online Platform for Sharing Training Data and
Building User Tailored Machine Translation. In Proceedings of the Fourth Baltic conference Human Language Technologies the Baltic Perspective, 2010.
[42] HINA: Hrvatska izvjetajna novinska agencija (HINA: Croatian News Agency). http://websrv2.hina.hr/hina/web/
index.action.
[43] Knowledge Technologies Lab. http://ktlab.fer.hr.
[44] Nives Mikeli Preradovi, Tomislava Lauc, and Damir Boras. CROXMLSUM the System for XML. Document Summarization in Croatian. International Journal of Mathematics and Computers in Simulation, 1(1):8189, 2007.
[45] Branko itko, Slavomir Stankov, Marko Rosi, and Ani Grubii. Dynamic test generation over ontology-based knowledge representation in authoring shell. Expert Systems with Applications, 36(4):81858196, 2009.
[46] eljko Bujas. Osman, kompjutorska konkordancija (Osman, Computer Concordance). Sveuilina naklada Liber, 1974.
[47] Marko Tadi. Raunalna obradba hrvatskih korpusa: povijest, stanje i perspektive (Computer processing of Croatian
corpora: history, status and perspectives). Suvremena lingistika, 43-44(1-3):387394, 1997.
[48] Milan Mogu, Maja Bratani, and Marko Tadi. Hrvatski estotni rjenik (Croatian Frequency Dictionary). kolska
knjiga, 1999.
[49] Hrvatski nacionalni korpus (Croatian National Corpus). http://hnk.ffzg.hr.
[50] Marko Tadi. Building the Croatian National Corpus. In Proceedings of the 3rd International Conference on Language
Resources and Evaluation (LREC2002), 2002.
[51] Marko Tadi. New version of the Croatian National Corpus. In Dana Hlavkov, Ale Hork, Klara Osolsob, and Pavel
Rychl, editors, Aer Half a Century of Slaonic Natural Language Processing, Masaryk Uniersity. Masaryk University,
2009.
[52] Nikola Ljubei and Toma Erjavec. hrWaC and slWaC: Compiling web corpora for Croatian and Slovene. In Proceedings of the 14th International Conference Text, Speech and Dialogue (TSD2011). Springer, 2011.
[53] Portal hrvatske rjenike batine (Croatian Old Dictionary Portal). http://crodip.ffzg.hr.
[54] Hrvatska jezina riznica (Croatian Language Repository). http://riznica.ihjj.hr.
[55] Dunja Brozovi Ronevi and Damir avar. Hrvatska jezina riznica kao podloga jezinim i jezinopovijesnim istraivanjima hrvatskoga jezika ( Croatian Language treasury as a base language and ..... Croatian language studies). In Vidjeti
Ohrid: Proceedings of the 14th international Slaistic Congress in Ohrid, 2008.
[56] Raunalnolingvistiki modeli i jezine tehnologije za hrvatski jezik (Computational Linguistic Models and Language
Technologies for Croatian). http://rmjt.ffzg.hr.
[57] Hrvatska ovisnosna banka stabala (Croatian Dependency Treebank). http://hobs.ffzg.hr.
87
88
B
META-NET LANICE META-NET MEMBERS
Austrija
Austria
Belgija
Belgium
Bugarska
Bulgaria
Cipar
Cyprus
eka
Czech Republic
Institute of Formal and Applied Linguistics, Charles University in Prague: Jan Haji
Danska
Denmark
Estonija
Estonia
Finska
Finland
Francuska
France
Centre National de la Recherche Scientique, Laboratoire dInformatique pour la Mcanique et les Sciences de lIngnieur and Institute for Multilingual and Multimedia Information: Joseph Mariani
Evaluations and Language Resources Distribution Agency: Khalid Choukri
Grka
Greece
R.C. Athena, Institute for Language and Speech Processing: Stelios Piperidis
Hrvatska
Croatia
Irska
Ireland
Island
Iceland
Italija
Italy
Latvija
Latvia
Litva
Lithuania
Luksemburg
Luxembourg
89
Maarska
Hungary
Malta
Malta
Nizozemska
Netherlands
Norveka
Norway
Njemaka
Germany
Poljska
Poland
Portugal
Portugal
Rumunjska
Romania
Slovaka
Slovakia
Slovenija
Slovenia
Srbija
Serbia
panjolska
Spain
90
Center for Language and Speech Technologies and Applications, Universitat Politcnica
de Catalunya: Asuncin Moreno
Department of Signal Processing and Communications, University of Vigo:
Carmen Garca Mateo
vedska
Sweden
vicarska
Switzerland
UK
UK
Oko 100 jezine tehnologije strunjaci Predstavnici zemalja i jezika zastupljenih u META-NET raspravlja
i nalizirani kljune rezultate i poruke Bijele knjige serije na sastanku u Berlinu, Njemaka, listopada 21/22,
2011. About 100 language technology experts representatives of the countries and languages represented
in META-NET discussed and nalised the key results and messages of the White Paper Series at a meeting in
Berlin, Germany, on October 21/22, 2011.
91
C
NIZ BIJELE THE META-NET
KNJIGE META-NET WHITE PAPER SERIES
baskijski
bugarski
eki
danski
engleski
estonski
nski
francuski
galicijski
grki
hrvatski
irski
islandski
katalonski
latvijski
litavski
maarski
malteki
nizozemski
norveki bokml
norveki nnorsk
njemaki
poljski
portugalski
rumunjski
slovaki
slovenski
srpski
panjolski
vedski
talijanski
Basque
Bulgarian
Czech
Danish
English
Estonian
Finnish
French
Galician
Greek
Croatian
Irish
Icelandic
Catalan
Latvian
Lithuanian
Hungarian
Maltese
Dutch
Norwegian Bokml
Norwegian Nynorsk
German
Polish
Portuguese
Romanian
Slovak
Slovene
Serbian
Spanish
Swedish
Italian
euskara
etina
dansk
English
eesti
suomi
franais
galego
hrvatski
Gaeilge
slenska
catal
latvieu valoda
lietuvi kalba
magyar
Malti
Nederlands
bokml
nynorsk
Deutsch
polski
portugus
romn
slovenina
slovenina
espaol
svenska
italiano
93
Research
Co
ies
unit
mm
Lan
gu
a
es
stri
u
d
Soc
iet
rs
Use
e
g
In
pean languages.
Niz Jezinih bijelih knjiga otvara nove uvide u europsku jezinu raznolikost dok istodobno relativizira pojam
tzv. malih jezika, poput hrvatskoga. Stoga jezine tehnologije imaju ne samo kljunu ulogu u iskazivanju jezinoga bogatstva u dananjoj Europi, ve predstavljaju metodoloko ishodite za daljnji razvitak digitalnih humanistikih znanosti, osobito ako ih se promatra kao temelj za daljnja istraivanja u raznim humanistikih disciplinama.
Prof. dr. Milena ic Fuchs (redoviti lan Hrvatske akademije znanosti i umjetnosti, predsjednica Stalnoga odbora
za humanistike znanosti Europske znanstvene zaklade)
www.meta-net.eu
www.meta-net.eu