
Tagging is usually understood to mean POS tagging, but there are other types of tags as well (e.g. semantic tags).

POS tagging
Parts of Speech Tagging (POST)

Tagging involves the task of labeling each word in a sentence with its appropriate part of speech (noun, verb, adjective, adverb, etc.).

Tagging also involves resolving the ambiguity in the POS of a given word in its context.
++ VM-past tense-third person Masculine sg.

N/V ? How to solve it?

Tagging is also regarded as the first phase in assigning structure to a text.

Labelling words for POS can be done by
- dictionary lookup
- morphological analysis
- tagging
Morphological analysis should precede tagging.

In other words, it is the process of labeling the words in a sentence, extracting their grammatical information by assigning each word a part of speech based on both its definition and its context.

For example,

tiRamaiyaana/Adj Raja/NNP neeRRu/Adv vandaan/VM .

N/Adj/V? How to resolve it?
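The word/TAG notation used in the example can be read mechanically. The following is only a minimal sketch: the function name `parse_tagged` is hypothetical, and it assumes that tokens are separated by whitespace and that the last "/" in a token separates the word from its tag.

```python
def parse_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs.

    Assumption: whitespace separates tokens and the last '/' in a token
    separates the word from its tag. Tokens without '/' get tag None.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if sep:
            pairs.append((word, tag))
        else:
            # No slash found: keep the token and mark it as untagged.
            pairs.append((token, None))
    return pairs


sentence = "tiRamaiyaana/Adj Raja/NNP neeRRu/Adv vandaan/VM ."
for word, tag in parse_tagged(sentence):
    print(word, tag)
```

An untagged token such as the final "." is kept with the tag `None`, so that an annotator (or a later step) can still see that it needs a tag.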

Tagging is a much easier task than parsing, and its accuracy is usually quite high.
Taggers play an increasingly important role in the creation of annotated corpora in various areas of Natural Language Processing.
We need, then, a tag set.

TAGSET (not full)

CATEGORY            TAG
NOUN                NN
PRONOUN             PRP
ADJECTIVE           ADJ
VERB                VM
ADVERB              ADV
POSTPOSITION        PSP
CONJUNCTS           CNJ
QUESTION WORDS      WQ
QUANTIFIERS         QTF
CARDINAL            CRD
ORDINAL             ORD
PARTICIPIAL NOUN    PRN

FRAMEWORK (TAGSET) Continued

CATEGORY            TAG
INTENSIFIER         INTF
INTERJECTION        INJ
NEGATION            NEG
PARTICIPLE          PTL
PARTICLE            PRT
UNKNOWN/RESIDUAL    UNK
PUNCTUATION         PU
DEFECTIVE VERB      DV
PARTICIPLE          PRC
VERBAL NOUN         VN
GERUND              GND
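For tool building, the tag set can be kept as a simple lookup table so that annotated text can be checked against it. This is only a sketch: the names `TAGSET` and `unknown_tags` are hypothetical, and the table holds just the tags listed on these two slides, which the slides themselves describe as not full.

```python
# Tags copied from the two tag set slides (the listing is "not full").
TAGSET = {
    "NN": "NOUN", "PRP": "PRONOUN", "ADJ": "ADJECTIVE", "VM": "VERB",
    "ADV": "ADVERB", "PSP": "POSTPOSITION", "CNJ": "CONJUNCTS",
    "WQ": "QUESTION WORDS", "QTF": "QUANTIFIERS", "CRD": "CARDINAL",
    "ORD": "ORDINAL", "PRN": "PARTICIPIAL NOUN", "INTF": "INTENSIFIER",
    "INJ": "INTERJECTION", "NEG": "NEGATION", "PTL": "PARTICIPLE",
    "PRT": "PARTICLE", "UNK": "UNKNOWN/RESIDUAL", "PU": "PUNCTUATION",
    "DV": "DEFECTIVE VERB", "PRC": "PARTICIPLE", "VN": "VERBAL NOUN",
    "GND": "GERUND",
}


def unknown_tags(pairs):
    """Return the (word, tag) pairs whose tag is not in TAGSET."""
    return [(word, tag) for word, tag in pairs if tag not in TAGSET]


annotated = [("Raja", "NNP"), ("vandaan", "VM")]
print(unknown_tags(annotated))
```

Because the listing is partial, a tag such as NNP from the earlier example is reported as unknown here; such reports show where the table still has to be extended.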


Functional definition is not good

The name given to the lexical class in which the words for most people, places, or things occur

Semantic definition of noun

Things like its ability to occur with determiners and case markers (a box, its tail, Indian population), and to occur in the plural form (marangaL).


A verbal noun/gerund is a noun formed from a verb by adding a suffix such as -atu, -tal, or -kai.
There are about 45 such suffixes available in Tamil.
We may need to have these two categories separately. Why?
These are derived forms of verbs that
1. function exactly as nouns and allow nominal inflections:
   paDippu, vaazhkkai, mahizhcci, etc.
2. behave as verbs. These can be distinguished into tensed and un-tensed:
   un-tensed: ceydal, paDittal; tensed: peesinadu, peesuhiRadu, peesuvadu


Annotate the gender according to the agreement with the verb. Hence the gender can be semantic (i.e. natural gender). In some languages it is grammatical.
In Tamil, there are three genders, namely masculine, feminine and neuter. Epicene can be added as the fourth gender.
The full distinction into these four genders is only realized with third person pronouns.

Singular and Plural.

Plural number is marked by means of the plural suffixes -kaL, -kkaL, -maar, -yar, etc., which are always added to the noun stem.


Case markers in Tamil are usually bound suffixes.
The list of case markers in the tagset consists of six case relations: accusative, instrumental, dative, sociative, possessive and locative.
Nominative case is not marked.
There are some more case relationships which are marked by postpositions.
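Since the case markers are bound suffixes, a first guess at the case of a noun can be made by looking at the end of the word. The sketch below is only an illustration under strong assumptions: the suffix spellings in `CASE_SUFFIXES` and the sample words are illustrative transliterations that do not come from these slides, only one marker per case is listed, and sandhi changes and postpositions are ignored.

```python
# Hypothetical suffix table: one illustrative marker per case relation.
CASE_SUFFIXES = {
    "ai": "accusative",
    "aal": "instrumental",
    "ukku": "dative",
    "ooDu": "sociative",
    "uDaiya": "possessive",
    "il": "locative",
}


def candidate_cases(word):
    """Return the cases whose suffix matches the end of the word.

    An empty list means no marker was found, which is compatible with
    the unmarked nominative.
    """
    return sorted(
        case
        for suffix, case in CASE_SUFFIXES.items()
        if word.endswith(suffix) and len(word) > len(suffix)
    )


for word in ["viiTTil", "avanukku", "Raja", "vaazhkkai"]:
    print(word, candidate_cases(word))
```

The last sample word shows the limit of this approach: vaazhkkai is a derived noun that merely ends in -ai, so pure suffix matching proposes an accusative reading that a morphological analyzer would have to reject.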


A pronoun is a word that can be a substitute for a noun.
Pronouns are referential as well as anaphoric by nature, i.e., they refer back to something previously expressed (linguistic antecedent) either in a sentence or in a discourse.
Inclusive vs. exclusive 1st person pronoun


A pronoun which must normally take as its antecedent another noun phrase in the same sentence is known as a reflexive pronoun.
... is only a reflexive pronoun. Represents the 1st, 2nd, and 3rd persons, respectively, ... to the pronoun ..., as shown below

Reciprocal refers to a clause in which two NPs, both of which have multiple referents, are interpreted as coreferential. One of the devices to indicate reciprocity is to mark one of the two coreferential NPs with multiple referents. The reciprocal, as the co-occurrence of two case-marked identical nominals, can be represented as follows:
Noun + case noun

Wh-pronouns do not have a fixed reference.
Since they occur in questions, in which they ask for information, they presuppose that the reference has not been established.


Adjectives and quantifiers have been put into the same group of nominal modifiers on the basis of function.
Should we have different tags?

Simple and derived adjectives in modern Tamil.

Simple adjectives
sweet, small, good, old, new, big, rare

Derived adjectives
beautiful, tall, high

A quantifier is a word which quantifies the noun, i.e. it expresses the noun's definite or indefinite number or amount.

Example
a few, many, all, everything, some

9. INTENSIFIER

, and are intensifiers,


Typically a sentence in Tamil contains one or more verbs, either a main verb alone or a combination of a main verb and auxiliary verbs.
The verb usually inflects in agreement with the subject.
If the sentence contains a combination of main and auxiliary verbs, the main verb in the verb group does not carry verbal inflections; instead, the last auxiliary verb of the verb group inflects for agreement.

Strong vs. Weak
Transitive vs. Intransitive
Active vs. Passive
Regular vs. irregular vs. defective
Classification on the basis of past tense markers.


Tense:
Tense situates the sentence in relation to the time of utterance. The values are present, past, and future. The verb carries the tense and nominal agreement information (gender, number, person etc.).
Future tense is expressed by a suffix attached to the main verb.
Person       Present   Past   Future
(1sg)
(2sg)
(3sg.m.)
(3sg.f.)
(3sg.n.)
(3sg.m.f.)
(1pl.)
(2pl.)
(3pl.m.f.)
(3pl.n.)

Verb class   Present   Past   Future
II
III
IV
VI
VII

12. Finite vs. infinitive

Finiteness is the attribute that distinguishes two verb forms:
1. the finite verb, and 2. the infinitive form of the verb.
A finite verb is morphologically inflected for Gender, Number, Person, and Tense.
An infinitive verb consists of the verb root and an infinitive marker.
Example (finite verb)

\VM.mas.sg.3.prs.fin. (and the others)
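A hierarchical tag of this kind can be split into its attributes by a tool. The sketch below assumes that the attribute order is category, gender, number, person, tense, finiteness; this order is inferred from the single example above and is not stated in the guideline, and the names `ATTRIBUTES` and `parse_tag` are hypothetical.

```python
# Assumed attribute order, inferred from the one example tag.
ATTRIBUTES = ("category", "gender", "number", "person", "tense", "finiteness")


def parse_tag(tag):
    """Split a dotted tag such as 'VM.mas.sg.3.prs.fin' into a dict.

    A leading backslash and leading or trailing dots are ignored.
    Raises ValueError if the number of fields is not the assumed six.
    """
    parts = tag.strip("\\.").split(".")
    if len(parts) != len(ATTRIBUTES):
        raise ValueError(f"expected {len(ATTRIBUTES)} fields, got {len(parts)}")
    return dict(zip(ATTRIBUTES, parts))


print(parse_tag(r"\VM.mas.sg.3.prs.fin."))
```

Tags of other categories will carry different attribute sets, so a real tool would need one attribute list per category rather than the single fixed list used here.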


An adverb belongs to a group of words that modify the verb or the sentence.

ADVERBS OF MANNER

These adverbs modify or describe the verb.
b


All words denoting time and place come under ALC; it therefore includes words or phrases that modify the verb spatially or temporally.

Participle is a non-finite form of the verb that has the characteristics and functions of both verbs and adjectives (Relative Participle or Adjectival Participle).
Tamil has the following set of participles, and the same principle is used for organizing participles in the tag set:

Category     Type          Attributes                        Example
Participle   Relative      Tense, Negative                   the boy who did
             Verbal        Tense, Negative
             Nominal       Gender, Number, Tense, Negative   one who did, he who did
             Conditional   Tense, Negative                   if do


Relative/Adjectival Participle
- The adjectival participle is the only non-finite verb form which distinguishes tense.
- In addition there is a tenseless negative adjectival participle.
- The past and present relative participles are formed by adding the past or present tense marker and the RP suffix -a.
- The future is formed by adding -um to the root/stem of the verb.
b
Examples :

(Negative RP)


The verbal participle is the second tenseless non-finite verb form. It has both an affirmative and a negative form.
The affirmative verbal participle is formed as
1. the verb stem + verbal participle suffix (-u)
2. the verb stem + verbal participle suffix (-i)

It is formed by
verb stem + tense marker + a nominalizing suffix.
(But the future RP form is not this.)


The conditional has both affirmative and negative forms.
The affirmative conditional of the verb is formed by
verb root/stem + past tense + the suffix -
Concessive
verb root/stem + past tense + the suffix -

A postposition is a function word that occurs after a word.
What are the uses or functions of these words?

A particle is a word that does not belong to one of the main parts of speech. Particles typically have a grammatical or pragmatic meaning.
Category

Type

Attributes

Example
, ,

Co-ordinating

Clitic

In addition
, , ,

Subordinating

Particle

Clitic

! ! Is it

Interjection

yes
yes

(Dis)Agreement

? is it so

Confirmative
Delimiting

, that

Negative, Clitic

only

Reduplication is the repetition of part of a token or of the complete token itself. It may be full or partial.
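The difference between full and partial reduplication can be checked on a pair of adjacent tokens. This is a rough sketch only: the function names are hypothetical, "partial" is approximated as two different tokens that share an ending of at least `min_shared` characters (a threshold chosen here for illustration), and the sample pairs are illustrative transliterations that are not taken from the slides.

```python
def common_suffix_len(a, b):
    """Length of the longest ending shared by strings a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def classify_reduplication(first, second, min_shared=2):
    """Classify a token pair as 'full', 'partial' or 'none'.

    Assumption: partial reduplication is approximated by a shared
    ending of at least min_shared characters.
    """
    if first == second:
        return "full"
    if common_suffix_len(first, second) >= min_shared:
        return "partial"
    return "none"


for pair in [("mella", "mella"), ("puli", "kili"), ("onRu", "pattu")]:
    print(pair, classify_reduplication(*pair))
```

A shared ending is only a surface cue, so unrelated words that happen to rhyme would also be reported as partial; an annotator would still need to confirm each case.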

CARDINAL - A cardinal numeral is the numeral which is basic in form.
onRu
iraNDu
pattu

An ordinal represents the rank of a number with respect to some order.
In other words, it denotes the position in a sequence.
- mudalaavadu
- aindaavadu
- irupadaavadu

The punctuation marks such as (, / ---- / ; / : / ! / ? / etc.) are also to be tagged.

Recall that we distinguished

Open-class categories (noun, verb, adjective, adverb)
Closed-class categories (preposition, determiner, pronoun, conjunction, ...)

While the big four are fairly clear-cut, it is less obvious exactly what and how many closed-class categories there may be.

Three stages of tagging

Stage 1: Look up the word in the lexicon to get a list of potential POSs
Stage 2: Apply rules which certify or disallow tag sequences (Morphological Analyzer)
Stage 3: Actual correct tagging
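The three stages can be sketched as a toy program. Everything specific in it is an assumption made for illustration: the lexicon entries (including the second candidate tag given to neeRRu), the single disallowed tag pair, and the function names are hypothetical, and stage 3 simply takes the first tag sequence that survives the rules.

```python
from itertools import product

# Stage 1 resource: toy lexicon with candidate tags (hypothetical entries).
LEXICON = {
    "tiRamaiyaana": ["ADJ"],
    "Raja": ["NNP"],
    "neeRRu": ["ADJ", "ADV"],  # ambiguity invented for the example
    "vandaan": ["VM"],
}

# Stage 2 resource: adjacent tag pairs that the rules disallow (hypothetical).
DISALLOWED_BIGRAMS = {("ADJ", "VM")}


def lookup(words):
    """Stage 1: list the potential tags of each word (UNK if not in lexicon)."""
    return [LEXICON.get(word, ["UNK"]) for word in words]


def allowed(tags):
    """Stage 2: accept a tag sequence only if no adjacent pair is disallowed."""
    return all(pair not in DISALLOWED_BIGRAMS for pair in zip(tags, tags[1:]))


def tag(words):
    """Stage 3: return the first surviving tag sequence, or None if none survives."""
    survivors = [seq for seq in product(*lookup(words)) if allowed(seq)]
    if not survivors:
        return None
    return list(zip(words, survivors[0]))


print(tag(["tiRamaiyaana", "Raja", "neeRRu", "vandaan"]))
```

Enumerating every combination of candidate tags grows exponentially with sentence length, so this only works for short toy inputs; it is meant to show how the stages hand over to each other, not how a practical tagger searches.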

How to make a tagger tool?

The only alternative at present is manual tagging.


nadarajapillai@rediffmail.com
9448576300

This POS guideline for a hierarchical tagset focuses on the morpho-syntactic description of Tamil in order to assist the annotator in using the tool.
The basic description of the tool consists of the definition and description of the tags, with examples from Tamil.
The morphology of Tamil is very rich, and hence more tags are necessary.
Some of them lead to ambiguity. This eventually leads to tag ambiguity while tagging.

The framework adopted here tries to focus on the morpho-syntactic features of words to derive the appropriate attribute sets for the tags.
Although the properties of natural language are quite systematic, exceptions are also found quite frequently.
Any effort to categorize and classify natural language is thus a challenge for language technology research.
This tag set tries to capture the nuances of the language, though this is a huge task to achieve.

Documentation of finer nuances is expected from the annotators in order to perfect the tag set description.
However, it is almost impossible to capture all the subtleties of a natural language.
What is given is specific to Tamil.
Finally, this guideline also serves to list out possible issues that are not handled in the tag set.