ANNOTATION
BBI 5411
AP. DR AFIDA MOHAMAD ALI
• 1960S – FIRST ATTEMPTS AT STANDARDISING MARKUP FOR INFORMATION EXCHANGE BEGAN.
• NECESSARY IN ORDER TO BE ABLE TO EXCHANGE VARIOUS TYPES OF
• 1986 – SGML WAS CREATED.
• WHEN THE WORLD WIDE WEB WAS CREATED BY TIM BERNERS-LEE, HTML (HYPERTEXT MARKUP LANGUAGE) ARRIVED ON THE SCENE AND BECAME POPULAR.
• THEN CAME XML.
• AUTOMATIC ANNOTATION
• COMPUTER-ASSISTED ANNOTATION
• MANUAL ANNOTATION
HTTP://UCREL.LANCS.AC.UK/ANNOTATION.HTML#POS
TYPES OF ANNOTATION
• MORPHOLOGICAL LEVEL (MORPHOLOGICAL ANNOTATION)
• PREFIXES
• SUFFIXES
• STEMS
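Morphological annotation marks the prefixes, suffixes and stems inside each word. A minimal sketch of the idea, assuming a tiny hand-made affix list (the lists and the names `segment` and `annotate` are hypothetical, for illustration only), strips the longest matching prefix and suffix:

```python
# Hypothetical affix lists for illustration only; a real morphological
# annotator relies on a full lexicon and morphotactic rules.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "able", "ing", "ed", "s"]
MIN_STEM = 3  # never strip an affix if fewer than 3 letters of stem would remain


def segment(word: str) -> dict:
    """Split a word into prefix, stem and suffix by longest-match affix stripping."""
    w = word.lower()
    prefix = ""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if w.startswith(p) and len(w) - len(p) >= MIN_STEM:
            prefix, w = p, w[len(p):]
            break
    suffix = ""
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(s) and len(w) - len(s) >= MIN_STEM:
            suffix, w = s, w[: -len(s)]
            break
    return {"prefix": prefix, "stem": w, "suffix": suffix}


def annotate(word: str) -> str:
    """Render the segmentation as prefix+stem+suffix, skipping empty parts."""
    seg = segment(word)
    return "+".join(part for part in (seg["prefix"], seg["stem"], seg["suffix"]) if part)


print(annotate("unkindness"), annotate("replayed"), annotate("dogs"))
```

Naive stripping of this kind mis-segments words such as READING (it removes RE and leaves the stem ADING), which is why real morphological annotation needs a lexicon or human checking.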
TYPES OF ANNOTATION
• LEXICAL LEVEL
• PART OF SPEECH (POS TAGGING)
• LEMMAS (LEMMATIZATION)
• SEMANTIC FIELDS (SEMANTIC ANNOTATION)
• SYNTACTIC LEVEL
• PARSING
• TREEBANKING
• BRACKETING
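Semantic field annotation assigns each word to a broad field of meaning. A minimal sketch, assuming a tiny hypothetical lexicon with invented field labels (UCREL's USAS system uses its own tagset and also disambiguates by context, which this lookup does not), pairs each lemma with its field:

```python
# Hypothetical mini-lexicon mapping lemmas to invented semantic-field labels.
SEMANTIC_LEXICON = {
    "bread": "FOOD",
    "eat": "FOOD",
    "happy": "EMOTION",
    "sad": "EMOTION",
    "car": "TRANSPORT",
}


def tag_semantic_fields(lemmas):
    """Pair each lemma with its field; lemmas missing from the lexicon get UNMATCHED."""
    return [(lemma, SEMANTIC_LEXICON.get(lemma.lower(), "UNMATCHED")) for lemma in lemmas]


print(tag_semantic_fields(["they", "eat", "bread", "happy"]))
```

Because the lookup ignores context, an ambiguous word always receives the same field, one reason semantic annotation usually needs manual correction.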
TYPES OF ANNOTATION
• DISCOURSE LEVEL
• ANAPHORIC RELATIONS (COREFERENCE ANNOTATION)
• SPEECH ACTS (PRAGMATIC ANNOTATION)
• STYLISTIC FEATURES SUCH AS SPEECH AND THOUGHT PRESENTATION (STYLISTIC ANNOTATION).
POS TAGGING
MOST COMMON TYPE OF ANNOTATION.
PROBLEMS:
• WORD SEGMENTATION (TOKENIZATION)
• MULTIWORDS (SO THAT, IN SPITE OF)
• MERGERS (CAN’T, GONNA)
• VARIABLY SPELLED COMPOUNDS (NOTICEBOARD, NOTICE-BOARD, NOTICE BOARD)
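Each of these tokenization problems has to be settled before any tag can be assigned. A minimal sketch, assuming small hypothetical word lists (`COMPOUND_VARIANTS`, `MERGERS`, `MULTIWORDS` and `tokenize` are illustrative names), normalises compound spellings, splits mergers in the style of the Penn Treebank (CA + N'T, GON + NA) and joins multiwords into single tokens:

```python
import re

# Hypothetical, hand-made resources for illustration only.
COMPOUND_VARIANTS = {
    # canonical spelling -> pattern matching NOTICEBOARD, NOTICE-BOARD, NOTICE BOARD
    "noticeboard": re.compile(r"\bnotice[- ]?board\b", re.IGNORECASE),
}
MERGERS = {"can't": ["ca", "n't"], "gonna": ["gon", "na"]}  # keys are lowercase
MULTIWORDS = [("in", "spite", "of"), ("so", "that")]


def tokenize(text: str) -> list[str]:
    # 0. Treat the curly apostrophe as a straight one.
    text = text.replace("\u2019", "'")
    # 1. Normalise variably spelled compounds to one canonical spelling.
    for canonical, pattern in COMPOUND_VARIANTS.items():
        text = pattern.sub(canonical, text)
    # 2. Words (with an optional internal apostrophe) or single punctuation marks.
    raw = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[^\sA-Za-z]", text)
    # 3. Split mergers (case-insensitive lookup) so each part can get its own tag.
    tokens = []
    for tok in raw:
        tokens.extend(MERGERS.get(tok.lower(), [tok]))
    # 4. Join multiword units into one token, e.g. so_that.
    result, i = [], 0
    while i < len(tokens):
        for mw in MULTIWORDS:
            n = len(mw)
            if tuple(t.lower() for t in tokens[i:i + n]) == mw:
                result.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:  # no multiword starts here
            result.append(tokens[i])
            i += 1
    return result


print(tokenize("I can't read the notice-board, so that we're gonna wait."))
```

Whether a multiword gets one tag or a merger gets two is a design decision of the tagset; the sketch simply fixes one choice for each.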
LEMMATIZATION
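Lemmatization maps each inflected form to its base form (lemma). A minimal sketch, assuming a hypothetical table of irregular forms and a few crude suffix rules (`IRREGULAR`, `SUFFIX_RULES` and `lemmatize` are illustrative names), looks up irregulars first and then strips regular endings:

```python
# Hypothetical irregular forms and suffix rules, for illustration only;
# real lemmatizers use a full lexicon and the word's POS tag.
IRREGULAR = {"was": "be", "were": "be", "went": "go", "children": "child", "mice": "mouse"}
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]  # tried in order


def lemmatize(word: str) -> str:
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    for suffix, replacement in SUFFIX_RULES:
        # Require a stem of at least 3 letters so short words like "is" survive.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)] + replacement
    return w


print([lemmatize(w) for w in ["MARY", "VISITED", "CHILDREN", "STUDIES"]])
```

Rules alone turn LIKED into LIK, which shows why lemmatization normally depends on a lexicon and on the POS tag assigned earlier.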
PARSING EXAMPLE:
(S (NP MARY)
   (VP VISITED
       (NP A
           (ADJP VERY NICE)
           BOY)))
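A labelled bracketing like this can be read back into a tree by a short recursive parser. The sketch below is a minimal illustration (`parse_brackets` and `leaves` are hypothetical names); it turns each bracket into a `(label, children)` tuple, keeps words as plain strings, and rejects input whose brackets do not balance:

```python
import re


def parse_brackets(text: str):
    """Parse "(LABEL child ...)" bracketing into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    if not tokens:
        raise ValueError("empty input")
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        if pos + 1 >= len(tokens) or tokens[pos + 1] in ("(", ")"):
            raise ValueError("expected a label after '('")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        if pos == len(tokens):
            raise ValueError("missing closing bracket")
        pos += 1  # consume ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unbalanced brackets: extra tokens after the tree")
    return tree


def leaves(tree):
    """Return the words of the sentence in their original order."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree = parse_brackets("(S (NP MARY) (VP VISITED (NP A (ADJP VERY NICE) BOY)))")
print(tree)
print(leaves(tree))
```

Checking bracket balance automatically is a cheap way to catch errors in hand-corrected treebank files before they are used for analysis.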
SEMANTIC ANNOTATION
• PRONOUNS
• REPETITION
• SUBSTITUTION
• ELLIPSIS
THIS KIND OF ANNOTATION CAN BE COMPUTER-ASSISTED AT BEST; IT CANNOT BE FULLY AUTOMATIC.
A SIMPLE EXAMPLE OF ANAPHORIC ANNOTATION IS:
(6 THE MARRIED COUPLE 6) SAID THAT <REF=6 THEY WERE
HAPPY WITH <REF=6 THEIR LOT.
HERE THE NUMBER 6 IS AN INDEX NUMBER: (6 AND 6) ENCLOSE THE ANTECEDENT, WHILE THE
LESS-THAN CHARACTER < INDICATES THAT A BACKWARD REFERENTIAL (ANAPHORIC) LINK IS
PRESENT, I.E. THEY AND THEIR POINT BACKWARD TO THE MARRIED COUPLE
(CITED FROM GARSIDE, FLIGELSTONE AND BOTLEY 1997: 68).
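Because the notation is regular, the links can be read back out of annotated text. A minimal sketch, assuming only the two markers shown in this example (the full Garside, Fligelstone and Botley scheme has more symbols, and `read_anaphora` and `resolve` are hypothetical names), collects the antecedents and the backward references:

```python
import re


def read_anaphora(annotated: str):
    """Return ({index: antecedent}, [(word, index), ...]) from (N ... N) and <REF=N markup."""
    # (6 THE MARRIED COUPLE 6): antecedent number 6 spans "THE MARRIED COUPLE".
    antecedents = {
        int(m.group(1)): m.group(2).strip()
        for m in re.finditer(r"\((\d+)\s+(.*?)\s+\1\)", annotated)
    }
    # <REF=6 THEY: the word THEY points back to antecedent 6.
    links = [(m.group(2), int(m.group(1))) for m in re.finditer(r"<REF=(\d+)\s+(\w+)", annotated)]
    return antecedents, links


def resolve(annotated: str):
    """Pair each referring word with the text of its antecedent (None if missing)."""
    antecedents, links = read_anaphora(annotated)
    return [(word, antecedents.get(index)) for word, index in links]


example = "(6 THE MARRIED COUPLE 6) SAID THAT <REF=6 THEY WERE HAPPY WITH <REF=6 THEIR LOT."
print(resolve(example))
```

Machine-readable links like these are what make coreference annotation reusable, even though, as noted above, inserting them is computer-assisted at best.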
PRAGMATIC ANNOTATION
• ERROR TAGGING
• PROBLEM-ORIENTED ANNOTATION
• COUNTER TO ACCOUNTABILITY.
• USE THE ENTIRE CORPUS, AND ALL RELEVANT EVIDENCE EMERGING FROM ANALYSIS OF THE CORPUS, TO TEST THE HYPOTHESIS;
• MAKE NO MOTIVATED SELECTION OF EXAMPLES TO FAVOUR THOSE THAT FIT THE HYPOTHESIS;
• DO NOT SCREEN OUT INCONVENIENT EXAMPLES.
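In practice, accountability means every hit in every text is counted, and counter-examples are kept alongside supporting ones. A minimal sketch, assuming a toy two-text corpus and a hypothetical hypothesis ("SPITE is always followed by OF"; `check_hypothesis` is an illustrative name), shows the idea:

```python
import re


def check_hypothesis(corpus: dict[str, str], node: str, expected_next: str):
    """Examine every occurrence of `node` in every text and sort each hit by
    whether `expected_next` follows it. Nothing is filtered out, so
    counter-examples are always reported."""
    supporting, counter = [], []
    for text_id, text in corpus.items():
        words = re.findall(r"[a-z']+", text.lower())
        for i, word in enumerate(words):
            if word != node:
                continue
            nxt = words[i + 1] if i + 1 < len(words) else None
            hit = (text_id, i, nxt)
            if nxt == expected_next:
                supporting.append(hit)
            else:
                counter.append(hit)
    return supporting, counter


# Toy corpus, invented for illustration.
corpus = {"t1": "In spite of the rain we left.", "t2": "She did it out of spite."}
supporting, counter = check_hypothesis(corpus, "spite", "of")
print(len(supporting), "supporting;", len(counter), "counter-examples:", counter)
```

Here the single counter-example from t2 is enough to refute the "always" form of the hypothesis, which is exactly the evidence that selective screening would hide.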
REPLICABILITY