1. The study of the rules governing the sounds that form words
1. Reduplication
2. Affixation
3. Compounding
1. b*a*
2. (a*b*)*
3. a*b*
1. baad
2. baaad6
3. baaadd
4. baaad
Q6 Which sentence best describes inflectional morphology? (0.5 Mark)
1. Plural Noun
2. Cliticization
3. Singular Noun
4. Inflectional
Q8 In a finite state transducer, the simple pair of symbols in the alphabet Σ is called (0.5 Mark)
1. Default pairs
2. Fragments
3. Finite alphabets
4. Feasible pairs
Q9 Morphotactics is a model for (0.5 Mark)
1. 4
2. 5
3. 3
4. 6
TYPE: DESCRIPTIVE
Q. No. | Question and Solution | Marks
Q11 Design Minimum Edit Distance algorithm. Apply the same to compute the minimum edit distance between the words PEACEFUL and PAECFL. (Total Marks: 4)
Solution:
Algorithm: (1 Mark)
Edit distance table: (2 Marks)
L 8 7 6 5 6 5 4 5
U 7 6 5 4 5 4 5 4
F 6 5 4 3 4 3 4 5
E 5 4 3 2 3 4 5 6
C 4 3 2 3 2 3 4 5
A 3 2 1 2 3 4 5 6
E 2 1 2 1 2 3 4 5
P 1 0 1 2 3 4 5 6
# 0 1 2 3 4 5 6 7
# P A E C F L U
Total cost = 5
The operations performed are: 3 deletions and 2 insertions.
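The table can be reproduced with a short dynamic-programming sketch. It assumes insertion and deletion cost 1 and substitution cost 2, which is consistent with the table values and with a total made up only of deletions and insertions. The function name `min_edit_distance` and its parameters are illustrative, not part of the original solution.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Minimum edit distance via dynamic programming (assumed costs: 1/1/2)."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost      # delete every source symbol
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost      # insert every target symbol
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # copy or substitute
            )
    return dist[n][m]


# Column string taken from the table header (P A E C F L U).
print(min_edit_distance("PEACEFUL", "PAECFLU"))
```

With the column string from the table header (PAECFLU) the function returns 5, matching the top-right cell. With the six-letter target PAECFL as written in the question it returns 4, which is the value in the L column of the last row.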
Q12 State any two differences between DFA and NFA. Also, design the following: (Total Marks: 4)
a) Deterministic Finite Automata with transition table for the language accepting strings ending with ’01’ over input alphabet ∑ = {0, 1}
b) Non-Deterministic Finite Automata with transition table for accepting strings such that there are two 0’s separated by a number of positions that is a multiple of 4, over input alphabet ∑ = {0, 1}.
Solution:
Any two differences between DFA and NFA (for example): (0.5 Mark)
DFA | NFA
In a DFA, the next possible state is distinctly set. | In an NFA, each pair of state and input symbol can have many possible next states.
All DFA are NFA. | Not all NFA are DFA.
A DFA cannot use empty-string transitions. | An NFA can use empty-string transitions.
A DFA allows only one move for a single input symbol. | There can be a choice (more than one move) for a single input symbol.
a) Deterministic Finite Automata for strings ending with ’01’:
Transition Table: (0.5 Mark)
Present state | Next state on input 0 | Next state on input 1
→q0 | q1 | q0
q1 | q1 | q2
*q2 | q1 | q0
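The transition table above can be checked by simulating it directly. This is a minimal sketch: the dictionary encoding and the function name `dfa_accepts` are illustrative choices, with q0 as the start state and q2 as the only accepting state, as marked in the table.

```python
# Transition table for the DFA accepting strings over {0, 1} that end with "01".
TRANSITIONS = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q0",
}
START_STATE = "q0"
ACCEPT_STATES = {"q2"}


def dfa_accepts(string):
    """Run the DFA on the input and report whether it halts in an accepting state."""
    state = START_STATE
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]   # exactly one next state per pair
    return state in ACCEPT_STATES


for s in ["01", "1101", "010", ""]:
    print(repr(s), dfa_accepts(s))
```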
b) Non-Deterministic Finite Automata with transition table for accepting strings such that there are two 0’s separated by a number of positions that is a multiple of 4, over input alphabet ∑ = {0, 1}. (1.5 Marks)
Transition Table: (0.5 Mark)
Solution:
Steps involved in Byte-pair encoding Algorithm: (0.5 Marks)
Step 1: The BPE token learner begins with a vocabulary that is just the set of all individual characters.
Step 2: It then examines the training corpus, chooses the two symbols that are most frequently adjacent (say ‘A’, ‘B’), adds a new merged symbol ‘AB’ to the vocabulary, and replaces every adjacent ‘A’ ‘B’ in the corpus with the new ‘AB’.
Step 3: It continues to count and merge, creating new longer and longer character strings, until k merges have been done, creating k novel tokens; k is thus a parameter of the algorithm.
The resulting vocabulary consists of the original set of characters plus k new symbols.
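The three steps can be sketched as a small token learner. This is an illustrative implementation, not the original solution: the function name `learn_bpe`, the dictionary input format, and the first-seen tie-breaking rule are assumptions. Because it follows raw pair counts, the merge order it produces may differ from the order used in a hand-worked table.

```python
from collections import Counter


def learn_bpe(word_counts, k):
    """Learn up to k BPE merges from a {word: count} corpus (a minimal sketch)."""
    # Step 1: every word starts as a sequence of single characters.
    corpus = {tuple(word): count for word, count in word_counts.items()}
    vocab = sorted({ch for symbols in corpus for ch in symbols})
    merges = []
    for _ in range(k):
        # Step 2: count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in corpus.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += count
        if not pairs:
            break
        # Ties are broken by first occurrence (an assumption of this sketch).
        (left, right), _freq = max(pairs.items(), key=lambda item: item[1])
        merged = left + right
        merges.append((left, right))
        vocab.append(merged)
        # Replace every adjacent (left, right) in the corpus with the merged symbol.
        new_corpus = {}
        for symbols, count in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] = count
        corpus = new_corpus
    # Step 3 is the loop itself: k merges yield k new symbols.
    return vocab, merges


corpus = {"old_": 7, "older_": 3, "finest_": 9, "lowest_": 4, "new_": 5, "newer_": 4}
vocab, merges = learn_bpe(corpus, 5)
print(merges)
print(vocab)
```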
Constructing the vocabulary using training corpus: (2 Marks)
Corpus:
7 old_
3 older_
9 finest_
4 lowest_
5 new_
4 newer_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w
Corpus:
7 o l d _
3 o l d e r _
9 f i n e st _
4 l o w e st _
5 n e w _
4 n e w e r _
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st
Corpus:
7 o l d _
3 o l d e r _
9 f i n e st_
4 l o w e st_
5 n e w _
4 n e w e r _
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_
Corpus:
7 o l d _
3 o l d e r _
9 f i n est_
4 l o w est_
5 n e w _
4 n e w e r _
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_
Corpus:
7 ol d _
3 ol d e r _
9 f i n est_
4 l o w est_
5 n e w _
4 n e w e r _
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol
Corpus:
7 old _
3 old e r _
9 f i n est_
4 l o w est_
5 n e w _
4 n e w e r _
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old
Corpus:
7 old _
3 old e r _
9 f i n est_
4 l o w est_
5 ne w _
4 ne w e r _
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne
Corpus:
7 old _
3 old e r _
9 f i n est_
4 l o w est_
5 new _
4 new e r _
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new
Corpus:
7 old _
3 old er _
9 f i n est_
4 l o w est_
5 new _
4 new er _
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er
Corpus:
7 old _
3 old er_
9 f i n est_
4 l o w est_
5 new _
4 new er_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er, er_
Corpus:
7 old _
3 old er_
9 f i nest_
4 l o w est_
5 new _
4 new er_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er, er_, nest_
Corpus:
7 old _
3 old er_
9 f inest_
4 l o w est_
5 new _
4 new er_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er, er_, nest_, inest_
Corpus:
7 old _
3 old er_
9 finest_
4 l o w est_
5 new _
4 new er_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er, er_, nest_, inest_, finest_
Corpus:
7 old_
3 old er_
9 finest_
4 l o w est_
5 new _
4 new er_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er, er_, nest_, inest_, finest_, old_
Corpus:
7 old_
3 old er_
9 finest_
4 l ow est_
5 new _
4 new er_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er, er_, nest_, inest_, finest_, old_, ow
Corpus:
7 old_
3 old er_
9 finest_
4 low est_
5 new _
4 new er_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er, er_, nest_, inest_, finest_, old_, ow, low
Corpus:
7 old_
3 old er_
9 finest_
4 lowest_
5 new _
4 new er_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er, er_, nest_, inest_, finest_, old_, ow, low, lowest_
Corpus: (0.5 Marks)
7 old_
3 old er_
9 finest_
4 lowest_
5 new _
4 newer_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er, er_, nest_, inest_, finest_, old_, ow, low, lowest_, newer_
Corpus:
7 old_
3 older_
9 finest_
4 lowest_
5 new _
4 newer_
Vocabulary: _, o, l, d, e, r, f, i, n, s, t, w, st, st_, est_, ol, old, ne, new, er, er_, nest_, inest_, finest_, old_, ow, low, lowest_, newer_, older_
Testing:
Q14 List and demonstrate with examples the use of regular expression operators for counting. (Total Marks: 2)
Solution:
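As an illustration of the counting operators the question refers to, the following sketch uses Python's `re` module. The patterns and sample strings are illustrative assumptions, not taken from the original solution; `check` is a hypothetical helper that tests whether a whole string matches a pattern.

```python
import re

# Each entry: (pattern, strings that match in full, strings that do not).
EXAMPLES = [
    (r"ba*",     ["b", "ba", "baaa"], ["a"]),            # *     : zero or more
    (r"ba+",     ["ba", "baaa"],      ["b"]),            # +     : one or more
    (r"colou?r", ["color", "colour"], ["colouur"]),      # ?     : zero or one
    (r"a{3}",    ["aaa"],             ["aa", "aaaa"]),   # {n}   : exactly n
    (r"a{2,4}",  ["aa", "aaaa"],      ["a", "aaaaa"]),   # {n,m} : from n to m
    (r"a{2,}",   ["aa", "aaaaaa"],    ["a"]),            # {n,}  : at least n
]


def check(pattern, text):
    """True when the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for pattern, good, bad in EXAMPLES:
    assert all(check(pattern, s) for s in good)
    assert not any(check(pattern, s) for s in bad)
    print(pattern, "accepts", good, "rejects", bad)
```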
Q15 Design a finite state transducer with an E-insertion orthographic rule for parsing the plural form of the string “crash”. (Total Marks: 3)
Solution:
Lexical level | C R A S H +N +PL
Intermediate level | C R A S H ^ S #
Surface level | C R A S H E S
(0.5 Mark)
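The step from the intermediate level to the surface level can be sketched with a rewrite rule. This is only a regular-expression emulation of the E-insertion rule, not a full finite state transducer. It assumes the rule inserts "e" at a morpheme boundary `^` after x, s, z, ch, or sh when followed by `s#`, and it works on lowercase input. The function name `e_insertion` is hypothetical.

```python
import re


def e_insertion(intermediate):
    """Map an intermediate form such as 'crash^s#' to its surface form."""
    # Insert "e" in place of the boundary "^" after a sibilant, before "s#".
    with_e = re.sub(r"(x|s|z|ch|sh)\^(?=s#)", r"\1e", intermediate)
    # Any remaining boundary markers have no surface realization.
    return with_e.replace("^", "").replace("#", "")


for form in ["crash^s#", "fox^s#", "cat^s#"]:
    print(form, "->", e_insertion(form))
```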
Q16 Analyze the Noisy channel model for spell check using Bayesian inference. (Total Marks: 3)
Solution:
• In the noisy channel model, it is imagined that the surface form is actually a “distorted” form of an original word passed through a noisy channel. (1 Mark)
• Language is generated and passed through a noisy channel.
• This channel introduces “noise” in the form of substitutions or other changes to the letters, making it hard to recognize the “true” word.
• Goal: To build a model of the channel. Given this model, we then find the true word by passing every word of the language through the model of the noisy channel and seeing which one comes the closest to the misspelled word. (1 Mark)
• The decoder passes each hypothesis through a model of this channel and picks the word that best matches the surface noisy word.
Prior probability and conditional probability can be used to find the most likely original version given the actually observed signal.
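The Bayesian decision rule described above picks the candidate w that maximizes P(observed | w) × P(w). The sketch below shows that selection step only. All probabilities are made-up toy values chosen for illustration, and the names `best_correction`, `prior`, and `channel` are hypothetical.

```python
def best_correction(observed, prior, channel):
    """Return the candidate maximizing P(observed | w) * P(w), plus all scores."""
    scores = {
        word: channel.get((observed, word), 0.0) * p_word
        for word, p_word in prior.items()
    }
    return max(scores, key=scores.get), scores


# Toy prior P(w): how likely each candidate word is in the language (assumed values).
prior = {"across": 0.6, "actress": 0.3, "acres": 0.1}
# Toy channel model P(observed | w): how likely the typo is given each word (assumed values).
channel = {
    ("acress", "across"): 0.01,
    ("acress", "actress"): 0.05,
    ("acress", "acres"): 0.02,
}

word, scores = best_correction("acress", prior, channel)
print(word, scores)
```

With these toy numbers the candidate with the highest prior is not selected, because the channel probability outweighs it; this is the interaction between prior and conditional probability that the model relies on.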
Q17 Describe the Porter-stemmer algorithm. Apply the algorithm to stem the words “Characterization” to “Characterize” and “Multidimensional” to “Multidimension”. (Total Marks: 3)
Solution:
• The Porter stemmer is a widely used stemming algorithm. Consonants and vowels play an important role in this algorithm.
• A list ccc... of length greater than 0 will be denoted by C, and a list vvv... of
length greater than 0 will be denoted by V.
• Any word, or part of a word, therefore has one of the four forms:
• CVCV … C → collection, management
• CVCV … V → conclude, revise (1 Mark)
• VCVC … C → entertainment, illumination
• VCVC … V → illustrate, abundance
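• These may all be represented by the single form [C](VC)^m[V], where the square brackets mark optional parts and m, the number of VC repetitions, is called the measure of the word or word part.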
Stemming “Multidimensional”:
• The stem of the word (MULTIDIMENSION) has m > 1 (since m = 5) and the word ends with “AL”.
• “AL” is deleted (replaced with null), following the rule (m > 1) AL → null.
• The stem does not change further.
Stemming “Characterization”:
• The stem of the word (CHARACTER) has m > 0 (since m = 3) and the word ends with “IZATION”.
• “IZATION” is replaced with “IZE”, following the rule (m > 0) IZATION → IZE.
• The new stem is then CHARACTERIZE.
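The measure m and the two rules used above can be checked with a short sketch. It implements only the measure computation and a single conditional suffix rule, not the full multi-step Porter stemmer. The helper names `is_consonant`, `measure`, and `apply_rule` are illustrative, and "y" is treated as a vowel when it follows a consonant.

```python
def is_consonant(word, i):
    """A letter is a consonant unless it is a, e, i, o, u, or a 'y' after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count the VC repetitions m in the form [C](VC)^m[V]."""
    pattern = ""
    for i in range(len(stem)):
        tag = "C" if is_consonant(stem, i) else "V"
        if not pattern or pattern[-1] != tag:   # collapse runs such as ccc or vv
            pattern += tag
    return pattern.count("VC")


def apply_rule(word, suffix, replacement, min_m):
    """Apply '(m > min_m) suffix -> replacement' if the condition holds."""
    if word.endswith(suffix):
        stem = word[: -len(suffix)]
        if measure(stem) > min_m:
            return stem + replacement
    return word


print(measure("multidimension"), apply_rule("multidimensional", "al", "", 1))
print(measure("character"), apply_rule("characterization", "ization", "ize", 0))
```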
Q18 Discuss the spelling errors that may occur during typing and in OCR. Justify with examples wherever applicable.
Solution:
Spelling errors that occur while typing are characterized as: (2 Marks)
• Substitutions: mistyping “the” as “thw”
• Insertions: mistyping “the” as “ther”
• Deletions: mistyping “the” as “th”
• Transpositions: mistyping “the” as “hte”
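The four typing error types can be told apart mechanically when the typed word is a single edit away from the intended one. The sketch below is illustrative: the function name `classify_error` is hypothetical, and anything that is not exactly one such edit is reported as "other".

```python
def classify_error(intended, typed):
    """Classify a single-edit typing error; return 'other' if it is not one edit."""
    if len(typed) == len(intended):
        diffs = [i for i in range(len(intended)) if intended[i] != typed[i]]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and intended[diffs[0]] == typed[diffs[1]]
                and intended[diffs[1]] == typed[diffs[0]]):
            return "transposition"     # two adjacent letters swapped
    elif len(typed) == len(intended) + 1:
        # Removing one typed letter recovers the intended word.
        if any(typed[:i] + typed[i + 1:] == intended for i in range(len(typed))):
            return "insertion"
    elif len(typed) + 1 == len(intended):
        # Removing one intended letter yields the typed word.
        if any(intended[:i] + intended[i + 1:] == typed for i in range(len(intended))):
            return "deletion"
    return "other"


for typo in ["thw", "ther", "th", "hte"]:
    print(typo, "->", classify_error("the", typo))
```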