4 views

Uploaded by Sheeraz Ahmad

kings college london qyestion paper

- Common Lisp quick reference
- Navarro and Maakinen - Compressed Full-Text Indexes
- Cl 4 Final All in One Manual
- Cp2201 Data Structures
- Credit Risk Handling in Telecommunication
- 123937936 Sap Security Tables
- data structure practice questions
- 4401sm
- SNMP
- 171midtermSpring05[1]
- Sj-20141110151550-013-Zxsdr Uniran Tdd-lte (v3.20.50) General Operation Guide
- C++ and DS
- CSCI203 Spring 2010 Workshops Lab 7[1] With Answers!
- Red Black Trees
- 159119_633718946604551478Dynamic Programming
- ADSs
- CS-TR-78-709
- DB101 Fall2008 Midterm
- Chapter 20 - Backup and Restore V2-1.pdf
- Unit IV

You are on page 1of 8

This paper is part of an examination of the College counting towards the award of a

degree. Examinations are governed by the College Regulations under the authority

of the Academic Board.

and on the answer book(s) provided. Do this now.

BSc EXAMINATION

6CCS3TSP TEXT SEARCHING AND PROCESSING

MAY 2010

TIME ALLOWED: TWO HOURS.

ANSWER TWO OF THE THREE QUESTIONS.

NO CREDIT WILL BE GIVEN FOR ATTEMPTING ANY FURTHER QUESTIONS.

ALL QUESTIONS CARRY EQUAL MARKS.

THE USE OF ELECTRONIC CALCULATORS IS NOT PERMITTED.

BOOKS, NOTES OR OTHER WRITTEN MATERIAL MAY NOT BE BROUGHT

INTO THIS EXAMINATION.

TURN OVER WHEN INSTRUCTED

SOLUTIONS

2010

6CCS3TSP

1. Matching Automata

We consider the alphabet = {a, b, c}. For a string x , the string

matching automaton of x, SMA(x), is the minimal deterministic automaton

accepting the language x. Its initial state is denoted by initial , its terminal

state by terminal , and its transition function by .

a. Draw the string matching automaton of the string abaabaab.

[10 marks]

Answer

a

a

0

b, c

b

b, c

2

c

a

b

a

3

b, c

b, c

[unseen]

b. Describe how to build efficiently the automaton SMA(xa) from the automaton SMA(x) when x and a .

[15 marks]

Answer

Let r = (terminal , a). The automaton is transformed by adding a new state s

and keeping the same transitions except that (terminal , a) is set to s. Then,

the transitions from s reproduce those from r, that is: (s, b) = (r, b) for every

b . Finally, s becomes the only terminal state. [in lectures]

c. List all the forward arcs of SMA(abaabaab). List all its backward arcs.

What is the maximal number of backward arcs in the string matching

automaton of a string of length n?

[10 marks]

Answer

Forward arcs: (0, a, 1), (1, b, 2), (2, a, 3), (3, a, 4), (4, b, 5), (5, a, 6), (6, a, 7), (7, b, 8).

Backward arcs: (1, a, 1), (3, b, 2), (4, a, 1), (6, b, 2), (7, a, 1), (8, a, 6). [unseen]

The maximal number of backward arcs is n, reached for example for the string

abn1 . [in lectures]

SOLUTIONS

2010

6CCS3TSP

d. Draw the trie of the set {aa, abaabba, abb, bba}. Mark its terminal

states.

Draw the implementation with failure links of the dictionary-matching

automaton DMA({aa, abaabba, abb, b, bba}). Mark its terminal states.

Define the notion of a failure link f (p) on a state (node) p of the trie of

a finite set X of strings.

[15 marks]

Answer

a

b

b

b

[unseen]

a

a

a

b

b

b

[unseen]

Let p be a state of the trie of X distinct from the root. Let u + , be the label

of the path from the root to state p. Then the failure state f (p) of state p is the

state of the trie whose path from the root is labelled by the longest possible

proper suffix of u. [in lectures]

SOLUTIONS

2010

6CCS3TSP

2. Doubling

Let y be a fixed text of length n.

For a word u and a positive integer k, First k (u) is u if |u| k and is u[0 . . k1]

otherwise. The integer Rk [i] is the rank of First k (y[i . . n1]) inside the sorted

list of all First k (u) where u is a nonempty suffix of y (ranks are numbered

from 0).

a. Give R1 , R2 , R3 , R4 , R8 for the word aababbabba, assuming a < b.

[10 marks]

Answer

i

0

y[i]

a

R1 [i] 0

R2 [i] 1

R3 [i] 1

R4 [i] 1

R8 [i] 1

[unseen]

1

a

0

2

2

2

2

2

b

1

3

5

5

7

3

a

0

2

3

3

4

4

b

1

4

6

7

9

5

b

1

3

5

5

6

6

a

0

2

3

3

3

7

b

1

4

6

6

8

8

b

1

3

4

4

5

9

a

0

0

0

0

0

[15 marks]

Answer

Lemma 1 Rank 2k [i] is the rank of the pair (Rank k [i], Rank k [i + k]) in the sorted

list of these pairs.

[bookwork]

Proof. Let i be a position on y and let u = First 2k (y[i . . n1]). Let j be a position

on y and let v = First 2k (y[j . . n 1]). We show that u v, which is equivalent

to Rank 2k [i] Rank 2k [j], iff (Rank k [i], Rank k [i + k]) (Rank k [j], Rank k [j + k]).

First case: First k (u) < First k (v). This is equivalent to Rank k [i] < Rank k [j] so

the result holds in this case.

Second case: First k (u) = First k (v). This is equivalent to Rank k [i] = Rank k [j].

Then the comparison between u and v depends only on the second halves of

these words; in other terms, Rank 2k [i] Rank 2k [j] is equivalent to Rank k [i +

k] Rank k [j + k]. [unseen]

SOLUTIONS

2010

6CCS3TSP

running time?

[15 marks]

Answer

Two steps: first sort positions i according to the pairs (Rk [i], Rk [i + k]); then

assign the same R2k rank to positions associated with the same pair.

First step can be implemented by bucket sort (count sort) in linear time; second step is obvious and runs also in linear time. [bookwork]

d. Define the two arrays SUF and LCP composing the Suffix Array of the

string y. Using the result of Question 2.c, give the running time of the

induced algorithm to compute the array SUF. Justify your answer.

[10 marks]

Answer

The array SUF contains the permutation of suffix positions in increasing order

of the suffixes:

y[SUF[0] . . n 1] < y[SUF[1] . . n 1] < . . . < y[SUF[n 1] . . n 1]

and the LCP array is defined by:

LCP[i] = |lcp(y[SUF[i 1] . . n 1], y[SUF[i] . . n 1])|

where lcp(u, v) is the longest common prefix of u and v.

The runtime of the induced algorithm is O(n log n) because there are log n

steps and each step can be implemented to run in O(n) from answer to Question 2.c. [bookwork]

SOLUTIONS

2010

6CCS3TSP

Give an example of a word of length n on the alphabet {a, b} having a

suffix trie of size (n2 ).

[10 marks]

Answer

a

b

a

b

a

b

[unseen]

The trie of the word an/4 bn/4 an/4 bn/4 , for two distinct letters a and b, has at

least n/4 branches each of them having n/4 nodes. Which gives (n/4)2 = (n2 )

nodes. [in lectures]

b. Define the notion of Suffix Tree of a string y. Define the notion of Suffix

Link for the nodes of the tree.

[10 marks]

Answer

The Suffix Tree of a string y is the compacted version of its Suffix Trie. It has

the following characteristics, which make the tree unique for a given string:

it is an automaton whose initial state is the root and arcs are labelled by

nonempty factors of y,

each terminal node is associated with a suffix of y, label of the path from

the root to it,

no other string labels such a path,

internal nodes either have two children/successors or have only one child/successor

and are terminal,

when two arcs starts from the same node their labels starts by two different letters.

[unseen]

If a node p is associated with a nonempty factor au of y (a letter, u string),

its suffix target s(p) is associated with the factor u. The Suffix Link is the

function s defined on internal nodes of the tree, except the root.

SOLUTIONS

2010

6CCS3TSP

c. Draw the Suffix Tree of the string y = aabbabb with the Suffix Link.

[10 marks]

Answer

abbabb

a

bb

abb

aab

b

b

[unseen]

aab

word y.

[10 marks]

Answer

Each node or state p of the tree can be implemented as a structure containing

two pointers: the first pointer to implement the suffix link; the second pointer

to give access to the list of arcs outgoing state p. The list of arcs can contain

4-tuples in the form (a, i, , q) where a is a letter, i and are integers, and q is a

pointer to a state. They are such that (p, u, q) is an arc of the automaton with

a = y[i] and u = y[i . . i + 1]. [unseen]

FINAL PAGE

SOLUTIONS

2010

6CCS3TSP

suffix tree.

[10 marks]

Answer

The following procedure compacts a trie T , even if suffix links are defined on

states.

Compact(trie T )

r root of T

for each arc (r, a, p) do

Compact(subtrie of T rooted at p)

if(p has exactly one child)

q that child

u label of (p, q)

replace p by q as child of r

set a u as label of (r, q)

[unseen]

- Common Lisp quick referenceUploaded byyusuf76
- Navarro and Maakinen - Compressed Full-Text IndexesUploaded bysymejohn
- Cl 4 Final All in One ManualUploaded bySalman Pathan
- Cp2201 Data StructuresUploaded byPiyush Kumar
- Credit Risk Handling in TelecommunicationUploaded byAnonymous RrGVQj
- 123937936 Sap Security TablesUploaded bybarber bob
- data structure practice questionsUploaded byPapori Borgohain
- 4401smUploaded byRaktim
- SNMPUploaded bySrikanth Reddy Muvva
- 171midtermSpring05[1]Uploaded byGobara Dhan
- Sj-20141110151550-013-Zxsdr Uniran Tdd-lte (v3.20.50) General Operation GuideUploaded bytest
- C++ and DSUploaded byDrArpita Mathur
- CSCI203 Spring 2010 Workshops Lab 7[1] With Answers!Uploaded bytechnofreak9
- Red Black TreesUploaded bymahesh
- 159119_633718946604551478Dynamic ProgrammingUploaded byMaz Har Ul
- ADSsUploaded bysunilshivaji
- CS-TR-78-709Uploaded bySatish K Patnala
- DB101 Fall2008 MidtermUploaded byVenkata Krishna
- Chapter 20 - Backup and Restore V2-1.pdfUploaded byRodrigoBurgos
- Unit IVUploaded bysabapathi
- lec15Uploaded byShengFeng
- HandoutUploaded byAnahit Serobyan
- Softimage User Guide_ the ExplorerUploaded byJeremy George
- DSA ManualUploaded bydjvishalsolanki
- rfc-4741.txtUploaded byelias
- D..S 2....Uploaded bysuresh gundiya
- Lecture2.pptxUploaded byzekarias amsalu
- 99-138Uploaded byGustavo Gabriel Jimenez
- Go Up to Table of ContentsUploaded byKarthika Reddy
- Cache Oblivious DatabasesUploaded byjoppe82

- 15856 NormalizationUploaded bySheeraz Ahmad
- 14085 Ch 4 Regular GrammarUploaded bySheeraz Ahmad
- 14085 Ch 8 Dedidability,Computability,ComplexityUploaded bySheeraz Ahmad
- lpu question papers cse322 automataUploaded bySheeraz Ahmad
- 17827_ECO111 Home Work 2Uploaded bySheeraz Ahmad
- EceUploaded bySunnyArora
- Statistics for Management.pdfUploaded byHarshith
- Nanomaterial in ArchitectureUploaded bySheeraz Ahmad
- Albert Cocco: the law of elasticityUploaded byRajat Yadav Yaduvanshi
- Bcr-Abl Revise UiUploaded bySheeraz Ahmad
- VPCCB17FXB_mkspUploaded bySheeraz Ahmad
- convex fuzzy setUploaded bySheeraz Ahmad
- Zx250Uploaded bySheeraz Ahmad
- Royal Enfield 2010 Thunderbird TwinsparkUploaded bySheeraz Ahmad
- 5aa32c79_TBTS_Tech_SpecUploaded byRajiv Nair
- 16473 CSE211 QuestionUploaded bySheeraz Ahmad
- 16022_test Gen SkillsUploaded bySheeraz Ahmad
- 14770_CountProUploaded bySheeraz Ahmad
- 14368_assigment2Uploaded bySheeraz Ahmad
- 18658_Classical Hollywood CinemaUploaded bySheeraz Ahmad
- InitiateSingleEntryPaymentSummary21-05-2015Uploaded bySheeraz Ahmad
- IP DATABASE MANAGEMENT SYSTEMUploaded byMohammadAnees
- Interfacing Traffic Light ControllerUploaded bySheeraz Ahmad

- DS GFGUploaded bychantaiah
- a servey of web clustering engineUploaded byhatn1684
- pres Suffix Trees.pptUploaded bySruthy Sasi
- Deepthi Webclustering ReportUploaded byRijy Lorance
- The Increasing Dot Plot and Arc DiagramsUploaded byantonio_forgione
- BIOINFO.pdfUploaded byPammi Kavitha
- 7116173 Structures of String Matching and Data CompressionUploaded byDimas Iwandanu
- Advanced Data StructuresUploaded byAsafAhmad
- Linear Work Suffix Array Construction - Karkkainen, Sanders, BurkhardtUploaded byBug Menot
- UkkonenUploaded byValentin
- Data Structures - GeeksforGeeksUploaded byNagaraj
- naskitis-vldbj09Uploaded byVasya Quest
- Algo & Data Structures & Data MiningUploaded byDHAR DAIRY FARM
- A New Keyphrases Extraction Method Based on Suffix Tree Data Structure for Arabic Documents ClusteringUploaded byMaurice Lee
- Suffix Array TutorialUploaded byarya5691
- Inverted Indexes for Phrases and StringsUploaded bydefefalien
- trieUploaded byWemzgazerock
- Generalized Suffix Tree 1 - GeeksforGeeksUploaded byNagaraj
- String KernelsUploaded bySrirangam Ramaraj
- Advance data structureUploaded byVipul Verma
- KMtalkUploaded bychaltihawa
- Lecture4- Indexing and Searching IUploaded bypriyankaprakasan
- 04052852Uploaded byBalramParmar
- Algorithms for Next-Generation Sequencing Data (3319598244)Uploaded byamiry1373
- genoogleUploaded byFelipe Albrecht
- Text Clustering Using a Suffix Tree SimilarityUploaded bymacngocthan
- diff2Uploaded byMarko Markovic
- Suffix Tree ApplicationsUploaded byGuruprasad Sridharan
- TST-Tries.pdfUploaded byAmar Malik
- Data Structures ProgramsUploaded bySankalpKotewar