
Problem 2

For each position i in the string, 1 ≤ i ≤ n, we define OPT(i) to be an optimal
segmentation of the prefix y_1 y_2 ... y_i, and we let Q(i) denote the total quality of
OPT(i). For the empty prefix we set Q(0) = 0.
Before we can create our algorithm, we need a recurrence for Q(i). We claim that
Q(i) = max_{0 ≤ j < i} { quality(y_{j+1} ... y_i) + Q(j) },
where each term is the total quality obtained by using the segmentation OPT(j) followed
by the final block y_{j+1} y_{j+2} ... y_i.

We prove this by contradiction:


Consider OPT(i), and assume the final block of this segmentation is y_{k+1} ... y_i.
Let S be the segmentation of the prefix y_1 y_2 ... y_k used by OPT(i). If the total
quality of S were less than Q(k), then by using the segmentation OPT(k) with the block
y_{k+1} ... y_i we would have a segmentation of y_1 ... y_i with a larger total quality
than OPT(i). But this contradicts the definition of OPT(i).

From this, we know that the prefix part of OPT(i) has total quality Q(k), so we may take
OPT(i) to consist of OPT(k) followed by the block y_{k+1} ... y_i.
Thus Q(i) = Q(k) + quality(y_{k+1} ... y_i).

More generally, for any j with 0 ≤ j < i, the value quality(y_{j+1} ... y_i) + Q(j) is
obtained by using the segmentation OPT(j) together with the last block of letters
(y_{j+1} ... y_i). We know that this value equals Q(i) when j = k, and for every other
value of j it is the total quality of some segmentation of y_1 ... y_i, so it can be at
most Q(i). Thus, we've shown that
Q(i) = max_{0 ≤ j < i} { quality(y_{j+1} ... y_i) + Q(j) }.
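The recurrence can be evaluated directly with memoization. The sketch below is only an illustration: the word list and the quality function are made up for the example (the problem treats quality as a given black box), and the names best_quality and WORDS are not part of the original solution.

```python
from functools import lru_cache

# Hypothetical quality function, for illustration only: a known word scores
# the square of its length, any other block scores minus its length.
WORDS = {"meet", "at", "eight", "me"}


def quality(block: str) -> int:
    return len(block) ** 2 if block in WORDS else -len(block)


def best_quality(y: str) -> int:
    """Evaluate Q(n) straight from the recurrence, caching each Q(i)."""

    @lru_cache(maxsize=None)
    def Q(i: int) -> int:
        if i == 0:
            return 0  # empty prefix
        # y[j:i] is the block y_{j+1} ... y_i in the 1-based notation above.
        return max(quality(y[j:i]) + Q(j) for j in range(i))

    return Q(len(y))


print(best_quality("meetateight"))
```

With this toy quality function the best split of "meetateight" is meet / at / eight. The recursion is as deep as the string is long, so for long inputs a bottom-up table is the safer choice.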
The algorithm that produces the quality of the optimal segmentation, as well as the
optimal segmentation itself, is:

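Let P be an array of length n+1 (indices 0 to n)
Let B be an array of length n+1 (B[i] records the best j found for prefix i)
Let L be an array of length n+1
Initialize all the elements of P, B and L to 0
For i from 1 to n
    Set P[i] to -infinity
    For j from 0 to i-1
        If quality(y_{j+1} ... y_i) + P[j] > P[i]
            Set P[i] to the value quality(y_{j+1} ... y_i) + P[j]
            Set B[i] to the value j
    End
End
Set i to B[n]
While i > 0
    Set L[i] to the value 1
    Set i to B[i]
End
Return P[n] and L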

P[n] is the quality value, and L marks the positions where the string is to be
segmented: L[j] = 1 means that a block ends after letter y_j.
This algorithm runs in O(n^2) time, assuming each evaluation of quality takes constant
time. The work is dominated by a double for loop, each level of which runs at most n
iterations, and inside the two loops each comparison and assignment takes constant time.
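A minimal Python sketch of the bottom-up algorithm follows. It assumes quality is supplied by the caller as a function on strings; the toy quality function and the name segment are hypothetical and used only for illustration. Instead of a 0/1 array, this version records for each i the j that achieved the maximum and walks those records back from n, returning the blocks themselves.

```python
import math

# Hypothetical quality function, for illustration only.
WORDS = {"meet", "at", "eight", "me"}


def quality(block):
    return len(block) ** 2 if block in WORDS else -len(block)


def segment(y, quality):
    """Return (best total quality, list of blocks) for the string y."""
    n = len(y)
    P = [0] + [-math.inf] * n   # P[i]: best quality for the prefix of length i
    B = [0] * (n + 1)           # B[i]: start index of the last block for prefix i

    for i in range(1, n + 1):
        for j in range(i):
            candidate = quality(y[j:i]) + P[j]
            if candidate > P[i]:
                P[i] = candidate
                B[i] = j

    # Walk the recorded choices back from n to recover the blocks.
    blocks = []
    i = n
    while i > 0:
        blocks.append(y[B[i]:i])
        i = B[i]
    blocks.reverse()
    return P[n], blocks


value, blocks = segment("meetateight", quality)
print(value, blocks)
```

Starting each P[i] at negative infinity, rather than 0, keeps the result correct even when every block has negative quality.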

We prove correctness of this algorithm using induction on i. We claim that P[i] = Q(i)
for all i. This suffices, since the algorithm returns P[n], and Q(n) is by definition the
total quality of an optimal segmentation of the whole string.

For the base case, we set P[0] to 0, which is equal to Q(0).


For the inductive hypothesis, fix i ≥ 1 and assume that P[j] = Q(j) for all j with
0 ≤ j < i.
In the inductive step, the inner loop of the algorithm computes the maximum, over all j,
of the quality of a final block plus the value stored for its prefix. Thus, we get:
P[i] = max_{0 ≤ j ≤ i-1} { quality(y_{j+1} ... y_i) + P[j] }
     = max_{0 ≤ j ≤ i-1} { quality(y_{j+1} ... y_i) + Q(j) },
which is equal to Q(i) by the formula for Q(i) established earlier. So P[i] = Q(i).
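As an informal sanity check on the claim P[n] = Q(n), the table-based value can be compared against an exhaustive search over all 2^(n-1) segmentations of a short string. This is a sketch under assumptions: the quality function below is a made-up example, and dp_quality and brute_force_quality are hypothetical names, not part of the original solution.

```python
from itertools import combinations

# Hypothetical quality function, for illustration only.
WORDS = {"meet", "at", "eight", "me"}


def quality(block):
    return len(block) ** 2 if block in WORDS else -len(block)


def dp_quality(y):
    """P[n] computed bottom-up from the recurrence."""
    P = [0] * (len(y) + 1)
    for i in range(1, len(y) + 1):
        P[i] = max(quality(y[j:i]) + P[j] for j in range(i))
    return P[len(y)]


def brute_force_quality(y):
    """Best total quality over every possible set of interior cut points."""
    n = len(y)
    best = None
    for r in range(n):  # r = number of interior cuts
        for cuts in combinations(range(1, n), r):
            bounds = (0, *cuts, n)
            total = sum(quality(y[a:b]) for a, b in zip(bounds, bounds[1:]))
            if best is None or total > best:
                best = total
    return 0 if best is None else best  # empty string has quality 0


for s in ["", "a", "at", "meetat", "xyzat", "meetateight"]:
    assert dp_quality(s) == brute_force_quality(s)
```

Agreement on a handful of inputs is not a proof, but it is a cheap way to catch indexing mistakes in the recurrence.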
