Lec7 DivideAndConquerII

D IVIDE AND C ONQUER II
‣ master theore
‣ integer multiplicatio
‣ matrix multiplication
m
‣ master theore
SECTIONS 4.4–4.6
m
Divide-and-conquer recurrences
Goal. Recipe for solving common divide-and-conquer recurrences:
n
T (n) = a T + f (n)
b
with T(0) = 0 and T(1) = Θ(1).
3
n
T (n) = a T + f (n)
b
with T(0) = 0 and T(1) = Θ(1).
Terms.
・a ≥ 1 is the number of subproblems.
・b ≥ 2 is the factor by which the subproblem size decreases.
・f (n) ≥ 0 is the work to divide and combine subproblems.
3
n
T (n) = a T + f (n)
b
with T(0) = 0 and T(1) = Θ(1).
Terms.
・a ≥ 1 is the number of subproblems.
・b ≥ 2 is the factor by which the subproblem size decreases.
・f (n) ≥ 0 is the work to divide and combine subproblems.
T (n)
Recursion tree. [ assuming n is a power of b ]
・a = branching factor.
・a i = number of subproblems at level i.
・1 + logb n levels. T (n / b) T (n / b) ... T (n / b)
・n / b i = size of subproblem at level i.
... ... ...
3
Divide-and-conquer recurrences: recursion tree
Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.
T (n)
T (n / b) T (n / b) ... T (n / b)
T (n / b2) T (n / b2) ... T (n / b2) T (n / b2) T (n / b2) ...T (n / b2) T (n / b2) T (n / b2) ...T (n / b2)
T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) ... T (1) T (1) T (1)
4
fi
T (n)
nc
T (n / b) T (n / b) ... T (n / b)
T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) ... T (1) T (1) T (1)
4
fi
T (n)
nc
T (n / b) T (n / b) ... T (n / b) a (n / b) c
T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) ... T (1) T (1) T (1)
4
fi
T (n)
nc
T (n / b) T (n / b) ... T (n / b) a (n / b) c
T (n / b2) T (n / b2) ... T (n / b2) T (n / b2) T (n / b2) ...T (n / b2) T (n / b2) T (n / b2) ...T (n / b2) a2 (n / b2) c
T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) ... T (1) T (1) T (1)
4
fi
T (n)
nc
T (n / b) T (n / b) ... T (n / b) a (n / b) c
a i (n / b i ) c
⋮
T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) ... T (1) T (1) T (1)
4
fi
T (n)
nc
T (n / b) T (n / b) ... T (n / b) a (n / b) c
a i (n / b i ) c
⋮
⋮
T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) ... T (1) T (1) T (1)
4
fi
T (n)
nc
T (n / b) T (n / b) ... T (n / b) a (n / b) c
a i (n / b i ) c
⋮
⋮
T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) T (1) ... T (1) T (1) T (1) nlogb a
4
fi
T (n)
nc
T (n / b) T (n / b) ... T (n / b) a (n / b) c
1 + logb n
a i (n / b i ) c
⋮
⋮
4
fi
T (n)
nc
T (n / b) T (n / b) ... T (n / b) a (n / b) c
1 + logb n
a i (n / b i ) c
⋮
⋮
r = a / bc
4
fi
T (n)
nc
T (n / b) T (n / b) ... T (n / b) a (n / b) c
1 + logb n
a i (n / b i ) c
⋮
⋮
alogb n = nlogb a
r = a / bc
4
fi
T (n)
nc
T (n / b) T (n / b) ... T (n / b) a (n / b) c
1 + logb n
a i (n / b i ) c
⋮
⋮
alogb n = nlogb a
logb n
r = a / bc T (n) = nc ri
<latexit sha1_base64="dMY5/xq94vZUiORXKa1OXPmRnaE=">AAACU3icbVDLSgMxFM2Mr/quunQTLIK6KDMiqIgguHFZwarQaYdMeluDmWRI7ohlmH/wa9zqVwj+iwvT2oWtHkg4nHNvbu5JMiksBsGn58/Mzs0vVBaXlldW19arG5u3VueGQ5Nrqc19wixIoaCJAiXcZwZYmki4Sx4vh/7dExgrtLrBQQbtlPWV6AnO0Elx9eBmT+3T6IyeDy/V4TSyeRoX4jwoO0UkdT9OqCqp6Yi4WgvqwQj0LwnHpEbGaMQb3mbU1TxPQSGXzNpWGGTYLphBwSWUS1FuIWP8kfWh5ahiKdh2MVqqpLtO6dKeNu4opCP1d0fBUmsHaeIqU4YPdtobiv95rRx7J+1CqCxHUPxnUC+XFDUdJkS7wgBHOXCEcSPcXyl/YIZxdDlOTBm9nQGf2KR4zpXgugtTqsRnNKx0KYbTmf0lzcP6aT28PqpdnIzjrJBtskP2SEiOyQW5Ig3SJJy8kFfyRt69D+/L9/3Zn1LfG/dskQn4q99GQrOQ</latexit>
i=0 4
fi
Divide-and-conquer recurrences: recursion tree analysis
Let r = a / b c. Note that r < 1 iff c > logb a
cost dominate
(nc ) r<1 c > logb a by cost of root
logb n
cost evenl
T (n) = nc ri = (nc log n) r=1 c = logb a distributed in tree
i=0
cost dominate
<latexit sha1_base64="NEShmpjKNsOBbDerTDYmROKsym4=">AAAC/nicbVFNj9MwEHXCxy7lq7scuYyoQN1LlSAkdlWKVuLCcZFadqU6jRx32nrrOJHtoFZRDvwaTogrf4QD/wanm5VotyPZenrz3sx4nORSGBsEfz3/3v0HDw8OH7UeP3n67Hn76PiryQrNccQzmemrhBmUQuHICivxKtfI0kTiZbL8VOcvv6E2IlNDu84xStlciZngzDoqbv8ZdtUJ0D4M6ktNOFBTpHEpBkE1KanM5nECqgI9EbeqFk1wLlTJXVtTtWifDhdoWdeZT+ANUIsrW4pZVes1fIAQKB2HmEZbUqhrg9rjGOx33A7Dqj2ejxC2KKppM1Tc7gS9YBNwF4QN6JAmLuIj75hOM16kqCyXzJhxGOQ2Kpm2gkt0rywM5owv2RzHDiqWoonKzf4reO2YKcwy7Y6ysGH/d5QsNWadJk6ZMrswu7ma3JcbF3Z2GpVC5YVFxW8azQoJNoP6M2EqNHIr1w4wroWbFfiCacat+/KtLpvaOfKtl5SrQgmeTXGHlXZlNau3GO7u7C4Yve2d9cIv7zrnp806D8lL8op0SUjek3PymVyQEeFe32Petbf0v/s//J/+rxup7zWeF2Qr/N//AAuD6zQ=</latexit>
(nlogb a ) r>1 c < logb a by cost of leaves
Geometric series
・If 0 < r < 1 then 1 + r + r2 + r3 + … + rk ≤ 1 / (1 − r)
・If r = 1 then 1 + r + r2 + r3 + … + rk = k + 1
・If r > 1, then 1 + r + r2 + r3 + … + rk = (rk+1 − 1) / (r − 1)
5
,
y
,
.
fi
.
Divide-and-conquer recurrences: master theorem
Master theorem. Let a ≥ 1, b ≥ 2, and c > 0 and suppose that T (n) is a

function on the non-negative integers that satis es the recurrence
n
T (n) = a T + (nc )
b <latexit sha1_base64="RRWvKmBD1IL0biczVisiLGro+Bk=">AAACb3icbVBtSxtBEN6c2lq1NeoXQZClQUiohLtSqEUEQRA/KiRVyMUwt5lLFvf2jt25YjjuH/lr/Cb6K/wF3bx8MNGB3X145pmdmSfKlLTk+08Vb2l55dPn1S9r6xtfv21Wt7b/2jQ3AtsiVam5icCikhrbJEnhTWYQkkjhdXR3Ns5f/0NjZapbNMqwm8BAy1gKIEf1quetum7w8JifjC/g4SFvhQpj4vUwNiAKXRZRyUMjB0PiTnnoFD+mT9gaIkFd34pGr1rzm/4k+HsQzECNzeKyt1XZDvupyBPUJBRY2wn8jLoFGJJCYbkW5hYzEHcwwI6DGhK03WKycMkPHNPncWrc0cQn7NuKAhJrR4kb/CABGtrF3Jj8KNfJKT7qFlJnOaEW00ZxrjilfOwe70uDgtTIARBGulm5GILziZzHc10mf2co5jYp7nMtRdrHBVbRPRkonYvBomfvQftn808zuPpVOz2a2bnK9th3VmcB+81O2QW7ZG0m2AN7ZM/spfLq7Xr7Hp9KvcqsZofNhdf4D8cluuQ=</latexit>
with T(0) = 0 and T(1) = Θ(1), where n / b means either ⎣n / b⎦ or ⎡n / b⎤. Then
Case 1 If c < logb a, then T (n) = Θ(nlogb a )

Case 2 If c = logb a, then T (n) = Θ(nc log n)
Case 3 If c > logb a, then T (n) = Θ(nc )
Pf sketch
・Prove when b is an integer and n is an exact power of b
・Extend domain of recurrences to reals (or rationals)
・Deal with oors and ceilings. at most 2 extra levels in recursion tree
n/b /b /b < n/b3 + (1/b2 + 1/b + 1)

<latexit sha1_base64="32puyXOtqPUNNUvXXxzCTvaPp5o=">AAACpHicbZHfT9swEMedbEDX8aOFx71YqyggpDbpJq0TPCDxwgsS08iK1ITKca+theNE9mWiivqH7nH/CU4oEi2cFN9H37uz46/jTAqDnvfPcT983Njcqn2qf97e2d1rNPf/mDTXHAKeylTfxcyAFAoCFCjhLtPAkljCIH64LOuDv6CNSNUtzjOIEjZVYiI4QyuNGkUoOQhJV5PqxjTUFXbpe/iSzmj7vFztwP03ekqPfQs9CzaX6wkNw2HHhySqt+3m8Lq3N2q0vI5XBX0L/hJaZBk3o6azH45TniegkEtmzND3MowKplFwCYt6mBvIGH9gUxhaVCwBExWVSwt6aJUxnaTafgpppb6eKFhizDyJbWfCcGbWa6X4Xm2Y46QfFUJlOYLizwdNckkxpaXldCw0cJRzC4xrYf+V8hnTjKN9mJVTqr0z4Cs3KR5zJXg6hjVV4iNqtrAu+uuevYWg1/nZ8X99b130l3bWyBfylRwTn/wgF+SK3JCAcPLf2XIaTtM9cq/d327w3Oo6y5kDshLu/RM03scN</latexit>
n/b3 + 2 6
.
.
.
.
fl
.
fi
.

n
T (n) = a T + (nc )
b
<latexit sha1_base64="RRWvKmBD1IL0biczVisiLGro+Bk=">AAACb3icbVBtSxtBEN6c2lq1NeoXQZClQUiohLtSqEUEQRA/KiRVyMUwt5lLFvf2jt25YjjuH/lr/Cb6K/wF3bx8MNGB3X145pmdmSfKlLTk+08Vb2l55dPn1S9r6xtfv21Wt7b/2jQ3AtsiVam5icCikhrbJEnhTWYQkkjhdXR3Ns5f/0NjZapbNMqwm8BAy1gKIEf1quetum7w8JifjC/g4SFvhQpj4vUwNiAKXRZRyUMjB0PiTnnoFD+mT9gaIkFd34pGr1rzm/4k+HsQzECNzeKyt1XZDvupyBPUJBRY2wn8jLoFGJJCYbkW5hYzEHcwwI6DGhK03WKycMkPHNPncWrc0cQn7NuKAhJrR4kb/CABGtrF3Jj8KNfJKT7qFlJnOaEW00ZxrjilfOwe70uDgtTIARBGulm5GILziZzHc10mf2co5jYp7nMtRdrHBVbRPRkonYvBomfvQftn808zuPpVOz2a2bnK9th3VmcB+81O2QW7ZG0m2AN7ZM/spfLq7Xr7Hp9KvcqsZofNhdf4D8cluuQ=</latexit>

Extensions.
・Can replace Θ with O everywhere
・Can replace Θ with Ω everywhere
・Can replace initial conditions with T(n) = Θ(1) for all n ≤ n0 and
require recurrence to hold only for all n > n0.
7
.
.
.

fi
,

n
T (n) = a T + (nc )
b

Ex 1. T (n) = 3 T(⎣n / 2⎦) + 5 n

・a = 3, b = 2, c = 1, logb a < 1.58.
・T(n) = Θ(nlog 3 ) = O(n1.58).
2
8
.
.
.
.
fi
,

n
T (n) = a T + (nc )
b

ok to intermix oor and ceiling
Ex 2. T (n) = T(⎣n / 2⎦) + T(⎡n / 2⎤) + 17 n

・a = 2, b = 2, c = 1, logb a = 1.
・T (n) = Θ(n log n).
.
.
.
fl
.
fi
,

n
T (n) = a T + (nc )
b

Ex 3. T (n) = 48 T(⎣n / 4⎦) + n3

・a = 48, b = 4, c = 3, logb a > 2.79
・T (n) = Θ(n3).
10
.
.
.
.
fi
,
Master theorem need not apply
Gaps in master theorem.
11
・Number of subproblems is not a constant.

T (n) = n T (n/2) + n2
11

T (n) = n T (n/2) + n2
・Number of subproblems is less than 1.

1
T (n) = T (n/2) + n2
2
11

T (n) = n T (n/2) + n2
・Number of subproblems is less than 1.

1
T (n) = T (n/2) + n2
2
・Work to divide and combine subproblems is not Θ(nc).

<latexit sha1_base64="NFzJ+YRkSM1S9XB/ZQOOUWaF4bk=">AAACT3icbZDNSgMxEMez9bt+VT16CRZFUepuEayIIHjxqGBV6JaSTac1mE2WZFZalr6BT+NVn8Kbb+JJTOsebHUg8M9vZjKZf5RIYdH3P7zC1PTM7Nz8QnFxaXlltbS2fmt1ajjUuZba3EfMghQK6ihQwn1igMWRhLvo8WKYv3sCY4VWN9hPoBmzrhIdwRk61Crt3OyqPRqenoWntErDA+ruh9Uh2XdE0VDqLlXFVqnsV/xR0L8iyEWZ5HHVWvPWw7bmaQwKuWTWNgI/wWbGDAouYVAMUwsJ44+sCw0nFYvBNrPRQgO67UibdrRxRyEd0d8dGYut7ceRq4wZPtjJ3BD+l2uk2Kk1M6GSFEHxn0GdVFLUdOgObQsDHGXfCcaNcH+l/IEZxtF5ODZl9HYCfGyTrJcqwXUbJqjEHho2cC4Gk579FfVq5aQSXB+Vz2u5nfNkk2yRXRKQY3JOLskVqRNOnskLeSVv3rv36X0V8tKCl4sNMhaFhW/957Ap</latexit>
T (n) = 2 T (n/2) + n log n
11
Divide-and-conquer II: quiz 1
Consider the following recurrence. Which case of the master theorem?
(1) n=1
T (n) =
<latexit sha1_base64="cfD19qNLAoyZEyBBTBBK3PkUcFM=">AAACvnicbVFdb9MwFHUyPkb5WDceeTF0oI5JJRlIG5pAQ7zwOKSWTaqjynFuWmuOHdk3U6uoP4Mfx2/hBacLEm25kq2jc+891/c4LZV0GEW/gnDn3v0HD3cfdR4/efpsr7t/8MOZygoYCaOMvU65AyU1jFCiguvSAi9SBVfpzdcmf3UL1kmjh7goISn4VMtcCo6emnR/Dvv6iLJz+qm5OiyFqdS18Ipu2WHnbDgD5P34iL6hDGGONZU5PdS+PD6kS8rYOBqcQpH42vd02GdKgFRUvzuhzDaw0T72yq2Q3hL6/Feow0Bn7eRJtxcNolXQbRC3oEfauJzsBwcsM6IqQKNQ3LlxHJWY1NyiFAr8KpWDkosbPoWxh5oX4JJ65d+SvvZMRnNj/dFIV+y/HTUvnFsUqa8sOM7cZq4h/5cbV5ifJbXUZYWgxd2gvFIUDW0+g2bSgkC18IALK/1bqZhxywX6L1ubstIuQaxtUs8rLYXJYINVOEfLGxfjTc+2wehk8HEQf//Quzhr7dwlL8gr0icxOSUX5Bu5JCMiyO/gZfA2OA6/hNOwCM1daRi0Pc/JWoTzP3z/0Zs=</latexit>
3T ( n/2 ) + (n) n>1
A. Case 1: T(n) = Θ(nlog2 3) = O(n1.585).
B. Case 2: T(n) = Θ(n log n).
C. Case 3: T(n) = Θ(n).
D. Master theorem not applicable.
12
(1) n=1
T (n) =
3T ( n/2 ) + (n) n>1
A. Case 1: T(n) = Θ(nlog2 3) = O(n1.585).
C. Case 3: T(n) = Θ(n).
12
0 n 1
T (n) =
11
<latexit sha1_base64="7VmkhoT1OeXEp4XqE2EINKSyhHk=">AAAC03icbVHLbtNAFB2bVwmPpmXJ5ooIlAoRPEBEpYhSiQ3LIiVtpYwVjSfX6ajjsTUzRomsbBBbPoRP4m8Yu5ZoEu7q6Jz7PDcplLQuiv4E4Z279+4/2HvYefT4ydP97sHhuc1LI3AicpWby4RbVFLjxEmn8LIwyLNE4UVy/aXWL76jsTLXY7cqMM74QstUCu48Nev+Hvf1EbDRJzbqsAQXUlfCt7PrDhtF8AqYw6WDCmQKa9DAFAIFxqYUs9injPtMpSrPDei3Q2CmwUfwGnxbeAPv4Z9Mo9s6Sw0XFaXralj33Rl00ozpMNTzdqFZtxcNoiZgF9AW9EgbZ7OD4JDNc1FmqJ1Q3NopjQoXV9w4KRT6C0uLBRfXfIFTDzXP0MZV4+kaXnpmDqlfPc21g4a9XVHxzNpVlvjMjLsru63V5P+0aenS47iSuigdanEzKC0VuBzqB8FcGhROrTzgwki/K4gr7t1y/o0bU5reBYqNS6plqaXI57jFKrd0htcu0m3PdsH5uwGNBvTbh97pcevnHnlOXpA+oeQjOSVfyRmZEBHsB8PgJPgcTsIq/BH+vEkNg7bmGdmI8Ndf5PTZpg==</latexit>
T ( n/5 ) + T (n 3 n/10 ) + 5 n n>1
A. Case 1: T(n) = Θ(n).
C. Case 3: T(n) = Θ(n).
13
0 n 1
T (n) =
11
<latexit sha1_base64="7VmkhoT1OeXEp4XqE2EINKSyhHk=">AAAC03icbVHLbtNAFB2bVwmPpmXJ5ooIlAoRPEBEpYhSiQ3LIiVtpYwVjSfX6ajjsTUzRomsbBBbPoRP4m8Yu5ZoEu7q6Jz7PDcplLQuiv4E4Z279+4/2HvYefT4ydP97sHhuc1LI3AicpWby4RbVFLjxEmn8LIwyLNE4UVy/aXWL76jsTLXY7cqMM74QstUCu48Nev+Hvf1EbDRJzbqsAQXUlfCt7PrDhtF8AqYw6WDCmQKa9DAFAIFxqYUs9injPtMpSrPDei3Q2CmwUfwGnxbeAPv4Z9Mo9s6Sw0XFaXralj33Rl00ozpMNTzdqFZtxcNoiZgF9AW9EgbZ7OD4JDNc1FmqJ1Q3NopjQoXV9w4KRT6C0uLBRfXfIFTDzXP0MZV4+kaXnpmDqlfPc21g4a9XVHxzNpVlvjMjLsru63V5P+0aenS47iSuigdanEzKC0VuBzqB8FcGhROrTzgwki/K4gr7t1y/o0bU5reBYqNS6plqaXI57jFKrd0htcu0m3PdsH5uwGNBvTbh97pcevnHnlOXpA+oeQjOSVfyRmZEBHsB8PgJPgcTsIq/BH+vEkNg7bmGdmI8Ndf5PTZpg==</latexit>
T ( n/5 ) + T (n 3 n/10 ) + 5 n n>1
A. Case 1: T(n) = Θ(n).
C. Case 3: T(n) = Θ(n).
13
Master theorem
Master theorem. Suppose that T (n) is a function on the nonnegative integers that
satis es the recurrence
n
T (n) = a T + f (n)
b
where n / b means either ⎣ n / b⎦ or ⎡ n / b⎤. Let k = logb a. Then
Case 1. If f (n) = O(nk – ε) for some constant ε > 0, then T (n) = Θ(nk)
Case 2. If f (n) = Θ(nk log p n) for some constant p ≥ 0, then T (n) = Θ(nk log p+1 n)
Case 3. If f (n) = Ω(nk + ε) for some constant ε > 0 and if a f (n / b) ≤ c f (n) for some
constant c < 1 and all suf ciently large n, then T (n) = Θ ( f (n) ).
regularity condition hold
if f(n) = Θ(nk + ε)
14
fi
s
fi
,
‣ master theore
SECTION 5.5
m
Integer addition and subtraction
16
Addition. Given two n-bit integers a and b, compute a + b.
16

Subtraction. Given two n-bit integers a and b, compute a – b.
16

Grade-school algorithm. Θ(n) bit operations. “bit complexity”
1 1 1 1 1 1 0 1
1 1 0 1 0 1 0 1
+ 0 1 1 1 1 1 0 1
1 0 1 0 1 0 0 1 0
16

Grade-school algorithm. Θ(n) bit operations. “bit complexity”
1 1 1 1 1 1 0 1
1 1 0 1 0 1 0 1
+ 0 1 1 1 1 1 0 1
1 0 1 0 1 0 0 1 0
Remark. Grade-school addition and subtraction algorithms are optimal.
16
Integer multiplication
Multiplication. Given two n-bit integers a and b, compute a × b.
17

Grade-school algorithm (long multiplication). Θ(n2) bit operations.
1 1 0 1 0 1 0 1
× 0 1 1 1 1 1 0 1
1 1 0 1 0 1 0 1
0 0 0 0 0 0 0 0
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
0 0 0 0 0 0 0 0
0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1
17

1 1 0 1 0 1 0 1
× 0 1 1 1 1 1 0 1
1 1 0 1 0 1 0 1
0 0 0 0 0 0 0 0
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
0 0 0 0 0 0 0 0
0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1
Conjecture. [Kolmogorov 1956] Grade-school algorithm is optimal.
17

1 1 0 1 0 1 0 1
× 0 1 1 1 1 1 0 1
1 1 0 1 0 1 0 1
0 0 0 0 0 0 0 0
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
1 1 0 1 0 1 0 1
0 0 0 0 0 0 0 0
0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1
Conjecture. [Kolmogorov 1956] Grade-school algorithm is optimal.

Theorem. [Karatsuba 1960] Conjecture is false. 17
Divide-and-conquer multiplication
To multiply two n-bit integers x and y:
18

・Divide x and y into low- and high-order bits.
m=⎡n/2⎤
a = ⎣ x / 2m ⎦ b = x mod 2m
use bit shiftin
c = ⎣ y / 2m ⎦ d = y mod 2m to compute 4 terms
Ex. x = 1 0 0 0 1 1 0 1 y =11100001
a b c d
18
g

・Multiply four ½n-bit integers, recursively.
m=⎡n/2⎤
a = ⎣ x / 2m ⎦ b = x mod 2m
use bit shiftin
x y = (2m a + b) (2m c + d) = 22m ac + 2m (bc + ad) + bd
Ex. x = 1 0 0 0 1 1 0 1 y =11100001
a b c d
18
g

m=⎡n/2⎤
a = ⎣ x / 2m ⎦ b = x mod 2m
use bit shiftin

1
Ex. x = 1 0 0 0 1 1 0 1 y =11100001
a b c d
18
g

m=⎡n/2⎤
a = ⎣ x / 2m ⎦ b = x mod 2m
use bit shiftin

1 2
Ex. x = 1 0 0 0 1 1 0 1 y =11100001
a b c d
18
g

m=⎡n/2⎤
a = ⎣ x / 2m ⎦ b = x mod 2m
use bit shiftin

1 2 3
Ex. x = 1 0 0 0 1 1 0 1 y =11100001
a b c d
18
g

m=⎡n/2⎤
a = ⎣ x / 2m ⎦ b = x mod 2m
use bit shiftin

1 2 3 4
Ex. x = 1 0 0 0 1 1 0 1 y =11100001
a b c d
18
g

・Add and shift to obtain result.
m=⎡n/2⎤
a = ⎣ x / 2m ⎦ b = x mod 2m
use bit shiftin

1 2 3 4
Ex. x = 1 0 0 0 1 1 0 1 y =11100001
a b c d
18
g
MULTIPLY(x, y, n)
_______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
IF (n = 1
RETURN x y
ELSE
m← ⎡ n / 2 ⎤
Θ(n)
a ← ⎣ x / 2m ⎦; b ← x mod 2m.
c ← ⎣ y / 2m ⎦; d ← y mod 2m.
e ← MULTIPLY(a, c, m).
f ← MULTIPLY(b, d, m).
4 T(⎡n / 2⎤)
g ← MULTIPLY(b, c, m).
h ← MULTIPLY(a, d, m).
RETURN 22m e + 2m (g + h) + f. Θ(n)
_______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
19
)
𐄂
.
How many bit operations to multiply two n-bit integers using the
divide-and-conquer multiplication algorithm?
(1) n=1
T (n) =
<latexit sha1_base64="C59SKrult62qTBWLhiGWTGCUjp8=">AAACvnicbVFda9swFJW9ry77aNo+7kVbupGukNml0Jay0bGXPXaQrIXIBFm+TkQl2UjXI8HkZ+zH7bfsZXLqwZLsgsTh3HvP1T1KSyUdRtGvIHzw8NHjJztPO8+ev3i5293b/+6KygoYiUIV9jblDpQ0MEKJCm5LC1ynCm7Suy9N/uYHWCcLM8RFCYnmUyNzKTh6atL9OeybI8ou6cfm6rAUptLUwiu6ZYddsuEMkPfjI/qOMoQ51lTm9ND48viQLilj42hwBjrxtad02GdKgFTUfDihzDaw0T72yq2Q2RL69Feow8Bk7eRJtxcNolXQbRC3oEfauJ7sBfssK0SlwaBQ3LlxHJWY1NyiFAr8KpWDkos7PoWxh4ZrcEm98m9J33omo3lh/TFIV+y/HTXXzi106is1x5nbzDXk/3LjCvPzpJamrBCMuB+UV4piQZvPoJm0IFAtPODCSv9WKmbccoH+y9amrLRLEGub1PPKSFFksMEqnKPljYvxpmfbYHQyuBjE3057V+etnTvkFXlD+iQmZ+SKfCXXZEQE+R28Dt4Hx+HncBrqsLgvDYO254CsRTj/A38N0Zw=</latexit>
4T ( n/2 ) + (n) n>1
A. T(n) = Θ(n1/2).
B. T(n) = Θ(n log n).
C. T(n) = Θ(nlog2 3) = O(n1.585).
D. T(n) = Θ(n2). Case 1 of master theorem
20
How many bit operations to multiply two n-bit integers using the
divide-and-conquer multiplication algorithm?
(1) n=1
T (n) =
<latexit sha1_base64="C59SKrult62qTBWLhiGWTGCUjp8=">AAACvnicbVFda9swFJW9ry77aNo+7kVbupGukNml0Jay0bGXPXaQrIXIBFm+TkQl2UjXI8HkZ+zH7bfsZXLqwZLsgsTh3HvP1T1KSyUdRtGvIHzw8NHjJztPO8+ev3i5293b/+6KygoYiUIV9jblDpQ0MEKJCm5LC1ynCm7Suy9N/uYHWCcLM8RFCYnmUyNzKTh6atL9OeybI8ou6cfm6rAUptLUwiu6ZYddsuEMkPfjI/qOMoQ51lTm9ND48viQLilj42hwBjrxtad02GdKgFTUfDihzDaw0T72yq2Q2RL69Feow8Bk7eRJtxcNolXQbRC3oEfauJ7sBfssK0SlwaBQ3LlxHJWY1NyiFAr8KpWDkos7PoWxh4ZrcEm98m9J33omo3lh/TFIV+y/HTXXzi106is1x5nbzDXk/3LjCvPzpJamrBCMuB+UV4piQZvPoJm0IFAtPODCSv9WKmbccoH+y9amrLRLEGub1PPKSFFksMEqnKPljYvxpmfbYHQyuBjE3057V+etnTvkFXlD+iQmZ+SKfCXXZEQE+R28Dt4Hx+HncBrqsLgvDYO254CsRTj/A38N0Zw=</latexit>
4T ( n/2 ) + (n) n>1
A. T(n) = Θ(n1/2).
B. T(n) = Θ(n log n).
C. T(n) = Θ(nlog2 3) = O(n1.585).
D. T(n) = Θ(n2). Case 1 of master theorem
20
Karatsuba trick
21
Karatsuba trick
21
Karatsuba trick

x =10001101
m=⎡n/2⎤
a b
a = ⎣ x / 2m ⎦ b = x mod 2m
m m
y =11100001
c = ⎣ y / 2 ⎦ d = y mod 2
c d
21
Karatsuba trick

x =10001101
m=⎡n/2⎤
a b
a = ⎣ x / 2m ⎦ b = x mod 2m
m m
y =11100001
c = ⎣ y / 2 ⎦ d = y mod 2
c d
x y = (2m a + b) (2m c + d) = 22m ac + 2m (bc + ad ) + bd
21
Karatsuba trick

・To compute middle term bc + ad, use identity:
bc + ad = ac + bd – (a – b) (c – d)
x =10001101
m=⎡n/2⎤
a b
a = ⎣ x / 2m ⎦ b = x mod 2m
m m
middle term y =11100001
c = ⎣ y / 2 ⎦ d = y mod 2
c d
21
Karatsuba trick

bc + ad = ac + bd – (a – b) (c – d)
x =10001101
m=⎡n/2⎤
a b
a = ⎣ x / 2m ⎦ b = x mod 2m
m m
c = ⎣ y / 2 ⎦ d = y mod 2
c d
= 22m ac + 2m (ac + bd – (a – b)(c – d)) + bd
21
Karatsuba trick

bc + ad = ac + bd – (a – b) (c – d)
x =10001101
m=⎡n/2⎤
a b
a = ⎣ x / 2m ⎦ b = x mod 2m
m m
c = ⎣ y / 2 ⎦ d = y mod 2
c d
= 22m ac + 2m (ac + bd – (a – b)(c – d)) + bd
1
21
Karatsuba trick

bc + ad = ac + bd – (a – b) (c – d)
x =10001101
m=⎡n/2⎤
a b
a = ⎣ x / 2m ⎦ b = x mod 2m
m m
c = ⎣ y / 2 ⎦ d = y mod 2
c d
= 22m ac + 2m (ac + bd – (a – b)(c – d)) + bd
1 2
21
Karatsuba trick

bc + ad = ac + bd – (a – b) (c – d)
x =10001101
m=⎡n/2⎤
a b
a = ⎣ x / 2m ⎦ b = x mod 2m
m m
c = ⎣ y / 2 ⎦ d = y mod 2
c d
= 22m ac + 2m (ac + bd – (a – b)(c – d)) + bd
1 2 3
21
Karatsuba trick

bc + ad = ac + bd – (a – b) (c – d)
x =10001101
m=⎡n/2⎤
a b
a = ⎣ x / 2m ⎦ b = x mod 2m
m m
c = ⎣ y / 2 ⎦ d = y mod 2
c d
= 22m ac + 2m (ac + bd – (a – b)(c – d)) + bd
1 1 2 3
21
Karatsuba trick

bc + ad = ac + bd – (a – b) (c – d)
x =10001101
m=⎡n/2⎤
a b
a = ⎣ x / 2m ⎦ b = x mod 2m
m m
c = ⎣ y / 2 ⎦ d = y mod 2
c d
= 22m ac + 2m (ac + bd – (a – b)(c – d)) + bd
1 1 3 2 3
21
Karatsuba trick

bc + ad = ac + bd – (a – b) (c – d)
・Multiply only three ½n-bit integers, recursively.

x =10001101
m=⎡n/2⎤
a b
a = ⎣ x / 2m ⎦ b = x mod 2m
m m
c = ⎣ y / 2 ⎦ d = y mod 2
c d
= 22m ac + 2m (ac + bd – (a – b)(c – d)) + bd
1 1 3 2 3
21
Karatsuba multiplication
KARATSUBA-MULTIPLY(x, y, n)
_______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
IF (n = 1
RETURN x y
ELSE
m← ⎡n/2⎤
Θ(n)
a ← ⎣ x / 2m ⎦; b ← x mod 2m.
c ← ⎣ y / 2m ⎦; d ← y mod 2m.
e ← KARATSUBA-MULTIPLY(a, c, m).
f ← KARATSUBA-MULTIPLY(b, d, m). 3 T(⎡n / 2⎤)
g ← KARATSUBA-MULTIPLY(⎢a – b ⎢, ⎢c – d ⎢, m)
Flip sign of g if needed.
RETURN 22m e + 2m (e + f – g) + f Θ(n)
_________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
22
)
𐄂
.
Karatsuba analysis
Proposition. Karatsuba’s algorithm requires O(n1.585) bit operations to multiply two n-

bit integers.
23
Karatsuba analysis

bit integers.
Pf. Apply Case 1 of the master theorem to the recurrence:
(1) n=1
T (n) =
3T ( n/2 ) + (n) n>1
23
Karatsuba analysis

bit integers.
(1) n=1
T (n) =
3T ( n/2 ) + (n) n>1
) + (n) = T (n) = (nlog2 3 ) = O(n1.585 )
23
Karatsuba analysis

bit integers.
(1) n=1
T (n) =
3T ( n/2 ) + (n) n>1
) + (n) = T (n) = (nlog2 3 ) = O(n1.585 )
Practice
・Use base 32 or 64 (instead of base 2)
・Faster than grade-school algorithm for about 320–640 bits.
23
.
Integer arithmetic reductions
Integer multiplication. Given two n-bit integers, compute their product.
arithmetic problem formula bit complexity
integer multiplication a×b M(n)
integer arithmetic problems with the same bit complexity M(n) as integer multiplication
24
integer multiplication a×b M(n)
integer square a2 Θ(M(n))
24
integer multiplication a×b M(n) (a + b)2 a2 b2

ab =
<latexit sha1_base64="vEhGWAG7lUjl6rsRHboNGxBfdYk=">AAACTXicdVBNa9tAEF05bfPVDyc55rLULaSUCkmxY+cQCPTSYwp1E7AdM1qPkiWrldgdlRjhP5Bfk2vyK3rtH8kplK4UB+rQPtjl8d7Mzs6LcyUtBcEvr7H07PmL5ZXVtfWXr16/aW5sfrdZYQT2RaYycxKDRSU19kmSwpPcIKSxwuP44nPlH/9AY2Wmv9E0x1EKZ1omUgA5adx8BzE/4MPEgCh3gH/k8YfTiH/iUN/xaTQro9m42Qr8zt5ut9fmgR/ud3ajqCLtTq/b5qEf1GixOY7GG97mcJKJIkVNQoG1gzDIaVSCISkUztaGhcUcxAWc4cBRDSnaUVmvM+PvnTLhSWbc0cRr9e+OElJrp2nsKlOgc/vUq8R/eYOCkt6olDovCLV4GJQUilPGq2z4RBoUpKaOgDDS/ZWLc3DRkEtwYUr9do5iYZPystBSZBN8oiq6JANVio9R8f+TfuTv++HXduuwN49zhW2zt2yHhazLDtkXdsT6TLArds1u2K3307vz7r3fD6UNb96zxRbQWP4DfheyBQ==</latexit>
2
24

ab =
2
integer division ⎣a / b⎦, a mod b Θ(M(n))
24

ab =
2
integer division ⎣a / b⎦, a mod b Θ(M(n))
integer square root ⎣√a ⎦ Θ(M(n))
24
History of asymptotic complexity of integer multiplication
year algorithm bit operations
12xx grade school O (n2)
1962 Karatsuba–Ofman O(n1.585)
number of bit operations to multiply two n-bit integers
25
1963 Toom-3, Toom-4 O (n1.465), O (n1.404)
25
1963 Toom-3, Toom-4 O (n1.465), O (n1.404)
1966 Toom–Cook O (n1 + ε)
25
1963 Toom-3, Toom-4 O (n1.465), O (n1.404)
1971 Schönhage–Strassen O (n log n ⋅ log log n)
25
1963 Toom-3, Toom-4 O (n1.465), O (n1.404)
2007 Fürer n log n 2 O(log*n)
25
1963 Toom-3, Toom-4 O (n1.465), O (n1.404)
2019 Harvey–van der Hoeven O (n log n)
25
1963 Toom-3, Toom-4 O (n1.465), O (n1.404)
O (n)
25
1963 Toom-3, Toom-4 O (n1.465), O (n1.404)
O (n)
Remark. GNU Multiple Precision library uses one of rs

ve algorithms depending on n.
used in Maple, Mathematica, gcc, cryptography, ...
25
fi
fi
t
‣ master theore
SECTION 4.2
m
Dot product
Dot product. Given two length-n vectors a and b, compute c = a ⋅ b.
n
a·b = a i bi
i=1
a = [ .70 .20 .10 ]

b = [ .30 .40 .30 ]
a ⋅ b = (.70 × .30) + (.20 × .40) + (.10 × .30) = .32
27
Dot product

Grade-school. Θ(n) arithmetic operations.
n
a·b = a i bi
i=1
a = [ .70 .20 .10 ]

b = [ .30 .40 .30 ]
a ⋅ b = (.70 × .30) + (.20 × .40) + (.10 × .30) = .32
27
Dot product

Grade-school. Θ(n) arithmetic operations.
n
a·b = a i bi
i=1
a = [ .70 .20 .10 ]

b = [ .30 .40 .30 ]
a ⋅ b = (.70 × .30) + (.20 × .40) + (.10 × .30) = .32
Remark. “Grade-school” dot product algorithm is asymptotically optimal.
27
Matrix multiplication
Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB.

n
cij = aik bkj
k=1
"c11 c12 ! c1n % "a11 a12 ! a1n % "b11 b12 ! b1n %

$ ' $ ' $ '
$c21 c22 ! c2n ' $a 21 a 22 ! a 2n ' $b21 b22 ! b2n '
= ×
$" " # "' $" " # " ' $" " # "'
$ ' $ ' $ '
#cn1 cn2 ! cnn & #a n1 a n2 ! a nn & #bn1 bn2 ! bnn &
€
".59 .32 .41% ".70 .20 .10% " .80 .30 .50%
$ ' $ ' $ '
$ .31 .36 .25 ' = $ .30 .60 .10 ' × $ .10 .40 .10 '
$#.45 .31 .42'& $#.50 .10 .40'& $# .10 .30 .40'&
28

Grade-school. Θ(n3) arithmetic operations. n
cij = aik bkj
k=1
"c11 c12 ! c1n % "a11 a12 ! a1n % "b11 b12 ! b1n %

$ ' $ ' $ '
$c21 c22 ! c2n ' $a 21 a 22 ! a 2n ' $b21 b22 ! b2n '
= ×
$" " # "' $" " # " ' $" " # "'
$ ' $ ' $ '
€
".59 .32 .41% ".70 .20 .10% " .80 .30 .50%
$ ' $ ' $ '
$ .31 .36 .25 ' = $ .30 .60 .10 ' × $ .10 .40 .10 '
$#.45 .31 .42'& $#.50 .10 .40'& $# .10 .30 .40'&
28

Grade-school. Θ(n3) arithmetic operations. n
cij = aik bkj
k=1
"c11 c12 ! c1n % "a11 a12 ! a1n % "b11 b12 ! b1n %

$ ' $ ' $ '
$c21 c22 ! c2n ' $a 21 a 22 ! a 2n ' $b21 b22 ! b2n '
= ×
$" " # "' $" " # " ' $" " # "'
$ ' $ ' $ '
€
".59 .32 .41% ".70 .20 .10% " .80 .30 .50%
$ ' $ ' $ '
$ .31 .36 .25 ' = $ .30 .60 .10 ' × $ .10 .40 .10 '
$#.45 .31 .42'& $#.50 .10 .40'& $# .10 .30 .40'&
Q. Is “grade-school” matrix multiplication algorithm asymptotically optimal?
28
Block matrix multiplication
A11 A12 B11

C11
" 152 158 164 170 % " 0 1 2 3 % "16 17 18 19 %

$ ' $ ' $ '
$ 504 526 548 570 ' = $ 4 5 6 7 ' × $20 21 22 23'
$ 856 894 932 970 ' $ 8 9 10 11' $24 25 26 27'
$1208 1262 1316 1370' $12 13 14 15' $28
# & # & # 29 30 31'&
B21
€
#0 1& #16 17& #2 3& #24 25& #152 158 &

C 11 = A11 × B11 + A12 × B21 = % (× % ( + % (× % ( = % (
$ 4 5' $ 20 21' $6 7' $ 28 29' $504 526 '
€
29
Block matrix multiplication: warmup
To multiply two n-by-n matrices A and B:
n-by-n matrices
C = A B
30

・Divide: partition A and B into ½n-by-½n blocks.
n-by-n matrices
C = A B
"C11 C12 % " A11 A12 % " B11 B12 %

$ ' = $ '× $ '
#C21 C22 & # A21 A22 & # B21 B22 &
½n-by-½n matrices
€
30

・Conquer: multiply 8 pairs of ½n-by-½n matrices, recursively.
8 matrix multiplication
n-by-n matrices (of ½n-by-½n matrices)
C = A B
C11 = ( A11 × B11 ) + ( A12 × B21 )
"C11 C12 % " A11 A12 % " B11 B12 % C12 = ( A11 × B12 ) + ( A12 × B22 )
$ ' = $ '× $ ' C21 = ( A21 × B11 ) + ( A22 × B21 )
#C21 C22 & # A21 A22 & # B21 B22 &
C22 = ( A21 × B12 ) + ( A22 × B22 )
½n-by-½n matrices
€ €
30
s

・Combine: add appropriate products using 4 matrix additions.
C = A B
C11 = ( A11 × B11 ) + ( A12 × B21 )
"C11 C12 % " A11 A12 % " B11 B12 % C12 = ( A11 × B12 ) + ( A12 × B22 )
$ ' = $ '× $ ' C21 = ( A21 × B11 ) + ( A22 × B21 )
#C21 C22 & # A21 A22 & # B21 B22 &
C22 = ( A21 × B12 ) + ( A22 × B22 )
½n-by-½n matrices
€ € 4 matrix addition
(of ½n-by-½n matrices)
30
s

・Combine: add appropriate products using 4 matrix additions.
C = A B
C11 = ( A11 × B11 ) + ( A12 × B21 )
"C11 C12 % " A11 A12 % " B11 B12 % C12 = ( A11 × B12 ) + ( A12 × B22 )
$ ' = $ '× $ ' C21 = ( A21 × B11 ) + ( A22 × B21 )
#C21 C22 & # A21 A22 & # B21 B22 &
C22 = ( A21 × B12 ) + ( A22 × B22 )
½n-by-½n matrices
€ € 4 matrix addition
Running time. Apply Case 1 of the master theorem.
T (n) = 8T (n /2 ) + Θ(n 2 ) ⇒ T (n) = Θ(n 3 )

!#"# $ !##"##$
recursive calls add, form submatrices 30
s
Strassen’s trick
Key idea. Can multiply two 2-by-2 matrices via 7 scalar multiplications
(plus 11 additions and 7 subtractions).
scalars
"C11 C12 % " A11 A12 % " B11 B12 %

$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 &
31
Strassen’s trick
scalars
"C11 C12 % " A11 A12 % " B11 B12 % P1 ← A11 (B12 – B22
$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 & P2 ← (A11 + A12) B22
P3 ← (A21 + A22) B11
P4 ← A22 (B21 – B11
€
P5 ← (A11 + A22) (B11 + B22
P6 ← (A12 – A22) (B21 + B22
P7 ← (A11 – A21) (B11 + B12)
7 scalar multiplications
31
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
)
Strassen’s trick
scalars
"C11 C12 % " A11 A12 % " B11 B12 % P1 ← A11 (B12 – B22
$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 & P2 ← (A11 + A12) B22
P3 ← (A21 + A22) B11
P4 ← A22 (B21 – B11
€ C11 = P5 + P4 – P2 + P
C12 = P1 + P2 P5 ← (A11 + A22) (B11 + B22
C21 = P3 + P4 P6 ← (A12 – A22) (B21 + B22

C22 = P1 + P5 – P3 – P7 P7 ← (A11 – A21) (B11 + B12)
7 scalar multiplications
31
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
)
Strassen’s trick
scalars
"C11 C12 % " A11 A12 % " B11 B12 % P1 ← A11 (B12 – B22
$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 & P2 ← (A11 + A12) B22
P3 ← (A21 + A22) B11
P4 ← A22 (B21 – B11
€ C11 = P5 + P4 – P2 + P
C12 = P1 + P2 P5 ← (A11 + A22) (B11 + B22
C21 = P3 + P4 P6 ← (A12 – A22) (B21 + B22

C22 = P1 + P5 – P3 – P7 P7 ← (A11 – A21) (B11 + B12)
Pf. C12 = P1 + P2 7 scalar multiplications
31
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
)
Strassen’s trick
scalars
"C11 C12 % " A11 A12 % " B11 B12 % P1 ← A11 (B12 – B22
$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 & P2 ← (A11 + A12) B22
P3 ← (A21 + A22) B11
P4 ← A22 (B21 – B11
€ C11 = P5 + P4 – P2 + P
C12 = P1 + P2 P5 ← (A11 + A22) (B11 + B22
C21 = P3 + P4 P6 ← (A12 – A22) (B21 + B22

C22 = P1 + P5 – P3 – P7 P7 ← (A11 – A21) (B11 + B12)

= A11 (B12 – B22) + (A11 + A12) B22
31
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
)
𐄂
Strassen’s trick
scalars
"C11 C12 % " A11 A12 % " B11 B12 % P1 ← A11 (B12 – B22
$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 & P2 ← (A11 + A12) B22
P3 ← (A21 + A22) B11
P4 ← A22 (B21 – B11
€ C11 = P5 + P4 – P2 + P
C12 = P1 + P2 P5 ← (A11 + A22) (B11 + B22
C21 = P3 + P4 P6 ← (A12 – A22) (B21 + B22

C22 = P1 + P5 – P3 – P7 P7 ← (A11 – A21) (B11 + B12)

= A11 (B12 – B22) + (A11 + A12) B22
= A11 B12 + A12 B22. ✔
31
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
𐄂
)
𐄂
)
𐄂
Strassen’s trick
n-by-n ½n-by-½n matrix
½n-by-½n matrices
"C11 C12 % " A11 A12 % " B11 B12 %

$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 &
32
Strassen’s trick
½n-by-½n matrices
"C11 C12 % " A11 A12 % " B11 B12 % P1 ← A11 (B12 – B22
$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 & P2 ← (A11 + A12) B22
P3 ← (A21 + A22) B11
P4 ← A22 (B21 – B11
€
P5 ← (A11 + A22) (B11 + B22
P6 ← (A12 – A22) (B21 + B22
P7 ← (A11 – A21) (B11 + B12)
32
𐄂
𐄂
s
𐄂
𐄂
𐄂
𐄂
𐄂
)
Strassen’s trick
½n-by-½n matrices
"C11 C12 % " A11 A12 % " B11 B12 % P1 ← A11 (B12 – B22
$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 & P2 ← (A11 + A12) B22
P3 ← (A21 + A22) B11
P4 ← A22 (B21 – B11
€ C11 = P5 + P4 – P2 + P
C12 = P1 + P2 P5 ← (A11 + A22) (B11 + B22
C21 = P3 + P4 P6 ← (A12 – A22) (B21 + B22

C22 = P1 + P5 – P3 – P7 P7 ← (A11 – A21) (B11 + B12)
32
𐄂
𐄂
s
𐄂
𐄂
𐄂
𐄂
𐄂
)
Strassen’s trick
½n-by-½n matrices
"C11 C12 % " A11 A12 % " B11 B12 % P1 ← A11 (B12 – B22
$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 & P2 ← (A11 + A12) B22
P3 ← (A21 + A22) B11
P4 ← A22 (B21 – B11
€ C11 = P5 + P4 – P2 + P
C12 = P1 + P2 P5 ← (A11 + A22) (B11 + B22
C21 = P3 + P4 P6 ← (A12 – A22) (B21 + B22

C22 = P1 + P5 – P3 – P7 P7 ← (A11 – A21) (B11 + B12)
Pf. C12 = P1 + P2 7 matrix multiplication

32
𐄂
𐄂
s
𐄂
𐄂
𐄂
𐄂
𐄂
)
Strassen’s trick
½n-by-½n matrices
"C11 C12 % " A11 A12 % " B11 B12 % P1 ← A11 (B12 – B22
$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 & P2 ← (A11 + A12) B22
P3 ← (A21 + A22) B11
P4 ← A22 (B21 – B11
€ C11 = P5 + P4 – P2 + P
C12 = P1 + P2 P5 ← (A11 + A22) (B11 + B22
C21 = P3 + P4 P6 ← (A12 – A22) (B21 + B22

C22 = P1 + P5 – P3 – P7 P7 ← (A11 – A21) (B11 + B12)

= A11 (B12 – B22) + (A11 + A12) B22 (of ½n-by-½n matrices)
32
𐄂
𐄂
𐄂
s
𐄂
𐄂
𐄂
𐄂
𐄂
)
𐄂
Strassen’s trick
½n-by-½n matrices
"C11 C12 % " A11 A12 % " B11 B12 % P1 ← A11 (B12 – B22
$ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 & P2 ← (A11 + A12) B22
P3 ← (A21 + A22) B11
P4 ← A22 (B21 – B11
€ C11 = P5 + P4 – P2 + P
C12 = P1 + P2 P5 ← (A11 + A22) (B11 + B22
C21 = P3 + P4 P6 ← (A12 – A22) (B21 + B22

C22 = P1 + P5 – P3 – P7 P7 ← (A11 – A21) (B11 + B12)

= A11 (B12 – B22) + (A11 + A12) B22 (of ½n-by-½n matrices)
= A11 B12 + A12 B22. ✔

32
𐄂
𐄂
𐄂
𐄂
s
𐄂
𐄂
𐄂
𐄂
𐄂
)
𐄂
)
𐄂
Strassen’s algorithm
assume n is a power of 2
STRASSEN(n, A, B)
______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
IF (n = 1) RETURN A B.
Partition A and B into ½n-by-½n blocks
P1 ← STRASSEN(n / 2, A11, (B12 – B22))
P2 ← STRASSEN(n / 2, (A11 + A12), B22)
P3 ← STRASSEN(n / 2, (A21 + A22), B11)
P4 ← STRASSEN(n / 2, A22, (B21 – B11)) 7 T(n / 2) + Θ(n2)
P5 ← STRASSEN(n / 2, (A11 + A22), (B11 + B22))

P6 ← STRASSEN(n / 2, (A12 – A22), (B21 + B22))
P7 ← STRASSEN(n / 2, (A11 – A21), (B11 + B12))
C11 = P5 + P4 – P2 + P6.
C12 = P1 + P2
Θ(n2)
C21 = P3 + P4
C22 = P1 + P5 – P3 – P7
"C11 C12 % " A11 A12 % " B11 B12 %
RETURN C. $ ' = $ ' × $ '
#C21 C22 & # A21 A22 & # B21 B22 &
33
.
𐄂
.
Analysis of Strassen’s algorithm
Numer. Math. t3, 354--356 (t969)
Gaussian Elimination is not Optimal

VOLKER ~TRASSEN*
Received December 12, t 968
t. Below we will give an algorithm which computes the coefficients of the

product of two square matrices A and B of order n from the coefficients of A
and B with tess t h a n 4 . 7 - n l°g7 arithmetical operations (all logarithms in this
paper are for base 2, thus tog 7 ~ 2.8; the usual m e t h o d requires approximately
2n 3 arithmetical operations). The algorithm induces algorithms for inverting a
matrix of order n, solving a system of n linear equations in n unknowns, com-
puting a determinant of order n etc. all requiring less than const n l°g 7 arithmetical
operations.
This fact should be compared with the result of KLYUYEV and KOKOVKIN-
SHCHERBAK [1 ] that Gaussian elimination for solving a system of linearequations
is optimal if one restricts oneself to operations upon rows and columns as a
whole. We also note t h a t WlNOGRAD [21 modifies the usual algorithms for matrix
multiplication and inversion and for solving systems of linear equations, trading
roughly half of the multiplications for additions and subtractions.
It is a pleasure to thank D. BRILLINGERfor inspiring discussions about the present
subject and ST. COOK and B. PARLETT for encouraging me to write this paper.
2. We define algorithms e~, ~ which multiply matrices of order m 2 ~, b y in-
duction on k: ~ , 0 is the usual algorithm, for matrix multiplication (requiring
m a multiplications and m 2 ( m - t) additions), e~,k already being known, define
~ , ~+t as follows:
34
If A, B are matrices of order m 2 k ~ to be multiplied, write
Theorem. Strassen’s algorithm requires O(n2.81) arithmetic operations to multiply two

n-by-n matrices.
Numer. Math. t3, 354--356 (t969)
Gaussian Elimination is not Optimal

VOLKER ~TRASSEN*
Received December 12, t 968
t. Below we will give an algorithm which computes the coefficients of the

product of two square matrices A and B of order n from the coefficients of A
and B with tess t h a n 4 . 7 - n l°g7 arithmetical operations (all logarithms in this
paper are for base 2, thus tog 7 ~ 2.8; the usual m e t h o d requires approximately
2n 3 arithmetical operations). The algorithm induces algorithms for inverting a
matrix of order n, solving a system of n linear equations in n unknowns, com-
puting a determinant of order n etc. all requiring less than const n l°g 7 arithmetical
operations.
This fact should be compared with the result of KLYUYEV and KOKOVKIN-
SHCHERBAK [1 ] that Gaussian elimination for solving a system of linearequations
is optimal if one restricts oneself to operations upon rows and columns as a
whole. We also note t h a t WlNOGRAD [21 modifies the usual algorithms for matrix
multiplication and inversion and for solving systems of linear equations, trading
roughly half of the multiplications for additions and subtractions.
It is a pleasure to thank D. BRILLINGERfor inspiring discussions about the present
subject and ST. COOK and B. PARLETT for encouraging me to write this paper.
2. We define algorithms e~, ~ which multiply matrices of order m 2 ~, b y in-
duction on k: ~ , 0 is the usual algorithm, for matrix multiplication (requiring
m a multiplications and m 2 ( m - t) additions), e~,k already being known, define
~ , ~+t as follows:
34
If A, B are matrices of order m 2 k ~ to be multiplied, write

n-by-n matrices.
Pf.
・When n is a power of 2, apply Case 1 of the master theorem:
35

n-by-n matrices.
Pf.
T (n) = 7T (n /2 ) + Θ(n 2 ) ⇒ T (n) = Θ(n log2 7 ) = O(n 2.81 )
!#"# $ !#"#$
recursive calls add, subtract
35

n-by-n matrices.
Pf.
T (n) = 7T (n /2 ) + Θ(n 2 ) ⇒ T (n) = Θ(n log2 7 ) = O(n 2.81 )
!#"# $ !#"#$
recursive calls add, subtract
・€When n is not a power of 2, pad matrices with zeros to be nʹ-by-nʹ,

where n ≤ nʹ < 2n and nʹ is a power of 2.
1 2 3 0 10 11 12 0 84 90 96 0
4 5 6 0 13 14 15 0 201 216 231 0
=
7 8 9 0 16 17 18 0 318 342 366 0
0 0 0 0 0 0 0 0 0 0 0 0
35
Strassen’s algorithm: practice
Implementation issues
・Sparsity
・Caching
・n not a power of 2
・Numerical stability
・Non-square matrices
・Storage for intermediate submatrices
・Crossover to classical algorithm when n is “small.”
・Parallelism for multi-core and many-core architectures
Common misperception. “Strassen’s algorithm is only a theoretical curiosity.”
・Apple reports 8x speedup when n ≈ 2,048
・Range of instances where it’s useful is a subject of controversy.
Strassen’s Algorithm Reloaded
Jianyu Huang⇤ , Tyler M. Smith⇤† , Greg M. Henry‡ , Robert A. van de Geijn⇤†
⇤ Department of Computer Science and † Institute for Computational Engineering and Sciences,
The University of Texas at Austin, Austin, TX 78712
Email: jianyu,tms,rvdg@cs.utexas.edu
‡ Intel Corporation, Hillsboro, OR 97124
Email: greg.henry@intel.com
Abstract—We dispel with “street wisdom” regarding the that can be computed and makes it so an implementation is not
36
practical implementation of Strassen’s algorithm for matrix- plug-compatible with the standard calling sequence supported
.
Suppose that you could multiply two 3-by-3 matrices with 21 scalar
multiplications. How fast could you multiply two n-by-n matrices?
A. Θ(n log3 21
B. Θ(n log2 21
C. Θ(n log9 21
Θ (n log3 21 ) = O(n 2.77 )
D. Θ(n 2)
37
)
Suppose that you could multiply two 3-by-3 matrices with 21 scalar
multiplications. How fast could you multiply two n-by-n matrices?
A. Θ(n log3 21
B. Θ(n log2 21
C. Θ(n log9 21
Θ (n log3 21 ) = O(n 2.77 )
D. Θ(n 2)
37
)
Is it possible to multiply two 3-by-3 matrices using only 21 scalar

multiplications?
A. Yes.
B. No.
C. Unknown.
38
Is it possible to multiply two 3-by-3 matrices using only 21 scalar

multiplications?
A. Yes.
B. No.
C. Unknown.
38
Fast matrix multiplication: theory
Q. Multiply two 2-by-2 matrices with 7 scalar multiplications?

A. Yes! [Strassen 1969 Θ(nlog2 7 ) = O(n2.81)
39
]

39
]


A. Impossible. [Hopcroft–Kerr, Winograd 1971 Θ(nlog2 6 ) = O(n2.59)
39
]
]


Begun, the decimal wars have. [Pan 1978, Bini et al., Schönhage, …]
39
]
]


・Two 70-by-70 matrices with 143,640 scalar multiplications O(n2.7962)
39
]
]
.


・Two 48-by-48 matrices with 47,217 scalar multiplications. O(n2.7801)
39
]
]

.


・A year later. O(n2.7799)
39

]
]

.


・December 1979. O(n2.521813)
39

]
]

.


・December 1979. O(n2.521813)
・January 1980 O(n2.521801)
39

.

]
]

.
History of arithmetic complexity of matrix multiplication
year algorithm arithmetic operations
1858 “grade school” O (n 3 )
1969 Strassen O (n 2.808 )
1978 Pan O (n 2.796 )
1979 Bini O (n 2.780 )
1981 Schönhage O (n 2.522 )
number of arithmetic operations to multiply two n-by-n matrices 40

1969 Strassen O (n 2.808 )
1978 Pan O (n 2.796 )
1979 Bini O (n 2.780 )
1982 Romani O (n 2.517 )

1969 Strassen O (n 2.808 )
1978 Pan O (n 2.796 )
1979 Bini O (n 2.780 )
1982 Romani O (n 2.517 )
1982 Coppersmith–Winograd O (n 2.496 )

1969 Strassen O (n 2.808 )
1978 Pan O (n 2.796 )
1979 Bini O (n 2.780 )
1982 Romani O (n 2.517 )
1986 Strassen O (n 2.479 )

1969 Strassen O (n 2.808 )
1978 Pan O (n 2.796 )
1979 Bini O (n 2.780 )
1982 Romani O (n 2.517 )
1986 Strassen O (n 2.479 )

1969 Strassen O (n 2.808 )
1978 Pan O (n 2.796 )
1979 Bini O (n 2.780 )
1982 Romani O (n 2.517 )
1986 Strassen O (n 2.479 )
2010 Strother O (n 2.3737 )

1969 Strassen O (n 2.808 )
1978 Pan O (n 2.796 )
1979 Bini O (n 2.780 )
1982 Romani O (n 2.517 )
1986 Strassen O (n 2.479 )
2010 Strother O (n 2.3737 )
2011 Williams O (n 2.372873 )

1969 Strassen O (n 2.808 )
1978 Pan O (n 2.796 )
1979 Bini O (n 2.780 )
1982 Romani O (n 2.517 )
1986 Strassen O (n 2.479 )
2010 Strother O (n 2.3737 )
2011 Williams O (n 2.372873 )
2014 Le Gall O (n 2.372864 )

1969 Strassen O (n 2.808 )
1978 Pan O (n 2.796 )
1979 Bini O (n 2.780 )
1982 Romani O (n 2.517 )
1986 Strassen O (n 2.479 )
2010 Strother O (n 2.3737 )
2011 Williams O (n 2.372873 )
2014 Le Gall O (n 2.372864 )
O (n 2 + ε )

1969 Strassen O (n 2.808 )
1978 Pan O (n 2.796 )
1979 Bini O (n 2.780 )
1982 Romani O (n 2.517 )
1982 Coppersmith–Winograd O (n 2.496 ) galacti

algorithms
1986 Strassen O (n 2.479 )
2010 Strother O (n 2.3737 )
2011 Williams O (n 2.372873 )
2014 Le Gall O (n 2.372864 )
O (n 2 + ε )

c
Numeric linear algebra reductions
Matrix multiplication. Given two n-by-n matrices, compute their product.
linear algebra problem expression arithmetic complexity
matrix multiplication A×B MM(n)
matrix squaring A2 Θ(MM(n))
matrix inversion A –1 Θ(MM(n))
determinant ⎢A ⎢ Θ(MM(n))
rank rank(A) Θ(MM(n))
system of linear equations Ax = b Θ(MM(n))
LU decomposition A = LU Θ(MM(n))
least squares min ⎢⎢Ax – b ⎢⎢2 Θ(MM(n))
numerical linear algebra problems with the sam

arithmetic complexity MM(n) as matrix multiplication 41
e

Lec7 DivideAndConquerII

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lec7 DivideAndConquerII

Uploaded by

Copyright:

Available Formats

D IVIDE AND C ONQUER II

D IVIDE AND C ONQUER II

Goal. Recipe for solving common divide-and-conquer recurrences:

with T(0) = 0 and T(1) = Θ(1).

Goal. Recipe for solving common divide-and-conquer recurrences:

with T(0) = 0 and T(1) = Θ(1).

Goal. Recipe for solving common divide-and-conquer recurrences:

with T(0) = 0 and T(1) = Θ(1).

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Suppose T (n) satis es T (n) = a T (n / b) + n c with T (1) = 1, for n a power of b.

Let r = a / b c. Note that r < 1 iff c > logb a

Divide-and-conquer recurrences: master theorem

Master theorem. Let a ≥ 1, b ≥ 2, and c > 0 and suppose that T (n) is a

Case 1 If c < logb a, then T (n) = Θ(nlogb a )

n/b /b /b < n/b3 + (1/b2 + 1/b + 1)

Divide-and-conquer recurrences: master theorem

Master theorem. Let a ≥ 1, b ≥ 2, and c > 0 and suppose that T (n) is a

Case 1 If c < logb a, then T (n) = Θ(nlogb a )

Divide-and-conquer recurrences: master theorem

Master theorem. Let a ≥ 1, b ≥ 2, and c > 0 and suppose that T (n) is a

Case 1 If c < logb a, then T (n) = Θ(nlogb a )

Ex 1. T (n) = 3 T(⎣n / 2⎦) + 5 n

Divide-and-conquer recurrences: master theorem

Master theorem. Let a ≥ 1, b ≥ 2, and c > 0 and suppose that T (n) is a

Case 1 If c < logb a, then T (n) = Θ(nlogb a )

ok to intermix oor and ceiling

Ex 2. T (n) = T(⎣n / 2⎦) + T(⎡n / 2⎤) + 17 n

Divide-and-conquer recurrences: master theorem

Master theorem. Let a ≥ 1, b ≥ 2, and c > 0 and suppose that T (n) is a

Case 1 If c < logb a, then T (n) = Θ(nlogb a )

Ex 3. T (n) = 48 T(⎣n / 4⎦) + n3

Master theorem need not apply

Gaps in master theorem.

Gaps in master theorem.

・Number of subproblems is not a constant.

Gaps in master theorem.

・Number of subproblems is not a constant.

・Number of subproblems is less than 1.

Gaps in master theorem.

・Number of subproblems is not a constant.

・Number of subproblems is less than 1.

・Work to divide and combine subproblems is not Θ(nc).

Consider the following recurrence. Which case of the master theorem?

A. Case 1: T(n) = Θ(nlog2 3) = O(n1.585).

B. Case 2: T(n) = Θ(n log n).

C. Case 3: T(n) = Θ(n).

D. Master theorem not applicable.

Consider the following recurrence. Which case of the master theorem?

A. Case 1: T(n) = Θ(nlog2 3) = O(n1.585).

B. Case 2: T(n) = Θ(n log n).

C. Case 3: T(n) = Θ(n).

D. Master theorem not applicable.