Dr. Arunachalam V, Associate Professor, SENSE

Introduction
• Modular multiplication means computing A×B mod N, where A and B are residues modulo N.
• Once the product C = A×B has been computed, it suffices to perform a modular reduction C mod N, which itself reduces to an integer division.
• The algorithms presented here benefit from some precomputations involving N, and are thus specific to the case where several reductions are performed with the same modulus.
• Some algorithms avoid computing the full product C = A×B; one such example is McLaughlin’s algorithm.

Precomputations and different algorithms
• Algorithms with precomputations include Barrett’s algorithm, which computes an approximation to the inverse of the modulus, thus trading division for multiplication; Montgomery’s algorithm, which corresponds to Hensel’s division with remainder only, and its sub-quadratic variant, which is the LSB variant of Barrett’s algorithm; and finally McLaughlin’s algorithm.
• The cost of the precomputations is not taken into account: it is assumed to be negligible if many modular reductions are performed.
• However, we assume that the precomputed data uses only linear space, that is, O(log N).
• As usual, we assume that the modulus N has n words in base β, that A and B have at most n words, and in some cases that they are fully reduced, i.e., 0 ≤ A, B < N.

Barrett’s algorithm
• Barrett’s algorithm is attractive when many divisions have to be made with the same divisor; this is the case when one performs computations modulo a fixed integer.
• The idea is to precompute an approximation to the inverse of the divisor.
• Thus, an approximation to the quotient is obtained with just one multiplication, and the corresponding remainder after a second multiplication.
• A small number of corrections suffice to convert the approximations into exact values.
• For the sake of simplicity, we describe Barrett’s algorithm in base β, where β might be replaced by any integer, in particular 2ⁿ or βⁿ.
• Example: A = 1980; B = 36; β = 64; β² = 4096; I = ⌊β²/B⌋ = 113
• A = (1980)₁₀ = (30 × 64 + 60), so A₁ = 30 and A₀ = 60
• Q = ⌊A₁I/β⌋ = 52; R = A − QB = 108 = 3 × B
• After three corrections: Q = 55; R = 0
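
The worked example above can be reproduced with a short Python sketch. It assumes the step structure that the theorem and the complexity remarks refer to (step 1 precomputes I, steps 2 and 3 are the two multiplications, steps 4 and 5 are the correction loop) and the input conditions 0 ≤ A < β² and β/2 < B < β. The function name `barrett_divrem` and the extra `corrections` counter are illustrative choices, not part of the slides.

```python
def barrett_divrem(a, b, beta):
    """Divide a by b using an approximate inverse of b (Barrett).

    Assumes 0 <= a < beta**2 and beta/2 < b < beta.
    Returns (quotient, remainder, number_of_corrections).
    """
    if not (0 <= a < beta * beta and beta < 2 * b < 2 * beta):
        raise ValueError("need 0 <= a < beta^2 and beta/2 < b < beta")

    inv = beta * beta // b      # step 1: I = floor(beta^2 / B), done once per divisor
    a1 = a // beta              # high word of A = A1*beta + A0
    q = a1 * inv // beta        # step 2: approximate quotient (first multiplication)
    r = a - q * b               # step 3: matching remainder (second multiplication)

    corrections = 0
    while r >= b:               # step 4: the approximation never overshoots
        q, r = q + 1, r - b     # step 5: correction
        corrections += 1
    return q, r, corrections


print(barrett_divrem(1980, 36, 64))  # → (55, 0, 3)
```

In real use the value `inv` would be computed once and reused for every division by the same `b`; it is recomputed inside the function here only to keep the sketch self-contained.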

Theorem 2.4.1 Algorithm BarrettDivRem is correct and step 5 is performed at most 3 times.

Complexity of the algorithm
• The multiplications at steps 2 and 3 may be replaced by short products: the multiplication at step 2 by a high short product, and that at step 3 by a low short product.
• Barrett’s algorithm can also be used for an unbalanced division, when dividing (k+1)n words by n words for k ≥ 2, which amounts to k divisions of 2n words by the same n-word divisor.
• In this case, we say that the divisor is implicitly invariant.
• With full products at steps 2 and 3, Barrett’s algorithm costs 2M(n) for an n-word divisor, where M(n) is the cost of multiplying two n-word integers.
• In the FFT range, this cost might be lowered to 1.5M(n) using the “wrap-around trick”; moreover, if the forward transforms of I and B are stored, the cost decreases to M(n), assuming M(n) is the cost of three FFTs.

Montgomery’s algorithm
• Montgomery’s algorithm is very efficient for modular arithmetic modulo a fixed modulus N.
• The main idea is to replace a residue A mod N by A′ = λA mod N, where A′ is the “Montgomery form” corresponding to the residue A, with λ an integer constant such that gcd(N, λ) = 1.
• Addition and subtraction are unchanged, since λA + λB ≡ λ(A + B) mod N.
• The multiplication of two residues in Montgomery form does not give exactly what we want: (λA)(λB) ≠ λ(AB) mod N.
• The trick is to replace the classical modular multiplication by “Montgomery’s multiplication”: MontgomeryMul(A′, B′) = A′B′/λ mod N.
• For some values of λ, MontgomeryMul(A′, B′) can easily be computed, in particular for λ = βⁿ, where N uses n words in base β.

REDC & Fast REDC
• Algorithm 2.6 is a quadratic algorithm (REDC) to compute MontgomeryMul(A′, B′) in this case, and a sub-quadratic reduction (FastREDC) is given in Algorithm 2.7.
• Another view of Montgomery’s algorithm for λ = βⁿ is to consider that it computes the remainder of Hensel’s division.
• For example, take inputs C = 766 970 544 842 443 844, N = 862 664 913, and β = 1000 (so n = 3).
• Algorithm REDC precomputes μ = 23; then we have q₀ = 412, which yields C ← C + 412N = 766 970 900 260 388 000;
• then q₁ = 924, which yields C ← C + 924Nβ = 767 768 002 640 000 000;
• then q₂ = 720, which yields C ← C + 720Nβ² = 1 388 886 740 000 000 000.
• At step 4, R = Cβ⁻³ = 1 388 886 740, and since R ≥ β³, REDC returns R − N = 526 221 827.

Precomputation of μ
• For example, N = 862 664 913 and β = 1000.
• μ = −N⁻¹ mod β ⇒ −μN ≡ 1 (mod β)
• Only the least significant word of N matters modulo β, so apply Euclid’s algorithm to β = 1000 and 913 until the remainder is 1:
• 1000 = 913×1 + 87 ⇒ 1000 + 913×(−1) = 87
• 913 = 87×10 + 43 ⇒ 913 + 87×(−10) = 43
• 87 = 43×2 + 1 ⇒ 87 + 43×(−2) = 1
• Back-substitute to express 1 in terms of β and the least significant word of N:
• 87 + (913 + 87×(−10))×(−2) = 1 ⇒ 913×(−2) + 87×21 = 1
• 913×(−2) + (1000 + 913×(−1))×21 = 1 ⇒ 1000×21 + 913×(−23) = 1
• Hence 913×(−23) ≡ 1 (mod 1000), i.e. N⁻¹ ≡ −23 (mod β), and therefore the precomputed μ = 23.
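
The hand computation of μ above is an instance of the extended Euclidean algorithm. A minimal Python sketch follows; the helper names `extended_gcd` and `montgomery_mu` are chosen here for illustration and do not come from the slides.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q, a, b = a // b, b, a % b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


def montgomery_mu(n, beta):
    """Return mu = -n^(-1) mod beta, assuming gcd(n, beta) == 1."""
    # Only the least significant word of n matters modulo beta.
    g, _, y = extended_gcd(beta, n % beta)
    if g != 1:
        raise ValueError("n and beta must be coprime")
    # beta*x + (n mod beta)*y == 1, so y is the inverse of n modulo beta.
    return (-y) % beta


print(extended_gcd(1000, 913))          # → (1, 21, -23)
print(montgomery_mu(862664913, 1000))   # → 23
```

The first printed triple matches the identity 1000×21 + 913×(−23) = 1 obtained by hand, and negating the coefficient of 913 gives μ = 23.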
Refer to this video for finding μ: https://www.youtube.com/watch?v=shaQZg8bqUM
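
The REDC example with C = 766 970 544 842 443 844, N = 862 664 913, β = 1000 and μ = 23 can be checked with the sketch below. It is a plain-integer illustration of the word-by-word loop (quotient selection, accumulation, exact division by βⁿ, one final repair step), not a tuned multi-precision routine. The parameter names and the wrapper `montgomery_mul` are assumptions made for this example.

```python
def redc(c, n, beta, n_words, mu):
    """Word-by-word Montgomery reduction (quadratic REDC).

    Assumes 0 <= c < beta**(2*n_words), n < beta**n_words,
    gcd(beta, n) == 1 and mu == -n^(-1) mod beta.
    Returns r with 0 <= r < beta**n_words and
    r congruent to c * beta**(-n_words) modulo n.
    """
    for i in range(n_words):
        c_i = (c // beta**i) % beta   # i-th word of the running value
        q_i = (mu * c_i) % beta       # quotient selection: one product mod beta
        c += q_i * n * beta**i        # clears the i-th word of c
    r = c // beta**n_words            # exact: the low n_words words are now zero
    return r - n if r >= beta**n_words else r   # single repair step at the end


def montgomery_mul(a_m, b_m, n, beta, n_words, mu):
    """Multiply two residues given in Montgomery form (lambda = beta**n_words)."""
    # The final "% n" only normalises the result into [0, n) for comparison.
    return redc(a_m * b_m, n, beta, n_words, mu) % n


print(redc(766970544842443844, 862664913, 1000, 3, 23))  # → 526221827
```

With λ = β³, calling `montgomery_mul` on λA mod N and λB mod N returns λ(AB) mod N, so the result stays in Montgomery form and can be fed into further multiplications.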

Comparison with classical method
• Compared to classical division (Algorithm BasecaseDivRem), Montgomery’s algorithm has two significant advantages:
• the quotient selection is performed by a multiplication modulo the word base β, which is more efficient than a division by the most significant word of the divisor as in BasecaseDivRem;
• there is no repair step inside the for-loop; the only repair step is at the very end.

Reference
1. Richard P. Brent and Paul Zimmermann, “Modern Computer Arithmetic”, Cambridge University Press, 2010, Chapter 2.4.

Next Class