
Elliptic Curve Arithmetic

3.3 Point multiplication

3.3.3 Multiple point multiplication

One method to potentially speed the computation of $kP + lQ$ is simultaneous multiple point multiplication (Algorithm 3.48), also known as Shamir's trick. If $k$ and $l$ are $t$-bit numbers, then their binary representations are written in a $2 \times t$ matrix known as the exponent array. Given width $w$, the values $iP + jQ$ are calculated for $0 \le i, j < 2^w$. At each of $t/w$ steps, the accumulator receives $w$ doublings and an addition from the table of values $iP + jQ$ determined by the contents of a $2 \times w$ window passed over the exponent array; see Figure 3.7.

Figure 3.7. Simultaneous point multiplication accumulation step.

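Algorithm 3.48 has an expected running time of approximately

$$\Bigl[\bigl(3\cdot 2^{2(w-1)} - 2^{w-1} - 1\bigr)A + \bigl(2^{2(w-1)} - 2^{w-1}\bigr)D\Bigr] + \Bigl[\frac{2^{2w}-1}{2^{2w}}\cdot\frac{t}{w}\,A + \Bigl(\frac{t}{w} - 1\Bigr)w\,D\Bigr]$$

where $A$ and $D$ denote a point addition and a point doubling, and requires storage for $2^{2w} - 1$ points.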
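To make the estimate concrete, the following Python sketch evaluates it for given $w$ and $t$: $(3\cdot 2^{2(w-1)} - 2^{w-1} - 1)$ additions and $(2^{2(w-1)} - 2^{w-1})$ doublings of precomputation, plus $((2^{2w}-1)/2^{2w})(t/w)$ additions and $(t/w - 1)w$ doublings of evaluation. The function names are illustrative, and `Fraction` keeps the counts exact. For $w = 1$ and $w = 2$ the precomputation terms give 1 and 9 additions, the values listed for Algorithm 3.48 in Table 3.6.

```python
from fractions import Fraction


def alg348_cost(w: int, t: int) -> tuple[Fraction, Fraction]:
    """Approximate (additions, doublings) of Algorithm 3.48 for t-bit k and l.

    Sketch only: t is assumed to be a multiple of w.
    """
    q = 2 ** (2 * (w - 1))                      # 2^(2(w-1))
    pre_add = 3 * q - 2 ** (w - 1) - 1          # precomputation additions
    pre_dbl = q - 2 ** (w - 1)                  # precomputation doublings
    # each of the t/w windows is all-zero (no addition) with probability 1/2^(2w)
    eval_add = Fraction(2 ** (2 * w) - 1, 2 ** (2 * w)) * Fraction(t, w)
    eval_dbl = (Fraction(t, w) - 1) * w
    return pre_add + eval_add, pre_dbl + eval_dbl


def alg348_storage(w: int) -> int:
    """Points iP + jQ stored for 0 <= i, j < 2^w, excluding (i, j) = (0, 0)."""
    return 2 ** (2 * w) - 1


for w in (1, 2, 3):
    adds, dbls = alg348_cost(w, 192)
    print(w, alg348_storage(w), float(adds), float(dbls))
```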

Algorithm 3.48 Simultaneous multiple point multiplication
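A minimal Python sketch of the method described above follows. To stay self-contained it assumes a toy stand-in for the curve group: "points" are integers modulo a prime `N`, point addition is addition mod `N`, and 0 plays the role of the point at infinity. The names `add`, `dbl` and `simultaneous_mult` are hypothetical. The table holds $iP + jQ$ for $0 \le i, j < 2^w$, and the $2 \times w$ window moves over the exponent array from left to right, with $w$ doublings and at most one table addition per step.

```python
# Hypothetical toy group: integers mod a prime N under addition stand in for points.
N = 2**61 - 1


def add(a: int, b: int) -> int:
    return (a + b) % N


def dbl(a: int) -> int:
    return (2 * a) % N


def simultaneous_mult(k: int, l: int, P: int, Q: int, w: int) -> int:
    """Compute kP + lQ with a 2 x w window over the exponent array (Shamir's trick)."""
    mask = (1 << w) - 1
    # Precomputation: table[(i, j)] = iP + jQ for 0 <= i, j < 2^w (0 is the identity).
    table = {(0, 0): 0}
    for i in range(1 << w):
        for j in range(1 << w):
            if (i, j) != (0, 0):
                table[(i, j)] = add(table[(i - 1, j)], P) if i else add(table[(0, j - 1)], Q)
    t = max(k.bit_length(), l.bit_length())
    d = -(-t // w)                      # ceil(t / w): number of window positions
    R = 0
    for s in range(d - 1, -1, -1):      # window moves left to right
        for _ in range(w):
            R = dbl(R)                  # w doublings per step
        i = (k >> (s * w)) & mask       # w-bit digit of k under the window
        j = (l >> (s * w)) & mask       # w-bit digit of l under the window
        if i or j:
            R = add(R, table[(i, j)])   # one table addition, skipped for an all-zero window
    return R


print(simultaneous_mult(53, 102, 5, 7, 2))  # prints 979, i.e. 53*5 + 102*7
```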

Algorithm 3.48 can be improved by use of a sliding window. At each step, the window, of width at most $w$, is placed so that its right-most column is nonzero.

Precomputation storage is reduced by $2^{2(w-1)} - 1$ points. The improved algorithm is expected to have $t/(w + 1/3)$ point additions in the evaluation stage, a savings of approximately 9% (in evaluation-stage additions) compared with Algorithm 3.48 for $w \in \{2, 3\}$.
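The sliding-window variant can be sketched in the same style, again assuming integers modulo a prime `N` as a hypothetical stand-in for curve points. Because the right-most column of every window is nonzero, only entries $iP + jQ$ with $i$ or $j$ odd are ever used, which leaves $2^{2w} - 2^{2(w-1)} = 3\cdot 2^{2(w-1)}$ stored points. For brevity the sketch builds the full table and then filters it, so it does not model the precomputation cost.

```python
N = 2**61 - 1  # hypothetical toy group: integers mod N under addition stand in for points


def add(a: int, b: int) -> int:
    return (a + b) % N


def dbl(a: int) -> int:
    return (2 * a) % N


def sliding_table(P: int, Q: int, w: int) -> dict[tuple[int, int], int]:
    """iP + jQ for 0 <= i, j < 2^w with i or j odd (window's right-most column nonzero)."""
    full = {(0, 0): 0}
    for i in range(1 << w):
        for j in range(1 << w):
            if (i, j) != (0, 0):
                full[(i, j)] = add(full[(i - 1, j)], P) if i else add(full[(0, j - 1)], Q)
    return {ij: pt for ij, pt in full.items() if ij[0] & 1 or ij[1] & 1}


def simultaneous_sliding(k: int, l: int, P: int, Q: int, w: int) -> int:
    """Compute kP + lQ with a sliding 2 x (at most w) window."""
    table = sliding_table(P, Q, w)

    def column(c: int) -> bool:                # is column c of the exponent array nonzero?
        return bool((k >> c) & 1 or (l >> c) & 1)

    R = 0
    i = max(k.bit_length(), l.bit_length()) - 1
    while i >= 0:
        if not column(i):
            R = dbl(R)                         # zero column: double only
            i -= 1
            continue
        s = max(i - w + 1, 0)
        while not column(s):                   # shrink until the right-most column is nonzero
            s += 1
        width = i - s + 1
        for _ in range(width):
            R = dbl(R)
        u = (k >> s) & ((1 << width) - 1)
        v = (l >> s) & ((1 << width) - 1)
        R = add(R, table[(u, v)])              # u or v is odd, so the entry exists
        i = s - 1
    return R


print(len(sliding_table(5, 7, 2)))                                   # prints 12
print(simultaneous_sliding(53, 102, 5, 7, 2) == (53 * 5 + 102 * 7) % N)  # prints True
```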

Joint sparse form

If $k$ and $l$ are each written in NAF form, then the expected number of zero columns in the exponent array increases, so that the expected number of additions in the evaluation stage of a suitably modified Algorithm 3.48 (processing one column at a time) is $5t/9$.

The expected number of zero columns can be increased by choosing signed binary expansions of $k$ and $l$ jointly. The joint sparse form (JSF) exponent array of positive integers $k$ and $l$ is characterized by the following properties.
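Heuristically, each NAF digit is nonzero with probability about $1/3$; if the two rows are treated as independent, a column is zero with probability $(2/3)^2 = 4/9$, which gives the $5t/9$ figure. The short Python sketch below computes a NAF (digits least significant first, an assumption of the sketch) and counts the nonzero columns of the resulting exponent array; the helper names `naf` and `joint_weight` are hypothetical.

```python
def naf(k: int) -> list[int]:
    """Non-adjacent form of k >= 0, least significant digit first."""
    digits = []
    while k > 0:
        if k & 1:
            u = 2 - (k & 3)    # k mods 4: 1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= u
        else:
            u = 0
        digits.append(u)
        k >>= 1
    return digits


def joint_weight(a: list[int], b: list[int]) -> int:
    """Number of nonzero columns in the two-row exponent array formed by a and b."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return sum(1 for x, y in zip(a, b) if x or y)


print(naf(53), naf(102))                  # least significant digit first
print(joint_weight(naf(53), naf(102)))    # prints 8
```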

1. At least one of any three consecutive columns is zero.

2. Consecutive terms in a row do not have opposite signs.

3. If $k_{j+1}k_j \neq 0$ then $l_{j+1} \neq 0$ and $l_j = 0$. If $l_{j+1}l_j \neq 0$ then $k_{j+1} \neq 0$ and $k_j = 0$.

The representation has minimal weight among all joint signed binary expansions, where the weight is defined to be the number of nonzero columns.
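The three properties can be checked mechanically. The following Python sketch, with the hypothetical name `is_jsf`, stores each row least significant digit first, so index `j` matches the subscript $j$ in properties 1 to 3. Applied to the rows of Example 3.49, it accepts the joint sparse form and rejects the binary and NAF arrays.

```python
def is_jsf(k: list[int], l: list[int]) -> bool:
    """Check JSF properties 1-3 for two signed-digit rows, least significant digit first."""
    if len(k) != len(l):
        raise ValueError("rows must have the same length")
    n = len(k)
    nonzero = [k[j] != 0 or l[j] != 0 for j in range(n)]
    # Property 1: at least one of any three consecutive columns is zero.
    if any(nonzero[j] and nonzero[j + 1] and nonzero[j + 2] for j in range(n - 2)):
        return False
    for a, b in ((k, l), (l, k)):
        for j in range(n - 1):
            # Property 2: consecutive terms in a row do not have opposite signs.
            if a[j + 1] * a[j] < 0:
                return False
            # Property 3: if a_{j+1} a_j != 0 then b_{j+1} != 0 and b_j = 0.
            if a[j + 1] * a[j] != 0 and not (b[j + 1] != 0 and b[j] == 0):
                return False
    return True


# JSF rows of Example 3.49 for k = 53 and l = 102, least significant digit first
print(is_jsf([-1, -1, 0, -1, 0, 0, 1], [0, -1, 0, 1, 0, 1, 1]))  # prints True
```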

Example 3.49 (joint sparse form) The following table gives exponent arrays for $k = 53$ and $l = 102$.

| binary | NAF | joint sparse form

k = 53 | 0 1 1 0 1 0 1 | 0 1 0 −1 0 1 0 1 | 1 0 0 −1 0 −1 −1

l = 102 | 1 1 0 0 1 1 0 | 1 0 −1 0 1 0 −1 0 | 1 1 0 1 0 −1 0

weight | 6 | 8 | 5

If Algorithm 3.48 is modified to use JSF, processing a single column in each iteration, then $t/2$ additions (rather than $5t/9$ using NAFs) are required in the evaluation stage. Algorithm 3.50 finds the joint sparse form for integers $k^1$ and $k^2$. Although it is written in terms of integer operations, in fact only simple bit arithmetic is required; for example, evaluation modulo 8 means that three bits must be examined, and $k^i/2$ discards the rightmost bit.

Algorithm 3.50 Joint sparse form

INPUT: Nonnegative integers $k^1$ and $k^2$, not both zero.

OUTPUT: $\mathrm{JSF}(k^2, k^1)$, the joint sparse form of $k^1$ and $k^2$.
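As an illustration, the following Python sketch computes a joint sparse form in the style of Solinas's algorithm. It keeps carries $d_1, d_2 \in \{0, 1\}$ and uses only bit tests (mod 2, mod 4, mod 8) and right shifts, as the remark on bit arithmetic describes. It was written independently, so treat it as a sketch rather than a transcription of Algorithm 3.50. The name `jsf` is hypothetical, and rows are returned least significant digit first.

```python
def jsf(k1: int, k2: int) -> tuple[list[int], list[int]]:
    """Joint sparse form of k1, k2 >= 0 (not both zero); rows least significant digit first."""
    if k1 < 0 or k2 < 0 or (k1 == 0 and k2 == 0):
        raise ValueError("inputs must be nonnegative and not both zero")
    d1 = d2 = 0                                   # carries, each 0 or 1
    row1: list[int] = []
    row2: list[int] = []
    while k1 + d1 > 0 or k2 + d2 > 0:
        l1, l2 = d1 + k1, d2 + k2                 # values still to be represented
        digits = []
        for li, lo in ((l1, l2), (l2, l1)):
            if (li & 1) == 0:
                u = 0                             # even: digit 0
            else:
                u = 1 if (li & 3) == 1 else -1    # li mods 4
                if (li & 7) in (3, 5) and (lo & 3) == 2:
                    u = -u                        # li ≡ ±3 (mod 8) and other ≡ 2 (mod 4)
            digits.append(u)
        u1, u2 = digits
        row1.append(u1)
        row2.append(u2)
        if 2 * d1 == 1 + u1:                      # update carries so that k + d equals
            d1 = 1 - d1                           # (l - u) / 2 after the shift below
        if 2 * d2 == 1 + u2:
            d2 = 1 - d2
        k1 >>= 1                                  # discard the rightmost bit
        k2 >>= 1
    return row1, row2


print(jsf(53, 102))  # reversed, these are the JSF rows of Example 3.49
```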

The simultaneous and comb methods process multiple point multiplications using precomputation involving combinations of the points. Roughly speaking, if each precomputed value involves only a single point, then the associated method is known as interleaving.

In the calculation of $\sum k^jP_j$ for points $P_j$ and integers $k^j$, interleaving allows different methods to be used for each $k^jP_j$, provided that the doubling step can be done jointly. For example, width-$w$ NAF methods with different widths can be used, or some point multiplications may be done by comb methods. However, the cost of the doubling is determined by the maximum number of doublings required in the methods for $k^jP_j$, and hence the benefits of a comb method may be lost in interleaving.

Algorithm 3.51 is an interleaving method for computing $\sum_{j=1}^{v} k^jP_j$, where a width-$w_j$ NAF is used on $k^j$. Points $iP_j$ for odd $i < 2^{w_j-1}$ are calculated in a precomputation phase. The expansions $\mathrm{NAF}_{w_j}(k^j)$ are processed jointly, left to right, with a single doubling of the accumulator at each stage; Figure 3.8 illustrates the case $v = 2$. The algorithm has an expected running time of approximately

$$\Bigl[\bigl|\{j : w_j > 2\}\bigr|\,D + \sum_{j=1}^{v}\bigl(2^{w_j-2} - 1\bigr)A\Bigr] + \Bigl[\sum_{j=1}^{v}\frac{t}{w_j+1}\,A + t\,D\Bigr].$$

Figure 3.8. Computing $k^1P_1 + k^2P_2$ using interleaving with NAFs. The point multiplication accumulation step is shown for the case $v = 2$ points. Scalar $k^j$ is written in width-$w_j$ NAF form.
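A minimal Python sketch of interleaving with width-$w_j$ NAFs follows, again assuming integers modulo a prime `N` as a hypothetical stand-in for curve points (negation is then negation mod `N`). It precomputes the odd multiples $iP_j$ for $i < 2^{w_j-1}$ (one doubling only when $w_j > 2$), scans the expansions jointly from the most significant digit, and doubles the shared accumulator once per column. The names `naf_w` and `interleave` are illustrative.

```python
N = 2**61 - 1  # hypothetical toy group: integers mod N under addition stand in for points


def add(a: int, b: int) -> int:
    return (a + b) % N


def dbl(a: int) -> int:
    return (2 * a) % N


def neg(a: int) -> int:
    return (-a) % N


def naf_w(k: int, w: int) -> list[int]:
    """Width-w NAF of k >= 0 (w >= 2), least significant digit first."""
    digits = []
    while k > 0:
        if k & 1:
            u = k % (1 << w)
            if u >= 1 << (w - 1):
                u -= 1 << w                  # u = k mods 2^w: odd, |u| < 2^(w-1)
            k -= u
        else:
            u = 0
        digits.append(u)
        k >>= 1
    return digits


def interleave(ks: list[int], Ps: list[int], ws: list[int]) -> int:
    """Compute sum_j ks[j] * Ps[j] by interleaving width-ws[j] NAF expansions."""
    tables = []
    for P, w in zip(Ps, ws):
        odd = {1: P}                         # odd multiples iP, 1 <= i < 2^(w-1)
        if w > 2:
            P2 = dbl(P)
            for i in range(3, 1 << (w - 1), 2):
                odd[i] = add(odd[i - 2], P2)
        tables.append(odd)
    expansions = [naf_w(k, w) for k, w in zip(ks, ws)]
    R = 0
    for i in range(max(len(e) for e in expansions) - 1, -1, -1):
        R = dbl(R)                           # one shared doubling per column
        for e, odd in zip(expansions, tables):
            if i < len(e) and e[i] != 0:
                u = e[i]
                R = add(R, odd[u] if u > 0 else neg(odd[-u]))
    return R


print(interleave([53, 102], [5, 7], [3, 2]) == (53 * 5 + 102 * 7) % N)  # prints True
```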

Note 3.52 (comparison with simultaneous methods) Consider the calculation of $kP + lQ$, where $k$ and $l$ are approximately the same bitlength. The simultaneous sliding and interleaving methods require essentially the same number of point doublings regardless of the window widths. For a given $w$, simultaneous sliding requires $3 \cdot 2^{2(w-1)}$ points of storage and approximately $t/(w + 1/3)$ point additions in the evaluation stage, while interleaving with width $2w + 1$ on $k$ and width $2w$ on $l$ requires the same amount of storage but only $(4w+3)t/(4w^2+6w+2) < t/(w + 1/2)$ additions in evaluation. Interleaving may also be preferable at the precomputation phase, since operations involving a known point $P$ may be done in advance (encouraging the use of a wider width for $\mathrm{NAF}_w(k)$), in contrast to the joint computations required in the simultaneous method. Table 3.6 compares operation counts for computing $kP + lQ$ in the case that $P$ (but not $Q$) is known in advance.
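The comparison can be checked with exact arithmetic. The Python sketch below assumes the standard facts that a width-$w$ NAF has density about $1/(w+1)$ and needs $2^{w-2}$ stored points. It computes the interleaving count directly as $1/(2w+2) + 1/(2w+1)$ additions per bit, which simplifies to $(4w+3)/((2w+1)(2w+2))$. For $w = 2$ this gives 12 points and $11t/30$ additions, the "5-NAF & 4-NAF" row of Table 3.6. The function names are illustrative.

```python
from fractions import Fraction


def sliding_cost(w: int) -> tuple[int, Fraction]:
    """Simultaneous sliding window: (storage, evaluation additions per bit of t)."""
    return 3 * 2 ** (2 * (w - 1)), 1 / (w + Fraction(1, 3))


def interleave_cost(w: int) -> tuple[int, Fraction]:
    """Interleaving with a width-(2w+1) NAF on k and a width-2w NAF on l."""
    storage = 2 ** ((2 * w + 1) - 2) + 2 ** (2 * w - 2)             # 2^(w_j - 2) points each
    adds = Fraction(1, (2 * w + 1) + 1) + Fraction(1, 2 * w + 1)    # densities 1/(w_j + 1)
    return storage, adds


for w in range(1, 5):
    (s1, a1), (s2, a2) = sliding_cost(w), interleave_cost(w)
    assert s1 == s2                                          # same storage
    assert a2 == Fraction(4 * w + 3, (2 * w + 1) * (2 * w + 2))
    assert a2 < 1 / (w + Fraction(1, 2)) < a1                # interleaving needs fewer additions
    print(w, s1, float(a1), float(a2))
```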

In the case that storage for precomputation is limited to four points (including $P$ and $Q$), interleaving with width-3 NAFs and the JSF give essentially the same performance, with interleaving requiring one or two more point doublings at the precomputation stage. Table 3.6 gives some comparative results for small window sizes.

method | w | storage | additions | doubles

Alg 3.48 | 1 | 3 | 1 + 3t/4 ≈ 1 + .75t | t

Alg 3.48 | 2 | 15 | 9 + 15t/32 ≈ 9 + .47t | 2 + t

Alg 3.48 with sliding | 2 | 12 | 9 + 3t/7 ≈ 9 + .43t | 2 + t

Alg 3.48 with NAF | - | 4 | 2 + 5t/9 ≈ 2 + .56t | t

Alg 3.48 with JSF | - | 4 | 2 + t/2 ≈ 2 + .5t | t

interleave with 3-NAF | 3, 3 | 2 + 2 | 1 + t/2 ≈ 1 + .5t | 1 + t

interleave with 5-NAF & 4-NAF | 5, 4 | 8 + 4 | 3 + 11t/30 ≈ 3 + .37t | 1 + t

Table 3.6. Approximate operation counts for computing $kP + lQ$, where $k$ and $l$ are $t$-bit integers. The precomputation involving only $P$ is excluded.

Interleaving can be considered as an alternative to the comb method (Algorithm 3.44) for computing $kP$. In this case, the exponent array for $k$ is processed using interleaving (Algorithm 3.51), with $k^j$ given by $k = \sum_{j=1}^{w} k^j 2^{(j-1)d}$ and points $P_j$ given by $P_j = 2^{(j-1)d}P$, $1 \le j \le w$, where $d$ is defined in Algorithm 3.44. Table 3.7 compares the comb and interleaving methods for fixed storage.
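The decomposition can be checked with a short Python sketch. It assumes $d = \lceil t/w \rceil$, as in the fixed-base comb method, and again uses integers modulo a prime `N` as a hypothetical stand-in for curve points, so $2^{(j-1)d}P$ becomes a modular multiplication. On a curve the $P_j$ would instead come from repeated doublings, done once in advance because $P$ is known. The names `split_scalar` and `comb_points` are illustrative. The pieces $k^j$ and points $P_j$ would then be passed to an interleaving routine such as Algorithm 3.51.

```python
N = 2**61 - 1  # hypothetical toy group: integers mod N under addition stand in for points


def split_scalar(k: int, w: int, t: int) -> tuple[list[int], int]:
    """Cut a t-bit k into w pieces of d = ceil(t/w) bits: k = sum_j k^j 2^((j-1)d)."""
    d = -(-t // w)
    mask = (1 << d) - 1
    return [(k >> ((j - 1) * d)) & mask for j in range(1, w + 1)], d


def comb_points(P: int, w: int, d: int) -> list[int]:
    """P_j = 2^((j-1)d) P for 1 <= j <= w (P is fixed, so these can be precomputed)."""
    return [(pow(2, (j - 1) * d, N) * P) % N for j in range(1, w + 1)]


k, P, w, t = 0xB5C3F1, 12345, 4, 24
parts, d = split_scalar(k, w, t)
Ps = comb_points(P, w, d)
# sum_j k^j P_j equals kP in the toy group
assert sum(kj * Pj for kj, Pj in zip(parts, Ps)) % N == (k * P) % N
print(d, parts)
```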

method | rows | storage | additions | doubles

comb | 2 | 3 | 3t/8 ≈ .38t | t/2

interleave (3,3) | 2 | 4 | t/4 ≈ .25t | t/2

comb | 4 | 15 | 15t/64 ≈ .23t | t/4

comb (two-table) | 3 | 14 | 7t/24 ≈ .29t | t/6

interleave (4,4,4,4) | 4 | 16 | t/5 ≈ .20t | t/4

interleave (4,4,4,3,3) | 5 | 16 | 11t/50 ≈ .22t | t/5

comb | 5 | 31 | 31t/160 ≈ .19t | t/5

comb (two-table) | 4 | 30 | 15t/64 ≈ .23t | t/8

interleave (5,5,5,4,4) | 5 | 32 | 9t/50 ≈ .18t | t/5

interleave (5,5,4,4,4,4) | 6 | 32 | 17t/90 ≈ .19t | t/6

Table 3.7. Approximate operation counts in comb and interleaving methods for computing $kP$, $P$ known in advance. The bitlength of $k$ is denoted by $t$. The interleaving methods list the widths used on each row in calculating the NAF.
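The addition counts in Table 3.7 follow from a simple per-row model, sketched below in Python with illustrative function names. A one-table comb with $w$ rows needs about $((2^w-1)/2^w)(t/w)$ additions; the two-table comb rows in the table show the same count with half the doublings. Interleaving over $r$ rows of $t/r$ bits, with a width-$w_j$ NAF (density about $1/(w_j+1)$, $2^{w_j-2}$ stored points) on row $j$, needs about $\sum_j (t/r)/(w_j+1)$ additions. Counts are per bit of $t$. For widths $(4,4,4,3,3)$ the model gives $11/50$ and for $(5,5,4,4,4,4)$ it gives $17/90$, as in the table.

```python
from fractions import Fraction


def comb_adds(rows: int) -> Fraction:
    """Comb with the given number of rows: (2^w - 1)/2^w * t/w additions, per bit of t."""
    return Fraction(2 ** rows - 1, 2 ** rows) / rows


def interleave_adds(widths: tuple[int, ...]) -> Fraction:
    """Interleaving: each of r rows has t/r bits in width-w_j NAF (density 1/(w_j + 1))."""
    r = len(widths)
    return sum((Fraction(1, r * (w + 1)) for w in widths), Fraction(0))


def interleave_storage(widths: tuple[int, ...]) -> int:
    """Stored odd multiples: 2^(w_j - 2) points per row."""
    return sum(2 ** (w - 2) for w in widths)


for widths in [(3, 3), (4, 4, 4, 4), (4, 4, 4, 3, 3), (5, 5, 5, 4, 4), (5, 5, 4, 4, 4, 4)]:
    print(widths, interleave_storage(widths), interleave_adds(widths))
for rows in (2, 3, 4, 5):
    print("comb", rows, comb_adds(rows))
```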

From Guide to Elliptic Curve Cryptography (pages 130-135)