HAL Id: hal-01141347
https://hal.inria.fr/hal-01141347
Submitted on 11 Apr 2015
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
RNS Modular Computations for Cryptographic Applications
Karim Bigou, Arnaud Tisserand
To cite this version:
Karim Bigou, Arnaud Tisserand. RNS Modular Computations for Cryptographic Applications. RAIM: 7ème Rencontre Arithmétique de l’Informatique Mathématique, Apr 2015, Rennes, France. 2015. ⟨hal-01141347⟩
1. Elliptic Curve Cryptography (ECC)
Elliptic curve over $\mathbb{F}_P$: $y^2 = x^3 + ax + b$, with $P$ an $\ell$-bit prime
Example: $y^2 = x^3 + 4x + 20$ over $\mathbb{F}_{1009}$
Security levels: $\ell \in \{160, \ldots, 600\}$ bits
Curve-level operations:
- point addition (ADD): $Q + Q'$
- point doubling (DBL): $Q + Q$
- scalar multiplication: $[k]Q = \underbrace{Q + Q + \cdots + Q}_{k \text{ times}}$
Security (ECDLP): knowing $Q$ and $[k]Q$, it is computationally infeasible to recover $k$
ECDLP: Elliptic Curve Discrete Logarithm Problem
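The curve-level operations listed above can be sketched on the small example curve. This is a minimal illustration only: the affine formulas and the double-and-add loop are textbook choices, not necessarily what the authors implement, and the names `ec_add`, `scalar_mul` and the base point `(1, 5)` are assumptions made for this example.

```python
# Toy curve from the example: y^2 = x^3 + 4x + 20 over F_1009.
P = 1009
A, B = 4, 20
INF = None  # point at infinity (neutral element)

def on_curve(Q):
    """Check that Q satisfies the curve equation."""
    if Q is INF:
        return True
    x, y = Q
    return (y * y - (x * x * x + A * x + B)) % P == 0

def ec_add(Q1, Q2):
    """ADD when Q1 != Q2, DBL when Q1 == Q2 (affine coordinates)."""
    if Q1 is INF:
        return Q2
    if Q2 is INF:
        return Q1
    x1, y1 = Q1
    x2, y2 = Q2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF  # Q2 = -Q1
    if Q1 == Q2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P  # DBL slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P         # ADD slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mul(k, Q):
    """[k]Q by right-to-left double-and-add."""
    R = INF
    while k > 0:
        if k & 1:
            R = ec_add(R, Q)
        Q = ec_add(Q, Q)
        k >>= 1
    return R

Q = (1, 5)  # assumed base point: 5^2 = 25 = 1 + 4 + 20
```

Each call to `ec_add` needs one inversion in $\mathbb{F}_P$ in these affine formulas, which is one reason field-level inversion cost matters. The modular inverse via `pow(x, -1, P)` requires Python 3.8 or later.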
3. RNS Computation Flow in ECC Applications
RNS allows some field-level operations to be performed in parallel
[Figure: computation flow from channel-level operations ($\pm$, $\times$ mod $m_1, \ldots, m_5$), through field-level operations ($+$, $-$, $\times$, inversion in $\mathbb{F}_P$), to curve-level operations (ADD, DBL) and scalar multiplication $[k]Q$. Legend: $\pm\times$ over one channel; $\pm\times$ over one RNS vector (i.e. $n$ channels); base extension; reduction modulo $P$ in RNS. Axes: channels 1 to $n$ versus time.]
5. New RNS Modular Inversion (MI) (CHES 2013)
State-of-the-art RNS MI methods:
- based on Fermat’s Little Theorem (FLT-MI): $X^{-1} = X^{P-2} \bmod P$, i.e. a large exponentiation with many modular reductions, which costs $O(\log_2 P \times n^2)$ EMMs
- very limited parallelization due to internal data dependencies
Proposed method PM-MI:
- extended binary Euclidean algorithm (binary-ternary version)
- uses the plus-minus trick: if $X$ and $Y$ are odd, then $X + Y \equiv 0 \bmod 4$ or $X - Y \equiv 0 \bmod 4$
- PM-MI works without BE and costs $O(\log_2 P \times n)$ EMMs
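The plus-minus trick can be illustrated on ordinary Python integers. The sketch below is a simplified plus-minus extended Euclidean inversion, not the RNS algorithm of the poster: it omits the binary-ternary refinements and the RNS-specific handling, and the names `half_mod` and `pm_inverse` are assumptions made for this example. It assumes an odd modulus `p` and an input coprime to `p`.

```python
def half_mod(c, p):
    """Return c / 2 mod p for odd p and 0 <= c < p."""
    return c // 2 if c % 2 == 0 else (c + p) // 2

def pm_inverse(x, p):
    """Inverse of x modulo an odd p using the plus-minus trick.

    Invariants: a * x = u (mod p) and b * x = v (mod p), with u, v odd.
    """
    u, a = p, 0
    v, b = x % p, 1
    if v == 0:
        raise ValueError("x is not invertible modulo p")
    while v % 2 == 0:            # make v odd before entering the main loop
        v //= 2
        b = half_mod(b, p)
    while True:
        # For odd u and v, exactly one of u + v, u - v is divisible by 4.
        if (u + v) % 4 == 0:
            t, c = u + v, (a + b) % p
        else:
            t, c = u - v, (a - b) % p
        if t == 0:               # u = +-v, so |v| = gcd(x, p)
            break
        while t % 2 == 0:        # strip all factors of 2 (at least two)
            t //= 2
            c = half_mod(c, p)
        if abs(u) >= abs(v):     # replace the larger operand
            u, a = t, c
        else:
            v, b = t, c
    if abs(v) != 1:
        raise ValueError("x is not invertible modulo p")
    return b if v == 1 else (p - b) % p
```

For a prime `p`, the result can be compared with the FLT-MI formula `pow(x, p - 2, p)`. Every step here is an addition, subtraction or halving, whereas the exponentiation needs a modular multiplication per exponent bit.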
[Figure: PM-MI hardware architecture: shared control (CTRL), local registers, an arithmetic unit with 6 pipeline stages and $w$-bit datapaths (IN, OUT, comparison), and memories of precomputed values for multiplications and additions.]
Example: number of EMMs for $\ell = 192$ bits

n × w      FLT-MI    PM-MI    Gain factor
12 × 17    103140    5474     18
9 × 22     61884     4106     15
7 × 29     40110     3193     12

[Figure: inversion time [µs] and speed-up versus $n$ for FLT-MI and PM-MI at 192, 256, 384 and 521 bits.]
[Figure: slices and number of DSP/BRAM blocks versus $n$ for FLT-MI and PM-MI at 192, 256, 384 and 521 bits.]
2. Residue Number System (RNS)
$X$, a large $\ell$-bit integer, is represented by:
$\vec{X} = (x_1, \ldots, x_n) = (X \bmod m_1, \ldots, X \bmod m_n)$
[Figure: $n$ independent channels; channel $i$ computes the $w$-bit residue $z_i = x_i \;(\pm, \times)\; y_i \bmod m_i$ from the $w$-bit residues $x_i$ and $y_i$ of $X$ and $Y$, giving the residues of $Z$.]
RNS base $B = (m_1, \ldots, m_n)$: $n$ pairwise co-prime $w$-bit moduli with $n \times w > \ell$
The Chinese remainder theorem (CRT) is the basis of RNS
EMM: elementary modular multiplication ($w$ bits)
Pros:
- carry-free between channels
- fast parallel $+$, $-$, $\times$ and some exact divisions
- non-positional number system, randomization against SCAs
- flexibility for hardware implementations
Cons:
- comparison, modular reduction and division are much harder
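The representation and the role of the CRT can be sketched in a few lines. This is a toy illustration: the base `BASE` uses small hypothetical moduli rather than $w$-bit hardware moduli, and the helper names are assumptions made for this example.

```python
from math import prod

BASE = (13, 17, 19, 23)  # hypothetical pairwise co-prime moduli, M = 96577

def to_rns(x, base=BASE):
    """Residues of x in each channel."""
    return tuple(x % m for m in base)

def rns_add(a, b, base=BASE):
    """Channel-wise addition: no carry crosses between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, base))

def rns_mul(a, b, base=BASE):
    """Channel-wise multiplication: one EMM per channel."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, base))

def from_rns(res, base=BASE):
    """CRT reconstruction of the integer in [0, M)."""
    M = prod(base)
    total = 0
    for r, m in zip(res, base):
        Mi = M // m
        total += r * pow(Mi, -1, m) * Mi
    return total % M

X, Y = 123, 456
Z = from_rns(rns_mul(to_rns(X), to_rns(Y)))  # equals X * Y while X * Y < M
```

The result is only correct while the true value stays below $M = \prod m_i$. Comparing two RNS values or reducing modulo $P$ requires positional information that the residues do not expose directly, which is the difficulty listed under Cons.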
4. State-of-the-Art Algorithms and Architectures
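RNS Montgomery Reduction
Input: $\vec{X}$, $\vec{X'}$
Output: $(\vec{\omega}, \vec{\omega'})$ with $\omega \equiv X \times M^{-1} \bmod P$
1. $\vec{Q} \leftarrow \vec{X} \times (\overrightarrow{-P^{-1}})$ (in base $B$)
2. $\vec{Q'} \leftarrow \mathrm{BE}(\vec{Q}, B, B')$
3. $\vec{S'} \leftarrow \vec{X'} + \vec{Q'} \times \vec{P'}$ (in base $B'$)
4. $\vec{\omega'} \leftarrow \vec{S'} \times \overrightarrow{M^{-1}}$ (in base $B'$)
5. $\vec{\omega} \leftarrow \mathrm{BE}(\vec{\omega'}, B', B)$
[Figure: data flow of the reduction between bases $B$ and $B'$, with channel-wise multiplications and additions separated by the two base extensions.]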
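The five steps of the reduction can be followed on small numbers. In this sketch the base extension is done by exact CRT reconstruction through a Python integer, which is only a stand-in for the approximate BE algorithms used in real RNS hardware. The toy bases, the prime and all function names are assumptions made for this example.

```python
from math import prod

def to_rns(x, base):
    return [x % m for m in base]

def from_rns(res, base):
    """CRT reconstruction in [0, prod(base))."""
    M = prod(base)
    return sum(r * pow(M // m, -1, m) * (M // m) for r, m in zip(res, base)) % M

def base_extend(res, src, dst):
    """Exact BE via a big integer (illustrative stand-in only)."""
    return to_rns(from_rns(res, src), dst)

def rns_montgomery(xB, xB2, P, B, B2):
    """Return (omega in B, omega in B2) with omega = X * M^-1 mod P."""
    M = prod(B)
    # Step 1: Q = X * (-P^-1) in base B
    q = [(x * -pow(P, -1, m)) % m for x, m in zip(xB, B)]
    # Step 2: extend Q to base B'
    q2 = base_extend(q, B, B2)
    # Step 3: S' = X' + Q' * P' in base B'
    s2 = [(x + qi * P) % m for x, qi, m in zip(xB2, q2, B2)]
    # Step 4: omega' = S' * M^-1 in base B'
    w2 = [(s * pow(M, -1, m)) % m for s, m in zip(s2, B2)]
    # Step 5: extend omega' back to base B
    return base_extend(w2, B2, B), w2

P = 1009                       # toy prime
B, B2 = (13, 17, 19), (23, 29, 31)
X = 123456                     # must satisfy X < M * P
w, w2 = rns_montgomery(to_rns(X, B), to_rns(X, B2), P, B, B2)
omega = from_rns(w2, B2)
```

With an exact BE the result satisfies $\omega < 2P$ rather than $\omega < P$, so a final conditional subtraction may still be needed outside this sketch.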
BE: base extension; $M = \prod m_i$
[Figure: architecture with one rower per channel (rower 1 to rower $n$, $w$-bit datapaths), a cox unit, shared control (CTRL), and $n \times w$-bit input and output.]
6. Fast Patterns for RNS Computations (ASAP 2014)
Cost of standard and modular multiplications in RNS:
- standard: $n$ EMMs, fully parallel
- modular: $2n^2 + O(n)$ EMMs (1 multiplication and 1 reduction)
Proposed method:
- splits operands into 2 parts: $\vec{X} = \vec{K_x} \times \vec{M_a} + \vec{R_x}$, which makes it possible to replace $2n$ moduli by only $\frac{3}{2}n$
- reuses the split result in various computation patterns
- requires a hypothesis on $P$: acceptable for ECC/DH, but not for RSA
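The decomposition $X = K_x \times M_a + R_x$ can be checked on a toy example. The sketch below is one plausible reading, not confirmed by the poster: it assumes $M_a$ is the product of the moduli of a first base `Ba`, that $R_x = X \bmod M_a$, and that $K_x$ is obtained by an exact division in a second base `Bb`. The bases and function names are hypothetical, and the base extension is again done exactly through a Python integer.

```python
from math import prod

def to_rns(x, base):
    return [x % m for m in base]

def from_rns(res, base):
    """CRT reconstruction in [0, prod(base))."""
    M = prod(base)
    return sum(r * pow(M // m, -1, m) * (M // m) for r, m in zip(res, base)) % M

def split(xa, xb, Ba, Bb):
    """Return (Kx, Rx) as residues in Bb, assuming Rx = X mod Ma."""
    Ma = prod(Ba)
    # Assumption: the residues of Rx in Ba are those of X itself,
    # so Rx only has to be extended from Ba to Bb.
    rx_b = to_rns(from_rns(xa, Ba), Bb)
    # Exact division (X - Rx) / Ma, done channel-wise in Bb.
    kx_b = [((x - r) * pow(Ma, -1, m)) % m for x, r, m in zip(xb, rx_b, Bb)]
    return kx_b, rx_b

Ba, Bb = (13, 17), (19, 23)    # toy half-size bases, Ma = 221, Mb = 437
X = 50000                      # must satisfy X < Ma * Mb
kx_b, rx_b = split(to_rns(X, Ba), to_rns(X, Bb), Ba, Bb)
Kx, Rx = from_rns(kx_b, Bb), from_rns(rx_b, Bb)
```

Under these assumptions both parts fit in a half-size base, which is consistent with the stated reduction from $2n$ to $\frac{3}{2}n$ moduli.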
Cost for some patterns (number of EMMs):

Operation                    State of the art    Ours
$AB \bmod P$                 $2n^2 + 4n$         $2.5n^2 + 12.5n$
$A^2 \bmod P$                $2n^2 + 4n$         $1.75n^2 + 10.5n$
Cst $\times A \bmod P$       $2n^2 + 4n$         $1.75n^2 + 7n$
Cst $\times A^2 \bmod P$     $4n^2 + 8n$         $2.75n^2 + 16.5n$
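To see where a pattern pays off, the cost formulas can simply be evaluated for a given number of moduli $n$. The sketch below copies the formulas as printed in the table above; the dictionary `COSTS` and the function `emm_ratio` are names introduced for this example.

```python
# (state of the art, proposed) EMM counts, as printed in the table.
COSTS = {
    "AB mod P":        (lambda n: 2 * n**2 + 4 * n, lambda n: 2.5 * n**2 + 12.5 * n),
    "A^2 mod P":       (lambda n: 2 * n**2 + 4 * n, lambda n: 1.75 * n**2 + 10.5 * n),
    "Cst x A mod P":   (lambda n: 2 * n**2 + 4 * n, lambda n: 1.75 * n**2 + 7 * n),
    "Cst x A^2 mod P": (lambda n: 4 * n**2 + 8 * n, lambda n: 2.75 * n**2 + 16.5 * n),
}

def emm_ratio(op, n):
    """Proposed / state-of-the-art EMM count; below 1 means fewer EMMs."""
    sota, ours = COSTS[op]
    return ours(n) / sota(n)

for op in COSTS:
    print(op, [round(emm_ratio(op, n), 3) for n in (12, 24, 48)])
```

A ratio below 1 indicates that the proposed pattern needs fewer EMMs for that $n$. With these formulas the gain depends on both the pattern and $n$.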
Usage for Diffie-Hellman or ElGamal:
[Figure: ratio Our / Ref of the number of EMMs versus $n$ (10 to 70), in two panels: "EMM Expo. LSBF" and "EMM Expo. Montg."]
[Figure: data flow across the three bases $B_a$, $B_b$ and $B_c$ through the SPLIT, PR and MR stages, with base extension (BE) computations in 1 base; in base $B_a$, $R_x = X_a$ and $R_y = Y_a$.]
Funding from DGA-INRIA PhD grant and project PAVOIS ANR 12 BS02 002 01