HAL Id: hal-01197048
https://hal.inria.fr/hal-01197048
Submitted on 11 Sep 2015
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Experimental Comparison of Crypto-processors Architectures for Elliptic and Hyper-Elliptic Curves
Cryptography
Gabriel Gallin, Arnaud Tisserand, Nicolas Veyrat-Charvillon
To cite this version:
Gabriel Gallin, Arnaud Tisserand, Nicolas Veyrat-Charvillon. Experimental Comparison of Crypto-
processors Architectures for Elliptic and Hyper-Elliptic Curves Cryptography. CryptArchi: 13th
International Workshops on Cryptographic Architectures Embedded in Reconfigurable Devices, Jun
2015, Leuven, Belgium. �hal-01197048�
Experimental Comparison of Crypto-processors Architectures for Elliptic and Hyper-Elliptic Curves
Cryptography
Gabriel GALLIN, Arnaud TISSERAND and Nicolas VEYRAT-CHARVILLON
CNRS – IRISA – University Rennes 1 – CAIRN HAH Project
13th CryptArchi Workshop: June 29-30th, 2015
Summary
1
Context & Motivations
2
Proposed Crypto-processor(s)
3
Experiments & Comparisons
4
Conclusion & Perspectives
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 2 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Summary
1
Context & Motivations
2
Proposed Crypto-processor(s)
3
Experiments & Comparisons
4
Conclusion & Perspectives
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 2 / 18
Asymmetric Cryptography: ECC and HECC
Cryptographic primitives for protocols such as digital signature, exchange of secret keys and some specific encryption schemes Elliptic Curve Cryptography (ECC):
- Actual standard for public key crypto-systems - Better performanceandlower cost than RSA
Hyper-Elliptic Curve Cryptography (HECC):
- Evolution of ECC focusing a larger set of curves
- Studied for future generations of asymmetric crypto-systems
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 3 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Operations for (H)ECC
HAH Project, IRISA–IRMAR
Hardware and Arithmetic for Hyperelliptic Curves Cryptography
1. Elliptic Curve Cryptography (ECC)
encryption signature
key gen.
protocollevel etc [k]P
ADD(P,Q) DBL(P)
P+P
curvelevel
x±y x×y . . .
fieldlevel
E:y2=x3+4x+20 over GF(1009) Points onE:P,Q= (x,y)or(x,y,z) Coordinates:x,y,z∈GF(·) GF(p), GF(2m),t:160–600bits k= (kt−1kt−2. . .k1k0)2∈N Scalar multiplication operation for i from 0 to t−1 do
if ki=1 then Q=ADD(P,Q) P=DBL(P)
Point addition/doubling operations sequence of finite field operations DBL:v1=z12,v2=x1−v1, . . . ADD:w1=z12,w2=z1×w1, . . . GF(p) or GF(2m) operations
operation modulo large prime (GF(p)) or irreducible polynomial (GF(2m))
2. Side Channel Attacks (SCAs)
DBL DBL DBL ADD DBL ADD DBL DBL
0 0 0 1 1 0
Side channels:
IPower consumption IElectromagnetic radiation IComputation timings
Attacks:
ISimple analysis
IDifferential analysis (statistics) ITemplates and learning
3. Protections & Counter-Measures Against SCAs IUniform comp. durations
IUniform power/EM profile IRandom behavior ICircuit reconfiguration Idetection/correction codes IAdd noise (!)
Example: use redundant number systems k
R1(k) [R1(k)]P
R2(k) [R2(k)]P
R3(k) [R3(k)]P
R4(k) [R4(k)]P
R5(k) [R5(k)]P
R6(k) [R6(k)]P
. . .
. . . Random recoding:∀i [Ri(k)]P= [k]P
4. From ECC to HECC
field size ADD DBL
ECC `bits mulRXOUT
subRYOUT
mulRZOUT
PZ
mul PZ
mul PZ
mul PZ
PXPXmul
PYPYmul
QYQY
QXQX
QZ
QZ QZ
QZ
mul v18
addv12add
sub v13
mul v10
v10
v10 mulv11v11
sub v11
mul v16
v17
v14 v14
sub v0
v1 v1
v2 v2 sqr v2 sub v3
v4 v4
v5
sqr v5
mul v6
v7
v7 v8
v9 v9
Cost: 12M + 2S
mulRXOUT
sub OUT
RY
addRZOUT
PZ
mul PZ
mul PZ
PX
sqr PX
mul PX PY
PY
mul PY
aa
addv18add
v18
add v19 v19
sub add v12v12
sub v12
v13
addv10v10add
v10 v11
mul v16 sqrv17v17
v15 mulv23v23add sqr
v22
v20
add v25
v25
v24v24
add v0
add v1v1
add v1
v2 v3
v4 sqr v4
v5 v6 v6
v6
v6
v7v7 add
v8v8 v9v9
Cost: 6M + 5S
HECC 2`bits
mulRU0OUT
mulRU1OUT
addRV0OUT
addRV1OUT
mulRZOUT
PZ
mul PZ
mul
PZ mul
PZ add PZ
mul
PZ
mul PZ
mul PZ
mul PZ
QV0 QV0 QV1
QV1
QU1 add QU1 QU1
QU0QU0
QU0 PU0
mul PU0
mul PU0
mul PU0
PU1
PU1mul
PU1 mul PU1
PV1PV1mul
QZ
mul QZ QZ QZ QZ QZ
PV0PV0 sub
v18
mul
v18 add v19
mul
v19 add v12
mul v13 mul v13
mul v10 sqr
v11
addv16mul
v17
sqr v14 mul v14
mul v14
v15
addv85
mul v84
mul v87
add
v86
add sub v81
mulv80
subv83
v82 sqr
v69
v69
mul v69
mul
sub v68 sub
v67 mul v67 sub
v66 v66
sub v65
add v64
v64
mulv63
v63
sub v61
v60 v60
add v78
v79
v74 v76
mul v77
mul v70
v70 v71
v72 mul
v73
v73
v23 mul
mul v41 sub
v40
mul v43 v43
v43 v43 mul v43
mul
v43 add v42
add mul
v45
add v44 sub
add
v47 add v46 v49 v48
sub
v22 mul v22 v21 v21v21
v21 v20
mul
v27 v26 v26
add v25
sub v25
sub v24 v29
v28 mul
v56 add v56v56
sub v57
add v54
v54 v54 v52
v53 v50
v51
v58 v58
v59
v30 v30
v30 v30
mul v30
mul v30
v31 v31 v31
v31 v32
v32
v32 v32
v33 mul v34
v35 v35
sqr v35
add v35v35
v36 add
v37
v37 v38
v39 v0 v0
v0 v0
v0
v1 sub
v1 v2
v3
v3 v3 sub v4
v5 v5 v5
v6
v6 v6 v6
v6 v6
v6 v6
add v7
v8 v9
v9 v9
Cost: 47M + 4S
mulOUT
RU0
mulOUT
RU1
subRV0OUT
subOUT
RV1
mulOUT
RZ
PZ mul PZ
mul
PZ sub
PZ
mul
PZ mul PZ
sub PZ
mul PZ
mul PZ
mul
PZ mul
PZ
sqr PZ
mul PZ mul PZ PU0
add PU0PU0
mul PU0 add PU0
mul PU0
mul PU0 PU1
sqr PU1
add PU1
mul PU1 PU1
mul PU1
mul PU1
PU1
Z
sub
Z
PV1
mul
PV1 sqr PV1
add PV1 PV1 PV0
mul
PV0 add PV0PV0
add sub v18
add v18v18
v19
add mul
v12 add v13v13
v13 add v13
mul mul
v10
sqr v10
mul v10 v11
mul
v11 v16 v17
v14
v15
v15 v15
v80 v69
mulv68v68
subv67
mul v66 v65 add v65
sqr v64 v64
mul v64
mul v63
addv62
mul v62
sub v61 v60 v60
add v78
v79 mul
sub v74
v75
sub v76
v77
v71 add
v72 v73
v41
v40 v40 v40
v40 sub
sqr v43
mul v43
v42
mul add v45
add v44
mulv47
v46
v49 v48 sub
v23 v22 v21
add v20 sub v27
add v26
mul v26
v25
v24 v29 v28
sub v56
v57 v57
v57 v54
add v54
v54 v55
v53 v53
v53
v50 v51 v51 v51
mul v59
v59
v59
v30 v30
v31 sub v32
v33 add
v33 v34 v34
add v34
v35
v36
v37
v38 v38 v38
v38 v39
v39 v39
v0 v0
v1
mul
v1
sub
v2
v3 v4 v4
v4
add
v5
v6 v6
sqr
v6 v7 v8 v9
v9
Cost: 38M + 6S Examples of computation expressions for projective coordinates
5. HAH Project Objectives
IEfficient algorithms and representations for HECC IHECC protections against SCAs (passive and active) IFast, low-power and secure hardware implementations (open
source hardware code and programming tools) IIntensive security evaluation using our SCA setup
6. Developed Crypto-Processor(s) from PAVOIS ANR Project
AU1 AU2 AU3 points
CTRL mem.
@coord.
prg.mem.
inst.
@inst. key recode
ki k
inst. 21 bits address control datawbits
scalar word digit
IArithmetic Units (AUs):±,×,÷over GF(p)/GF(2m) various configurations (area vs speed, internal protection) IVarious key recoding methods (and dedicated units)
IConfiguration: field size, internal word size, #AUs, type(AUs) ICircuit/architecture level protections
7. Programming Tools for Our Crypto-Processor(s)
HW modules
. . .
configurations
CAD tools selection
user crypto. lib.
assembler binary code implementation
compiler Sage
API/TLS-SSL commands
8. Implementation Results on FPGA
XC6SLX75 FPGA, GF(p), 256-bit ECC or 128-bit HECC, internal word sizew=32 bits
Recoding units:
Recoding BIN NAF-2 NAF-3 NAF-4
area slices (FF/LUT) 565(1321/1461)570(1340/1479)571(1344/1495)503(1348/1489)
freq. (MHz) 225 228 237 217
Area/speed trade-offs for ECC and HECC configurations:
#mult. BRAM mult. 1 col. mult. 2 col. mult. 4 col.
ECC 1 2 503(1348/1489) 217 626(1450/1643) 230 694(1649/1891) 211 2 2 689(1744/1894) 219 754(1948/2208) 234 931(2345/2712) 220 3 2 809(2146/2245) 205 942(2449/2704) 222 1105(3046/3436) 222 HECC 1 2 522(1344/1405) 228 520(1434/1535) 217
2 2 634(1746/1786) 226 689(1926/2055) 220 area freq.
4 2 852(2552/2531) 201 917(2912/3045) 195 slices (FF/LUT) MHz 8 2 1347(4145/3882)204 1601(4865/4928)209
9. Algorithms and Architecture Impacts on SCAs
Activity traces from CABA1simulations (after filtering) for several configurations of the field multiplier (area/speed)
small/slow medium/medium large/fast
ADD
0 200 400 600 800 1000 1200
0 5000 10000 15000 20000 25000
activity [#transitions]
time [clock cycles]
0 200 400 600 800 1000 1200
0 2000 4000 6000 8000 10000 12000 14000 16000
activity [#transitions]
time [clock cycles]
0 200 400 600 800 1000 1200
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
activity [#transitions]
time [clock cycles]
DBL
0 200 400 600 800 1000 1200
0 5000 10000 15000 20000 25000
activity [#transitions]
time [clock cycles]
0 200 400 600 800 1000 1200
0 2000 4000 6000 8000 10000 12000 14000
activity [#transitions]
time [clock cycles]
0 200 400 600 800 1000 1200
0 1000 2000 3000 4000 5000 6000 7000 8000 9000
activity [#transitions]
time [clock cycles]
1Cycle Accurate Bit Accurate (i.e.simulations close to real power measurements)
http://h-a-h.inria.fr/
Metric for algorithms efficiency: number of multiplications (M) and squares (S) inFp
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 4 / 18
ECC versus HECC
size of F
p& scalar ADD DBL source
ECC `
ECC12M + 2S 7M + 3S [1]
HECC `
HECC≈
12`
ECC40M + 4S 38M + 6S [3]
ECC:
- Size ofFp and scalar 2×larger - SimplerADDandDBLoperations
HECC:
- SmallerFp and scalar
- More operations inFp forADDandDBL
In theory, HECC naturally better than ECC
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 5 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Objectives
Design hardware crypto-processors based on customisable architecture Implementation of small circuits on FPGA and ASIC
Study various trade-offs between:
- Computation speed - Area cost
- Energy consumption and protection against physical attacks
Experimental comparisons of different architecture configurations
→ choice of the architecture parameters for the crypto-processor
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 6 / 18
Summary
1
Context & Motivations
2
Proposed Crypto-processor(s)
3
Experiments & Comparisons
4
Conclusion & Perspectives
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 6 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Developed Processor
AU1 AU2 AU3 Points
CTRL Mem.
address coord.
Prg.Mem.
instruction addr
Recoding Unit
ki
k
25-bit instruction address control signals scalar
word digit w-bit data word
Customizable number of arithmetic units over F
p: ± , × , ÷
→ n
Mmultipliers of size n
Bw : size of data words
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 7 / 18
Architecture Parameters
w = 32 bits, for small processors
1 adder/subtracter ( ± ), small and fast 1 inverter ( ÷ ), sufficient for the computations n
Mmultipliers ( × ):
- Based on Montgomery algorithm for modular multiplication [5]
- nB: number of parallel active words in the multiplier - 3-stage pipeline
Classical key recoding techniques from literature:
→ standard binary, window λNAF methods with λ ∈ { 2...4 }
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 8 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Summary
1
Context & Motivations
2
Proposed Crypto-processor(s)
3
Experiments & Comparisons
4
Conclusion & Perspectives
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 8 / 18
Details of the Experiments
Objective: compare versions of the processor for various (n
M, n
B) Implementation on Xilinx Spartan 6 LX75 FPGA
No DSP blocks (for ASIC compatibility) Design tools:
- VHDL and assemblycode generation: Python scripts - Design implementation: Xilinx design environment ISE 14.6 - Simulations of complete scalar multiplications with Modelsim
Theoretical validation with SAGE (with more than 10k vectors)
Implementation: translate, map, place and route of full processor Same optimisation efforts for ECC and HECC
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 9 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Impact of Key Recoding Unit
Various recoding techniques proposed to reduce the number of curve operations:
- BIN: standard binary from left to right - NAF: non-adjacent form
- λNAF: window methods withλ∈ {3,4}
Implementation results for an ECC processor (n
M= 1, n
B= 1)
recoding BIN NAF 3NAF 4NAF
area, slices (FF/LUT) 517 (1347/1433) 536 (1366/1445) 560 (1370/1454) 547 (1374/1460)
frequency, MHz 229 234 235 231
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 10 / 18
Results for ECC (256 bits) and HECC (128 bits) (1/2)
nM
BRAM
nB= 1 nB= 2 nB= 4
area freq. area freq. area freq.
slices (FF/LUT) MHz slices (FF/LUT) MHz slices (FF/LUT) MHz
ECC
1 3 547 (1374/1460) 231 573 (1476/1625) 233 673 (1674/1875) 233 2 3 722 (1776/1903) 220 811 (1979/2210) 227 942 (2377/2701) 220 3 3 810 (2174/2236) 221 915 (2480/2698) 215 1130 (3077/3430) 214 4 3 952 (2569/2656) 215 1100 (2977/3282) 217 1512 (3771/4293) 216 5 3 1064 (2982/3136) 210 1405 (3492/3902) 206 1722 (4487/5122) 209
HECC
1 4 514 (1336/1374) 235 549 (1434/1513) 234 2 4 646 (1716/1783) 220 737 (1912/2055) 234 3 4 732 (2092/2075) 224 826 (2386/2485) 225 4 4 870 (2476/2424) 218 1022 (2868/2987) 214 5 4 976 (2865/2773) 219 1115 (3355/3465) 210 6 4 1089 (3233/3092) 203 1240 (3821/3908) 208 7 4 1145 (3601/3426) 213 1372 (4287/4365) 205 8 4 1281 (3981/3809) 191 1552 (4765/4890) 183 9 4 1379 (4363/4051) 202 1691 (5245/5277) 199 10 4 1543 (4739/4435) 196 1856 (5719/5801) 198 11 4 1547 (5114/4750) 189 1936 (6192/6240) 198 12 4 1738 (5499/5128) 191 2100 (6675/6771) 188
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 11 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Results for ECC (256 bits) and HECC (128 bits) (2/2)
Average computation time (ms) for 50 [k]P:
nB nM
1 2 3 4 5 6 7 8 9 10 11 12
HECC 1 15.6 8.6 5.7 4.7 3.9 3.7 3.3 3.6 3.4 3.5 3.6 3.6 2 11.9 6.2 4.5 3.6 3.2 2.8 2.8 3.0 2.7 2.7 2.8 2.9 ECC
1 28.1 15.3 12.4 12.4 12.7 2 17.7 9.6 8.3 8.0 8.4 4 11.1 6.2 5.4 5.1 5.3
Standard deviation for 1000 [k]P:
configuration ECC (1,1) ECC (3,4) HECC (1,1) HECC (6,2)
average time [ms] 28.2 5.3 15.5 2.8
std. deviation [ms] 0.289 0.056 0.324 0.045
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 12 / 18
Comparison of Various Configurations for our Processor
area [slices]
computation time [ms]
ECC HECC
600 800 1000 1200 1400 1600 1800 2000 2200
5 10 15 20 25 30
5,4 5,2
5,1
4,4 4,2
4,1
3,4 3,2
3,1
2,4 2,2 2,1
1,4 1,2 1,1
12,1 11,2 12,2
11,1 10,2
10,1 9,2
9,1 8,2 8,1
7,2 7,1
6,2 6,1
5,2 5,1
4,2 4,1 3,2 3,1 2,2 2,1 1,2 1,1
(nM,nB)
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 13 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Relative Efficiency
utilisation ratiohardware add.costspeedup
ECC HECC
0 2040 6080 100 1 2 3 0 1 2 3 4 5
1,1 1,2 1,4 2,4 3,4 4,4 1,1 1,2 2,1 3,1 3,2 5,2 8,2 (nM,nB)
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 14 / 18
Comparison with Literature
Source FPGA area freq. [k]P duration
slices / DSP MHz ms ECC 1,2
Spartan 6
573 / 0 233 17.7
ECC 1,4 673 / 0 233 11.1
ECC 2,4 942 / 0 220 6.2
ECC 3,4 1 130 / 0 214 5.4
[4] Virtex-5 1 725 / 37 291 0.38
Virtex-4 4 655 / 37 250 0.44
[2] Virtex-4 13 661 / 0 43 9.2
20 123 / 0 43 7.7
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 15 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Summary
1
Context & Motivations
2
Proposed Crypto-processor(s)
3
Experiments & Comparisons
4
Conclusion & Perspectives
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 15 / 18
Conclusion
Implementation of ECC/HECC processors ⇒ cost/performance trade-offs
We are able to select the appropriate number of multipliers and their size
Experimental results: HECC always has better performance compared to ECC
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 16 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Perspectives
Improve arithmetic units (especially add DSP blocks)
Study resistance to physical attacks (SCA
1, faults injection) of processors with good time/area trade-offs
1Side Channel Attacks
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 17 / 18
This work is partly funded by
PAVOISproject (ANR-12-BS02-002-01)http://pavois.irisa.fr/
HAHprojecthttp://h-a-h.inria.fr/
Thank you for your attention
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 18 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
References
[1] D. J. Bernstein and T. Lange.
Explicit-formulas database.
http://hyperelliptic.org/EFD/.
[2] S. Ghosh, M. Alam, D. Roychowdhury, and I.S. Gupta.
Parallel crypto-devices for GF(p) elliptic curve multiplication resistant against side channel attacks.
Computers and Electrical Engineering, 35(2):329–338, March 2009.
[3] T. Lange.
Formulae for Arithmetic on Genus 2 Hyperelliptic Curves.
Applicable Algebra in Engineering, Communication and Computing, 15(5):295–328, February 2005.
[4] Y. Ma, Z. Liu, W. Pan, and J. Jing.
A high-speed elliptic curve cryptographic processor for generic curves over GF(p).
InProc. 20th International Workshop on Selected Areas in Cryptography (SAC), volume 8282 ofLNCS, pages 421–437.
Springer, August 2013.
[5] P. L. Montgomery.
Modular multiplication without trial division.
Mathematics of Computation, 44(170):519–521, April 1985.
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 19 / 18
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Implementation Flow
HAH Project, IRISA–IRMAR
Hardware and Arithmetic for Hyperelliptic Curves Cryptography
1. Elliptic Curve Cryptography (ECC)
encryption signature
key gen.
pr otocol le vel
etc[k]P
ADD(P,Q) DBL(P)
P+P
cur ve le vel
x±y x×y . . .
field le vel
E:y2=x3+4x+20 over GF(1009) Points onE:P,Q= (x,y)or(x,y,z) Coordinates:x,y,z∈GF(·)
GF(p), GF(2m),t :160–600bits k= (kt−1kt−2. . .k1k0)2∈N Scalar multiplication operation for i from 0 to t−1 do
if ki=1 then Q=ADD(P,Q) P=DBL(P)
Point addition/doubling operations sequence of finite field operations DBL:v1=z12,v2=x1−v1, . . . ADD:w1=z12,w2=z1×w1, . . . GF(p) or GF(2m) operations
operation modulo large prime (GF(p)) or irreducible polynomial (GF(2m))
2. Side Channel Attacks (SCAs)
DBL DBL DBL ADD DBL ADD DBL DBL
0 0 0 1 1 0
Side channels:
I Power consumption I Electromagnetic radiation I Computation timings
Attacks:
I Simple analysis
I Differential analysis (statistics) I Templates and learning
3. Protections & Counter-Measures Against SCAs I Uniform comp. durations
I Uniform power/EM profile I Random behavior
I Circuit reconfiguration I detection/correction codes I Add noise (!)
Example: use redundant number systems k
R1(k) [R1(k)]P
R2(k) [R2(k)]P
R3(k) [R3(k)]P
R4(k) [R4(k)]P
R5(k) [R5(k)]P
R6(k) [R6(k)]P
. . .
. . . Random recoding:∀i [Ri(k)]P= [k]P
4. From ECC to HECC
field size ADD DBL
ECC ` bits
mulRXOUTsubRYOUT
mulRZOUT
PZ
mul PZ
mul PZ
mul PZ
PXPXmul
PYPYmul
QYQY
QXQX
QZ
QZ QZ
QZ
mul v18
addv12add
sub v13
mul v10
v10
v10 mulv11v11
sub v11
mul v16
v17
v14 v14
sub v0
v1 v1
v2 v2 sqr v2 sub v3
v4 v4
v5
sqr v5
mul
v6 v7
v7 v8
v9 v9
Cost: 12M + 2S
mulRXOUT
subRYOUT
addRZOUT
PZ
mul PZ
mul PZ
PX
sqr PX
mul PX
PYPY
mul PY
aa
addv18v18add
add v19 v19
sub add v12v12
sub v12
v13
addv10v10add
v10 v11
mul v16 sqrv17v17
v15
mulv23add
v23
sqrv22
v20
add v25
v25 v24 v24
add v0
add v1v1
add
v1v2
v3 v4 sqr v4
v5 v6 v6 v6 v6
v7
v7 v8v8addv9v9
Cost: 6M + 5S
HECC
2`bits
mulRU0OUT
mulRU1OUT
addRV0OUT
addRV1OUT
mulRZOUT
PZ
mul PZ
mul
PZ mul
PZ add PZ
mul
PZ
mul PZ
mul PZ
mul PZ
QV0 QV0 QV1
QV1
QU1
add QU1 QU1
QU0QU0
QU0 PU0
mul PU0
mul PU0
mul PU0
PU1
PU1mul
PU1 mul PU1
PV1PV1mul
QZ
mul QZ QZ QZ QZ QZ
PV0PV0 sub
v18
mul
v18 add v19
mul
v19 add v12
mul v13 mul v13
mul v10 sqr
v11
addv16mul
v17
sqr v14 mul v14
mul v14
v15
add v85
mul v84
mul v87
add
v86
add sub v81
mulv80
subv83
v82 sqr
v69
v69
mul v69
mul
sub v68 sub
v67 mul v67 sub
v66 v66
sub v65
add v64
v64
mulv63
v63
sub v61
v60 v60
add
v78
v79
v74 v76
mul v77
mul v70
v70 v71
v72 mul
v73
v73
v23
mul
mul v41 sub
v40
mul v43 v43
v43 v43 mul v43
mul
v43 add v42
add mul
v45
add
v44 sub
add
v47 add v46 v49 v48
sub
v22 mul v22 v21 v21v21
v21 v20
mul
v27 v26 v26
add v25
sub
v25
sub v24 v29 v28
mul v56 add v56v56
sub v57
add v54
v54 v54 v52
v53 v50
v51
v58 v58
v59
v30 v30
v30 v30
mul v30
mul v30
v31 v31 v31
v31 v32
v32
v32 v32
v33
mul v34
v35 v35
sqr v35
add v35v35
v36 add
v37
v37 v38
v39 v0 v0
v0 v0
v0
v1 sub
v1
v2 v3
v3 v3 sub v4
v5 v5 v5
v6
v6 v6 v6
v6 v6
v6 v6
add v7
v8 v9
v9 v9
Cost: 47M + 4S
mulRU0OUT
mulRU1OUT
subRV0OUT
subRV1OUT
mulRZOUT
PZ mul PZ
mul
PZ sub
PZ mul
PZ mul PZ
sub PZ
mul PZ
mul PZ
mul
PZ mul
PZ
sqr PZ
mul PZ mul PZ PU0
add PU0 PU0
mul PU0
add
PU0
mul PU0
mul PU0 PU1
sqr PU1
add PU1
mul PU1
PU1
mul PU1
mul PU1
PU1
Z
sub
Z
PV1
mul
PV1 sqr
PV1
add PV1 PV1 PV0
mul
PV0 add
PV0PV0 add
sub v18
add v18v18
v19
add mul
v12 add v13v13 v13
add v13
mul mul
v10
sqr v10
mul v10 v11
mul
v11 v16 v17
v14
v15
v15 v15
v80 v69
mulv68v68
subv67
mul v66 v65 add v65
sqrv64 v64
mul v64
mul v63
addv62
mul v62 sub
v61 v60 v60
add v78
v79 mul
sub v74 v75
sub v76
v77
v71 add
v72 v73
v41
v40 v40 v40
v40 sub
sqr v43
mul v43 v42
mul add v45
add v44
mulv47
v46
v49 v48 sub
v23 v22
v21 add
v20 sub v27
add v26
mul v26
v25
v24 v29 v28
sub v56
v57 v57
v57 v54
add v54
v54 v55
v53 v53
v53
v50 v51 v51 v51
mul v59
v59 v59
v30 v30
v31 sub v32
v33 add
v33 v34 v34
add v34
v35 v36
v37
v38 v38 v38 v38
v39
v39 v39
v0
v0
v1
mul v1
sub v2 v3
v4 v4
v4 add v5
v6 v6
sqr v6
v7 v8
v9 v9
Cost: 38M + 6S Examples of computation expressions for projective coordinates
5. HAH Project Objectives
I Efficient algorithms and representations for HECC I HECC protections against SCAs (passive and active)
I Fast, low-power and secure hardware implementations (open source hardware code and programming tools)
I Intensive security evaluation using our SCA setup
6. Developed Crypto-Processor(s) from PAVOIS ANR Project
AU1 AU2 AU3 points
CTRL mem.
@coord.
prg.mem.
inst.
@inst.
key recode
ki
k
inst. 21 bits address control datawbits
scalar word digit
I Arithmetic Units (AUs): ± , × , ÷ over GF(p)/GF(2
m)
various configurations (area vs speed, internal protection) I Various key recoding methods (and dedicated units)
I Configuration: field size, internal word size, #AUs, type(AUs) I Circuit/architecture level protections
7. Programming Tools for Our Crypto-Processor(s)
HW modules. . .
configurations
CAD tools selection
user crypto. lib.
assembler binary code implementation
compiler Sage
API/TLS-SSL commands
8. Implementation Results on FPGA
XC6SLX75 FPGA, GF(p), 256-bit ECC or 128-bit HECC, internal word sizew=32 bits
Recoding units:
Recoding BIN NAF-2 NAF-3 NAF-4
area slices (FF/LUT) 565(1321/1461) 570(1340/1479)571(1344/1495)503(1348/1489)
freq. (MHz) 225 228 237 217
Area/speed trade-offs for ECC and HECC configurations:
#mult. BRAM mult. 1 col. mult. 2 col. mult. 4 col.
ECC 1 2 503(1348/1489) 217 626(1450/1643) 230 694(1649/1891) 211 2 2 689(1744/1894) 219 754(1948/2208) 234 931(2345/2712) 220 3 2 809(2146/2245) 205 942(2449/2704) 222 1105(3046/3436) 222 HECC 1 2 522(1344/1405) 228 520(1434/1535) 217
2 2 634(1746/1786) 226 689(1926/2055) 220 area freq.
4 2 852(2552/2531) 201 917(2912/3045) 195 slices (FF/LUT) MHz 8 2 1347(4145/3882)204 1601(4865/4928)209
9. Algorithms and Architecture Impacts on SCAs
Activity traces from CABA
1simulations (after filtering) for several configurations of the field multiplier (area/speed)
small/slow medium/medium large/fast
ADD
0 200 400 600 800 1000 1200
0 5000 10000 15000 20000 25000
activity [#transitions]
time [clock cycles]
0 200 400 600 800 1000 1200
0 2000 4000 6000 8000 10000 12000 14000 16000
activity [#transitions]
time [clock cycles]
0 200 400 600 800 1000 1200
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
activity [#transitions]
time [clock cycles]
DBL
0 200 400 600 800 1000 1200
0 5000 10000 15000 20000 25000
activity [#transitions]
time [clock cycles]
0 200 400 600 800 1000 1200
0 2000 4000 6000 8000 10000 12000 14000
activity [#transitions]
time [clock cycles]
0 200 400 600 800 1000 1200
0 1000 2000 3000 4000 5000 6000 7000 8000 9000
activity [#transitions]
time [clock cycles]
1Cycle Accurate Bit Accurate (i.e.simulations close to real power measurements)
http://h-a-h.inria.fr/
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 20 / 18