Experimental Comparison of Crypto-processors Architectures for Elliptic and Hyper-Elliptic Curves Cryptography

(1)

HAL Id: hal-01197048

https://hal.inria.fr/hal-01197048

Submitted on 11 Sep 2015

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Experimental Comparison of Crypto-processors Architectures for Elliptic and Hyper-Elliptic Curves

Cryptography

Gabriel Gallin, Arnaud Tisserand, Nicolas Veyrat-Charvillon

To cite this version:

Gabriel Gallin, Arnaud Tisserand, Nicolas Veyrat-Charvillon. Experimental Comparison of Crypto-

processors Architectures for Elliptic and Hyper-Elliptic Curves Cryptography. CryptArchi: 13th

International Workshops on Cryptographic Architectures Embedded in Reconfigurable Devices, Jun

2015, Leuven, Belgium. �hal-01197048�

(2)

Experimental Comparison of Crypto-processors Architectures for Elliptic and Hyper-Elliptic Curves

Cryptography

Gabriel GALLIN, Arnaud TISSERAND and Nicolas VEYRAT-CHARVILLON

CNRS – IRISA – University Rennes 1 – CAIRN HAH Project

13th CryptArchi Workshop: June 29-30th, 2015

(3)

Summary

1

Context & Motivations

2

Proposed Crypto-processor(s)

3

Experiments & Comparisons

4

Conclusion & Perspectives

G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 2 / 18

(4)

Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion

Summary

1

Context & Motivations

2

Proposed Crypto-processor(s)

3

Experiments & Comparisons

4

Conclusion & Perspectives

(5)

Asymmetric Cryptography: ECC and HECC

Cryptographic primitives for protocols such as digital signature, exchange of secret keys and some specific encryption schemes Elliptic Curve Cryptography (ECC):

- Actual standard for public key crypto-systems - Better performanceandlower cost than RSA

Hyper-Elliptic Curve Cryptography (HECC):

- Evolution of ECC focusing a larger set of curves

- Studied for future generations of asymmetric crypto-systems

(6)

Operations for (H)ECC

HAH Project, IRISA–IRMAR

Hardware and Arithmetic for Hyperelliptic Curves Cryptography

1. Elliptic Curve Cryptography (ECC)

encryption signature

key gen.

protocollevel etc [k]P

ADD(P,Q) DBL(P)

P+P

curvelevel

x±y x×y . . .

fieldlevel

E:y²=x³+4x+20 over GF(1009) Points onE:P,Q= (x,y)or(x,y,z) Coordinates:x,y,z∈GF(·) GF(p), GF(2^m),t:160–600bits k= (kt−1kt−2. . .k1k0)2∈N Scalar multiplication operation for i from 0 to t−1 do

if ki=1 then Q=ADD(P,Q) P=DBL(P)

Point addition/doubling operations sequence of finite field operations DBL:v1=z₁²,v2=x1−v1, . . . ADD:w1=z₁²,w2=z1×w1, . . . GF(p) or GF(2^m) operations

operation modulo large prime (GF(p)) or irreducible polynomial (GF(2^m))

2. Side Channel Attacks (SCAs)

DBL DBL DBL ADD DBL ADD DBL DBL

0 0 0 1 1 0

Side channels:

IPower consumption IElectromagnetic radiation IComputation timings

Attacks:

ISimple analysis

IDifferential analysis (statistics) ITemplates and learning

3. Protections & Counter-Measures Against SCAs IUniform comp. durations

IUniform power/EM profile IRandom behavior ICircuit reconfiguration Idetection/correction codes IAdd noise (!)

Example: use redundant number systems k

R1(k) [R1(k)]P

R2(k) [R2(k)]P

R3(k) [R3(k)]P

R4(k) [R4(k)]P

R5(k) [R5(k)]P

R6(k) [R6(k)]P

. . .

. . . Random recoding:∀i [Ri(k)]P= [k]P

4. From ECC to HECC

field size ADD DBL

ECC `bits ^mul^RX^OUT

subRYOUT

mulRZOUT

PZ

mul PZ

PXPXmul

PYPYmul

QYQY

QXQX

QZ

QZ QZ

QZ

mul v18

addv12add

sub v13

mul v10

v10

v10 mulv11v11

sub v11

mul v16

v17

v14 v14

sub v0

v1 v1

v2 v2 sqr v2 sub v3

v4 v4

v5

sqr v5

mul v6

v7

v7 v8

v9 v9

Cost: 12M + 2S

mulRXOUT

sub OUT

RY

addRZOUT

PZ

mul PZ

PX

sqr PX

mul PX PY

PY

mul PY

aa

addv18add

v18

add v19 v19

sub add v12v12

sub v12

v13

addv10v10add

v10 v11

mul v16 sqrv17v17

v15 mulv23v23add sqr

v22

v20

add v25

v25

v24v24

add v0

add v1v1

add v1

v2 v3

v4 sqr v4

v5 v6 v6

v6

v7v7 add

v8v8 v9v9

Cost: 6M + 5S

HECC ₂^`bits

mulRU0OUT

mulRU1OUT

addRV0OUT

addRV1OUT

mulRZOUT

PZ

mul PZ

mul

PZ mul

PZ add PZ

mul

PZ

mul PZ

QV0 QV0 QV1

QV1

QU1 add QU1 QU1

QU0QU0

QU0 PU0

mul PU0

PU1

PU1mul

PU1 mul PU1

PV1PV1mul

QZ

mul QZ QZ QZ QZ QZ

PV0PV0 sub

v18

mul

v18 add v19

mul

v19 add v12

mul v13 mul v13

mul v10 sqr

v11

addv16mul

v17

sqr v14 mul v14

mul v14

v15

addv85

mul v84

mul v87

add

v86

add sub v81

mulv80

subv83

v82 sqr

v69

mul v69

mul

sub v68 sub

v67 mul v67 sub

v66 v66

sub v65

add v64

v64

mulv63

v63

sub v61

v60 v60

add v78

v79

v74 v76

mul v77

mul v70

v70 v71

v72 mul

v73

v23 mul

mul v41 sub

v40

mul v43 v43

v43 v43 mul v43

mul

v43 add v42

add mul

v45

add v44 sub

add

v47 add v46 v49 v48

sub

v22 mul v22 v21 v21v21

v21 v20

mul

v27 v26 v26

add v25

sub v25

sub v24 v29

v28 mul

v56 add v56v56

sub v57

add v54

v54 v54 v52

v53 v50

v51

v58 v58

v59

v30 v30

mul v30

v31 v31 v31

v31 v32

v32

v32 v32

v33 mul v34

v35 v35

sqr v35

add v35v35

v36 add

v37

v37 v38

v39 v0 v0

v0 v0

v0

v1 sub

v1 v2

v3

v3 v3 sub v4

v5 v5 v5

v6

v6 v6 v6

v6 v6

add v7

v8 v9

v9 v9

Cost: 47M + 4S

mulOUT

RU0

mulOUT

RU1

subRV0OUT

subOUT

RV1

mulOUT

RZ

PZ mul PZ

mul

PZ sub

PZ

mul

PZ mul PZ

sub PZ

mul PZ

mul

PZ mul

PZ

sqr PZ

mul PZ mul PZ PU0

add PU0PU0

mul PU0 add PU0

mul PU0

mul PU0 PU1

sqr PU1

add PU1

mul PU1 PU1

mul PU1

PU1

Z

sub

Z

PV1

mul

PV1 sqr PV1

add PV1 PV1 PV0

mul

PV0 add PV0PV0

add sub v18

add v18v18

v19

add mul

v12 add v13v13

v13 add v13

mul mul

v10

sqr v10

mul v10 v11

mul

v11 v16 v17

v14

v15

v15 v15

v80 v69

mulv68v68

subv67

mul v66 v65 add v65

sqr v64 v64

mul v64

mul v63

addv62

mul v62

sub v61 v60 v60

add v78

v79 mul

sub v74

v75

sub v76

v77

v71 add

v72 v73

v41

v40 v40 v40

v40 sub

sqr v43

mul v43

v42

mul add v45

add v44

mulv47

v46

v49 v48 sub

v23 v22 v21

add v20 sub v27

add v26

mul v26

v25

v24 v29 v28

sub v56

v57 v57

v57 v54

add v54

v54 v55

v53 v53

v53

v50 v51 v51 v51

mul v59

v59

v30 v30

v31 sub v32

v33 add

v33 v34 v34

add v34

v35

v36

v37

v38 v38 v38

v38 v39

v39 v39

v0 v0

v1

mul

v1

sub

v2

v3 v4 v4

v4

add

v5

v6 v6

sqr

v6 v7 v8 v9

v9

Cost: 38M + 6S Examples of computation expressions for projective coordinates

5. HAH Project Objectives

IEfficient algorithms and representations for HECC IHECC protections against SCAs (passive and active) IFast, low-power and secure hardware implementations (open

source hardware code and programming tools) IIntensive security evaluation using our SCA setup

6. Developed Crypto-Processor(s) from PAVOIS ANR Project

AU1 AU2 AU3 points

CTRL mem.

@coord.

prg.mem.

inst.

@inst. key recode

ki k

inst. 21 bits address control datawbits

scalar word digit

IArithmetic Units (AUs):±,×,÷over GF(p)/GF(2^m) various configurations (area vs speed, internal protection) IVarious key recoding methods (and dedicated units)

IConfiguration: field size, internal word size, #AUs, type(AUs) ICircuit/architecture level protections

7. Programming Tools for Our Crypto-Processor(s)

HW modules

. . .

configurations

CAD tools selection

user crypto. lib.

assembler binary code implementation

compiler Sage

API/TLS-SSL commands

8. Implementation Results on FPGA

XC6SLX75 FPGA, GF(p), 256-bit ECC or 128-bit HECC, internal word sizew=32 bits

Recoding units:

Recoding BIN NAF-2 NAF-3 NAF-4

area slices (FF/LUT) 565(1321/1461)570(1340/1479)571(1344/1495)503(1348/1489)

freq. (MHz) 225 228 237 217

Area/speed trade-offs for ECC and HECC configurations:

#mult. BRAM mult. 1 col. mult. 2 col. mult. 4 col.

ECC 1 2 503(1348/1489) 217 626(1450/1643) 230 694(1649/1891) 211 2 2 689(1744/1894) 219 754(1948/2208) 234 931(2345/2712) 220 3 2 809(2146/2245) 205 942(2449/2704) 222 1105(3046/3436) 222 HECC 1 2 522(1344/1405) 228 520(1434/1535) 217

2 2 634(1746/1786) 226 689(1926/2055) 220 area freq.

4 2 852(2552/2531) 201 917(2912/3045) 195 slices (FF/LUT) MHz 8 2 1347(4145/3882)204 1601(4865/4928)209

9. Algorithms and Architecture Impacts on SCAs

Activity traces from CABA¹simulations (after filtering) for several configurations of the field multiplier (area/speed)

small/slow medium/medium large/fast

ADD

0 200 400 600 800 1000 1200

0 5000 10000 15000 20000 25000

activity [#transitions]

time [clock cycles]

0 200 400 600 800 1000 1200

0 2000 4000 6000 8000 10000 12000 14000 16000

time [clock cycles]

0 200 400 600 800 1000 1200

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

time [clock cycles]

DBL

0 200 400 600 800 1000 1200

0 5000 10000 15000 20000 25000

time [clock cycles]

0 200 400 600 800 1000 1200

0 2000 4000 6000 8000 10000 12000 14000

time [clock cycles]

0 200 400 600 800 1000 1200

0 1000 2000 3000 4000 5000 6000 7000 8000 9000

time [clock cycles]

1Cycle Accurate Bit Accurate (i.e.simulations close to real power measurements)

http://h-a-h.inria.fr/

Metric for algorithms efficiency: number of multiplications (M) and squares (S) inF^p

(7)

ECC versus HECC

size of F

p

& scalar ADD DBL source

ECC `

ECC

12M + 2S 7M + 3S [1]

HECC `

HECC

≈

¹₂

`

ECC

40M + 4S 38M + 6S [3]

ECC:

- Size ofF^p and scalar 2×larger - SimplerADDandDBLoperations

HECC:

- SmallerF^p and scalar

- More operations inF^p forADDandDBL

In theory, HECC naturally better than ECC

(8)

Objectives

Design hardware crypto-processors based on customisable architecture Implementation of small circuits on FPGA and ASIC

Study various trade-offs between:

- Computation speed - Area cost

- Energy consumption and protection against physical attacks

Experimental comparisons of different architecture configurations

→ choice of the architecture parameters for the crypto-processor

(9)

Summary

1

Context & Motivations

2

Proposed Crypto-processor(s)

3

Experiments & Comparisons

4

Conclusion & Perspectives

(10)

Developed Processor

AU1 AU2 AU3 Points

CTRL Mem.

address coord.

Prg.Mem.

instruction addr

Recoding Unit

ki

k

25-bit instruction address control signals scalar

word digit w-bit data word

Customizable number of arithmetic units over F

^p

: ± , × , ÷

→ n

M

multipliers of size n

B

w : size of data words

(11)

Architecture Parameters

w = 32 bits, for small processors

1 adder/subtracter ( ± ), small and fast 1 inverter ( ÷ ), sufficient for the computations n

_M

multipliers ( × ):

- Based on Montgomery algorithm for modular multiplication [5]

- n_B: number of parallel active words in the multiplier - 3-stage pipeline

Classical key recoding techniques from literature:

→ standard binary, window λNAF methods with λ ∈ { 2...4 }

(12)

Summary

1

Context & Motivations

2

Proposed Crypto-processor(s)

3

Experiments & Comparisons

4

Conclusion & Perspectives

(13)

Details of the Experiments

Objective: compare versions of the processor for various (n

M

, n

B

) Implementation on Xilinx Spartan 6 LX75 FPGA

No DSP blocks (for ASIC compatibility) Design tools:

- VHDL and assemblycode generation: Python scripts - Design implementation: Xilinx design environment ISE 14.6 - Simulations of complete scalar multiplications with Modelsim

Theoretical validation with SAGE (with more than 10k vectors)

Implementation: translate, map, place and route of full processor Same optimisation efforts for ECC and HECC

(14)

Impact of Key Recoding Unit

Various recoding techniques proposed to reduce the number of curve operations:

- BIN: standard binary from left to right - NAF: non-adjacent form

- λNAF: window methods withλ∈ {3,4}

Implementation results for an ECC processor (n

_M

= 1, n

_B

= 1)

recoding BIN NAF 3NAF 4NAF

area, slices (FF/LUT) 517 (1347/1433) 536 (1366/1445) 560 (1370/1454) 547 (1374/1460)

frequency, MHz 229 234 235 231

(15)

Results for ECC (256 bits) and HECC (128 bits) (1/2)

nM

BRAM

nB= 1 nB= 2 nB= 4

area freq. area freq. area freq.

slices (FF/LUT) MHz slices (FF/LUT) MHz slices (FF/LUT) MHz

ECC

1 3 547 (1374/1460) 231 573 (1476/1625) 233 673 (1674/1875) 233 2 3 722 (1776/1903) 220 811 (1979/2210) 227 942 (2377/2701) 220 3 3 810 (2174/2236) 221 915 (2480/2698) 215 1130 (3077/3430) 214 4 3 952 (2569/2656) 215 1100 (2977/3282) 217 1512 (3771/4293) 216 5 3 1064 (2982/3136) 210 1405 (3492/3902) 206 1722 (4487/5122) 209

HECC

1 4 514 (1336/1374) 235 549 (1434/1513) 234 2 4 646 (1716/1783) 220 737 (1912/2055) 234 3 4 732 (2092/2075) 224 826 (2386/2485) 225 4 4 870 (2476/2424) 218 1022 (2868/2987) 214 5 4 976 (2865/2773) 219 1115 (3355/3465) 210 6 4 1089 (3233/3092) 203 1240 (3821/3908) 208 7 4 1145 (3601/3426) 213 1372 (4287/4365) 205 8 4 1281 (3981/3809) 191 1552 (4765/4890) 183 9 4 1379 (4363/4051) 202 1691 (5245/5277) 199 10 4 1543 (4739/4435) 196 1856 (5719/5801) 198 11 4 1547 (5114/4750) 189 1936 (6192/6240) 198 12 4 1738 (5499/5128) 191 2100 (6675/6771) 188

(16)

Results for ECC (256 bits) and HECC (128 bits) (2/2)

Average computation time (ms) for 50 [k]P:

n_B n_M

1 2 3 4 5 6 7 8 9 10 11 12

HECC 1 15.6 8.6 5.7 4.7 3.9 3.7 3.3 3.6 3.4 3.5 3.6 3.6 2 11.9 6.2 4.5 3.6 3.2 2.8 2.8 3.0 2.7 2.7 2.8 2.9 ECC

1 28.1 15.3 12.4 12.4 12.7 2 17.7 9.6 8.3 8.0 8.4 4 11.1 6.2 5.4 5.1 5.3

Standard deviation for 1000 [k]P:

configuration ECC (1,1) ECC (3,4) HECC (1,1) HECC (6,2)

average time [ms] 28.2 5.3 15.5 2.8

std. deviation [ms] 0.289 0.056 0.324 0.045

(17)

Comparison of Various Configurations for our Processor

area [slices]

computation time [ms]

ECC HECC

600 800 1000 1200 1400 1600 1800 2000 2200

5 10 15 20 25 30

5,4 5,2

5,1

4,4 4,2

4,1

3,4 3,2

3,1

2,4 2,2 2,1

1,4 1,2 1,1

12,1 11,2 12,2

11,1 10,2

10,1 9,2

9,1 8,2 8,1

7,2 7,1

6,2 6,1

5,2 5,1

4,2 4,1 3,2 3,1 2,2 2,1 1,2 1,1

(nM,nB)

(18)

Relative Efficiency

utilisation ratiohardware add.costspeedup

ECC HECC

0 2040 6080 100 1 2 3 0 1 2 3 4 5

1,1 1,2 1,4 2,4 3,4 4,4 1,1 1,2 2,1 3,1 3,2 5,2 8,2 (nM,nB)

(19)

Comparison with Literature

Source FPGA area freq. [k]P duration

slices / DSP MHz ms ECC 1,2

Spartan 6

573 / 0 233 17.7

ECC 1,4 673 / 0 233 11.1

ECC 2,4 942 / 0 220 6.2

ECC 3,4 1 130 / 0 214 5.4

[4] Virtex-5 1 725 / 37 291 0.38

Virtex-4 4 655 / 37 250 0.44

[2] Virtex-4 13 661 / 0 43 9.2

20 123 / 0 43 7.7

(20)

Summary

1

Context & Motivations

2

Proposed Crypto-processor(s)

3

Experiments & Comparisons

4

Conclusion & Perspectives

(21)

Conclusion

Implementation of ECC/HECC processors ⇒ cost/performance trade-offs

We are able to select the appropriate number of multipliers and their size

Experimental results: HECC always has better performance compared to ECC

(22)

Perspectives

Improve arithmetic units (especially add DSP blocks)

Study resistance to physical attacks (SCA

¹

, faults injection) of processors with good time/area trade-offs

1Side Channel Attacks

(23)

This work is partly funded by

PAVOISproject (ANR-12-BS02-002-01)http://pavois.irisa.fr/

HAHprojecthttp://h-a-h.inria.fr/

Thank you for your attention

(24)

References

[1] D. J. Bernstein and T. Lange.

Explicit-formulas database.

http://hyperelliptic.org/EFD/.

[2] S. Ghosh, M. Alam, D. Roychowdhury, and I.S. Gupta.

Parallel crypto-devices for GF(p) elliptic curve multiplication resistant against side channel attacks.

Computers and Electrical Engineering, 35(2):329–338, March 2009.

[3] T. Lange.

Formulae for Arithmetic on Genus 2 Hyperelliptic Curves.

Applicable Algebra in Engineering, Communication and Computing, 15(5):295–328, February 2005.

[4] Y. Ma, Z. Liu, W. Pan, and J. Jing.

A high-speed elliptic curve cryptographic processor for generic curves over GF(p).

InProc. 20th International Workshop on Selected Areas in Cryptography (SAC), volume 8282 ofLNCS, pages 421–437.

Springer, August 2013.

[5] P. L. Montgomery.

Modular multiplication without trial division.

Mathematics of Computation, 44(170):519–521, April 1985.

(25)

Implementation Flow

HAH Project, IRISA–IRMAR

Hardware and Arithmetic for Hyperelliptic Curves Cryptography

1. Elliptic Curve Cryptography (ECC)

encryption signature

key gen.

pr otocol le vel

etc

[k]P

ADD(P,Q) DBL(P)

P+P

cur ve le vel

x±y x×y . . .

field le vel

E:y²=x³+4x+20 over GF(1009) Points onE:P,Q= (x,y)or(x,y,z) Coordinates:x,y,z∈GF(·)

GF(p), GF(2^m),t :160–600bits k= (kt−1kt−2. . .k1k0)2∈N Scalar multiplication operation for i from 0 to t−1 do

if ki=1 then Q=_ADD(P,Q) P=_DBL(P)

Point addition/doubling operations sequence of finite field operations DBL:v1=z₁²,v2=x1−v1, . . . ADD:w1=z₁²,w2=z1×w1, . . . GF(p) or GF(2^m) operations

operation modulo large prime (GF(p)) or irreducible polynomial (GF(2^m))

2. Side Channel Attacks (SCAs)

DBL DBL DBL ADD DBL ADD DBL DBL

0 0 0 1 1 0

Side channels:

I Power consumption I Electromagnetic radiation I Computation timings

Attacks:

I Simple analysis

I Differential analysis (statistics) I Templates and learning

3. Protections & Counter-Measures Against SCAs I Uniform comp. durations

I Uniform power/EM profile I Random behavior

I Circuit reconfiguration I detection/correction codes I Add noise (!)

Example: use redundant number systems k

R1(k) [R₁(k)]P

R2(k) [R₂(k)]P

R3(k) [R₃(k)]P

R4(k) [R₄(k)]P

R5(k) [R5(k)]P

R6(k) [R₆(k)]P

. . .

. . . Random recoding:∀i [Ri(k)]P= [k]P

4. From ECC to HECC

field size ADD DBL

ECC ` bits

^mul^RX^OUT

subRYOUT

mulRZOUT

PZ

mul PZ

PXPXmul

PYPYmul

QYQY

QXQX

QZ

QZ QZ

QZ

mul v18

addv12add

sub v13

mul v10

v10

v10 mulv11v11

sub v11

mul v16

v17

v14 v14

sub v0

v1 v1

v2 v2 sqr v2 sub v3

v4 v4

v5

sqr v5

mul

v6 v7

v7 v8

v9 v9

Cost: 12M + 2S

mulRXOUT

subRYOUT

addRZOUT

PZ

mul PZ

PX

sqr PX

mul PX

PYPY

mul PY

aa

addv18v18add

add v19 v19

sub add v12v12

sub v12

v13

addv10v10add

v10 v11

mul v16 sqrv17v17

v15

mulv23add

v23

sqrv22

v20

add v25

v25 v24 v24

add v0

add v1v1

add

v1v2

v3 v4 sqr v4

v5 v6 v6 v6 v6

v7

v7 v8v8addv9v9

Cost: 6M + 5S

HECC

₂^`

bits

mulRU0OUT

mulRU1OUT

addRV0OUT

addRV1OUT

mulRZOUT

PZ

mul PZ

mul

PZ mul

PZ add PZ

mul

PZ

mul PZ

QV0 QV0 QV1

QV1

QU1

add QU1 QU1

QU0QU0

QU0 PU0

mul PU0

PU1

PU1mul

PU1 mul PU1

PV1PV1mul

QZ

mul QZ QZ QZ QZ QZ

PV0PV0 sub

v18

mul

v18 add v19

mul

v19 add v12

mul v13 mul v13

mul v10 sqr

v11

addv16mul

v17

sqr v14 mul v14

mul v14

v15

add v85

mul v84

mul v87

add

v86

add sub v81

mulv80

subv83

v82 sqr

v69

mul v69

mul

sub v68 sub

v67 mul v67 sub

v66 v66

sub v65

add v64

v64

mulv63

v63

sub v61

v60 v60

add

v78

v79

v74 v76

mul v77

mul v70

v70 v71

v72 mul

v73

v23

mul

mul v41 sub

v40

mul v43 v43

v43 v43 mul v43

mul

v43 add v42

add mul

v45

add

v44 sub

add

v47 add v46 v49 v48

sub

v22 mul v22 v21 v21v21

v21 v20

mul

v27 v26 v26

add v25

sub

v25

sub v24 v29 v28

mul v56 add v56v56

sub v57

add v54

v54 v54 v52

v53 v50

v51

v58 v58

v59

v30 v30

mul v30

v31 v31 v31

v31 v32

v32

v32 v32

v33

mul v34

v35 v35

sqr v35

add v35v35

v36 add

v37

v37 v38

v39 v0 v0

v0 v0

v0

v1 sub

v1

v2 v3

v3 v3 sub v4

v5 v5 v5

v6

v6 v6 v6

v6 v6

add v7

v8 v9

v9 v9

Cost: 47M + 4S

mulRU0OUT

mulRU1OUT

subRV0OUT

subRV1OUT

mulRZOUT

PZ mul PZ

mul

PZ sub

PZ mul

PZ mul PZ

sub PZ

mul PZ

mul

PZ mul

PZ

sqr PZ

mul PZ mul PZ PU0

add PU0 PU0

mul PU0

add

PU0

mul PU0

mul PU0 PU1

sqr PU1

add PU1

mul PU1

PU1

mul PU1

PU1

Z

sub

Z

PV1

mul

PV1 sqr

PV1

add PV1 PV1 PV0

mul

PV0 add

PV0PV0 add

sub v18

add v18v18

v19

add mul

v12 add v13v13 v13

add v13

mul mul

v10

sqr v10

mul v10 v11

mul

v11 v16 v17

v14

v15

v15 v15

v80 v69

mulv68v68

subv67

mul v66 v65 add v65

sqrv64 v64

mul v64

mul v63

addv62

mul v62 sub

v61 v60 v60

add v78

v79 mul

sub v74 v75

sub v76

v77

v71 add

v72 v73

v41

v40 v40 v40

v40 sub

sqr v43

mul v43 v42

mul add v45

add v44

mulv47

v46

v49 v48 sub

v23 v22

v21 add

v20 sub v27

add v26

mul v26

v25

v24 v29 v28

sub v56

v57 v57

v57 v54

add v54

v54 v55

v53 v53

v53

v50 v51 v51 v51

mul v59

v59 v59

v30 v30

v31 sub v32

v33 add

v33 v34 v34

add v34

v35 v36

v37

v38 v38 v38 v38

v39

v39 v39

v0

v1

mul v1

sub v2 v3

v4 v4

v4 add v5

v6 v6

sqr v6

v7 v8

v9 v9

Cost: 38M + 6S Examples of computation expressions for projective coordinates

5. HAH Project Objectives

I Efficient algorithms and representations for HECC I HECC protections against SCAs (passive and active)

I Fast, low-power and secure hardware implementations (open source hardware code and programming tools)

I Intensive security evaluation using our SCA setup

6. Developed Crypto-Processor(s) from PAVOIS ANR Project

AU1 AU2 AU3 points

CTRL mem.

@coord.

prg.mem.

inst.

@inst.

key recode

ki

k

inst. 21 bits address control datawbits

scalar word digit

I Arithmetic Units (AUs): ± , × , ÷ over GF(p)/GF(2

^m

)

various configurations (area vs speed, internal protection) I Various key recoding methods (and dedicated units)

I Configuration: field size, internal word size, #AUs, type(AUs) I Circuit/architecture level protections

7. Programming Tools for Our Crypto-Processor(s)

HW modules

. . .

configurations

CAD tools selection

user crypto. lib.

assembler binary code implementation

compiler Sage

API/TLS-SSL commands

8. Implementation Results on FPGA

XC6SLX75 FPGA, GF(p), 256-bit ECC or 128-bit HECC, internal word sizew=32 bits

Recoding units:

Recoding BIN NAF-2 NAF-3 NAF-4

area slices (FF/LUT) 565(1321/1461) 570(1340/1479)571(1344/1495)503(1348/1489)

freq. (MHz) 225 228 237 217

Area/speed trade-offs for ECC and HECC configurations:

#mult. BRAM mult. 1 col. mult. 2 col. mult. 4 col.

ECC 1 2 503(1348/1489) 217 626(1450/1643) 230 694(1649/1891) 211 2 2 689(1744/1894) 219 754(1948/2208) 234 931(2345/2712) 220 3 2 809(2146/2245) 205 942(2449/2704) 222 1105(3046/3436) 222 HECC 1 2 522(1344/1405) 228 520(1434/1535) 217

2 2 634(1746/1786) 226 689(1926/2055) 220 area freq.

4 2 852(2552/2531) 201 917(2912/3045) 195 slices (FF/LUT) MHz 8 2 1347(4145/3882)204 1601(4865/4928)209

9. Algorithms and Architecture Impacts on SCAs

Activity traces from CABA

¹

simulations (after filtering) for several configurations of the field multiplier (area/speed)

small/slow medium/medium large/fast

ADD

0 200 400 600 800 1000 1200

0 5000 10000 15000 20000 25000

time [clock cycles]

0 200 400 600 800 1000 1200

0 2000 4000 6000 8000 10000 12000 14000 16000

time [clock cycles]

0 200 400 600 800 1000 1200

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

time [clock cycles]

DBL

0 200 400 600 800 1000 1200

0 5000 10000 15000 20000 25000

time [clock cycles]

0 200 400 600 800 1000 1200

0 2000 4000 6000 8000 10000 12000 14000

time [clock cycles]

0 200 400 600 800 1000 1200

0 1000 2000 3000 4000 5000 6000 7000 8000 9000

time [clock cycles]

1Cycle Accurate Bit Accurate (i.e.simulations close to real power measurements)