Groestl Distinguishing Attack: A New Rebound Attack of an AES-like Permutation


HAL Id: hal-01668116

https://hal.archives-ouvertes.fr/hal-01668116

Submitted on 19 Dec 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Victor Cauchois, Clément Gomez, Reynald Lercier

To cite this version:

Victor Cauchois, Clément Gomez, Reynald Lercier. Groestl Distinguishing Attack: A New Rebound Attack of an AES-like Permutation. FSE 2018 - 25th International Conference on Fast Software Encryption, Mar 2018, Bruges, Belgium. ⟨hal-01668116⟩


Victor Cauchois$^{1,2}$, Clément Gomez$^{1}$ and Reynald Lercier$^{1,2}$

$^{1}$ DGA MI, Boîte Postale 7, 35998 Rennes Cedex 9, France. clement.gomez@m4x.org

$^{2}$ IRMAR, Université de Rennes 1, Campus de Beaulieu, 35042 Rennes, France. {victor.cauchois,reynald.lercier}@m4x.org

Abstract. We consider highly structured truncated differential paths to mount a new rebound attack on Grøstl-512, a hash function based on two AES-like permutations, $P_{1024}$ and $Q_{1024}$, with non-square input and output registers. We explain how such differential paths can be computed using a Mixed-Integer Linear Programming approach. Together with a SuperSBox description, this allows us to build a rebound attack with a 6-round inbound phase whereas classical rebound attacks have 4-round inbound phases. This yields the first distinguishing attack on an 11-round version of $P_{1024}$ and $Q_{1024}$ with about $2^{72}$ computations and a memory complexity of about $2^{56}$ bytes, to be compared with the $2^{96}$ computations required by the corresponding generic attack. Previous best results on this permutation reached 10 rounds with a computational complexity of about $2^{392}$ operations, to be compared with the $2^{448}$ computations required by the corresponding generic attack.

Keywords: Cryptanalysis · Hash function · Rebound attacks · AES-like · Grøstl

1 Introduction

Hash functions are of first importance in cryptography. Producing fixed size outputs from inputs of variable length, they are at the heart of many protocols to ensure integrity or authentication properties in numerous cryptographic applications, from signatures to encryption schemes. To be considered safe, they have to offer many security guarantees, among which protection against preimage, second preimage and collision attacks.

The indifferentiability framework, which became very popular among hash function designers at the time of the SHA-3 competition, aims at avoiding these attacks. This security notion makes it possible to instantiate, with a hash function, cryptographic protocols proved secure in the random oracle model. Hash functions that iterate a cryptographic permutation, whose internal state is partially modified with the message during an absorbing phase and then partially extracted to build the output of the hash function during a squeezing phase, form a large class of functions that can be proved secure in this model. However, the validity of these proofs relies on the assumed ideal behavior of the permutation.

Distinguishers are algorithms that exhibit a non-random behavior of a permutation. Even when they do not directly weaken permutation-based hash functions, distinguishers invalidate the security proof of the hash function. They are a first step in a cryptanalysis and reduce the confidence in these primitives.

Since Rijndael [DR02] was chosen as the Advanced Encryption Standard, its “wide-trail strategy” design has inspired many symmetric-key primitives, especially since new processors have integrated the AES round function and key generation as basic processor operations. Rebound attacks, introduced in [MRST09], were the cryptanalysts’ response to the widespread use of AES-like permutations in hash function designs. A rebound attack consists in finding a pair of internal register values in the middle of the permutation whose difference propagations towards the input and output registers follow a truncated differential path. When there exist efficient algorithms that yield such pairs, they show a non-random behavior of the permutation and, as such, are distinguishers. A critical point is the choice of the truncated differential path, since the complexity of a rebound attack depends to a large extent on it.

The SHA-3 Cryptographic Hash Algorithm Competition organized by the National Institute of Standards and Technology (NIST) was the ideal playground to extend, develop and apply these techniques, as many of the candidates are built on top of AES-like permutations. Techniques such as start-from-the-middle, introduced by [MPRS09], or SuperSBox descriptions, due to Daemen and Rijmen [DR06] and used for instance in [GP10, WFWS10, Gil14], are useful ideas to mount efficient rebound attacks. Further improvements due to [NP11] and [SLW+10] have cleared the way for the most recent variants.

Grøstl [GKM+09] is one of these AES-inspired primitives. It was one of the five finalists of the SHA-3 competition. Even if its mode of operation differs from the sponge construction [BDPVA08] used in Keccak, the function finally standardized by the U.S. National Institute of Standards and Technology, its design still relies on two internal permutations and it comes with a security proof. Andreeva et al. [AMP10] prove that it is indifferentiable from a random oracle, up to the birthday bound, under the assumed ideal behavior of the permutations. Two variants of Grøstl have been proposed by its authors, Grøstl-256 and Grøstl-512.

We focus on Grøstl-512, because its rectangular registers of 8 × 16 bytes make our attack possible on 11 rounds. In full generality, non-square AES-like permutations are more vulnerable than square ones to rebound attacks: their slower diffusion, here three rounds instead of two to obtain a fully active state from a single difference, can be used to reach more rounds. The most recent rebound attack on Grøstl-512 is a work by Jean et al. [JNPP14]. It shows that the rectangular shape of these permutations makes it possible to extend the regular 9-round path of AES-like permutations to reach 10 rounds, thanks to a clever guess-and-determine algorithm. It turns out that, due to the dense differential pattern in the middle of the differential path, this 9-round path fully constrains the middle register value. In this paper, we consider instead a new differential path that spans only half of the middle registers. In return, we obtain a huge number of register values that realize this differential path, and we take advantage of their easy characterization to filter among them those that yield an 11-round differential path.

This distinguisher applies to a reduced version of the two internal permutations, 11 rounds instead of the 14 rounds specified in [GKM+09], and not to Grøstl-512 in its entirety. It is commonly admitted that such results help to evaluate the resistance of the Grøstl hash function, but as such, they do not threaten its security.

Although the presented truncated differential path is specific to Grøstl-512, we believe that the technique that we use to patch up three sets of differential values could be applied to other AES-like structures. Such a structured truncated differential path may be found as a solution to a Mixed-Integer Linear Programming (MILP) problem. To the best of our knowledge, only a few works make use of a MILP approach to analyze hash functions, the first of which is a work by Bouillaguet et al. about the SHA-3 candidate SIMD [BFL11].

Acknowledgments. We wish to thank Henri Gilbert for comments that greatly improved the paper. We are also grateful to the referees for their constructive input.


Our Contributions

We present sparse and highly structured truncated differential paths for the permutations $P_{1024}$ and $Q_{1024}$ of Grøstl-512, obtained with Mixed-Integer Linear Programming techniques. We recast them as a SuperSBox description to mount a variant of the rebound attack with a 6-round inbound phase. This method allows us to obtain the best known complexities, reaching 11 rounds with a surprisingly low complexity of $2^{72}$ computations while previous results reached 10 rounds with a computational complexity of $2^{392}$.

Table 1 summarizes the best known results on the 512-bit version of Grøstl. All these attacks are rebound attacks, which differ mainly in the truncated differential path used.

Table 1: Known rebound attacks on Grøstl-512 internal permutations.

Rounds   Time        Generic attack   Memory     Reference
7        $2^{152}$   $2^{512}$        $2^{56}$   [SLW+10]
8        $2^{280}$   $2^{448}$        $2^{64}$   [JNPP14]
9        $2^{328}$   $2^{384}$        $2^{64}$   [JNPP14]
10       $2^{392}$   $2^{448}$        $2^{64}$   [JNPP14]
11       $2^{72}$    $2^{96}$         $2^{56}$   This paper (§ 4.2)

Outline of the Document

In Section 2, we present Grøstl-512 by focusing on its underlying permutation $P_{1024}$, detailing some properties of its building blocks. In Section 3, we present the structured truncated differential path that we found. We give a SuperSBox description of it and give some hints about its satisfiability and how it can be obtained. In Section 4, we present a distinguisher which finds a pair of inputs whose difference, when propagated through the cryptographic permutation, agrees with the truncated differential path. Every technical detail is then explained.

2 Description of Grøstl-512

Two different versions of Grøstl have been submitted to the SHA-3 hash function competition: Grøstl-256, which outputs 256-bit digests, and Grøstl-512, which outputs 512-bit digests.

They handle messages by dividing them into blocks, using some padding. They then iteratively update an internal state, initialized with some IV, by computing a compression function $f$ on both a message block and the internal state, as illustrated in Figure 1. The design follows a wide-pipe construction: the internal state is twice as large as the output of the hash function. In both versions, the compression function is built on top of two AES-like permutations. They differ however in a fundamental way: Grøstl-256 uses a square matrix representation of its internal state whereas Grøstl-512 uses a rectangular one. We focus in this article on Grøstl-512.

2.1 Grøstl-512 Compression Function and Output Transformation

The compression function of Grøstl-512, $f_{1024}$, is built from two 1024-bit permutations $P_{1024}$ and $Q_{1024}$, as illustrated in Figure 2a, according to the following definition,

$$f_{1024}(h, m) = P_{1024}(h \oplus m) \oplus Q_{1024}(m) \oplus h .$$

Figure 1: The Grøstl-512 hash function.

We denote by $\mathrm{Trunc}_{512}$ the truncation which returns only the last 512 bits of its input. The output transformation is simpler than the compression function and, as illustrated in Figure 2b, is defined by

$$\Omega(x) = \mathrm{Trunc}_{512}(P_{1024}(x) \oplus x) .$$

Figure 2: Grøstl-512 internal functions. (a) The compression function $f_{1024}$. (b) The output transformation $\Omega$.

This compression function has been proved collision and preimage resistant under the assumption that $P_{1024}$ and $Q_{1024}$ are ideal [FSZ09]. Furthermore, the whole Grøstl-512 construction has been proved to be indifferentiable from a random oracle under this latter assumption and the additional hypothesis that $P_{1024}$ and $Q_{1024}$ are independent from each other [AMP10].
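The two definitions of this section can be illustrated with a minimal Python sketch. It treats the permutations as black boxes acting on 128-byte blocks; `toy_p` and `toy_q` are hypothetical stand-ins introduced only for illustration and are not the real $P_{1024}$ and $Q_{1024}$.

```python
from typing import Callable

Perm = Callable[[bytes], bytes]  # a permutation of 128-byte (1024-bit) blocks


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def compress(P: Perm, Q: Perm, h: bytes, m: bytes) -> bytes:
    """f_1024(h, m) = P(h xor m) xor Q(m) xor h."""
    return xor(xor(P(xor(h, m)), Q(m)), h)


def output_transform(P: Perm, x: bytes) -> bytes:
    """Omega(x) = Trunc_512(P(x) xor x): keep the last 64 bytes."""
    return xor(P(x), x)[-64:]


# Hypothetical stand-in permutations (assumptions, NOT Groestl's P and Q).
def toy_p(x: bytes) -> bytes:
    return bytes((b + 1) % 256 for b in x[::-1])


def toy_q(x: bytes) -> bytes:
    return bytes(b ^ 0xA5 for b in x[1:] + x[:1])


h = bytes(128)
m = bytes(range(128))
print(len(compress(toy_p, toy_q, h, m)), len(output_transform(toy_p, m)))
```

Passing the permutations as parameters keeps the sketch independent of any particular implementation of the round function.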

2.2 Grøstl-512 Internal Permutations Round Transformations

The 1024-bit internal states of the AES-like structures $P_{1024}$ and $Q_{1024}$ are specified as an 8 × 16 matrix of bytes (8 rows and 16 columns). These permutations then consist of 14 iterations of the following round permutation,

$$R := \mathrm{MixBytes} \circ \mathrm{ShiftBytesWide} \circ \mathrm{SubBytes} \circ \mathrm{AddRoundConstant} ,$$

where

AddRoundConstant (ARC) adds a constant depending on the round to the internal state;

SubBytes (SB) substitutes each byte in the matrix representation by its image by the non-linear SBox used in Rijndael. As it applies independently on bytes, we use SB to denote this transformation whether it is applied to full states or to any partial states. To simplify our analysis, we consider an ideal behavior of this transformation:

$$\forall (\delta, \delta') \in (\mathbb{F}_{2^8})^2, \quad |\{X \in \mathbb{F}_{2^8} \mid \mathrm{SBox}(X) \oplus \mathrm{SBox}(X \oplus \delta) = \delta'\}| \in \{0, 2\} . \tag{1}$$

Typical examples are almost perfect nonlinear (APN) functions [Dob99]. Another classical assumption due to the non-linearity of SB will be of great use:

“The image of any set of distinct byte values by SB is uniformly distributed” (2)

i.e. for all $Y_1 \neq Y_2 \neq \cdots \neq Y_k \in \mathbb{F}_{2^8}$,

$$\Pr_{X_1 \neq X_2 \neq \cdots \neq X_k} \{\mathrm{SBox}(X_1) = Y_1, \ldots, \mathrm{SBox}(X_k) = Y_k\} = \frac{k!}{2^8!} .$$

We assume that $\mathrm{SB}^{-1}$ behaves in the same way as its inverse SB;

ShiftBytesWide (Sh) cyclically shifts the bytes within a row to the left by a number of positions in the matrix representation. For $P_{1024}$, Rows 1 to 8 are respectively shifted by 0, 1, 2, 3, 4, 5, 6 and 11 positions. For $Q_{1024}$, Rows 1 to 8 are respectively shifted by 1, 3, 5, 11, 0, 2, 4 and 6 positions;

MixBytes (MB) applies to each column independently. We use MB to denote this transformation whether it is applied to full states or to a single column. Bytes are seen as elements of $\mathbb{F}_{2^8}$. This transformation is built from an MDS matrix with coefficients in $\mathbb{F}_{2^8}$. This is of great importance in our analysis: it means that the image by MB of a column with $k > 0$ non-zero bytes has non-zero bytes in at least $9 - k$ byte positions. The inverse of an MDS matrix being MDS, $\mathrm{MB}^{-1}$ behaves in the same way as its inverse MB.

Remark 1. The last round of AES-like structures traditionally omits the MB transformation. To be consistent with the rebound attack literature, we always consider full rounds in this article, i.e. rounds that include the MB transformation. A quick look at the truncated differential path that we use will however convince the reader that this change does not come into play in the attack.
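The slower diffusion of the rectangular state can be observed on truncated differences with a small Python sketch. It assumes the generic case in which MB turns every active column into a fully active one, and it only tracks which byte positions are active; the function names are illustrative, and rows and columns are indexed from 0.

```python
ROWS, COLS = 8, 16
SHIFTS_P = [0, 1, 2, 3, 4, 5, 6, 11]  # ShiftBytesWide offsets for P_1024


def shift_bytes_wide(active, shifts=SHIFTS_P):
    """Move each active byte of row r to the left by shifts[r] positions."""
    return {(r, (c - shifts[r]) % COLS) for (r, c) in active}


def mix_bytes_full(active):
    """Generic truncated MB: every active column becomes fully active."""
    cols = {c for (_, c) in active}
    return {(r, c) for r in range(ROWS) for c in cols}


def active_counts(start, rounds):
    """Number of active bytes before the first round and after each round."""
    counts, state = [len(start)], set(start)
    for _ in range(rounds):
        # SB and ARC do not change which bytes are active.
        state = mix_bytes_full(shift_bytes_wide(state))
        counts.append(len(state))
    return counts


print(active_counts({(0, 0)}, 3))  # [1, 8, 64, 128]
```

Under this assumption a single active byte needs three rounds to activate the whole 8 × 16 state.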

3 A Distinguisher for Reduced-Round $P_{1024}$ with 11 Rounds

We aim here to prove the non-randomness of a reduced version with 11 rounds of the permutations of Grøstl-512. To achieve this goal, we present a distinguisher, which exhibits a behavior that conflicts with that of a random permutation. We focus here, arbitrarily, on the permutation $P_{1024}$, but everything can be done similarly for $Q_{1024}$ (see Appendix A).

Remark 2. As it will be a fundamental measure of complexity in the remainder of this text, we denote $\beta = 2^8$. To ease the presentation, we often use the Landau asymptotic notation $O()$, applied to powers of $\beta$ to avoid additional logarithmic terms that typically arise from sorting operations. By a slight abuse of notation, we overload this notation for a fixed $\beta$, assuring the reader that we have checked any “$O(\beta^n)$” to be smaller than $\beta^{n+1}$.

3.1 Limited-Birthday Distinguishers

Gilbert et al. introduced limited-birthday distinguishers [GP10]. The challenge consists in finding a pair of input values whose difference lies in a predefined subspace and whose images by the permutation have their difference lying in another predefined subspace.

Problem 1. Limited-birthday($P$, $E_{in}$, $E_{out}$): Given a permutation $P$ and two $\mathbb{F}_2$-linear subspaces $E_{in}$ and $E_{out}$, find a pair of input values $(X, X')$ such that $X \oplus X' \in E_{in}$ and $P(X) \oplus P(X') \in E_{out}$.


They introduced the best known generic algorithm for solving this problem too, the so-called limited-birthday algorithm. Theorem 1 gives its complexity.

Theorem 1. For an $n$-bit permutation $P$, an $\mathbb{F}_2$-subspace $E_{in}$ of dimension $d_i$, an $\mathbb{F}_2$-subspace $E_{out}$ of dimension $d_o$ and $d_i \le d_o$, the computational complexity $C$ of the limited-birthday algorithm solving Limited-birthday($P$, $E_{in}$, $E_{out}$) satisfies:

$$\log_2(C) = \begin{cases} (n - d_o)/2 & \text{if } n < 2 d_i + d_o , \\ n - d_i - d_o & \text{otherwise.} \end{cases}$$

The optimality of this algorithm has been proven by Iwamoto et al. [IPS13].
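As a quick numerical check of Theorem 1, the following Python sketch evaluates the formula; the function name is an assumption made for illustration. It is applied to subspaces spanning 104 of the 128 state bytes, i.e. of dimension $8 \cdot 104 = 832$ over $\mathbb{F}_2$, which are the parameters of the attack presented in this paper.

```python
def limited_birthday_log2(n: int, d_i: int, d_o: int) -> float:
    """log2 of the generic limited-birthday complexity of Theorem 1.

    n is the permutation width in bits; d_i and d_o are the F_2-dimensions
    of the input and output subspaces, with d_i <= d_o as in the theorem.
    """
    if d_i > d_o:
        raise ValueError("Theorem 1 is stated for d_i <= d_o")
    if n < 2 * d_i + d_o:
        return (n - d_o) / 2
    return n - d_i - d_o


# 1024-bit permutation, input and output differences on 104 bytes each.
print(limited_birthday_log2(1024, 8 * 104, 8 * 104))  # 96.0
```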

The non-random behavior that we exhibit in Section 4 is based upon Problem 1. We provide an algorithm which solves an instance of this problem with a lower complexity than the one given by Theorem 1.

Remark 3. Solving instances of Problem 1 is not the only existing type of distinguishing attack. For instance, Lamberger et al. [LMR+09], Jean et al. [JNPP13] and Gilbert [Gil14] consider some other ones.

3.2 An 11-Round Truncated Differential Path over $P_{1024}$

Introduced by Knudsen in 1995 [Knu95], truncated differentials consider only whether a byte position is affected by a non-zero differential value or not, where a differential value is the difference between two state values. Many hash function cryptanalyses have been carried out from this perspective, among which [Pey07, MNPN+09, MPRS09, JF11, JNPS12, JNPP14]. Figure 3 specifies the truncated differential path at the center of our attack: blue cells denote active bytes, where there might be non-zero differences, and white cells denote non-active bytes, where the differences are zero (see Section 3.3 for an explanation of how we have obtained this path).

For such a truncated description, all transformations besides MB are deterministic. MB induces probabilistic truncated transitions: a column value $\delta$ which is non-zero on $u$ chosen byte positions has an image $\mathrm{MB}(\delta)$ vanishing in $v$ chosen byte positions with probability $\beta^{-v}$ as long as the MDS property is satisfied, and 0 otherwise.

$$\Pr\left[\mathrm{MB}(\delta) \text{ vanishes on } v \text{ chosen byte positions} \;\middle|\; \delta \text{ is non-zero on } u \text{ chosen byte positions}\right] = \begin{cases} \beta^{-v} & \text{if } u + (8 - v) \ge 9 , \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Coding theory ensures that the same behavior applies to $\mathrm{MB}^{-1}$.
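Equation (3) translates directly into a small Python helper, given here as an illustrative sketch (the function name is an assumption). It returns the exponent $e$ such that the transition holds with probability $\beta^{e}$, or `None` when the MDS property forbids the transition.

```python
from typing import Optional


def mb_log_beta_prob(u: int, v: int) -> Optional[int]:
    """Exponent e with Pr = beta**e for one column transition through MB.

    u: number of byte positions on which the input column is non-zero.
    v: number of chosen byte positions on which the output must vanish.
    Returns None when the transition has probability 0 (Equation (3)).
    """
    if not (1 <= u <= 8 and 0 <= v <= 8):
        raise ValueError("expected 1 <= u <= 8 and 0 <= v <= 8")
    if u + (8 - v) >= 9:
        return -v
    return None


print(mb_log_beta_prob(8, 7))  # -7: a full column collapses to one byte
print(mb_log_beta_prob(1, 0))  # 0: one byte spreads to a full column
print(mb_log_beta_prob(1, 1))  # None: forbidden by the MDS property
```

For example, eight fully active columns that each collapse to a single active byte give $8 \cdot (-7) = -56$, i.e. a probability of $\beta^{-56}$.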

Most rebound attacks rely on truncated differential paths which diffuse to a fully active state in three rounds, both backward and forward. The sequence of numbers of active bytes is a classical representation of such truncated differential paths. The sequence of the differential path used by the 10-round rebound attack of [JNPP14] is

$$64 \xrightarrow{R_1} 8 \xrightarrow{R_2} 1 \xrightarrow{R_3} 8 \xrightarrow{R_4} 64 \xrightarrow{R_5} 128 \xrightarrow{R_6} 64 \xrightarrow{R_7} 8 \xrightarrow{R_8} 1 \xrightarrow{R_9} 8 \xrightarrow{R_{10}} 64 .$$

Our truncated differential path is very different; in particular, the sequence of numbers of active bytes shows how structured our path is (see Figure 3):

$$104 \xrightarrow{R_1} 53 \xrightarrow{R_2} 34 \xrightarrow{R_3} 34 \xrightarrow{R_4} 34 \xrightarrow{R_5} 34 \xrightarrow{R_6} 34 \xrightarrow{R_7} 34 \xrightarrow{R_8} 34 \xrightarrow{R_9} 53 \xrightarrow{R_{10}} 104 \xrightarrow{R_{11}} 128 .$$

A first analysis consists in evaluating the plausibility of such a truncated differential path: the probability that there exists a pair of inputs such that their difference, when propagated through the permutation rounds, agrees with the patterns of the truncated differential path.

Figure 3: An 11-round truncated differential path over $P_{1024}$ of Grøstl-512. (Rounds $R_1$ to $R_{11}$ each apply ARC, SB, Sh and MB; the patterns are numbered 1 to 44, four per round; the phases are labeled Outbound, Inbound, Outbound.)

Starting with $\beta^{104}$ differences, there exist $\beta^{128}$ (ordered) pairs of internal state values that have their difference equaling each differential value, i.e. a total number of $\beta^{232}$ pairs of input values. From Equation (3), applied independently on each column, we can estimate the diffusion probability of the differential path given in Figure 3: $\beta^{-51}$ for the propagation through the Round 1 MB transformation, $\beta^{-22}$ for each of Rounds 2 to 8 and $\beta^{-3}$ for Round 9. Rounds 10 to 11 are deterministic. By Assumption (2), $\beta^{128+104} \cdot \beta^{-51+7\cdot(-22)-3} = \beta^{232} \cdot \beta^{-208} = \beta^{24} = 2^{192}$ input pairs shall thus fulfill our 11-round differential path.

These considerations made us confident that we could find pairs of input values whose differences, when propagated through the successive transformations, agree with the patterns of the truncated differential path of Figure 3. Such a pair solves the problem Limited-birthday($P_{1024}$, $\Delta_{in}$, $\Delta_{out}$) where $\Delta_{in}$ and $\Delta_{out}$ are both subspaces of differential values spanning 104 byte positions, i.e. $\mathbb{F}_2$-subspaces of dimension $8 \cdot 104 = 832$. From Theorem 1, we know that the computational complexity of the generic attack is $\beta^{(128-104)/2} = \beta^{12} = 2^{96}$. Our distinguisher is an algorithm that finds such a pair with a computational complexity lower than $\beta^{12}$.

In comparison, we have $\beta^{64} \cdot \beta^{128} = \beta^{192}$ pairs of inputs that satisfy the input differential value of the path used in [JNPP14]. The propagations of Rounds 1 and 7 then have probability $\beta^{-56}$, the MB propagation probability of Rounds 2 and 8 is $\beta^{-7}$ and the one of Round 6 is $\beta^{-64}$. Since the other rounds have a deterministic behavior, we have therefore $\beta^{192} \cdot \beta^{-190} = \beta^{2} = 2^{16}$ input pairs that fulfill this 10-round truncated differential path.

Note that such an analysis is more conveniently carried out from the middle rounds. Starting at Round 6 of the path given in Figure 3, we have $\beta^{34} \cdot \beta^{128} = \beta^{162}$ pairs of internal state values that have their difference agreeing with Pattern 21. Propagations from Round 6 to Round 11 through MB and from Round 6 to Round 2 through $\mathrm{MB}^{-1}$ both happen with probability $\beta^{-22} \cdot \beta^{-22} \cdot \beta^{-22} \cdot \beta^{-3} = \beta^{-69}$, and we retrieve that $\beta^{162} \cdot \beta^{-2 \cdot 69} = \beta^{24}$ input pairs shall fulfill our 11-round differential path.
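The three counting arguments above are plain exponent arithmetic in base $\beta$, which the following Python sketch replays. The helper name is an assumption; the exponents are the ones quoted in the text.

```python
def log_beta_pairs(active_bytes, mb_exponents, state_bytes=128):
    """log_beta of the expected number of pairs following a truncated path.

    active_bytes: number of active bytes of the starting difference.
    mb_exponents: log_beta of each probabilistic MB (or MB^-1) transition.
    """
    return active_bytes + state_bytes + sum(mb_exponents)


# Our path, counted from the input: Round 1, Rounds 2 to 8, Round 9.
ours = log_beta_pairs(104, [-51] + [-22] * 7 + [-3])
# Path of [JNPP14]: Rounds 1 and 7, Rounds 2 and 8, Round 6.
jnpp = log_beta_pairs(64, [-56, -56, -7, -7, -64])
# Our path, counted from the middle (Round 6), forward and backward.
middle = log_beta_pairs(34, [-22, -22, -22, -3] * 2)

print(ours, jnpp, middle)   # 24 2 24
print(8 * ours, 8 * jnpp)   # 192 16 (log2 of the numbers of pairs)
```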

This analysis derives from rebound attacks. They share an overall strategy, which splits into two phases. The first step is called the inbound phase and consists in finding several pairs of values in the middle of the truncated differential path such that this path is verified for as many rounds as possible in the middle of the path. The second step, called the outbound phase, consists in enumerating these pairs to find one satisfying the remaining probabilistic transitions of the truncated differential path in outward directions. Our inbound phase involves Rounds 3 to 8 and consists in collecting pairs of full state values whose differences agree with Pattern 8 and, when propagated until Round 9, with Pattern 32. Our outbound phase consists in finding one among these pairs which satisfies the two remaining non-deterministic transitions: through MB in Round 9 and $\mathrm{MB}^{-1}$ in Round 2. Each holds with probability $\beta^{-3}$ and both hold simultaneously with probability $\beta^{-6} = 1/2^{48}$.

3.3 Searching Sparse Truncated Differential Paths

Looking for truncated differential paths well-adapted to rebound attacks is closely related to the problem of finding low-weight differential paths in an AES-like block cipher. Mixed-Integer Linear Programming (MILP) solvers turn out to be very efficient at solving such problems [MWGP11]. We explain here how the differential paths needed for the inbound and outbound phases can be found with this approach.

Inbound Rounds

We aim at finding long differential paths that span as few columns as possible, in the hope that this yields in return many more register values that verify the path. More precisely, we look for a small number of non-zero columns in the 6 inner inbound rounds: 7 columns at each round (it is infeasible to find a solution for 6 columns). Instead of minimizing the number of active S-Boxes in the support of the differential values as in a block-cipher context, we thus have to minimize the number of active columns.

Precisely, we define 7 × 128 Boolean decision variables $x_i$, each seen as one byte of the 7 registers of the permutation $P_{1024}$ restricted to 6 rounds: the input, the 5 internal and the output registers. These variables encode truncated differential paths: the variable $x_i$ is equal to 1 if the differential path is active at this byte, and to 0 otherwise. As is now classical, we can write the action of Sh and MB as 6 × 16 linear inequalities between the $x_i$’s of the type

$$\sum_{x \in P} x + \sum_{x' \in P'} x' \ge 9\,t \quad \text{and} \quad \forall v \in P \cup P',\ t \ge v .$$

The sets $P$ (resp. $P'$) are subsets of 8 variables that span the $r$-th round register (resp. the $(r+1)$-th round register), and the variables $t$ indicate whether a column is active or not. We furthermore add 6 inequalities, which state that the sum of the 16 variables $t$ defined by Round $r$ is at most 7. This done, we ask for minimizing the sum of these 6 × 16 variables $t$.
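The constraints above can be checked on a candidate truncated transition without any solver. The following Python sketch is only a feasibility checker, not the MILP model itself; the function name and the 0-based (row, column) encoding are assumptions made for illustration.

```python
ROWS, COLS = 8, 16
SHIFTS_P = [0, 1, 2, 3, 4, 5, 6, 11]  # ShiftBytesWide offsets for P_1024


def check_round(before, after, max_active_cols=7, shifts=SHIFTS_P):
    """Check one Sh + MB step of a truncated path against the constraints.

    before, after: sets of active (row, col) positions of two consecutive
    registers. For each column, either no byte is active on both sides
    (t = 0) or the active bytes before and after MB sum to at least 9
    (t = 1). At most max_active_cols columns may have t = 1.
    """
    shifted = {(r, (c - shifts[r]) % COLS) for (r, c) in before}
    active_cols = 0
    for c in range(COLS):
        total = sum((r, c) in shifted for r in range(ROWS))
        total += sum((r, c) in after for r in range(ROWS))
        if total == 0:
            continue        # t = 0: inactive column
        if total < 9:
            return False    # t = 1 but the MDS bound is violated
        active_cols += 1
    return active_cols <= max_active_cols


full_col0 = {(r, 0) for r in range(ROWS)}
print(check_round({(0, 0)}, full_col0))             # True: 1 + 8 >= 9
print(check_round({(0, 0)}, full_col0 - {(7, 0)}))  # False: 1 + 7 < 9
```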

After a few minutes with the Gurobi solver [Gur17], we found numerous such paths. Most of them are trivially linked: any shift by a fixed amount of a differential path yields an equivalent path. Finally, two iterative paths caught our attention, the one that we use (see Figure 3) and a second one defined as follows (which does not seem to be better for our purpose).

[Figure: the second iterative path, shown through one Sh and one MB step.]

Note that in both cases, five columns do not reach the MDS bound and we can easily derive from them sparser paths (see Remark 5 in Section 4.4).

Remark 4. The number of non-zero columns depends directly on Sh. But more obvious choices lead to much worse behaviors, for instance shifting by 0, 1, 2, 3, 4, 5, 6 and 7 (instead of 11) positions yields a reproducible differential pattern on only 5 columns.

Outbound Rounds

We take advantage of this MILP approach to search for low-cost outbound differential characteristics as well. For this task, we define the first 128 variables $x_0, \ldots, x_{127}$ as being equal to the “inbound” differential pattern that we have selected, we also require that after three rounds the output register has more than 16 non-active bytes, and we record in new variables $c$ the number of zero bytes $x_i$ in the output of the MB transformation in each active column. We then ask for minimizing the sum of these variables $c$, under the condition that the bytes of a register cannot all be equal to one.

The best path that we find is the one of Figure 3. Its output register contains 24 zero bytes, at the cost of a transition probability equal to $\beta^{-25} = 1/2^{200}$: $\beta^{-22}$ for Round 1 and $\beta^{-3}$ for Round 2. We found another path in this way, but with a slightly smaller transition probability, $\beta^{-26}$, and 17 zero bytes in the output. We give it below for the sake of completeness.

[Figure: the alternative outbound path, shown through three successive MB and Sh steps.]

3.4 SuperSBox Description

For truncated differential paths, ARC transformations may be ignored as they have no impact on truncated differences. Since Sh transformations only move the difference positions and SB transformations apply independently on bytes, they commute and we may permute the applications of these transformations.

Following [DR06], we now define the two following transformations.

• A non-linear SuperSBox (SSB) transformation which applies independently on columns,

$$\mathrm{SuperSBox} := \mathrm{SubBytes} \circ \mathrm{MixBytes} \circ \mathrm{SubBytes} .$$

We use SuperSBox (SSB) to denote this transformation whether it is applied to full states or to single columns.

• A linear SuperLinear (SL) transformation,

$$\mathrm{SuperLinear} := \mathrm{ShiftBytesWide} \circ \mathrm{MixBytes} \circ \mathrm{ShiftBytesWide} .$$

Our 11-round truncated differential path given in Figure 3 may then be rewritten in a more compact form as in Figure 4.

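Figure 4: The 11-round truncated differential path with a SuperSBox description. (Patterns are numbered 1 to 14; the transformations shown are $\mathrm{SL}_1$, $\mathrm{SSB}_1$, $\mathrm{SL}_2$, $\mathrm{SSB}_2$, $\mathrm{SL}_3$, $\mathrm{SSB}_3$, $\mathrm{SL}_4$, $\mathrm{SSB}_4$, $\mathrm{SL}_5$, $\mathrm{SSB}_5$, Sh, MB and SB.)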
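The regrouping into SSB and SL layers relies on the fact that Sh and SB commute. A minimal Python sketch of this property follows; `TOY_SBOX` is a hypothetical stand-in for Rijndael's S-box (the argument holds for any bytewise substitution), and the state is an 8 × 16 list of rows.

```python
import random

ROWS, COLS = 8, 16
SHIFTS_P = [0, 1, 2, 3, 4, 5, 6, 11]  # ShiftBytesWide offsets for P_1024

rng = random.Random(0)
TOY_SBOX = list(range(256))
rng.shuffle(TOY_SBOX)  # assumption: an arbitrary byte permutation


def sub_bytes(state):
    """Apply the S-box to every byte independently."""
    return [[TOY_SBOX[b] for b in row] for row in state]


def shift_bytes_wide(state):
    """Rotate row r to the left by SHIFTS_P[r] positions."""
    return [row[s:] + row[:s] for row, s in zip(state, SHIFTS_P)]


state = [[rng.randrange(256) for _ in range(COLS)] for _ in range(ROWS)]

# SB only changes byte values, Sh only moves bytes: the order is irrelevant.
print(sub_bytes(shift_bytes_wide(state)) == shift_bytes_wide(sub_bytes(state)))
```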

4 The Distinguisher

4.1 Notations

Byte values are seen as elements in $\mathcal{B} = \mathbb{F}_{2^8}$. Column values are seen as elements in $\mathcal{C} = \mathcal{B}^8$ and for $X \in \mathcal{C}$, we denote by $X_i$ the $i$-th coordinate of $X$, which is also the $i$-th byte of the column. State values are seen indifferently as elements in $\mathcal{S} \simeq \mathcal{B}^{128} \simeq \mathcal{C}^{16}$ and for $Y \in \mathcal{S}$, we denote by $Y_{i,j}$ the byte value in the $i$-th row and the $j$-th column. From now on, all references 1-14 to (grid) patterns are linked to Figure 4. For $i \in \{1, \ldots, 14\}$, $\mathcal{P}_i$ denotes the $\mathcal{B}$-linear subspace of differential values which agree with Pattern $i$ (whose non-zero byte positions are included in hatched cell positions of Pattern $i$) and $I_i$ is the set of column indices which have active bytes. We denote by $\delta_{|j}$ the restriction of a differential value $\delta$ to the $j$-th column and by extension $(\mathcal{P}_i)_{|j}$ denotes the linear subspace of $\mathcal{C}$ which agrees with the $j$-th column of Pattern $i$. We call completion of a partial state value $V$ (typically a column value) a full state value whose restriction to the byte positions of $V$ is $V$. By extension, we call completion of a pair $(V_1, V_2)$ of partial state values defined on the same byte positions a pair of completions of the $V_i$’s such that their difference in byte positions other than those of the $V_i$’s is zero.

4.2 Sketch of the Algorithm

We give an overview of the distinguishing algorithm here. Detailed explanations are postponed to subsequent sections, except for the outbound phase, which is trivial.

Step 1. We construct a basis for $\Delta_1$, the 12-dimensional $\mathcal{B}$-vector subspace of elements $\delta_1$ in $\mathcal{S}$ whose non-zero byte positions are included in Pattern 4 and whose images by $\mathrm{SL}_2$, $\delta'_1$, have their non-zero byte positions included in Pattern 5:

$$\Delta_1 = \{\delta_1 \in \mathcal{P}_4 \mid \delta'_1 = \mathrm{SL}_2(\delta_1) \in \mathcal{P}_5\} . \tag{4}$$

We construct a basis for $\Delta_2 = \{\delta_2 \in \mathcal{P}_6 \mid \delta'_2 = \mathrm{SL}_3(\delta_2) \in \mathcal{P}_7\}$ and a basis for $\Delta_3 = \{\delta_3 \in \mathcal{P}_8 \mid \delta'_3 = \mathrm{SL}_4(\delta_3) \in \mathcal{P}_9\}$ too. The use of these bases allows us to enumerate elements in these subspaces with negligible computational and memory complexities. The construction of these bases requires $O(1)$ computational and memory complexities.

Step 2. We choose an arbitrary $\delta_2$ in $\Delta_2$. Column by column, for all columns indexed by $I_6$, we store in lists $(C_i)_{i \in I_6}$ all pairs of column values whose differences are compatible with $\delta_2$ and whose images by $\mathrm{SSB}_2^{-1}$ have differences compatible with Pattern 5. Identically, for all columns indexed by $I_7$, we store in lists $(C'_j)_{j \in I_7}$ all pairs of column values whose differences are compatible with $\delta'_2$ and whose images by $\mathrm{SSB}_3$ have differences compatible with Pattern 8. We thus compute for all $i$ in $I_6$, respectively for all $j$ in $I_7$, the lists $C_i$, respectively $C'_j$, defined by:

$$C_i = \{(X, Y) \in \mathcal{C}^2 \mid X \oplus Y = (\delta_2)_{|i},\ \mathrm{SSB}_2^{-1}(X) \oplus \mathrm{SSB}_2^{-1}(Y) \in (\mathcal{P}_5)_{|i}\} ,$$

$$C'_j = \{(X, Y) \in \mathcal{C}^2 \mid X \oplus Y = (\delta'_2)_{|j},\ \mathrm{SSB}_3(X) \oplus \mathrm{SSB}_3(Y) \in (\mathcal{P}_8)_{|j}\} .$$

The construction of these lists requires $O(\beta^7)$ computations and memory space.

Step 3. From $O(\beta^6)$ elements $\delta_1$ of $\Delta_1$, we compute and store $\beta^6$ pairs of 7-column values in a list $E$, built from combinations of elements in the lists $(C_i)_{i \in I_6}$, such that any completion induced by this pair of indexed 7-column values has a difference which is equal to $\delta_2$ and has the difference of its images by $\mathrm{SL}_2^{-1} \circ \mathrm{SSB}_2^{-1}$ lying in $\Delta_1$. From a similar enumeration in $\Delta_3$, we store in a list $F$ $\beta^6$ pairs of 7-column values built from combinations of elements in the lists $(C'_j)_{j \in I_7}$ such that any completion induced by these pairs of indexed 7-column values has a difference which is equal to $\delta'_2$ and has the difference of its images by $\mathrm{SSB}_3$ lying in $\Delta_3$. This step costs $O(\beta^6)$ computations and the same in memory.

Step 4. We find $(e, e \oplus (\delta_2)_{|I_6})$ in $E$ and $(f, f \oplus (\delta'_2)_{|I_7})$ in $F$ such that $(f, f \oplus (\delta'_2)_{|I_7})$ admits completions whose images by $\mathrm{SL}_3^{-1}$ do not contradict $(e, e \oplus (\delta_2)_{|I_6})$. An arbitrary choice in $E \times F$ yields such completions only with probability $\beta^{-12}$. However, $\beta^{28}$ completions are available whenever it does. Such a pair is found in $O(\beta^7)$ computations and $O(\beta^6)$ in memory.


Step 5. We determine a pair $(s, s \oplus \delta'_3)$ of 34-byte values¹ whose byte positions are induced by Pattern 9 and whose difference is suitable with $\delta'_3$ induced by $(f, f \oplus (\delta'_2)_{|I_7})$. This pair is built in such a way that the image of any completion of these 34-byte values by $\mathrm{SSB}_4$ has its difference in $P_{10}$ and that there exist $\beta^{72}$ completions of $(f, f \oplus (\delta'_2)_{|I_7})$ whose images by $\mathrm{SL}_4 \circ \mathrm{SSB}_3$ do not contradict with these 34-byte values. This pair of 34-byte values can be found in $O(\beta^3)$ computations.

Step 6. Among the $\beta^{28}$ completions found in Step 4 and the $\beta^{72}$ completions found in Step 5, we compute an intersection of $\beta^6$ completions. Propagated outwards, all these $\beta^6$ pairs of full state values have their differences following the truncated differential path from Pattern 4 to Pattern 10. This has $O(\beta^9)$ computational complexity, requires $O(\beta^7)$ memory space and concludes the inbound phase.

Step 7. By enumerating the $\beta^6$ pairs of full state values collected in Step 6, we find a pair which satisfies simultaneously both remaining independent probabilistic transitions: through MB from Pattern 10 to Pattern 11 and through $\mathrm{MB}^{-1}$ from Pattern 4 to Pattern 3. This outbound phase requires $O(\beta^6)$ computations.

To summarize, this algorithm constructs a pair of full state values such that when propagated through 11 rounds of $P_{1024}$, the successive differences agree with the truncated differential path of Figure 4. This is done with $O(\beta^9) \simeq 2^{72} < 2^{80}$ computational complexity and $O(\beta^7) \simeq 2^{56} < 2^{64}$ memory complexity. The generic attack on such input and output patterns requires about $\beta^{12} = 2^{96}$ computations (see Section 3.1).
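The exponents quoted in this summary can be checked with a few lines of arithmetic. The sketch below assumes that β is the number of byte values, 2^8, which is what the equality β^12 = 2^96 implies; the helper name `log2_cost` is ours and purely illustrative.

```python
from math import log2

# Assumption: beta is the number of byte values, consistent with beta^12 = 2^96.
BETA = 2 ** 8

def log2_cost(exponent: int) -> int:
    """Return log2(beta^exponent)."""
    return int(log2(BETA)) * exponent

time_bits = log2_cost(9)      # computational complexity O(beta^9)
memory_bits = log2_cost(7)    # memory complexity O(beta^7)
generic_bits = log2_cost(12)  # generic attack, about beta^12

print(time_bits, memory_bits, generic_bits)
```

Under this assumption the attack leaves a margin of 24 bits over the generic bound in time.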

4.3 Step 1: Construction of the Basis of Linear Subspaces $\Delta_i$

We explain how to construct a basis of $\Delta_1$, the B-vector space defined by Equation (4). Making explicit the $\mathrm{SL}_2$ transformation yields Figure 5.

Figure 5 (Patterns 4, 4.2, 4.3 and 5, linked by Sh, MB and Sh): The truncated differential path verified by elements of the set $\Delta_1$.

From the 34 hatched cells of Pattern 4, we see that $P_4$ is a B-linear subspace of dimension 34. Recalling that MB applies independently on columns, we first focus on the transition of the 9th column of Pattern 4.2 through MB (see Figure 6).

Figure 6 (one column through MB): Differential through a MixBytes transformation.

Since MB satisfies the MDS property, the following subspace is of dimension 1:

\[ \{ X \in \mathcal{C} \mid X_1 = X_8 = 0 \text{ and } (\mathrm{MB}(X))_3 = \ldots = (\mathrm{MB}(X))_7 = 0 \}. \]

By Gaussian elimination, we find $\{b_9\}$, a basis of this subspace. The same procedure yields $\{b_{15}\}$, a basis of the subspace of dimension 1 corresponding to Column 15, and $\{b_{10}, b'_{10}\}, \ldots, \{b_{14}, b'_{14}\}$, bases of the subspaces of dimension 2 corresponding respectively to Columns 10 to 14. We then construct full state values $(Y_i)_{i \in I_4}$ and $(Y'_i)_{i \in \{10,\ldots,14\}}$ as completions of $(b_i)_{i \in I_4}$ and $(b'_i)_{i \in \{10,\ldots,14\}}$ with 0 bytes in the remaining byte positions.

¹ By a slight abuse of notation, $\delta'_3$ refers here indifferently to differential values that span the whole state or to their restriction to the active bytes.
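The Gaussian elimination step can be sketched as follows for the one-dimensional subspace written above. This is a minimal illustration, not the authors' implementation: it assumes the MixBytes matrix circ(02, 02, 03, 04, 05, 03, 05, 07) over GF(2^8) with the AES reduction polynomial, as given in the Grøstl specification, laid out as a right-circulant matrix, and all function names are hypothetical.

```python
# MixBytes first row (Groestl specification); right-circulant layout is an assumption.
FIRST_ROW = [0x02, 0x02, 0x03, 0x04, 0x05, 0x03, 0x05, 0x07]
M = [[FIRST_ROW[(j - i) % 8] for j in range(8)] for i in range(8)]

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Inverse of a nonzero element, computed as a^254."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def kernel_basis(rows):
    """Basis of {v : rows * v = 0} by Gauss-Jordan elimination over GF(2^8)."""
    rows = [r[:] for r in rows]
    ncols = len(rows[0])
    pivots = []
    for c in range(ncols):
        rk = len(pivots)
        p = next((i for i in range(rk, len(rows)) if rows[i][c]), None)
        if p is None:
            continue
        rows[rk], rows[p] = rows[p], rows[rk]
        inv = gf_inv(rows[rk][c])
        rows[rk] = [gf_mul(inv, v) for v in rows[rk]]
        for i in range(len(rows)):
            if i != rk and rows[i][c]:
                f = rows[i][c]
                rows[i] = [v ^ gf_mul(f, w) for v, w in zip(rows[i], rows[rk])]
        pivots.append(c)
    basis = []
    for fc in (c for c in range(ncols) if c not in pivots):
        v = [0] * ncols
        v[fc] = 1
        for k, pc in enumerate(pivots):
            v[pc] = rows[k][fc]  # characteristic 2: -x == x
        basis.append(v)
    return basis

def mix_bytes(col):
    """Apply the MixBytes matrix to one 8-byte column."""
    out = []
    for i in range(8):
        acc = 0
        for j in range(8):
            acc ^= gf_mul(M[i][j], col[j])
        out.append(acc)
    return out

free_in = [1, 2, 3, 4, 5, 6]  # X_2..X_7 (0-based indices), since X_1 = X_8 = 0
zero_out = [2, 3, 4, 5, 6]    # (MB(X))_3..(MB(X))_7 must vanish
sub = [[M[i][j] for j in free_in] for i in zero_out]

# Embed each kernel vector back into a full 8-byte column.
columns = []
for v in kernel_basis(sub):
    col = [0] * 8
    for j, val in zip(free_in, v):
        col[j] = val
    columns.append(col)
```

The 5 x 6 submatrix has full rank because every square submatrix of an MDS matrix is invertible, so `columns` holds a single vector playing the role of $b_9$. The two-dimensional cases would follow by changing `free_in` and `zero_out` according to the corresponding column patterns.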

A basis of $\Delta_1$ is then given by

\[ \mathrm{Sh}^{-1}(Y_9),\ \mathrm{Sh}^{-1}(Y_{15}),\ \mathrm{Sh}^{-1}(Y_{10}),\ \mathrm{Sh}^{-1}(Y'_{10}),\ \ldots,\ \mathrm{Sh}^{-1}(Y_{14}),\ \mathrm{Sh}^{-1}(Y'_{14}). \]

Bases for $\Delta_2$ and $\Delta_3$ are built in the same way.

4.4 Step 2: Pairs of Columns with Input/Output SuperSBox Differences

Let $\delta_2$ be an element of $\Delta_2$. The purpose of this step is to compute and store the lists $C_i$ for all $i \in I_6$ and the lists $C'_j$ for all $j \in I_7$. Exhaustive search could achieve this goal in $14 \cdot \beta^8$ computations, but we now show how to do it faster, in the spirit of [SLW+10].

Let us focus on $C_8$, that stores elements in $\mathcal{C}^2$ whose difference equals $(\delta_2)_{|8}$ and whose images by $\mathrm{SSB}_2^{-1}$ have their difference in $(P_5)_{|8}$. To this purpose, Figure 7 introduces two intermediate patterns by decomposing the $\mathrm{SSB}_2$ transformation applied on the arbitrarily chosen 8th column.

Figure 7 (Patterns 5, 5.2, 5.3 and 6, linked by SB, MB and SB): A truncated differential through a SuperSBox operation.

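We denote by $D_1$ the set of differential values agreeing with Pattern 5.3 of Figure 7 and having their image by the $\mathrm{MB}^{-1}$ transformation agreeing with Pattern 5.2,

\[ D_1 = \{ \delta \in (P_6)_{|8} \mid \mathrm{MB}^{-1}(\delta) \in (P_5)_{|8} \}. \]

From Equation (3), we have $|D_1| = \beta$. We now denote by $D_2$ the set

\[ D_2 = \{ \delta \in \mathcal{C} \mid \exists X \in \mathcal{C} \text{ s.t. } \mathrm{SB}^{-1}(X \oplus (\delta_2)_{|8}) \oplus \mathrm{SB}^{-1}(X) = \delta \}. \]

From Equation (1), $|D_2| = \beta^3 \cdot 2^{-3}$ and

\[ \forall \delta \in D_2, \quad \bigl| \{ X \in \mathcal{C} \mid \mathrm{SB}^{-1}(X \oplus (\delta_2)_{|8}) \oplus \mathrm{SB}^{-1}(X) = \delta \} \bigr| = 2^3 \cdot \beta^5. \]

By Assumption (2), $|D_1 \cap D_2| = 2^{-3} \cdot \beta$. The following list has then cardinal $\beta^6$:

\[ C_8 = \bigcup_{\delta \in D_1 \cap D_2} \{ (X, X \oplus (\delta_2)_{|8}) \in \mathcal{C}^2 \mid \mathrm{SB}^{-1}(X \oplus (\delta_2)_{|8}) \oplus \mathrm{SB}^{-1}(X) = \delta \}. \]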
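The cardinality of $C_8$ follows directly from the two counts above, since the sets indexed by distinct $\delta$ are disjoint. Written out, as a short worked derivation using only the quantities already stated:

```latex
\[
|C_8| \;=\; \sum_{\delta \in D_1 \cap D_2}
  \bigl|\{ X \in \mathcal{C} \mid \mathrm{SB}^{-1}(X \oplus (\delta_2)_{|8}) \oplus \mathrm{SB}^{-1}(X) = \delta \}\bigr|
\;=\; (2^{-3}\cdot\beta)\cdot(2^{3}\cdot\beta^{5}) \;=\; \beta^{6}.
\]
```

As a consistency check, the same counts give $|D_2| \cdot 2^3 \cdot \beta^5 = \beta^8$, which is the total number of column values $X$.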

To compute $C_8$, we consider all pairs $(X, X \oplus (\delta_2)_{|8}) \in \mathcal{C}^2$ such that the restriction of $X$ to the non-active bytes (white cells) is 0. We store, in an intermediate list $H$, the pairs for which the restriction of $\mathrm{MB}^{-1}(\mathrm{SB}^{-1}(X \oplus (\delta_2)_{|8}) \oplus \mathrm{SB}^{-1}(X))$ to the first and eighth byte positions is 0. We then store in $C_8$, ordered according to the sorting key $\delta(X) = \mathrm{SSB}_2^{-1}(X) \oplus \mathrm{SSB}_2^{-1}(X \oplus (\delta_2)_{|8})$, all elements in $\mathcal{C}^2$ such that their restriction to the first, second and eighth byte positions equals the restriction to those byte positions of some element in $H$ and such that, on the remaining byte positions, the difference is 0. From now on, elements of $C_8$ denote, depending on the context, either the initial pairs of values or their images through $\mathrm{SSB}_2^{-1}$.
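The idea of storing pairs indexed by their output difference can be illustrated at toy scale. The sketch below is not the construction of $C_8$ itself: it replaces $\mathrm{SSB}_2^{-1}$ by a single 4-bit S-box (the PRESENT S-box, chosen only because it is small) and uses a hypothetical function name, but it shows the same two ingredients, a fixed input difference and a filter on the output difference, with the result keyed by that output difference.

```python
from collections import defaultdict

# Toy 4-bit S-box (the PRESENT S-box), a stand-in for SSB_2^{-1} in this sketch.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def pairs_by_output_difference(sbox, delta_in, allowed_out):
    """Index pairs (X, X ^ delta_in) by their output difference.

    Only output differences in `allowed_out` are kept, mimicking the
    filter "the image difference lies in the target pattern".
    """
    table = defaultdict(list)
    for x in range(len(sbox)):
        y = x ^ delta_in
        delta_out = sbox[x] ^ sbox[y]
        if delta_out in allowed_out:
            table[delta_out].append((x, y))
    return dict(table)

delta_in = 0x9
# No filtering: every nonzero output difference is accepted.
full = pairs_by_output_difference(SBOX, delta_in, set(range(1, 16)))
# A toy "pattern" that keeps only three output differences.
low = pairs_by_output_difference(SBOX, delta_in, {1, 2, 3})
```

A dictionary plays here the role of the sorted list: looking up the pairs compatible with a given output difference is then a single access, which is what Step 3 relies on.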


The remaining lists are computed with the same routine. The computational complexity to construct them corresponds to their memory complexity, i.e. the size of the lists: $C_8, C_9, C_{10}, C_{11}, C_{12}, C_{13}, C_{14}$ are respectively of size $\beta^6, \beta^7, \beta^6, \beta^5, \beta^4, \beta^3, \beta^3$ and $C'_6, C'_7, C'_8, C'_9, C'_{10}, C'_{11}, C'_{12}$ are respectively of size $\beta^3, \beta^3, \beta^4, \beta^5, \beta^6, \beta^7, \beta^6$.

Remark 5. We could have considered another truncated differential path, sparser, which is strictly included in the path of Figure 3, replacing Pattern 5 with the following Pattern 5'.

[Pattern 5']

We expect this path to be realized by fewer pairs of full state values. The selection of this pattern reduces the dimension of $\Delta_1$ from 12 to 7. It remains large enough to mount the attack and induces here lists $C_i$ and $C'_i$ of maximum size $\beta^6$. This step has then $O(\beta^6)$ computational and memory complexities.

4.5 Step 3: Pairs of Partial States Satisfying $\Delta_1$ to $\delta_2$ and $\delta_2$ to $\Delta_3$

We pick² arbitrary differential values $\delta_1$ in $\Delta_1$. For each such $\delta_1$, we compute its image $\delta'_1$ by $\mathrm{SL}_2$. For $i$ in $I_6$, we consider all pairs of column values in $C_i$ computed in Step 2 whose difference through $\mathrm{SSB}_2^{-1}$ equals $(\delta'_1)_{|i}$. The computational cost is simply a search in a sorted list. Whenever a match is found simultaneously for each of these seven columns, we store in a list $E$ the pairs of 7-column values computed as the concatenations of all corresponding pairs of column values in the $C_i$, whose difference is $(\delta_2)_{|I_6}$ and whose images through $\mathrm{SSB}_2^{-1}$ have a difference equal to $(\delta'_1)_{|I_6}$. We repeat this routine until we get $\beta^6$ such pairs of 7-column values. This requires enumerating $O(\beta^6)$ elements in $\Delta_1$ (see Remark 6). This done, we get the desired list $E$ with $\beta^6$ elements.
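The per-column lookup followed by concatenation can be sketched as below. This is an illustrative toy, with three columns instead of seven and arbitrary small integers instead of column values; `combine_columns` and the data layout (one dictionary per column, keyed by output difference) are assumptions made for the example.

```python
from itertools import product

def combine_columns(column_tables, target):
    """Concatenate per-column pairs whose output differences match `target`.

    column_tables: list of dicts {output difference: [(X, Y), ...]}, one per column.
    target: the required output difference for each column.
    Returns [] as soon as one column has no matching pair.
    """
    per_column = []
    for table, diff in zip(column_tables, target):
        matches = table.get(diff)
        if not matches:
            return []
        per_column.append(matches)
    combined = []
    for choice in product(*per_column):
        xs = tuple(x for x, _ in choice)
        ys = tuple(y for _, y in choice)
        combined.append((xs, ys))
    return combined

# Toy data: three columns, each pair listed in both orders as in the lists C_i.
tables = [
    {5: [(0, 1), (1, 0)]},
    {7: [(2, 3), (3, 2)], 9: [(4, 6)]},
    {1: [(8, 9), (9, 8)]},
]
E = combine_columns(tables, (5, 7, 1))
```

With two matching pairs in each of three columns, the toy call yields 2^3 combined pairs; with seven columns the same product gives the 2^7 pairs mentioned in Remark 6.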

A list $F$ of $\beta^6$ pairs of 7-column values whose differences equal $\delta'_2$ and whose images by $\mathrm{SSB}_3$ have a difference which equals $(\delta_3)_{|I_7}$ for some $\delta_3$ in $\Delta_3$ is built following the same procedure with the lists $C'_j$.

Remark 6. By Assumption (1), the map $\mathrm{SSB}_2^{-1}$ has the behavior of an ideal SBox applied independently on each column. For a fixed $\delta'_1$, whenever there exists $X$ in $\mathcal{C}^7$ such that

\[ \mathrm{SSB}_2^{-1}(X \oplus (\delta_2)_{|I_6}) \oplus \mathrm{SSB}_2^{-1}(X) = (\delta'_1)_{|I_6}, \]

which holds with probability $2^{-7}$, we have $2^7$ elements $X$ with the same property.

Since $C_8, \ldots, C_{14}$ computed in Step 2 store all possible pairs of column values whose differences equal respectively $(\delta_2)_{|8}, \ldots, (\delta_2)_{|14}$ and whose images have a difference lying in $P_5$, we find from the lists $C_i$, with a probability of $2^{-7}$ for an arbitrary element $\delta_1$ in $\Delta_1$, $2^7$ pairs of partial state values, on columns indexed by $I_6$, whose differences equal $(\delta_2)_{|I_6}$ and whose images by $\mathrm{SSB}_2^{-1}$ have a difference which equals $(\delta'_1)_{|I_6}$.

4.6 Step 4: First Patching up

We want to find $(e, e \oplus (\delta_2)_{|I_6})$ in $E$ and $(f, f \oplus (\delta'_2)_{|I_7})$ in $F$ such that there exists a completion of $(f, f \oplus (\delta'_2)_{|I_7})$ whose image by $\mathrm{SL}_3^{-1}$ does not contradict with $(e, e \oplus (\delta_2)_{|I_6})$.

We call such a completion a matching completion of $(f, f \oplus (\delta'_2)_{|I_7})$ with $(e, e \oplus (\delta_2)_{|I_6})$.

Since $\delta'_2 = \mathrm{SL}_3(\delta_2)$, we know that any matching completion of $f$ with $e$ is a matching completion of $f \oplus (\delta'_2)_{|I_7}$ with $e \oplus (\delta_2)_{|I_6}$. We now show that for an arbitrary $\bigl((e, e \oplus (\delta_2)_{|I_6}), (f, f \oplus (\delta'_2)_{|I_7})\bigr)$ in $E \times F$, a matching completion exists only with probability $\beta^{-12}$. We then show how to find such a pair in $O(\beta^7)$ computations and $O(\beta^6)$ in memory.

² We make use of the basis computed at Step 1 for the 12-dimensional B-vector subspace $\Delta_1$.

Figure 8 introduces two intermediate patterns by decomposing the $\mathrm{SL}_3$ transformation. Pink cells in Pattern 6.3 correspond to byte values fixed by $f$ whereas green cells in Pattern 6.2 correspond to byte values fixed by $e$.

Figure 8 (Patterns 6, 6.2, 6.3 and 7, linked by Sh, MB and Sh): Fitting the two pieces, first edition.

MB applies independently on columns; we can therefore analyze local column transitions. Two types of columns have to be distinguished. In Columns 7 to 13, there are too many of the $c$ constraints (green cells in Pattern 6.2) to be compensated by the $d$ degrees of freedom (white cells in Pattern 6.3). A local matching completion is then possible only with probability $\beta^{d-c}$. In the remaining columns, there are enough of the $d$ degrees of freedom (white cells in Pattern 6.3) to compensate the $c$ constraints (green cells in Pattern 6.2). We have $\beta^{d-c}$ local matching completions for each of these columns.

We now carry out the exact analysis for the 7th column. Let us denote by $M = \{m_{i,j}\}$ the matrix of MB. Two degrees of freedom are available (byte positions 1 and 8) and three constraints are imposed by fixed values (byte positions 1, 2 and 8). Here, we want to determine, given values $x_2, \ldots, x_7$ and $y_1, y_2, y_8$, whether there exist $x_1$ and $x_8$ verifying Equation (5) or not,

\[ \forall \ell \in \{1, 2, 8\}, \quad \sum_{j=2}^{7} m_{\ell,j}\, x_j + m_{\ell,1}\, x_1 + m_{\ell,8}\, x_8 = y_\ell. \tag{5} \]

Since $M$ is MDS, the minor $m_{1,1} \cdot m_{2,8} - m_{1,8} \cdot m_{2,1}$ is non-zero; we can then write

\[ x_1 = l_{1,7}(x_2, \ldots, x_7) + g_{1,7}(y_1, y_2) \quad \text{and} \quad x_8 = l_{8,7}(x_2, \ldots, x_7) + g_{8,7}(y_1, y_2), \]

where $g_{1,7}$, $g_{8,7}$, $l_{1,7}$ and $l_{8,7}$ are linear forms depending only on MB and on the positions of the fixed bytes. Introducing $L_7$ and $R_7$, two linear forms obtained by substituting $x_1$ and $x_8$ by their expressions, Equation (5) is then equivalent to

\[ \begin{cases} x_1 = l_{1,7}(x_2, \ldots, x_7) + g_{1,7}(y_1, y_2), \\ x_8 = l_{8,7}(x_2, \ldots, x_7) + g_{8,7}(y_1, y_2), \\ L_7(x_2, \ldots, x_7) = R_7(y_1, y_2, y_8). \end{cases} \]
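The local test for the 7th column can be sketched as follows: solve the two equations for $\ell = 1, 2$ in $x_1$ and $x_8$ by Cramer's rule, then check the equation for $\ell = 8$, which is the condition $L_7 = R_7$. The code is a hedged illustration, assuming the Grøstl MixBytes matrix circ(02, 02, 03, 04, 05, 03, 05, 07) in right-circulant layout over GF(2^8); the function names are hypothetical.

```python
FIRST_ROW = [0x02, 0x02, 0x03, 0x04, 0x05, 0x03, 0x05, 0x07]
# Assumption: right-circulant layout of the MixBytes matrix.
M = [[FIRST_ROW[(j - i) % 8] for j in range(8)] for i in range(8)]

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Inverse of a nonzero element, computed as a^254."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def local_match(x_mid, y1, y2, y8):
    """x_mid = [x_2, ..., x_7]. Return (x_1, x_8) if Equation (5) is solvable, else None."""
    def partial(row):
        # Contribution of the known bytes x_2..x_7 to the given output row.
        acc = 0
        for j, xj in zip(range(1, 7), x_mid):
            acc ^= gf_mul(M[row][j], xj)
        return acc

    a, b = M[0][0], M[0][7]
    c, d = M[1][0], M[1][7]
    det = gf_mul(a, d) ^ gf_mul(b, c)  # non-zero since M is MDS
    det_inv = gf_inv(det)
    r1 = y1 ^ partial(0)
    r2 = y2 ^ partial(1)
    # Cramer's rule in characteristic 2 (signs vanish).
    x1 = gf_mul(det_inv, gf_mul(d, r1) ^ gf_mul(b, r2))
    x8 = gf_mul(det_inv, gf_mul(c, r1) ^ gf_mul(a, r2))
    # Remaining constraint (row 8), i.e. L_7(x_2..x_7) = R_7(y_1, y_2, y_8).
    lhs = partial(7) ^ gf_mul(M[7][0], x1) ^ gf_mul(M[7][7], x8)
    return (x1, x8) if lhs == y8 else None
```

For values $y_1, y_2, y_8$ that really come from a column $x$, the function returns the missing bytes; changing $y_8$ alone makes the system inconsistent, which is why a uniformly chosen configuration matches with probability $\beta^{-1}$ on this column.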

A match between $e$ and $f$ is then possible on the 7th column if and only if $L_7(e_{2,7}, \ldots, e_{7,7})$ equals $R_7(f_{2,1}, f_{2,2}, f_{2,8})$. For the sake of simplicity, we drop the byte positions and shall rather write $L_7(e)$ and $R_7(f)$.

We find the two linear forms $L_{13}$ and $R_{13}$ by conducting the same analysis for the 13th column. For $i \in \{8, \ldots, 12\}$, we find couples of linear forms $L_i, L'_i$ and $R_i, R'_i$. We indeed get two pairs of linear forms since the difference between the number of constraints and the number of degrees of freedom is no longer 1 but 2. We reproduce these arguments for the remaining Columns 1 to 6 and 14 to 16. As there are locally more degrees of freedom than constraints, each choice of $\bigl((e, e \oplus (\delta_2)_{|I_6}), (f, f \oplus (\delta'_2)_{|I_7})\bigr)$ in $E \times F$ admits local matching completions. Finally, we have a matching completion of $f$ with $e$ if and only if the twelve equations $L_7(e) = R_7(f)$, $L_{13}(e) = R_{13}(f)$, $L_8(e) = R_8(f)$, $L'_8(e) = R'_8(f)$, $\ldots$, $L_{12}(e) = R_{12}(f)$, $L'_{12}(e) = R'_{12}(f)$ are simultaneously satisfied. Uniformly distributed values fulfill this set of equations with probability $\beta^{-12}$.

We now show how to find a pair $\bigl((e, e \oplus (\delta_2)_{|I_6}), (f, f \oplus (\delta'_2)_{|I_7})\bigr)$ in $E \times F$ for which a matching completion of $f$ with $e$ exists in $O(\beta^7)$ computations. We sort the list $F$ according to the lexicographic order given by $(R_7(f), R_{13}(f), R_8(f), R'_8(f), \ldots, R_{12}(f), R'_{12}(f))$ in $O(\beta^7)$ computations and $O(\beta^6)$ in memory. For each element $(e, e \oplus (\delta_2)_{|I_6})$ of $E$, we compute the value $(L_7(e), L_{13}(e), L_8(e), L'_8(e), \ldots, L_{12}(e), L'_{12}(e))$ and we check whether it belongs to the previous sorted list. This costs $O(\beta^6)$ computations and $O(\beta^6)$ searches in a sorted list. Since $\beta^{12}$ pairs $\bigl((e, e \oplus (\delta_2)_{|I_6}), (f, f \oplus (\delta'_2)_{|I_7})\bigr)$ in $E \times F$ are available and since a matching completion of $f$ with $e$ exists with probability $\beta^{-12}$, we expect to find a match.

Should no pair admit a matching completion, we start again from Step 2 with another $\delta_2$. From now on, we suppose that such a pair exists. We find it in $O(\beta^7)$ computations and $O(\beta^6)$ in memory.
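The sort-and-search procedure of this step is a standard sorted-list join, sketched below under simplifying assumptions: the elements of E and F are arbitrary Python objects, and the two key functions stand in for the 12-tuples $(L_7(e), \ldots, L'_{12}(e))$ and $(R_7(f), \ldots, R'_{12}(f))$. The function name and the toy keys are hypothetical.

```python
from bisect import bisect_left

def find_match(E, F, key_L, key_R):
    """Return one (e, f) with key_L(e) == key_R(f), or None.

    F is sorted once by key_R; each element of E then costs one binary search.
    """
    keyed = sorted(((key_R(f), f) for f in F), key=lambda kf: kf[0])
    keys = [k for k, _ in keyed]
    for e in E:
        target = key_L(e)
        pos = bisect_left(keys, target)
        if pos < len(keys) and keys[pos] == target:
            return e, keyed[pos][1]
    return None

# Toy data: the key is a 1-tuple here, a 12-tuple of bytes in the attack.
E = [(1, 2), (3, 4), (5, 6)]
F = [(9, 9), (4, 3), (0, 7)]
key_L = lambda e: (e[0] ^ e[1],)
key_R = lambda f: (f[0] ^ f[1],)
match = find_match(E, F, key_L, key_R)
```

Sorting F dominates the cost, while E is only scanned once, which mirrors the $O(\beta^7)$ time and $O(\beta^6)$ memory split stated above.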

Figure 9 (Patterns 6.2 and 6.3, linked by MB, with cells numbered 1 to 9): Fixed bytes and lists of elements.

The pair $\bigl((e, e \oplus (\delta_2)_{|I_6}), (f, f \oplus (\delta'_2)_{|I_7})\bigr)$ in $E \times F$ being now fixed, all byte values in Columns 7 to 13 of Pattern 6.3 are fixed by this choice (red cells in Figure 9).

We now compute all matching completions of $f$ with $e$. We construct, for each of the nine remaining columns, the list of column values satisfying the linear constraints imposed by fixed bytes. The construction of these lists does not differ from what we just did, with the use of linear forms analogous to $g_{i,j}$ and $l_{i,j}$. We then have, as illustrated by Figure 9, $B_1$, $B_2$, $B_3$, $B_4$ and $B_9$, lists of $\beta^4$ elements in the 1st, 2nd, 3rd, 4th and 16th columns, $B_5$ and $B_8$, lists of $\beta^3$ elements in the 5th and the 15th columns, and $B_6$ and $B_7$, lists of $\beta$ elements in the 6th and 14th columns. We then get $\beta^{28}$ possible matching completions as the direct product of the lists $B_i$ (we cannot store this product in a single list, since this would require prohibitive computational and memory complexities).
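The count of completions and the reason for not storing them can be illustrated as follows. The exponents are those quoted for the lists $B_1$ to $B_9$; the toy lists and the function name are assumptions made for the example, and the lazy iterator only shows one possible way of enumerating a direct product without materialising it.

```python
from itertools import islice, product

# Exponents of beta for the sizes of the lists B_1..B_9 quoted above.
SIZE_EXPONENTS = {1: 4, 2: 4, 3: 4, 4: 4, 5: 3, 6: 1, 7: 1, 8: 3, 9: 4}
total_exponent = sum(SIZE_EXPONENTS.values())  # log_beta of the number of completions

def completions(lists):
    """Lazily enumerate the direct product of the lists, without storing it."""
    return product(*lists)

# Toy lists, far smaller than the real ones, to show the lazy enumeration.
toy_lists = [[0, 1], [10, 11, 12], [20]]
first_four = list(islice(completions(toy_lists), 4))
```

Only the nine lists themselves are stored, the largest having $\beta^4$ elements, while the $\beta^{28}$ tuples are produced on demand.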

4.7 Step 5: Second Patching up

At this point of the algorithm, the differential value $\delta_2$ has been fixed. We have built $E$, a list of $\beta^6$ pairs of 7-column values indexed by $I_6$ such that their difference is $(\delta_2)_{|I_6}$ and the difference of the images of any completion by $\mathrm{SL}_2^{-1} \circ \mathrm{SSB}_2^{-1}$ is in $\Delta_1$. Similarly, we have built $F$, a list of $\beta^6$ pairs of 7-column values indexed by $I_7$ such that their difference is $(\delta'_2)_{|I_7}$ and the difference of the images of any completion by $\mathrm{SSB}_3$ is in $\Delta_3$. We have found $(e, e \oplus (\delta_2)_{|I_6})$ in $E$ and $(f, f \oplus (\delta'_2)_{|I_7})$ in $F$ such that there exist $\beta^{28}$ matching completions of $f$ with $e$, whose images by $\mathrm{SL}_3^{-1}$ do not contradict with $e$. The column values $e$ and $f$ are now fixed. We call $\delta_1 \in \Delta_1$ and $\delta_3 \in \Delta_3$ the differential values induced by these choices when propagating $(e, e \oplus (\delta_2)_{|I_6})$ backward by $\mathrm{SSB}_2^{-1}$ and $(f, f \oplus (\delta'_2)_{|I_7})$ forward by $\mathrm{SSB}_3$. We denote by $(f', f' \oplus (\delta_3)_{|I_7})$ the image of $(f, f \oplus (\delta'_2)_{|I_7})$ by $\mathrm{SSB}_3$.

Each of these $\beta^{28}$ pairs of full state values, when propagated backward and forward, has its differences agreeing with Figure 4 from Pattern 4 to 9. To fulfill the whole truncated differential path, three independent probabilistic transitions remain: through
