Collision-Resistant Hash Functions - Message Authentication Codes and Collision-Resistant Hash

Message Authentication Codes and Collision-Resistant Hash Functions

4.6 Collision-Resistant Hash Functions

Collision-resistant hash functions (sometimes called “cryptographic” hash functions) have many applications in cryptography and computer security. In this section we will study collision-resistant hash functions and how they are constructed. In the next section, we will show how they are used in order to construct secure message authentication codes (this explains why we study collision-resistant hash functions in this chapter).

In general, hash functions are just functions that take arbitrary-length strings andcompress them into shorter strings. The classic use of hash

func-tions is in data structures as a way to achieveO(1) lookup time for retrieving an element. Specifically, if the size of the range of the hash functionH isN, then a table is first allocated with N entries. Then, the elementx is stored in cell H(x) in the table. In order to retrievex, it suffices to compute H(x) and probe that entry in the table. Observe that since the output range of H equals the size of the table, the output length must be rather short (or else, the table will be too large). A “good” hash function for this purpose is one that yields as few collisions as possible, where a collision is a pair of distinct data items x and x⁰ for which H(x) = H(x⁰). Notice that when a collision occurs, two elements end up being stored in the same cell. Therefore, many collisions may result in a higher than desired retrieval complexity. In short, what is desired is that the hash function spreads the elements well in the table, thereby minimizing the number of collisions.

Collision-resistant hash functions are similar in principle to those used in data structures. In particular, they are also functions that compress their output by transforming arbitrary-length input strings into output strings of a fixed shorter length. Furthermore, as in data structures, collisions are a problem. However, there is a fundamental difference between standard hash functions and collision-resistant ones. Namely, thedesire in data structures to have few collisions is converted into amandatory requirementin cryptography.

That is, a collision-resistant hash function must have the property that no polynomial-time adversary can find a collision in it. Stated differently, no polynomial-time adversary should be able to find a distinct pair of values x and x⁰ such that H(x) = H(x⁰). We stress that in data structures some collisions may be tolerated, whereas in cryptography no collisions whatsoever are allowed. Furthermore, the adversary in cryptography specifically searches for a collision, whereas in data structures, the “data items” do not attempt to collide intentionally. This means that the requirements on hash functions in cryptography are much more stringent than the analogous requirements in data structures. It also means that cryptographic hash functions are harder to construct.

4.6.1 Defining Collision Resistance

A collision in a function H is a pair of distinct inputs x andx⁰ such that H(x) =H(x⁰); in this case we also say thatxandx⁰ collide under H. As we have mentioned, a function H iscollision-resistant if it is infeasible for any probabilistic polynomial-time algorithm to find a collision in H. Typically we will be interested in functions H that have an infinite domain (i.e., they accept all strings of all input lengths) and a finite range. Note that in such a case, collisions mustexist (by the pigeon-hole principle). The requirement is therefore only that such collisions should be “hard” to find. We will sometimes refer to functions H for which both the input domain and output range are finite. However, we will only be interested in functions that compress the input, meaning that the length of the output is shorter than that of the

input. We remark that collision resistance is trivial to achieve if compression is not required: for example, the identity function is collision resistant.

More formally, one usually deals with afamily of hash functions indexed by a “key”s. This is not a usual cryptographic key, at least in the sense that it is not kept secret. Rather, it is merely a means to specify a particular function H^sfrom the family. The requirement is that it must be hard to find a collision in H^sfor a randomly-chosen keys. We stress that s is not a secret key and as such it is fundamentally different to the keys that we have seen so far in this book. In order to make this explicit, we use the notationH^s(rather than the more standard Hs). As usual, we begin by defining the syntax for hash functions.

DEFINITION 4.9 (hash function – syntax): A hash functionis a pair of probabilistic polynomial-time algorithms (Gen, H)fulfilling the following:

• Genis a probabilistic algorithm which takes as input a security parameter 1ⁿ and outputs a keys. We assume that1ⁿ is included ins(though, we will let this be implicit).

• There exists a polynomial ` such that H is(deterministic) polynomial-time algorithm that takes as input a keys and any string x ∈ {0,1}^∗, and outputs a stringH^s(x)∈ {0,1}^`(n).

If for everynands,H^sis defined only over inputs of length`⁰(n)and`⁰(n)>

`(n), then we say that (Gen, H) is a fixed-length hash function with length parameter`⁰.

Notice that in the fixed-length case we require that `⁰ be greater than `.

This ensures that the function is a hash function in the classic sense in that it compresses the input. We remark that in the general case we have no requirement on`because the function takes for input all (finite) binary strings, and so in particular all strings that are longer than`(n). Thus, by definition, it also compresses (albeit only strings that are longer than `(n)). We now proceed to define security. As usual, we begin by defining an experiment for a hash function Π = (Gen, H), an adversaryAand a security parametern:

The collision-finding experimentHash-coll_A,Π(n):

1. A keysis chosen: s←Gen(1ⁿ)

2. The adversary A is given s and outputs a pair x and x⁰. Formally, (x, x⁰)← A(s).

3. The output of the experiment is 1 if and only if x 6=x⁰ and H^s(x) =H^s(x⁰). In such a case we say that A has found a collision.

The definition of collision resistance for hash functions states that no efficient adversary can find a collision except with negligible probability.

DEFINITION 4.10 A hash function Π = (Gen, H) is collision resistant if for all probabilistic polynomial-time adversaries A there exists a negligible function neglsuch that

Pr [Hash-coll_A,Π(n) = 1]≤negl(n)

Terminology: For simplicity, we refer to H, H^s, and (Gen, H) all using the same term “collision-resistant hash function.” This should not cause any confusion.

4.6.2 Weaker Notions of Security for Hash Functions

Collision resistance is a strong security requirement and is quite difficult to achieve. However, in some applications it suffices to rely on more relaxed requirements. When considering “cryptographic” hash functions, there are typically three levels of security considered:

1. Collision resistance: This is the highest level and the one we have con-sidered so far.

2. Second preimage resistance: Informally speaking, a hash function is sec-ond preimage resistant if given s and x it is hard for a probabilistic polynomial-time adversary to findx⁰ such that H^s(x) =H^s(x⁰).

3. Preimage resistance: Informally, a hash function is preimage resistant if given s and some y it is hard for a probabilistic polynomial-time adversary to find a valuex⁰ such thatH^s(x⁰) =y. This is exactly the notion of a one-way function that we will describe in Chapter 6.1 (except that here we include a keys).

Notice that any hash function that is collision resistant is second preimage resistant. Intuitively, this is the case because if givenxan adversary can find x⁰for whichH^s(x) =H^s(x⁰), then it can clearly find a colliding pairxandx⁰ from scratch. Likewise, any hash function that is second preimage resistant is also preimage resistant. This is due to the fact that if it is possible to invert y and find an x⁰ such thatH^s(x⁰) =y then it is possible to takex, compute y=H^s(x) and invert it again obtainingx⁰. Since the domain ofH is infinite, it follows that with good probability x 6= x⁰. We conclude that the above three security requirements form a hierarchy with each definition implying the one below it. We remark that our arguments here can all be formalized and we leave this for an exercise.

4.6.3 A Generic “Birthday” Attack

Before we show how to construct collision-resistant hash functions, we present a generic attack that finds collisions inanylength-decreasing function (for simplicity of notation, we will focus on the basic case where the input domain is infinite). This attack implies a minimal output length necessary for a hash function to potentially be secure against adversaries of a certain time complexity, as is subsequently explained.

Assume that we are given a hash function H^s : {0,1}^∗ → {0,1}^` (for notational convenience we set`=`(n)). Then, in order to find a collision, we choose random (distinct) inputs x1, . . . , xq∈ {0,1}^2`, computeyi :=H^s(xi) for alli, and check whether any of the twoyi values are equal.

What is the probability that this algorithm finds a collision? Clearly, if q >2^`, then this occurs with probability 1. However, we are interested in the case of a smallerq. It is somewhat difficult to analyze this probability exactly, and so we will instead analyze an idealized case in whichH^s is treated as a random function.² That is, for eachxi we assume that the valueyi=H^s(xi) is uniformly distributed in{0,1}^`independent of any of the previous output values{yj}^j<i(recall we assume all{xi}are distinct). We have thus reduced our problem to the following one: if we choose values y1, . . . , yq ∈ {0,1}^` uniformly at random, what is the probability that there exist distinct i, j withyi =yj?

This problem has been extensively studied, and is related to the so-called birthday problem. In fact, the collision-finding algorithm we have described is often called a “birthday attack.” The birthday problem is the following: if q people are in a room, what is the probability that there exist two people with the same birthday? (We assume birthdays are uniformly and independently distributed among the 365 days of a non-leap year.) This is exactly the same as our problem: ifyi represents the birthday of personi, then we have y1, . . . , yq∈ {1, . . . ,365}chosen uniformly at random. Furthermore, matching birthdays correspond to distinct i, j with yi = yj (i.e., matching birthdays correspond to collisions).

It turns out that when q = O(√

2^`) (or equivalently O(2^`/2)), then the probability of such a collision is greater than one half. (In the real birthday case, it turns out that if there are 23 people in the room, then the probability that two have the same birthday is at greater than one half.) We prove this fact in Section A.4.

Birthday attacks on hash functions – summary. If the output length of a hash function is ` bits then the aforementioned birthday attack finds a collision with high probability inO(q) =O(2^`/2) time (for simplicity, we as-sume that evaluating H^s can be done in constant time, and ignore the time

2Actually, it can be shown that this is (essentially) the worst case, and the algorithm finds collisions with higher probability ifH^sdeviates significantly from random.

required to make all the comparisons). We therefore conclude that for the hash function to resist collision-finding attacks that run in timeT, the output length of the hash function needs to be at least 2 logT bits. When considering asymptotic bounds on security, there is no difference between a naive attack that tries 2^`+ 1 elements and a birthday attack that tries 2^`/2elements. This is because if `(n) = O(logn) then both attacks are polynomial and if `(n) is super-logarithmic then both attacks are not polynomial. Nevertheless, in practice, birthday attacks make a huge difference. Specifically, assume that a hash function is designed with output length of 128 bits. It is clearly in-feasible to run 2¹²⁸steps in order to find a collision. However, 2⁶⁴is already feasible (albeit, still rather difficult). Thus, the existence of generic birthday attacks essentially mandates that any collision-resistant hash function in prac-tice needs to have output that is significantly longer than 128 bits. We stress that having a long enough output is only a necessary condition for meeting Definition 4.10, but is very far from being a sufficient one. We also stress that birthday attacks work only for collision resistance; there are no generic attacks on hash functions for second preimage or preimage resistance that are quicker than time 2^`.

Improved birthday attacks. The birthday attack that we described above has two weaknesses. First, it requires a large amount of memory. Second, we have very little control over the colliding values (note that although we choose theq valuesx1, . . . , xq we have no control over which xi andxj are likely to collide). It is possible to construct better birthday attacks as follows.

The basic birthday attack requires the attacker to store allqvalues, because it cannot know which pair will form a collision until it happens. This is very significant because, for example, it is far more feasible to run in 2⁶⁴time than it is to obtain a disk of size 2⁶⁴. Nevertheless, it is possible to construct a birthday attack that takes time 2^`/2 as above, yet requires only a constant amount of memory. We will only describe the basic idea here, and leave the details as an exercise. The idea is to take two random valuesx1 andy1

and then for every i > 1 to compute xi = H(xi−1) and yi = H(H(yi−1)), until we have some m for which xm = ym. Notice that this means that H(xm−1) =H(H(ym−1)) and soxm−1andH(ym−1) constitute a collision in H (as long as they are distinct, which holds with good probability). It can be shown that this collision is expected to appear afterO(2^`/2) steps, as with the regular birthday attack. Thus, it is not necessary to use a large amount of memory in order to carry out a birthday attack.

The second weakness that we mentioned relates to the lack of control over the colliding messages that are found. We stress that it is not necessary to find

“nice” collisions in order to break cryptographic applications that use collision-resistant hash functions. Nevertheless, it is informative to see that birthday attacks can be made to work on messages of a certain form. Assume that an attacker wishes to prepare two messages x and x⁰ such that H(x) = H(x⁰).

Furthermore, the first message x is a letter from her employer that she was

fired for lack of motivation at work, while the second messagex⁰ is a flattering letter of recommendation from her employer. Now, a birthday attack works by generating 2^`/2different messages and it seems hard to conceive that this can be done for the aforementioned letters. However, it is actually possible to write the same sentence in many different ways. For example, consider the following sentence:

It is (hard)(difficult)(infeasible) to (find)(obtain)(acquire)(locate) collisions in(cryptographic)(collision-resistant) hash functions in (reasonable)(acceptable) (unexcessive)time.

The important point to notice about this sentence is that any combination of the words is possible. Thus, the sentence can be written in 3·4·2·3>

2⁶ different ways. This is just one sentence and so it is actually easy to write a letter that can be rewritten in 2⁶⁴ different ways (you just need 64 words with one synonym each). Using this idea it is possible to prepare 2^`/2 letters explaining why the attacker was fired and another 2^`/2 letters of recommendation and with good probability, a collision between the two types of letters will be found. We remark that this attack does require a large amount of memory and the low-memory version described above cannot be used here.

4.6.4 The Merkle-Damg˚ard Transform

We now present an important methodology that is widely used for con-structing resistant hash functions. The task of concon-structing a collision-resistant hash function is difficult. It is made even more difficult by the fact that the input domain must be infinite (any string of any length may be in-put). The Merkle-Damg˚ard transform is a way of extending a fixed-length collision-resistant hash function into a general one that receives inputs of any length. The method works for any fixed-length collision-resistant hash func-tion, even one that reduces the length of its input by just a single bit. This transform therefore reduces the problem of designing a collision-resistant hash function to the (easier) problem of designing a fixed-length collision-resistant function that compresses its input by any amount (even a single bit). The Merkle-Damg˚ard transform is used in practical constructions of hash func-tions, and is also very interesting from a theoretical point of view since it implies that compressing by one bit is as easy (or as hard) as compressing by an arbitrary amount.

We remark that in order to obtain a full construction of a collision-resistant hash function it is necessary to first construct a fixed-length collision-resistant hash function. We will not show how this is achieved in practice. However, in Section 7.4.2 we will present theoretical constructions of fixed-length collision-resistant hash functions. These constructions constitute a proof that it is pos-sible to obtain collision-resistant hash functions under standard cryptographic assumptions.

For concreteness, we look at the case that we are given a fixed-length collision-resistant hash function that compresses its input by half; that is, the input length is `⁰(n) = 2`(n) and the output length is `(n). In Exer-cise 4.7 you are asked to generalize the construction for any`⁰ > `. We denote the given fixed-length collision-resistant hash function by (Genh, h) and use it to construct a general collision-resistant hash function (Gen, H) that maps inputs of any length to outputs of length`(n). We remark that in much of the literature, the fixed-length collision-resistant hash function used in the Merkle-Damg˚ard transform is called a compression function. The Merkle-Damg˚ard transform is defined in Construction 4.11 and depicted in Figure??.

CONSTRUCTION 4.11 The Merkle-Damg˚ard Transform.

Let (Genh, h) be a fixed-length hash function with input length 2`(n) and output length`(n). Construct a variable-length hash function (Gen, H) as follows:

• Gen(1ⁿ): upon input 1ⁿ, run the key-generation algorithmGenh

of the fixed-length hash function and output the key. That is, outputs←Genh.

• H^s(x): Upon input key s and message x ∈ {0,1}^∗ of length at most 2^`(n)−1, compute as follows:

1. Let L = |x| (the length of x) and let B = _L

(i.e., the number of blocks inx). Padxwith zeroes so that its length is an exact multiple of`.

2. Define z0 := 0^` and then for every i = 1, . . . , B, compute zi :=h^s(zi−1kxi), where h^s is the given fixed-length hash function.

3. Outputz=H^s(zBkL)

We remark that we limit the length of x to be at most 2^`(n)−1 so that its length can fit into a single block of length `(n) bits. Of course, this is not a limitation because we assume that all messages considered are of length polynomial innand not exponential.

The initialization vector. We remark that the valuez0 used in step 2 is arbitrary can be replaced by any constant. This value is typically called the IV orinitialization vector.

THEOREM 4.12 If (Genh, h) is a fixed-length collision-resistant hash function, then (Gen, H)is a collision-resistant hash function.

PROOF We first show that for anys, a collision inH^syields a collision in h^s. Letx andx⁰ be two different strings of respective lengthsLand L⁰ such thatH^s(x) =H^s(x⁰). Letx1· · ·xB be theB blocks of the paddedx, and let x⁰₁· · ·x⁰_B⁰ be theB⁰ blocks of the paddedx⁰. There are two cases to consider.

1. Case 1 –L6=L⁰: In this case, the last step of the computation ofH^s(x) isz=h^s(zBkL) and ofH^s(x⁰) isz=h^s(zB⁰kL⁰). SinceH^s(x) =H^s(x⁰) it follows thath^s(zBkL) = h^s(zB⁰kL⁰). However,L6=L⁰ and so hBkL andhB⁰kL⁰ are two different strings that collide forh^s.

2. Case 2 – L =L⁰: Let zi and z_i⁰ be the intermediate hash values of x andx⁰ (as in Figure??) during the computation ofH^s(x) and H^s(x⁰), respectively. Sincex6=x⁰ and they are of the same length, there must exist at least one index i (with 1 ≤i≤ B) such thatxi 6=x⁰_i. Let i^∗ be thehighest index for which it holds that zi^∗−1kxi^∗ 6=z_i⁰∗−1kx⁰_i∗. If i^∗=B then (zi^∗−1kxi^∗) and (z_i⁰∗−1kx⁰_i∗) constitutes a collision because we know thatH^s(x) =H^s(x⁰) andL=L⁰ implying that thatzB=z_B⁰ . Ifi^∗< B, then the maximality ofi^∗implies that zi^∗=z⁰_i^∗. Thus, once again, (zi^∗−1kxi^∗) and (z_i⁰^∗₋₁kx⁰_i^∗) constitutes a collision. That is, in both cases, we obtain that

zi^∗−1kxi^∗6=z_i⁰∗−1kx⁰_i∗

while

h^s(zi^∗−1kxi^∗) =h^s(z⁰_i^∗₋₁kx⁰_i^∗), meaning that we have found a collision inh^s.

It follows that any collision in the hash functionH^s yields a collision in the fixed-length hash functionh^s. It is straightforward to turn this into a formal

Dans le document II Private-Key (Symmetric) Cryptography 45 (Page 135-146)