A Graduate Course in Applied Cryptography
Dan Boneh and Victor Shoup
Version 0.4, September 2017
Preface
Cryptography is an indispensable tool used to protect information in computing systems. It is used everywhere and by billions of people worldwide on a daily basis. It is used to protect data at rest and data in motion. Cryptographic systems are an integral part of standard protocols, most notably the Transport Layer Security (TLS) protocol, making it relatively easy to incorporate strong encryption into a wide range of applications.
While extremely useful, cryptography is also highly brittle. The most secure cryptographic system can be rendered completely insecure by a single specification or programming error. No amount of unit testing will uncover a security vulnerability in a cryptosystem.
Instead, to argue that a cryptosystem is secure, we rely on mathematical modeling and proofs to show that a particular system satisfies the security properties attributed to it. We often need to introduce certain plausible assumptions to push our security arguments through.
This book is about exactly that: constructing practical cryptosystems for which we can argue security under plausible assumptions. The book covers many constructions for di↵erent tasks in cryptography. For each task we define a precise security goal that we aim to achieve and then present constructions that achieve the required goal. To analyze the constructions, we develop a unified framework for doing cryptographic proofs. A reader who masters this framework will be capable of applying it to new constructions that may not be covered in the book.
Throughout the book we present many case studies to survey how deployed systems operate.
We describe common mistakes to avoid as well as attacks on real-world systems that illustrate the importance of rigor in cryptography. We end every chapter with a fun application that applies the ideas in the chapter in some unexpected way.
Intended audience and how to use this book
The book is intended to be self contained. Some supplementary material covering basic facts from probability theory and algebra is provided in the appendices. The book is divided into three parts.
• Part I develops symmetric encryption which explains how two parties, Alice and Bob, can securely exchange information when they have a shared key unknown to the attacker. We discuss data confidentiality, data integrity, and the important concept of authenticated en- cryption.
• Part II develops the concepts of public-key encryption and digital signatures, which allow Alice and Bob to communicate securely, without having a pre-shared secret key.
• Part III is about cryptographic protocols, such as protocols for user identification, key ex- change, zero knowledge, and secure computation.
A beginning reader can read though the book to learn how cryptographic systems work and why they are secure. Every security theorem in the book is followed by a proof idea that explains at a high level why the scheme is secure. On a first read one can skip over the detailed proofs without losing continuity. A beginning reader may also skip over the mathematical details sections that explore nuances of certain definitions.
An advanced reader may enjoy reading the detailed proofs to learn how to do proofs in cryptog- raphy. At the end of every chapter you will find many exercises that explore additional aspects of the material covered in the chapter. Some exercises rehearse what was learned, but many exercises expand on the material and discuss topics not covered in the chapter.
Status of the book
The current draft contains part I and most of parts II and III. The remaining four chapters are forthcoming. We hope you enjoy this write-up. Please send us comments and let us know if you find typos or mistakes.
Citations: While the current draft is mostly complete, we still do not include citations and references to the many works on which this book is based. Those will be coming soon and will be presented in the Notes section at the end of every chapter.
Dan Boneh and Victor Shoup September, 2017
Contents
1 Introduction 1
1.1 Historic ciphers . . . 1
1.2 Terminology used throughout the book . . . 1
I Secret key cryptography 3 2 Encryption 4 2.1 Introduction . . . 4
2.2 Shannon ciphers and perfect security . . . 5
2.2.1 Definition of a Shannon cipher . . . 5
2.2.2 Perfect security . . . 7
2.2.3 The bad news . . . 13
2.3 Computational ciphers and semantic security . . . 13
2.3.1 Definition of a computational cipher . . . 14
2.3.2 Definition of semantic security . . . 15
2.3.3 Connections to weaker notions of security . . . 18
2.3.4 Consequences of semantic security . . . 22
2.3.5 Bit guessing: an alternative characterization of semantic security . . . 25
2.4 Mathematical details . . . 27
2.4.1 Negligible, super-poly, and poly-bounded functions . . . 28
2.4.2 Computational ciphers: the formalities . . . 29
2.4.3 Efficient adversaries and attack games . . . 32
2.4.4 Semantic security: the formalities . . . 34
2.5 A fun application: anonymous routing . . . 35
2.6 Notes . . . 38
2.7 Exercises . . . 38
3 Stream ciphers 45 3.1 Pseudo-random generators . . . 45
3.1.1 Definition of a pseudo-random generator . . . 46
3.1.2 Mathematical details . . . 48
3.2 Stream ciphers: encryption with a PRG . . . 48
3.3 Stream cipher limitations: attacks on the one time pad . . . 52
3.3.1 The two-time pad is insecure . . . 53
3.3.2 The one-time pad is malleable . . . 53
3.4 Composing PRGs . . . 54
3.4.1 A parallel construction . . . 54
3.4.2 A sequential construction: the Blum-Micali method . . . 59
3.4.3 Mathematical details . . . 61
3.5 The next bit test . . . 64
3.6 Case study: the Salsa and ChaCha PRGs . . . 67
3.7 Case study: linear generators . . . 70
3.7.1 An example cryptanalysis: linear congruential generators . . . 70
3.7.2 The subset sum generator . . . 73
3.8 Case study: cryptanalysis of the DVD encryption system . . . 74
3.9 Case study: cryptanalysis of the RC4 stream cipher . . . 76
3.9.1 Security of RC4 . . . 78
3.10 Generating random bits in practice . . . 80
3.11 A broader perspective: computational indistinguishability . . . 81
3.11.1 Mathematical details . . . 86
3.12 A fun application: coin flipping and commitments . . . 87
3.13 Notes . . . 88
3.14 Exercises . . . 88
4 Block ciphers 94 4.1 Block ciphers: basic definitions and properties . . . 94
4.1.1 Some implications of security . . . 96
4.1.2 Efficient implementation of random permutations . . . 99
4.1.3 Strongly secure block ciphers . . . 99
4.1.4 Using a block cipher directly for encryption . . . 100
4.1.5 Mathematical details . . . 104
4.2 Constructing block ciphers in practice . . . 105
4.2.1 Case study: DES . . . 107
4.2.2 Exhaustive search on DES: the DES challenges . . . 111
4.2.3 Strengthening ciphers against exhaustive search: the 3E construction . . . . 113
4.2.4 Case study: AES . . . 115
4.3 Sophisticated attacks on block ciphers . . . 120
4.3.1 Algorithmic attacks . . . 121
4.3.2 Side-channel attacks . . . 124
4.3.3 Fault-injection attacks on AES . . . 128
4.3.4 Quantum exhaustive search attacks . . . 129
4.4 Pseudo-random functions: basic definitions and properties . . . 130
4.4.1 Definitions . . . 130
4.4.2 Efficient implementation of random functions . . . 131
4.4.3 When is a secure block cipher a secure PRF? . . . 132
4.4.4 Constructing PRGs from PRFs . . . 136
4.4.5 Mathematical details . . . 137
4.5 Constructing block ciphers from PRFs . . . 139
4.6 The tree construction: from PRGs to PRFs . . . 145
4.6.1 Variable length tree construction . . . 149
4.7 The ideal cipher model . . . 152
4.7.1 Formal definitions . . . 152
4.7.2 Exhaustive search in the ideal cipher model . . . 153
4.7.3 The Even-Mansour block cipher and theEXconstruction . . . 156
4.7.4 Proof of the Even-Mansour and EX theorems . . . 157
4.8 Fun application: comparing information without revealing it . . . 163
4.9 Notes . . . 165
4.10 Exercises . . . 165
5 Chosen Plaintext Attack 174 5.1 Introduction . . . 174
5.2 Security against multi-key attacks . . . 176
5.3 Semantic security against chosen plaintext attack . . . 178
5.4 Building CPA secure ciphers . . . 180
5.4.1 A generic hybrid construction . . . 180
5.4.2 Randomized counter mode . . . 186
5.4.3 CBC mode . . . 191
5.4.4 Case study: CBC padding in TLS 1.0 . . . 196
5.4.5 Concrete parameters and a comparison of counter and CBC modes . . . 196
5.5 Nonce-based encryption . . . 198
5.5.1 Nonce-based generic hybrid encryption . . . 200
5.5.2 Nonce-based Counter mode . . . 200
5.5.3 Nonce-based CBC mode . . . 201
5.6 A fun application: revocable broadcast encryption . . . 202
5.7 Notes . . . 205
5.8 Exercises . . . 205
6 Message integrity 212 6.1 Definition of a message authentication code . . . 214
6.1.1 Mathematical details . . . 217
6.2 MAC verification queries do not help the attacker . . . 217
6.3 Constructing MACs from PRFs . . . 220
6.4 Prefix-free PRFs for long messages . . . 222
6.4.1 The CBC prefix-free secure PRF . . . 223
6.4.2 The cascade prefix-free secure PRF . . . 226
6.4.3 Extension attacks: CBC and cascade are insecure MACs . . . 227
6.5 From prefix-free secure PRF to fully secure PRF (method 1): encrypted PRF . . . 228
6.5.1 ECBC and NMAC: MACs for variable length inputs . . . 229
6.6 From prefix-free secure PRF to fully secure PRF (method 2): prefix-free encodings . 232 6.6.1 Prefix free encodings . . . 232
6.7 From prefix-free secure PRF to fully secure PRF (method 3): CMAC . . . 233
6.8 Converting a block-wise PRF to bit-wise PRF . . . 236
6.9 Case study: ANSI CBC-MAC . . . 237
6.10 Case study: CMAC . . . 238
6.11 PMAC: a parallel MAC . . . 240
6.12 A fun application: searching on encrypted data . . . 242
6.13 Notes . . . 242
6.14 Exercises . . . 243
7 Message integrity from universal hashing 248 7.1 Universal hash functions (UHFs) . . . 248
7.1.1 Multi-query UHFs . . . 250
7.1.2 Mathematical details . . . 251
7.2 Constructing UHFs . . . 251
7.2.1 Construction 1: UHFs using polynomials . . . 251
7.2.2 Construction 2: CBC and cascade are computational UHFs . . . 254
7.2.3 Construction 3: a parallel UHF from a small PRF . . . 256
7.3 PRF(UHF) composition: constructing MACs using UHFs . . . 258
7.3.1 Using PRF(UHF) composition: ECBC and NMAC security . . . 261
7.3.2 Using PRF(UHF) composition with polynomial UHFs . . . 261
7.3.3 Using PRF(UHF) composition: PMAC0 security . . . 262
7.4 The Carter-Wegman MAC . . . 262
7.4.1 Using Carter-Wegman with polynomial UHFs . . . 269
7.5 Nonce-based MACs . . . 269
7.5.1 Secure nonce-based MACs . . . 269
7.6 Unconditionally secure one-time MACs . . . 270
7.6.1 Pairwise unpredictable functions . . . 271
7.6.2 Building unpredictable functions . . . 271
7.6.3 From PUFs to unconditionally secure one-time MACs . . . 272
7.7 A fun application: timing attacks . . . 272
7.8 Notes . . . 272
7.9 Exercises . . . 273
8 Message integrity from collision resistant hashing 283 8.1 Definition of collision resistant hashing . . . 286
8.1.1 Mathematical details . . . 286
8.2 Building a MAC for large messages . . . 287
8.3 Birthday attacks on collision resistant hash functions . . . 289
8.4 The Merkle-Damg˚ard paradigm . . . 291
8.4.1 Joux’s attack . . . 294
8.5 Building Compression Functions . . . 294
8.5.1 A simple but inefficient compression function . . . 295
8.5.2 Davies-Meyer compression functions . . . 295
8.5.3 Collision resistance of Davies-Meyer . . . 297
8.6 Case study: SHA256 . . . 298
8.6.1 Other Merkle-Damg˚ard hash functions . . . 300
8.7 Case study: HMAC . . . 302
8.7.1 Security of two-key nest . . . 303
8.7.2 The HMAC standard . . . 305
8.7.3 Davies-Meyer is a secure PRF in the ideal cipher model . . . 306
8.8 The Sponge Construction and SHA3 . . . 309
8.8.1 The sponge construction . . . 309
8.8.2 Case study: SHA3, SHAKE256, and SHAKE512 . . . 314
8.9 Merkle trees: proving properties of a hashed sequence . . . 315
8.9.1 Authenticated data structures . . . 318
8.10 Key derivation and the random oracle model . . . 320
8.10.1 The key derivation problem . . . 320
8.10.2 Random oracles: a useful heuristic . . . 322
8.10.3 Random oracles: safe modes of operation . . . 327
8.10.4 The leftover hash lemma . . . 329
8.10.5 Case study: HKDF . . . 330
8.11 Security without collision resistance . . . 331
8.11.1 Second preimage resistance . . . 331
8.11.2 Randomized hash functions: target collision resistance . . . 333
8.11.3 TCR from 2nd-preimage resistance . . . 333
8.11.4 Using target collision resistance . . . 336
8.12 A fun application: an efficient commitment scheme . . . 339
8.13 Another fun application: proofs of work . . . 339
8.14 Notes . . . 339
8.15 Exercises . . . 339
9 Authenticated Encryption 347 9.1 Authenticated encryption: definitions . . . 348
9.1.1 One-time authenticated encryption . . . 349
9.2 Implications of authenticated encryption . . . 350
9.2.1 Chosen ciphertext attacks: a motivating example . . . 350
9.2.2 Chosen ciphertext attacks: definition . . . 352
9.2.3 Authenticated encryption implies chosen ciphertext security . . . 353
9.3 Encryption as an abstract interface . . . 355
9.4 Authenticated encryption ciphers from generic composition . . . 357
9.4.1 Encrypt-then-MAC . . . 357
9.4.2 MAC-then-encrypt is not generally secure: padding oracle attacks on SSL . 359 9.4.3 More padding oracle attacks. . . 362
9.4.4 Secure instances of MAC-then-encrypt . . . 363
9.4.5 Encrypt-then-MAC or MAC-then-encrypt? . . . 367
9.5 Nonce-based authenticated encryption with associated data . . . 367
9.6 One more variation: CCA-secure ciphers with associated data . . . 370
9.7 Case study: Galois counter mode (GCM) . . . 371
9.8 Case study: the TLS 1.3 record protocol . . . 373
9.9 Case study: an attack on non-atomic decryption in SSH . . . 376
9.10 Case study: 802.11b WEP, a badly broken system . . . 379
9.11 Case study: IPsec . . . 381
9.12 A fun application: private information retrieval . . . 386
9.13 Notes . . . 386
9.14 Exercises . . . 386
II Public key cryptography 393
10 Public key tools 394
10.1 A toy problem: anonymous key exchange . . . 394
10.2 One-way trapdoor functions . . . 395
10.2.1 Key exchange using a one-way trapdoor function scheme . . . 396
10.2.2 Mathematical details . . . 397
10.3 A trapdoor permutation scheme based on RSA . . . 398
10.3.1 Key exchange based on the RSA assumption . . . 400
10.3.2 Mathematical details . . . 400
10.4 Diffie-Hellman key exchange . . . 401
10.4.1 The key exchange protocol . . . 402
10.4.2 Security of Diffie-Hellman key exchange . . . 402
10.5 Discrete logarithm and related assumptions . . . 403
10.5.1 Random self-reducibility . . . 406
10.5.2 Mathematical details . . . 407
10.6 Collision resistant hash functions from number-theoretic primitives . . . 409
10.6.1 Collision resistance based on DL . . . 409
10.6.2 Collision resistance based on RSA . . . 410
10.7 Attacks on the anonymous Diffie-Hellman protocol . . . 412
10.8 Merkle puzzles: a partial solution to key exchange using block ciphers . . . 413
10.9 Fun application: Pedersen commitments . . . 415
10.10 Notes . . . 415
10.11 Exercises . . . 415
11 Public key encryption 424 11.1 Two further example applications . . . 425
11.1.1 Sharing encrypted files . . . 425
11.1.2 Key escrow . . . 425
11.2 Basic definitions . . . 426
11.2.1 Mathematical details . . . 427
11.3 Implications of semantic security . . . 428
11.3.1 The need for randomized encryption . . . 428
11.3.2 Semantic security against chosen plaintext attack . . . 429
11.4 Encryption based on a trapdoor function scheme . . . 431
11.4.1 Instantiating ETDF with RSA . . . 434
11.5 ElGamal encryption . . . 435
11.5.1 Semantic security of ElGamal in the random oracle model . . . 436
11.5.2 Semantic security of ElGamal without random oracles . . . 438
11.6 Threshold decryption . . . 441
11.6.1 Shamir’s secret sharing scheme . . . 443
11.6.2 ElGamal threshold decryption . . . 445
11.7 Fun application: oblivious transfer from DDH . . . 448
11.8 Notes . . . 449
11.9 Exercises . . . 449
12 Chosen ciphertext secure public key encryption 456
12.1 Basic definitions . . . 456
12.2 Understanding CCA security . . . 458
12.2.1 CCA security and ciphertext malleability . . . 458
12.2.2 CCA security vs authentication . . . 459
12.2.3 CCA security and key escrow . . . 460
12.2.4 Encryption as an abstract interface . . . 461
12.3 CCA-secure encryption from trapdoor function schemes . . . 463
12.3.1 Instantiating ETDF0 with RSA . . . 468
12.4 CCA-secure ElGamal encryption . . . 468
12.5 CCA security from DDH without random oracles . . . 474
12.5.1 Universal projective hash functions . . . 474
12.5.2 Universal2 projective hash functions . . . 476
12.5.3 The ECS scheme . . . 478
12.6 CCA security via a generic transformation . . . 483
12.6.1 A generic instantiation . . . 488
12.6.2 A concrete instantiation with ElGamal . . . 489
12.7 CCA-secure public-key encryption with associated data . . . 490
12.8 Case study: PKCS1, OAEP, OAEP+, and SAEP . . . 492
12.8.1 Padding schemes . . . 492
12.8.2 PKCS1 padding . . . 493
12.8.3 Bleichenbacher’s attack on the RSA-PKCS1 encryption scheme . . . 494
12.8.4 Optimal Asymmetric Encryption Padding (OAEP) . . . 496
12.8.5 OAEP+ and SAEP+ . . . 499
12.9 Fun application: sealed bid auctions . . . 499
12.10 Notes . . . 500
12.11 Exercises . . . 500
13 Digital signatures 513 13.1 Definition of a digital signature . . . 515
13.1.1 Secure signatures . . . 516
13.1.2 Mathematical details . . . 519
13.2 Extending the message space with collision resistant hashing . . . 519
13.2.1 Extending the message space using TCR functions . . . 520
13.3 Signatures from trapdoor permutations: the full domain hash . . . 521
13.3.1 Signatures based on the RSA trapdoor permutation . . . 522
13.4 Security analysis of full domain hash . . . 525
13.4.1 Repeated one-way functions: a useful lemma . . . 525
13.4.2 Proofs of Theorems 13.3 and 13.4 . . . 529
13.5 An RSA-based signature scheme with tighter security proof . . . 530
13.6 Case study: PKCS1 signatures . . . 532
13.6.1 Bleichenbacher’s attack on PKCS1 signatures . . . 534
13.7 Signcryption: combining signatures and encryption . . . 535
13.7.1 Secure signcryption . . . 537
13.7.2 Signcryption as an abstract interface . . . 539
13.7.3 Constructions: encrypt-then-sign and sign-then-encrypt . . . 542
13.7.4 A construction based on Diffie-Hellman key exchange . . . 546
13.7.5 Additional desirable properties: forward secrecy and non-repudiation . . . . 548
13.8 Certificates and the public-key infrastructure . . . 552
13.8.1 Coping with malicious or negligent certificate authorities . . . 554
13.8.2 Certificate revocation . . . 557
13.9 Case study: legal aspects of digital signatures . . . 559
13.10 A fun application: private information retrieval . . . 560
13.11 Notes . . . 560
13.12 Exercises . . . 561
14 Fast hash-based signatures 568 14.1 Basic Lamport signatures . . . 569
14.1.1 Shrinking the signature using an enhanced TCR . . . 570
14.2 A general Lamport framework . . . 571
14.2.1 An explicit containment free function . . . 573
14.3 Winternitz one-time signatures . . . 574
14.3.1 A domination free function for Winternitz signatures . . . 577
14.4 HORS: short Lamport signatures . . . 578
14.4.1 Shrinking the public-key using a Merkle tree . . . 579
14.5 Applications of one-time signatures . . . 580
14.5.1 Online/o✏ine signatures from one-time signatures . . . 580
14.5.2 Authenticating streamed data with one-time signatures . . . 581
14.6 From one-time signatures to many-time signatures . . . 581
14.6.1 Indexed signatures . . . 582
14.6.2 A many-time signature scheme from an indexed signature . . . 583
14.6.3 The complete Merkle stateless signature system . . . 585
14.6.4 Nonce-based Merkle signatures . . . 587
14.7 A fun application . . . 588
14.8 Notes . . . 588
14.9 Exercises . . . 589
15 Elliptic curve cryptography and pairings 595 15.1 The group of points of an elliptic curve . . . 596
15.2 Elliptic curves over finite fields . . . 597
15.2.1 Montgomery and Edwards curves . . . 599
15.3 Elliptic curve cryptography . . . 599
15.3.1 The curve P256 . . . 600
15.3.2 The curve 25519 . . . 600
15.4 Pairings . . . 602
15.5 Signature schemes from pairings . . . 603
15.5.1 BLS signatures . . . 603
15.5.2 Group signatures . . . 603
15.6 Advanced encryption schemes from pairings . . . 603
15.6.1 Identity based encryption . . . 603
15.6.2 Threshold decryption with ciphertext security . . . 603
15.6.3 Broadcast encryption . . . 603
15.6.4 Homomorphic encryption . . . 603
15.7 Multilinear maps . . . 603
15.8 A fun application: secret handshakes . . . 604
15.9 Notes . . . 604
15.10 Exercises . . . 604
16 Lattice based cryptography 606 16.1 Integer lattices . . . 606
16.2 Hard problems on lattices . . . 606
16.2.1 The SIS problem . . . 606
16.2.2 The learning with errors (LWE) problem . . . 606
16.2.3 The ring LWE problem . . . 606
16.3 Trapdoor sampling from a lattice . . . 606
16.4 Signatures from lattice problems . . . 606
16.5 Public-key encryption from lattices . . . 606
16.6 Fully homomorphic encryption . . . 606
16.7 A fun application: factoring integers using lattices . . . 606
16.8 Exercises . . . 606
17 Analysis of number theoretic assumptions 607 17.1 How reasonable are the factoring and RSA assumptions? . . . 607
17.1.1 Quadratic resudousity assumption . . . 607
17.2 How reasonable are the DL and CDH assumptions? . . . 607
17.2.1 Brute-force search . . . 608
17.2.2 The baby-step/giant-step method . . . 608
17.2.3 Groups of orderqe . . . 609
17.2.4 The Pohlig-Hellman algorithm . . . 611
17.2.5 The Pohlig-Hellman algorithm . . . 611
17.2.6 Information leakage . . . 614
17.3 Discrete log inZ⇤p . . . 614
17.3.1 The number field sieve . . . 614
17.3.2 Discrete-log records in Z⇤p . . . 615
17.4 How reasonable is decision Diffie-Hellman? . . . 616
17.5 Quantum attacks on number theoretic problems . . . 616
17.6 Side channel and fault attacks . . . 616
17.7 Notes . . . 616
17.8 Chapter summary . . . 616
17.9 Exercises . . . 616
III Protocols 618 18 Protocols for identification and login 619 18.1 Interactive protocols: general notions . . . 621
18.1.1 Mathematical details . . . 622
18.2 ID protocols: definitions . . . 622
18.3 Password protocols: security against direct attacks . . . 623
18.3.1 Password cracking using a dictionary attack . . . 624
18.4 Making dictionary attacks harder . . . 628
18.4.1 Public salts . . . 628
18.4.2 Secret salts . . . 630
18.4.3 Slow hash functions . . . 630
18.4.4 Slow memory-hard hash functions . . . 632
18.4.5 More password management issues . . . 636
18.5 One time passwords: security against eavesdropping . . . 637
18.5.1 PRF-based one-time passwords: HOTP and TOTP . . . 639
18.5.2 The S/key system . . . 641
18.6 Challenge-response: security against active attacks . . . 642
18.6.1 Challenge-response protocols . . . 644
18.7 A fun application: rainbow tables . . . 646
18.8 Another fun application: hardening password storage . . . 650
18.9 Notes . . . 650
18.10 Exercises . . . 651
19 Identification and signatures from sigma protocols 657 19.1 Schnorr’s identification protocol . . . 657
19.1.1 Honest verifier zero knowledge and security against eavesdropping . . . 662
19.2 From identification protocols to signatures . . . 664
19.2.1 A useful abstraction: repeated impersonation attacks . . . 665
19.2.2 Security analysis of Schnorr signatures . . . 666
19.2.3 A concrete implementation and an optimization . . . 671
19.3 Case study: ECDSA signatures . . . 672
19.4 Sigma protocols: basic definitions . . . 673
19.4.1 Knowledge soundness . . . 675
19.4.2 Special honest verifier zero knowledge . . . 676
19.5 Sigma protocols: examples . . . 676
19.5.1 Okamoto’s protocol for representations . . . 677
19.5.2 The Chaum-Pedersen protocol for DH-triples . . . 679
19.5.3 A Sigma protocol for arbitrary linear relations . . . 680
19.5.4 A Sigma protocol for RSA . . . 682
19.6 Identification and signatures from Sigma protocols . . . 683
19.6.1 The Fiat-Shamir heuristic for signatures . . . 685
19.7 Combining Sigma protocols: AND and OR proofs . . . 688
19.7.1 The AND-proof construction . . . 688
19.7.2 The OR-proof construction . . . 689
19.8 Witness independence and applications . . . 690
19.8.1 Definition of witness independence . . . 691
19.8.2 Special HVZK implies witness independence . . . 692
19.8.3 Actively secure identification protocols . . . 693
19.8.4 Okamoto’s identification protocol . . . 695
19.9 A fun application: a two round witness independent protocol . . . 696
19.10 Notes . . . 697
19.11 Exercises . . . 697
20 Proving properties in zero-knowledge 707 20.1 Languages and existential soundness . . . 707
20.2 Proving properties on encrypted data . . . 708
20.2.1 A generic protocol for non-linear relations . . . 713
20.3 Non-interactive proof systems . . . 715
20.3.1 Example: a voting protocol . . . 715
20.3.2 Non-interactive proofs: basic syntax . . . 717
20.3.3 The Fiat-Shamir transform . . . 717
20.3.4 Non-interactive existential soundness . . . 718
20.3.5 Non-interactive zero knowledge . . . 718
20.4 Computational zero-knowledge and applications . . . 720
20.4.1 Example: range proofs . . . 721
20.4.2 Special computational HVZK . . . 722
20.4.3 An unconstrained generic protocol for non-linear relations . . . 723
20.5 Efficient multi-round protocols . . . 724
20.6 Succinct non-interactive zero-knowledge proofs (SNARKs) . . . 724
20.7 A fun application: everything that can be proved, can be proved in zero knowledge 724 20.8 Notes . . . 724
20.9 Exercises . . . 725
21 Authenticated Key Exchange 735 21.1 Identification and AKE . . . 737
21.2 An encryption-based protocol . . . 738
21.2.1 Insecure variations . . . 741
21.2.2 Summary . . . 746
21.3 Perfect forward secrecy and a protocol based on ephemeral encryption . . . 747
21.3.1 Assuming only semantically secure encryption . . . 749
21.4 HSM security . . . 749
21.4.1 A technical requirement: strongly unpredictable ciphertexts . . . 752
21.4.2 Insecure variations . . . 752
21.5 Identity protection . . . 756
21.6 One-sided authenticated key exchange . . . 758
21.6.1 A one-sided authenticated variant of AKE4 . . . 759
21.7 Deniability . . . 760
21.7.1 Deniability without identity protection . . . 761
21.7.2 Deniability with identity protection . . . 762
21.8 Channel bindings . . . 764
21.9 Formal definitions . . . 765
21.9.1 Understanding the definition . . . 769
21.9.2 Security of protocol AKE1 . . . 770
21.9.3 Modeling perfect forward secrecy . . . 771
21.9.4 Modeling HSM security . . . 773
21.9.5 Modeling one-sided authentication . . . 776
21.9.6 Modeling channel bindings . . . 777
21.10 Case study: TLS session setup . . . 777
21.10.1 Authenticated key exchange with preshared keys . . . 780
21.11 Password authenticated key exchange . . . 783
21.11.1 Phishing attacks . . . 783
21.11.2 PAKE: an introduction . . . 786
21.11.3 ProtocolPAKE0 . . . 786
21.11.4 ProtocolPAKE1 . . . 787
21.11.5 ProtocolPAKE2 . . . 789
21.11.6 ProtocolPAKE+2 . . . 792
21.11.7 Explicit key confirmation . . . 793
21.11.8 Phishing again . . . 794
21.12 A fun application: establishing Tor channels . . . 794
21.13 Notes . . . 794
21.14 Exercises . . . 795
22 Key establishment with online Trusted Third Parties 798 22.1 A key exchange protocol with an online TTP . . . 799
22.2 Insecure variations of protocolOnlineTTP . . . 801
22.3 Security proof for protocolOnlineTTP . . . 806
22.4 Case study: Kerberos V5 . . . 806
22.5 O✏ine TTP vs. Online TTP . . . 810
22.6 Notes . . . 811
22.7 Exercises . . . 811
23 Two-party and multi-party secure computation 812 23.1 Yao’s two party protocol . . . 812
23.2 Multi-party secure computation . . . 812
IV Appendices 813 A Basic number theory 814 A.1 Cyclic groups . . . 814
A.2 Arithmetic modulo primes . . . 814
A.2.1 Basic concepts . . . 814
A.2.2 Structure of Z⇤p . . . 815
A.2.3 Quadratic residues . . . 815
A.2.4 Computing in Zp . . . 816
A.2.5 Summary: arithmetic modulo primes . . . 816
A.3 Arithmetic modulo composites . . . 817
B Basic probability theory 819 B.1 Birthday Paradox . . . 819
B.1.1 More collision bounds . . . 821
B.1.2 A simple distinguisher . . . 821
C Basic complexity theory 823
D Probabilistic algorithms 824
Part I
Secret key cryptography
Chapter 2
Encryption
Roughly speaking, encryption is the problem of how two parties can communicate in secret in the presence of an eavesdropper. The main goals of this chapter are to develop a meaningful and useful definition of what we are trying to achieve, and to take some first steps in actually achieving it.
2.1 Introduction
Suppose Alice and Bob share a secret keyk, and Alice wants to transmit a messagem to Bob over a network while maintaining the secrecy ofm in the presence of an eavesdropping adversary. This chapter begins the development of basic techniques to solve this problem. Besides transmitting a message over a network, these same techniques allow Alice to store a file on a disk so that no one else with access to the disk can read the file, but Alice herself can read the file at a later time.
We should stress that while the techniques we develop to solve this fundamental problem are important and interesting, they do not by themselves solve all problems related to “secure commu- nication.”
• The techniques only provide secrecy in the situation where Alice transmits a single message per key. If Alice wants to secretly transmit several messages using the same key, then she must use methods developed in Chapter 5.
• The techniques do not provide any assurances of message integrity: if the attacker has the ability to modify the bits of the ciphertext while it travels from Alice to Bob, then Bob may not realize that this happened, and accept a message other than the one that Alice sent. We will discuss techniques for providing message integrity in Chapter 6.
• The techniques do not provide a mechanism that allow Alice and Bob to come to share a secret key in the first place. Maybe they are able to do this using some secure network (or a physical, face-to-face meeting) at some point in time, while the message is sent at some later time when Alice and Bob must communicate over an insecure network. However, with an appropriate infrastructure in place, there are also protocols that allow Alice and Bob to exchange a secret key even over an insecure network: such protocols are discussed in Chapters 21 and 22.
2.2 Shannon ciphers and perfect security
2.2.1 Definition of a Shannon cipher
The basic mechanism for encrypting a message using a shared secret key is called a cipher (or encryption scheme). In this section, we introduce a slightly simplified notion of a cipher, which we call a Shannon cipher.
A Shannon cipher is a pairE = (E, D) of functions.
• The functionE (theencryption function) takes as input a keyk and amessage m(also called aplaintext), and produces as output aciphertext c. That is,
c=E(k, m), and we say that cis the encryption of m under k.
• The functionD (the decryption function) takes as input a key k and a ciphertext c, and produces a messagem. That is,
m=D(k, c), and we say that mis the decryption of c under k.
• We require that decryption “undoes” encryption; that is, the cipher must satisfy the following correctness property: for all keysk and all messages m, we have
D(k, E(k, m) ) =m.
To be slightly more formal, let us assume thatKis the set of all keys (thekey space),Mis the set of all messages (themessage space), and that C is the set of all ciphertexts (the ciphertext space). With this notation, we can write:
E :K⇥M!C, D:K⇥C!M. Also, we shall say that E is defined over(K,M,C).
Suppose Alice and Bob want to use such a cipher so that Alice can send a message to Bob.
The idea is that Alice and Bob must somehow agree in advance on a keyk2K. Assuming this is done, then when Alice wants to send a messagem2Mto Bob, she encryptsmunderk, obtaining the ciphertextc=E(k, m)2C, and then sendscto Bob via some communication network. Upon receiving c, Bob decryptscunderk, and the correctness property ensures that D(k, c) is the same as Alice’s original messagem. For this to work, we have to assume thatcis not tampered with in transit from Alice to Bob. Of course, the goal, intuitively, is that an eavesdropper, who may obtain cwhile it is in transit, does not learn too much about Alice’s message m— this intuitive notion is what the formal definition of security, which we explore below, will capture.
In practice, keys, messages, and ciphertexts are often sequences of bytes. Keys are usually of some fixed length; for example, 16-byte (i.e., 128-bit) keys are very common. Messages and ciphertexts may be sequences of bytes of some fixed length, or of variable length. For example, a message may be a 1GB video file, a 10MB music file, a 1KB email message, or even a single bit encoding a “yes” or “no” vote in an electronic election.
Keys, messages, and ciphertexts may also be other types of mathematical objects, such as integers, or tuples of integers (perhaps lying in some specified interval), or other, more sophisticated types of mathematical objects (polynomials, matrices, or group elements). Regardless of how fancy these mathematical objects are, in practice, they must at some point be represented as sequences of bytes for purposes of storage in, and transmission between, computers.
For simplicity, in our mathematical treatment of ciphers, we shall assume that K, M, and C are sets of finite size. While this simplifies the theory, it means that if a real-world system allows messages of unbounded length, we will (somewhat artificially) impose a (large) upper bound on legal message lengths.
To exercise the above terminology, we take another look at some of the example ciphers discussed in Chapter 1.
Example 2.1. A one-time padis a Shannon cipher E = (E, D), where the keys, messages, and ciphertexts are bit strings of the same length; that is, E is defined over (K,M,C), where
K:=M:=C:={0,1}L,
for some fixed parameter L. For a key k 2 {0,1}L and a message m 2 {0,1}L the encryption function is defined as follows:
E(k, m) :=k m,
and for a keyk2{0,1}L and ciphertext c2{0,1}L, the decryption function is defined as follows:
D(k, c) :=k c.
Here, “ ” denotes bit-wise exclusive-OR, or in other words, component-wise addition modulo 2, and satisfies the following algebraic laws: for all bit vectors x, y, z2{0,1}L, we have
x y =y x, x (y z) = (x y) z, x 0L=x, and x x= 0L.
These properties follow immediately from the corresponding properties for addition modulo 2.
Using these properties, it is easy to check that the correctness property holds for E: for all k, m2 {0,1}L, we have
D(k, E(k, m) ) =D(k, k m) =k (k m) = (k k) m= 0L m=m.
The encryption and decryption functions happen to be the same in this case, but of course, not all ciphers have this property. 2
Example 2.2. A variable length one-time pad is a Shannon cipher E = (E, D), where the keys are bit strings of some fixed lengthL, while messages and ciphertexts are variable length bit strings, of length at mostL. Thus, E is defined over (K,M,C), where
K :={0,1}L and M:=C:={0,1}L.
for some parameterL. Here,{0,1}Ldenotes the set of all bit strings of length at mostL(including the empty string). For a key k2{0,1}L and a messagem 2{0,1}L of length `, the encryption function is defined as follows:
E(k, m) :=k[0 . .` 1] m,
and for a keyk2{0,1}L and ciphertextc2{0,1}Lof length`, the decryption function is defined as follows:
D(k, c) :=k[0 . .` 1] c.
Here, k[0 . .` 1] denotes the truncation of k to its first ` bits. The reader may verify that the correctness property holds for E. 2
Example 2.3. Asubstitution cipheris a Shannon cipherE = (E, D) of the following form. Let
⌃be a finite alphabet of symbols (e.g., the lettersA–Z, plus a space symbol, ). The message space Mand the ciphertext space C are both sequences of symbols from⌃of some fixed lengthL:
M:=C:=⌃L.
The key spaceKconsists of all permutations on⌃; that is, eachk2K is a one-to-one function from
⌃onto itself. Note that K is a very large set; indeed, |K|=|⌃|! (for |⌃|= 27, |K|⇡1.09·1028).
Encryption of a message m2⌃L under a keyk2K (a permutation on⌃) is defined as follows E(k, m) := k(m[0]), k(m[1]), . . . , k(m[L 1]) ,
where m[i] denotes the ith entry of m (counting from zero), and k(m[i]) denotes the application of the permutation k to the symbol m[i]. Thus, to encrypt m under k, we simply apply the permutation kcomponent-wise to the sequencem. Decryption of a ciphertextc2⌃L under a key k2K is defined as follows:
D(k, c) := k 1(c[0]), k 1(c[1]), . . . , k 1(c[L 1]) .
Here,k 1is the inverse permutation ofk, and to decryptcunderk, we simply applyk 1component- wise to the sequencec. The correctness property is easily verified: for a message m2⌃L and key k2K, we have
D(k, E(k, m) ) =D(k, (k(m[0]), k(m[1]), . . . , k(m[L 1]) )
= (k 1(k(m[0])), k 1(k(m[1])), . . . , k 1(k(m[L 1])))
= (m[0], m[1], . . . , m[L 1]) =m. 2
Example 2.4 (additive one-time pad). We may also define a “addition mod n” variation of the one-time pad. This is a cipher E = (E, D), defined over (K,M,C), where K := M := C :=
{0, . . . , n 1}, wheren is a positive integer. Encryption and decryption are defined as follows:
E(k, m) :=m+kmodn D(k, c) :=c kmodn.
The reader may easily verify that the correctness property holds forE. 2 2.2.2 Perfect security
So far, we have just defined the basic syntax and correctness requirements of a Shannon cipher.
Next, we address the question: what is a “secure” cipher? Intuitively, the answer is that a secure cipher is one for which an encrypted message remains “well hidden,” even after seeing its encryp- tion. However, turning this intuitive answer into one that is both mathematically meaningful and practically relevant is a real challenge. Indeed, although ciphers have been used for centuries, it
is only in the last few decades that mathematically acceptable definitions of security have been developed.
In this section, we develop the mathematical notion of perfect security — this is the “gold standard” for security (at least, when we are only worried about encrypting a single message and do not care about integrity). We will also see that it is possible to achieve this level of security;
indeed, we will show that the one-time pad satisfies the definition. However, the one-time pad is not very practical, in the sense that the keys must be as long as the messages: if Alice wants to send a 1GB file to Bob, they must already share a 1GB key! Unfortunately, this cannot be avoided:
we will also prove that any perfectly secure cipher must have a key space at least as large as its message space. This fact provides the motivation for developing a definition of security that is weaker, but that is acceptable from a practical point of view, and which allows one to encrypt long messages using short keys.
If Alice encrypts a message m under a key k, and an eavesdropping adversary obtains the ciphertext c, Alice only has a hope of keeping m secret if the key k is hard to guess, and that means, at the very least, that the key k should be chosen at random from a large key space. To say that m is “well hidden” must at least mean that it is hard to completely determine m from c, without knowledge of k; however, this is not really enough. Even though the adversary may not knowk, we assume that he does know the encryption algorithm and the distribution of k. In fact, we will assume that when a message is encrypted, the key k is always chosen at random, uniformly from among all keys in the key space. The adversary may also have some knowledge of the message encrypted — because of circumstances, he may know that the set of possible messages is quite small, and he may know something about how likely each possible message is. For example, suppose he knows the message m is eitherm0 ="ATTACK AT DAWN" orm1 ="ATTACK AT DUSK", and that based on the adversary’s available intelligence, Alice is equally likely to choose either one of these two messages. This, without seeing the ciphertext c, the adversary would only have a 50% chance of guessing which message Alice sent. But we are assuming the adversary does know c. Even with this knowledge, both messages may be possible; that is, there may exist keys k0
and k1 such that E(k0, m0) = c and E(k1, m1) = c, so he cannot be sure ifm = m0 orm = m1. However, he can still guess. Perhaps it is a property of the cipher that there are 800 keysk0 such that E(k0, m0) =c, and 600 keysk1 such that E(k1, m1) = c. If that is the case, the adversary’s best guess would be that m = m0. Indeed, the probability that this guess is correct is equal to 800/(800 + 600) ⇡ 57%, which is better than the 50% chance he would have without knowledge of the ciphertext. Our formal definition of perfect security expressly rules out the possibility that knowledge of the ciphertext increases the probability of guessing the encrypted message, or for that matter, determiningany property of the message whatsoever.
Without further ado, we formally define perfect security. In this definition, we will consider a probabilistic experiment in which the key is drawn uniformly from the key space. We write k to denote the random variable representing this random key. For a message m, E(k, m) is another random variable, which represents the application of the encryption function to our random key and the message m. Thus, every message m gives rise to a di↵erent random variable E(k, m).
Definition 2.1 (perfect security). Let E = (E, D) be a Shannon cipher defined over (K,M,C).
Consider a probabilistic experiment in which the random variablekis uniformly distributed overK. If for all m0, m12M, and all c2C, we have
Pr[E(k, m0) =c] = Pr[E(k, m1) =c],
then we say that E is a perfectly secure Shannon cipher.
There are a number of equivalent formulations of perfect security that we shall explore. We state a couple of these here.
Theorem 2.1. Let E = (E, D) be a Shannon cipher defined over (K,M,C). The following are equivalent:
(i) E is perfectly secure.
(ii) For every c2C, there exists Nc (possibly depending on c) such that for all m2M, we have
|{k2K :E(k, m) =c}|=Nc.
(iii) If the random variable k is uniformly distributed over K, then each of the random variables E(k, m), for m2M, has the same distribution.
Proof. To begin with, let us restate (ii) as follows: for every c 2 C, there exists a number Pc (depending on c) such that for all m 2 M, we have Pr[E(k, m) = c] =Pc. Here, k is a random variable uniformly distributed over K. Note that Pc = Nc/|K|, where Nc is as in the original statement of (ii).
This version of (ii) is clearly the same as (iii).
(i) =) (ii). We prove (ii) assuming (i). To prove (ii), let c 2 C be some fixed ciphertext.
Pick some arbitrary messagem0 2M, and letPc := Pr[E(k, m0) =c]. By (i), we know that for all m2M, we have Pr[E(k, m) =c] = Pr[E(k, m0) =c] =Pc. That proves (ii).
(ii) =) (i). We prove (i) assuming (ii). Consider any fixed m0, m1 2Mand c2C. (ii) says that Pr[E(k, m0) =c] =Pc = Pr[E(k, m1) =c], which proves (i). 2
As promised, we give a proof that the one-time pad (see Example 2.1) is perfectly secure.
Theorem 2.2. The one-time pad is a perfectly secure Shannon cipher.
Proof. Suppose that the Shannon cipherE = (E, D) is a one-time pad, and is defined over (K,M,C), where K := M := C := {0,1}L. For any fixed message m 2 {0,1}L and ciphertext c 2 {0,1}L, there is a unique key k2{0,1}L satisfying the equation
k m=c,
namely,k:= m c. Therefore, E satisfies condition (ii) in Theorem 2.1 (with Nc = 1 for each c).
2
Example 2.5. Consider again the variable length one-time pad, defined in Example 2.2. This does not satisfy our definition of perfect security, since a ciphertext has the same length as the corresponding plaintext. Indeed, let us choose an arbitrary string of length 1, call it m0, and an arbitrary string of length 2, call itm1. In addition, suppose thatcis an arbitrary length 1 string, and thatk is a random variable that is uniformly distributed over the key space. Then we have
Pr[E(k, m0) =c] = 1/2 and Pr[E(k, m1) =c] = 0, which provides a direct counter-example to Definition 2.1.
Intuitively, the variable length one-time pad cannot satisfy our definition of perfect security simply because any ciphertext leaks the length of the corresponding plaintext. However, in some sense (which we do not make precise right now), this is theonlyinformation leaked. It is perhaps not clear whether this should be viewed as a problem with the cipher or with our definition of perfect security. On the one hand, one can imagine scenarios where the length of a message may vary greatly, and while we could always “pad” short messages to e↵ectively make all messages equally long, this may be unacceptable from a practical point of view, as it is a waste of bandwidth. On the other hand, one must be aware of the fact that in certain applications, leaking just the length of a message may be dangerous: if you are encrypting a “yes” or “no” answer to a question, just the length of the obvious ASCII encoding of these strings leakseverything, so you better pad “no”
out to three characters. 2
Example 2.6. Consider again the substitution cipher defined in Example 2.3. There are a couple of di↵erent ways to see that this cipher is not perfectly secure.
For example, choose a pair of messages m0, m1 2⌃L such that the first two components of m0 are equal, yet the first two components of m1 are not equal; that is,
m0[0] =m0[1] and m1[0]6=m1[1].
Then for each key k, which is a permutation on ⌃, if c = E(k, m0), then c[0] = c[1], while if c =E(k, m1), then c[0] 6=c[1]. In particular, it follows that if k is uniformly distributed over the key space, then the distributions of E(k, m0) andE(k, m1) will not be the same.
Even the weakness described in the previous paragraph may seem somewhat artificial. Another, perhaps more realistic, type of attack on the substitution cipher works as follows. Suppose the substitution cipher is used to encrypt email messages. As anyone knows, an email starts with a
“standard header,” such as"FROM". Suppose the ciphertext isc2⌃Lis intercepted by an adversary.
The secret key is actually a permutation kon ⌃. The adversary knows that c[0. . .3] = (k(F), k(R), k(O), k(M)).
Thus, if the original message is m 2 ⌃L, the adversary can now locate all positions in m where an F occurs, where an R occurs, where an O occurs, and where an M occurs. Based just on this information, along with specific, contextual information about the message, together with general information about letter frequencies, the adversary may be able to deduce quite a bit about the original message. 2
Example 2.7. Consider the additive one-time pad, defined in Example 2.4. It is easy to verity that this is perfectly secure. Indeed, it satisfies condition (ii) in Theorem 2.1 (withNc = 1 for each c). 2
The next two theorems develop two more alternative characterizations of perfect security. For the first, suppose an eavesdropping adversary applies some predicate to a ciphertext he has obtained. The predicate (which is a boolean-valued function on the ciphertext space) may be something very simple, like the parity function (i.e., whether the number of 1 bits in the ciphertext is even or odd), or it might be some more elaborate type of statistical test. Regardless of how clever or complicated the predicate is, perfect security guarantees that the value of this predicate on the ciphertext reveals nothing about the message.
Theorem 2.3. LetE= (E, D)be a Shannon cipher defined over(K,M,C). Consider a probabilistic experiment in whichkis a random variable uniformly distributed overK. ThenE is perfectly secure if and only if for every predicate onC, for all m0, m1 2M, we have
Pr[ (E(k, m0))] = Pr[ (E(k, m1))].
Proof. This is really just a simple calculation. On the one hand, suppose E is perfectly secure, and let ,m0, and m1 be given. LetS :={c2C: (c)}. Then we have
Pr[ (E(k, m0))] =X
c2S
Pr[E(k, m0) =c] =X
c2S
Pr[E(k, m1) =c] = Pr[ (E(k, m1))].
Here, we use the assumption that E is perfectly secure in establishing the second equality. On the other hand, suppose E is not perfectly secure, so there exist m0,m1, and csuch that
Pr[E(k, m0) =c]6= Pr[E(k, m1) =c].
Defining to be the predicate that is true for this particularc, and false for all other ciphertexts, we see that
Pr[ (E(k, m0))] = Pr[E(k, m0) =c]6= Pr[E(k, m1) =c] = Pr[ (E(k, m1))]. 2
The next theorem states in yet another way that perfect security guarantees that the ciphertext reveals nothing about the message. Suppose that m is a random variable distributed over the message space M. We do not assume that m is uniformly distributed over M. Now suppose k
is a random variable uniformly distributed over the key space K, independently of m, and define
c := E(k,m), which is a random variable distributed over the ciphertext space C. The following theorem says that perfect security guarantees thatcand mare independent random variables.
One way of characterizing this independence is to say that for each ciphertextc2Cthat occurs with nonzero probability, and each message m2M, we have
Pr[m=m|c=c] = Pr[m=m].
Intuitively, this means that after seeing a ciphertext, we have no more information about the message than we did before seeing the ciphertext.
Another way of characterizing this independence is to say that for each message m2M that occurs with nonzero probability, and each ciphertextc2C, we have
Pr[c=c|m=m] = Pr[c=c].
Intuitively, this means that the choice of message has no impact on the distribution of the ciphertext.
The restriction that mandk are independent random variables is sensible: in using any cipher, it is a very bad idea to choose the key in a way that depends on the message, or vice versa (see Exercise 2.16).
Theorem 2.4. Let E = (E, D) be a Shannon cipher defined over (K,M,C). Consider a random experiment in which k andm are random variables, such that
• k is uniformly distributed overK,
• mis distributed over M, and
• k and mare independent.
Define the random variable c:=E(k,m). Then we have:
• if E is perfectly secure, thenc and mare independent;
• conversely, if c and m are independent, and each message in M occurs with nonzero proba- bility, thenE is perfectly secure.
Proof. For the first implication, assume that E is perfectly secure. Consider any fixedm2Mand c2C. We want to show that
Pr[c=c^m=m] = Pr[c=c] Pr[m=m].
We have
Pr[c=c^m=m] = Pr[E(k,m) =c^m=m]
= Pr[E(k, m) =c^m=m]
= Pr[E(k, m) =c] Pr[m=m] (by independence ofkand m).
So it will suffice to show that Pr[E(k, m) =c] = Pr[c=c]. But we have Pr[c=c] = Pr[E(k,m) =c]
= X
m02M
Pr[E(k,m) =c^m=m0] (by total probability)
= X
m02M
Pr[E(k, m0) =c^m=m0]
= X
m02M
Pr[E(k, m0) =c] Pr[m=m0] (by independence ofk and m)
= X
m02M
Pr[E(k, m) =c] Pr[m=m0] (by definition of perfect security)
= Pr[E(k, m) =c] X
m02M
Pr[m=m0]
= Pr[E(k, m) =c] (probabilities sum to 1).
For the second implication, assume thatcandmare independent, and each message inMoccurs with nonzero probability. Let m2M and c2C. We will show that Pr[E(k, m) =c] = Pr[c=c], from which perfect security immediately follows. Since Pr[m=m]6= 0, this is seen thusly:
Pr[E(k, m) =c] Pr[m=m] = Pr[E(k, m) =c^m=m] (by independence ofk and m)
= Pr[E(k,m) =c^m=m]
= Pr[c=c^m=m]
= Pr[c=c] Pr[m=m] (by independence ofcand m). 2
2.2.3 The bad news
We have saved the bad news for last. The next theorem shows that perfect security is such a powerful notion that one can really do no better than the one-time pad: keys must be at least as long as messages. As a result, it is almost impossible to use perfectly secure ciphers in practice: if Alice wants to send Bob a 1GB video file, then Alice and Bob have to agree on a 1GB secret key in advance.
Theorem 2.5 (Shannon’s theorem). Let E = (E, D) be a Shannon cipher defined over (K,M,C). If E is perfectly secure, then |K| |M|.
Proof. Assume that |K|< |M|. We want to show that E is not perfectly secure. To this end, we show that there exist messages m0 and m1, and a ciphertextc, such that
Pr[E(k, m0) =c]>0, and (2.1)
Pr[E(k, m1) =c] = 0. (2.2)
Here,k is a random variable, uniformly distributed overK.
To do this, choose any message m0 2M, and any key k0 2K. Let c:=E(k0, m0). It is clear that (2.1) holds.
Next, let
S:={D(k1, c) :k1 2K}. Clearly,
|S||K|<|M|, and so we can choose a messagem12M \S.
To prove (2.2), we need to show that there is no key k1 such that E(k1, m1) = c. Assume to the contrary that E(k1, m1) =c for somek1; then for this keyk1, by the correctness property for ciphers, we would have
D(k1, c) =D(k1, E(k1, m1) ) =m1,
which would imply thatm1belongs toS, which is not the case. That proves (2.2), and the theorem follows. 2
2.3 Computational ciphers and semantic security
As we have seen in Shannon’s theorem (Theorem 2.5), the only way to achieve perfect security is to have keys that are as long as messages. However, this is quite impractical: we would like to be able to encrypt a long message (say, a document of several megabytes) using a short key (say, a few hundred bits). The only way around Shannon’s theorem is to relax our security requirements. The way we shall do this is to consider not all possible adversaries, but only computationally feasible adversaries, that is, “real world” adversaries that must perform their calculations on real computers using a reasonable amount of time and memory. This will lead to a weaker definition of security calledsemantic security. Furthermore, our definition of security will be flexible enough to allow ciphers with variable length message spaces to be considered secure so long as they do not leak any useful information about an encrypted message to an adversary other than the length of message.
Also, since our focus is now on the “practical,” instead of the “mathematically possible,” we shall also insist that the encryption and decryption functions are themselves efficient algorithms, and not just arbitrary functions.
2.3.1 Definition of a computational cipher
A computational cipher E = (E, D) is a pair of efficient algorithms, E and D. The encryption algorithmEtakes as input a keyk, along with a messagem, and produces as output a ciphertextc.
The decryption algorithmDtakes as input a keyk, a ciphertextc, and outputs a messagem. Keys lie in some finite key spaceK, messages lie in a finite message spaceM, and ciphertexts lie in some finite ciphertext spaceC. Just as for a Shannon cipher, we say thatE is defined over (K,M,C).
Although it is not really necessary for our purposes in this chapter, we will allow the encryption functionE to be aprobabilisticalgorithm (see Chapter D). This means that for fixed inputskand m, the output of E(k, m) may be one of many values. To emphasize the probabilistic nature of this computation, we write
c R E(k, m)
to denote the process of executingE(k, m) and assigning the output to the program variablec. We shall use this notation throughout the text whenever we use probabilistic algorithms. Similarly, we write
k R K
to denote the process of assigning to the program variable k a random, uniformly distributed element of from the key space K. We shall use the analogous notation to sample uniformly from any finite set.
We will not see any examples of probabilistic encryption algorithms in this chapter (we will see our first examples of this in Chapter 5). Although one could allow the decryption algorithm to be probabilistic, we will have no need for this, and so will only discuss ciphers with deterministic decryption algorithms. However, it will be occasionally be convenient to allow the decryption algorithm to return a specialrejectvalue (distinct from all messages), indicating some kind of error occurred during the decryption process.
Since the encryption algorithm is probabilistic, for a given keykand messagem, the encryption algorithm may output one of many possible ciphertexts; however, each of these possible ciphertexts should decrypt tom. We can state thiscorrectness requirement more formally as follows: for all keys k2K and messagesm2M, if we execute
c R E(k, m),m0 D(k, c), thenm=m0 with probability 1.
From now on, whenever we refer to a cipher, we shall mean a computational cipher, as defined above. Moreover, if the encryption algorithm happens to be deterministic, then we may call the cipher a deterministic cipher.
Observe that any deterministic cipher is a Shannon cipher; however, a computational cipher need not be a Shannon cipher (if it has a probabilistic encryption algorithm), and a Shannon cipher need not be a computational cipher (if its encryption or decryption operations have no efficient implementations).
Example 2.8. The one-time pad (see Example 2.1) and the variable length one-time pad (see Example 2.2) are both deterministic ciphers, since their encryption and decryption operations may be trivially implemented as efficient, deterministic algorithms. The same holds for the substitution
cipher (see Example 2.3), provided the alphabet⌃ is not too large. Indeed, in the obvious imple- mentation, a key — which is a permutation on⌃— will be represented by an array indexed by⌃, and so we will require O(|⌃|) space just to store a key. This will only be practical for reasonably sized ⌃. The additive one-time pad discussed in Example 2.4 is also a deterministic cipher, since both encryption and decryption operations may be efficiently implemented (if n is large, special software to do arithmetic with large integers may be necessary). 2
2.3.2 Definition of semantic security
To motivate the definition of semantic security, consider a deterministic cipherE= (E, D), defined over (K,M,C). Consider again the formulation of perfect security in Theorem 2.3. This says that for all predicates on the ciphertext space, and all messages m0, m1, we have
Pr[ (E(k, m0))] = Pr[ (E(k, m1))], (2.3) wherek is a random variable uniformly distributed over the key spaceK. Instead of insisting that these probabilities are equal, we shall only require that they are very close; that is,
Pr[ (E(k, m0))] Pr[ (E(k, m1))] ✏, (2.4) for some very small, or negligible, value of ✏. By itself, this relaxation does not help very much (see Exercise 2.5). However, instead of requiring that (2.4) holds for every possible , m0, and m1, we only require that (2.4) holds for all messages m0 and m1 that can be generated by some efficient algorithm, and all predicates that can be computed by some efficient algorithm (these algorithms could be probabilistic). For example, suppose it were the case that using the best possible algorithms for generating m0 and m1, and for testing some predicate , and using (say) 10,000 computers in parallel for 10 years to perform these calculations, (2.4) holds for ✏= 2 100. While not perfectly secure, we might be willing to say that the cipher is secure for all practical purposes.
Also, in defining semantic security, we address an issue raised in Example 2.5. In that example, we saw that the variable length one-time pad did not satisfy the definition of perfect security.
However, we want our definition to be flexible enough so that ciphers like the variable length one- time pad, which e↵ectively leak no information about an encrypted message other than its length, may be considered secure as well.
Now the details. To precisely formulate the definition of semantic security, we shall describe an attack game played between two parties: thechallengerand anadversary. As we will see, the challenger follows a very simple, fixed protocol. However, an adversary Amay follow an arbitrary (but still efficient) protocol. The challenger and the adversary A send messages back and forth to each other, as specified by their protocols, and at the end of the game, A outputs some value.
Actually, our attack game for defining semantic security comprises two alternative “sub-games,”
or “experiments” — in both experiments, the adversary follows the same protocol; however, the challenger’s behavior is slightly di↵erent in the two experiments. The attack game also defines a probability space, and this in turn defines the adversary’sadvantage, which measures the di↵erence between the probabilities of two events in this probability space.
Attack Game 2.1 (semantic security). For a given cipherE= (E, D), defined over (K,M,C), and for a given adversary A, we define two experiments, Experiment 0 and Experiment 1. For b= 0,1, we define
Challenger A m0, m12M
k R K
ˆb2{0,1} (Experimentb)
c R E(k, mb) c
Figure 2.1: Experiment b of Attack Game 2.1
Experiment b:
• The adversary computesm0, m1 2M, of the same length, and sends them to the challenger.
• The challenger computesk R K,c R E(k, mb), and sendsc to the adversary.
• The adversary outputs a bit ˆb2{0,1}.
For b= 0,1, let Wb be the event that A outputs 1 in Experiment b. We define A’s semantic security advantagewith respect to E as
SSadv[A,E] := Pr[W0] Pr[W1]. 2
Note that in the above game, the eventsW0 andW1 are defined with respect to the probability space determined by the random choice of k, the random choices made (if any) by the encryption algorithm, and the random choices made (if any) by the adversary. The value SSadv[A,E] is a number between 0 and 1.
See Fig. 2.1 for a schematic diagram of Attack Game 2.1. As indicated in the diagram, A’s
“output” is really just a final message to the challenger.
Definition 2.2 (semantic security). A cipher E is semantically secure if for all efficient adversaries A, the value SSadv[A,E] is negligible.
As a formal definition, this is not quite complete, as we have yet to define what we mean by
“messages of the same length”, “efficient adversaries”, and “negligible”. We will come back to this shortly.
Let us relate this formal definition to the discussion preceding it. Suppose that the adversary A in Attack Game 2.1 is deterministic. First, the adversary computes in a deterministic fashion messages m0, m1, and then evaluates a predicate on the ciphertext c, outputting 1 if true and 0 if false. Semantic security says that the value ✏ in (2.4) is negligible. In the case where A is probabilistic, we can view A as being structured as follows: it generates a random value r from