• Aucun résultat trouvé

Reverse Engineering for Beginners

N/A
N/A
Protected

Academic year: 2022

Partager "Reverse Engineering for Beginners"

Copied!
1507
0
0

Texte intégral

(1)

Reverse Engineering for Beginners

Dennis Yurichev

(2)

Reverse Engineering for Beginners

Dennis Yurichev

<dennis(a)yurichev.com>

c b n d

©2013-2015, Dennis Yurichev.

This work is licensed under the Creative Commons

Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit

http://creativecommons.org/licenses/by-nc-nd/3.0/.

Text version (

July 16, 2015

).

The latest version (and Russian edition) of this text accessible atbeginners.re. An A4-format version is also available.

There is also a LITE-version (introductory short version), intended for those who want a very quick introduction to the basics of reverse engineering:beginners.re You can also follow me on twitter to get information about updates of this text:

@yurichev1or to subscribe to the mailing list2. The cover was made by Andy Nechaevsky: facebook.

1twitter.com/yurichev

2yurichev.com

(3)

Please take this short survey!

Your feedback is extremely important to the author!

http://beginners.re/survey.html

(4)
(5)

ABRIDGED CONTENTS ABRIDGED CONTENTS

Abridged contents

I Code patterns 1

II Important fundamentals 663

III Slightly more advanced examples 675

IV Java 904

V Finding important/interesting stuff in the code 962

VI OS-specific 999

VII Tools 1084

VIII Examples of real-world RE 3

tasks 1091

IX Examples of reversing proprietary file formats 1266

X Other things 1303

XI Books/blogs worth reading 1331

XII Exercises 1335

Afterword 1388

Appendix 1390

Acronyms used 1451

iv

(6)

CONTENTS CONTENTS

Contents

0.0.1 Donate . . . xxxi

I Code patterns 1

1 A short introduction to the CPU 4 1.1 A couple of words about different ISA4s . . . 5

2 The simplest Function 6 2.1 x86 . . . 6

2.2 ARM . . . 7

2.3 MIPS . . . 7

2.3.1 A note about MIPS instruction/register names . . . 8

3 Hello, world! 9 3.1 x86 . . . 9

3.1.1 MSVC . . . 9

3.1.2 GCC . . . 11

3.1.3 GCC: AT&T syntax . . . 13

3.2 x86-64 . . . 15

3.2.1 MSVC—x86-64 . . . 15

3.2.2 GCC—x86-64 . . . 16

3.3 GCC—one more thing . . . 17

3.4 ARM . . . 18

3.4.1 Non-optimizing Keil 6/2013 (ARM mode) . . . 18

3.4.2 Non-optimizing Keil 6/2013 (Thumb mode) . . . 21

3.4.3 Optimizing Xcode 4.6.3 (LLVM) (ARM mode) . . . 22

3.4.4 Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode) . . . 23

3.4.5 ARM64 . . . 26

3.5 MIPS . . . 28

3.5.1 A word about “global pointer”. . . 28

4Instruction Set Architecture

(7)

CONTENTS CONTENTS

3.5.2 Optimizing GCC. . . 28

3.5.3 Non-optimizing GCC. . . 31

3.5.4 Role of the stack frame in this example. . . 33

3.5.5 Optimizing GCC: load it into GDB. . . 34

3.6 Conclusion. . . 35

3.7 Exercises . . . 35

3.7.1 Exercise #1 . . . 35

3.7.2 Exercise #2 . . . 35

4 Function prologue and epilogue 37 4.1 Recursion . . . 37

5 Stack 39 5.1 Why does the stack grow backwards?. . . 40

5.2 What is the stack used for? . . . 41

5.2.1 Save the function’s return address. . . 41

5.2.2 Passing function arguments . . . 43

5.2.3 Local variable storage. . . 44

5.2.4 x86: alloca() function . . . 44

5.2.5 (Windows) SEH . . . 47

5.2.6 Buffer overflow protection . . . 47

5.2.7 Automatic deallocation of data in stack . . . 47

5.3 Typical stack layout . . . 48

5.4 Noise in stack . . . 48

5.5 Exercises . . . 52

5.5.1 Exercise #1 . . . 52

5.5.2 Exercise #2 . . . 53

6 printf()with several arguments 56 6.1 x86 . . . 56

6.1.1 x86: 3 arguments . . . 56

6.1.2 x64: 8 arguments . . . 67

6.2 ARM . . . 72

6.2.1 ARM: 3 arguments . . . 72

6.2.2 ARM: 8 arguments . . . 75

6.3 MIPS . . . 81

6.3.1 3 arguments. . . 81

6.3.2 8 arguments. . . 85

6.4 Conclusion. . . 90

6.5 By the way. . . 92

7 scanf() 93 7.1 Simple example . . . 93

7.1.1 About pointers . . . 93

(8)

CONTENTS CONTENTS

7.1.2 x86 . . . 94

7.1.3 MSVC + OllyDbg . . . 97

7.1.4 x64 . . . 100

7.1.5 ARM . . . 101

7.1.6 MIPS. . . 103

7.2 Global variables . . . 106

7.2.1 MSVC: x86 . . . 106

7.2.2 MSVC: x86 + OllyDbg . . . 109

7.2.3 GCC: x86 . . . 110

7.2.4 MSVC: x64 . . . 110

7.2.5 ARM: Optimizing Keil 6/2013 (Thumb mode) . . . 111

7.2.6 ARM64 . . . 113

7.2.7 MIPS. . . 114

7.3 scanf() result checking . . . 119

7.3.1 MSVC: x86 . . . 120

7.3.2 MSVC: x86: IDA . . . 121

7.3.3 MSVC: x86 + OllyDbg . . . 126

7.3.4 MSVC: x86 + Hiew . . . 128

7.3.5 MSVC: x64 . . . 129

7.3.6 ARM . . . 130

7.3.7 MIPS. . . 133

7.3.8 Exercise. . . 134

7.4 Exercises. . . 134

7.4.1 Exercise #1 . . . 134

8 Accessing passed arguments 135 8.1 x86 . . . 135

8.1.1 MSVC . . . 135

8.1.2 MSVC + OllyDbg . . . 137

8.1.3 GCC . . . 137

8.2 x64 . . . 138

8.2.1 MSVC . . . 139

8.2.2 GCC . . . 140

8.2.3 GCC: uint64_t instead of int . . . 142

8.3 ARM . . . 143

8.3.1 Non-optimizing Keil 6/2013 (ARM mode) . . . 143

8.3.2 Optimizing Keil 6/2013 (ARM mode) . . . 144

8.3.3 Optimizing Keil 6/2013 (Thumb mode) . . . 144

8.3.4 ARM64 . . . 145

8.4 MIPS . . . 147

9 More about results returning 150 9.1 Attempt to use the result of a function returningvoid . . . 150

9.2 What if we do not use the function result? . . . 152

(9)

CONTENTS CONTENTS

9.3 Returning a structure . . . 152

10 Pointers 155 10.1 Global variables example . . . 155

10.2 Local variables example . . . 161

10.3 Conclusion. . . 165

11 GOTO operator 166 11.1 Dead code . . . 169

11.2 Exercise . . . 170

12 Conditional jumps 171 12.1 Simple example . . . 171

12.1.1 x86 . . . 172

12.1.2 ARM . . . 185

12.1.3 MIPS. . . 189

12.2 Calculating absolute value. . . 194

12.2.1 Optimizing MSVC. . . 194

12.2.2 Optimizing Keil 6/2013: Thumb mode . . . 195

12.2.3 Optimizing Keil 6/2013: ARM mode . . . 195

12.2.4 Non-optimizing GCC 4.9 (ARM64) . . . 196

12.2.5 MIPS. . . 196

12.2.6 Branchless version? . . . 197

12.3 Ternary conditional operator . . . 197

12.3.1 x86 . . . 197

12.3.2 ARM . . . 199

12.3.3 ARM64 . . . 200

12.3.4 MIPS. . . 200

12.3.5 Let’s rewrite it in anif/elseway . . . 201

12.3.6 Conclusion. . . 202

12.4 Getting minimal and maximal values . . . 202

12.4.1 32-bit . . . 202

12.4.2 64-bit . . . 205

12.4.3 MIPS. . . 208

12.5 Conclusion. . . 209

12.5.1 x86 . . . 209

12.5.2 ARM . . . 209

12.5.3 MIPS. . . 209

12.5.4 Branchless . . . 210

12.6 Exercise . . . 211

13 switch()/case/default 212 13.1 Small number of cases . . . 212

13.1.1 x86 . . . 212

(10)

CONTENTS CONTENTS

13.1.2 ARM: Optimizing Keil 6/2013 (ARM mode). . . 223

13.1.3 ARM: Optimizing Keil 6/2013 (Thumb mode) . . . 224

13.1.4 ARM64: Non-optimizing GCC (Linaro) 4.9 . . . 225

13.1.5 ARM64: Optimizing GCC (Linaro) 4.9 . . . 226

13.1.6 MIPS. . . 226

13.1.7 Conclusion. . . 228

13.2 A lot of cases . . . 228

13.2.1 x86 . . . 229

13.2.2 ARM: Optimizing Keil 6/2013 (ARM mode). . . 235

13.2.3 ARM: Optimizing Keil 6/2013 (Thumb mode) . . . 238

13.2.4 MIPS. . . 240

13.2.5 Conclusion. . . 242

13.3 When there are severalcasestatements in one block . . . 244

13.3.1 MSVC . . . 245

13.3.2 GCC . . . 246

13.3.3 ARM64: Optimizing GCC 4.9.1. . . 247

13.4 Fall-through. . . 249

13.4.1 MSVC x86 . . . 250

13.4.2 ARM64 . . . 251

13.5 Exercises . . . 252

13.5.1 Exercise #1 . . . 252

14 Loops 253 14.1 Simple example . . . 253

14.1.1 x86 . . . 253

14.1.2 x86: OllyDbg . . . 259

14.1.3 x86: tracer . . . 260

14.1.4 ARM . . . 263

14.1.5 MIPS. . . 266

14.1.6 One more thing. . . 268

14.2 Memory blocks copying routine . . . 268

14.2.1 Straight-forward implementation . . . 269

14.2.2 ARM in ARM mode . . . 270

14.2.3 MIPS. . . 271

14.2.4 Vectorization . . . 272

14.3 Conclusion. . . 272

14.4 Exercises . . . 274

14.4.1 Exercise #1 . . . 274

14.4.2 Exercise #2 . . . 274

14.4.3 Exercise #3 . . . 275

14.4.4 Exercise #4 . . . 277

15 Simple C-strings processing 281 15.1 strlen() . . . 281

(11)

CONTENTS CONTENTS

15.1.1 x86 . . . 282

15.1.2 ARM . . . 290

15.1.3 MIPS. . . 295

15.2 Exercises . . . 296

15.2.1 Exercise #1 . . . 296

16 Replacing arithmetic instructions to other ones 300 16.1 Multiplication. . . 300

16.1.1 Multiplication using addition . . . 300

16.1.2 Multiplication using shifting. . . 301

16.1.3 Multiplication using shifting, subtracting, and adding . . . . 302

16.2 Division . . . 307

16.2.1 Division using shifts . . . 307

16.3 Exercises . . . 308

16.3.1 Exercise #2 . . . 308

17 Floating-point unit 310 17.1 IEEE 754 . . . 310

17.2 x86 . . . 310

17.3 ARM, MIPS, x86/x64 SIMD . . . 311

17.4 C/C++ . . . 311

17.5 Simple example . . . 311

17.5.1 x86 . . . 312

17.5.2 ARM: Optimizing Xcode 4.6.3 (LLVM) (ARM mode) . . . 320

17.5.3 ARM: Optimizing Keil 6/2013 (Thumb mode) . . . 321

17.5.4 ARM64: Optimizing GCC (Linaro) 4.9 . . . 322

17.5.5 ARM64: Non-optimizing GCC (Linaro) 4.9 . . . 322

17.5.6 MIPS. . . 324

17.6 Passing floating point numbers via arguments . . . 325

17.6.1 x86 . . . 325

17.6.2 ARM + Non-optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode) 326 17.6.3 ARM + Non-optimizing Keil 6/2013 (ARM mode) . . . 327

17.6.4 ARM64 + Optimizing GCC (Linaro) 4.9 . . . 328

17.6.5 MIPS. . . 329

17.7 Comparison example . . . 330

17.7.1 x86 . . . 331

17.7.2 ARM . . . 363

17.7.3 ARM64 . . . 367

17.7.4 MIPS. . . 369

17.8 Stack, calculators and reverse Polish notation . . . 370

17.9 x64 . . . 370

17.10Exercises . . . 370

17.10.1 Exercise #1 . . . 370

17.10.2 Exercise #2 . . . 371

(12)

CONTENTS CONTENTS

18 Arrays 373

18.1 Simple example . . . 373

18.1.1 x86 . . . 374

18.1.2 ARM . . . 378

18.1.3 MIPS. . . 382

18.2 Buffer overflow . . . 384

18.2.1 Reading outside array bounds. . . 384

18.2.2 Writing beyond array bounds . . . 387

18.3 Buffer overflow protection methods . . . 392

18.3.1 Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode) . . . 396

18.4 One more word about arrays . . . 398

18.5 Array of pointers to strings . . . 399

18.5.1 x64 . . . 399

18.5.2 32-bit ARM. . . 401

18.5.3 ARM64 . . . 403

18.5.4 MIPS. . . 404

18.5.5 Array overflow . . . 405

18.6 Multidimensional arrays . . . 408

18.6.1 Two-dimensional array example . . . 410

18.6.2 Access two-dimensional array as one-dimensional . . . 411

18.6.3 Three-dimensional array example . . . 414

18.6.4 More examples . . . 419

18.7 Pack of strings as a two-dimensional array . . . 419

18.7.1 32-bit ARM. . . 422

18.7.2 ARM64 . . . 422

18.7.3 MIPS. . . 423

18.7.4 Conclusion. . . 424

18.8 Conclusion. . . 424

18.9 Exercises . . . 424

18.9.1 Exercise #1 . . . 424

18.9.2 Exercise #2 . . . 429

18.9.3 Exercise #3 . . . 435

18.9.4 Exercise #4 . . . 436

18.9.5 Exercise #5 . . . 438

19 Manipulating specific bit(s) 444 19.1 Specific bit checking . . . 444

19.1.1 x86 . . . 444

19.1.2 ARM . . . 448

19.2 Setting and clearing specific bits. . . 450

19.2.1 x86 . . . 451

19.2.2 ARM + Optimizing Keil 6/2013 (ARM mode) . . . 457

19.2.3 ARM + Optimizing Keil 6/2013 (Thumb mode) . . . 458

19.2.4 ARM + Optimizing Xcode 4.6.3 (LLVM) (ARM mode) . . . 458

(13)

CONTENTS CONTENTS

19.2.5 ARM: more about theBICinstruction. . . 459

19.2.6 ARM64: Optimizing GCC (Linaro) 4.9 . . . 459

19.2.7 ARM64: Non-optimizing GCC (Linaro) 4.9 . . . 459

19.2.8 MIPS. . . 460

19.3 Shifts . . . 460

19.4 Setting and clearing specific bits: FPU5example. . . 461

19.4.1 A word about theXORoperation . . . 462

19.4.2 x86 . . . 462

19.4.3 MIPS. . . 464

19.4.4 ARM . . . 465

19.5 Counting bits set to 1 . . . 468

19.5.1 x86 . . . 470

19.5.2 x64 . . . 478

19.5.3 ARM + Optimizing Xcode 4.6.3 (LLVM) (ARM mode) . . . 482

19.5.4 ARM + Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode) . . . 483

19.5.5 ARM64 + Optimizing GCC 4.9 . . . 483

19.5.6 ARM64 + Non-optimizing GCC 4.9 . . . 484

19.5.7 MIPS. . . 485

19.6 Conclusion. . . 487

19.6.1 Check for specific bit (known at compile stage) . . . 488

19.6.2 Check for specific bit (specified at runtime) . . . 488

19.6.3 Set specific bit (known at compile stage) . . . 489

19.6.4 Set specific bit (specified at runtime) . . . 489

19.6.5 Clear specific bit (known at compile stage) . . . 490

19.6.6 Clear specific bit (specified at runtime). . . 490

19.7 Exercises . . . 491

19.7.1 Exercise #1 . . . 491

19.7.2 Exercise #2 . . . 492

19.7.3 Exercise #3 . . . 496

19.7.4 Exercise #4 . . . 497

20 Linear congruential generator 500 20.1 x86 . . . 501

20.2 x64 . . . 502

20.3 32-bit ARM . . . 503

20.4 MIPS . . . 504

20.4.1 MIPS relocations . . . 506

20.5 Thread-safe version of the example . . . 507

21 Structures 508 21.1 MSVC: SYSTEMTIME example . . . 508

21.1.1 OllyDbg. . . 511

5Floating-point unit

(14)

CONTENTS CONTENTS

21.1.2 Replacing the structure with array . . . 512

21.2 Let’s allocate space for a structure using malloc() . . . 514

21.3 UNIX: struct tm . . . 516

21.3.1 Linux . . . 516

21.3.2 ARM . . . 520

21.3.3 MIPS. . . 523

21.3.4 Structure as a set of values . . . 525

21.3.5 Structure as an array of 32-bit words . . . 527

21.3.6 Structure as an array of bytes . . . 529

21.4 Fields packing in structure. . . 531

21.4.1 x86 . . . 532

21.4.2 ARM . . . 538

21.4.3 MIPS. . . 540

21.4.4 One more word . . . 541

21.5 Nested structures . . . 541

21.5.1 OllyDbg. . . 544

21.6 Bit fields in a structure . . . 544

21.6.1 CPUID example . . . 544

21.6.2 Working with the float type as with a structure . . . 551

21.7 Exercises . . . 555

21.7.1 Exercise #1 . . . 555

21.7.2 Exercise #2 . . . 555

22 Unions 563 22.1 Pseudo-random number generator example. . . 563

22.1.1 x86 . . . 565

22.1.2 MIPS. . . 566

22.1.3 ARM (ARM mode). . . 568

22.2 Calculating machine epsilon . . . 570

22.2.1 x86 . . . 571

22.2.2 ARM64 . . . 572

22.2.3 MIPS. . . 572

22.2.4 Conclusion. . . 573

22.3 Fast square root calculation. . . 573

23 Pointers to functions 575 23.1 MSVC . . . 576

23.1.1 MSVC + OllyDbg . . . 580

23.1.2 MSVC + tracer . . . 582

23.1.3 MSVC + tracer (code coverage) . . . 584

23.2 GCC . . . 584

23.2.1 GCC + GDB (with source code). . . 586

23.2.2 GCC + GDB (no source code) . . . 587

(15)

CONTENTS CONTENTS

24 64-bit values in 32-bit environment 591

24.1 Returning of 64-bit value . . . 591

24.1.1 x86 . . . 591

24.1.2 ARM . . . 592

24.1.3 MIPS. . . 592

24.2 Arguments passing, addition, subtraction . . . 593

24.2.1 x86 . . . 593

24.2.2 ARM . . . 595

24.2.3 MIPS. . . 596

24.3 Multiplication, division . . . 598

24.3.1 x86 . . . 598

24.3.2 ARM . . . 600

24.3.3 MIPS. . . 602

24.4 Shifting right . . . 603

24.4.1 x86 . . . 603

24.4.2 ARM . . . 604

24.4.3 MIPS. . . 604

24.5 Converting 32-bit value into 64-bit one . . . 605

24.5.1 x86 . . . 605

24.5.2 ARM . . . 605

24.5.3 MIPS. . . 606

25 SIMD 607 25.1 Vectorization . . . 608

25.1.1 Addition example . . . 609

25.1.2 Memory copy example . . . 616

25.2 SIMDstrlen()implementation . . . 622

26 64 bits 628 26.1 x86-64 . . . 628

26.2 ARM . . . 638

26.3 Float point numbers. . . 638

27 Working with floating point numbers using SIMD 639 27.1 Simple example . . . 639

27.1.1 x64 . . . 640

27.1.2 x86 . . . 641

27.2 Passing floating point number via arguments. . . 648

27.3 Comparison example . . . 649

27.3.1 x64 . . . 650

27.3.2 x86 . . . 651

27.4 Calculating machine epsilon: x64 and SIMD . . . 652

27.5 Pseudo-random number generator example revisited. . . 652

27.6 Summary. . . 653

(16)

CONTENTS CONTENTS

28 ARM-specific details 655

28.1 Number sign (#) before number . . . 655

28.2 Addressing modes . . . 655

28.3 Loading a constant into a register . . . 656

28.3.1 32-bit ARM. . . 656

28.3.2 ARM64 . . . 657

28.4 Relocs in ARM64 . . . 658

29 MIPS-specific details 661 29.1 Loading constants into register . . . 661

29.2 Further reading about MIPS . . . 662

II Important fundamentals 663

30 Signed number representations 665 31 Endianness 667 31.1 Big-endian. . . 667

31.2 Little-endian . . . 667

31.3 Example . . . 668

31.4 Bi-endian . . . 668

31.5 Converting data . . . 669

32 Memory 670 33 CPU 672 33.1 Branch predictors . . . 672

33.2 Data dependencies . . . 672

34 Hash functions 673 34.1 How one-way function works? . . . 673

III Slightly more advanced examples 675

35 Temperature converting 676 35.1 Integer values. . . 676

35.1.1 Optimizing MSVC 2012 x86 . . . 677

35.1.2 Optimizing MSVC 2012 x64 . . . 679

35.2 Floating-point values . . . 679

36 Fibonacci numbers 684 36.1 Example #1 . . . 684

36.2 Example #2 . . . 688

(17)

CONTENTS CONTENTS

36.3 Summary. . . 693

37 CRC32 calculation example 694 38 Network address calculation example 700 38.1 calc_network_address() . . . 702

38.2 form_IP() . . . 703

38.3 print_as_IP() . . . 705

38.4 form_netmask() and set_bit() . . . 707

38.5 Summary. . . 708

39 Loops: several iterators 710 39.1 Three iterators . . . 710

39.2 Two iterators . . . 711

39.3 Intel C++ 2011 case . . . 714

40 Duff’s device 718 41 Division by 9 723 41.1 x86 . . . 723

41.2 ARM . . . 725

41.2.1 Optimizing Xcode 4.6.3 (LLVM) (ARM mode) . . . 725

41.2.2 Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode) . . . 726

41.2.3 Non-optimizing Xcode 4.6.3 (LLVM) and Keil 6/2013 . . . 726

41.3 MIPS . . . 727

41.4 How it works . . . 727

41.4.1 More theory . . . 729

41.5 Getting the divisor. . . 729

41.5.1 Variant #1 . . . 729

41.5.2 Variant #2 . . . 731

41.6 Exercise #1 . . . 731

42 String to number conversion (atoi()) 733 42.1 Simple example . . . 733

42.1.1 Optimizing MSVC 2013 x64 . . . 734

42.1.2 Optimizing GCC 4.9.1 x64 . . . 735

42.1.3 Optimizing Keil 6/2013 (ARM mode) . . . 736

42.1.4 Optimizing Keil 6/2013 (Thumb mode) . . . 736

42.1.5 Optimizing GCC 4.9.1 ARM64 . . . 737

42.2 A slightly advanced example . . . 738

42.2.1 Optimizing GCC 4.9.1 x64 . . . 739

42.2.2 Optimizing Keil 6/2013 (ARM mode) . . . 741

42.3 Exercise . . . 742

43 Inline functions 743

(18)

CONTENTS CONTENTS

43.1 Strings and memory functions . . . 744

43.1.1 strcmp(). . . 744

43.1.2 strlen() . . . 747

43.1.3 strcpy() . . . 748

43.1.4 memset() . . . 748

43.1.5 memcpy(). . . 750

43.1.6 memcmp() . . . 754

43.1.7 IDA script. . . 755

44 C99 restrict 756 45 Branchlessabs()function 760 45.1 Optimizing GCC 4.9.1 x64 . . . 760

45.2 Optimizing GCC 4.9 ARM64 . . . 761

46 Variadic functions 762 46.1 Computing arithmetic mean. . . 762

46.1.1 cdeclcalling conventions . . . 763

46.1.2 Register-based calling conventions . . . 764

46.2 vprintf()function case . . . 767

47 Strings trimming 770 47.1 x64: Optimizing MSVC 2013 . . . 771

47.2 x64: Non-optimizing GCC 4.9.1 . . . 773

47.3 x64: Optimizing GCC 4.9.1 . . . 775

47.4 ARM64: Non-optimizing GCC (Linaro) 4.9 . . . 777

47.5 ARM64: Optimizing GCC (Linaro) 4.9 . . . 778

47.6 ARM: Optimizing Keil 6/2013 (ARM mode). . . 779

47.7 ARM: Optimizing Keil 6/2013 (Thumb mode) . . . 780

47.8 MIPS . . . 781

48 toupper() function 784 48.1 x64 . . . 785

48.1.1 Two comparison operations . . . 785

48.1.2 One comparison operation. . . 786

48.2 ARM . . . 787

48.2.1 GCC for ARM64 . . . 788

48.3 Summary. . . 789

49 Incorrectly disassembled code 790 49.1 Disassembling from an incorrect start (x86) . . . 790

49.2 How does random noise looks disassembled? . . . 791

50 Obfuscation 798 50.1 Text strings . . . 798

(19)

CONTENTS CONTENTS

50.2 Executable code . . . 799

50.2.1 Inserting garbage . . . 799

50.2.2 Replacing instructions with bloated equivalents . . . 800

50.2.3 Always executed/never executed code . . . 800

50.2.4 Making a lot of mess . . . 800

50.2.5 Using indirect pointers . . . 801

50.3 Virtual machine / pseudo-code . . . 802

50.4 Other things to mention . . . 802

50.5 Exercises . . . 802

50.5.1 Exercise #1 . . . 802

51 C++ 803 51.1 Classes . . . 803

51.1.1 A simple example . . . 803

51.1.2 Class inheritance . . . 812

51.1.3 Encapsulation. . . 816

51.1.4 Multiple inheritance. . . 819

51.1.5 Virtual methods . . . 824

51.2 ostream . . . 828

51.3 References . . . 829

51.4 STL . . . 830

51.4.1 std::string. . . 830

51.4.2 std::list . . . 841

51.4.3 std::vector . . . 856

51.4.4 std::map and std::set. . . 866

52 Negative array indices 883 53 Windows 16-bit 887 53.1 Example#1 . . . 887

53.2 Example #2 . . . 888

53.3 Example #3 . . . 889

53.4 Example #4 . . . 891

53.5 Example #5 . . . 894

53.6 Example #6 . . . 899

53.6.1 Global variables . . . 901

IV Java 904

54 Java 905 54.1 Introduction . . . 905

54.2 Returning a value . . . 906

54.3 Simple calculating functions . . . 912

(20)

CONTENTS CONTENTS

54.4 JVM6memory model . . . 915

54.5 Simple function calling. . . 916

54.6 Calling beep() . . . 919

54.7 Linear congruential PRNG7 . . . 919

54.8 Conditional jumps . . . 921

54.9 Passing arguments. . . 924

54.10Bitfields . . . 926

54.11Loops . . . 927

54.12switch() . . . 930

54.13Arrays . . . 932

54.13.1Simple example . . . 932

54.13.2Summing elements of array . . . 933

54.13.3main()function sole argument is array too . . . 934

54.13.4Pre-initialized array of strings. . . 935

54.13.5Variadic functions . . . 938

54.13.6Two-dimensional arrays. . . 941

54.13.7Three-dimensional arrays . . . 942

54.13.8Summary . . . 943

54.14Strings . . . 944

54.14.1First example . . . 944

54.14.2Second example . . . 945

54.15Exceptions . . . 947

54.16Classes . . . 952

54.17Simple patching . . . 954

54.17.1 First example . . . 954

54.17.2 Second example . . . 957

54.18Summary. . . 960

V Finding important/interesting stuff in the code 962

55 Identification of executable files 964 55.1 Microsoft Visual C++. . . 964

55.1.1 Name mangling. . . 964

55.2 GCC . . . 965

55.2.1 Name mangling. . . 965

55.2.2 Cygwin . . . 965

55.2.3 MinGW . . . 965

55.3 Intel FORTRAN . . . 965

55.4 Watcom, OpenWatcom . . . 965

55.4.1 Name mangling. . . 965

6Java virtual machine

7Pseudorandom number generator

(21)

CONTENTS CONTENTS

55.5 Borland . . . 966

55.5.1 Delphi. . . 966

55.6 Other known DLLs . . . 969

56 Communication with the outer world (win32) 970 56.1 Often used functions in the Windows API . . . 971

56.2 tracer: Intercepting all functions in specific module. . . 972

57 Strings 974 57.1 Text strings . . . 974

57.1.1 C/C++ . . . 974

57.1.2 Borland Delphi . . . 975

57.1.3 Unicode. . . 975

57.1.4 Base64 . . . 979

57.2 Error/debug messages . . . 980

57.3 Suspicious magic strings . . . 980

58 Calls to assert() 982 59 Constants 984 59.1 Magic numbers . . . 985

59.1.1 DHCP . . . 986

59.2 Searching for constants. . . 986

60 Finding the right instructions 987 61 Suspicious code patterns 990 61.1 XOR instructions . . . 990

61.2 Hand-written assembly code . . . 990

62 Using magic numbers while tracing 993 63 Other things 995 63.1 General idea. . . 995

63.2 C++ . . . 995

63.3 Some binary file patterns . . . 995

63.4 Memory “snapshots” comparing . . . 996

63.4.1 Windows registry . . . 997

63.4.2 Blink-comparator . . . 997

VI OS-specific 999

64 Arguments passing methods (calling conventions) 1000 64.1 cdecl . . . 1000

(22)

CONTENTS CONTENTS

64.2 stdcall . . . 1000 64.2.1 Functions with variable number of arguments . . . 1002 64.3 fastcall . . . 1002 64.3.1 GCC regparm . . . 1004 64.3.2 Watcom/OpenWatcom. . . 1004 64.4 thiscall . . . 1004 64.5 x86-64 . . . 1004 64.5.1 Windows x64 . . . 1004 64.5.2 Linux x64 . . . 1008 64.6 Return values offloatanddoubletype . . . 1009 64.7 Modifying arguments . . . 1009 64.8 Taking a pointer to function argument . . . 1010

65 Thread Local Storage 1013

65.1 Linear congruential generator revisited . . . 1014 65.1.1 Win32. . . 1014 65.1.2 Linux . . . 1020

66 System calls (syscall-s) 1021

66.1 Linux . . . 1022 66.2 Windows . . . 1022

67 Linux 1023

67.1 Position-independent code . . . 1023 67.1.1 Windows . . . 1027 67.2 LD_PRELOADhack in Linux . . . 1027

68 Windows NT 1031

68.1 CRT (win32) . . . 1031 68.2 Win32 PE. . . 1036 68.2.1 Terminology . . . 1037 68.2.2 Base address . . . 1037 68.2.3 Subsystem . . . 1038 68.2.4 OS version . . . 1038 68.2.5 Sections . . . 1039 68.2.6 Relocations (relocs) . . . 1040 68.2.7 Exports and imports. . . 1041 68.2.8 Resources . . . 1044 68.2.9 .NET . . . 1045 68.2.10TLS . . . 1045 68.2.11Tools. . . 1045 68.2.12Further reading . . . 1046 68.3 Windows SEH . . . 1046 68.3.1 Let’s forget about MSVC. . . 1046

(23)

CONTENTS CONTENTS

68.3.2 Now let’s get back to MSVC . . . 1054 68.3.3 Windows x64 . . . 1075 68.3.4 Read more about SEH . . . 1080 68.4 Windows NT: Critical section . . . 1081

VII Tools 1084

69 Disassembler 1085

69.1 IDA . . . 1085

70 Debugger 1086

70.1 OllyDbg . . . 1086 70.2 GDB . . . 1086 70.3 tracer . . . 1086

71 System calls tracing 1088

71.0.1 strace / dtruss. . . 1088

72 Decompilers 1089

73 Other tools 1090

VIII Examples of real-world RE tasks 1091

74 Task manager practical joke (Windows Vista) 1093 74.1 Using LEA to load values. . . 1097

75 Color Lines game practical joke 1098

76 Minesweeper (Windows XP) 1102

76.1 Exercises . . . 1109

77 Hand decompiling + Z3 SMT solver 1110

77.1 Hand decompiling . . . 1110 77.2 Now let’s use the Z3 SMT solver . . . 1115

78 Dongles 1122

78.1 Example #1: MacOS Classic and PowerPC . . . 1122 78.2 Example #2: SCO OpenServer. . . 1133 78.2.1 Decrypting error messages. . . 1145 78.3 Example #3: MS-DOS . . . 1149 79 “QR9”: Rubik’s cube inspired amateur crypto-algorithm 1157

(24)

CONTENTS CONTENTS

80 SAP 1201

80.1 About SAP client network traffic compression . . . 1201 80.2 SAP 6.0 password checking functions. . . 1218

81 Oracle RDBMS 1225

81.1 V$VERSIONtable in the Oracle RDBMS . . . 1225 81.2 X$KSMLRUtable in Oracle RDBMS . . . 1238 81.3 V$TIMERtable in Oracle RDBMS . . . 1240

82 Handwritten assembly code 1246

82.1 EICAR test file . . . 1246

83 Demos 1248

83.1 10 PRINT CHR$(205.5+RND(1)); : GOTO 10 . . . 1248 83.1.1 Trixter’s 42 byte version. . . 1249 83.1.2 My attempt to reduce Trixter’s version: 27 bytes. . . 1250 83.1.3 Taking random memory garbage as a source of randomness 1251 83.1.4 Conclusion. . . 1252 83.2 Mandelbrot set . . . 1253 83.2.1 Theory . . . 1254 83.2.2 Let’s get back to the demo . . . 1260 83.2.3 My “fixed” version . . . 1263

IX Examples of reversing proprietary file formats 1266

84 Primitive XOR-encryption 1267

84.1 Norton Guide: simplest possible 1-byte XOR encryption . . . 1268 84.1.1 Entropy . . . 1269 84.2 Simplest possible 4-byte XOR encryption . . . 1271 84.2.1 Exercise. . . 1274

85 Millenium game save file 1275

86 Oracle RDBMS: .SYM-files 1282

87 Oracle RDBMS: .MSB-files 1295

87.1 Summary. . . 1302

X Other things 1303

88 npad 1304

89 Executable files patching 1307

(25)

CONTENTS CONTENTS

89.1 Text strings . . . 1307 89.2 x86 code . . . 1307

90 Compiler intrinsic 1309

91 Compiler’s anomalies 1310

92 OpenMP 1312

92.1 MSVC . . . 1315 92.2 GCC . . . 1318

93 Itanium 1321

94 8086 memory model 1326

95 Basic blocks reordering 1328

95.1 Profile-guided optimization . . . 1328

XI Books/blogs worth reading 1331

96 Books 1332

96.1 Windows . . . 1332 96.2 C/C++ . . . 1332 96.3 x86 / x86-64 . . . 1332 96.4 ARM . . . 1332 96.5 Cryptography . . . 1332

97 Blogs 1333

97.1 Windows . . . 1333

98 Other 1334

XII Exercises 1335

99 Level 1 1338

99.1 Exercise 1.4 . . . 1338

100Level 2 1339

100.1Exercise 2.1 . . . 1339 100.1.1Optimizing MSVC 2010 x86 . . . 1339 100.1.2Optimizing MSVC 2012 x64 . . . 1340 100.2Exercise 2.4 . . . 1341 100.2.1Optimizing MSVC 2010 . . . 1341 100.2.2GCC 4.4.1 . . . 1342

(26)

CONTENTS CONTENTS

100.2.3Optimizing Keil (ARM mode). . . 1344 100.2.4Optimizing Keil (Thumb mode) . . . 1345 100.2.5Optimizing GCC 4.9.1 (ARM64) . . . 1346 100.2.6Optimizing GCC 4.4.5 (MIPS). . . 1347 100.3Exercise 2.6 . . . 1348 100.3.1Optimizing MSVC 2010 . . . 1348 100.3.2Optimizing Keil (ARM mode). . . 1349 100.3.3Optimizing Keil (Thumb mode) . . . 1350 100.3.4Optimizing GCC 4.9.1 (ARM64) . . . 1351 100.3.5Optimizing GCC 4.4.5 (MIPS). . . 1352 100.4Exercise 2.13 . . . 1353 100.4.1Optimizing MSVC 2012 . . . 1353 100.4.2Keil (ARM mode) . . . 1353 100.4.3Keil (Thumb mode). . . 1354 100.4.4Optimizing GCC 4.9.1 (ARM64) . . . 1354 100.4.5Optimizing GCC 4.4.5 (MIPS). . . 1355 100.5Exercise 2.14 . . . 1355 100.5.1MSVC 2012 . . . 1355 100.5.2Keil (ARM mode) . . . 1356 100.5.3GCC 4.6.3 for Raspberry Pi (ARM mode). . . 1357 100.5.4Optimizing GCC 4.9.1 (ARM64) . . . 1358 100.5.5Optimizing GCC 4.4.5 (MIPS). . . 1359 100.6Exercise 2.15 . . . 1362 100.6.1Optimizing MSVC 2012 x64 . . . 1362 100.6.2Optimizing GCC 4.4.6 x64 . . . 1366 100.6.3Optimizing GCC 4.8.1 x86 . . . 1367 100.6.4Keil (ARM mode): Cortex-R4F CPU as target . . . 1369 100.6.5Optimizing GCC 4.9.1 (ARM64) . . . 1370 100.6.6Optimizing GCC 4.4.5 (MIPS). . . 1371 100.7Exercise 2.16 . . . 1373 100.7.1 Optimizing MSVC 2012 x64 . . . 1373 100.7.2 Optimizing Keil (ARM mode). . . 1374 100.7.3 Optimizing Keil (Thumb mode) . . . 1374 100.7.4 Non-optimizing GCC 4.9.1 (ARM64) . . . 1375 100.7.5 Optimizing GCC 4.9.1 (ARM64) . . . 1376 100.7.6 Non-optimizing GCC 4.4.5 (MIPS). . . 1379 100.8Exercise 2.17 . . . 1381 100.9Exercise 2.18 . . . 1381 100.10Exercise 2.19 . . . 1382 100.11Exercise 2.20 . . . 1382

101Level 3 1383

101.1Exercise 3.2 . . . 1383 101.2Exercise 3.3 . . . 1383

(27)

CONTENTS CONTENTS

101.3Exercise 3.4 . . . 1384 101.4Exercise 3.5 . . . 1384 101.5Exercise 3.6 . . . 1384 101.6Exercise 3.8 . . . 1385

102crackme / keygenme 1386

Afterword 1388

103Questions? 1388

Appendix 1390

A x86 1390

A.1 Terminology . . . 1390 A.2 General purpose registers . . . 1390 A.2.1 RAX/EAX/AX/AL. . . 1391 A.2.2 RBX/EBX/BX/BL . . . 1391 A.2.3 RCX/ECX/CX/CL. . . 1391 A.2.4 RDX/EDX/DX/DL . . . 1392 A.2.5 RSI/ESI/SI/SIL. . . 1392 A.2.6 RDI/EDI/DI/DIL . . . 1392 A.2.7 R8/R8D/R8W/R8L . . . 1392 A.2.8 R9/R9D/R9W/R9L . . . 1393 A.2.9 R10/R10D/R10W/R10L . . . 1393 A.2.10 R11/R11D/R11W/R11L . . . 1393 A.2.11 R12/R12D/R12W/R12L . . . 1393 A.2.12 R13/R13D/R13W/R13L . . . 1393 A.2.13 R14/R14D/R14W/R14L . . . 1394 A.2.14 R15/R15D/R15W/R15L . . . 1394 A.2.15 RSP/ESP/SP/SPL . . . 1394 A.2.16 RBP/EBP/BP/BPL . . . 1394 A.2.17 RIP/EIP/IP . . . 1395 A.2.18 CS/DS/ES/SS/FS/GS . . . 1395 A.2.19 Flags register . . . 1395 A.3 FPU registers . . . 1396 A.3.1 Control Word . . . 1397 A.3.2 Status Word . . . 1397 A.3.3 Tag Word . . . 1398 A.4 SIMD registers . . . 1399 A.4.1 MMX registers. . . 1399 A.4.2 SSE and AVX registers. . . 1399

(28)

CONTENTS CONTENTS

A.5 Debugging registers. . . 1399 A.5.1 DR6 . . . 1400 A.5.2 DR7 . . . 1400 A.6 Instructions . . . 1401 A.6.1 Prefixes . . . 1401 A.6.2 Most frequently used instructions . . . 1402 A.6.3 Less frequently used instructions. . . 1409 A.6.4 FPU instructions . . . 1416 A.6.5 Instructions having printable ASCII opcode . . . 1418

B ARM 1420

B.1 Terminology . . . 1420 B.2 Versions . . . 1420 B.3 32-bit ARM (AArch32). . . 1421 B.3.1 General purpose registers . . . 1421 B.3.2 Current Program Status Register (CPSR) . . . 1421 B.3.3 VFP (floating point) and NEON registers . . . 1422 B.4 64-bit ARM (AArch64). . . 1422 B.4.1 General purpose registers . . . 1422 B.5 Instructions . . . 1423 B.5.1 Conditional codes table. . . 1424

C MIPS 1425

C.1 Registers . . . 1425 C.1.1 General purpose registers GPR8. . . 1425 C.1.2 Floating-point registers. . . 1426 C.2 Instructions . . . 1426 C.2.1 Jump instructions. . . 1426

D Some GCC library functions 1428

E Some MSVC library functions 1429

F Cheatsheets 1430

F.1 IDA . . . 1430 F.2 OllyDbg . . . 1431 F.3 MSVC . . . 1432 F.4 GCC . . . 1432 F.5 GDB . . . 1432

G Exercise solutions 1434

G.1 Per chapter . . . 1434 G.1.1 “Hello, world!” chapter . . . 1434

8General Purpose Registers

(29)

CONTENTS CONTENTS

G.1.2 “Stack” chapter . . . 1435 G.1.3 “scanf()” chapter . . . 1435 G.1.4 “switch()/case/default” chapter . . . 1436 G.1.5 Exercise #1 . . . 1436 G.1.6 “Loops” chapter . . . 1436 G.1.7 Exercise #3 . . . 1436 G.1.8 Exercise #4 . . . 1437 G.1.9 “Simple C-strings processing” chapter. . . 1437 G.1.10 “Replacing arithmetic instructions to other ones” chapter . . 1438 G.1.11 “Floating-point unit” chapter . . . 1438 G.1.12 “Arrays” chapter. . . 1438 G.1.13 “Manipulating specific bit(s)” chapter . . . 1440 G.1.14 “Structures” chapter . . . 1442 G.1.15 “Obfuscation” chapter . . . 1444 G.1.16 “Division by 9” chapter . . . 1444 G.2 Level 1 . . . 1444 G.2.1 Exercise 1.1 . . . 1444 G.2.2 Exercise 1.4 . . . 1444 G.3 Level 2 . . . 1444 G.3.1 Exercise 2.1 . . . 1444 G.3.2 Exercise 2.4 . . . 1445 G.3.3 Exercise 2.6 . . . 1445 G.3.4 Exercise 2.13 . . . 1446 G.3.5 Exercise 2.14 . . . 1446 G.3.6 Exercise 2.15 . . . 1446 G.3.7 Exercise 2.16 . . . 1446 G.3.8 Exercise 2.17 . . . 1447 G.3.9 Exercise 2.18 . . . 1447 G.3.10 Exercise 2.19 . . . 1447 G.3.11 Exercise 2.20 . . . 1447 G.4 Level 3 . . . 1447 G.4.1 Exercise 3.2 . . . 1447 G.4.2 Exercise 3.3 . . . 1448 G.4.3 Exercise 3.4 . . . 1448 G.4.4 Exercise 3.5 . . . 1448 G.4.5 Exercise 3.6 . . . 1448 G.4.6 Exercise 3.8 . . . 1448 G.5 Other . . . 1448 G.5.1 “Minesweeper (Windows XP)” example . . . 1448

Acronyms used 1451

Glossary 1457

(30)

CONTENTS CONTENTS

Index 1460

Bibliography 1469

(31)

CONTENTS CONTENTS

Preface

There are several popular meanings of the term “reverse engineering”: 1) The re- verse engineering of software: researching compiled programs; 2) The scanning of 3D structures and the subsequent digital manipulation required order to duplicate them; 3) recreatingDBMS9structure. This book is about the first meaning.

Topics discussed in-depth

x86/x64, ARM/ARM64, MIPS, Java/JVM.

Topics touched upon

Oracle RDBMS (81 on page 1225), Itanium (93 on page 1321), copy-protection don- gles (78 on page 1122), LD_PRELOAD (67.2 on page 1027), stack overflow,ELF10, win32 PE file format (68.2 on page 1036), x86-64 (26.1 on page 628), critical sec- tions (68.4 on page 1081), syscalls (66 on page 1021),TLS11, position-independent code (PIC12) (67.1 on page 1023), profile-guided optimization (95.1 on page 1328), C++ STL (51.4 on page 830), OpenMP (92 on page 1312), SEH (68.3 on page 1046).

About the author

Dennis Yurichev is an experienced reverse engineer and programmer. He can be con- tacted by email: dennis(a)yurichev.com, or on Skype: dennis.yurichev.

9Database management systems

10Executable file format widely used in *NIX systems including Linux

11Thread Local Storage

12Position Independent Code:67.1 on page 1023

(32)

CONTENTS CONTENTS

Thanks

For patiently answering all my questions: Andrey “herm1t” Baranovich, Slava “Avid”

Kazakov.

For sending me notes about mistakes and inaccuracies: Stanislav “Beaver” Bobryt- skyy, Alexander Lysenko, Shell Rocket, Zhu Ruijin, Changmin Heo.

For helping me in other ways: Andrew Zubinski, Arnaud Patard (rtp on #debian-arm IRC), Aliaksandr Autayeu.

For translating the book into Simplified Chinese: Archer.

For translating the book into Korean: Byungho Min.

For proofreading: Alexander “Lstar” Chernenkiy, Vladimir Botov, Andrei Brazhuk, Mark “Logxen” Cooper, Yuan Jochen Kang, Mal Malakov, Lewis Porter, Jarle Thorsen.

Vasil Kolev did a great amount of work in proofreading and correcting many mis- takes.

For illustrations and cover art: Andy Nechaevsky.

Thanks also to all the folks on github.com who have contributed notes and correc- tions.

Many LATEX packages were used: I would like to thank the authors as well.

0.0.1 Donate

As it turns out, (technical) writing takes a lot of time and effort.

This book is free, available freely and available in source code form13(LaTeX), and it shall stay that way forever.

It is also ad-free.

My current plan for this book is to add lots of information about: PLANS14. If you want me to continue writing about all of these topics, you might want to consider making a donation.

Ways to donate can be found here: http://yurichev.com/donate.html.

All donors’ names will be included in the hall of fame!

http://yurichev.com/donors.html.

13github.com/dennis714/RE-for-beginners

14github.com/dennis714/RE-for-beginners/blob/master/PLANS

(33)

CONTENTS CONTENTS

mini-FAQ

Q: Why one should learn assembly language these days?

A: Unless you are anOS15developer, you probably don’t need to code in assembly—

modern compilers are much better at performing optimizations than humans16. Also, modernCPU17s are very complex devices and assembly knowledge doesn’t re- ally help one to understand their internals. That being said, there are at least two areas where a good understanding of assembly can be helpful: First and foremost, security/malware research. It is also a good way to gain a better understanding of your compiled code whilst debugging. This book is therefore intended for those who want to understand assembly language rather than to code in it, which is why there are many examples of compiler output contained within.

Q: I clicked on a hyperlink inside of a PDF-document, how do I get back?

A: In Adobe Acrobat Reader click Alt+LeftArrow.

Q: Your book is huge! Is there anything shorter?

A: There is shortened lite version found here:http://beginners.re/#lite.

Q: I’m not sure, if I should try to learn reverse engineering or not.

A: Perhaps, the average time to become familiar with the contents of the shortened LITE-version is 1-2 month(s).

Q: May I print this book? Use it for teaching?

A: Of course! That’s why book is licensed under Creative Commons license. One might also want to build one’s own version of book—readhereto find out more.

Q: I want to translate your book to some other language.

A: Readmy note to translators.

Q: How does one get a job in reverse engineering?

A: There are hiring threads that appear from time to time on reddit devoted to RE18 (2013 Q3,2014). Try looking there. A somewhat related hiring thread can be found in the “netsec” subreddit:2014 Q2.

Q: I have a question...

A: Send it to me by email (dennis(a)yurichev.com) or ask your question on my forum:

forum.yurichev.com.

15Operating System

16A very good text about this topic: [Fog13b]

17Central processing unit

18reddit.com/r/ReverseEngineering/

(34)

CONTENTS CONTENTS

Praise for Reverse Engineering for Beginners

• “It’s very well done .. and for free .. amazing.”19 Daniel Bilar, Siege Technolo- gies, LLC.

• “... excellent and free”20Pete Finnigan, Oracle RDBMS security guru.

• “... book is interesting, great job!” Michael Sikorski, author ofPractical Mal- ware Analysis: The Hands-On Guide to Dissecting Malicious Software.

• “... my compliments for the very nice tutorial!” Herbert Bos, full professor at the Vrije Universiteit Amsterdam, co-author ofModern Operating Systems (4th Edition).

• “... It is amazing and unbelievable.” Luis Rocha, CISSP / ISSAP, Technical Manager, Network & Information Security at Verizon Business.

• “Thanks for the great work and your book.” Joris van de Vis, SAP Netweaver

& Security specialist.

• “... reasonable intro to some of the techniques.”21 Mike Stay, teacher at the Federal Law Enforcement Training Center, Georgia, US.

• “I love this book! I have several students reading it at the moment, plan to use it in graduate course.”22 Sergey Bratus, Research Assistant Professor at the Computer Science Department at Dartmouth College

• “Dennis @Yurichev has published an impressive (and free!) book on reverse engineering”23Tanel Poder, Oracle RDBMS performance tuning expert.

• “This book is some kind of Wikipedia to beginners...” Archer, Chinese Trans- lator, IT Security Researcher.

This is the A5-format version for e-book readers. Although the content is mostly the same, the illustrations are resized and probably not readable. You may try to change scale in your e-book reader. Otherwise, you can always view them in the A4-format version here:beginners.re.

About the Korean translation

In January 2015, the Acorn publishing company (www.acornpub.co.kr) in South Ko- rea did huge amount of work in translating and publishing my book (as it was in August 2014) into Korean.

19twitter.com/daniel_bilar/status/436578617221742593

20twitter.com/petefinnigan/status/400551705797869568

21reddit

22twitter.com/sergeybratus/status/505590326560833536

23twitter.com/TanelPoder/status/524668104065159169

(35)

CONTENTS CONTENTS

It’s now available attheir website.

Translator is Byungho Min (twitter/tais9).

The cover art was done by my artistic friend, Andy Nechaevsky : facebook/andy- dinka.

They also hold the copyright to the Korean translation.

So, if you want to have arealbook on your shelf in Korean and want to support my work, it is now available for purchase.

(36)

Part I

Code patterns

(37)

When the author of this book first started learning C and later, C++, he used to write small pieces of code, compile them, and then look at the assembly language output.

This made it very easy for him to understand what was going on in the code that he had written.24. He did it so many times that the relationship between the C/C++

code and what the compiler produced was imprinted deeply in his mind. It’s easy to imagine instantly a rough outline of C code’s appearance and function. Perhaps this technique could be helpful for someone else.

Sometimes an ancient compilers are used here, in order to get the shortest (or simplest) possible code snippet.

Exercises

When the author of this book studied assembly language, he also often compiled small C-functions and then rewrote them gradually to assembly, trying to make their code as short as possible. This probably is not worth doing in real-world scenarios today, because it’s hard to compete with modern compilers in terms of efficiency.

It is, however, a very good way to gain a better understanding of assembly. Feel free, therefore, to take any assembly code from this book and try to make it shorter.

However, don’t forget to test what you have written.

Optimization levels and debug information

Source code can be compiled by different compilers with various optimization lev- els. Typical compiler has about three such levels, where level zero means disable optimization. Optimization can also be targeted towards code size or code speed.

A non-optimizing compiler is faster and produces more understandable (albeit ver- bose) code, whereas an optimizing compiler is slower and tries to produce code that runs faster (but is not necessarily more compact).

In addition to optimization levels and direction, compiler can include in the result- ing file some debug information, thus producing code for easy debugging.

One of the important features of the ´debug’ code is that it might contain links between each line of the source code and the respective machine code addresses.

Optimizing compilers, on the other hand, tend to produce output where entire lines of source code can be optimized away and thus not even be present in the resulting machine code.

Reverse engineers can encounter either version, simply because some developers turn on the compiler’s optimization flags and others do not. Because of this, we’ll

24In fact, he still does it when he can’t understand what a particular bit of code does.

(38)

try to work on examples of both debug and release versions of the code featured in this book, where possible.

(39)

CHAPTER 1. A SHORT INTRODUCTION TO THE CPU CHAPTER 1. A SHORT INTRODUCTION TO THE CPU

Chapter 1

A short introduction to the CPU

TheCPUis the device which executes the machine code a program consists of.

A short glossary:

Instruction : A primitiveCPUcommand. The simplest examples include: moving data between registers, working with memory and arithmetic primitives. As a rule, eachCPUhas its own instruction set architecture (ISA).

Machine code : Code that theCPUdirectly processes. Each instruction is usually encoded by several bytes.

Assembly language : Mnemonic code and some extensions like macros which are intended to make a programmer’s life easier.

CPU register : EachCPUhas a fixed set of general purpose registers (GPR). ≈8in x86,≈16in x86-64,≈16in ARM. The easiest way to understand a register is to think of it as an untyped temporary variable. Imagine if you were working with high-levelPL1and could only use eight 32-bit (or 64-bit) variables. Yet a lot can be done using just these!

One might wonder why there needs to be a difference between machine code and aPL. The answer lies in the fact that humans andCPUs are not alike— it is much easier for humans to use a high-level PLlike C/C++, Java, Python, etc., but it is easier for aCPU to use a much lower level of abstraction. Perhaps it would be possible to invent aCPUthat can execute high-levelPLcode, but it would be many times more complex than theCPUs we know of today. In a similar fashion, it is very inconvenient for humans to write in assembly language due to it being so low-level and difficult to write in without making a huge amount of annoying mistakes. The program which converts the high-levelPLcode into assembly is called acompiler.

1Programming language

(40)

CHAPTER 1. A SHORT INTRODUCTION TO THE CPU CHAPTER 1. A SHORT INTRODUCTION TO THE CPU

1.1 A couple of words about different ISAs

The x86ISAhas always been one with variable-length opcodes, so when the 64-bit era came, the x64 extensions did not impact theISAvery significantly. In fact, x86 ISAstill contains a lot of instructions that first appeared in 16-bit 8086 CPU, yet are still found in the CPUs of today.

ARM is aRISC2CPUdesigned with constant-length opcode in mind, which had some advantages in the past. In the very beginning, all ARM instructions were encoded in 4 bytes3. This is now referred to as “ARM mode”.

Then they thought it wasn’t as frugal as they first imagined. In fact, most used CPUinstructions4 in real world applications can be encoded using less informa- tion. They therefore added anotherISA, called Thumb, where each instruction was encoded in just 2 bytes. This is now referred as “Thumb mode”. However, not allARM instructions can be encoded in just 2 bytes, so the Thumb instruction set is somewhat limited. It is worth noting that code compiled for ARM mode and Thumb mode may of course coexist within one single program.

The ARM creators thought Thumb could be extended, giving rise to Thumb-2, which appeared in ARMv7. Thumb-2 still uses 2-byte instructions, but has some new instructions which have the size of 4 bytes. There is a common misconception that Thumb-2 is a mix of ARM and Thumb. This is incorrect. Rather, Thumb-2 was extended to fully support all processor features so it could compete with ARM mode—a goal that was clearly achieved, as the majority of applications for iPod/i- Phone/iPad are compiled for the Thumb-2 instruction set (admittedly, largely due to the fact that that Xcode does this by default). Later the 64-bit ARM came out. ThisISAhas 4-byte opcodes, and lacked the need of any additional Thumb mode. However, the 64-bit requirements affected the ISA, resulting in us now having three ARM instruction sets: ARM mode, Thumb mode (including Thumb-2) and ARM64. TheseISAs intersect partially, but it can be rather said they are differ- entISAs, rather than variations of the same one. Therefore, we would try to add fragments of code in all three ARMISAs in this book.

There are, by the way, many otherRISC ISAs with fixed length 32-bit opcodes, such as MIPS, PowerPC and Alpha AXP.

2Reduced instruction set computing

3By the way, fixed-length instructions are handy because one can calculate the next (or previous) instruction address without effort. This feature will be discussed in the switch() operator (13.2.2 on page 235) section.

4These are MOV/PUSH/CALL/Jcc

(41)

CHAPTER 2. THE SIMPLEST FUNCTION CHAPTER 2. THE SIMPLEST FUNCTION

Chapter 2

The simplest Function

The simplest possible function is arguably one that simply returns a constant value:

Here it is:

Listing 2.1: C/C++ Code int f()

{

return 123;

};

Lets compile it!

2.1 x86

Here’s what both the optimizing GCC and MSVC compilers produce on the x86 plat- form:

Listing 2.2: Optimizing GCC/MSVC (assembly output) f:

mov eax, 123 ret

There are just two instructions: the first places the value 123 into theEAXregister, which is used by convention for storing the return value and the second one isRET, which returns execution to thecaller. The caller will take the result from theEAX register.

(42)

CHAPTER 2. THE SIMPLEST FUNCTION CHAPTER 2. THE SIMPLEST FUNCTION

2.2 ARM

There are a few differences on the ARM platform:

Listing 2.3: Optimizing Keil 6/2013 (ARM mode) ASM Output f PROC

MOV r0,#0x7b ; 123

BX lr

ENDP

ARM uses the registerR0for returning the results of functions, so 123 is copied intoR0.

The return address is not saved on the local stack in the ARMISA, but rather in the link register, so theBX LRinstruction causes execution to jump to that address—

effectively returning execution to thecaller.

It worth noting thatMOVis a misleading name for the instruction in both x86 and ARMISAs. The data is not in factmoved, butcopied.

2.3 MIPS

There are two naming conventions used in the world of MIPS when naming regis- ters: by number (from $0 to $31) or by pseudoname ($V0, $A0, etc). The GCC assembly output below lists registers by number:

Listing 2.4: Optimizing GCC 4.4.5 (assembly output)

j $31

li $2,123 # 0x7b

…whileIDA1does it—by their pseudonames:

Listing 2.5: Optimizing GCC 4.4.5 (IDA)

jr $ra

li $v0, 0x7B

The $2 (or $V0) register is used to store the function’s return value. LI stands for

“Load Immediate” and is the MIPS equivalent to MOV.

The other instruction is the jump instruction (J or JR) which returns the execution flow to thecaller, jumping to the address in the $31 (or $RA) register. This is the register analogous toLR2in ARM.

1Interactive Disassembler and debugger developed byHex-Rays

2Link Register

(43)

CHAPTER 2. THE SIMPLEST FUNCTION CHAPTER 2. THE SIMPLEST FUNCTION

You might be wondering why positions of the the load instruction (LI) and the jump instruction (J or JR) are swapped. This is due to aRISCfeature called “branch delay slot”. The why this happens is due to quirk in the architecture of some RISCISAs and isn’t important for our purposes - we just need to remember that in MIPS, the instruction following a jump or branch instruction is executedbefore the jump/brunch instruction itself. As a consequence, branch instructions always swap place with the instruction which must be executed beforehand.

2.3.1 A note about MIPS instruction/register names

Register and instruction names in the world of MIPS are traditionally written in lowercase. However, for the sake of consistency, we’ll stick to using uppercase letters, as it is the convention followed by all otherISAs featured this book.

(44)

CHAPTER 3. HELLO, WORLD! CHAPTER 3. HELLO, WORLD!

Chapter 3

Hello, world!

Let’s use the famous example from the book “The C programming Language”[Ker88]:

#include <stdio.h>

int main() {

printf("hello, world\n");

return 0;

}

3.1 x86

3.1.1 MSVC

Let’s compile it in MSVC 2010:

cl 1.cpp /Fa1.asm

(/Fa option instructs the compiler to generate assembly listing file) Listing 3.1: MSVC 2010

CONST SEGMENT

$SG3830 DB 'hello, world', 0AH, 00H CONST ENDS

PUBLIC _main

EXTRN _printf:PROC

; Function compile flags: /Odtp

(45)

CHAPTER 3. HELLO, WORLD! CHAPTER 3. HELLO, WORLD!

_TEXT SEGMENT _main PROC

push ebp mov ebp, esp push OFFSET $SG3830 call _printf

add esp, 4 xor eax, eax

pop ebp

ret 0

_main ENDP _TEXT ENDS

MSVC produces assembly listings in Intel-syntax. The difference between Intel- syntax and AT&T-syntax will be discussed in3.1.3 on page 13.

The compiler generated1.objfile, which is to be linked into1.exe. In our case, the file contains two segments:CONST(for data constants) and_TEXT(for code).

The stringhello, worldin C/C++ has typeconst char[][Str13, p176, 7.3.2], however it does not have its own name. The compiler needs to deal with the string somehow so it defines the internal name$SG3830for it.

That is why the example may be rewritten as follows:

#include <stdio.h>

const char $SG3830[]="hello, world\n";

int main() {

printf($SG3830);

return 0;

}

Let’s go back to the assembly listing. As we can see, the string is terminated by a zero byte, which is standard for C/C++ strings. More about C strings:57.1.1 on page 974.

In the code segment,_TEXT, there is only one function so far:main(). The func- tionmain()starts with prologue code and ends with epilogue code (like almost any function)1.

After the function prologue we see the call to the printf() function: CALL _printf. Before the call the string address (or a pointer to it) containing our greeting is placed on the stack with the help of thePUSHinstruction.

1You can read more about it in section about function prolog and epilog (4 on page 37).

(46)

CHAPTER 3. HELLO, WORLD! CHAPTER 3. HELLO, WORLD!

When the printf() function returns the control to the main()function, the string address (or a pointer to it) is still on the stack. Since we do not need it anymore, thestack pointer(theESPregister) needs to be corrected.

ADD ESP, 4 means add 4 to theESPregister value. Why 4? Since this is 32-bit program, we need exactly 4 bytes for address passing through the stack. If it was x64 code we would need 8 bytes. ADD ESP, 4 is effectively equivalent toPOP registerbut without using any register2.

Some compilers (like the Intel C++ Compiler) for the same purpose may emitPOP ECXinstead ofADD(e.g., such a pattern can be observed in the Oracle RDBMS code as it is compiled with the Intel C++ compiler). This instruction has almost the same effect but theECXregister contents will be overwritten. The Intel C++ compiler probably usesPOP ECXsince this instruction’s opcode is shorter thanADD ESP, x(1 byte forPOPagainst 3 forADD).

Here is an example of usingPOPinstead ofADDfrom Oracle RDBMS:

Listing 3.2: Oracle RDBMS 10.2 Linux (app.o file)

.text:0800029A push ebx

.text:0800029B call qksfroChild

.text:080002A0 pop ecx

After callingprintf(), the original C/C++ code contains the statementreturn 0—return 0 as the result of themain()function. In the generated code this is implemented by the instruction XOR EAX, EAX. XORis in fact just “eXclusive OR”3 but the compilers often use it instead ofMOV EAX, 0— again because it is a slightly shorter opcode (2 bytes forXORagainst 5 forMOV).

Some compilers emitSUB EAX, EAX, which meansSUBtract the value in theEAX from the value inEAX, which in any case resulting in zero.

The last instructionRETreturns the control to thecaller. Usually, this is C/C++CRT4 code, which in its turn returns the control to theOS.

3.1.2 GCC

Now let’s try to compile the same C/C++ code in the GCC 4.4.1 compiler in Linux:

gcc 1.c -o 1 Next, with the assistance of theIDAdisassembler, let’s see how themain()function was created.IDA, like MSVC, uses Intel-syntax5.

2CPU flags, however, are modified

3wikipedia

4C runtime library :68.1 on page 1031

5We could also have GCC produce assembly listings in Intel-syntax by applying the options-S - masm=intel.

(47)

CHAPTER 3. HELLO, WORLD! CHAPTER 3. HELLO, WORLD!

Listing 3.3: code inIDA

main proc near

var_10 = dword ptr -10h push ebp mov ebp, esp

and esp, 0FFFFFFF0h sub esp, 10h

mov eax, offset aHelloWorld ; "hello, world⤦

Ç \n"

mov [esp+10h+var_10], eax call _printf

mov eax, 0 leave

retn

main endp

The result is almost the same. The address of thehello, worldstring (stored in the data segment) is loaded in theEAXregister first and then it is saved onto the stack. In addition, the function prologue containsAND ESP, 0FFFFFFF0h—this instruction aligns theESPregister value on a 16-byte boundary. This results in all values in the stack being aligned the same way (The CPU performs better if the values it is dealing with are located in memory at addresses aligned on a 4-byte or 16-byte boundary)6.

SUB ESP, 10h allocates 16 bytes on the stack. Although, as we can see hereafter, only 4 are necessary here.

This is because the size of the allocated stack is also aligned on a 16-byte boundary.

The string address (or a pointer to the string) is then stored directly onto the stack without using the PUSH instruction. var_10 —is a local variable and is also an argument forprintf(). Read about it below.

Then theprintf()function is called.

Unlike MSVC, when GCC is compiling without optimization turned on, it emitsMOV EAX, 0instead of a shorter opcode.

The last instruction,LEAVE—is the equivalent of theMOV ESP, EBPandPOP EBPinstruction pair —in other words, this instruction sets thestack pointer(ESP) back and restores theEBPregister to its initial state. This is necessary since we modified these register values (ESPandEBP) at the beginning of the function (by executingMOV EBP, ESP/AND ESP, …).

6Wikipedia: Data structure alignment

(48)

CHAPTER 3. HELLO, WORLD! CHAPTER 3. HELLO, WORLD!

3.1.3 GCC: AT&T syntax

Let’s see how this can be represented in assembly language AT&T syntax. This syntax is much more popular in the UNIX-world.

Listing 3.4: let’s compile in GCC 4.7.3 gcc -S 1_1.c

We get this:

Listing 3.5: GCC 4.7.3 .file "1_1.c"

.section .rodata .LC0:

.string "hello, world\n"

.text

.globl main

.type main, @function main:

.LFB0:

.cfi_startproc pushl %ebp

.cfi_def_cfa_offset 8 .cfi_offset 5, -8 movl %esp, %ebp .cfi_def_cfa_register 5 andl $-16, %esp subl $16, %esp movl $.LC0, (%esp) call printf

movl $0, %eax leave

.cfi_restore 5 .cfi_def_cfa 4, 4 ret

.cfi_endproc .LFE0:

.size main, .-main

.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"

.section .note.GNU-stack,"",@progbits

The listing contains many macros (beginning with dot). These are not interesting for us at the moment. For now, for the sake of simplification, we can ignore them (except the.stringmacro which encodes a null-terminated character sequence just like a C-string). Then we’ll see this7:

7This GCC option can be used to eliminate “unnecessary” macros:-fno-asynchronous-unwind-tables

(49)

CHAPTER 3. HELLO, WORLD! CHAPTER 3. HELLO, WORLD!

Listing 3.6: GCC 4.7.3 .LC0:

.string "hello, world\n"

main:

pushl %ebp movl %esp, %ebp andl $-16, %esp subl $16, %esp movl $.LC0, (%esp) call printf

movl $0, %eax leave

ret

Some of the major differences between Intel and AT&T syntax are:

• Source and destination operands are written in opposite order.

In Intel-syntax: <instruction> <destination operand> <source operand>.

In AT&T syntax: <instruction> <source operand> <destination operand>.

Here is a way to easy memorise the difference: when you deal with Intel- syntax, you can imagine that there is an equality sign (=) between operands and when you deal with AT&T-syntax imagine there is a right arrow (→)8.

• AT&T: Before register names, a percent sign must be written (%) and before numbers a dollar sign ($). Parentheses are used instead of brackets.

• AT&T: Suffix is added to instructions to define the operand size:

q — quad (64 bits) l — long (32 bits) w — word (16 bits) b — byte (8 bits)

Let’s go back to the compiled result: it is identical to what we saw inIDA. With one subtle difference:0FFFFFFF0his presented as$-16. It is the same thing:16in the decimal system is0x10in hexadecimal. -0x10is equal to0xFFFFFFF0(for a 32-bit data type).

One more thing: the return value is to be set to 0 by using usualMOV, notXOR.

MOVjust loads value to a register. Its name is a misnomer (data is not moved but rather copied). In other architectures, this instruction is named “LOAD” or “STORE”

or something similar.

8By the way, in some C standard functions (e.g., memcpy(), strcpy()) the arguments are listed in the same way as in Intel-syntax: pointer to the destination memory block at the beginning and then pointer to the source memory block.

Références

Documents relatifs

+ Using these metrics, an analytic metric, called Type-P1-P2- Interval-Temporal-Connectivity, will be introduced to define a useful notion of two-way connectivity

The following proposition corrects the part of the proof of Proposition 5.7 of the

Recalling the impossibility of doubting his existence and looking back to his “sum”-intuition, the Cartesian

Furthermore, a study about the hemodynamic eff ects of mild therapeutic hypothermia in 20 consecutive patients admitted in cardio genic shock after successful resuscitation from

Moreover, by contrast with the pic- ture of value concepts fostered by Prinz ’ s approach, the concept of a felt tendency to act does not promote the idea that emotions are “

In the next section “the Value of Understanding Natural Hazards” I will show how the social and economic pressure for predicting natural hazards is logically accelerating

We prove that an extreme Kerr initial data set is a unique absolute minimum of the total mass in a (physically relevant) class of vacuum, maximal, asymptotically flat, axisymmetric

In our very recent paper [36], we developed a time-domain finite element method to simulate a cylindrical cloak [52], and carried out some mathematical analysis for this model..