GCC: AT&T syntax

Hello, world!

3.1.3 GCC: AT&T syntax

Let’s see how this can be represented in AT&T assembly syntax, which is much more popular in the UNIX world.

Listing 3.4: let’s compile in GCC 4.7.3

gcc -S 1_1.c

The listing contains many macros (beginning with a dot). They are not interesting to us at the moment, so for the sake of simplicity we can ignore them (except the .string macro, which encodes a null-terminated character sequence, just like a C string). Then we’ll see this⁷:

⁷ This GCC option can be used to eliminate “unnecessary” macros: -fno-asynchronous-unwind-tables

Some of the major differences between Intel and AT&T syntax are:

• Source and destination operands are written in opposite order.

In Intel-syntax: <instruction> <destination operand> <source operand>.

In AT&T syntax: <instruction> <source operand> <destination operand>.

Here is an easy way to memorise the difference: when you deal with Intel syntax, imagine an equality sign (=) between the operands, and when you deal with AT&T syntax, imagine a right arrow (→)⁸.

• AT&T: A percent sign (%) must be written before register names and a dollar sign ($) before numbers. Parentheses are used instead of brackets.

• AT&T: A suffix is added to instructions to define the operand size:

– q: quad (64 bits)
– l: long (32 bits)
– w: word (16 bits)
– b: byte (8 bits)

Let’s go back to the compiled result: it is identical to what we saw in IDA, with one subtle difference: 0FFFFFFF0h is presented as $-16. It is the same thing: 16 in the decimal system is 0x10 in hexadecimal, and -0x10 is equal to 0xFFFFFFF0 (for a 32-bit data type).
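The equivalence of -16 and 0xFFFFFFF0 can be checked with a few lines of C. This is only a sketch: the helper names as_u32 and negate_u32 are made up for illustration and do not come from the listing.

```c
#include <stdint.h>

/* Reinterpret a signed 32-bit value as an unsigned 32-bit value.
   In C this conversion is defined as reduction modulo 2^32. */
uint32_t as_u32(int32_t v)
{
    return (uint32_t)v;
}

/* Two's complement negation done by hand: invert all bits, add one. */
uint32_t negate_u32(uint32_t x)
{
    return (uint32_t)(~x + 1u);
}
```

Both as_u32(-16) and negate_u32(0x10) yield 0xFFFFFFF0, which is why the assembler and the disassembler can show the same constant in two different ways.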

One more thing: the return value is set to 0 using the usual MOV, not XOR.

MOV just loads a value into a register. Its name is a misnomer (data is not moved but rather copied). In other architectures, this instruction is named “LOAD” or “STORE” or something similar.

⁸ By the way, in some C standard functions (e.g., memcpy(), strcpy()) the arguments are listed in the same order as in Intel syntax: first the pointer to the destination memory block, then the pointer to the source memory block.

3.2 x86-64

3.2.1 MSVC—x86-64

Let’s also try 64-bit MSVC:

Listing 3.7: MSVC 2012 x64

$SG2989 DB 'hello, world', 0AH, 00H

main PROC

In x86-64, all registers were extended to 64 bits, and their names now have an R- prefix. In order to use the stack less often (in other words, to access external memory/cache less often), there is a popular way to pass function arguments via registers (fastcall: 64.3 on page 1002). That is, some of the function arguments are passed in registers and the rest via the stack. In Win64, the first 4 function arguments are passed in the RCX, RDX, R8 and R9 registers. That is what we see here: the pointer to the string for printf() is now passed not on the stack, but in the RCX register.

Pointers are 64-bit now, so they are passed in the 64-bit registers (which have the R- prefix). However, for backward compatibility, it is still possible to access the 32-bit parts using the E- prefix.

This is what the RAX/EAX/AX/AL register looks like in x86-64:

Byte number   7th  6th  5th  4th  3rd  2nd  1st  0th
RAX (x64)     RAX  RAX  RAX  RAX  RAX  RAX  RAX  RAX
EAX                               EAX  EAX  EAX  EAX
AX                                          AX   AX
AH, AL                                      AH   AL
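The layout above can be modelled in C with shifts and truncating casts. This is a minimal sketch: the function names are hypothetical, and a plain uint64_t stands in for the RAX register.

```c
#include <stdint.h>

/* EAX: the low 32 bits of RAX (bytes 3 to 0). */
uint32_t eax_of(uint64_t rax) { return (uint32_t)rax; }

/* AX: the low 16 bits of RAX (bytes 1 and 0). */
uint16_t ax_of(uint64_t rax) { return (uint16_t)rax; }

/* AH: byte 1 of RAX. */
uint8_t ah_of(uint64_t rax) { return (uint8_t)(rax >> 8); }

/* AL: byte 0 of RAX. */
uint8_t al_of(uint64_t rax) { return (uint8_t)rax; }
```

For example, if RAX holds 0x1122334455667788, then EAX is 0x55667788, AX is 0x7788, AH is 0x77 and AL is 0x88.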

The main() function returns an int-typed value, which in C/C++ is still 32-bit for better backward compatibility and portability. That is why the EAX register (i.e., the 32-bit part of the register) is cleared at the end of the function instead of RAX.

There are also 40 bytes allocated on the local stack. This is called the “shadow space”, which we are going to talk about later: 8.2.1 on page 140.

3.2.2 GCC—x86-64

Let’s also try GCC in 64-bit Linux:

Listing 3.8: GCC 4.4.6 x64

.string "hello, world\n"
main:
        sub rsp, 8
        mov edi, OFFSET FLAT:.LC0 ; "hello, world\n"
        xor eax, eax ; number of vector registers passed
        call printf
        xor eax, eax
        add rsp, 8
        ret

Passing function arguments in registers is also used in Linux, *BSD and Mac OS X [Mit13]. The first 6 arguments are passed in the RDI, RSI, RDX, RCX, R8 and R9 registers, and the rest via the stack.
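The two register orders mentioned so far (Win64 and the one used in Linux, *BSD and Mac OS X) can be put side by side in a small lookup sketch. The helper names are hypothetical, and the tables cover only integer and pointer arguments, which is a simplification: floating-point arguments are passed in vector registers under both conventions.

```c
#include <stddef.h>

/* Integer/pointer argument registers, in the order given in the text. */
static const char *const win64_regs[] = { "RCX", "RDX", "R8", "R9" };
static const char *const sysv_regs[]  = { "RDI", "RSI", "RDX", "RCX", "R8", "R9" };

/* Where the integer argument with the given 0-based index is passed in Win64. */
const char *win64_arg_location(size_t index)
{
    size_t count = sizeof win64_regs / sizeof win64_regs[0];
    return index < count ? win64_regs[index] : "stack";
}

/* The same question for Linux, *BSD and Mac OS X. */
const char *sysv_arg_location(size_t index)
{
    size_t count = sizeof sysv_regs / sizeof sysv_regs[0];
    return index < count ? sysv_regs[index] : "stack";
}
```

The first argument of printf(), the pointer to the format string, has index 0, so it lands in RCX in the MSVC listing and in RDI (here, its 32-bit part EDI) in the GCC listing.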

So the pointer to the string is passed in EDI (the 32-bit part of the register). But why not use the 64-bit part, RDI?

It is important to keep in mind that all MOV instructions in 64-bit mode that write something into the lower 32-bit part of a register also clear the higher 32 bits [Int13].

I.e., MOV EAX, 011223344h writes the value into RAX correctly, since the higher bits will be cleared.
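This behaviour can be modelled in C as a sketch, with a uint64_t standing in for RAX and hypothetical helper names. The 16-bit case is not discussed in the text; it is added here for contrast, since writes to the 16-bit part leave the upper bits untouched.

```c
#include <stdint.h>

/* Model of a 32-bit write such as MOV EAX, imm32 in 64-bit mode:
   the old contents of RAX are discarded and the value is zero-extended. */
uint64_t write_eax(uint64_t rax, uint32_t value)
{
    (void)rax;               /* nothing of the old value survives */
    return (uint64_t)value;  /* upper 32 bits become zero */
}

/* Model of a 16-bit write such as MOV AX, imm16:
   only the low 16 bits change, the upper 48 bits are preserved. */
uint64_t write_ax(uint64_t rax, uint16_t value)
{
    return (rax & ~(uint64_t)0xFFFF) | value;
}
```

Starting from RAX = 0xFFFFFFFFFFFFFFFF, write_eax with 0x11223344 gives 0x0000000011223344, while write_ax with 0x3344 gives 0xFFFFFFFFFFFF3344.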

If we open the compiled object file (.o), we can also see the opcodes of all instructions⁹:

Listing 3.9: GCC 4.4.6 x64

.text:00000000004004D0                 main proc near
.text:00000000004004D0 48 83 EC 08     sub rsp, 8
.text:00000000004004D4 BF E8 05 40 00  mov edi, offset format ; "hello, world\n"
.text:00000000004004D9 31 C0           xor eax, eax
.text:00000000004004DB E8 D8 FE FF FF  call _printf
.text:00000000004004E0 31 C0           xor eax, eax
.text:00000000004004E2 48 83 C4 08     add rsp, 8
.text:00000000004004E6 C3              retn
.text:00000000004004E6                 main endp

As we can see, the instruction that writes into EDI at 0x4004D4 occupies 5 bytes. The same instruction writing a 64-bit value into RDI occupies 7 bytes. Apparently, GCC is trying to save some space. Besides, it can be sure that the data segment containing the string will not be allocated at addresses above 4GiB.
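The size difference can be checked by laying out the bytes in C. The 5-byte sequence is taken from the listing at 0x4004D4. The 7-byte sequence is an assumption about which encoding is meant: the REX.W-prefixed form of MOV with a sign-extended 32-bit immediate, which does not appear in the listing. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* From the listing: mov edi, imm32 (opcode BF, then the immediate). */
static const uint8_t mov_edi[] = { 0xBF, 0xE8, 0x05, 0x40, 0x00 };

/* Assumed 64-bit form: REX.W prefix (48), opcode C7, ModRM C7 (RDI),
   then the same 32-bit immediate, sign-extended by the CPU. */
static const uint8_t mov_rdi[] = { 0x48, 0xC7, 0xC7, 0xE8, 0x05, 0x40, 0x00 };

/* Read a 32-bit little-endian value from a byte buffer. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Address of the string as encoded in the EDI form (skip the opcode byte). */
uint32_t string_address(void) { return read_le32(mov_edi + 1); }

size_t edi_form_length(void) { return sizeof mov_edi; }
size_t rdi_form_length(void) { return sizeof mov_rdi; }
```

The immediate bytes E8 05 40 00 decode to the address 0x4005E8, which fits comfortably in 32 bits, so the shorter encoding loses nothing.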

⁹ This must be enabled in Options → Disassembly → Number of opcode bytes

We also see that the EAX register was cleared before the printf() function call. This is done because, according to the standard, the number of vector registers used is passed in EAX: “with variable arguments passes information about the number of vector registers used” [Mit13].

From the document Reverse Engineering for Beginners (pages 48-52)
