
Object Code and Linkers

From the document Assembly Language Step-by-Step (pages 161-164)

Assemblers read your source code files and generate an object code file containing the machine instructions that the CPU understands, plus any data you’ve defined in the source code.

There’s no reason at all why an assembler could not read a source code file and write out a finished, executable program as its object code file, but this is almost never done. The assembler I’m teaching in this book, NASM, can do that for DOS programs, and can write out .COM executable files for the real mode flat model. More modern operating systems such as Linux and Windows are too complex for that, and in truth, there’s no real payoff in such one-step assembly except when you’re first learning to write assembly language.

So the object code files produced by modern assemblers are a sort of intermediate step between source code and executable program. This intermediate step is a type of binary file called an object module, or simply an object code file.

Object code files cannot themselves be run as programs. An additional step, called linking, is necessary to turn object code files into executable program files.

The reason for object code files as intermediate steps is that a single large source code file may be divided up into numerous smaller source code files to keep the files manageable in size and complexity. The assembler assembles the various fragments separately, and the several resulting object code files are then woven together into a single executable program file. This process is shown in Figure 5-8.

126 Chapter 5 The Right to Assemble

Figure 5-8: The assembler and linker. The assembler turns source code text files (.ASM) into object code binary files (.O), and the linker combines them into a single executable program file (MyProg).

When you’re first learning assembly programming, it’s unlikely that you’ll be writing programs spread out across several source code files. This may make the linker seem extraneous, as there’s only one piece to your program and nothing to link together. Not so: the linker does more than just stitch lumps of object code together into a single piece. It ensures that function calls out of one object module arrive at the target object module, and that all the many memory references actually reference what they’re supposed to reference. The assembler’s job is obvious; the linker’s job is subtle. Both are necessary to produce a finished, working executable file.

Besides, you’ll very quickly get to the point where you begin extracting frequently used portions of your programs into your own personal code libraries. There are two reasons for doing this:

1. You can move tested, proven routines into separate libraries and link them into any program you write that might need them. This way, you can reuse code and not build the same old wheels every time you begin a new programming project in assembly language.

2. Once portions of a program are tested and found to be correct, there’s no need to waste time assembling them over and over again along with newer, untested portions of a program. Once a major program grows to tens of thousands of lines of code (and you’ll get there sooner than you might think!) you can save a significant amount of time by assembling only the portion of the program that you are currently working on. The finished portions are linked into the final program without re-assembling every single part of the whole thing every time you assemble any part of it.
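The time saving in the second point depends on knowing which source files actually need reassembling. One common approach, used by build tools such as make, is to compare file modification times. The sketch below illustrates that idea only; the function name `stale_sources`, the file names, and the timestamp values are all hypothetical, and the book itself does not prescribe this method.

```python
def stale_sources(source_mtimes, object_mtimes):
    """Return the source files whose object file is missing or older than the source.

    Both arguments map file names to modification times (any comparable numbers).
    """
    stale = []
    for source, source_time in sorted(source_mtimes.items()):
        # Hypothetical naming rule: main.asm is assembled into main.o
        object_name = source.rsplit(".", 1)[0] + ".o"
        object_time = object_mtimes.get(object_name)
        if object_time is None or object_time < source_time:
            stale.append(source)
    return stale


# Made-up timestamps: main.asm has never been assembled, mathlib.asm was
# edited after its object file was written, textlib.asm is up to date.
sources = {"main.asm": 500, "textlib.asm": 100, "mathlib.asm": 120}
objects = {"textlib.o": 150, "mathlib.o": 110}

print(stale_sources(sources, objects))  # ['main.asm', 'mathlib.asm']
```

Only the files in the returned list would be passed to the assembler again; every object file, old and new, still goes to the linker.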

The linker’s job is complex and not easily described. Each object module may contain the following:

Program code, including named procedures

References to named procedures lying outside the module

Named data objects such as numbers and strings with predefined values

Named data objects that are just empty space ‘‘set aside’’ for the program’s use later

References to data objects lying outside the module

Debugging information

Other, less common odds and ends that help the linker create the executable file

To process several object modules into a single executable module, the linker must first build an index called a symbol table, with an entry for every named item in every object module it links, with information about what name (called a symbol) refers to what location within the module. Once the symbol table is complete, the linker builds an image of how the executable program will be arranged in memory when the operating system loads it. This image is then written to disk as the executable file.

The most important thing about the image that the linker builds relates to addresses. Object modules are allowed to refer to symbols in other object modules. During assembly, these external references are left as holes to be filled later, naturally enough, because the module in which these external symbols exist may not have been assembled or even written yet. As the linker builds an image of the eventual executable program file, it learns where all of the symbols are located within the image, and thus can drop real addresses into all of the external reference holes.
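The two steps described above, building a symbol table and then filling the external reference holes, can be sketched as a toy model. This is not how any real linker or object file format works in detail: the `ObjectModule` class, the `link` function, the 2-byte little-endian hole size, and the byte values (arbitrary placeholders, not real instruction encodings) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModule:
    """Hypothetical, heavily simplified stand-in for an object code file."""
    name: str
    code: bytearray                               # contents, with holes zeroed
    defined: dict = field(default_factory=dict)   # symbol -> offset in this module
    holes: list = field(default_factory=list)     # (offset of 2-byte hole, symbol)


def link(modules, base=0):
    # Pass 1: lay the modules end to end and record where each symbol lands.
    symbol_table = {}
    position = base
    for mod in modules:
        for symbol, offset in mod.defined.items():
            if symbol in symbol_table:
                raise ValueError(f"duplicate symbol: {symbol}")
            symbol_table[symbol] = position + offset
        position += len(mod.code)

    # Pass 2: build the image, dropping real addresses into the holes.
    image = bytearray()
    for mod in modules:
        chunk = bytearray(mod.code)
        for hole_offset, symbol in mod.holes:
            if symbol not in symbol_table:
                raise KeyError(f"undefined symbol: {symbol}")
            address = symbol_table[symbol]
            chunk[hole_offset:hole_offset + 2] = address.to_bytes(2, "little")
        image += chunk
    return symbol_table, bytes(image)


# Placeholder bytes only: one byte, a 2-byte hole, one more byte.
main = ObjectModule(
    name="main",
    code=bytearray([0xE8, 0x00, 0x00, 0xC3]),
    defined={"start": 0},
    holes=[(1, "print_msg")],      # main refers to a symbol it does not define
)
lib = ObjectModule(
    name="lib",
    code=bytearray([0x90, 0xC3]),
    defined={"print_msg": 0},
)

symbols, image = link([main, lib])
print(symbols)       # {'start': 0, 'print_msg': 4}
print(image.hex())   # e80400c390c3
```

The hole in `main` starts out as two zero bytes. Only after both modules have been placed does the toy linker know that `print_msg` sits at address 4, at which point it writes `04 00` into the hole.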

Debugging information is, in a sense, a step backward: portions of the source code, which was all stripped out early in the assembly process, are put back in the object module by the assembler. These portions of the source code are mostly the names of data items and procedures, and they’re embedded in the object file to make it easier for the programmer (you!) to see the names of data items when you debug the program. (I’ll go into this more deeply later.) Debugging information is optional; that is, the linker does not need it to build a proper executable file. You choose to embed debugging information in the object file while you’re still working on the program. Once the program is finished and debugged to the best of your ability, you run the assembler and linker one more time, without requesting debugging information. This is important, because debugging information can make your executable files hugely larger than they otherwise would be, and generally a little slower.

Relocatability

Very early computer systems like 8080 systems running CP/M-80 had a very simple memory architecture. Programs were written to be loaded and run at a specific physical memory address. For CP/M, this was 100H. The programmer could assume that any program would start at 100H and go up from there. Memory addresses of data items and procedures were actual physical addresses, and every time the program ran, its data items were loaded and referenced at precisely the same place in memory.

This all changed with the arrival of the 8086 and 8086-specific operating systems such as CP/M-86 and PC DOS. Improvements in the Intel architecture introduced with the 8086 made it unnecessary for the program to be assembled for running at a specific physical address. All references within an executable program were specified as relative to the beginning of the program. A variable, for example, would no longer be reliably located at the physical address 02C7H in memory. Instead, it was reliably located at an offset from the beginning of the file. This offset was always the same; and because all references were relative to the beginning of the executable program file, it didn’t matter where the program was placed in physical memory when it ran.
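The base-plus-offset idea can be shown with a small simulation. It is only a sketch under stated assumptions: the 64K `memory` array, the 8-byte `PROGRAM_IMAGE`, the variable at `COUNTER_OFFSET`, and the two load addresses are all made up, and nothing here models real 8086 segment registers.

```python
MEMORY_SIZE = 0x10000          # hypothetical 64K of simulated memory

# A made-up 8-byte program image whose one-byte variable lives at offset 5.
PROGRAM_IMAGE = bytes([0, 0, 0, 0, 0, 42, 0, 0])
COUNTER_OFFSET = 5


def load(memory, image, base):
    """Copy the program image into memory starting at the base address."""
    memory[base:base + len(image)] = image
    return base


def read_counter(memory, base):
    """Locate the variable by adding its fixed offset to the load address."""
    return memory[base + COUNTER_OFFSET]


memory = bytearray(MEMORY_SIZE)
first = load(memory, PROGRAM_IMAGE, 0x0100)
second = load(memory, PROGRAM_IMAGE, 0x2C00)

# The physical address differs between the two copies; the offset does not.
print(hex(first + COUNTER_OFFSET), read_counter(memory, first))    # 0x105 42
print(hex(second + COUNTER_OFFSET), read_counter(memory, second))  # 0x2c05 42
```

The program never needs to know the physical address of its variable in advance. It needs only the offset, which is fixed when the program is built, and the base, which is decided when the program is loaded.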

This feature is called relocatability, and it is a necessary part of any modern computer system, especially when multiple programs may be running at once.

Handling relocatability is perhaps the largest single part of the linker’s job.

Fortunately, it does this by itself and requires no input from you. Once the job is done and the smoke clears, the file translation job is complete, and you have your executable program file.
