Optimization options


The command-line compiler controls most optimizations through the -O command-line option. The -O option can be followed by one or more of the suboption letters given in the list below. For example, -Oaxt turns on all speed optimizations and the Assume No Pointer Aliasing optimization.

You can turn off optimizations on the command line by placing a minus before the optimization letter. For example, -O2-z turns on all speed optimizations except the global transformation optimizations. In addition, some optimizations are controlled by means other than -O. For example, the -r option enables the use of register variables.
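A minimal sketch of a typical invocation (the compiler executable name bcc and the file name myprog.c are assumptions for illustration):

bcc -O2-z myprog.c

This compiles myprog.c with all speed optimizations enabled except the global transformation optimizations controlled by -Oz.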


The optimization options follow the same rules for precedence as all other Borland C++ options. For example, a -Od appearing on the command line after a -O2 disables all optimizations.

The settings shown for each optimization in Table A.1 are located in the Settings notebook. To access the Settings notebook, choose the View Settings option from the Project menu. For information on the Settings notebook, see Chapter 4, "Settings notebook."

Table A.1: Optimization options summary

Command-line   IDE setting and optimization function
-O2            Compiler|Optimizations|Fastest Code
               Generates the fastest code possible. This is the same as using the
               following command-line options: -O -Ob -Oe -Oz -Oi -Ot -Oc.
-O1            Compiler|Optimizations|Smallest Code
               Generates the smallest code possible. This is the same as using the
               following command-line options: -O -Ob -Os -Oc -Oe.
-Oa            Compiler|Optimizations|Assume No Pointer Aliasing
               Assumes that pointer expressions are not aliased in common
               subexpression evaluation.
-Ob            Compiler|Optimizations|Dead Storage Elimination
               Eliminates dead variables.
-Oc            Compiler|Optimizations|Local Common Expressions
               Enables local optimizations performed on blocks of code that have a
               single entry and a single exit: common subexpression elimination,
               code reordering, branch optimizations, copy propagation, constant
               folding, and code compaction.
-Od            Compiler|Optimizations|Minimal Opts
               Disables all optimizations except jump distance optimization, which
               the compiler performs automatically.
-Oe            Compiler|Optimizations|Global Register Allocation
               Enables global register allocation and data flow analysis.
-Oi            Compiler|Optimizations|Intrinsic Expansion
               Enables inlining of intrinsic functions such as memcpy, strlen, and
               so on.
-Os            Compiler|Optimizations|Optimize For|Size
               Attempts to minimize code size.
-Ot            Compiler|Optimizations|Optimize For|Speed
               Attempts to maximize application execution speed.
-Ox            None
               Enables most speed optimizations. This is provided for
               compatibility with Microsoft compilers.
-Oz            Compiler|Optimizations|Global Optimizations
               Enables all optimizations that perform transformations within an
               entire function: global common subexpression elimination, loop
               invariant code motion, induction variable elimination, linear
               function test replacement, loop compaction, and copy propagation.

Table A.1: Optimization options summary (continued)

Command-line   IDE setting and optimization function
-r             None
               Enables the use of register variables. This option is on by
               default.
-r-            None
               Suppresses the use of register variables. With this option, the
               compiler won't use register variables, and it won't preserve and
               respect register variables (ESI, EDI, and EBX) from any caller.
               For that reason, code that uses register variables should not call
               code compiled with -r-. On the other hand, if you are interfacing
               with existing assembly-language code that does not preserve ESI,
               EDI, and EBX, the -r- option allows you to call that code from
               Borland C++, as in the sketch below.
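A minimal sketch of that situation (the routine name asm_blit and its signature are hypothetical):

/* Compile this module with -r- so the compiler neither keeps
   variables in ESI, EDI, or EBX nor expects the callee to
   preserve them. */
extern void asm_blit(void *dst, const void *src, int n);  /* hypothetical assembly
                                                              routine that clobbers
                                                              ESI, EDI, and EBX */
void copy_frame(void *dst, const void *src, int n)
{
  asm_blit(dst, src, n);  /* safe: this module was compiled with -r- */
}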

A closer look at the Borland C++ optimizer

The Borland C++ optimizer performs a number of optimizations, including sophisticated register coloring, invariant code motion, induction variable elimination, and many others. Each of these optimizations has been fine-tuned to the complex instruction set of the Intel 80x86. In addition, the compiler performs architecture-specific optimizations for the target processor. The following sections describe these optimizations.

Global register allocation

Because memory references are so expensive on the 80x86 processors, it is extremely important to minimize those references through the intelligent use of registers. Global register allocation both increases the speed and decreases the size of your application. You should always use global register allocation when compiling your application with optimizations on.

Global optimizations

The Borland C++ compiler is designed to provide the most efficient code possible with the minimum increase in compilation time. Thus, a number of optimizations are grouped together and performed in a single step.

These optimizations are global common subexpression elimination, invariant code motion, induction variable elimination, copy propagation, loop compaction, and linear function test replacement. Because all these optimizations are performed in a single step, you can't set any of them on or off individually. You can set them all on with the -Oz option, or set them all off with the -Oz- option.

Common subexpression elimination

Common subexpression elimination is the process of finding duplicate expressions within the target scope and eliminating them by using the value of the previously computed expression. This avoids having to recalculate the expression. When you use this optimization in conjunction with global register allocation, the gains are both a size reduction and a speed increase; otherwise, the gain is mainly a speed increase. Common subexpression elimination lets you program in a more readable style, without the need to create unnecessary temporary locations for expressions that are used more than once. For example, the following code uses a temporary variable to avoid expensive pointer referencing:

temp = t->n.o.left;
if (temp->op == O_ICON || temp->op == O_FCON)

With common subexpression elimination, you can use direct referencing, which is more readable and easier to understand, and let the optimizer decide whether it is more efficient to create the temporary variable:

if (t->n.o.left->op == O_ICON || t->n.o.left->op == O_FCON)

Loop invariant code motion

Moving invariant code out of loops is a speed optimization. The optimizer uses the information about all the expressions in the function gathered during data flow analysis to find expressions whose values do not change inside a loop. To prevent the calculation from being performed many times inside the loop, the optimizer moves the code outside the loop so that it is calculated only once. The optimizer then reuses the calculated value inside the loop. For example, in the code below, x * y * z is evaluated in every iteration of the loop.

int v[10];

void f(void)
{
  int i, x, y, z;

  for (i = 0; i < 10; i++)
    v[i] = v[i] * x * y * z;
}

The optimizer rewrites the code for the loop so that it looks like this:

int v[10];

void f(void)
{
  int i, x, y, z, t1;

  t1 = x * y * z;
  for (i = 0; i < 10; i++)
    v[i] = v[i] * t1;
}

Copy propagation

Copy propagation is primarily a speed optimization. Like loop invariant code motion, copy propagation relies on data flow analysis. The optimizer remembers the values assigned to expressions and uses those values instead of loading the value of the assigned expressions. Copies of constants, expressions, and variables may be propagated. For example, in the following code the constant value 5 can be used for the second assignment instead of the expression on the right side, so that:

PtrParIn->IntComp = 5;
( *(PtrParIn->PtrComp) ).IntComp = PtrParIn->IntComp;

is optimized to look like:

( *(PtrParIn->PtrComp) ).IntComp = PtrParIn->IntComp = 5;

Induction variable analysis and strength reduction

Induction variable analysis and strength reduction are speed optimizations performed on loops. The optimizer uses a mathematical technique called induction to create new variables out of expressions used inside a loop. These variables are called induction variables. The optimizer ensures that the operations performed on these new variables are computationally less expensive (reduced in strength) than those used by the original variables.

Opportunities for these optimizations are common if you use array indexing or structure references inside loops, where these references vary with the loop iterations. For example, the optimizer creates an induction variable out of the operation v[i] in the code below, because v[i] varies with the iterative nature of the loop.

int v[10];

void f(void)
{
  int i, x, y, z;

  for (i = 0; i < 10; i++)
    v[i] = x * y * z;
}

The optimizer changes this code to the following:

int v[10];

void f(void)
{
  int i, x, y, z, *p;

  p = v;
  for (i = 0; i < 10; i++)
  {
    *p = x * y * z;
    p++;
  }
}

Linear function test replacement

Linear function test replacement is an optimization that occurs when induction variable elimination has taken place. Induction variable elimination generates expressions that vary linearly with the loop iterations. The compiler can replace the test condition of the loop with an induction variable expression and scale the test operands accordingly. This optimization is performed when the loop iterator varies linearly, is not used directly within the loop, and its value is not required outside the loop. In the following example, the loop iterator i is used only to count the for loop and is not used outside it.

int v[10];

void f(void)
{
  int i, x, y, z, *p;

  p = v;
  for (i = 0; i < 10; i++)
  {
    *p = x * y * z;
    p++;
  }
}

After being optimized, the code looks like this:

int v[10];

void f(void)
{
  int i, x, y, z, *p;

  for (p = v; p < &v[10]; p++)
    *p = x * y * z;
}

This eliminates the need for the loop iterator i.

Loop compaction

Loop compaction takes advantage of the string move instructions on the 80x86 processors by replacing the code for a loop with such an instruction. For example:

int v[100];

void t(void)
{
  int i;

  for (i = 0; i < 100; i++)
    v[i] = 0;
}

The optimizer reduces this to the machine instructions:

mov   ecx,100
mov   edi,offset _v[0]
xor   eax,eax
rep   stosd

Depending on the complexity of the operands, the compacted loop code might also be smaller than the corresponding non-compacted loop. You might want to experiment with this optimization if you are compiling for size and have loops of this nature.

Dead storage elimination

The optimizer can identify variables whose stored values are never needed and eliminate the stores to them. In the following example, the optimizer performs induction variable elimination and linear function test replacement to reveal a dead loop iterator j. Using -Ob then removes the code that stores any result into the variable j.

int goo(void), a[10];

int f(void)
{
  int i, j;

  i = goo();
  for (j = 0; j < 10; j++)
    a[j] = goo();
  return i;
}

After the dead storage elimination optimization is performed on this code, it looks like this:

int goo(void), a[10];

int f(void)
{
  int i;

  i = goo();   // The 'j' has been removed.
  for (int *p = &a[0]; p < &a[10]; p++)
    *p = goo();
  return i;
}

Pointer aliasing

Pointer aliasing is not an optimization in itself, but it does affect optimizer performance. Because C and C++ allow pointers to point to any type, the compiler normally gathers pointer information in order to generate clean, correct code. When a pointer has global scope, the compiler cannot determine what it points to, and it takes the conservative view that the pointer could point to every variable in global scope. This might be too conservative for your program. The -Oa option provides a mechanism by which you can inform the compiler that such cases do not exist and that two pointers do not point to the same location, allowing the compiler to be more aggressive and generate better code. Because this assumption might create bugs that are hard to spot, it is applied only when you use -Oa.

-Oa controls how the optimizer treats expressions with pointers in them.

When compiling with global or local common subexpressions and -Oa enabled, the optimizer recognizes

*p * x


as a common subexpression in function foo in the following code:

int g, y;

int foo(int *p)
{
  int x = 5;

  y = *p * x;
  g = 3;
  return (*p * x);
}

void goo(void)
{
  g = 2;
  foo(&g);  /* This is incorrect, because the assignment
               g = 3 invalidates the expression *p * x. */
}

-Oa also controls how the optimizer treats expressions involving variables whose address has been taken. When compiling with -Oa, the compiler assumes that assignments via pointers affect only those expressions involving variables whose addresses have been taken and which are of the same type as the left-hand side of the assignment in question. To illustrate, consider the following function:

int y, z;

int f(void)
{
  int x;
  char *p = (char *)&x;

  y = x * z;
  *p = 'a';
  return (x * z);
}

When compiled with -Oa, the assignment *p = 'a' does not prevent the optimizer from treating x * z as a common subexpression, because the destination of the assignment, *p, is a char, whereas the addressed variable is an int. When compiled without -Oa, the assignment to *p prevents the optimizer from creating a common subexpression out of x * z.

Code size versus speed optimizations

You can control the selection and compaction of instructions with the -G and -G- options. -G tells the compiler to compile your source code for the fastest execution time. This is equivalent to pressing the Fastest Code button in the Compiler|Optimizations subsection of the Settings notebook.

Intrinsic function inlining

There are times when you might want to use one of the common string or memory functions, such as strcpy or memcmp, but you don't want to incur the overhead of a function call. If you use -Oi, the compiler generates the code for these functions within your function's scope, eliminating the need for a function call. The resulting code executes faster than a call to the same function, but it is also larger.

The following is a list of those functions that are inlined when -Oi is enabled:

alloca     memset     strlen
fabs       _rotl      strncat
_lrotl     _rotr      strncmp
_lrotr     stpcpy     strncpy
memchr     strcat     strnset
memcmp     strcmp
memcpy     strcpy

You can control the inlining of each of these functions with the #pragma intrinsic. For example,

#pragma intrinsic strcpy

causes the compiler to generate code for strcpy in your function.

#pragma intrinsic -strcpy

prevents the compiler from inlining strcpy. By using these pragmas in a file, you can override the command-line switches or IDE options used to compile that file.

When inlining any intrinsic function, you must include a prototype for that function before you use it. This is because, when inlining, the compiler actually creates a macro that renames the inlined function to a function that the compiler internally recognizes. In the above example, the compiler creates this macro:

#define strcpy __strcpy__

The compiler recognizes calls to functions with two leading and two trailing underscores and tries to match the prototype of that function against its own internally stored prototype. If you did not supply a prototype, or the prototype you supplied does not match the compiler's internal prototype, the compiler rejects the attempt to inline that function and generates an error. Prototypes are provided in the standard header files (that is, string.h, stdlib.h, and so on).
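A minimal sketch of the pattern this implies (the function greet and its body are illustrative):

#include <string.h>       /* supplies the strcpy prototype that the
                             compiler matches against its own */
#pragma intrinsic strcpy  /* ask for strcpy to be expanded inline */

void greet(char *buf)
{
  strcpy(buf, "hello");   /* expanded in place; no function call */
}

Without the #include, the compiler would have no prototype to match and would reject the inlining attempt with an error.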


Register parameter passing

The command-line compiler included in the Borland C++ product introduces a new calling convention, named __fastcall. Functions declared using this modifier expect parameters to be passed in registers.

The compiler treats this calling convention as a new language specifier, along the lines of __cdecl and __pascal. Functions declared with either of those two language modifiers cannot have the __fastcall modifier, because both __cdecl and __pascal functions use the stack to pass parameters. Likewise, the __fastcall modifier cannot be used together with __export. The compiler generates a warning if you try to mix functions of these types or if you use the __fastcall modifier in a situation that might cause an error.

Parameter rules

The compiler uses the rules given in Table A.2 when deciding which parameters are to be passed in registers. A maximum of three parameters can be passed in registers to any one function. You should not assume that the assignment of registers reflects the ordering of the parameters to a function.

Table A.2: Parameter types and possible registers used

Parameter type                        Registers
char (signed and unsigned)            AL, DL, BL
short (signed and unsigned)           AX, DX, BX
int and long (signed and unsigned)    EAX, EDX, EBX
pointer                               EAX, EDX, EBX

Union, structure, and floating-point (float, double, and long double) parameters are pushed on the stack.
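A minimal sketch of a __fastcall declaration (the function name sum3 and its parameters are hypothetical):

/* With __fastcall, these three int parameters are eligible to arrive
   in EAX, EDX, and EBX rather than on the stack. Do not assume any
   particular register-to-parameter ordering. */
int __fastcall sum3(int a, int b, int c);

int __fastcall sum3(int a, int b, int c)
{
  return a + b + c;
}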

Floating-point registers

When your application calls a function using the __fastcall calling convention, the called function automatically saves the R0, R1, and R2 floating-point registers (or the equivalent if you're using the floating-point emulator) when called. It also restores them when the function returns. This lets the compiler allocate variables to these registers for the life of the function.

A function uses the __fastcall calling convention when it is declared with the __fastcall keyword or compiled with the -pr option or with the Compiler|Code Generation Options|Register setting turned on.

Function naming

Functions declared with the __fastcall modifier have different names than their non-__fastcall counterparts. The compiler prefixes the __fastcall function name with an @. This prefix applies to both unmangled C function names and to mangled C++ function names, as illustrated below.
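A brief comment-only illustration (the function name func is hypothetical, and the underscore decoration shown for the plain C name is an assumption about the default naming; exact C++ mangled forms vary):

/* Declared as:  int func(int);             linker sees a name like  _func */
/* Declared as:  int __fastcall func(int);  linker sees               @func */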


Appendix B

Table B.1: Editing commands

A word is defined as a sequence of characters, with the sequence delimited by one of the following: space < > , ; . ( ) [ ] | ' * + - / $ # = ? ! " % & : @ \ and all control and graphic characters.
