
How the Memory-Management System Handles Executable Code

The speed at which processes execute depends partly on how the operating system accesses the virtual address space segments of compiled and linked object code (a.out or other executables).

If you are doing applications or systems programming, you may be aware that a program's magic number and internal attributes determine which type of executable code (standard, shared, or demand-loaded) is possible. (A later subsection, "Benefits and Shortcomings of Shared and Demand-Loaded Code," discusses this in more practical terms. Also see magic(4).)

The following HP-UX Reference manual pages describe magic numbers and internal attributes in detail:

• The chatr(1) command is used to change a program's internal attributes to shared or demand-loaded (a short example follows this list).

• The link editor ld(1) produces executable files from one or more object files or libraries.

• magic(4) describes predefined file types and magic numbers for HP-UX implementations.

• a.out(4) and its Series 300/400- and 700/800-specific manual pages describe the output file format from the assembler (as(1)), compilers, and link editor.
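As a quick check, you can see which of these types a particular executable uses by running the file(1) command on it; chatr(1) invoked with no options displays a program's current internal attributes. The transcript below is only a sketch, with myprog standing in for a hypothetical program name; the exact output wording varies by release and architecture.

% file myprog
% chatr myprog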

Series 300/400/700 and Series 800 computers have different capabilities. On Series 300/400/700 computers, executable code can be standard, shared, or demand-loaded. Series 800 computers do not support standard executable code, but code can be shared or demand-loaded.

Table 7-1 and subsequent sections describe the types of executable code by several criteria:

• Addressing in separate segments of code and data.

• Whether virtual address code segments can be shared among multiple processes.

• Alignment of pages on corresponding address boundaries both on disk and in main memory, allowing direct memory-to-memory copy. (Object code should align on 4 KB page boundaries for all HP-UX systems.)


Although the page size was 2 KB for some previous Series 800 releases, executables generated under those releases will be handled properly by the current release, but with some performance penalty. You might want to recompile for better performance.

Table 7-1. Characteristic Types of Executable Code

EXEC_MAGIC
  Series 300/400:
    - single segment (code and data) for read and write
    - no restrictions on alignment (1)
    - no sharing
    - code and data faulted in as needed
  Series 700/800:
    - EXEC_MAGIC is supported on the Series 700, unsupported on the Series 800 (2)
    - 4 KB-page aligned on disk

SHARE_MAGIC
  Series 300/400:
    - not necessarily page-aligned on disk (1)
  Series 700/800:
    - 4 KB-page aligned on disk (3)
  Both:
    - separate segments for code and data

DEMAND_MAGIC
  Both:
    - 4 KB-page aligned on disk
    - separate segments for code and data

(1) Executable code that is not page-aligned on disk might suffer performance degradation, since it may be faulted in by a less optimal path.

(2) Series 700 EXEC_MAGIC executables will not run on the Series 800.

(3) Programs built on the Series 800 can be executed on the Series 700; however, footnote (1) applies to such executables on a Series 700 with a 4 KB page size. Note, too, that executables generated on previous Series 800 operating systems featuring a 2 KB page size will run, but with some performance penalty. You might want to recompile for better performance.



Standard Executable Code (EXEC_MAGIC)

Note EXEC_MAGIC is not supported on the Series 800.

In earlier implementations of UNIX, compiled object files consisted of code (machine instructions) and data that occupied the same area of memory, with read, write, and execute permissions.

Under those circumstances, each time a common program such as vi was run, a distinct copy of its code was read into main memory. The code was not shared, even when several vi processes were run simultaneously.

Because the vi code occupied segments with write permission, it was vulnerable to being overwritten. The file header, code, data, and debugger code were not aligned in main memory with their corresponding page boundaries on disk, so all code had to be read through a buffer cache before being copied to the user's address space. This slowed the transfer.

The EXEC_MAGIC user-executable (a.out) format on the Series 700 allows creation of processes with more than 1 GB of data. This expanded data area is implemented with a new option to the linker; refer to ld(1) in the HP-UX Reference Manual for information.
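As a hedged sketch only: the EXEC_MAGIC format is normally requested at link time through an ld(1) option (commonly -N), shown here passed through the cc driver. Both the option letter and the -Wl pass-through syntax are assumptions; confirm them against ld(1) and cc(1) for your release before relying on them.

% cc -c bigdata.c
% cc -o bigdata bigdata.o -Wl,-N
% file bigdata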

Several factors contribute to the efficiency of EXEC_MAGIC on Series 700:

• Pages are aligned at 4 KB, improving address translation.

• Text for processes greater than 1 GB is relatively small compared to data; thus, private code does not pose a problem.

• When processes are greater than 1 GB, normally only one of them runs at a time.

Shared Code (SHARE_MAGIC)

Sharing of code segments reduces the amount of code kept in main memory.

With shared code, the code and data segments are separated so that code can be read-only and data can have write permission.

This protects code from being overwritten.

When several processes run the same program simultaneously, the processes use the same copy of code in main memory via pointers to the code's virtual address space. Only one copy of the code exists in memory, regardless of how many processes run the program. The system keeps track of multiple processes sharing code by maintaining a use count. These mechanisms dramatically decrease the amount of memory required for each user's process space.

For example, when a shared program such as vi is first loaded into the user code area, the use count for the program is set to one. While the first process is executing vi, if another process also invokes vi, no additional memory is allocated because the code already resides in main memory. The new process merely executes the copy of vi's code residing in main memory, and the shared-code segment's use count is incremented from one to two. When one of these processes finishes executing the code, the use count is decremented.

As long as the use count is greater than zero, the shared code remains in memory. When the last process finishes editing and terminates the vi program, however, the system decrements the use count to zero and releases vi's shared-code data structure and its associated physical memory.

Shared code was also designed to facilitate page alignment between main memory and disk, although pages are not guaranteed to be aligned.

Demand-Loaded Code (DEMAND_MAGIC)

Note For the Series 700/800, DEMAND_MAGIC code behaves identically to SHARE_MAGIC, despite a difference in the magic number.

Demand-loaded code encompasses some of the advantages of shared code and provides additional optimizations. (See comparison of shared and demand-loaded code in the next section.)

Like shared code, demand-loaded code addresses code and data separately.

Demand-loaded code is also shared; only one copy of code need be in main memory for use by multiple processes.

When a user runs a demand-loadable program, pages of the program are read into memory only as required. The system also anticipates (from prior page usage) what subsequent pages might be required, and brings in additional pages.



Demand loading eliminates the need to allocate main memory to rarely accessed routines and code, such as error handling routines, which in some instances might account for a large percentage of a program's code.

[Figure 7-11. Alignment of Page Boundaries Simplifies Transfer of Data and Code to Memory: non-page-aligned code is read from disk through the buffer cache into memory, while page-aligned code is copied directly from disk into memory.]

Another demand-loaded code enhancement is guaranteed page alignment. This is illustrated by Figure 7-11. Guaranteed page alignment is based on a loading algorithm simpler for the system to implement. One-to-one mapping between paging-device and main-memory pages allows for direct disk-to-memory transfer without an intermediary file-system buffer cache. Individual pages can also be copied faster.

When an executing process faults on a page, the process looks in the page cache (a data structure that describes every page of physical memory) for the page. If the page was used by a recent process, it is likely to be present. If the page is present, the kernel maps in its address for use by the process. If the page is not present, the kernel calculates the page's location from the inode (which was read into memory when the program began to execute) and maps in the page from disk.

Most executable code is now page-aligned on 4 KB page boundaries. Exceptions are older a.out files from Series 300/400 systems and executables compiled on Series 800 systems that were 2 KB-page aligned. When faulting in pages for executables whose code is not page-aligned, pages must be copied in through the file-system buffer cache, a slower method than mapping through the page cache.

Benefits and Shortcomings of Shared and Demand-Loaded Code

If you work in an environment where applications are written in-house, you have some choice about how to link the object code for optimal performance.

You might consider the following questions:

• How is the program linked by default?

• How do you expect the process's pages to be used?

o At random?

o Serially?

• Are your programs running on a system with ample or limited memory?

• Are there many or few users?

If an application program is running more slowly than expected, using chatr(1) or relinking it as demand-loaded might improve its execution time.
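For example, an already-linked program can usually be marked demand-loaded in place with chatr(1); myprog below is again a hypothetical program name, and the -q option is an assumption based on common HP-UX usage, so verify it in chatr(1). Running chatr with no options first records the current attributes in case you want to revert.

% chatr myprog
% chatr -q myprog
% file myprog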

Comparison of Shared and Demand-Loaded Code

The following points might help you determine which kind of executable to use:

• Most programs are shipped as shared code by default.

You can tell how the code is shipped by running the file command on the executable, for example:

% file /bin/cat
/bin/cat: s800 shared executable

• Demand-loaded code gives flexibility; you can relink user code with ld(1) or mark programs demand-loaded with chatr(1) (see the sketch after this list).

• Both shared code and demand-loaded code reduce the amount of memory required for user code space when multiple processes execute the same program.

• For both shared code and demand-loaded code, less memory is required to run programs if only a subset of their pages is used, because pages are loaded only as needed.
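If you are rebuilding an application from source, the demand-load attribute can instead be requested at link time, as sketched below for a hypothetical app.c. The -q option and the -Wl pass-through to the linker are assumptions; confirm them against ld(1) and cc(1).

% cc -c app.c
% cc -o app app.o -Wl,-q
% file app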




Shared Code in an HP-UX Cluster (Series 300/400/700 only)

In HP-UX clusters, the benefits of shared code are reduced because each cluster node uses its own RAM. In other words, even though a piece of code may be marked as shared, if users on different cluster nodes invoke the program, each cluster node will still have its own copy of the code in its own memory.

However, multiple users of a single cluster node would still share one copy of the code.