
Memory Management and the TLB

6.1 Memory Management in Big Computers

It’s probably easiest to start with the whole job of the memory management system in a UNIX-like OS (selected for study because, despite its big-system capabilities, it’s much simpler than PC operating systems). The typical view is illustrated in Figure 6.1.

6.1.1 Basic Process Layout and Protection

The biggest split in Figure 6.1 is between the low part, labeled “accessible to user programs,” and the rest. The user-accessible part of the application map is what we called kuseg in the generic MIPS memory maps described in Section 2.8. All higher memory is reserved to the OS. From the OS’s point of view, the low part of memory is a safe “sandbox” in which the user program can play all it wants. If the program goes wild and trashes all its own data, that’s no worry to anyone else.

From the application’s point of view, this area is free for building arbitrarily complicated private data structures, which the program can use to get on with the job.

Inside the user area, within the program’s sandbox, the OS provides more stack to the program on demand (implicitly, as the stack grows down). It will also provide a system call to make more data available, starting from the highest predeclared data addresses and growing up; systems people call this a heap. The heap feeds library functions such as malloc(), which provide your program with chunks of extra memory.
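As a concrete illustration, here is a minimal sketch of the kind of request a malloc() implementation makes when its free pool runs dry, assuming a UNIX-like system with the classic sbrk() call (a real allocator is far more elaborate and may use other calls, such as mmap()):

    /* Toy allocator: grow the heap upward with sbrk().  Illustrative
     * only -- no free list, no alignment handling, no thread safety. */
    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>

    void *toy_alloc(size_t n)
    {
        void *p = sbrk((intptr_t)n);   /* ask the OS to move the top of the heap up */
        if (p == (void *)-1)
            return NULL;               /* OS refused: limit reached or out of memory */
        return p;                      /* old break = start of our new chunk */
    }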

[Figure 6.1: Memory map for a protected process. The diagram shows a single address space. The high part, shared by all tasks and accessible only by OS routines, holds per-process data, I/O registers (h/w dependent), kernel data, and kernel code. The low part, accessible to user programs, holds the stack (grows down) near the top, a large unused gap, then the heap (grows up), the declared data, and the program code at the lowest addresses.]

Stack and heap are supplied in chunks small enough to be reasonably thrifty with system memory but big enough to avoid too many system calls or exceptions. However, on every system call or exception the OS has a chance to police the application’s memory consumption. An OS can enforce limits that make sure the application doesn’t get so large a share of memory as to threaten other vital activities.
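On UNIX-like systems, one visible form of such policing is the resource-limit interface; the sketch below caps the data segment with setrlimit() (the 64MB figure is just an example):

    /* Cap how far this process's data segment (heap) may grow; once
     * the limit is reached, further sbrk()/malloc() requests fail. */
    #include <sys/resource.h>

    int cap_data_segment(void)
    {
        struct rlimit rl;
        rl.rlim_cur = 64UL << 20;   /* soft limit: 64MB (illustrative) */
        rl.rlim_max = 64UL << 20;   /* hard limit */
        return setrlimit(RLIMIT_DATA, &rl);
    }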

In UNIX-like systems the process keeps its identity inside the OS kernel; most kernel facilities are provided effectively as special subroutines (system calls) invoked by the application under special rules that make sure they only do what the application is entitled to do.

The operating system’s own code and data are of course not accessible to user space programs. On some systems this is done by putting them in a completely separate address space; on MIPS the OS shares the same address space, and when the CPU is running at the user-program privilege level, access to these addresses is illegal and will trigger an exception.

Note that while each process’s user space maps to its own private real storage, the OS space is mostly shared. Much of the OS code and resources are seen at the same address by all processes (an OS kernel is a multithreaded but single-address-space system inside), but each process’s user-space addresses access its own separate space. Kernel routines running application system calls are trusted to cooperate safely, but the application need not be trusted at all.

The active parts of the user space are spread out, with stack at the top and code and compiled-in data at the bottom. This allows the stack to grow downward (implicitly, as the program runs and references data deeper) and the data to grow upward (explicitly, as the program calls library functions that allocate memory). The OS can allocate more memory for stack or data and can arrange to map it into the appropriate address.

Note that, in order to allow for programs that use vast quantities of data space, it’s usual to have the stack grow down from the highest permissible user addresses. The wide spread of addresses in use (with a huge hole in between) is one characteristic of this address map with which any translation scheme must cope.
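You can see this wide spread of addresses directly; here’s a small sketch (the actual values vary between systems, and address space randomization may perturb them):

    /* Print a representative address from each region of the user map. */
    #include <stdio.h>
    #include <stdlib.h>

    int declared_data = 1;              /* compiled-in data, near the bottom */

    int main(void)
    {
        int on_stack;                   /* stack variable, near the top */
        void *on_heap = malloc(16);     /* heap, just above the declared data */

        printf("code  %p\n", (void *)main);
        printf("data  %p\n", (void *)&declared_data);
        printf("heap  %p\n", on_heap);
        printf("stack %p\n", (void *)&on_stack);
        free(on_heap);
        return 0;
    }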

Real-life systems make things more complicated in search of efficiency and more functions. Most systems map application code as read-only to the application, meaning that it can safely be shared by many processes — it’s common to have many processes running the same application.

Many systems share not just whole applications but chunks of applications accessed through library calls (shared libraries). That opens a whole other can of worms that we will keep sealed up for now.

6.1.2 Mapping Process Addresses to Real Memory

What mechanisms are needed to support this model?

The MIPS architecture more or less dictates that the addresses used by programs (whether application or kernel routines) are fixed when the program is compiled and linked.[1] That means that applications can’t possibly all be built to use different explicit addresses, and in any case we want to be able to run multiple copies of the same application. So during program execution, application addresses are mapped to physical addresses according to a scheme fixed by the OS when the program is loaded.

Although it would be possible for the software to rush around patching all the address translation information whenever we switched contexts from one process to another, it would be very inefficient. Instead, we award each active process a number (in UNIX it’s called the process ID but these days is more wisely called the address space ID or ASID). Any address from a process is implicitly extended by that process’s ASID to produce a unique address to submit for translation. The ASID needs to be loaded into a CPU register whenever a new process is scheduled so that the hardware can use it.
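In data-structure terms, the address submitted for translation is really a pair; a sketch (the types and names here are illustrative, not the MIPS register layout):

    /* Every translation lookup is keyed by (ASID, VPN), so identical
     * virtual addresses in different processes never collide and no
     * per-process tables need patching at context-switch time. */
    struct xlate_key {
        unsigned char asid;   /* the scheduled process's ASID, kept in a CPU register */
        unsigned      vpn;    /* virtual page number from the program address */
    };

    struct xlate_key make_key(unsigned char cur_asid, unsigned vaddr)
    {
        struct xlate_key k = { cur_asid, vaddr >> 12 };   /* 4KB pages assumed */
        return k;
    }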

The mapping facility also allows the OS to discriminate between different parts of the user address space: Some parts of the application space (typically code) can be mapped read-only and some parts can be left unmapped and accesses trapped, meaning that a program that runs amok is likely to be stopped earlier.

The kernel part of the process’s address space is generally shared by all processes, and most of it maps permanently resident OS code and data. Since this code can be linked to run at this address, it doesn’t need a flexible mapping scheme, and most MIPS kernels are happy to put most of their code and data in areas whose mapping is fixed by the architecture.

6.1.3 Paged Mapping Preferred

Many exotic schemes have been tried for mapping addresses, commonly base/bound pairs to police correct accesses. But mapping memory in whatever size chunks the programs ask for, while apparently providing the best service for applications, rapidly leads to available memory being fragmented into awkward-sized pieces. All practical systems map memory in pages: fixed-size chunks of memory. Pages are always a power of 2 bytes in size, with 4KB being overwhelmingly popular.

With 4KB pages, a CPU address can be simply partitioned thus:

     31                           12   11                            0
    +--------------------------------+-------------------------------+
    |   Virtual page number (VPN)    |      Address within page      |
    +--------------------------------+-------------------------------+
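In C, that split is just a shift and a mask; a minimal sketch, assuming the 4KB page size above:

    #define PAGE_SHIFT 12
    #define PAGE_SIZE  (1u << PAGE_SHIFT)      /* 4096 bytes */
    #define PAGE_MASK  (PAGE_SIZE - 1)

    #define VPN(va)      ((va) >> PAGE_SHIFT)  /* virtual page number */
    #define PAGE_OFF(va) ((va) & PAGE_MASK)    /* address within page */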

The address-within-page bits don’t need to be translated, so the memory management hardware only has to cope with translating the high-order addresses, traditionally called the virtual page number (VPN), into the high-order bits of a physical address (a physical frame number, or PFN; nobody can remember why it’s not PPN).

[1] It is possible to generate position-independent code (PIC) for MIPS CPUs, but pure PIC is somewhat awkward on MIPS. (See Section 10.11.2 for an account of the compromises made to provide enough position independence for shared libraries in the MIPS/ABI standard.)

6.1.4 What We Really Want

The mapping mechanism must allow a program to assert a particular address within its own process/address space and translate that efficiently into a real physical address to access memory.

A good way to do this would be to have a table (the page table) containing an entry for each page in the whole address space, with that entry containing the correct physical address. This is clearly a fairly large data structure and is going to have to be stored in main memory. But there are two big problems.

The first is that we now need two references to memory to do any load or store, and that’s obviously hopeless for performance. You may foresee the answer to this: We can use a high-speed cache memory to store translation entries and go to the memory-resident table only when we miss in the cache.

Since each cache entry covers 4KB of memory space, it’s plausible that we can get a satisfactorily low miss rate out of a reasonably small cache. (At the time this scheme was invented, memory caches were rare and were sometimes also called “lookaside buffers”; so the memory translation cache became a translation lookaside buffer or TLB, and the acronym survives.)
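To make the cache idea concrete, here is a sketch of the lookup logic in C (a real TLB performs this match in parallel hardware; the entry count, structure, and names are illustrative):

    #define TLB_ENTRIES 64

    struct tlb_entry {
        unsigned asid, vpn, pfn;
        int      valid;
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Return the PFN for (asid, vpn), or -1 on a miss, in which case
     * the memory-resident page table must be consulted. */
    int tlb_lookup(unsigned asid, unsigned vpn)
    {
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn)
                return (int)tlb[i].pfn;
        return -1;   /* miss: refill from the page table, then retry */
    }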

The second problem is the size of the page table; for a 32-bit application address space split into 4KB pages, there are a million entries, which will take at least 4MB of memory. We really need to find some way to make the table smaller, or there’ll be no memory left to run the programs.

We’ll defer any discussion of the solution for this, beyond observing that real running programs have huge holes in their program address space, and if we can invent some scheme that avoids using physical memory for the corresponding holes in the table, things are likely to get better.

We’ve now arrived, in essence, at the memory translation system DEC figured out for its VAX minicomputer, which has been extremely influential in most subsequent architectures. It’s summarized in Figure 6.2.

The sequence in which the hardware works is something like this:

• A virtual address is split into two, with the least-significant bits (usually 12 bits) passing through untranslated — so translation is always done in pages (usually 4KB).

• The more-significant bits, or VPN, are concatenated with the currently running process’s ASID to form a unique page address.

• We look in the TLB (translation cache) to see if we have a translation entry for the page. If we do, it gives us the high-order physical address bits and we’ve got the address to use.


[Figure 6.2: Desirable memory translation system. The program (virtual) address divides into a VPN and an address-within-page field. The VPN, together with the process number (ASID), is matched against TLB entries of the form (ASID, VPN/Mask, PFN, Flags); a hit supplies the PFN, which is joined with the untranslated address-within-page bits to form the physical address. On a miss the TLB is refilled, when necessary, from the page table in memory.]

The TLB is a special-purpose store and can match addresses in various useful ways. It may have a global flag bit that tells it to ignore the value of ASID for some entries, so that these TLB entries map some range of virtual addresses for every process.

Similarly, the VPN may be stored with some Mask bits that cause some parts of the VPN to be excluded from the match, allowing the TLB entry to map a larger range of virtual addresses.

Both of these special cases are available in some MIPS MMUs.
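A sketch of how those two relaxations change the match (the field names are invented for illustration; Section 6.2 describes the real MIPS entry format):

    struct mips_like_entry {
        unsigned asid, vpn, pfn;
        unsigned mask;       /* VPN bits excluded from the match */
        int      global;     /* set: ignore ASID for this entry */
        int      valid;
    };

    int entry_matches(const struct mips_like_entry *e,
                      unsigned asid, unsigned vpn)
    {
        if (!e->valid)
            return 0;
        if (!e->global && e->asid != asid)
            return 0;                               /* non-global entries are per-ASID */
        return ((e->vpn ^ vpn) & ~e->mask) == 0;    /* masked bits always "match" */
    }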

• There are usually extra bits (flags) stored with the PFN that are used to control what kind of access is allowed; most obviously, to permit reads but not writes. We’ll discuss the MIPS architecture’s flags in Section 6.2.

• If there’s no matching entry in the TLB, the system must locate or build an appropriate entry (using main-memory-resident page table information), load it into the TLB, and then run the translation process again.

In the VAX minicomputer, this process was controlled by microcode and seemed to the programmer to be completely automatic.

6.1.5 Origins of the MIPS Design

The MIPS designers wanted to figure out a way to offer the same facilities as the VAX with as little hardware as possible. The microcoded TLB refill was not acceptable, so they took the brave step of consigning this part of the job to software.

That means that apart from a register to hold the current ASID, the MMU hardware is just a TLB, which is simply a high-speed, fixed-size table of translations. System software can (and usually does) use the TLB as a cache to front a memory-resident page table, but there’s nothing in the TLB hardware to make it a cache, except this: When presented with an address it can’t translate, the TLB triggers a special exception (TLB refill) to invoke the software routine. However, considerable care is taken with the details of the TLB design and associated control registers to help the software be efficient.
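Conceptually, the refill routine does something like the following sketch, written in C for clarity (a real MIPS handler is a handful of assembly instructions, and struct pte, page_table, and tlb_write() here are invented illustrations, not the real MIPS interface):

    #define PAGE_SHIFT 12

    struct pte { unsigned pfn, flags; };

    extern struct pte page_table[];   /* memory-resident table built by the OS */
    /* Hypothetical hook that loads one entry into the hardware TLB. */
    extern void tlb_write(unsigned asid, unsigned vpn, struct pte e);

    void tlb_refill(unsigned cur_asid, unsigned bad_vaddr)
    {
        unsigned vpn = bad_vaddr >> PAGE_SHIFT;
        tlb_write(cur_asid, vpn, page_table[vpn]);  /* install the translation */
        /* On return from the exception the faulting load/store reruns
         * and this time hits in the TLB. */
    }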
