Unit OS2:
Unit OS2:
Operating System Principles Operating System Principles
2.2. 2.2. Windows Core System Mechanisms Windows Core System Mechanisms
Copyright Notice Copyright Notice
© 2000-2005 David A. Solomon and Mark Russinovich
© 2000-2005 David A. Solomon and Mark Russinovich
These materials are part of the
These materials are part of the Windows Operating Windows Operating System Internals Curriculum Development Kit,
System Internals Curriculum Development Kit, developed by David A. Solomon and Mark E.
developed by David A. Solomon and Mark E.
Russinovich with Andreas Polze Russinovich with Andreas Polze
Microsoft has licensed these materials from David Microsoft has licensed these materials from David Solomon Expert Seminars, Inc. for distribution to Solomon Expert Seminars, Inc. for distribution to academic organizations solely for use in academic academic organizations solely for use in academic environments (and not for commercial use)
environments (and not for commercial use)
Roadmap for Section 2.2.
Roadmap for Section 2.2.
Object Manager & Handles Object Manager & Handles
Local Procedure Calls Local Procedure Calls
Exception Handling Exception Handling
Memory Pools
Memory Pools
Objects and Handles Objects and Handles
Many Windows APIs take arguments that are
Many Windows APIs take arguments that are handles handles to system-defined data to system-defined data structures, or “objects”
structures, or “objects”
App calls CreateXxx, which creates an object and returns a handle to it App calls CreateXxx, which creates an object and returns a handle to it App then uses the handle value in API calls that operate on that object App then uses the handle value in API calls that operate on that object
Three types of Windows objects (and therefore handles):
Three types of Windows objects (and therefore handles):
Windows “kernel objects” (events, mutexes, files, processes, threads, etc.) Windows “kernel objects” (events, mutexes, files, processes, threads, etc.)
Objects are managed by the Windows “Object Manager”, and represent data Objects are managed by the Windows “Object Manager”, and represent data structures in system address space
structures in system address space
Handle values are private to each process Handle values are private to each process
Windows “GDI objects” (pens, brushes, fonts, etc.) Windows “GDI objects” (pens, brushes, fonts, etc.)
Objects are managed by the Windows subsystem Objects are managed by the Windows subsystem Handle values are valid system-wide / session-wide Handle values are valid system-wide / session-wide
Windows “User objects” (windows, menus, etc.) Windows “User objects” (windows, menus, etc.)
Objects are managed by the Windows subsystem Objects are managed by the Windows subsystem Handle values are valid system-wide / session-wide Handle values are valid system-wide / session-wide
Handles and Security Handles and Security
Process handle table Process handle table
Is unique for each process Is unique for each process
But is in system address space, hence cannot be modified from user mode But is in system address space, hence cannot be modified from user mode Hence, is trusted
Hence, is trusted
Security checks are made when handle table entry is created Security checks are made when handle table entry is created
i.e. at CreateXxx time i.e. at CreateXxx time
Handle table entry indicates the “validated” access rights to the object Handle table entry indicates the “validated” access rights to the object
Read, Write, Delete, Terminate, etc.
Read, Write, Delete, Terminate, etc.
APIs that take an “already-opened” handle look in the handle table APIs that take an “already-opened” handle look in the handle table entry before performing the function
entry before performing the function
For example: TerminateProcess checks to see if the handle was opened for For example: TerminateProcess checks to see if the handle was opened for Terminate access
Terminate access
No need to check file ACL, process or thread access token, etc., on every write No need to check file ACL, process or thread access token, etc., on every write request---checking is done at file handle creation, i.e. “file open”, time
request---checking is done at file handle creation, i.e. “file open”, time
HandleCount = 1 ReferenceCount = 1
Event Object
Handles, Pointers, and Objects Handles, Pointers, and Objects
Handle Table Process A
Handle Table Process B
System Space handles
index
Handle to a kernel object is an index Handle to a kernel object is an index
into the
into the processprocess handle table, and handle table, and hence is invalid in any other process hence is invalid in any other process
Handle table entry contains the system- Handle table entry contains the system- space address (8xxxxxxx or above) of space address (8xxxxxxx or above) of the data structure; this address is the the data structure; this address is the
same regardless of process context same regardless of process context
Although handle table is per-process, it Although handle table is per-process, it
is actually in system address space is actually in system address space
(hence protected) (hence protected)
HandleCount = 1 ReferenceCount = 0
Event Object
Handles, Pointers, and Reference Handles, Pointers, and Reference
Count Count
Handle Table Process A
Handle Table Process B
System Space handles
index HandleCount = 2
ReferenceCount = 0
DuplicateHandle
HandleCount = 3 ReferenceCount = 0
HandleCount = 3 ReferenceCount = 4
Thread (in a wait state
for the event) Thread (in a wait state
for the event)
Note: there is actually another data structure, a “wait block”, “between”
the thread and the object it’s waiting for
Object Manager Object Manager
Executive component for managing system-defined Executive component for managing system-defined
“objects”
“objects”
Objects are data structures with optional names Objects are data structures with optional names
“Objects” managed here include Windows Kernel objects, but “ Objects” managed here include Windows Kernel objects, but not Windows User or GDI objects
not Windows User or GDI objects
Object manager implements user-mode handles and the Object manager implements user-mode handles and the process handle table
process handle table
Object manager is not used for all Windows data Object manager is not used for all Windows data structures
structures
Generally, only those types that need to be shared, named, or Generally, only those types that need to be shared, named, or exported to user mode
exported to user mode
Some data structures are called “objects” but are not managed Some data structures are called “objects” but are not managed by the object manager (e.g. “DPC objects”)
by the object manager (e.g. “DPC objects”)
Object Manager Object Manager
In part, a heap manager…
In part, a heap manager…
Allocates memory for data structure from system-wide, kernel space Allocates memory for data structure from system-wide, kernel space heaps
heaps
(pageable or nonpageable) (pageable or nonpageable)
… with a few extra functions: … with a few extra functions:
Assigns name to data structure (optional) Assigns name to data structure (optional) Allows lookup by name
Allows lookup by name
Objects can be protected by ACL-based security Objects can be protected by ACL-based security
Provides uniform naming, sharing, and protection scheme Provides uniform naming, sharing, and protection scheme
Simplifies C2 security certification by centralizing all object protection in one Simplifies C2 security certification by centralizing all object protection in one place
place
Maintains counts of handles and references (stored pointers in kernel Maintains counts of handles and references (stored pointers in kernel space) to each object
space) to each object
Object cannot be freed back to the heap until all handles and references are Object cannot be freed back to the heap until all handles and references are gonegone
Executive Objects Executive Objects
Object type
Object type Represents Represents
Object directory
Object directory Container object for other objects: implement Container object for other objects: implement
hierarchical namespace to store other object types hierarchical namespace to store other object types Symbolic link
Symbolic link Mechanism for referring to an object name indirectly Mechanism for referring to an object name indirectly Process
Process Virtual address space and control information Virtual address space and control information necessary for execution of thread objects
necessary for execution of thread objects Thread
Thread Executable entity within a process Executable entity within a process Section
Section Region of shared memory (file mapping object in Region of shared memory (file mapping object in Windows API)
Windows API)
File File Instance of an opened file or I/O device Instance of an opened file or I/O device
Port Port Mechanism to pass messages between processes Mechanism to pass messages between processes Access token
Access token Security profile (security ID, user rights) of a Security profile (security ID, user rights) of a process or thread
process or thread
Executive Objects (contd.) Executive Objects (contd.)
Object type
Object type Represents Represents
Event
Event Object with persistent state (signaled or not) usable Object with persistent state (signaled or not) usable for synchronization or notification
for synchronization or notification Semaphore
Semaphore Counter and resource gate for critical section Counter and resource gate for critical section Mutant
Mutant Synchronization construct to serialize resource Synchronization construct to serialize resource access
access Timer
Timer Mechanism to notify a thread when a fixed period of Mechanism to notify a thread when a fixed period of time elapses
time elapses Queue
Queue Method for threads to enqueue/dequeue Method for threads to enqueue/dequeue
notifications of I/O completions (Windows I/O notifications of I/O completions (Windows I/O completion port)
completion port)
Key Key Reference to registry data – visible in object Reference to registry data – visible in object manager namespace
manager namespace Profile
Profile Mechanism for measuring execution time for a Mechanism for measuring execution time for a process within an address range
process within an address range
Object name Object directory Security descriptor Quota charges Open handle count Open handles list Object type
Reference count Object-specific data
Object Structure Object Structure
Object header
Object body
Process 1
Process 2
Process 3
Type name Access types
Synchronizable? (Y/N) Pageable? (Y/N)
Methods:
open, close, delete parse, security, query name Type object contains static, object-type specific data:
- shared among all instances of an object type - link all instances together (enumeration)
Type object Object name
Object directory Security descriptor Quota charges Open handle count Open handles list Object type
Reference count Object-specific data
Object name Object directory Security descriptor Quota charges Open handle count Open handles list Object type
Reference count Object-specific data
Object Methods Object Methods
Process opens handle to object \Device\Floppy0\docs\resume.doc Process opens handle to object \Device\Floppy0\docs\resume.doc Object manager traverses name tree until it reaches Floppy0
Object manager traverses name tree until it reaches Floppy0 Calls parse method for object Floppy0 with arg \docs\resume.doc Calls parse method for object Floppy0 with arg \docs\resume.doc Method
Method When method is calledWhen method is called
OpenOpen When an object handle is openedWhen an object handle is opened Close
Close When an object handle is closedWhen an object handle is closed Delete
Delete Before the object manager deletes an objectBefore the object manager deletes an object Query name
Query name When a thread requests the name of an object, such as a file, that When a thread requests the name of an object, such as a file, that exists in a secondary object domain
exists in a secondary object domain Parse
Parse When the object manager is searching for an object name that exists in When the object manager is searching for an object name that exists in a secondary object domain
a secondary object domain Security
Security When a process reads/changes protection of an objects, such as a file, When a process reads/changes protection of an objects, such as a file, that exists in a secondary object domain
that exists in a secondary object domain Example:
Object Manager Namespace Object Manager Namespace
System and session-wide internal namespace for all objects System and session-wide internal namespace for all objects exported by the operating system
exported by the operating system
View with Winobj from www.sysinternals.com
View with Winobj from www.sysinternals.com
Interesting Object Directories Interesting Object Directories
in \ObjectTypes in \ObjectTypes
objects that define types of objects objects that define types of objects
in \BaseNamedObjects in \BaseNamedObjects
these will appear when Windows programs use these will appear when Windows programs use
CreateEvent, etc.
CreateEvent, etc.
mutant (Windows mutex) mutant (Windows mutex)
queue (Windows I/O completion port) queue (Windows I/O completion port) section (Windows file mapping object) section (Windows file mapping object) event
event
Semaphore Semaphore
In \GLOBAL??
In \GLOBAL??
DOS device name mappings for console session
DOS device name mappings for console session
Object Manager Namespace Object Manager Namespace
Namespace:
Namespace:
Hierarchical directory structure (based on file system model) Hierarchical directory structure (based on file system model) System-wide (not per-process)
System-wide (not per-process)
With Terminal Services, Windows objects are per-session by default With Terminal Services, Windows objects are per-session by default Can override this with “global\” prefix on object names
Can override this with “global\” prefix on object names
Volatile (not preserved across boots) Volatile (not preserved across boots)
As of Server 2003, requires SeCreateGlobalPrivilege As of Server 2003, requires SeCreateGlobalPrivilege
Namespace can be extended by secondary object managers (e.g. file system) Namespace can be extended by secondary object managers (e.g. file system)
Hook mechanism to call external parse routine (method) Hook mechanism to call external parse routine (method)
Supports case sensitive or case blind Supports case sensitive or case blind
Supports symbolic links (used to implement drive letters, etc.) Supports symbolic links (used to implement drive letters, etc.)
Lookup done on object creation or access by name Lookup done on object creation or access by name
Not on access by handle Not on access by handle
Not all objects managed by the object manager are named Not all objects managed by the object manager are named
e.g. file objects are not named e.g. file objects are not named
un-named objects are not visible in WinObj un-named objects are not visible in WinObj
Nonpaged pool Nonpaged pool
Has initial size and upper limit (can be grown dynamically, up to the max) Has initial size and upper limit (can be grown dynamically, up to the max) 32-bit upper limit: 256 MB on x86 (NT4: 128MB)
32-bit upper limit: 256 MB on x86 (NT4: 128MB)
64-bit: 128 GB 64-bit: 128 GB
Performance counter displays current total size (allocated + free) Performance counter displays current total size (allocated + free)
Max size stored in kernel variable MmMaximumNonPagedPoolInBytes Max size stored in kernel variable MmMaximumNonPagedPoolInBytes
Paged pool Paged pool
32-bit upper limit: 650MB (Windows Server 2003), 470MB (Windows 2000), 32-bit upper limit: 650MB (Windows Server 2003), 470MB (Windows 2000), 192MB (Windows NT 4.0)
192MB (Windows NT 4.0) 64-bit: 128 GB
64-bit: 128 GB
Max size stored in MmSizeOfPagedPoolInBytes Max size stored in MmSizeOfPagedPoolInBytes
Pool size performance counters display current size, not max Pool size performance counters display current size, not max
To display maximums, use “!vm” kernel debugger command To display maximums, use “!vm” kernel debugger command
Kernel Memory Pools Kernel Memory Pools
(System-Space Heaps)
(System-Space Heaps)
Invoking Kernel-Mode Routines Invoking Kernel-Mode Routines
Code is run in kernel mode for one of three reasons:
Code is run in kernel mode for one of three reasons:
Requests from user mode Requests from user mode
Via the system service dispatch mechanism Via the system service dispatch mechanism
Kernel-mode code runs in the context of the requesting thread Kernel-mode code runs in the context of the requesting thread
Interrupts from external devices Interrupts from external devices
Interrupts (like all traps) are handled in kernel mode Interrupts (like all traps) are handled in kernel mode
Windows-supplied interrupt dispatcher invokes the interrupt service routine Windows-supplied interrupt dispatcher invokes the interrupt service routine ISR runs in the context of the interrupted thread (so-called “arbitrary thread ISR runs in the context of the interrupted thread (so-called “arbitrary thread context”)
context”)
ISR often requests the execution of a “DPC routine”, which also runs in kernel ISR often requests the execution of a “DPC routine”, which also runs in kernel modemode
Dedicated kernel-mode threads Dedicated kernel-mode threads
Some threads in the system stay in kernel mode at all times (mostly in the Some threads in the system stay in kernel mode at all times (mostly in the
“System” process)
“System” process)
Scheduled, preempted, etc., like any other threads Scheduled, preempted, etc., like any other threads
Accounting for Kernel-Mode Time Accounting for Kernel-Mode Time
“ “ Processor Time” = Processor Time” = total busy time of total busy time of processor (equal to processor (equal to elapsed real time - idle elapsed real time - idle time)
time)
“ “ Processor Time” = Processor Time” =
“User Time” +
“User Time” +
“Privileged Time”
“Privileged Time”
“ “ Privileged Time” = Privileged Time” = time spent in kernel time spent in kernel mode mode
“ “ Privileged Time” Privileged Time”
includes:
includes:
Interrupt Time Interrupt Time DPC Time DPC Time
other kernel-mode other kernel-mode time (no separate time (no separate counter for this) counter for this)
Screen snapshot from: Programs |
Administrative Tools | Performance Monitor click on “+” button, or select Edit | Add to chart...
Kernel Memory Pools Kernel Memory Pools
(System-Space Heaps) (System-Space Heaps)
Windows provides two system memory pools:
Windows provides two system memory pools:
“ “ Nonpaged Pool” and “Paged Pool” Nonpaged Pool” and “Paged Pool”
Used for systemwide persistent data (visible from any process Used for systemwide persistent data (visible from any process context)
context)
Nonpaged pool required for memory accessed from DPC/dispatch Nonpaged pool required for memory accessed from DPC/dispatch IRQL or above
IRQL or above
Page faults at DPC/dispatch IRQL or above cause a system crash Page faults at DPC/dispatch IRQL or above cause a system crash
Pool sizes are a function of memory size & Server vs.
Pool sizes are a function of memory size & Server vs.
Workstation Workstation
Can be overidden in Registry:
Can be overidden in Registry:
HKLM\System\CurrentControlSet\Control\Session Manager HKLM\System\CurrentControlSet\Control\Session Manager
\Memory Management
\Memory Management
But are limited by implementation limits (next slide)
But are limited by implementation limits (next slide)
Lookaside Lists Lookaside Lists
Instead of frequently allocating and freeing pool…
Instead of frequently allocating and freeing pool…
… … Use lookaside lists instead Use lookaside lists instead
Implements a driver private list of preallocated blocks of pool Implements a driver private list of preallocated blocks of pool
Routines automatically extend the lookaside list from the systemwide Routines automatically extend the lookaside list from the systemwide pool pool
Avoids hitting systemwide pool spinlock for every allocation/release Avoids hitting systemwide pool spinlock for every allocation/release
Optimizes CPU/memory cache behavior (the same driver will use the Optimizes CPU/memory cache behavior (the same driver will use the
same list items over and over) same list items over and over) See: See:
ExInitialize[N]PagedLookasideList ExInitialize[N]PagedLookasideList
ExAllocateFrom[N]PagedLookasideList ExAllocateFrom[N]PagedLookasideList ExFreeTo[N]PagedLookasideList
ExFreeTo[N]PagedLookasideList
ExDelete[N]PagedLookasideList
ExDelete[N]PagedLookasideList
Increased System Memory Limits Increased System Memory Limits
Key system memory limits raised in XP & Server 2003 Key system memory limits raised in XP & Server 2003
Windows 2000 limit of 200 GB of mapped file data eliminated Windows 2000 limit of 200 GB of mapped file data eliminated
Previously limited size of files that could be backed up Previously limited size of files that could be backed up
Maximum System Page Table Entries (PTEs) increased Maximum System Page Table Entries (PTEs) increased
Can now describe 1.3 GB of system space (960 MB contiguous) Can now describe 1.3 GB of system space (960 MB contiguous) Windows 2000 limit was 660 MB (220 MB contiguous)
Windows 2000 limit was 660 MB (220 MB contiguous) Increases number of users on Terminal Servers
Increases number of users on Terminal Servers
Also means maximum device driver size is now 960 MB (was 220
Also means maximum device driver size is now 960 MB (was 220
MB) MB)
Local Procedure Calls (LPCs) Local Procedure Calls (LPCs)
IPC – high-speed message passing IPC – high-speed message passing
Not available through Windows API –
Not available through Windows API – Windows Windows OS internal OS internal Application scenarios:
Application scenarios:
RPCs on the same machine are implemented as LPCs RPCs on the same machine are implemented as LPCs
Some Windows APIs result in sending messages to Windows subsyst.
Some Windows APIs result in sending messages to Windows subsyst.
proc.
proc.
WinLogon uses LPC to communicate with local security authentication WinLogon uses LPC to communicate with local security authentication server process (LSASS)
server process (LSASS)
Security reference monitor uses LPC to communicate with LSASS Security reference monitor uses LPC to communicate with LSASS
LPC communication:
LPC communication:
Short messages < 256 bytes are copied from sender to receiver
Short messages < 256 bytes are copied from sender to receiver
Larger messages are exchanged via shared memory segment
Larger messages are exchanged via shared memory segment
Server (kernel) may write directly in client‘s address space
Server (kernel) may write directly in client‘s address space
Port Objects Port Objects
LPC exports port objects to maintain state of communication:
LPC exports port objects to maintain state of communication:
Server connection port
Server connection port : named port, server connection request : named port, server connection request point
point
Server communication port
Server communication port: unnamed port, one per active client, : unnamed port, one per active client, used for communication
used for communication
Client communication port
Client communication port : unnamed port a particular client : unnamed port a particular client thread uses to communicate with a particular server
thread uses to communicate with a particular server Unnamed communication port
Unnamed communication port : unnamed port created for use by : unnamed port created for use by two threads in the same process
two threads in the same process
Typical scenario:
Typical scenario:
Server creates named connection port Server creates named connection port Client makes connection request
Client makes connection request
Two unnamed ports are created, client gets handle to server port, Two unnamed ports are created, client gets handle to server port, server gets handle to client port
server gets handle to client port
Use of LPC ports Use of LPC ports
Client address
space Kernel address
space
Server address space
Message queue Connection port
Client process Server process
Handle Handle Server view
of section Handle
Client view of section
Shared section Client
communication port
Server communication
port
Exception Dispatching Exception Dispatching
Exceptions are conditions that result directly from the Exceptions are conditions that result directly from the execution of the program that is running
execution of the program that is running
Windows introduced a facility known as structured Windows introduced a facility known as structured exception handling, which allows applications to gain exception handling, which allows applications to gain control when exceptions occur
control when exceptions occur
The application can then fix the condition and return to The application can then fix the condition and return to the place the exception occurred,
the place the exception occurred,
unwind the stack (thus terminating execution of the subroutine unwind the stack (thus terminating execution of the subroutine
that raised the exception), or that raised the exception), or
declare back to the system that the exception isn’t recognized declare back to the system that the exception isn’t recognized
and the system should continue searching for an exception and the system should continue searching for an exception
handler that might process the exception.
handler that might process the exception.
Exception Dispatching (contd.) Exception Dispatching (contd.)
Structured exception handling Structured exception handling; ;
Accessible from MS VC++ language: __try, __except, __finally Accessible from MS VC++ language: __try, __except, __finally See Jeffrey Richter, „Advanced Windows“, MS PressSee Jeffrey Richter, „Advanced Windows“, MS Press
See Johnson M.Hart, „Win32 System Programming“, Addison-WesleySee Johnson M.Hart, „Win32 System Programming“, Addison-Wesley
Trap
handler Exception
dispatcher)
Debugger (first chance)
Frame-based handlers
Debugger (second chance)
Environment subsystem
Kernel default handler (Exception
frame, client thread ID)
exception
Unhandled exceptions are passed to next handler
Exception dispatcher sends debug message to
Debugger via LPC/excepion port & session manager process
LPC to debugger port
Internal Windows API exception Internal Windows API exception
handler handler
Processes unhandled exceptions Processes unhandled exceptions
At top of stack, declared in StartOfProcess()/StartOfThread() At top of stack, declared in StartOfProcess()/StartOfThread()
void Win32StartOfProcess(LPTHREAD_START_ROUTINE lpStartAddr, void Win32StartOfProcess(LPTHREAD_START_ROUTINE lpStartAddr,
LPVOID lpvThreadParm) {LPVOID lpvThreadParm) { __try {
__try {
DWORD dwThreadExitCode = lpStartAddr(lpvThreadParm);DWORD dwThreadExitCode = lpStartAddr(lpvThreadParm);
ExitThread(dwThreadExitCode);ExitThread(dwThreadExitCode);
}} __except(UnhandledExceptionFilter(__except(UnhandledExceptionFilter(
GetExceptionInformation())) {GetExceptionInformation())) {
ExitProcess(GetExceptionCode());ExitProcess(GetExceptionCode());
}} }}