Abstract Data Types

The discussion of specific data structure examples is prefaced by introducing the storage concepts in terms of abstract data types. Abstract data types provide models for both the data and the operations performed on the data. These models are independent of any particular programming language and are not concerned with the details of the particular data being stored.

While discussing these abstract data types, we will generically refer to the stored data as an element, which is used to represent an unspecified data type. This element could be either a basic data type or a composite data type. The values of some elements (pointers) may also be used to represent the connections between the elements. Analogous to the discussion of C pointers that store a memory address, the value of these elements is used to reference another element.

By initially discussing these concepts in terms of abstract data types, you can identify why you would use a particular data organization and how the stored data would be manipulated. Finally, we discuss examples of how the C programming language implements abstract data types as data structures and provide examples of how to use them within operating systems. You will frequently encounter implementations of data structures with which you may not be immediately familiar. By leveraging knowledge of how the program uses the data, the characteristics of how the data is stored in memory, and the conventions of the programming language, you can often recognize an abstract data-type pattern that will help give clues as to how the data can be processed.

Part I: An Introduction to Memory Forensics

Arrays

The simplest mechanism for aggregating data is the one-dimensional array. This is a collection of <index, element> pairs, in which the elements are of a homogeneous data type. The data type of the stored elements is then typically referred to as the array type. An important characteristic of an array is that its size is fixed when an instance of the array is created, which subsequently bounds the number of elements that can be stored. You access the elements of an array by specifying an array index that maps to an element’s position within the collection. Figure 2-1 shows an example of a one-dimensional array.

Index 0 1 2 3 4

Element A B C D B

Figure 2-1: One-dimensional array example

In Figure 2-1, you have an example of an array that can hold five elements. This array is used to store characters and would therefore be referred to as an array of characters. As a common convention you will see throughout the book, the first element of the array is found at index 0. Thus, the value of the element at index 2 is the character C. Assuming the array was given the name grades, we could also refer to this element as grades[2].

Arrays are frequently used because many programming languages offer implementations designed to be extremely efficient for accessing stored data. For example, in the C programming language the storage requirements are a fixed size, and the compiler allocates a contiguous block of memory for storing the elements of the array. You typically reference the location of the array by the memory address of the first element, which is called the array’s base address. You can then access the subsequent elements of the array as an offset from the base address. For example, assuming that an array is stored at base address X and is storing elements of size S, you can calculate the address of the element at index I using the following equation:

Address(I) = X + (I * S)

This characteristic is typically referred to as random access because the time to access an element does not depend on what element you are accessing.
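The address calculation can be sketched in C as follows. This is only a minimal illustration: the function name element_address and the sample values in the note below are assumptions made for the example, not part of any operating system or tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of the element at `index` in an array that
 * starts at `base` and stores elements of `elem_size` bytes each.
 * Implements Address(I) = X + (I * S). */
static uint64_t element_address(uint64_t base, size_t index, size_t elem_size)
{
    return base + (uint64_t)index * (uint64_t)elem_size;
}
```

For an array of 1-byte characters such as grades, the element at index 2 sits 2 bytes past the base address; with 4-byte elements and a base address of 0x1000, the same index maps to 0x1008. The cost of the calculation is the same for every index, which is what makes the access random.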

Data Structures


Because arrays are extremely efficient for accessing data, you encounter them frequently during memory analysis. This is especially true when analyzing the operating system’s data structures. In particular, the operating system commonly stores arrays of pointers and other types of fixed-sized elements that you need to access quickly at fixed indices. For example, in Chapter 21 you learn how Linux leverages an array to store the file handles associated with a process. In this case, the array index is the file descriptor number. Also, Chapter 13 shows Microsoft’s MajorFunction table: an array of function pointers that enable an application to communicate with a driver. Each element of the array contains the address of the code (dispatch routine) that executes to satisfy an I/O request. The index for the array maps to a predefined operation code (for example, 0 = create; 3 = read; 4 = write).

Figure 2-2 provides an example of how the function pointers for a driver’s MajorFunction table are stored in memory on an IA32-based system, assuming that the first entry is stored at base_address.

Figure 2-2: Partial array of function pointers taken from a driver’s MajorFunction table

If you know the base address of an array and the size of its elements, you can quickly enumerate the elements stored contiguously in memory. Furthermore, you can access any particular member by using the same calculations the compiler generates. You may also notice patterns in unknown contiguous blocks of memory that resemble an array, which can provide clues about how the data is being used, what is being stored, and how it can be interpreted.
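As a rough sketch of that enumeration, the following C code walks a table of 4-byte pointers inside a raw memory buffer. It assumes IA32-style pointers (32 bits, little-endian); the helper names read_le32 and enumerate_pointers are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 32-bit little-endian value (the pointer size and byte order on
 * IA32) from a raw memory buffer, independent of the host's byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: copy up to `count` 32-bit pointers stored
 * contiguously at `table_offset` in `mem` into `out`. Returns the number of
 * entries actually read, stopping early if the buffer ends. */
static size_t enumerate_pointers(const uint8_t *mem, size_t mem_len,
                                 size_t table_offset, size_t count,
                                 uint32_t *out)
{
    size_t n = 0;
    while (n < count) {
        size_t off = table_offset + n * sizeof(uint32_t);
        if (off > mem_len || mem_len - off < sizeof(uint32_t))
            break;
        out[n] = read_le32(mem + off);
        n++;
    }
    return n;
}
```

Each offset is computed as base plus index times element size, the same calculation the compiler generates. The bounds check matters in practice because a memory sample can end before the table does.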

Bitmaps

An array variant used to represent sets is the bitmap, also known as the bit vector or bit array. In this instance, the index represents a fixed number of contiguous integers, and the elements store a boolean value {1,0}. Within memory analysis, bitmaps are typically used to efficiently determine whether a particular object belongs to a set (that is, allocated versus free memory, low versus high priority, and so on). They are stored as an array of bits, known as a map, and each bit represents whether one object is valid or not. Using bitmaps allows for representation of eight objects in one byte, which scales well to large data sets. For example, the Windows kernel uses a bitmap to maintain allocated network ports. Network ports are represented as an unsigned short, which is 2 bytes and provides 2¹⁶, or 65,536, possible values (0 through 65535). This large number of ports is represented by a 65,536-bit (8KB) bitmap. Figure 2-3 provides an example of this Windows structure.

[Figure: bit-level and byte-level views of the 8192-byte bitmap, annotated "This means ports 1, 2, 3, and 6 are in use"]

Figure 2-3: An example of a Windows bitmap of in-use network ports

The figure shows both the bit-level and byte-level view of the bitmap. In the byte-level view you can see that the first byte has a value of 4e hexadecimal, which translates to 01001110 binary. Counting bit positions from 0 at the least significant (rightmost) bit, this binary value indicates that ports 1, 2, 3, and 6 are in use, because those are the positions of the bits that are set.
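The bit test behind this interpretation can be sketched in C. The function names are hypothetical, and the bit numbering (bit 0 is the least significant bit of byte 0) is an assumption chosen because it reproduces the 4e example; a real structure may number its bits differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if bit `index` is set in `bitmap`, 0 otherwise. Bit 0 is taken
 * to be the least significant bit of byte 0 (an assumption that matches
 * the 0x4e example). */
static int bitmap_test(const uint8_t *bitmap, size_t index)
{
    return (bitmap[index / 8] >> (index % 8)) & 1;
}

/* Store the indices of all set bits among the first `nbits` bits in `out`
 * and return how many were found. `out` must have room for `nbits` entries. */
static size_t bitmap_set_bits(const uint8_t *bitmap, size_t nbits, size_t *out)
{
    size_t found = 0;
    for (size_t i = 0; i < nbits; i++) {
        if (bitmap_test(bitmap, i))
            out[found++] = i;
    }
    return found;
}
```

Run over a first byte of 0x4e, bitmap_set_bits reports indices 1, 2, 3, and 6. Checking a single port costs one shift and one mask, no matter how large the bitmap is.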

Records

Another mechanism commonly used for aggregating data is a record (or structure) type. Unlike an array that requires elements to consist of the same data type, a record can be made up of heterogeneous elements. It is composed of a collection of fields, where each field is specified by a <name, element> pair. Each field is also commonly referred to as a member of the record. Because records are static, similar to arrays, the combination of their elements and the order of those elements are fixed when an instance of the record is created. You access the elements of a record by specifying the particular member name, which acts as a key or index. A collection of elements can be combined within a record to create a new element. Similarly, it is also possible for an element of a record to be an array or record itself.

Figure 2-4 shows an example of a network connection record composed of four members that describe its characteristics: id, port, addr, and hostname. Despite the fact that the members may have a variety of data types and associated sizes, by forming a record you make it possible to store and organize related objects.

[ id | port | addr | hostname ]

Figure 2-4: Network connection record example

In the C programming language, records are implemented using structures. A structure enables a programmer to specify the name, type, and order of the members. Similar to arrays, the size of C structures is known when an instance is created, structures are stored in a contiguous block of memory, and the elements are accessed using a base plus offset calculation. The offsets from the base of a record vary depending on the size of the data types being stored. The compiler determines the offsets for accessing an element based on the size of the data types that precede it in the structure. For example, you would express the definition of a C structure for the network connection information from Figure 2-4 in the following format:

struct Connection {
    short id;              /* unique record ID */
    short port;            /* remote port */
    unsigned long addr;    /* remote IP address */
    char hostname[32];     /* remote hostname */
};

As you can see, the identifier Connection is the structure’s name; and id, port, addr, and hostname are the names of its members. The first field of this structure is named id, which has a length of 2 bytes and is used to store a unique identifier associated with this record. The second field is named port, which also has a length of 2 bytes and is used to store the listening port on the remote machine. The third field is used to store the IP address of the remote machine as binary data in 4 bytes with a field name of addr. The fourth field is called hostname and stores the hostname of the remote machine using 32 bytes. Using information about the data types, the compiler can determine the proper offsets (0, 2, 4, and 8) for accessing each field. Table 2-3 lists the members of Connection and their associated offsets.

Table 2-3: Structure Type Information for a Network Connection Example

Byte Range  Name      Type           Description
0–1         id        short          Unique record ID
2–3         port      short          Remote port
4–7         addr      unsigned long  Remote address
8–39        hostname  char[32]       Remote hostname
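You can ask the compiler to confirm these offsets. The sketch below is an illustration rather than the original definition: it uses the fixed-width types from <stdint.h> in place of short and unsigned long, on the assumption that unsigned long is 4 bytes as the text describes (it is 8 bytes on many 64-bit Unix compilers). The name ConnectionFixed is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct Connection, written with fixed-width types so the
 * field sizes match Table 2-3 regardless of the platform (assumption: the
 * original unsigned long is 4 bytes, as on IA32). */
struct ConnectionFixed {
    int16_t  id;           /* bytes 0-1  */
    int16_t  port;         /* bytes 2-3  */
    uint32_t addr;         /* bytes 4-7  */
    char     hostname[32]; /* bytes 8-39 */
};

/* Compile-time checks that the compiler chose the offsets in the table. */
_Static_assert(offsetof(struct ConnectionFixed, id) == 0, "id offset");
_Static_assert(offsetof(struct ConnectionFixed, port) == 2, "port offset");
_Static_assert(offsetof(struct ConnectionFixed, addr) == 4, "addr offset");
_Static_assert(offsetof(struct ConnectionFixed, hostname) == 8, "hostname offset");
_Static_assert(sizeof(struct ConnectionFixed) == 40, "record size");
```

If any assumption about the layout were wrong for a given compiler, the build would fail instead of silently reading fields from the wrong place.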

The following is an example of how an instance of the data structure would appear in memory, as presented by a tool that reads raw data:

0000000: 0100 0050 ad3d de09 7777 772e 766f 6c61  ...P.=..www.vola
0000010: 7469 6c69 7479 666f 756e 6461 7469 6f6e  tilityfoundation
0000020: 2e6f 7267 0000 0000                      .org....
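The following sketch applies the structure definition as a template over raw bytes such as these. The type and function names are hypothetical, and the byte orders are assumptions: id is read as little-endian, while port is read in network (big-endian) order, which the text does not state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONNECTION_SIZE ((size_t)40)  /* bytes 0-39, per Table 2-3 */

/* Decoded view of one record; a hypothetical type for this example. */
struct ParsedConnection {
    uint16_t id;
    uint16_t port;
    uint8_t  addr[4];
    char     hostname[33];  /* 32 bytes from the record plus a guaranteed NUL */
};

/* Decode the 40-byte record that starts at offset `base` inside `mem`.
 * Returns 0 on success, -1 if the record does not fit in the buffer. */
static int parse_connection(const uint8_t *mem, size_t mem_len, size_t base,
                            struct ParsedConnection *out)
{
    if (base > mem_len || mem_len - base < CONNECTION_SIZE)
        return -1;

    const uint8_t *rec = mem + base;
    out->id   = (uint16_t)(rec[0] | (rec[1] << 8));  /* little-endian (assumed) */
    out->port = (uint16_t)((rec[2] << 8) | rec[3]);  /* network order (assumed) */
    memcpy(out->addr, rec + 4, sizeof out->addr);    /* bytes 4-7 as stored */
    memcpy(out->hostname, rec + 8, 32);              /* bytes 8-39 */
    out->hostname[32] = '\0';
    return 0;
}
```

Applied to the 40 bytes shown above, the sketch yields an id of 1, a port of 80 under the network-order assumption, the address bytes ad 3d de 09, and the hostname www.volatilityfoundation.org.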

C-style structures are some of the most important data structures encountered when performing memory analysis. After you have determined the base address of a particular structure, you can leverage the definition of the structure as a template for extracting and interpreting the elements of the record. One thing that you must keep in mind is that the compiler may add padding to certain fields of the structure stored in memory. The compiler does this to preserve the alignment of fields and enhance CPU performance.

As an example, we can modify the structure type from Table 2-3 to remove the port field. If the program were then recompiled with a compiler configured to align on 32-bit boundaries, we would find the following data in memory:

0000000: 0100 0000 ad3d de09 7777 772e 766f 6c61  .....=..www.vola
0000010: 7469 6c69 7479 666f 756e 6461 7469 6f6e  tilityfoundation
0000020: 2e6f 7267 0000 0000                      .org....

In the output, you can see that despite removing the port field, the remaining fields (id, addr, hostname) are still found at the same offsets (0, 4, 8) as before. You will also notice that the bytes 2–3 (containing 0000), which previously stored the port value, are now used solely for padding.
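A small C sketch makes the padding visible. As before, the fixed-width types stand in for short and a 4-byte unsigned long, and the names ConnectionNoPort and padding_after_id are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* The record from Table 2-3 with the port field removed. */
struct ConnectionNoPort {
    int16_t  id;           /* bytes 0-1 */
                           /* bytes 2-3: padding inserted by the compiler */
    uint32_t addr;         /* bytes 4-7, aligned to a 4-byte boundary */
    char     hostname[32]; /* bytes 8-39 */
};

/* Number of padding bytes between the end of id and the start of addr. */
static size_t padding_after_id(void)
{
    return offsetof(struct ConnectionNoPort, addr)
         - (offsetof(struct ConnectionNoPort, id) + sizeof(int16_t));
}
```

With a compiler that aligns 4-byte integers on 4-byte boundaries, padding_after_id returns 2 and the structure still occupies 40 bytes. A template that ignores this padding would read addr from the wrong offset.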

As you will see in later chapters, the definition of a data structure and the constraints associated with the values that can be stored in a field can also be used as a template for carving possible instances of the structure directly from memory.

Strings

One of the most important storage concepts that you will frequently encounter is the string. The string is often considered a special case of an array, in which the stored elements are constrained to represent character codes taken from a predetermined character encoding. Similar to an array, a string is composed of a collection of <index, element> pairs. Programming languages often provide routines for string manipulation, which may change the mappings between indices and elements. While records and arrays are often treated as static data types, a string can contain a variable length sequence of elements that may not be known when an instance is created. Just as the mappings between indices and elements may change during processing, the size of the string may dynamically change as well. As a result, strings must provide a mechanism for determining the length of the stored collection.

The first implementation we are going to consider is the C-style string. C-style strings provide an implementation that is very similar to how the C programming language implements arrays. In particular, we will focus on C-style strings where the elements are of type char, a C basic data type, and are encoded using the ASCII character encoding. This encoding assigns a 7-bit numerical value (almost always stored in a full byte) to the characters in American English. The major difference between a C-style string and an array is that C-style strings implicitly maintain the string length. This is accomplished by demarcating the end of the string with an embedded string termination character. In the case of C-style strings, the termination character is the ASCII NUL character, which is 0x00. You can then calculate the length of the string by determining the number of characters between the string’s starting address and the termination character.

For example, consider the data structure discussed previously for storing network information whose structure type was presented in Table 2-3. The fourth element of the data structure used to store the hostname is a C-style string. Figure 2-5 shows how the string would be stored in memory using the ASCII character encoding. The string begins at byte offset 8 and continues until the termination character, 0x00, at offset 36. Thus you can determine that the string is used to store 28 symbols and the termination character.

Character:  w  w  w  .  v  o  l  a  t  i  l  i  t  y  f  o  u  n  d  a  t  i  o  n  .  o  r  g
Hex:       77 77 77 2e 76 6f 6c 61 74 69 6c 69 74 79 66 6f 75 6e 64 61 74 69 6f 6e 2e 6f 72 67 00
Offset:     8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

Figure 2-5: Hostname represented in ASCII as a C-style string
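The length calculation can be sketched in C as a scan for the terminator. The function name cstring_length is hypothetical. Unlike the standard strlen, this version takes the size of the available memory, because data recovered from a memory sample is not guaranteed to contain a terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan for the 0x00 terminator starting at offset `start`, never reading
 * past `mem_len`. On success, store the string length (excluding the
 * terminator) in `*len` and return 1. Return 0 if the buffer ends before a
 * terminator is found. */
static int cstring_length(const uint8_t *mem, size_t mem_len, size_t start,
                          size_t *len)
{
    for (size_t i = start; i < mem_len; i++) {
        if (mem[i] == 0x00) {
            *len = i - start;
            return 1;
        }
    }
    return 0;
}
```

For the record in Figure 2-5, a scan that starts at offset 8 stops at offset 36 and reports a length of 28.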

You may frequently encounter an alternative string implementation when analyzing memory associated with the Microsoft Windows operating system. We will refer to this implementation as the _UNICODE_STRING implementation because that is the name of its supporting data structure. Unlike the C-style string implementation that leverages the ASCII character encoding to store 1-byte elements, the _UNICODE_STRING supports Unicode encoding by allowing elements to be larger than a single byte and, as a result, provides support for language symbols beyond just American English.

Unless otherwise specified, the elements of a _UNICODE_STRING are encoded using the UTF-16 version of Unicode. This means characters are stored as either 2- or 4-byte values. Another difference from the C-style string is that a _UNICODE_STRING does not require a terminating character but instead stores the length explicitly. As previously mentioned, a _UNICODE_STRING is implemented using a structure that stores the length, in bytes, of the string (Length), the maximum number of bytes that can be stored in this particular string (MaximumLength), and a pointer to the starting memory address in which the characters are stored (Buffer). The format of the _UNICODE_STRING data structure is given in Table 2-4.

Table 2-4: Structure Type Definition for _UNICODE_STRING on 64-Bit Versions of Windows

Byte Range  Name           Type              Description
0–1         Length         unsigned short    Current string length
2–3         MaximumLength  unsigned short    Maximum string length
8–15        Buffer         unsigned short *  String address
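The layout in Table 2-4 can be read from raw bytes with a sketch like the one below. The names UnicodeStringInfo, parse_unicode_string64, and utf16le_to_ascii are hypothetical. The code assumes little-endian storage, treats bytes 4–7 as alignment padding, and the conversion routine handles only code units in the ASCII range, replacing everything else with '?'.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded header fields; a hypothetical type for this example. */
struct UnicodeStringInfo {
    uint16_t length;          /* bytes currently in use */
    uint16_t maximum_length;  /* bytes available in the buffer */
    uint64_t buffer;          /* virtual address of the characters */
};

/* Decode the 16-byte 64-bit layout from Table 2-4. Bytes 4-7 are skipped
 * because they are assumed to be alignment padding. */
static void parse_unicode_string64(const uint8_t rec[16],
                                   struct UnicodeStringInfo *out)
{
    out->length         = (uint16_t)(rec[0] | (rec[1] << 8));
    out->maximum_length = (uint16_t)(rec[2] | (rec[3] << 8));
    out->buffer = 0;
    for (int i = 7; i >= 0; i--)
        out->buffer = (out->buffer << 8) | rec[8 + i];
}

/* Convert `length` bytes of UTF-16LE text to ASCII, writing '?' for any
 * code unit outside the ASCII range. `dst` must hold length / 2 + 1 bytes.
 * No terminator is expected in `src`; the explicit length bounds the read. */
static void utf16le_to_ascii(const uint8_t *src, uint16_t length, char *dst)
{
    size_t units = length / 2;
    for (size_t i = 0; i < units; i++) {
        uint16_t unit = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));
        dst[i] = (unit < 0x80) ? (char)unit : '?';
    }
    dst[units] = '\0';
}
```

The header and the characters are read in two separate steps because Buffer holds a virtual address. The caller must first translate that address to find the bytes to pass to utf16le_to_ascii.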

Strings are an extremely important component of memory analysis, because they are used to store textual data (passwords, process names, filenames, and so on). While investigating an incident, you will commonly search for and extract relevant strings from memory. In these cases, the string elements can provide important clues as to what the data is and how it is being used. In some circumstances, you can leverage where a string is found (relative to other strings) to provide context for an unknown region of memory. For example, if you find strings related to URLs and the temporary Internet files folder, you may have found fragments of the Internet history log file.

It is also important to understand that the particular implementation of the strings can pose challenges, depending on the type of memory analysis being performed. For example, because C-style strings implicitly embed a terminating character that designates the end of the stored characters, strings are typically extracted by processing each character until the termination character is reached. Some challenges may arise if the termination character cannot be found. For example, when analyzing the physical address space of a system that leverages paged virtual memory, you could encounter a string that crosses a page boundary to a page that is no longer memory resident, which would require special processing or heuristics to determine the actual size of the string.

The _UNICODE_STRING implementation can also make physical address space analysis challenging. The _UNICODE_STRING data structure only contains metadata for the string (that is, its starting virtual memory address, size in bytes, and so on). Thus, if you cannot perform virtual address translation, then you cannot locate the actual contents of the string. Likewise, if you find the contents of a string through other means, you may not be able to determine its appropriate length, because the size metadata is stored separately.
