Some DOS Basics - The Little Black Book of Computer Viruses

To understand the means by which the virus copies itself from one program to another, we have to dig into the details of how the operating system, DOS, loads a program into memory and passes control to it. The virus must be designed so its code gets executed, rather than just the program it has attached itself to. Only then can it reproduce. Then, it must be able to pass control back to the host program, so the host can execute in its entirety as well.

When one enters the name of a program at the DOS prompt, DOS begins looking for files with that name and an extension of “COM”. If it finds one, it will load the file into memory and execute it. Otherwise DOS will look for files with the same name and an extension of “EXE” to load and execute. If no EXE file is found, the operating system will finally look for a file with the extension “BAT” to execute. Failing all three of these possibilities, DOS will display the error message “Bad command or file name.”
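The search order just described can be sketched in a few lines of Python. This is only an illustration of the COM, EXE, BAT precedence: the names `resolve_command` and `files_on_disk` are hypothetical, and the sketch ignores everything else DOS does when interpreting a command line.

```python
# Sketch of the search order described above: COM first, then EXE, then BAT.
# `resolve_command` and `files_on_disk` are hypothetical names for illustration.

SEARCH_ORDER = ("COM", "EXE", "BAT")

def resolve_command(name, files_on_disk):
    """Return the file that would be run for `name`, or None if nothing matches."""
    # DOS file names are not case sensitive, so compare in upper case.
    available = {f.upper() for f in files_on_disk}
    for ext in SEARCH_ORDER:
        candidate = f"{name.upper()}.{ext}"
        if candidate in available:
            return candidate
    return None

if __name__ == "__main__":
    disk = ["EDIT.EXE", "EDIT.COM", "SETUP.BAT"]
    for cmd in ("edit", "setup", "missing"):
        print(cmd, "->", resolve_command(cmd, disk) or "Bad command or file name")
```

With both EDIT.COM and EDIT.EXE present, the COM file wins, which is the precedence the text describes.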

EXE and COM files are directly executable by the Central Processing Unit. Of these two types of program files, COM files are much simpler. They have a predefined segment format which is built into the structure of DOS, while EXE files are designed to handle a user defined segment format, typical of very large and complicated programs. The COM file is a direct binary image of what should be put into memory and executed by the CPU, but an EXE file is not.

To execute a COM file, DOS must do some preparatory work before giving that program control. Most importantly, DOS controls and allocates memory usage in the computer. So first it checks to see if there is enough room in memory to load the program. If there is, DOS then allocates the memory required for the program. This step is little more than an internal housekeeping function. DOS simply records how much space it is making available for such and such a program, so it won’t try to load another program on top of it later, or give memory space to the program that would conflict with another program. Such a step is necessary because more than one program may reside in memory at any given time. For example, pop-up, memory resident programs can remain in memory, and parent programs can load child programs into memory, which execute and then return control to the parent.
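The housekeeping idea can be illustrated with a toy model. The sketch below is not how DOS actually tracks memory; `ToyMemoryMap` and its methods are hypothetical names, and the first-fit search is an assumption chosen for simplicity. It only shows the principle of recording what is reserved so that nothing is loaded on top of it.

```python
# Toy bookkeeping only. This is NOT the real DOS mechanism; it just shows
# "record what is reserved so nothing gets loaded on top of it".

class ToyMemoryMap:
    def __init__(self, total_bytes):
        self.total = total_bytes
        self.blocks = []  # (start, size, owner) tuples, kept sorted by start

    def allocate(self, size, owner):
        """First-fit search: return the start address, or None if nothing fits."""
        cursor = 0
        for start, length, _ in self.blocks:
            if start - cursor >= size:
                break               # the gap before this block is big enough
            cursor = start + length
        else:
            # No gap between blocks fit, so try the space after the last block.
            if self.total - cursor < size:
                return None
        self.blocks.append((cursor, size, owner))
        self.blocks.sort()
        return cursor

    def release(self, owner):
        """Forget every block recorded for `owner`."""
        self.blocks = [b for b in self.blocks if b[2] != owner]

if __name__ == "__main__":
    memory = ToyMemoryMap(1000)
    print(memory.allocate(400, "parent"))   # parent program
    print(memory.allocate(400, "child"))    # child loaded by the parent
    print(memory.allocate(400, "third"))    # not enough room left
```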

Next, DOS builds a block of memory 256 bytes long known as the Program Segment Prefix, or PSP. The PSP is a remnant of an older operating system known as CP/M. CP/M was popular in the late seventies and early eighties as an operating system for microcomputers based on the 8080 and Z80 microprocessors. In the CP/M world, 64 kilobytes was all the memory a computer had. The lowest 256 bytes of that memory was reserved for the operating system itself to store crucial data. For example, location 5 in memory contained a jump instruction to get to the rest of the operating system, which was stored in high memory, and its location differed according to how much memory the computer had. Thus, programs written for these machines would access the operating system functions by calling location 5 in memory. When PC-DOS came along, it imitated CP/M because CP/M was very popular, and many programs had been written to work with it. So the PSP (and whole COM file concept) became a part of DOS. The result is that a lot of the information stored in the PSP is of little use to a DOS programmer today. Some of it is useful though, as we will see a little later.

Offset (hex)   Size (bytes)   Description
0              2              Int 20H Instruction
2              2              Address of last allocated segment
4              1              Reserved, should be zero
5              5              Far call to DOS function dispatcher
A              4              Int 22H vector (Terminate program)
E              4              Int 23H vector (Ctrl-C handler)
12             4              Int 24H vector (Critical error handler)
16             22             Reserved
2C             2              Segment of DOS environment
2E             34             Reserved
50             3              Int 21H / RETF instruction
53             9              Reserved
5C             16             File Control Block 1
6C             20             File Control Block 2
80             128            Default DTA (command line at startup)
100            -              Beginning of COM program

Figure 2: Format of the Program Segment Prefix.
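As a cross-check on Figure 2, the layout can be written down as data and verified mechanically. The following Python sketch copies the offsets and sizes from the figure; the helper names `layout_is_contiguous` and `field_at` are hypothetical and exist only for this illustration.

```python
# PSP fields from Figure 2 as (offset, size_in_bytes, description).
PSP_FIELDS = [
    (0x00, 2,   "Int 20H instruction"),
    (0x02, 2,   "Address of last allocated segment"),
    (0x04, 1,   "Reserved, should be zero"),
    (0x05, 5,   "Far call to DOS function dispatcher"),
    (0x0A, 4,   "Int 22H vector (Terminate program)"),
    (0x0E, 4,   "Int 23H vector (Ctrl-C handler)"),
    (0x12, 4,   "Int 24H vector (Critical error handler)"),
    (0x16, 22,  "Reserved"),
    (0x2C, 2,   "Segment of DOS environment"),
    (0x2E, 34,  "Reserved"),
    (0x50, 3,   "Int 21H / RETF instruction"),
    (0x53, 9,   "Reserved"),
    (0x5C, 16,  "File Control Block 1"),
    (0x6C, 20,  "File Control Block 2"),
    (0x80, 128, "Default DTA (command line at startup)"),
]
PSP_SIZE = 0x100  # 256 bytes; the COM program begins right after this

def layout_is_contiguous(fields, total):
    """True if the fields follow one another with no gaps and fill `total` bytes."""
    expected = 0
    for offset, size, _ in fields:
        if offset != expected:
            return False
        expected = offset + size
    return expected == total

def field_at(offset):
    """Return the description of the PSP field that contains `offset`."""
    for start, size, description in PSP_FIELDS:
        if start <= offset < start + size:
            return description
    raise ValueError("offset lies outside the PSP")

if __name__ == "__main__":
    print(layout_is_contiguous(PSP_FIELDS, PSP_SIZE))
    print(field_at(0x05))
```

The contiguity check confirms that the fields in the figure account for all 256 bytes, so the first byte after the PSP is offset 100H.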

Once the PSP is built, DOS takes the COM file stored on disk and loads it into memory just above the PSP, starting at offset 100H. Once this is done, DOS is almost ready to pass control to the program. Before it does, though, it must set up the registers in the CPU to certain predetermined values. First, the segment registers must be set properly, or a COM program cannot run. Let’s take a look at the hows and whys of these segment registers.

In the 8088 microprocessor, all registers are 16-bit registers. The problem is that a 16-bit register will only allow one to address 64 kilobytes of memory. If you want to use more memory, you need more bits to address it. The 8088 can address up to one megabyte of memory using a process known as segmentation. It uses two registers to create a physical memory address that is 20 bits long instead of just 16. Such a register pair consists of a segment register, which contains the most significant bits of the address, and an offset register, which contains the least significant bits. The segment register points to a 16-byte block of memory, and the offset register tells how many bytes to add to the start of the 16-byte block to locate the desired byte in memory. For example, if the ds register is set to 1275 Hex and the bx register is set to 457 Hex, then the physical 20-bit address of the byte ds:[bx] is

    1275H x 10H = 12750H
                +   457H
                  ------
                  12BA7H

No offset should ever have to be larger than 15, but one normally uses values up to the full 64 kilobyte range of the offset register. This leads to the possibility of writing a single physical address in several different ways. For example, setting ds = 12BA Hex and bx = 7 would produce the same physical address 12BA7 Hex as in the example above. The proper choice is simply whatever is convenient for the programmer. However, it is standard programming practice to set the segment registers and leave them alone as much as possible, using offsets to range through as much data and code as one can (64 kilobytes if necessary).
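The address arithmetic above is easy to check in code. This Python sketch computes the physical address from a segment and offset and shows that the two register settings in the text name the same byte. The function names are hypothetical, and the final mask reflects the fact that the 8088 has only 20 address lines, a detail assumed here rather than taken from the text.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Multiplying by 10H is the same as shifting left by four bits.
    # Assumption: mask to 20 bits, since the 8088 has 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF

def smallest_offset_form(segment, offset):
    """Rewrite segment:offset so that the offset is no larger than 15."""
    address = physical_address(segment, offset)
    return address >> 4, address & 0xF

if __name__ == "__main__":
    print(hex(physical_address(0x1275, 0x457)))   # 0x12ba7
    print(hex(physical_address(0x12BA, 0x7)))     # 0x12ba7
    seg, off = smallest_offset_form(0x1275, 0x457)
    print(hex(seg), hex(off))                     # 0x12ba 0x7
```

Both calls give 12BA7 Hex, matching the worked example, and `smallest_offset_form` recovers the ds = 12BA Hex, bx = 7 form mentioned in the text.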

The 8088 has four segment registers, cs, ds, ss and es, which stand for Code Segment, Data Segment, Stack Segment, and Extra Segment, respectively. They each serve different purposes.

The cs register specifies the 64K segment where the actual program instructions which are executed by the CPU are located. The Data Segment is used to specify a segment to put the program’s data in, and the Stack Segment specifies where the program’s stack is located. The es register is available as an extra segment register for the programmer’s use. It might typically be used to point to the video memory segment, for writing data directly to video, etc.

COM files are designed to operate with a very simple but limited segment structure: namely, they have one segment, cs=ds=es=ss. All data is stored in the same segment as the program code itself, and the stack shares this segment. Since any given segment is 64 kilobytes long, a COM program can use at most 64 kilobytes for all of its code, data and stack. When PCs were first introduced, everybody was used to writing programs limited to 64 kilobytes, and that seemed like a lot of memory. However, today it is not uncommon to find programs that require several hundred kilobytes of code, and maybe as much data. Such programs must use a more complex segmentation scheme than the COM file format allows. The EXE file structure is designed to handle that complexity. The drawback with the EXE file is that the program code which is stored on disk must be modified significantly before it can be executed by the CPU. DOS does that at load time, and it is completely transparent to the user, but a virus that attaches to EXE files must not upset DOS during this modification process, or it won’t work. A COM program doesn’t require this modification process because it uses only one segment for everything. This makes it possible to store a straight binary image of the code to be executed on disk (the COM file). When it is time to run the program, DOS only needs to set up the segment registers properly and execute it.

The PSP is set up at the beginning of the segment allocated for the COM file, i.e. at offset 0. DOS picks the segment based on what free memory is available, and puts the PSP at the very start of that segment. The COM file itself is loaded at offset 100 Hex, just after the PSP. Once everything is ready, DOS transfers control to the beginning of the program by jumping to the offset 100 Hex in the code segment where the program was loaded. From there on, the program runs, and it accesses DOS occasionally, as it sees fit, to perform various I/O functions, like reading and writing to disk.
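The loading sequence can be modeled in a few lines. The Python sketch below is a simplified picture, not a description of DOS internals: `load_com` is a hypothetical name, only the first PSP field is filled in, the size check ignores the room a real program needs for its stack, and the segment value in the example is arbitrary.

```python
PSP_SIZE = 0x100         # the PSP occupies offsets 0 through FFH
SEGMENT_SIZE = 0x10000   # one 64 kilobyte segment

def load_com(image, segment):
    """Model the load: PSP at offset 0, image at 100H, one shared segment."""
    # Loose check only; a real program also needs stack space in the segment.
    if len(image) > SEGMENT_SIZE - PSP_SIZE:
        raise ValueError("image does not fit in one 64K segment")
    memory = bytearray(SEGMENT_SIZE)
    # Int 20H is encoded as the two bytes CD 20 (first PSP field in Figure 2).
    memory[0:2] = b"\xCD\x20"
    # The COM file is a straight binary image, copied unchanged to offset 100H.
    memory[PSP_SIZE:PSP_SIZE + len(image)] = image
    # All four segment registers point at the same segment; execution
    # starts at offset 100H (the instruction pointer, ip).
    registers = {"cs": segment, "ds": segment, "es": segment,
                 "ss": segment, "ip": PSP_SIZE}
    return memory, registers

if __name__ == "__main__":
    program = b"\xCD\x20"                   # a COM file that just does Int 20H
    memory, regs = load_com(program, 0x1275)
    print(memory[0x100:0x102].hex(), regs)
```

Nothing in the image is patched during the copy, which is the property that makes the COM format a direct binary image of what runs.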

When the program is done, it transfers control back to DOS, and DOS releases the memory reserved for that program and gives the user another command line prompt.
