
COM Program Operation

From the document Computer Viruses Black Book GIANT (pages 26-31)

When one enters the name of a program at the DOS prompt, DOS begins looking for files with that name and an extension of “COM”. If it finds one, it will load the file into memory and execute it. Otherwise DOS will look for files with the same name and an extension of “EXE” to load and execute. If no EXE file is found, the operating system will finally look for a file with the extension “BAT” to execute. Failing all three of these possibilities, DOS will display the error message “Bad command or file name.”
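The lookup order just described can be sketched in a few lines of Python. This is only a model of the COM, EXE, BAT extension order from the paragraph above; the function name `resolve_command` and the list of file names are hypothetical, and the sketch ignores everything else the real command interpreter does (for example, searching the path).

```python
def resolve_command(name, files_on_disk):
    """Return the file name that would be run for `name`, or None.

    Hypothetical helper: models only the COM -> EXE -> BAT extension order.
    """
    available = {f.upper() for f in files_on_disk}
    for extension in ("COM", "EXE", "BAT"):
        candidate = f"{name.upper()}.{extension}"
        if candidate in available:
            return candidate
    return None  # corresponds to "Bad command or file name"


# With no HOST.COM present, the EXE wins over the BAT file.
print(resolve_command("host", ["HOST.BAT", "HOST.EXE"]))
```

If a HOST.COM were added to the list, it would be chosen ahead of both other files.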

EXE and COM files are directly executable by the Central Processing Unit. Of these two types of program files, COM files are much simpler. They have a predefined segment format which is built into the structure of DOS, while EXE files are designed to handle a segment format defined by the programmer, typical of very large and complicated programs. The COM file is a direct binary image of what should be put into memory and executed by the CPU, but an EXE file is not.

To execute a COM file, DOS does some preparatory work, loads the program into memory, and then gives the program control. Up until the time when the program receives control, DOS is the program executing, and it is manipulating the program as if it were data. To understand this whole process, let’s take a look at the operation of a simple non-viral COM program which is the assembly language equivalent of hello.c, that infamous little program used in every introductory C programming course. Here it is:

        .model  tiny
        .code

        ORG     100H
HOST:
        mov     ah,9            ;prepare to display a message
        mov     dx,OFFSET HI    ;address of message
        int     21H             ;display it with DOS

        mov     ax,4C00H        ;prepare to terminate program
        int     21H             ;and terminate with DOS

HI      DB      'You have just released a virus! Have a nice day!$'

        END     HOST

Call it HOST.ASM. It will assemble to HOST.COM. This program will serve us well in this chapter, because we’ll use it as a host for virus infections.

Now, when you type “HOST” at the DOS prompt, the first thing DOS does is reserve memory for this program to live in. To understand how a COM program uses memory, it is useful to remember that COM programs are really a relic of the days of CP/M, an old disk operating system used by earlier microcomputers that used 8080 or Z80 processors. In those days, the processor could only address 64 kilobytes of memory and that was it. When MS-DOS and PC-DOS came along, CP/M was very popular. There were thousands of programs (many of them shareware) for CP/M and practically none for any other processor or operating system (excepting the Apple II). So both the 8088 and MS-DOS were designed to make porting the old CP/M programs as easy as possible. The 8088-based COM program is the end result.

In the 8088 microprocessor, all registers are 16 bit registers. A 16 bit register will only allow one to address 64 kilobytes of memory, just like the 8080 and Z80. If you want to use more memory, you need more bits to address it. The 8088 can address up to one megabyte of memory using a process known as segmentation. It uses two registers to create a physical memory address that is 20 bits long instead of just 16. Such a register pair consists of a segment register, which contains the most significant bits of the address, and an offset register, which contains the least significant bits. The segment register points to a 16 byte block of memory, and the offset register tells how many bytes to add to the start of the 16 byte block to locate the desired byte in memory. For example, if the ds register is set to 1275 Hex and the bx register is set to 457 Hex, then the physical 20 bit address of the byte ds:[bx] is

    1275H x 10H = 12750H
                +   457H
                  ------
                  12BA7H
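The same arithmetic can be checked with a short Python sketch. The function name `physical_address` is hypothetical; the final mask to 20 bits is an addition of this sketch, modelling the fact that the 8088 has only 20 address lines.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Multiplying by 10H is the same as shifting left by 4 bits.
    # The mask keeps the result within the 20-bit address space (assumption
    # of this sketch, not discussed in the text).
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1275, 0x457)))  # prints 0x12ba7
```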

No offset should ever have to be larger than 15, but one normally uses values up to the full 64 kilobyte range of the offset register.

This leads to the possibility of writing a single physical address in several different ways. For example, setting ds = 12BA Hex and bx = 7 would produce the same physical address 12BA7 Hex as in the example above. The proper choice is simply whatever is convenient for the programmer. However, it is standard programming practice to set the segment registers and leave them alone as much as possible, using offsets to range through as much data and code as one can (64 kilobytes if necessary). Typically, in 8088 assembler, the segment registers are implied quantities. For example, if you write the assembler instruction

mov ax,[bx]

when the bx register is equal to 7, the ax register will be loaded with the word value stored at offset 7 in the data segment. The data segment ds never appears in the instruction because it is automatically implied. If ds = 12BAH, then you are really loading the word stored at physical address 12BA7H.
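To make the implied segment concrete, here is a small Python model of the load performed by mov ax,[bx]. It is a sketch under stated assumptions: memory is a plain one-megabyte bytearray, the helper names `physical_address` and `load_word` are hypothetical, and the test value 1234H is made up. The word is read low byte first, as the 8088 stores it.

```python
def physical_address(segment, offset):
    """20-bit physical address for a segment:offset pair."""
    return ((segment << 4) + offset) & 0xFFFFF


def load_word(memory, segment, offset):
    """Model `mov ax,[bx]`: read a 16-bit word, low byte first."""
    low = memory[physical_address(segment, offset)]
    high = memory[physical_address(segment, (offset + 1) & 0xFFFF)]
    return low | (high << 8)


memory = bytearray(1 << 20)          # one megabyte of simulated memory
memory[0x12BA7] = 0x34               # low byte of the example word
memory[0x12BA8] = 0x12               # high byte of the example word

ax_a = load_word(memory, 0x12BA, 0x0007)   # ds = 12BAH, bx = 7
ax_b = load_word(memory, 0x1275, 0x0457)   # ds = 1275H, bx = 457H
print(hex(ax_a), hex(ax_b))
```

Both calls read the same two bytes, which illustrates that the two segment:offset pairs are simply different names for one physical location.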

The 8088 has four segment registers, cs, ds, ss and es, which stand for Code Segment, Data Segment, Stack Segment, and Extra Segment, respectively. They each serve different purposes. The cs register specifies the 64K segment where the actual program instructions which are executed by the CPU are located. The Data Segment is used to specify a segment to put the program’s data in, and the Stack Segment specifies where the program’s stack is located. The es register is available as an extra segment register for the programmer’s use. It might be used to point to the video memory segment, for writing data directly to video, or to the segment 40H where the BIOS stores crucial low-level configuration information about the computer.

COM files, as a carry-over from the days when there was only 64K memory available, use only one segment. Before executing a COM file, DOS sets all the segment registers to one value, cs=ds=es=ss. All data is stored in the same segment as the program code itself, and the stack shares this segment. Since any given segment is 64 kilobytes long, a COM program can use at most 64 kilobytes for all of its code, data and stack. And since segment registers are usually implicit in the instructions, an ordinary COM program which doesn’t need to access BIOS data, or video data, etc., directly need never fuss with them. The program HOST is a good example. It contains no direct references to any segment; DOS can load it into any segment and it will work fine.

The segment used by a COM program must be set up by DOS before the COM program file itself is loaded into this segment at offset 100H. DOS also creates a Program Segment Prefix, or PSP, in memory from offset 0 to 0FFH (See Figure 3.1).

Offset (hex)  Size (bytes)  Description
0H            2             Int 20H Instruction
2             2             Address of last allocated segment
4             1             Reserved, should be zero
5             5             Far call to Int 21H vector
A             4             Int 22H vector (Terminate program)
E             4             Int 23H vector (Ctrl-C handler)
12            4             Int 24H vector (Critical error handler)
16            22            Reserved
2C            2             Segment of DOS environment
2E            34            Reserved
50            3             Int 21H / RETF instruction
53            9             Reserved
5C            16            File Control Block 1
6C            20            File Control Block 2
80            128           Default DTA (command line at startup)
100           -             Beginning of COM program

Fig. 3.1: The Program Segment Prefix
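As an illustration of the layout in Figure 3.1, the following Python sketch pulls a few fields out of a 256-byte PSP image. The function name `parse_psp`, the dictionary keys, and the sample values written into the buffer are all hypothetical, chosen only for the example. The sketch assumes the usual 8088 convention that a 4-byte vector is stored as offset first, then segment, each low byte first.

```python
import struct

PSP_SIZE = 0x100  # the PSP occupies offsets 0 through 0FFH


def parse_psp(psp):
    """Extract a few fields of Figure 3.1 from a 256-byte PSP image."""
    if len(psp) < PSP_SIZE:
        raise ValueError("a PSP is 256 bytes long")
    fields = {"int20_instruction": bytes(psp[0x00:0x02])}
    (fields["last_allocated_segment"],) = struct.unpack_from("<H", psp, 0x02)
    for name, offset in (("int22_vector", 0x0A),
                         ("int23_vector", 0x0E),
                         ("int24_vector", 0x12)):
        vec_offset, vec_segment = struct.unpack_from("<HH", psp, offset)
        fields[name] = (vec_segment, vec_offset)   # stored as (segment, offset)
    (fields["environment_segment"],) = struct.unpack_from("<H", psp, 0x2C)
    fields["default_dta"] = bytes(psp[0x80:0x100])
    return fields


# Build a mostly empty PSP with made-up sample values.
psp = bytearray(PSP_SIZE)
psp[0x00:0x02] = b"\xCD\x20"                        # int 20H
struct.pack_into("<H", psp, 0x02, 0x9D00)           # last allocated segment
struct.pack_into("<HH", psp, 0x0A, 0x034F, 0x2185)  # int 22H vector
struct.pack_into("<H", psp, 0x2C, 0x2732)           # environment segment

info = parse_psp(psp)
print(info["int20_instruction"].hex(), hex(info["environment_segment"]))
```

Reading fields by fixed offset, rather than scanning, mirrors how the table is meant to be used: every entry lives at a known position relative to the start of the segment.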

The PSP is really a relic from the days of CP/M too, when this low memory was where the operating system stored crucial data for the system. Much of it isn’t used at all in most programs. For example, it contains file control blocks (FCB’s) for use with the DOS file open/read/write/close functions 0FH, 10H, 14H, 15H, etc. Nobody in their right mind uses those functions, though. They’re CP/M relics. Much easier to use are the DOS handle-based functions 3DH, 3EH, 3FH, 40H, etc., which were introduced in DOS 2.00. Yet it is conceivable these old functions could be used, so the needed data in the PSP must be maintained. At the same time, other parts of the PSP are quite useful. For example, everything after the program name in the command line used to invoke the COM program is stored in the PSP starting at offset 80H. If we had invoked HOST as

C:\HOST Hello there!

then the PSP would look like this:

2750:0000  CD 20 00 9D 00 9A F0 FE-1D F0 4F 03 85 21 8A 03  . ........O..!..
2750:0010  85 21 17 03 85 21 74 21-01 08 01 00 02 FF FF FF  .!...!t!........
2750:0020  FF FF FF FF FF FF FF FF-FF FF FF FF 32 27 4C 01  ............2'L.
2750:0030  45 26 14 00 18 00 50 27-FF FF FF FF 00 00 00 00  E&....P'........
2750:0040  06 14 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:0050  CD 21 CB 00 00 00 00 00-00 00 00 00 00 48 45 4C  .!...........HEL
2750:0060  4C 4F 20 20 20 20 20 20-00 00 00 00 00 54 48 45  LO      .....THE
2750:0070  52 45 21 20 20 20 20 20-00 00 00 00 00 00 00 00  RE!     ........
2750:0080  0E 20 48 65 6C 6C 6F 20-74 68 65 72 65 21 20 0D  . Hello there! .
2750:0090  6F 20 74 68 65 72 65 21-20 0D 61 72 64 0D 00 00  o there! .ard...
2750:00A0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:00B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:00C0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:00D0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:00E0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:00F0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

At 80H we find the value 0EH (14), which is the length of the command tail. The tail itself follows: “Hello there!” together with the space that preceded it on the command line and, in this dump, one trailing space, terminated by <CR>=0DH. Likewise, the PSP contains the address of the system environment (the segment stored at offset 2CH), which contains all of the “set” variables contained in AUTOEXEC.BAT, as well as the path which DOS searches for executables when you type a name at the command prompt. This path is a nice variable for a virus to get hold of, since it tells the virus where to find lots of juicy programs to infect.
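The command tail can be read back out of the PSP with a few lines of Python. This is a minimal sketch: the function name `command_tail` is hypothetical, and the sixteen bytes loaded at offset 80H are copied from the dump above, with the rest of the PSP left as zeros.

```python
def command_tail(psp):
    """Return the command tail stored at PSP offset 80H as a string."""
    length = psp[0x80]                      # count byte at 80H
    end = 0x81 + length
    if end >= len(psp):
        raise ValueError("length byte runs past the end of the PSP")
    if psp[end] != 0x0D:                    # tail must end with <CR>
        raise ValueError("command tail is not terminated by 0DH")
    return bytes(psp[0x81:end]).decode("ascii")


psp = bytearray(0x100)
# Bytes 2750:0080 through 2750:008F from the dump.
psp[0x80:0x90] = bytes.fromhex(
    "0E 20 48 65 6C 6C 6F 20 74 68 65 72 65 21 20 0D")

tail = command_tail(psp)
print(len(tail), repr(tail))  # prints 14 ' Hello there! '
```

Trusting the count byte and then checking for the carriage return catches a corrupted tail early, which is why the sketch verifies both rather than scanning for 0DH alone.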


The final step which DOS must take before actually executing the COM file is to set up the stack. Typically the stack resides at the very top of the segment in which a COM program resides (See Figure 3.2). The first two bytes on the stack are always set up by DOS so that a simple RET instruction will terminate the COM program and return control to DOS. (This, too, is a relic from CP/M.) These bytes are set to zero to cause a jump to offset 0, where the int 20H instruction is stored in the PSP. The int 20H returns control to DOS. DOS then sets the stack pointer sp to FFFE Hex, and jumps to offset 100H, causing the requested COM program to execute.
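The whole preparation can be modelled in Python as a sketch. Nothing here is DOS code: the names `prepare_com_segment` and `near_ret` are hypothetical, the segment is a plain 64 kilobyte bytearray, only the int 20H bytes of the PSP are filled in, and the size check is a simplification made for this example. The one-byte test program is a lone RET instruction (opcode C3H).

```python
SEGMENT_SIZE = 0x10000   # one 64 kilobyte segment
LOAD_OFFSET = 0x100      # COM image starts right after the PSP


def prepare_com_segment(image):
    """Lay out PSP start, program image and stack as described above."""
    # Simplified limit: room for the PSP below and the return word above.
    if len(image) > SEGMENT_SIZE - LOAD_OFFSET - 2:
        raise ValueError("image too large for a single 64K segment")
    segment = bytearray(SEGMENT_SIZE)
    segment[0x00:0x02] = b"\xCD\x20"                      # int 20H at offset 0
    segment[LOAD_OFFSET:LOAD_OFFSET + len(image)] = image
    sp = 0xFFFE
    segment[sp:sp + 2] = b"\x00\x00"                      # return address 0
    ip = LOAD_OFFSET                                      # execution starts here
    return segment, sp, ip


def near_ret(segment, sp):
    """Model a near RET: pop a 16-bit return address off the stack."""
    ip = segment[sp] | (segment[sp + 1] << 8)
    return ip, (sp + 2) & 0xFFFF


segment, sp, ip = prepare_com_segment(b"\xC3")   # program consisting of RET
ip_after_ret, sp_after_ret = near_ret(segment, sp)
print(hex(ip), hex(ip_after_ret),
      segment[ip_after_ret:ip_after_ret + 2].hex())  # prints 0x100 0x0 cd20
```

The RET lands on offset 0, where the int 20H bytes sit, which is the path back to DOS described in the paragraph above.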

OK, armed with this basic understanding of how a COM program works, let’s go on to look at the simplest kind of virus.
