• Aucun résultat trouvé

ERROR CODE LEVELS

Dans le document first if (Page 28-44)

BOOT - Bootstrap errors are errors by tests burnt into a bootstrap rem. The error code is displayed in hex on the board-edge LEDs.

INITIALIZATION- These errors are reported by the CTOS

in1tial1z~~10~ ro~~in~s and dre recorded in plain English on the

sys-::er.1 log.~ <{~C /Yl.Pi-0-t

CR.~SH ERRORS- These are serious errors detected by CTOS interrupt handlers and are recorded in the system log. Crashes cause an auto re-boot to occur if the keyswitch is in normal.

STAT~S ERRORS- These errors are not as severe as crash errors. CTOS logs status errors related to possible hardware malfunctions.

9.2

BOOTSTRAP ERRORS

INITIALIZATION ERRORS

When the operating system loads into a board., certain

i·ni t ial iation. routines are run. The routines ini ti aii ze and test, if possible and applicab_le, the following:

l. Memory above the operating system 2. Interrupt controller(s)

3. Timers

4. The boards ability to detect a bus time-out

s.

The boards ability to detect a parity error 6. Communications controllers.

If any of these routines detect an error, the error will be logged and the operating system will not be executed on that board. Initialization errors are l099ed in plain English.

Use the MPLOG command to check for initialization errors.

3

9.4

CRASH STATUS ERRORS

Certain hideous hardware errors can be detected by CTOS and are reported as crash status errors. These.errors·are re~orded in the system log and can be viewed by using the MPLOG command. If port. (software problem)

A processor board attempted to make a transfer to an existing I/O port or memory location but an acknowledge was not received due to a circuitry

failure. (hardware problem)

ERC 23 - An uncorrectable ~emery error has occurred.

The 80186 microprocessor fetched an illegal instruction.

ERC 27 - Divide overflow. A number was divided by zero.

When ERC 21 - 28 occur, the error log entry also contains (8) hex words that further describe the error.

· Field l - Hex equi val.ent of the decimal ERC.

Field 2 - Process number of the process that was running when the error occurred. This information is of little use when a hardware error has occurred.

Field 3 - Contents of the General Status Register. This register is loaded when a non-maskable interrupt occurs. The contents of this register can be decoded to determine specifics about the

crash status that was reported. The bl.ts in the General Status Register are described below.

Field 4 - Contents of the memory location that was read when a dou.ple bit error occured.

Field 5 - Contents of the remote slot register.

Field 6 - Contents of base register O. Base register 0 holds bits 16 - 31 of the off-board memory address.. In the case of ERC 23, "this field contains the contents of the ECC syndrome register.

Field 7 - Contents of the CS register following the execution of the instruction that caused the crash.

Field ~ - Contents of the IP register following the execution of the instruction that caused the crash.

s

9.6

GENERAL STATUS REGISTER non-existence memory location. The reference could have been initiated locally ·or by local OMA controller.

4 OFP l =This board is the designated file processor.

SAMPLE CRASH STATUS IN SYSTEM LOG ERC 22 Bus time-out CP in slot 74

(1) (2) (3) (4) (5) (6) (7) .(8)

Status words 0016H 0008H DBA3H OOOOH 0075H OOOOH 0301H 0009H

(l) 16 hexidecimal • 22 decimal.

(2) Failure occurred in process 8.

{3) Bit 10 ~Time outr Bit 6 - Cpu address.

Bit 1,0 - Address greater than or equal to COOOOh.

(4) Only applies _to ERC 23.

(5) Board in slot 75 was being referenced.

{6) Memory l:ower than address FFFFh (64K) was being referenced.

(7)/(8) Instruction proceeding the instruction at address 03010:0009 was being executed when the time-out occurred.

POSSIBLE CAUSES:

l. Board in slot 75 2. Board in slot 74 3. Programming error

7

9.8

S'l'ATUS CODES plain english descriptions and often status a!ld commantj. bytes.

The interpretation of these bytes exceeds the scope of this level co;::.:tiunications. These errors messages are also accom9ani·ea by status bytes and a plain english description of the error. The

SYSTEM TROUBLESHOarING ~OOULE

OBJECTIVE

Upon completion of this module, t~e participant will be able to isolate and repair MegaFrame and programmable terminals at the 'board level.

OVERVIEW

The MegaFrame has built in software features and indicators which will assist the ·troubleshooter in locating the failing logic

board, disk drive, or power supply. Familiarity with software cteration and conf iqurabitily will facilitate expedient isolation of problems.

SOFTWARE CONFIGURATION PROBLEMS

Due to the numerous software configuration files used by the MegaFrame system, it is always a good idea to suspect a software configuration problem when something in.the system was changed and it suddenly stopped working. Leaving blank lines in JCL files and forgetting to place carriage return after eac~ line are the most common cause of errors. These mistakes will often prevent the system from s~ccessfully booting.

HUMAN INDUCED PROBLEMS

l. Trying to use disk drives in an "A" enclosure with the

resistor pack removed.

7. "8" enclosures must have terminator in the first drive.

·l

7. Installing a Winchester drive that does not have all four drive selection straps jumpered.

8. Pulling out n cartridge disk before the indicator stops blinking.

9. Failing to terminate cluster lines propeTly.

10. Using a cluster cable on a fast cluster line that does not meet the required specificati9ns.

11. Running the cluster test on a PT that is still connected

to

the MegaFrame. This can seriously confuse the MegaFrame and other stations connected to the line. This problem can be resolved by re-bootstrapping the MegaFrame and associated workstations and terminals.

12. Plugging an RS-232 cable into the printer port when an

energized peripheral is connected to the other end. This can destroy the ICs that drive the printer channel. It is also possible to punch through the ICs and contaminate the ~ain

data bus. ·

13. Configuration file does not match printer characteristics.

14. Neglecting to connect an SC board to an SP board at J3 by using a private bus jumper.

ISOLATION TECHNIQUES

SIMPLIFICATION TECHNIQUE

Malfunctions are best isolated by using the simplification technique. In other words, a malfunctioning system should be finding problems because any error encountered during the

execution of the JCL entries will also be reported on the screen.

DISK ANO FILE l?ROCESSOR ISOLATION The FP Board won't boot from Dl !11

Is the FP malfunctioning ?

Is the Winchester drive malfunctioning.?

Are the Cables bad ?

Is the power supply producing the correct voltages?

Has the softw~re on the hard disk been d~stroyed?

Wish

r

had some 'parts to swap. ..

The situation described above offers many questions. The fastest solution to the problem is to exchange parts until the problem

BEFORE SIGNING A WINCHESTER'S DEATH CERTIFICATE, ONE SHOUL!') ATTEMPT TO RE-IN.ITI.:a.LIZE A~D RE-INSTALL SOFTWARE. IN ~ANY C:\SSS, THIS ACTION WILL RESOLVE THE PROBLE~L BE SURE TO ENTER THE

INFORMATION FROM THE !3AD SPOTS CARD ATTACHED TO TqE DRIVE.

3

If the FP has a serious malfunction, it will re~ort a bootstrap error code in its board edge light emitting diodes, thus

indicating that the FP is faulty.

As mentioned above, the system should be reduced to its minimum required configuration to allow deductive isolation of another failing part.

INTERMITTENT PROBLEMS

The information recorded in the system log is often helpful when isolating an intermittent error. For example, the ERC 22 (bus time out) error logs the slot number of the processor board being

~eferenced. With this information, the trouble could be isolated to one of two boards.

SYSTEM LOG

Using the MPLOG command and the error listings provided is the most expedient tec~nique for finding a problem.

MEGAFRA~E TROUBLE ISOLATION 2. Disconnected power indicator cable.

3. Bad power indicator.

3. Display assembly is malfunctioning.

s

#S

l. [sys]<sys>master.cnf file contains incorrect text.

2. The operation systems for the other boards are not on the

#8

~12

SYMPTOM

A board never tries to boot {does not shift LED's) . POSS!SLE CAUSES

l. The board is bad.

2. Bent pin on system bus connector.

3. Private bus jumper is incorrectly installed.

tl3

SYMPTOM

_Checking volume status on a drive causes an ERROR 300.

POSSIBLE CAUSES

l. Loose drive cables.

2. Loose connection between disk power supply and the drive.

3. Disk power supply fuse is blown.

4. Disk power supply AC connector is loose or bad.

5. Disk module is bad.

6. FP board is bad.

#14

SYMPTOM

.Checking volume status on drive causes an ERROR 216.

·POSSIBLE CAUSES

l. Loose or bad cables.

2. Drive has never been initialized by using the M!VOLU~E

command.

3. Diagnostics have been run on drive but the drive has not been re-initialized by using the ~!VOLUME command.

4. Disk module i$ bad.

5. FP board is bad.

#15

SYMPTO~

! 11 1 INTER.1•.UTTENT ERROR 301 11111 PROB.r...B LE C.a. USES

l. Bad spot information was not entered when the drive was initialized using the ~!VOLUME command.

2. Loose drive cables.

3. Disk drives are not terminated properly.

4. Disk module is bad.

Dans le document first if (Page 28-44)

Documents relatifs