Problem Areas

From the document Solving HP-UX Problems (pages 63-71)

Problems with HP-UX cluster operation can occur in the following areas:

• System Boot-up (cluster server)

• System Boot-up (clients)

• System Panics

• LAN Problems

• CDF Mix-ups

• Configuration/Clusterization Problems

System Boot-up Problems

For system boot-up problems that can apply to all systems (standalone systems, cluster servers, and cluster clients), see Chapter 5, "System Boot-Up Problems".

Troubleshooting problems with clients that won't boot

HP-UX cluster servers are disk-based systems and act very much like standalone systems during the boot-up process. HP-UX clients, on the other hand, do not have an attached disk from which to get their operating system. They must receive their kernel over the LAN from a cluster server. The daemon rbootd runs on the cluster server and handles boot requests from HP-UX client nodes. This adds some complexity to the boot-up process and a few more areas where things can go wrong.

4-4 HP-UX Cluster Problems

If you are having problems booting an HP-UX client node, check the following things:

• The client node is listed in the server's /etc/clusterconf file.

• The /etc/clusterconf file has the correct syntax (check this using the command ccck(1M)).

• The kernel parameters associated with HP-UX clusters are set up properly.

o For information on how the kernel parameters should be set, see Appendix A, "System Parameters", in the System Administration Tasks manual (see the section of the appendix called "Cluster Related Parameters").

o To view/modify how your cluster related kernel parameters are currently set:

1. Run sam.

2. Highlight ... and activate the ... control button.

3. Highlight ... and activate the ... control button. A list of configurable kernel parameters will be displayed.

4. From the "View" menu (on the menu bar), choose ... . A "Filter" panel will be displayed.

5. Set the "Operator" field for the item "Class" to contain "Matches".

... the menu bar).
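The first check in the list above (is the client listed in /etc/clusterconf?) can be scripted. The following is a minimal sketch, not a replacement for ccck(1M). It assumes that each entry in /etc/clusterconf is a colon-separated line whose third field is the cnode name, and that comment lines begin with "#". That field layout, the function name cnode_listed, and the cnode name in the usage comment are assumptions made for illustration; confirm the real layout on your system before relying on it.

```shell
#!/bin/sh
# cnode_listed FILE NAME
# Succeeds (exit status 0) if NAME appears as the third colon-separated
# field of a non-comment line in FILE; fails (exit status 1) otherwise.
# ASSUMPTION: the position of the cnode name is illustrative only.
cnode_listed() {
    awk -F: -v name="$2" '
        /^#/ { next }                 # skip comment lines
        $3 == name { found = 1 }      # third field assumed to hold the cnode name
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Usage on the cluster server (hpxyz is a hypothetical cnode name):
#   cnode_listed /etc/clusterconf hpxyz || echo "hpxyz is not listed"
```

A check like this only confirms that a name is present. It says nothing about whether the rest of the entry is valid, which is what ccck(1M) is for.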


Local Disk Boot-up Problem

If a disk (local to a client computer) has a boot area on it, the client computer may try to boot from the local disk instead of from the cluster server. For details on how to handle this, see "System Boots From Local Disk Instead of the Cluster Server" in Chapter 5, "System Boot-Up Problems".

System Panics

System panics are covered in Chapter 10, "System Panics" of this manual.

There are several conditions specific to HP-UX clusters that can cause system panics:

... none of the kernels running in the cluster have NFS.

If one node in a cluster has CD-ROM configured in its kernel, ...

LAN Problems

HP-UX clusters are implemented using a low-level protocol to pass information/messages over the LAN between the various cnodes in the cluster. Because HP-UX clusters are so heavily dependent on the LAN, they are vulnerable to many of the problems that can occur in LAN configurations.

Problems such as those listed here can cause HP-UX clusters not to function:

• Broken LAN cable

• Improperly terminated LAN cable (each end of the LAN must have a 50 ohm terminator or the LAN will not function properly, if at all)

• Extremely heavy LAN traffic

• Bad LAN connections/hardware

• Improper LAN configurations

CDF Mix-ups

If you use SAM to configure your cluster, you shouldn't run into problems in this area too often. If you manually create your own CDFs (for new programs, etc.), you might accidentally place the contents of a file in the wrong context of the CDF. For example, in a mixed cluster (consisting of Series 300/400 and Series 700 computers), you might have created a program (on one of the clients) that you want to make available to all of the Series 300/400 clients in the cluster. The context that you should use is the processor type. But, if you simply copy the executable to the CDF (as in the example below), autocreation will make the file /users/bin/proga+/yourcnodename. For information on autocreation, see Chapter 2, "Understanding Clusters" in the manual Managing Clusters of HP 9000 Computers.

cp a.out /users/bin/proga

Note: /users/bin/proga is a CDF.

This program will then be accessible only to the system yourcnodename and not to the other Series 300/400 computers in the cluster. The proper command to use is:

cp a.out /users/bin/proga+/HP-MC68020

If, due to a CDF mix-up, you attempt to execute a command that doesn't match the architecture of the system (for example, a command is compiled on a Series 700 computer and you try to use it on a Series 300/400 computer), you will see the error message "Executable file incompatible with hardware."

Some useful tools to help you locate/correct problems with CDFs are:

find(1) The "-hidden" option causes find to include elements of hidden directories (CDFs) in its search. The "-type H" option causes find to match on files that are CDF hidden directories.

file(1) The file command attempts to classify a file by examining its contents. It can usually identify (and display) which files are Series 300/400 files and which files are HP-PA (Series 700/800) files.

• Series 300/400 program files will be listed as "s200 executable"

• Series 700/800 program files will be listed as "s800 executable"

• Shell script files are usually listed as "commands text"

• Other text files (such as /etc/passwd) are usually listed as ...

... to directories containing CDFs), then the elements of the CDF are displayed (similar to showcdf).
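The file classifications listed above can be turned into a quick check when you suspect that an executable has landed in the wrong CDF element. The following is a minimal sketch that maps a description printed by file to the hardware family it indicates. It recognizes only the strings quoted in this section; the function name classify_file_output and the labels it prints are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# classify_file_output TEXT
# Maps a description printed by file(1) to the family named in this section.
# Only the three strings quoted in the text are recognized; anything else
# is reported as "unknown".
classify_file_output() {
    case "$1" in
        *"s200 executable"*) echo "Series 300/400" ;;
        *"s800 executable"*) echo "Series 700/800" ;;
        *"commands text"*)   echo "shell script" ;;
        *)                   echo "unknown" ;;
    esac
}

# Usage (the path is the example CDF element used earlier in this section):
#   classify_file_output "$(file /users/bin/proga+/HP-MC68020)"
```

If a file stored under a Series 300/400 context is reported as "Series 700/800", the element was probably copied into the wrong context.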


showcdf(1) Based on the current contents of a hidden directory and the context string for the computer where the command was executed, showcdf lists the name of the file (the element) within the hidden directory that matches the context of your computer. This is very helpful in determining which file within the CDF is being matched (if any) by other commands. Here are two examples. The first shows which element of the CDF /etc/inittab is being used by the computer where showcdf was executed. The second shows which element of the CDF /lib is being used by the computer where showcdf was executed.

showcdf /etc/inittab
/etc/inittab+/hpxyz

showcdf /lib
/lib+/HP-PA
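The selection that showcdf reports can be pictured with a small conceptual sketch. It assumes that a context is an ordered list of names (for example a cnode name, then a processor type, then a fallback) and that the first name with a matching element in the hidden directory wins. The real matching is done by the HP-UX kernel and may differ in detail; the function name resolve_cdf and the context values in the usage comment are hypothetical. On a system without CDF support, the "+" directory is just an ordinary directory, which is enough to try the idea out.

```shell
#!/bin/sh
# resolve_cdf DIR CONTEXT...
# Prints the path of the first element in DIR whose name equals one of the
# CONTEXT names, trying them in the order given. Fails (exit status 1) if
# no element matches, which is the case where the file appears not to exist.
# ASSUMPTION: "first match in context order" is a simplification.
resolve_cdf() {
    dir=$1
    shift
    for ctx in "$@"; do
        if [ -e "$dir/$ctx" ]; then
            printf '%s\n' "$dir/$ctx"
            return 0
        fi
    done
    return 1
}

# Usage (hypothetical context for a Series 700 cnode named hpxyz):
#   resolve_cdf /etc/inittab+ hpxyz HP-PA default
```

When the function fails, no element matches the given context. That corresponds to the situation in which a file seems to be missing even though the CDF exists.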


makecdf(1M) Converts a "normal" file to a CDF and allows you to specify the context for the contents of the original file. This is the safest way to manually create a CDF if you must do so. For a description of how to use makecdf, refer to the makecdf(1M) manual reference page and to Managing Clusters of HP 9000 Computers Sharing HP-UX File Systems.

CDF confusion can also occur when you have a file that has a "+" sign as the last character in its file name. This is legal because (as previously mentioned in the discussion of the ls command) the true indicator that a file is a CDF is that:

1. It has its SETUID bit set.

2. It is a directory.

To avoid confusion, it is best not to use "+" signs in your file names, especially as the last character.

A file can appear not to exist. This happens when the file is a CDF and no element of the CDF matches the context of the computer that you are running on. Using ls -H ensures that a CDF is always shown.

Configuration/Clustering Problems

Problems with init

In setting up an HP-UX cluster, there are many details that must be attended to. Most of these are handled for you by the SAM utility. During configuration, it is possible to make a mistake when entering LAN Link Level Addresses, IP addresses, and other information. If the information in the /etc/clusterconf file (created by SAM during cluster configuration) is incorrect, the init program on the cluster server will complain at boot time.

You might see a message similar to:

INIT: WARNING: LAN hardware inconsistent with /etc/clusterconf

The above error message indicates that the Link Level Address (LLA) listed in the file /etc/clusterconf does not match that of your LAN interface card.
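When you compare the address recorded in /etc/clusterconf with the one reported for the LAN card, differences in letter case or a leading "0x" can make two identical addresses look different. The following is a minimal sketch of a normalized comparison. It does not read either address from the system; how you obtain them is left to you. The function names normalize_lla and lla_matches and the sample addresses in the usage comment are hypothetical.

```shell
#!/bin/sh
# normalize_lla ADDRESS
# Prints ADDRESS in lowercase with any leading "0x" or "0X" removed.
normalize_lla() {
    printf '%s\n' "$1" | sed -e 's/^0[xX]//' | tr 'ABCDEF' 'abcdef'
}

# lla_matches A B
# Succeeds (exit status 0) if the two addresses are equal after normalization.
lla_matches() {
    [ "$(normalize_lla "$1")" = "$(normalize_lla "$2")" ]
}

# Usage (both addresses are made-up sample values):
#   lla_matches 0x080009ABCDEF 080009abcdef && echo "addresses agree"
```

If the two addresses still differ after normalization, the entry in /etc/clusterconf most likely contains a data-entry mistake of the kind described above.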


A system's context is set at boot time using the contents of the file /etc/clusterconf. If you remove or corrupt /etc/clusterconf on a running system, there will be no effect on the system's context until the next time that system is booted. You can verify that the entries in /etc/clusterconf have the correct syntax by using the command ccck(1M).

For detailed information on what the contents of /etc/clusterconf should look like, see Chapter 8, "Introduction to Cluster Administration" in the manual Managing Clusters of HP 9000 Computers.

Note: Record the exact wording of the error message and, based on its text, correct the problem as soon as possible. If you do not, error messages such as "invalid argument" might appear at clusterization time.

"Failed kernel self test" errors

• Failed kernel self test : Cannot allocate file system buffer.

• Failed kernel self test : Cannot allocate kernel network buffer.

• Failed kernel self test : Cannot allocate kernel message buffer.

• Failed kernel self test : Cannot invoke CSP.

These error conditions are probably caused by incorrect configuration of diskless kernel parameters, but they could also indicate that something is seriously wrong (such as a hardware failure). Check the log file /usr/adm/errlog for possible configuration problems. The directory /usr/adm is a CDF! Be sure to check the element of the CDF matching the context of your computer.

Note: After using SAM to configure the kernel on a client, be sure to reboot the client before configuring the kernel on another client.

If you fail to reboot the first client, the kernel that you made will be overwritten by the second client kernel information, and the first client kernel will not be installed.
