
Grid Infrastructure Component Directory Structure


Each Grid Infrastructure component maintains a separate log file and records information under both normal and critical circumstances. The information written to these log files assists in diagnosing and troubleshooting Clusterware component or cluster health-related problems. By examining the appropriate log files, the DBA can identify the root cause of frequent node evictions or other fatal Clusterware problems, as well as Clusterware installation and upgrade difficulties. In this section, we explain some of the important CRS logs that can be examined when various Clusterware issues occur.

alert<HOSTNAME>.log: Similar to a typical database alert log file, Oracle Clusterware maintains an alert log file under the $GRID_HOME/log/<hostname> location and posts messages whenever important events take place, such as when a cluster daemon process starts, when a process aborts or fails to start a cluster resource, or when a node eviction occurs. It also logs a message when a voting disk or OCR file becomes inaccessible on the node.

Whenever Clusterware confronts any serious issue, this should be the very first file the DBA examines for additional information about the problem. The error message also points to a trace file location where more detailed information is available to troubleshoot the issue.

Following are a few sample messages extracted from the alert log file, which illustrate the nature of events such as node eviction, CSSD termination, and the inability to auto-start the cluster:

[ohasd(10937)]CRS-1301:Oracle High Availability Service started on node rac1.

[/u00/app/12.1.0/grid/bin/oraagent.bin(11137)]CRS-5815:Agent '/u00/app/12.1.0/grid/bin/oraagent_oracle' could not find any base type entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:1:2} in /u00/app/12.1.0/grid/log/rac1/agent/ohasd/oraagent_oracle/oraagent_oracle.log.

[cssd(11168)]CRS-1713:CSSD daemon is started in exclusive mode

[cssd(11168)]CRS-1605:CSSD voting file is online: /dev/rdsk/oracle/vote/ln1/ora_vote_002; details in /u00/app/12.1.0/grid/log/rac1/cssd/ocssd.log.

[cssd(11052)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u00/app/12.1.0/grid/log/rac1/cssd/ocssd.log

[cssd(3586)]CRS-1608:This node was evicted by node 1, rac1; details at (:CSSNM00005:) in /u00/app/12.1.0/grid/log/rac2/cssd/ocssd.log.
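As a quick illustration, the following Python sketch scans the Clusterware alert log for the critical message codes shown in the preceding entries (CRS-1608, CRS-1656, and the CRS-1610/1611/1612 heartbeat warnings). This is an illustrative helper, not an Oracle utility; the GRID_HOME path and hostname are assumptions taken from the samples above and should be adjusted to your environment.

import os
import re

GRID_HOME = "/u00/app/12.1.0/grid"   # assumption: adjust to your installation
HOSTNAME = "rac1"                    # assumption: local node name

# Message codes of interest, taken from the sample alert log entries above.
CRITICAL = re.compile(r"CRS-(1608|1656|1610|1611|1612)")

alert_log = os.path.join(GRID_HOME, "log", HOSTNAME, "alert" + HOSTNAME + ".log")

with open(alert_log, errors="replace") as f:
    for line in f:
        if CRITICAL.search(line):
            print(line.rstrip())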

ocssd.log: The Cluster Synchronization Service daemon (CSSD) is undoubtedly one of the most critical components of Clusterware; its primary functionality includes node monitoring, group services management, lock services, and cluster heartbeats. The process maintains a log file named ocssd.log under the $GRID_HOME/log/<hostname>/cssd location and writes all important event messages to it. This is one of the busiest CRS log files and is written to continuously; when the debug level of the process is raised, the file captures more detailed information about the underlying issue. Before a node eviction takes place, the daemon writes warning messages to this log file. When a situation such as node eviction, voting disk access issues, or the inability of Clusterware to start up on the local node arises, it is strongly recommended that you examine this file to find out the reasons.

Following are a few sample entries of the log file:

2012-11-30 10:10:49.989: [CSSD][21]clssnmvDiskKillCheck: not evicted, file /dev/rdsk/c0t5d4 flags 0x00000000, kill block unique 0, my unique 1351280164

ocssd.l04:2012-10-26 22:17:26.750: [CSSD][6]clssnmvDiskVerify: discovered a potential voting file

ocssd.l04:2012-10-26 22:36:12.436: [CSSD][1]clssnmvDiskAvailabilityChange: voting file /dev/rdsk/c0t5d4 now online

ocssd.l04:2012-10-26 22:36:10.440: [CSSD][1]clssnmReadDiscoveryProfile: voting file discovery string(/dev/rdsk/c0t5d5,/dev/rdsk/c0t5d4)

2012-12-01 09:54:10.091: [CSSD][30]clssnmSendingThread: sending status msg to all nodes

ocssd.l01:2012-12-01 10:24:57.116: [CSSD][1]clssnmInitNodeDB: Initializing with OCR id 1484043234

[cssd(7335)]CRS-1612:Network communication with node rac2 (02) missing for 50% of timeout interval. Removal of this node from cluster in 14.397 seconds

2013-03-15 17:02:44.964 [cssd(7335)]CRS-1611:Network communication with node rac2 (02) missing for 75% of timeout interval. Removal of this node from cluster in 7.317 seconds

2013-03-15 17:02:50.024 [cssd(7335)]CRS-1610:Network communication with node rac2 (02) missing for 90% of timeout interval. Removal of this node from cluster in
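To illustrate how these heartbeat warnings can be tracked, here is a small Python sketch that extracts the "missing for N% of timeout interval" messages. The log path is an assumption based on the layout described above, and the pattern is derived from the CRS-1610/1611/1612 entries shown here.

import re

OCSSD_LOG = "/u00/app/12.1.0/grid/log/rac1/cssd/ocssd.log"   # assumption: adjust

# Pattern derived from the CRS-1610/1611/1612 warnings shown above.
PATTERN = re.compile(
    r"CRS-16(1[012]):Network communication with node (\S+).*"
    r"missing for (\d+)% of timeout interval"
)

with open(OCSSD_LOG, errors="replace") as f:
    for line in f:
        m = PATTERN.search(line)
        if m:
            print("node %s: %s%% of the timeout interval elapsed" % (m.group(2), m.group(3)))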

Oracle certainly doesn’t recommend removing the log file manually for any reason, as it is managed by Oracle automatically. Upon reaching a size of 50 MB, the file is automatically archived as ocssd.l01 in the same location as part of the predefined rotation policy, and a fresh log file (ocssd.log) is generated. Ten archived copies are kept for future reference in the same directory as part of the built-in log rotation and retention policy.

When the log file is removed before it reaches 50 MB, unlike the database alert.log file, Clusterware does not instantly generate a new log file. This is because, despite the removal of the file from the OS, the CSS process continues writing messages to the removed file until it becomes a candidate for the rotation policy. When the removed file reaches a size of 50 MB, a new log file is generated and becomes available; however, the messages written in the interim cannot be recovered.

When the CSSD fails to start up, or is reported to be unhealthy, refer to this file to ascertain the root cause of the problem.
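As a simple way to verify the rotation policy described above, the following Python sketch lists ocssd.log and its archived copies with their sizes. The cssd directory path is an assumption and should be adjusted to your GRID_HOME and hostname.

import glob
import os

CSSD_DIR = "/u00/app/12.1.0/grid/log/rac1/cssd"   # assumption: adjust

# Matches ocssd.log as well as the rotated copies ocssd.l01 .. ocssd.l10.
for path in sorted(glob.glob(os.path.join(CSSD_DIR, "ocssd.l*"))):
    size_mb = os.path.getsize(path) / (1024 * 1024)
    print("%-12s %6.1f MB" % (os.path.basename(path), size_mb))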

crsd.log: CRSD is another critical component of Clusterware; its primary functionality includes resource monitoring, resource failover, and managing the OCR. The process maintains a log file named crsd.log under the $GRID_HOME/log/<hostname>/crsd location and writes all important event messages to it. Whenever a cluster or non-cluster resource stops or starts, a failover action is performed, or any resource-related warning or communication error occurs, the relevant information is written to the file. When you face issues such as failures to start resources, the DBA can examine this file for information that could assist in resolving the issue.

Deleting the log file manually is not recommended, as it is managed and archived by Oracle automatically. The file is archived as crsd.l01 under the same location upon reaching a size of 10 MB, and a fresh log file (crsd.log) is generated. Ten archived copies are kept for future reference in the same directory.

When the CRSD fails to start up, or is unhealthy, refer to this file to find out the root cause of the problem.
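For example, the following Python sketch (an illustrative helper, not an Oracle utility) pulls the most recent crsd.log lines that mention a cluster resource (ora.*), which is usually where failed start or failover attempts surface. The path is an assumption matching the location described above.

from collections import deque

CRSD_LOG = "/u00/app/12.1.0/grid/log/rac1/crsd/crsd.log"   # assumption: adjust

# Keep only the last 50 resource-related lines instead of printing the whole file.
recent = deque(maxlen=50)
with open(CRSD_LOG, errors="replace") as f:
    for line in f:
        if "ora." in line:
            recent.append(line.rstrip())

print("\n".join(recent))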

ohasd.log: The Oracle High Availability Services daemon (OHASD), a new cluster stack component first introduced with 11gR2, manages and controls the rest of the cluster stack. Its primary responsibilities include managing the OLR; starting, stopping, and verifying cluster health status on the local and remote nodes; and supporting cluster-wide commands. The process maintains a log file named ohasd.log under the $GRID_HOME/log/<hostname>/ohasd location and writes all important event messages to it. Examine this file when you face issues running the root.sh script, when the ohasd process fails to start up, or in case of OLR corruption.

Oracle certainly doesn’t encourage deleting the log file for any reason, as it is managed by Oracle automatically. The file is archived as ohasd.l01 under the same location upon reaching a size of 10 MB, and a fresh log file (ohasd.log) is generated. As with crsd.log and ocssd.log, ten archived copies are kept for future reference in the same directory.

Following are a few sample entries from the log file:

2013-04-17 11:32:47.096: [ default][1] OHASD Daemon Starting. Command string :reboot

2013-04-17 11:32:47.125: [ default][1] Initializing OLR

2013-04-17 11:32:47.255: [ OCRRAW][1]proprioo: for disk 0 (/u00/app/12.1.0/grid_1/cdata/rac2.olr), id match (1), total id sets, need recover (0), my votes (0), total votes (0), commit_lsn (3118), lsn (3118)

2013-04-17 11:32:47.368: [ default][1] Loading debug levels . . .

2013-04-17 11:32:47.803: [ clsdmt][13]Creating PID [6401] file for home /u00/app/12.1.0/grid_1 host usdbp10 bin ohasd to /u00/app/12.1.0/grid_1/ohasd/init/
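When root.sh or OHASD startup fails, the OLR-related lines are usually the first place to look, and the following Python sketch filters them out of ohasd.log. The path is an assumption based on the location described above.

OHASD_LOG = "/u00/app/12.1.0/grid/log/rac1/ohasd/ohasd.log"   # assumption: adjust

# Print lines touching the OLR (including OCRRAW entries like those shown above).
with open(OHASD_LOG, errors="replace") as f:
    for line in f:
        if "OLR" in line or "OCRRAW" in line:
            print(line.rstrip())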

Upon successful execution of the ocrdump, ocrconfig, olsnodes, oifcfg, and ocrcheck commands, a log file is generated under the $GRID_HOME/log/<hostname>/client location. For details relevant to the EVM daemon (EVMD) process, look at the evmd.log file under the $GRID_HOME/log/<hostname>/evmd location. Cluster Health Monitor (CHM) and logger services logs are maintained under the $GRID_HOME/log/<hostname>/crfmond and crflogd directories.

Figure 2-4 depicts the hierarchy of the Clusterware component directory structure.

Figure 2-4. Unified Clusterware log directory hierarchy
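As a quick way to explore this hierarchy on a live system, the following Python sketch walks the per-node log root and reports the most recently modified log file in each component directory. The log root path is an assumption and should be adjusted to your GRID_HOME and hostname.

import os

LOG_ROOT = "/u00/app/12.1.0/grid/log/rac1"   # assumption: $GRID_HOME/log/<hostname>

for entry in sorted(os.listdir(LOG_ROOT)):
    subdir = os.path.join(LOG_ROOT, entry)
    if not os.path.isdir(subdir):
        continue
    logs = [f for f in os.listdir(subdir) if f.endswith(".log")]
    if logs:
        newest = max(logs, key=lambda f: os.path.getmtime(os.path.join(subdir, f)))
        print("%-12s -> %s" % (entry, newest))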

Operating system (OS) logs: Referring to the OS-specific log file is hugely helpful in identifying Clusterware startup and shutdown issues. Different platforms maintain these logs at different locations, as shown in the following list:

HP-UX - /var/adm/syslog/syslog.log

AIX - /bin/errpt -a

Linux - /var/log/messages

Solaris - /var/adm/messages

Windows - Refer to the .TXT log files under the Application/System logs using Windows Event Viewer
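The following Python sketch simply dispatches to the appropriate OS log for the current platform, based on the list above; the tail length is an illustrative choice, and on Windows you would still use Event Viewer.

import platform
import subprocess

# Platform-to-log mapping taken from the list above; the AIX entry is a
# command (errpt), not a file, so it is executed rather than read.
OS_LOGS = {
    "Linux": ["tail", "-n", "50", "/var/log/messages"],
    "SunOS": ["tail", "-n", "50", "/var/adm/messages"],
    "HP-UX": ["tail", "-n", "50", "/var/adm/syslog/syslog.log"],
    "AIX": ["/bin/errpt", "-a"],
}

cmd = OS_LOGS.get(platform.system())
if cmd:
    subprocess.run(cmd, check=False)
else:
    print("On Windows, review the Application/System logs in Event Viewer.")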

Caution

■ Oracle Clusterware generates and maintains zero-length socket files in the hidden '.oracle' directory under /etc or /var/tmp (depending on the platform). Removing these files as part of regular log cleanup, or removing them unintentionally, might lead to a cluster hang.
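A safe way to check (without touching) these socket files is sketched below in Python; the candidate directories are taken from the caution above, and the exact location varies by platform.

import os

# Candidate locations per the caution above; never delete what you find here.
for candidate in ("/var/tmp/.oracle", "/etc/.oracle"):
    if os.path.isdir(candidate):
        entries = os.listdir(candidate)
        print("%s: %d socket files (do NOT remove these)" % (candidate, len(entries)))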

Note

■ It is mandatory to maintain sufficient free space in the file system on which the Grid and RDBMS software are installed to prevent Clusterware issues; in addition, Oracle suggests not removing the logs manually.
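A minimal Python check of the free space on the file system holding the Grid Infrastructure home is shown below; the GRID_HOME path and the 5 GB threshold are illustrative assumptions, not Oracle-documented values.

import shutil

GRID_HOME = "/u00/app/12.1.0/grid"   # assumption: adjust to your installation

usage = shutil.disk_usage(GRID_HOME)
free_gb = usage.free / (1024 ** 3)
print("%s: %.1f GB free" % (GRID_HOME, free_gb))
if free_gb < 5:   # illustrative threshold
    print("WARNING: low free space may prevent Clusterware from writing its logs")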
