
4. SYSTEM INSTALLATION & TESTING

4.1. BUILDING A BEOWULF

Whilst this section is not intended to be a complete or comprehensive guide to building a Beowulf Cluster, it does list the salient points on the configuration process as well as the explicit configuration used for testing in the laboratory.

Building a Beowulf cluster requires a thorough working knowledge of the operating system including networking, file systems, system configuration, daemons and services. In relation to Linux this requires a working knowledge of up to 30 different configuration files and their individual format requirements to get a system up and running. This detailed configuration knowledge is learned through reading and implementing (by trial and error) the various How-Tos.

My knowledge and experience with Linux dates back to Redhat Linux 5.2 (October 1998); the documentation that proved particularly useful for building a cluster is noted in [11] and [38.1].

Linux Installation

Initially, the Redhat 6.2 and 7.1 (codename: Seawolf, Linux kernel version 2.4.7-2) distributions were tested. Both were found to contain many bugs, especially relating to NFS and EXT2.

As soon as the Redhat 7.2 distribution became available, all nodes in the cluster were upgraded to RH7.2, as it implements the 2.4.x kernel and the EXT3 file system, which offer greater stability and performance. The RH7.2 distribution also includes versions of LAM and PVM that can be installed as part of the initial installation.

For laboratory-testing purposes, the Linux Redhat 7.2 distribution will be used, implementing the 2.4.7-10 kernel.

It is of note that the 2.4.x series kernels do not yet implement NFS over TCP; instead UDP is used (the Linux default, as opposed to Sun Solaris, which uses TCP).

In total eight nodes were used, the specifications of which are shown in Figure 4-1.

In addition to the configuration detailed in Figure 4-1, the master server was installed with 128 MB of RAM, an additional network card for external cluster access, and a 2 GB hard drive for the /home directory, which is cross-mounted within the cluster.

Installation

The following process was undertaken in installing the system:

1. Install a CD-ROM drive on the master node (server node).

2. Install Linux Redhat 7.2 on the master node from the CD-ROM using a CD-ROM boot-image floppy (use the DOS utility rawrite from CD-ROM 1).
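
For reference, the boot floppy can also be written from an existing Linux machine with dd (the DOS rawrite utility prompts for the same image file and target drive); boot.img and bootnet.img are the images shipped in the images/ directory of CD-ROM 1:

    mount /mnt/cdrom
    dd if=/mnt/cdrom/images/boot.img of=/dev/fd0 bs=1440k      # CD-ROM install (this step)
    dd if=/mnt/cdrom/images/bootnet.img of=/dev/fd0 bs=1440k   # network install (step 8), on a separate floppy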

3. Configure using Linuxconf and start the required services using serviceconf. (Note: if serviceconf is not available, symbolic links to the run-level 3 and 5 start-up scripts must be created manually; however, I prefer to use a GUI for system-service operations, hence serviceconf was used.)

4. Create the following accounts to be used throughout the whole cluster:

Name        Password    Purpose
root        cluster     administration
beowulf     beowulf     cluster operation

The root account is local to each machine; however, the beowulf account is global (using the same group and user ID on every node), and hence a change made to this account on any node is immediately reflected across all nodes. Additionally, the beowulf home directory is located under /home/beowulf, which is exported from the NFS server, hence only one set of configuration files exists for this account. The beowulf account was configured to use csh (the C shell) for compatibility and ease of configuration (configuration files are required for POV-Ray, LAM and other parallel applications).
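
On each node the beowulf account can be created with a fixed user and group ID so that file ownership is consistent across the NFS mounts; the ID value 500 below is illustrative only:

    groupadd -g 500 beowulf
    useradd -u 500 -g beowulf -d /home/beowulf -s /bin/csh beowulf
    passwd beowulf
    # on client nodes add -M to useradd, as /home/beowulf is NFS-mounted from the master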

5. Start the NFS server, exporting /home and /mnt/cdrom to all nodes in the cluster.
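
A minimal /etc/exports on the master for this step might read as follows (the exact options are a matter of policy; run exportfs -ra after editing to re-export):

    /home       192.168.0.0/255.255.255.0(rw)
    /mnt/cdrom  192.168.0.0/255.255.255.0(ro)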

6. Ensure the NFS, RSH, NTP and TCP/IP services are working. Use rpcinfo -p to make sure the NFS server is working (rpc.portmap, rpc.mountd, rpc.nfsd, rpc.statd, rpc.lockd and rpc.rquotad should be listed).
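
On RH7.2 the same checks can be made with the service and chkconfig utilities, for example:

    rpcinfo -p                                  # RPC services registered with the portmapper
    service nfs status
    chkconfig --list | grep -E 'nfs|rsh|ntpd'   # confirm the services start at boot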

7. Copy the Linux Redhat 7.2 CDs (1 and 2) from the CD-ROM to /home.
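
One way to stage the discs (the /home/rh72 directory name is illustrative only):

    mkdir -p /home/rh72
    mount /mnt/cdrom
    cp -a /mnt/cdrom/. /home/rh72    # disc 1
    umount /mnt/cdrom
    # swap discs, then repeat the mount/copy/umount for disc 2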

8. Install Linux Redhat 7.2 on each node over the network using an NFS image and a network boot image floppy – do not add accounts other than the root account at this stage.

9. Configure using Linuxconf and start the required services using serviceconf (all required files can be configured with these utilities or manually; using these utilities is thoroughly recommended, however knowledge of the file formats is still required).

10. Add all nodes to the /etc/hosts file on every machine, such as:

    192.168.0.1    node1    # master server, NFS server and /home dir
    192.168.0.2    node2
    192.168.0.3    node3
    192.168.0.4    node4
    192.168.0.5    node5
    192.168.0.6    node6
    192.168.0.7    node7
    192.168.0.8    node8    # backup LAM server

11. DNS and NIS were not used, in order to reduce daemon overhead; however, they could be used in future clusters to ease maintenance and management.

12. Get Telnet up and running, as well as the Linuxconf web-access service, for remote management of all nodes. Linuxconf web access allows remote configuration through a web browser via port 98. For example:

http://192.168.0.x:98

13. Edit /etc/ntp.conf on all Beowulf clients to add the server address 192.168.0.1, and set the Beowulf master's clock to the correct time.
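
On each client, the relevant line in /etc/ntp.conf is simply:

    server 192.168.0.1

after which the clock can be stepped once with ntpdate 192.168.0.1 and the daemon started with service ntpd start.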

14. Cross-mount /home and /mnt/cdrom from the NFS server on each node of the system.
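
The corresponding /etc/fstab entries on each client are along these lines (node1 being the NFS server):

    node1:/home       /home       nfs   defaults   0 0
    node1:/mnt/cdrom  /mnt/cdrom  nfs   ro         0 0

After editing, mount -a brings both mounts up without a reboot.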

15. Remove the need for passwords between nodes in the cluster by placing a .rhosts file in the top level of the beowulf account's home directory.
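
The /home/beowulf/.rhosts file is simply a list of the trusted hosts, one per line, and must be owned by the beowulf user with restrictive permissions (chmod 600 .rhosts) or rsh will ignore it:

    node1
    node2
    node3
    node4
    node5
    node6
    node7
    node8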

16. Ensure rsh and rlogin access from the server to each node is possible.
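
A quick loop from the server, run as the beowulf user (whose shell is csh), confirms password-less access to every node:

    foreach i (1 2 3 4 5 6 7 8)
        rsh node$i hostname
    end

Each node should print its own host name without prompting for a password.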

17. Stop all unused daemons running on every node in the cluster. This includes lpd and sendmail for printer and email services respectively.
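
On RH7.2 a service is stopped and permanently disabled per node with, for example:

    service lpd stop ; chkconfig lpd off
    service sendmail stop ; chkconfig sendmail off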

18. Stop all applications that consume resources such as processing power and memory. Boot each node into run-level 3 (Multi-User/Text/Full Network).
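
The default run-level is set in /etc/inittab on each node:

    id:3:initdefault: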

19. Install LAM, if not installed in the initial installation. Configure the /etc/lam/lam-bhost.lam file to contain the node names on the system.
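
The boot schema is a plain list of host names, one per line (the same node1 through node8 list used in the .rhosts file above):

    node1
    node2
    # node3 through node7 likewise
    node8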

20. Log in to each node using a user account rather than the root account; LAM does not allow use of the root account, as a crash could destroy the system. (For testing purposes it is actually more convenient to log in to each node as root and to the server node as beowulf; the server logs in to each node using the beowulf operating account, even though passwords are not required.)

21. Start LAM using the lamboot command on the server [Refer to 8.5.1 for details].
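
Before booting, recon verifies that LAM can reach every node in the boot schema over rsh; lamboot then starts the LAM daemon on each one:

    recon -v /etc/lam/lam-bhost.lam
    lamboot -v /etc/lam/lam-bhost.lam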

Notes on the Installation:

The following notes are for historical reference purposes:

1. Linux's default implementation of NFS runs over UDP, an unreliable transport, rather than a reliable transport such as TCP. This is in contrast to Sun Solaris. Linux can be configured to use TCP; however, the 2.4.x series kernels do not provide support for this at the time of writing. Should reliability be a design requirement, the 2.2.x series kernels should be used, as in Redhat 6.2. Another benefit of using earlier distributions is their smaller footprint. [41]

2. Whilst the test installation was carried out over the network, nodes can also be cloned, which reduces the overall work involved in setting up a large cluster or adding additional nodes to one. Information on cloning nodes can be obtained at:

ftp://ftp.sci.usq.edu.au/pub/jacek/beowulf-utils/disk-less/ [11]

3. Whilst PVM was not recommended in this document for use as the lower layer of middleware, it does provide some noteworthy features, such as fault tolerance and an easy-to-use GUI, and it can be installed with Redhat 7.2.

All nodes in the cluster were located at the University of Technology in the ITS research laboratory in Building 1, Level 21 room 1/2122E. The Ethernet switch was located in room

Figure 4-2 – Test Cluster Computer in Lab B1/2122E
