Saturday, April 5, 2008

Ubuntu Cluster - Slave nodes

In this part we'll install and configure the other nodes. The master node runs a DHCP server and a PXE network-boot service that serves the netboot installer, so we just need to connect the network cable, a keyboard and a monitor to each node, power it on, and wait for the base system to install.

The installer offers options for partitioning the hda disk. We accept the defaults, which put the whole system in a single partition with the default swap size.

We add the general user beagle. After rebooting, log in, become root, set the root password, and install the SSH server:


sudo su -
passwd
apt-get install openssh-server

Repeat this on all the nodes. From here on, all steps are run from the master node.


SSH access


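Each user needs an SSH key pair to log in to any node without a password. First we create one for the general user beagle; because beagle's home is exported from the master, the same keys work on every node. Leave the passphrase empty when ssh-keygen asks for it: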

ssh-keygen -t rsa
cp .ssh/id_rsa.pub .ssh/authorized_keys
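Copying the public key over authorized_keys works the first time but discards any keys already there. A minimal sketch of a safer helper, under the hypothetical name `add_authorized_key`, appends the key only if it is not already present and sets the permissions sshd expects when its StrictModes check is on. It assumes an RSA key created with an empty passphrase (`ssh-keygen -t rsa -N ""`).

```bash
#!/bin/bash
# Hypothetical helper: add_authorized_key PUBKEY SSHDIR
# Appends the key in PUBKEY to SSHDIR/authorized_keys unless it is already
# there, creating SSHDIR if needed and tightening permissions for sshd.
add_authorized_key() {
    local pubkey=$1 sshdir=$2
    local auth="$sshdir/authorized_keys"

    mkdir -p "$sshdir"
    chmod 700 "$sshdir"
    touch "$auth"
    # -x matches whole lines, -F treats the key text as a fixed string
    if ! grep -qxF -f "$pubkey" "$auth"; then
        cat "$pubkey" >> "$auth"
    fi
    chmod 600 "$auth"
}

# Usage for the beagle user (empty passphrase so logins are password-free):
#   ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#   add_authorized_key ~/.ssh/id_rsa.pub ~/.ssh
```

Running it twice leaves a single copy of the key, so it is safe to repeat.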

Root also needs these keys, but root's home is not exported, so after creating the keys you must copy them to every node. This step asks for root's password on each node; don't worry, it will be the first and last time:

su -
ssh-keygen -t rsa
cp .ssh/id_rsa.pub .ssh/authorized_keys
for NODE in `cat /etc/machines`
do
ssh $NODE mkdir -p .ssh
scp .ssh/authorized_keys $NODE:.ssh/authorized_keys
done
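The loop above opens two connections per node (one to create .ssh, one to copy the file), so root's password is requested twice for each node. A possible alternative, sketched under the hypothetical name `push_root_key` and assuming /etc/machines holds one hostname per line, sends the key over a single ssh connection and appends it rather than overwriting. Ubuntu's openssh-client package also ships `ssh-copy-id`, which does a similar job for one host at a time.

```bash
#!/bin/bash
# Hypothetical helper: push_root_key MACHINES PUBKEY
# For every host listed in MACHINES, append PUBKEY to the remote
# ~/.ssh/authorized_keys over one ssh connection (one password prompt).
push_root_key() {
    local machines=$1 pubkey=$2 node
    for node in $(cat "$machines"); do
        echo "Installing key on $node"
        # umask 077 makes the remote .ssh directory 700 and the key file 600;
        # the key itself arrives on the remote command's stdin.
        ssh "$node" 'umask 077; mkdir -p .ssh && cat >> .ssh/authorized_keys' \
            < "$pubkey"
    done
}

# Usage, as root on the master node:
#   push_root_key /etc/machines /root/.ssh/id_rsa.pub
```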

All of the following steps must be done as root on each node.


Exporting HOME


We connect to the node, install the NFS client package, add the master node's /home to /etc/fstab, delete the old local home files, and mount it:

ssh nodeXX
apt-get install nfs-common
echo "197.1.1.1:/home /home nfs defaults,auto 0 0" >> /etc/fstab
rm -rf /home/*
mount -a
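If the echo is run twice on a node, for example when repeating the setup, /etc/fstab ends up with a duplicate /home line. A small guard, sketched as the hypothetical function `add_fstab_entry`, appends the NFS entry only when no uncommented line already uses the same mount point. It takes the fstab path as a parameter so it can be tried on a copy first.

```bash
#!/bin/bash
# Hypothetical helper: add_fstab_entry FSTAB ENTRY
# Appends ENTRY to FSTAB unless a non-comment line already uses the same
# mount point (the second field of ENTRY).
add_fstab_entry() {
    local fstab=$1 entry=$2 mnt
    mnt=$(echo "$entry" | awk '{print $2}')
    # awk exits 0 if the mount point is found, 1 otherwise
    if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m {found=1} END {exit !found}' "$fstab"; then
        echo "$mnt already in $fstab, not adding" >&2
        return 0
    fi
    echo "$entry" >> "$fstab"
}

# Usage on a node, as root:
#   add_fstab_entry /etc/fstab "197.1.1.1:/home /home nfs defaults,auto 0 0"
```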


Hosts adjustments


Edit /etc/hosts to include all the other nodes in the cluster:

197.1.1.1 beagle.local beagle
197.1.1.100 node00.local node00
197.1.1.101 node01.local node01
197.1.1.102 node02.local node02
197.1.1.103 node03.local node03
197.1.1.104 node04.local node04
197.1.1.105 node05.local node05
197.1.1.106 node06.local node06
197.1.1.107 node07.local node07
197.1.1.108 node08.local node08
197.1.1.109 node09.local node09
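Since the addresses follow a fixed pattern (nodeNN gets 197.1.1.1NN), the table can be generated instead of typed by hand. The hypothetical function `gen_hosts` below prints the master line plus one line per node, assuming numbering starts at node00 and addresses start at .100 as in the listing above.

```bash
#!/bin/bash
# Hypothetical helper: gen_hosts NET COUNT
# Prints /etc/hosts lines for the master (NET.1) and COUNT nodes
# (nodeNN at NET.(100+NN)), following the layout used in this cluster.
gen_hosts() {
    local net=$1 count=$2 i
    echo "$net.1 beagle.local beagle"
    for ((i = 0; i < count; i++)); do
        printf '%s.%d node%02d.local node%02d\n' "$net" $((100 + i)) "$i" "$i"
    done
}

# Usage from the master: append the entries locally and on every node
# (appending keeps each node's own 127.0.1.1 hostname line intact):
#   gen_hosts 197.1.1 10 >> /etc/hosts
#   for NODE in $(cat /etc/machines); do
#       gen_hosts 197.1.1 10 | ssh $NODE 'cat >> /etc/hosts'
#   done
```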


Install SGE


The SGE files are in /home/sgeadmin, which the nodes already see through the NFS-mounted /home. We install the dependencies, add the sgeadmin user, and run the execution-host installer:

apt-get install binutils
adduser sgeadmin
cd /home/sgeadmin
./install_execd
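Once install_execd has finished on every node, each one should appear in SGE's execution host list. A possible check, run on the master with the SGE environment loaded from its settings.sh, compares /etc/machines with the output of `qconf -sel` and prints any node that is missing. The function name `missing_exec_hosts` is hypothetical; short names are compared because SGE may report fully qualified names such as node00.local.

```bash
#!/bin/bash
# Hypothetical helper: missing_exec_hosts MACHINES
# Prints every host in MACHINES that does not appear in `qconf -sel`.
# Domain suffixes are stripped so node00 matches node00.local.
missing_exec_hosts() {
    local machines=$1 node registered
    registered=$(qconf -sel | sed 's/\..*//')
    for node in $(cat "$machines"); do
        if ! echo "$registered" | grep -qxF "${node%%.*}"; then
            echo "$node"
        fi
    done
}

# Usage on the master:
#   missing_exec_hosts /etc/machines
```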

Note: Check the sgeadmin UID and GID in /etc/passwd and /etc/group; they must be the same as on the master node.
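This comparison can be scripted. The sketch below, under the hypothetical name `uid_gid`, prints the UID:GID pair for a user from a passwd-format file, or from standard input when the file is `-`. Run on the master, it can compare the local values with `getent passwd` output fetched from each node over the root SSH keys set up earlier.

```bash
#!/bin/bash
# Hypothetical helper: uid_gid USER [PASSWD_FILE]
# Prints "UID:GID" for USER from a passwd-format file (default /etc/passwd);
# a PASSWD_FILE of "-" reads from standard input.
uid_gid() {
    awk -F: -v u="$1" '$1 == u {print $3 ":" $4}' "${2:-/etc/passwd}"
}

# Usage on the master, as root:
#   master_ids=$(uid_gid sgeadmin)
#   for NODE in $(cat /etc/machines); do
#       node_ids=$(ssh $NODE getent passwd sgeadmin | uid_gid sgeadmin -)
#       [ "$node_ids" = "$master_ids" ] || echo "$NODE: $node_ids (master: $master_ids)"
#   done
```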


Managing the nodes


Many administrative tasks are the same on every node, so we create a bash script, /sbin/cluster-fork, that runs a command on all of them:

#!/bin/bash
# cluster-fork COMMANDS
# Script to execute COMMANDS on all nodes listed in /etc/machines
# Juan Caballero @ Cinvestav 2008
for NODE in `cat /etc/machines`
do
echo "$NODE:"
ssh $NODE "$@"
done

After making the script executable (chmod +x /sbin/cluster-fork), we can run the same command on all nodes at once. The commands run without a terminal to answer prompts, so use non-interactive options; for example, to upgrade all the nodes:

cluster-fork apt-get -y update
cluster-fork apt-get -y upgrade
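cluster-fork visits the nodes one after another and does not report which ones failed. A possible variant, sketched under the hypothetical name `cluster_fork_parallel`, starts one ssh per node in the background, prefixes each output line with the node name, and lists the nodes whose command returned a non-zero status. It assumes the same /etc/machines file (overridable through a hypothetical MACHINES variable). For apt-get, setting DEBIAN_FRONTEND=noninteractive in the remote command also suppresses debconf prompts.

```bash
#!/bin/bash
# Hypothetical helper: cluster_fork_parallel COMMANDS
# Runs COMMANDS on every node in $MACHINES (default /etc/machines) at once,
# prefixes output with the node name, and reports the nodes that failed.
cluster_fork_parallel() {
    local machines=${MACHINES:-/etc/machines}
    local tmp node failed=""
    tmp=$(mktemp -d)
    for node in $(cat "$machines"); do
        (
            # stdin from /dev/null so parallel ssh sessions don't compete for it
            ssh "$node" "$@" < /dev/null 2>&1 | sed "s/^/$node: /"
            # PIPESTATUS[0] is ssh's exit status, not sed's
            echo "${PIPESTATUS[0]}" > "$tmp/$node"
        ) &
    done
    wait
    for node in $(cat "$machines"); do
        [ "$(cat "$tmp/$node")" = 0 ] || failed="$failed $node"
    done
    rm -rf "$tmp"
    if [ -n "$failed" ]; then
        echo "FAILED on:$failed" >&2
        return 1
    fi
}

# Usage, as root on the master:
#   cluster_fork_parallel 'DEBIAN_FRONTEND=noninteractive apt-get -y upgrade'
```

Output lines from different nodes may interleave, which is why each one carries its node name.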


Add users in the cluster


Any user added on the master node gets a home directory under /home, which is exported to the other nodes, but the account itself must also exist on each node, so run adduser there too. Remember to keep the same UID and GID on all nodes: if you added the users in the same order everywhere, they already match; if not, you must edit /etc/passwd and /etc/group on the nodes. And don't forget to create valid SSH keys for passwordless login.
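To avoid depending on the order in which users were added, one option is to read the new user's UID and GID on the master and create the account on each node with exactly those values. The sketch below, under the hypothetical name `replicate_user`, assumes Ubuntu's default of a per-user group named like the user. It passes `--no-create-home` because the home directory already comes from the master over NFS, and `--disabled-password` on the assumption that node logins rely on SSH keys. Keys for the new user are then created as for beagle.

```bash
#!/bin/bash
# Hypothetical helper: replicate_user USER [PASSWD_FILE] [MACHINES]
# Creates USER on every node with the UID/GID it has in PASSWD_FILE
# (default /etc/passwd), assuming a per-user group named like the user.
replicate_user() {
    local user=$1 passwd=${2:-/etc/passwd} machines=${3:-/etc/machines}
    local uid gid node
    uid=$(awk -F: -v u="$user" '$1 == u {print $3}' "$passwd")
    gid=$(awk -F: -v u="$user" '$1 == u {print $4}' "$passwd")
    if [ -z "$uid" ]; then
        echo "user $user not found in $passwd" >&2
        return 1
    fi
    for node in $(cat "$machines"); do
        # Group first, then the user with matching IDs; no local home, no password
        ssh "$node" "addgroup --gid $gid $user && adduser --uid $uid --gid $gid --no-create-home --disabled-password --gecos '' $user"
    done
}

# Usage on the master, after running `adduser alice` there (alice is an example name):
#   replicate_user alice
```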


Finally, you have an HPC cluster running Ubuntu Linux, and many of these steps apply to other Linux distributions with few changes. Next, I want to run performance tests to compare this cluster with the others we have. Maybe later I'll post some photos.
