Ubuntu Cluster - Master node
This time I want to show how to build an HPC (High Performance Computing) cluster using Ubuntu Linux.
Before we begin
My institute (Cinvestav - www.ira.cinvestav.mx) asked me to build a small cluster for general work; it will also serve as a learning system for bioinformatics. The cluster is named Beagle: the institute provided the hardware, and my friend LuisD and I did the hard work. The design includes a master node for logging in, checking status, and submitting jobs, plus 10 slave nodes. /home is exported over NFS, the job queue is managed by Sun Grid Engine, Ganglia monitors the systems, and MPI is supported.
Hardware
Install the Master Node
Our hardware is amd64, so we used Ubuntu Desktop for amd64: we downloaded the ISO file, burned it to a CD, and installed it with this partition layout:
hda1 : 180 MB : /boot
hda2 : 2.0 GB : swap
hda4 : 24 GB : /
hda5 : 4.6 GB : /tftpboot
hda6 : 22 GB : /var
hda7 : 22 GB : /usr
sda1 : 2.0 GB : swap
sda2 : 292 GB : /home
Finish the installation, reboot, log in, and open a terminal to become root and set a root password:
sudo su -
passwd
DHCP server
We set up a DHCP server to give the slave nodes their network configuration:
apt-get install dhcp3-server
Now we edit /etc/dhcp3/dhcpd.conf to declare the cluster network (the nodes use addresses in 197.1.1.0/24), add each machine's MAC address with its hostname and fixed IP, and point the clients to a network boot loader (PXE). Our file looks like this:
# dhcpd.conf
# Network for the Beagle Cluster
# Juan Caballero @ Cinvestav 2008
ddns-update-style none;
subnet 197.0.0.0 netmask 255.0.0.0 {
default-lease-time 1200;
max-lease-time 1200;
option routers 197.1.1.1;
option subnet-mask 255.0.0.0;
option domain-name "local";
option domain-name-servers 197.1.1.1;
option nis-domain "beagle";
option broadcast-address 197.255.255.255;
deny unknown-clients;
allow booting;
allow bootp;
if (substring (option vendor-class-identifier, 0, 20)
= "PXEClient:Arch:00002") {
# ia64
filename "elilo.efi";
next-server 197.1.1.1;
} elsif ((substring (option vendor-class-identifier, 0, 9)
= "PXEClient") or
(substring (option vendor-class-identifier, 0, 9)
= "Etherboot")) {
# i386 and x86_64
filename "pxelinux.0";
next-server 197.1.1.1;
} else {
filename "/install/sbin/kickstart.cgi";
next-server 197.1.1.1;
}
host beagle.local {
hardware ethernet 00:e0:7d:b4:e1:13;
option host-name "beagle.local";
fixed-address 197.1.1.1;
}
host node00.local {
hardware ethernet 00:1b:b9:e2:0d:18;
option host-name "node00.local";
fixed-address 197.1.1.100;
}
host node01.local {
hardware ethernet 00:1b:b9:e1:cf:6a;
option host-name "node01.local";
fixed-address 197.1.1.101;
}
host node02.local {
hardware ethernet 00:1b:b9:e1:be:6e;
option host-name "node02.local";
fixed-address 197.1.1.102;
}
host node03.local {
hardware ethernet 00:1b:b9:cf:f3:55;
option host-name "node03.local";
fixed-address 197.1.1.103;
}
host node04.local {
hardware ethernet 00:1b:b9:e2:14:06;
option host-name "node04.local";
fixed-address 197.1.1.104;
}
host node05.local {
hardware ethernet 00:1b:b9:ce:85:9a;
option host-name "node05.local";
fixed-address 197.1.1.105;
}
host node06.local {
hardware ethernet 00:1b:b9:e2:0c:5f;
option host-name "node06.local";
fixed-address 197.1.1.106;
}
host node07.local {
hardware ethernet 00:1b:b9:cf:f7:29;
option host-name "node07.local";
fixed-address 197.1.1.107;
}
host node08.local {
hardware ethernet 00:1b:b9:cf:f3:25;
option host-name "node08.local";
fixed-address 197.1.1.108;
}
host node09.local {
hardware ethernet 00:1b:b9:e2:14:9f;
option host-name "node09.local";
fixed-address 197.1.1.109;
}
}
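The ten host blocks differ only in name, MAC address, and IP, so typing them by hand invites mistakes. A minimal sketch, assuming a hypothetical input file with one `name mac ip` triple per line; `gen_dhcp_hosts` is our own illustration and not part of dhcp3-server. It prints host declarations like the ones above:

```shell
#!/usr/bin/env bash
# gen_dhcp_hosts: hypothetical helper (not part of dhcp3-server).
# Reads "name mac ip" lines from stdin and prints one dhcpd.conf
# host declaration per line. Blank lines and '#' comments are skipped.
gen_dhcp_hosts() {
    local name mac ip
    while read -r name mac ip; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        printf 'host %s {\n' "$name"
        printf '    hardware ethernet %s;\n' "$mac"
        printf '    option host-name "%s";\n' "$name"
        printf '    fixed-address %s;\n' "$ip"
        printf '}\n'
    done
}

# Example (nodes.txt is an assumed file name):
#   gen_dhcp_hosts < nodes.txt
```

The output belongs inside the subnet block, before its closing brace, as in the file above.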
In /etc/default/dhcp3-server we set the network interface on which the DHCP server listens:
INTERFACES="eth1"
Now we can restart the service:
/etc/init.d/dhcp3-server restart
More network details
Edit /etc/hosts to include all the nodes; our file looks like this:
127.0.0.1 localhost
197.1.1.1 beagle.local beagle
197.1.1.100 node00.local node00
197.1.1.101 node01.local node01
197.1.1.102 node02.local node02
197.1.1.103 node03.local node03
197.1.1.104 node04.local node04
197.1.1.105 node05.local node05
197.1.1.106 node06.local node06
197.1.1.107 node07.local node07
197.1.1.108 node08.local node08
197.1.1.109 node09.local node09
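Because the node addresses follow a fixed pattern (nodeNN gets 197.1.1.1NN), these entries can also be generated. A minimal sketch; `gen_hosts` is a hypothetical helper and assumes fewer than 100 nodes numbered from 00:

```shell
#!/usr/bin/env bash
# gen_hosts: hypothetical helper that prints /etc/hosts lines for
# node00..node(N-1) using the 197.1.1.1NN addressing shown above.
# Assumes fewer than 100 nodes.
gen_hosts() {
    local count=$1 i
    for ((i = 0; i < count; i++)); do
        printf '197.1.1.%d node%02d.local node%02d\n' $((100 + i)) "$i" "$i"
    done
}

# Example (as root):
#   gen_hosts 10 >> /etc/hosts
```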
We also create a text file, /etc/machines, listing the names of all slave nodes; we use it in scripts:
node00
node01
node02
node03
node04
node05
node06
node07
node08
node09
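A typical use of /etc/machines is a loop that runs the same command on every node over SSH. A minimal sketch, assuming passwordless SSH from the master to every node (not covered here); `for_each_node` and the `RUNNER` variable are our own illustration, and `RUNNER=echo` gives a dry run:

```shell
#!/usr/bin/env bash
# for_each_node: hypothetical helper. Runs "$RUNNER <node> <command...>"
# for every node listed in the given machines file.
# RUNNER (hypothetical) defaults to ssh; set RUNNER=echo for a dry run.
# Blank lines and '#' comments in the file are ignored.
for_each_node() {
    local file=$1; shift
    local node
    while read -r node; do
        case "$node" in
            ''|'#'*) continue ;;
        esac
        # Read stdin from /dev/null so ssh does not swallow the rest
        # of the machines file (same effect as ssh -n).
        "${RUNNER:-ssh}" "$node" "$@" < /dev/null
    done < "$file"
}

# Example:
#   for_each_node /etc/machines uptime
```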
NFS server
Install the packages:
apt-get install nfs-common nfs-kernel-server
Edit /etc/exports to export /home and /tftpboot:
/home 197.1.1.0/24(rw,no_root_squash,sync,no_subtree_check)
/tftpboot 197.1.1.0/24(rw,no_root_squash,sync,no_subtree_check)
Apply the exports with:
exportfs -av
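If this step is scripted, it helps to avoid appending the same export twice when the script is run again. A minimal sketch of an idempotent edit; `add_export` is a hypothetical helper, and after changing the file you still re-export as root (for example with `exportfs -ra`):

```shell
#!/usr/bin/env bash
# add_export: hypothetical helper that appends "DIR NET(OPTIONS)" to an
# exports file unless an entry for DIR already exists. The default
# options are the ones used above.
add_export() {
    local file=$1 dir=$2 net=$3
    local opts=${4:-rw,no_root_squash,sync,no_subtree_check}
    if grep -q "^${dir}[[:space:]]" "$file" 2>/dev/null; then
        echo "$dir already exported in $file" >&2
        return 0
    fi
    printf '%s %s(%s)\n' "$dir" "$net" "$opts" >> "$file"
}

# Example (as root):
#   add_export /etc/exports /home 197.1.1.0/24
#   add_export /etc/exports /tftpboot 197.1.1.0/24
#   exportfs -ra
```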
PXE boot server
Install tftpd-hpa:
apt-get install tftpd-hpa
Edit /etc/default/tftpd-hpa to include:
#Defaults for tftpd-hpa
RUN_DAEMON="yes"
OPTIONS="-l -s /tftpboot"
Now we download the Ubuntu netboot images for amd64 into /tftpboot:
cd /tftpboot
wget http://tezcatl.fciencias.unam.mx/ubuntu/dists/gutsy/main/installer-amd64/current/images/netboot/netboot.tar.gz
tar zxvf netboot.tar.gz
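The `filename "pxelinux.0"` option in dhcpd.conf is resolved relative to the TFTP root given with `-s /tftpboot`, so the boot loader must sit directly under /tftpboot once the archive is unpacked. A small sanity check, as a sketch; `check_tftp_root` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# check_tftp_root: hypothetical sanity check. Verifies that the boot
# loader named in dhcpd.conf and the PXELINUX config directory exist
# in the TFTP root. -e and -d also follow symbolic links.
check_tftp_root() {
    local root=${1:-/tftpboot} status=0
    if [ ! -e "$root/pxelinux.0" ]; then
        echo "missing: $root/pxelinux.0" >&2
        status=1
    fi
    if [ ! -d "$root/pxelinux.cfg" ]; then
        echo "missing: $root/pxelinux.cfg/" >&2
        status=1
    fi
    return $status
}

# Example:
#   check_tftp_root /tftpboot && echo "TFTP root looks fine"
```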
Restart the service:
/etc/init.d/tftpd-hpa restart
SGE server
For Sun Grid Engine we add a user named sgeadmin, download and unpack the common and amd64 binary archives, and run the installation script; most options can be left at their defaults:
adduser sgeadmin
wget http://gridengine.sunsource.net/download/SGE61/ge-6.1u3-common.tar.gz
wget http://gridengine.sunsource.net/download/SGE61/ge-6.1u3-bin-lx24-amd64.tar.gz
tar zxvf ge-6.1u3-common.tar.gz
tar zxvf ge-6.1u3-bin-lx24-amd64.tar.gz
./install_qmaster
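Once the qmaster (and, in the next part, the execution hosts) are running, work is sent to the queue as a shell script whose `#$` lines carry qsub options, submitted with `qsub`. A minimal sketch; `make_job_script`, the job name, and the message are our own illustration:

```shell
#!/usr/bin/env bash
# make_job_script: hypothetical helper that writes a minimal SGE job
# script. Lines starting with "#$" are read by qsub as options and are
# ordinary comments to bash.
make_job_script() {
    cat > "$1" <<'EOF'
#!/bin/bash
#$ -N hello_beagle
#$ -S /bin/bash
#$ -cwd
#$ -j y
# -N: job name, -S: shell, -cwd: run in the submit directory,
# -j y: merge stderr into stdout.
echo "Job ${JOB_ID:-none} running on $(hostname)"
EOF
    chmod +x "$1"
}

# Example (as a normal user, in a directory under /home):
#   make_job_script hello.sh
#   qsub hello.sh
#   qstat
```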
Web server
Install Apache:
apt-get install apache2
Ganglia monitor
First we install the dependencies, then download and compile Ganglia:
apt-get install rrdtool librrds-perl librrd2-dev php5-gd
wget -O ganglia-3.0.7.tar.gz "http://downloads.sourceforge.net/ganglia/ganglia-3.0.7.tar.gz?modtime=1204128965&big_mirror=0"
tar zxvf ganglia-3.0.7.tar.gz
cd ganglia-3.0.7
./configure --enable-gexec --with-gmetad
make
mkdir /var/www/ganglia
cp -r web/* /var/www/ganglia
Edit the Apache configuration in /etc/apache2/sites-enabled/000-default so that Ganglia can be reached through the web server.
We also install the Ganglia monitoring daemon (gmond) and meta daemon (gmetad) packages:
apt-get install ganglia-monitor gmetad
To change the default configuration, edit /etc/gmond.conf and /etc/gmetad.conf.
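For example, gmetad.conf lists the clusters it polls with `data_source "name" host` lines. A minimal sketch that gives the cluster the name Beagle; `set_data_source` is a hypothetical helper, and it relies on GNU `sed -i`:

```shell
#!/usr/bin/env bash
# set_data_source: hypothetical helper. Comments out every active
# data_source line in a gmetad.conf-style file and appends a new one.
# Arguments: FILE NAME HOST[:PORT]...  (uses GNU sed -i)
set_data_source() {
    local file=$1 name=$2; shift 2
    sed -i 's/^[[:space:]]*data_source/# &/' "$file"
    printf 'data_source "%s" %s\n' "$name" "$*" >> "$file"
}

# Example (as root), then restart gmetad:
#   set_data_source /etc/gmetad.conf "Beagle" localhost
```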
Other programs
Install any other software with apt-get or by compiling it manually. For example, we added an SSH server, the basic compilers, and MPI support:
apt-get install openssh-server gcc g++ g77 mpich-bin openmpi-bin lam-runtime
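With Open MPI, the list of nodes can be passed to `mpirun` as a hostfile. A minimal sketch that builds one from /etc/machines; `gen_hostfile` and `my_mpi_program` are hypothetical, the number of slots (processes) per node is an assumption since the cores per node are not listed here, and MPICH and LAM/MPI use their own machine-file formats:

```shell
#!/usr/bin/env bash
# gen_hostfile: hypothetical helper. Turns a machines file (one node
# name per line, like /etc/machines) into an Open MPI hostfile with a
# fixed number of slots per node (default 1).
gen_hostfile() {
    local file=$1 slots=${2:-1} node
    while read -r node; do
        case "$node" in
            ''|'#'*) continue ;;
        esac
        printf '%s slots=%d\n' "$node" "$slots"
    done < "$file"
}

# Example, assuming two cores per node, with Open MPI's mpirun:
#   gen_hostfile /etc/machines 2 > ~/hostfile
#   mpirun --hostfile ~/hostfile -np 20 ./my_mpi_program
```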
In the next part we'll install the slave nodes.
Links:
Ubuntu http://www.ubuntu.com/
Debian Clusters http://debianclusters.cs.uni.edu/index.php/Main_Page
SGE http://gridengine.sunsource.net/
Ganglia http://ganglia.info/
NFS http://nfs.sourceforge.net/
TFTP-HPA http://freshmeat.net/projects/tftp-hpa/
DHCP http://www.dhcp.org/
MPICH http://www-unix.mcs.anl.gov/mpi/mpich1/
OpenMPI http://www.open-mpi.org/
LAM/MPI http://www.lam-mpi.org/