Version 31 (modified by shuang, 9 years ago)

--

Setting up xCAT 2.4.3 on DELL PE 860

Note:

  1. Red Hat or a similar distribution is preferred.
  2. Only eth0 is needed; eth1 is left for ORCA to configure. The head node's eth0 (192.168.201.49) is connected to the Internet. Compute nodes do not need Internet access; everything can be managed from the head node.

xCAT Installation

xCAT installation on CentOS 5.5 is simple:

  1. # cd /etc/yum.repos.d
  2. # wget http://xcat.sourceforge.net/yum/xcat-core/xCAT-core.repo
  3. # wget http://xcat.sourceforge.net/yum/xcat-dep/rh5/x86_64/xCAT-dep.repo
  4. # yum clean metadata
  5. # yum install xCAT.x86_64
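The five steps above can also be collected into one script. The following is a minimal sketch, not an official installer: the `run` helper and the `DRY_RUN` switch are hypothetical names introduced here so the commands can be previewed before anything is changed. The repository URLs and the package name are the ones listed above.

```shell
#!/bin/sh
# Sketch: the installation steps as one function.
# DRY_RUN is a hypothetical switch for this example (default 1 = only print).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_xcat() {
    # Stop at the first step that fails.
    run cd /etc/yum.repos.d || return 1
    run wget http://xcat.sourceforge.net/yum/xcat-core/xCAT-core.repo || return 1
    run wget http://xcat.sourceforge.net/yum/xcat-dep/rh5/x86_64/xCAT-dep.repo || return 1
    run yum clean metadata || return 1
    run yum install xCAT.x86_64
}

# With DRY_RUN unset this only prints the five commands.
install_xcat
```

Running the script as root with `DRY_RUN=0` executes the commands instead of printing them.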

xCAT Configuration

Configuring xCAT is mostly a matter of managing its tables. Run

tabdump -d

for a description of all tables (stored as /etc/xcat/*.sqlite). Run

tabdump -d $table

for a description of the columns in $table. First, run

tabdump site

to verify that xCAT is correctly installed. You should see the default values for the site table. Next, the site table needs some modifications. There are two ways to modify a table: chtab changes individual values from the command line, and tabedit opens the whole table in an editor. Our site table looks like this:

#key,value,comments,disable
"blademaxp","64",,
"domain","renci.ben",,
"fsptimeout","0",,
"installdir","/install",,
"ipmimaxp","64",,
"ipmiretries","3",,
"ipmitimeout","2",,
"consoleondemand","no",,
"master","192.168.201.49",,
"maxssh","8",,
"ppcmaxp","64",,
"ppcretry","3",,
"ppctimeout","0",,
"sharedtftp","1",,
"SNsyncfiledir","/var/xcat/syncfiles",,
"tftpdir","/tftpboot",,
"xcatdport","3001",,
"xcatiport","3002",,
"xcatconfdir","/etc/xcat",,
"timezone","America/New_York",,
"useNmapfromMN","no",,
"nameservers","192.168.201.49",,
"ntpservers","mgt",,
"forwarders","192.168.201.254",,
"dhcpinterfaces","eth0",,

Note: Both "master" and "nameservers" point to 192.168.201.49, the IP address of the head node's eth0. This interface also provides the dhcpd service to all compute nodes ("dhcpinterfaces"=>"eth0"). "forwarders" points to our local name server.
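As an illustration of the chtab route, the sketch below prints one chtab command per site key for the values that differ from the defaults in our table. The helper name `site_chtab_cmds` is made up for this example; the `chtab key=... site.value=...` form is the usual way to address a row of the site table.

```shell
#!/bin/sh
# Read "key value" pairs on stdin and print the matching chtab commands.
# site_chtab_cmds is a hypothetical helper, not part of xCAT.
site_chtab_cmds() {
    while read -r key value; do
        [ -n "$key" ] || continue   # ignore empty lines
        printf 'chtab key=%s site.value=%s\n' "$key" "$value"
    done
}

# Values taken from the site table shown above.
site_chtab_cmds <<'EOF'
domain renci.ben
master 192.168.201.49
nameservers 192.168.201.49
forwarders 192.168.201.254
dhcpinterfaces eth0
EOF
```

The printed commands can be reviewed and then pasted into a root shell on the head node; `tabdump site` afterwards shows whether the values were stored.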

Network Configuration

First, edit the /etc/hosts file:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.201.49  mgt.renci.ben mgt
#n01 ip
192.168.201.12  n01 n01.renci.ben
#n01 ipmi ip
192.168.201.77  n01-ipmi n01-ipmi.renci.ben
...
#switch ip
192.168.201.76  linksys-mgt.renci.ben
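To confirm that a hosts file in this format contains the entry you expect, a small lookup helper can be handy. This is only a sketch; `hosts_ip` is a hypothetical name, and the file argument exists so the function can be pointed at a copy instead of the live /etc/hosts.

```shell
#!/bin/sh
# hosts_ip NAME [FILE]: print the IP address of the first line in FILE
# (default /etc/hosts) that lists NAME as a hostname or alias.
hosts_ip() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # rest of the line is a comment
                if ($i == name) { print $1; exit }
            }
        }' "${2:-/etc/hosts}"
}

# Example (on the head node):
#   hosts_ip n01        should print 192.168.201.12
#   hosts_ip n01-ipmi   should print 192.168.201.77
```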

Note: you don't have to edit this file manually. Instead, run

# tabedit hosts

to fill in all the node information, then run

# makehosts

to write the node entries to /etc/hosts. Next, set up DNS:

# makedns
# service named start

Note that makedns only adds entries from /etc/hosts that belong to a network listed in the networks table. In most cases the networks table is created automatically. If you need to modify it, or to remove networks you don't want xCAT to control, mine looks like this for reference:

# tabdump networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,nodehostname,comments,disable
"192_168_201_0-255_255_255_0","192.168.201.0","255.255.255.0","eth0","192.168.201.1",,"192.168.201.49","192.168.201.254,152.54.4.3",,,"192.168.201.79-192.168.201.80",,,

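Because makedns skips hosts that fall outside every network in the networks table, it can help to check an address against a net/mask pair before wondering why a name is missing from DNS. The sketch below does that arithmetic in plain shell. The function names are hypothetical, and 10.0.0.5 is just an illustrative address that is not part of our setup.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the address as a single integer.
ip_to_int() (
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# in_network IP NET MASK: succeed if IP lies inside NET/MASK.
in_network() (
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Net and mask are taken from the networks table above.
for ip in 192.168.201.12 192.168.201.77 10.0.0.5; do
    if in_network "$ip" 192.168.201.0 255.255.255.0; then
        echo "$ip: inside 192.168.201.0/255.255.255.0"
    else
        echo "$ip: outside, makedns would skip it"
    fi
done
```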
We also need to configure /etc/resolv.conf for name lookups. Mine looks like this:

search renci.ben
nameserver 192.168.201.49

To test that it is set up correctly, run

# host n01

and make sure it resolves to the right IP address (192.168.201.12 in our case). Next, run the following commands to create /etc/dhcpd.conf for the dhcpd service:

# makedhcp -n
# service dhcpd restart
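Before booting nodes over DHCP, the name-resolution check described above can be repeated for every node. The following is a rough sketch that assumes the `host` utility prints lines of the form "NAME has address IP"; `resolved_ip` and `check_node` are hypothetical helper names.

```shell
#!/bin/sh
# resolved_ip: read `host` output on stdin and print the first IPv4 address.
resolved_ip() {
    awk '/ has address / { print $NF; exit }'
}

# check_node NAME EXPECTED_IP: compare the DNS answer with the expected address.
check_node() {
    got=$(host "$1" 2>/dev/null | resolved_ip)
    if [ "$got" = "$2" ]; then
        echo "OK   $1 -> $got"
    else
        echo "FAIL $1 -> ${got:-unresolved} (expected $2)"
        return 1
    fi
}

# Example (on the head node):
#   check_node n01 192.168.201.12
#   check_node n01-ipmi 192.168.201.77
```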

Node Configuration

To fill in the node information, edit the nodelist table. Mine looks like this:

# tabdump nodelist
#node,groups,status,statustime,appstatus,appstatustime,primarysn,comments,disable
"n01","compute,all","netbooting","09-03-2010 15:18:13",,,,,
"n01-ipmi","ipmi,all",,,,,,,

Note that you only need to specify node and groups. The status and statustime columns are populated automatically, and they will not reflect the real state if you change it outside of xCAT (e.g. by pressing the power button instead of using xCAT's IPMI support). A group in xCAT is somewhat similar to a group in Linux: it lets you run a command against the whole group without naming each member. Run the following to verify that nodelist works:

# nodels compute
n01
...
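To make the group idea concrete, here is a small sketch that derives group membership from `tabdump nodelist` output, roughly what `nodels compute` reports. It is an illustration only, not how xCAT implements it; `nodes_in_group` is a hypothetical name, and the parser assumes the node and groups columns are double-quoted as in the dump above.

```shell
#!/bin/sh
# nodes_in_group GROUP: read `tabdump nodelist` output on stdin and print
# every node whose groups column contains GROUP.
nodes_in_group() {
    awk -F'"' -v group="$1" '
        /^#/ { next }                        # skip the header line
        {
            n = split($4, g, ",")            # $2 = node, $4 = groups
            for (i = 1; i <= n; i++)
                if (g[i] == group) { print $2; break }
        }'
}

# Sample rows copied from the nodelist table above.
nodes_in_group compute <<'EOF'
#node,groups,status,statustime,appstatus,appstatustime,primarysn,comments,disable
"n01","compute,all","netbooting","09-03-2010 15:18:13",,,,,
"n01-ipmi","ipmi,all",,,,,,,
EOF
```

On the head node the same function can be fed live data with `tabdump nodelist | nodes_in_group compute`.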

Next, we set up the node hardware management table, nodehm. Mine looks like this:

#node,power,mgt,cons,termserver,termport,conserver,serialport,serialspeed,serialflow,getmac,comments,disable
"n01","ipmi","ipmi",,,,,,,,,,
"n02","ipmi","ipmi",,,,,,,,,,

Note that we will configure IPMI to allow LAN access. Some people may need console-over-LAN access; this is not covered here.
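The nodehm rows above can be entered with tabedit, or generated with chtab when there are many nodes. The sketch below only prints the commands so they can be checked first; `nodehm_cmds` is a hypothetical helper, and the power and mgt values are the ones from the table above.

```shell
#!/bin/sh
# nodehm_cmds NODE...: print one chtab command per node that sets
# power and mgt to ipmi in the nodehm table.
nodehm_cmds() {
    for node in "$@"; do
        printf 'chtab node=%s nodehm.power=ipmi nodehm.mgt=ipmi\n' "$node"
    done
}

nodehm_cmds n01 n02
```

After running the printed commands as root, `tabdump nodehm` should list the same rows as above.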

IPMI Configuration

First, note that installing xCAT also installs an ipmitool package. Make sure you are not using an ipmitool from another source. You don't need to call ipmitool directly in most cases, but it is important to know about if you run into trouble. Note also that Dell's IPMI implementation differs from IBM's.
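One way to see which ipmitool is actually picked up is to ask the shell for its path and then ask rpm which package owns that file. This is a sketch under the assumption that the head node is RPM-based, as in this setup; `ipmitool_origin` is a hypothetical helper name.

```shell
#!/bin/sh
# ipmitool_origin: report which ipmitool binary is first in PATH and,
# where rpm is available, which package owns it.
ipmitool_origin() {
    path=$(command -v ipmitool) || {
        echo "ipmitool not found in PATH"
        return 1
    }
    echo "binary:  $path"
    if command -v rpm >/dev/null 2>&1; then
        pkg=$(rpm -qf "$path" 2>/dev/null) || pkg="unknown (not owned by an RPM)"
        echo "package: $pkg"
    fi
}

# Example (on the head node):
#   ipmitool_origin
```

If the reported package does not come from the xCAT dependency repository configured during installation, a different ipmitool is shadowing the one xCAT installed.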

Netboot image

The genimage command invokes geninitrd automatically to generate the initrd for netboot. However, when the compute nodes booted, the following error was returned:

Kernel panic: no init found. Try passing init= option to kernel

To modify the initrd, do:

  1. sudo -s
  2. mkdir temp
  3. cd temp
  4. cat /tftpboot/xcat/netboot/centos5.5/x86_64/compute/initrd.gz| gzip -d | cpio -i
  5. less init; the first line is
    #!/sbin/busybox.anaconda sh
    
  6. ldd sbin/busybox.anaconda
  7. copy all the missing libs to lib64 or lib, taking care of symbolic links separately
  8. find ./ | cpio -H newc -o > ../initrd
  9. gzip ../initrd
  10. cp ../initrd.gz /install/netboot/centos5.5/x86_64/compute/initrd.gz
  11. nodeset compute netboot
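Steps 6 and 7 can be partly automated by comparing the ldd output with the contents of the unpacked initrd. The sketch below lists every library path that ldd reports but that does not exist under the unpacked tree. It assumes the head node runs the same OS and architecture as the image, so the host's library paths match what the initrd needs; `missing_libs` is a hypothetical helper name.

```shell
#!/bin/sh
# missing_libs ROOT: read ldd output on stdin and print each library path
# that is absent under ROOT (the directory holding the unpacked initrd).
missing_libs() {
    root=$1
    # ldd prints "name => /path (addr)" or "/path (addr)"; keep the /path.
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }' |
    while read -r lib; do
        # Count a symlink as present even if its target is not in the tree yet.
        [ -e "$root$lib" ] || [ -L "$root$lib" ] || echo "$lib"
    done
}

# Example, run from inside the temp directory created in step 2:
#   ldd sbin/busybox.anaconda | missing_libs .
```

Each path printed still has to be copied into the matching lib or lib64 directory of the tree, together with the target of any symbolic link, before repacking in step 8.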
