Changes between Version 42 and Version 43 of xCAT

Timestamp:
09/07/10 17:57:11 (9 years ago)
Author:
shuang (IP: 152.54.8.247)
Comment:

--

  • xCAT

    v42 v43  
    60 60 === Network Configuration ===
    61 61 We need to provide DNS and DHCP service for compute nodes. They can be run on a different machine. In our example, both are run on the head node (master).
    62  1.First, configure DNS service. Edit /etc/hosts file: 
    63 {{{ 
    64 # Do not remove the following line, or various programs 
    65 # that require network functionality will fail. 
    66 127.0.0.1               localhost.localdomain localhost 
    67 ::1             localhost6.localdomain6 localhost6 
    68 192.168.201.49  mgt.renci.ben mgt 
    69 #n01 ip 
    70 192.168.201.12  n01 n01.renci.ben 
    71 #n01 ipmi ip 
    72 192.168.201.77  n01-ipmi n01-ipmi.renci.ben 
    73 ... 
    74 #switch ip 
    75 192.168.201.76  linksys-mgt.renci.ben 
    76 }}} 
    77 Note: you don't have to edit this file manually. Instead, run 
    78 {{{ 
    79 #tabedit hosts 
    80 }}} 
    81 to fill in all the node information, then run  
    82 {{{ 
    83 #makehosts 
    84 }}} 
    85 to populate node info to /etc/hosts.  
    86 Next, setup dns: 
    87 {{{ 
    88 #makedns 
    89 #service named start 
    90 }}} 
    91 Note that only entries in /etc/hosts that are part of a network in networks table are added to dns by makedns. In most cases, your networks table should be created automatically. Nevertheless, if you need to modify it or eliminate networks you don't want xCAT to control, mine looks like this: 
    92 {{{ 
    93 #tabdump networks 
    94 #netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,nodehostname,comments,disable 
    95 "192_168_201_0-255_255_255_0","192.168.201.0","255.255.255.0","eth0","192.168.201.1",,"192.168.201.49","192.168.201.254,152.54.4.3",,,"192.168.201.79-192.168.201.80",,, 
    96 }}} 
    97 We also need to config /etc/resolv.conf for nslookups, mine looks like this: 
    98 {{{ 
    99 search renci.ben 
    100 nameserver 192.168.201.49 
    101 }}} 
    102 To test it is setup correctly, run 
    103 {{{ 
    104 #host n01 
    105 }}} 
    106 make sure it's resolved to the right IP address (192.168.201.12 in our case).  
    107  1. Next, run the following commands to create /etc/dhcpd.conf for dhcpd service 
    108 {{{ 
    109 #makedhcpd -n 
    110 #service dhcpd restart 
    111 }}} 
     62 
     63 1. First, configure the DNS service. Edit the /etc/hosts file:
     64 {{{ 
     65 # Do not remove the following line, or various programs 
     66 # that require network functionality will fail. 
     67 127.0.0.1               localhost.localdomain localhost 
     68 ::1             localhost6.localdomain6 localhost6 
     69 192.168.201.49  mgt.renci.ben mgt 
     70 #n01 ip 
     71 192.168.201.12  n01 n01.renci.ben 
     72 #n01 ipmi ip 
     73 192.168.201.77  n01-ipmi n01-ipmi.renci.ben 
     74 ... 
     75 #switch ip 
     76 192.168.201.76  linksys-mgt.renci.ben 
     77 }}} 
     78 Note: you don't have to edit this file manually. Instead, run 
     79 {{{ 
     80 #tabedit hosts 
     81 }}} 
     82 to fill in all the node information, then run  
     83 {{{ 
     84 #makehosts 
     85 }}} 
     86 to populate /etc/hosts with the node information.
     87 Next, set up DNS:
     88 {{{ 
     89 #makedns 
     90 #service named start 
     91 }}} 
     92 Note that makedns adds an /etc/hosts entry to DNS only if its address belongs to a network listed in the networks table. In most cases the networks table is created automatically. If you need to modify it, or to remove networks you don't want xCAT to control, use mine as a reference:
     93 {{{ 
     94 #tabdump networks 
     95 #netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,nodehostname,comments,disable 
     96 "192_168_201_0-255_255_255_0","192.168.201.0","255.255.255.0","eth0","192.168.201.1",,"192.168.201.49","192.168.201.254,152.54.4.3",,,"192.168.201.79-192.168.201.80",,, 
     97 }}} 
     98 We also need to configure /etc/resolv.conf for name lookups. Mine looks like this:
     99 {{{ 
     100 search renci.ben 
     101 nameserver 192.168.201.49 
     102 }}} 
     103 To test that it is set up correctly, run
     104 {{{ 
     105 #host n01 
     106 }}} 
     107 and make sure the name resolves to the right IP address (192.168.201.12 in our case).
     108 2. Next, run the following commands to create /etc/dhcpd.conf for the dhcpd service:
     109 {{{ 
     110 #makedhcp -n
     111 #service dhcpd restart 
     112 }}} 
     113 3. To enable node discovery, we need to configure the switch so the following command works: 
     114 {{{ 
     115 $snmpwalk -v 1 -c public switch_ip 
     116 }}} 
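
As a rough illustration of the note in step 1 (makedns only picks up /etc/hosts entries whose address belongs to a network in the networks table), the sketch below checks whether an address falls inside a net/mask pair. The function names `ip_to_int` and `in_network` are made up for this example and are not part of xCAT; xCAT's own matching logic may differ in detail.

```shell
#!/bin/sh
# Hypothetical helpers (not xCAT commands): test network membership of an IPv4 address.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Exit status 0 when address $1 lies inside network $2 with netmask $3.
in_network() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# n01 from the /etc/hosts example, against the network shown by tabdump networks.
if in_network 192.168.201.12 192.168.201.0 255.255.255.0; then
    echo "n01 is inside 192.168.201.0/255.255.255.0"
fi
```

With the values from this page, n01 (192.168.201.12) is inside the 192.168.201.0 network, while an outside address such as 152.54.4.3 is not.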
    112 117 === Node Configuration ===
    113 To fill in node information, we need to edit nodelist table. Mine looks like: 
    114 {{{ 
    115 #tabdump nodelist 
    116 #node,groups,status,statustime,appstatus,appstatustime,primarysn,comments,disable 
    117 "n01","compute,all","netbooting","09-03-2010 15:18:13",,,,, 
    118 "n01-ipmi","ipmi,all",,,,,,, 
    119 }}} 
    120 Note that you only need to specify the node and groups, the status and statustime is populated automatically, it will not reflect the real status if you change it outside of xCAT (e.g. press the power button to turn it off instead of using xCAT's ipmi). In xCAT group is somewhat similar with the concept in Linux. It's convenient in the sense that you can operate commands on a group without specifying each individual member. Run the following to verify nodelist works: 
    121 {{{ 
    122 #nodels compute 
    123 n01 
    124 ... 
    125 }}} 
    126  
    127 Next, we setup the node hardware management table, i.e., nodehm. Mine looks like this: 
    128 {{{ 
    129 #node,power,mgt,cons,termserver,termport,conserver,serialport,serialspeed,serialflow,getmac,comments,disable 
    130 "n01","ipmi","ipmi",,,,,,,,,, 
    131 "n02","ipmi","ipmi",,,,,,,,,, 
    132 }}} 
    133 Note that we will configure ipmi to allow LAN access. Some people may need to use console over LAN access -- since we only have IPMI 1.5, this is not covered here. 
     118 1. To fill in node information, we need to edit the nodelist table. Mine looks like:
     119 {{{ 
     120 #tabdump nodelist 
     121 #node,groups,status,statustime,appstatus,appstatustime,primarysn,comments,disable 
     122 "n01","compute,all","netbooting","09-03-2010 15:18:13",,,,, 
     123 "n01-ipmi","ipmi,all",,,,,,, 
     124 }}} 
     125 Note that you only need to specify the node and its groups. The status and statustime fields are populated automatically, and they will not reflect the real status if you change it outside of xCAT (e.g. by pressing the power button instead of using xCAT's IPMI support). A group in xCAT is somewhat similar to the group concept in Linux: it lets you run a command on a whole group without naming each member. Run the following to verify that nodelist works:
     126 {{{ 
     127 #nodels compute 
     128 n01 
     129 ... 
     130 }}} 
     131 
     132 2. Next, we set up the node hardware management table, i.e., nodehm. Mine looks like this:
     133 {{{ 
     134  #node,power,mgt,cons,termserver,termport,conserver,serialport,serialspeed,serialflow,getmac,comments,disable 
     135 "n01","ipmi","ipmi",,,,,,,,,, 
     136 "n02","ipmi","ipmi",,,,,,,,,, 
     137 }}} 
     138 Note that we will configure IPMI to allow LAN access. Some people may also need console access over the LAN; since we only have IPMI 1.5, this is not covered here.
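
To make the group idea from step 1 concrete, here is a small sketch that pulls the members of a group out of `tabdump nodelist` style output. It only illustrates how the comma-separated groups column works; `nodes_in_group` is a made-up name, and on a real system `nodels` is the tool to use.

```shell
#!/bin/sh
# Hypothetical helper (not an xCAT command): read tabdump-style CSV on stdin
# and print every node whose groups column contains the group given as $1.
nodes_in_group() {
    awk -v group="$1" -F'","' '
        /^#/ { next }                          # skip the header line
        {
            node = $1
            sub(/^"/, "", node)                # drop the opening quote
            sub(/".*$/, "", node)              # drop anything after a closing quote
            groups = $2
            sub(/".*$/, "", groups)            # drop the closing quote and empty columns
            n = split(groups, g, ",")
            for (i = 1; i <= n; i++)
                if (g[i] == group) print node
        }'
}

# Example with the two rows shown in the nodelist table above.
printf '%s\n' \
    '#node,groups,status,statustime,appstatus,appstatustime,primarysn,comments,disable' \
    '"n01","compute,all","netbooting","09-03-2010 15:18:13",,,,,' \
    '"n01-ipmi","ipmi,all",,,,,,,' | nodes_in_group compute
```

On a management node the same function could be fed directly, e.g. `tabdump nodelist | nodes_in_group compute`.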
    134 139 === BMC/IPMI Configuration ===
    135 First, when you installed xCAT, an ipmitool package comes with it. Make sure you are not using ipmitool from other sources. You don't need to ipmitool in most cases, but it's important to know if you have troubles. Obviously DELL's ipmi implementation is different from IBM's. To configure IPMI, you can either do it at run time using ipmitool or at boot time (Ctrl-E on DELL PE 860). Read [www.dell.com/downloads/global/power/ps4q04-20040204-murphy.pdf] for details. Mine looks like this: 
    136 {{{ 
    137 # ipmitool -I lan -H 192.168.201.77 -U root -a lan print 1 
    138 Set in Progress         : Set Complete 
    139 Auth Type Support       : NONE MD2 MD5 PASSWORD  
    140 Auth Type Enable        : Callback : MD2 MD5 PASSWORD  
     140 1. First, xCAT installs its own ipmitool package. Make sure you are not using ipmitool from another source. You don't need to run ipmitool directly in most cases, but it is important to know about when troubleshooting. Dell's IPMI implementation differs from IBM's. To configure IPMI, you can do it either at run time using ipmitool or at boot time (Ctrl-E on a Dell PE 860). Read [www.dell.com/downloads/global/power/ps4q04-20040204-murphy.pdf] for details. Mine looks like this:
     141 {{{ 
     142 # ipmitool -I lan -H 192.168.201.77 -U root -a lan print 1 
     143 Set in Progress         : Set Complete 
     144 Auth Type Support       : NONE MD2 MD5 PASSWORD  
     145 Auth Type Enable        : Callback : MD2 MD5 PASSWORD  
    141 146                        : User     : MD2 MD5 PASSWORD
    142 147                        : Operator : MD2 MD5 PASSWORD
    143 148                        : Admin    : MD2 MD5 PASSWORD
    144 149                        : OEM      : MD2 MD5
    145 IP Address Source       : Static Address 
    146 IP Address              : 192.168.201.77 
    147 Subnet Mask             : 255.255.255.0 
    148 MAC Address             : 00:18:8b:f8:e3:58 
    149 SNMP Community String   : public 
    150 IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10 
    151 Default Gateway IP      : 0.0.0.0 
    152 Default Gateway MAC     : 00:00:00:00:00:00 
    153 Backup Gateway IP       : 0.0.0.0 
    154 Backup Gateway MAC      : 00:00:00:00:00:00 
    155 802.1q VLAN ID          : Disabled 
    156 802.1q VLAN Priority    : 0 
    157 Cipher Suite Priv Max   : Not Available 
    158 }}} 
    159  
    160 Dell's IPMI implementation seems to be different from IBM's in the sense that once a session is established, it does not return an auth type in the response (Thanks Jarrod Johnson for help!). In /opt/xcat/lib/perl/xCAT/IPMI.pm, this response packet is dropped by the following code: 
    161 {{{ 
    162 if ($rsp[4] != $self->{authtype}) { 
    163     return 2; # not thinking about packets that do not match our preferred auth type 
    164 } 
    165 }}} 
    166 To solve this problem we commented the 2nd line out. Next, we need to setup the ipmi table like this: 
    167 {{{ 
    168 ]# tabdump ipmi 
    169 #node,bmc,bmcport,username,password,comments,disable 
    170 "compute","/\z/-ipmi/",,"root","renci",, 
    171 }}} 
    172 Note that the regular expression appends -ipmi to host name (e.g. n01-ipmi). Make sure your dns is setup correctly to resolve the host name.  
     150 IP Address Source       : Static Address 
     151 IP Address              : 192.168.201.77 
     152 Subnet Mask             : 255.255.255.0 
     153 MAC Address             : 00:18:8b:f8:e3:58 
     154 SNMP Community String   : public 
     155 IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10 
     156 Default Gateway IP      : 0.0.0.0 
     157 Default Gateway MAC     : 00:00:00:00:00:00 
     158 Backup Gateway IP       : 0.0.0.0 
     159 Backup Gateway MAC      : 00:00:00:00:00:00 
     160 802.1q VLAN ID          : Disabled 
     161 802.1q VLAN Priority    : 0 
     162 Cipher Suite Priv Max   : Not Available 
     163 }}} 
     164 
     165 Dell's IPMI implementation seems to be different from IBM's in the sense that once a session is established, it does not return an auth type in the response (Thanks Jarrod Johnson for help!). In /opt/xcat/lib/perl/xCAT/IPMI.pm, this response packet is dropped by the following code: 
     166 {{{ 
     167 if ($rsp[4] != $self->{authtype}) { 
     168     return 2; # not thinking about packets that do not match our preferred auth type 
     169 } 
     170 }}} 
     171 To solve this problem we commented out the second line (the return statement).
     172 
     173 2. Next, we need to set up the ipmi table like this:
     174 {{{ 
     175 # tabdump ipmi 
     176 #node,bmc,bmcport,username,password,comments,disable 
     177 "compute","/\z/-ipmi/",,"root","renci",, 
     178 }}} 
     179 Note that the regular expression appends -ipmi to the host name (e.g. n01 becomes n01-ipmi). Make sure your DNS is set up correctly to resolve that host name. Also, the username and password given here override what is in the passwd table.
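
The effect of the `/\z/-ipmi/` entry can be previewed from the shell. The sketch below is only an analogy: it uses sed, where `$` plays the role that `\z` (end of string) plays in the regular expression stored in the table, and `bmc_name` is a made-up helper name, not an xCAT command.

```shell
#!/bin/sh
# Hypothetical helper: mimic the "/\z/-ipmi/" substitution from the ipmi table.
# sed's "$" anchors at the end of the line, so the replacement is appended.
bmc_name() {
    printf '%s\n' "$1" | sed 's/$/-ipmi/'
}

bmc_name n01    # prints n01-ipmi
```

The resulting name can then be passed to `host` to confirm that DNS resolves it, as in the network configuration section.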
    173 180
    174 181 === PXE configuration ===
    175 The node resources table (noderes) is used to specify the resources and settings when installing compute nodes. We use PXE, then the table looks like: 
    176 {{{ 
    177 # tabdump noderes 
    178 #node,servicenode,netboot,tftpserver,nfsserver,monserver,nfsdir,installnic,primarynic,discoverynics,cmdinterface,xcatmaster,current_osimage,next_osimage,nimserver,comments,disable 
    179 "compute",,"pxe","192.168.201.49","192.168.201.49",,,"eth0","eth0",,,,,,,, 
    180 }}} 
    181  
    182 To setup the TFTP server for PXE, we prepare the source by running: 
    183 {{{ 
    184 #copycds /dev/dvd  
    185 }}} 
    186 Note: assume that you have the Linux DVD in /dev/dvd, this command will copy every thing to /install. Or, if you downloaded the DVD iso, say, CentOS-5.5-bin-DVD.iso, you can run 
    187 {{{ 
    188 #copycds CentOS-5.5-bin-DVD.iso 
    189 }}} 
    190 Then, run: 
    191 {{{ 
    192 #mknb x86_64 
    193 }}} 
    194 to setup the TFTP server (the default directory is /tftpboot). To install compute nodes, run: 
    195 {{{ 
    196 rinstall compute 
    197 }}} 
    198  
    199 To create a net boot image, run: 
    200 {{{ 
    201 #./genimage -i eth0 -n tg3,bnx2 -o centos5.5 -p compute 
    202 #cd /install/netboot/centos5.5/x86_64/compute/rootimg/etc/ 
    203 #cp fstab fstab.old 
    204 #echo "compute_x86_64 / tmpfs rw 0 1 
     182 1. The node resources table (noderes) specifies the resources and settings used when installing compute nodes. We use PXE, so the table looks like:
     183 {{{ 
     184 # tabdump noderes 
     185 #node,servicenode,netboot,tftpserver,nfsserver,monserver,nfsdir,installnic,primarynic,discoverynics,cmdinterface,xcatmaster,current_osimage,next_osimage,nimserver,comments,disable 
     186 "compute",,"pxe","192.168.201.49","192.168.201.49",,,"eth0","eth0",,,,,,,, 
     187 }}} 
     188 
     189 2. To set up the TFTP server for PXE, we first prepare the installation source by running:
     190 {{{ 
     191 #copycds /dev/dvd  
     192 }}} 
     193 Note: this assumes that the Linux DVD is in /dev/dvd; the command copies everything to /install. Alternatively, if you downloaded the DVD ISO, say CentOS-5.5-bin-DVD.iso, you can run
     194 {{{ 
     195 #copycds CentOS-5.5-bin-DVD.iso 
     196 }}} 
     197 Then, run: 
     198 {{{ 
     199 #mknb x86_64 
     200 }}} 
     201 to set up the TFTP server (the default directory is /tftpboot). To install compute nodes, run:
     202 {{{ 
     203 #rinstall compute 
     204 }}} 
     205 
     206 3. To create a net boot image, run: 
     207 {{{ 
     208 #./genimage -i eth0 -n tg3,bnx2 -o centos5.5 -p compute 
     209 #cd /install/netboot/centos5.5/x86_64/compute/rootimg/etc/ 
     210 #cp fstab fstab.old 
     211 #echo "compute_x86_64 / tmpfs rw 0 1 
    205 212        none /tmp tmpfs defaults,size=10m 0 2
    206 213        none /var/tmp tmpfs defaults,size=10m 0 2" >> fstab
    207 #cd 
    208 #packimage -o centos5.5 -p compute -a x86_64 
    209 }}} 
    210 Test it by running: 
    211 {{{ 
    212 #nodeset n01 netboot 
    213 #rpower n01 boot 
    214 }}} 
    215  
    216 The genimage command invokes genintrd automatically to generate the initrd for netboot.  
    217 However, when the compute nodes boot up, the following error is returned: 
    218 {{{ 
    219 Kernel panic: no init found. Try passing init= option to kernel 
    220 }}} 
    221 To modify the initrd, do: 
    222  1. sudo -s 
    223  1. mkdir temp 
    224  1. cd temp 
    225  1. cat /tftpboot/xcat/netboot/centos5.5/x86_64/compute/initrd.gz| gzip -d | cpio -i 
    226  1. less init, the first line is 
    227 {{{ 
    228 !/sbin/busybox.anaconda sh 
    229 }}} 
    230  1. ldd sbin/busybox.anaconda 
    231  1. copy all the missing libs to lib64 or lib, take care of symbol links separately 
    232  1. find ./ | cpio -H newc -o > ../initrd 
    233  1. gzip initrd 
    234  1. cp initrd.gz /install/netboot/centos5.5/x86_64/compute/initrd.gz 
    235  1. nodeset compute netboot 
     214 #cd 
     215 #packimage -o centos5.5 -p compute -a x86_64 
     216 }}} 
     217 Test it by running: 
     218 {{{ 
     219 #nodeset n01 netboot 
     220 #rpower n01 boot 
     221 }}} 
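
Since the fstab entries appended in step 3 end up in the root image of every node, a quick sanity check before running packimage may save a reboot cycle. The sketch below relies only on the documented fstab layout (four mandatory whitespace-separated fields plus two optional ones); a stray space inside the options field, for instance, splits it into two fields and pushes the count past six. `check_fstab_fields` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: report fstab lines whose field count is not 4 to 6.
# Prints "<line number>: <line>" for each suspicious entry and returns
# a non-zero exit status if any were found.
check_fstab_fields() {
    awk 'NF && $1 !~ /^#/ && (NF < 4 || NF > 6) { print NR ": " $0; bad = 1 }
         END { exit bad }' "$1"
}

# Example usage inside the image's etc directory:
#   check_fstab_fields fstab && echo "fstab field counts look fine"
```

The check is deliberately shallow: it does not validate device names, mount points, or option spelling.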
     222 
     223 4. The genimage command invokes geninitrd automatically to generate the initrd for netboot. However, when the compute nodes boot up, the following error is returned:
     224 {{{ 
     225 Kernel panic: no init found. Try passing init= option to kernel 
     226 }}} 
     227 To modify the initrd, do: 
     228  1. sudo -s 
     229  1. mkdir temp 
     230  1. cd temp 
     231  1. cat /tftpboot/xcat/netboot/centos5.5/x86_64/compute/initrd.gz| gzip -d | cpio -i 
     232  1. less init; the first line is
     233 {{{
     234 #!/sbin/busybox.anaconda sh
     235 }}} 
     236  1. ldd sbin/busybox.anaconda 
     237  1. copy all the missing libs to lib64 or lib, handling symbolic links separately
     238  1. find ./ | cpio -H newc -o > ../initrd
     239  1. gzip ../initrd
     240  1. cp ../initrd.gz /install/netboot/centos5.5/x86_64/compute/initrd.gz
     241  1. nodeset compute netboot
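
The ldd and "copy the missing libs" steps above can be partly automated. The sketch below reads ldd output on stdin and prints every library path that does not exist under a given root directory, such as the unpacked initrd tree. `missing_libs` is a made-up helper name, and the parsing assumes the usual `name => /path (address)` and `/path (address)` line shapes of ldd output.

```shell
#!/bin/sh
# Hypothetical helper: list libraries named in ldd output that are absent
# under the directory given as $1 (e.g. the unpacked initrd tree).
missing_libs() {
    root=$1
    # Keep only absolute library paths: "name => /path (addr)" or "/path (addr)".
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }
         $1 ~ /^\//               { print $1 }' |
    while read -r lib; do
        # A dangling symbolic link also counts as missing here.
        [ -e "$root$lib" ] || echo "$lib"
    done
}

# Example usage from inside the temp directory:
#   ldd sbin/busybox.anaconda | missing_libs .
```

Each path it prints still has to be copied by hand, together with the target of any symbolic link it points to.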