Hi!
I seem to have a talent for doing things at the wrong time, in the wrong order, or just generally messing things up. Sigh...
I am installing a small cluster with OSCAR 5.1rc, which I have to do for a postgrad chemistry project. (Yes, the connection between clusters and chemistry isn't immediately apparent, but it's there...)
At the second-to-last step, when the cluster should have been completed, the install failed: Torque and Maui had problems finishing their configuration. I have a server node, and had added one normal compute node.
Then, perhaps because my supervisor wanted to see some progress, and because I had a few hours in which I couldn't continue with other work, I installed VMware Server 2.0 on the server node, as we have a small partition for XP. (The server node runs Fedora 8 on the larger partition.)
Now, when I try to add more nodes to the cluster, they fail to boot over the network. This is what I get on the nodes:
PXE-E11 : ARP timeout
PXE-E11 : ARP timeout
PXE-E38 : TFTP cannot open connection
PXE-M0F : Exiting Intel Boot Agent
So I guess the troublemaker is the combination of VMware and DHCP. The VMware installer said the following:
Code:
This system appears to have a DHCP server configured for normal use. Beware
that you should teach it how not to interfere with VMware Server's DHCP server.
There are two ways to do this:
1) Modify the file /etc/dhcpd.conf to add something like:
subnet 10.0.0.0 netmask 255.0.0.0 {
# Note: No range is given, vmnet-dhcpd will deal with this subnet.
}
2) Start your DHCP server with an explicit list of network interfaces to deal
with (leaving out vmnet1). e.g.:
dhcpd eth0
Consult the dhcpd(8) and dhcpd.conf(5) manual pages for details.
Perhaps the above makes everything crystal clear to people who are not as clueless as I am. Firstly, I don't have a dhcpd.conf file in /etc. Now that I have installed VMware, I have this /etc/vmware/vmnet8/dhcpd, which looks like this:
Code:
#
# Configuration file for ISC 2.0b6pl1 vmnet-dhcpd operating on vmnet8.
#
# This file was automatically generated by the VMware configuration program.
# If you modify it, it will be backed up the next time you run the
# configuration program.
#
# We set domain-name-servers to make some DHCP clients happy
# (dhclient as configued in SuSE, TurboLinux, etc.).
# We also supply a domain name to make pump (Red Hat 6.x) happy.
#
allow unknown-clients;
default-lease-time 1800; # 30 minutes
max-lease-time 7200; # 2 hours
subnet 192.168.0.0 netmask 255.255.0.0 {
range 192.168.128.0 192.168.255.254;
option broadcast-address 192.168.255.255;
option domain-name-servers 192.168.0.2;
option domain-name "localdomain";
option routers 192.168.0.2;
}
There is a similar one for vmnet1. Is there some other dhcpd file that I should change as shown in 1) above?
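To see which dhcpd-related files exist at all, I could search /etc rather than guess. A rough sketch follows; find_dhcpd_confs is just a helper name I made up, and I am only assuming that any relevant config has a name starting with "dhcpd":

```shell
# List files and directories under a root (default /etc) whose name starts
# with "dhcpd". The name pattern is an assumption, not something VMware or
# OSCAR documents.
find_dhcpd_confs() {
    root="${1:-/etc}"
    # Permission errors are hidden; "|| true" keeps the function from
    # failing when parts of the tree are unreadable.
    find "$root" -name 'dhcpd*' 2>/dev/null | sort || true
}

find_dhcpd_confs /etc
```

If this only turns up the VMware files under /etc/vmware, then presumably no separate dhcpd.conf for the cluster network exists on the server node yet.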
Secondly, I don't really know how to do 2). Is 2) a one-off thing, or does it have to be repeated every time the DHCP server starts?
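My tentative reading of 2) is that dhcpd should be given only the real network interfaces, so that it never touches vmnet1 or vmnet8. A small sketch of that idea is below. The interface names are assumptions (I am guessing the cluster network is on eth0), dhcpd_ifaces is a helper name I invented, and I believe, but have not verified, that Fedora's init script reads DHCPDARGS from /etc/sysconfig/dhcpd:

```shell
# Keep only the interfaces dhcpd should serve: drop loopback and
# VMware's virtual networks (vmnet*).
dhcpd_ifaces() {
    keep=""
    for iface in "$@"; do
        case "$iface" in
            lo|vmnet*) ;;                       # skip loopback and vmnet*
            *) keep="${keep:+$keep }$iface" ;;  # append, space-separated
        esac
    done
    printf '%s\n' "$keep"
}

# Assumed interface list for the server node; adjust to the real one.
ifaces=$(dhcpd_ifaces lo eth0 vmnet1 vmnet8)

# Only print the commands; nothing is started or changed here.
echo "by hand:    dhcpd $ifaces"
echo "at startup: DHCPDARGS=\"$ifaces\"   (in /etc/sysconfig/dhcpd, if my assumption holds)"
```

If that is right, running dhcpd by hand with the interface list would only last until the next restart, while the setting in the sysconfig file would apply every time the service starts.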
Thirdly, I don't need VMware until the cluster is up and running. Could I uninstall VMware and just install it again later? Hm, but I'm afraid it might cause the same problems again. I would prefer to sort out the problems now, rather than have mysterious problems popping up later.
Any advice would be greatly appreciated!