Hi LQ,
I'm in the process of building a Beowulf cluster. All of the nodes in the cluster are running Linux and are networked together, and I am using the 192.168.0.x address range for my LAN.
I have added the following two lines to each computer's /etc/rc.d/rc.local file, a shell script that runs at the end of the boot sequence, after the other init scripts have finished.
Code:
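# Assign this node's static LAN address to the first Ethernet interface
ifconfig eth0 192.168.0.<number> netmask 255.255.255.0
# Set the running hostname to the name used for this node in /etc/hosts
hostname node<number>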
where <number> is the number of the computer.
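As a concrete instance, this is roughly what the two lines would look like on node01, assuming node N is given address 192.168.0.N (the post does not say how node numbers map to addresses). Note that `hostname` only changes the name of the running system and is not saved across reboots, so it has to run on every boot, as it does from rc.local.

```shell
# Tail of /etc/rc.d/rc.local on node01 (192.168.0.1 is an assumed address)
ifconfig eth0 192.168.0.1 netmask 255.255.255.0
hostname node01
```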
I've added entries for every node to each machine's /etc/hosts file, so that the computers can resolve one another by hostname. The nodes communicate over SSH, and I have set up SSH authentication so that no password is required for logins or remote commands between nodes.
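One way to keep the /etc/hosts files identical on every machine is to generate them from a single script. The sketch below is hypothetical: `gen_hosts` is an illustrative helper, and the addressing scheme (node N at 192.168.0.N, zero-padded names such as node01) and the masternode address passed as the second argument are assumptions, since the post does not give them. It keeps the cluster hostnames off the 127.0.0.1 line so that they resolve to LAN addresses.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/hosts entries for compute nodes
# node01..nodeNN at 192.168.0.1..192.168.0.NN (an assumed numbering)
# plus a master node whose address is supplied by the caller.
gen_hosts() {
    count=$1        # number of compute nodes
    master_ip=$2    # masternode's LAN address (not given in the post)

    # The loopback line carries only localhost; cluster names stay on LAN addresses.
    printf '%s %s\n' 127.0.0.1 'localhost.localdomain localhost'
    printf '%s %s\n' "$master_ip" masternode

    i=1
    while [ "$i" -le "$count" ]; do
        # Zero-pad the name (node01) to match the hostname in the error log.
        printf '192.168.0.%d node%02d\n' "$i" "$i"
        i=$((i + 1))
    done
}
```

For example, `gen_hosts 8 192.168.0.100 > hosts.cluster` would write entries for masternode and node01 through node08, which could then be reviewed and copied to /etc/hosts on every machine, the master included.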
The application I am trying to run on the cluster reports various TCP errors. Are the steps above enough to change each computer's identity on the network, or is there more I need to do?
Here are the errors I am getting, in case they help:
Code:
rmcd: getaddrinfo: Temporary failure in name resolution
TCP connect error: Unknown error message.
TCP connect error: return value errno=43
TCP: Connect failed. node01 -> masternode:32809
This is on Aurora Linux (Fedora Core 6 for SPARC processors), and the shell is bash.
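"Temporary failure in name resolution" is glibc's message for the `getaddrinfo` error EAI_AGAIN, which usually suggests that a name was not found locally and no DNS answer came back. A quick check is to confirm, on each node, that every cluster name resolves. The sketch below is hypothetical: `check_names` is an illustrative helper, and the names you pass to it are assumptions about your setup. It uses `getent hosts`, which goes through /etc/nsswitch.conf like the C library resolver, so it looks in /etc/hosts and DNS in the same order the application would.

```shell
#!/bin/sh
# Hypothetical helper: report whether each given hostname resolves on this node.
# Returns non-zero if any name fails to resolve.
check_names() {
    status=0
    for name in "$@"; do
        # getent exits non-zero when the name cannot be resolved.
        if addr=$(getent hosts "$name"); then
            echo "OK   $name -> $addr"
        else
            echo "FAIL $name does not resolve" >&2
            status=1
        fi
    done
    return $status
}
```

For example, running `check_names masternode node01 node02` on each machine would show whether any node is missing an entry for masternode, the name in the failed connect above.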
Thanks very much for your help,
Stephen