Beowulf
Hello,
I am trying to build a cluster and currently have two machines hooked up. I plan to expand the cluster once I have this working.
Hardware Configuration:
1 Pentium 200 MHz machine with 48 MB of RAM
1 Pentium 100 MHz machine with 48 MB of RAM
Operating system:
Red Hat Linux 6.2
I have made the installations on both machines exactly the same.
I then installed MPICH 1.2.3 on both machines as a normal user, in exactly the same directories on both machines, with exactly the same options to "./configure".
I have modified the "/etc/hosts.equiv" file to include both machines on the network. At present I can "rsh" from one machine to the other, and can also run a directory listing remotely from either machine.
I am having trouble running the "tstmachines" script, which tests the availability of the machines for multinode processing. I get errors of this kind:
unexpected response from 192.168.1.1 :
-> /bin/ls : /home/srik/mpich-1.2.3/sbin/mpichfoo : no such file or directory
The explanation that comes along with this says:
The "ls" test failed on some machines. This usually means that you do not have a common file system on all of the machines in your machines list; MPICH requires this for mpirun (it is possible to handle this in a procgroup file; see documentation for more details).
Other possible problems include:
The remote shell command does not allow you to run "ls".
See documentation about remote shell and rhosts.
You have a common file system, but with inconsistent names.
See documentation on the automounter fix.
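For what it is worth, my understanding of the "ls" test is an assumption based on the message above, not something taken from the tstmachines source: it seems to run "ls" on a file under the local MPICH directory through the remote shell on each host, and to flag any host whose response is not just the file name. A rough Python sketch of that idea follows. The names remote_ls and hosts_missing_file are made up for illustration and are not part of MPICH.

```python
import subprocess


def remote_ls(host, path, rsh="rsh"):
    """Run `ls path` on `host` through the remote shell.

    Returns (exit_code, combined_output). The exit code is that of the
    local rsh client, which may not reflect whether the remote `ls`
    succeeded, so callers should look at the output instead.
    """
    proc = subprocess.run([rsh, host, "ls", path],
                          capture_output=True, text=True)
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def hosts_missing_file(hosts, path, ls=remote_ls):
    """Return {host: response} for every host where `ls path` did not
    print exactly `path`, i.e. where the file does not appear to be
    visible under the same name (hypothetical helper, not MPICH code).
    """
    missing = {}
    for host in hosts:
        _, output = ls(host, path)
        if output != path:
            missing[host] = output
    return missing
```

If the file exists only on the machine where the test was started, calling hosts_missing_file(["192.168.1.1"], "/home/srik/mpich-1.2.3/sbin/mpichfoo") should report 192.168.1.1 together with the "no such file or directory" text. That would match the error above even though "rsh" itself works.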
I need help with this. I tried to mail the people at ANL, but I haven't heard anything from them in 3 days. Could someone please help me out?
Thanks,
srik.