Home made cluster node fails to start sometimes on account of nfs
Hi all,
I've built a small home-made cluster consisting of a master and one diskless slave node. Lately node 1 sometimes fails to start, reporting the following messages:
---------------------------------------------------------
IP-Config: Complete:
[ 12.318051] device=eth0, addr=192.168.100.21, mask=255.255.255.0, gw=192.168.100.2,
[ 12.414252] host=192.168.100.21, domain=mydomain.com, nis-domain=(none),
[ 12.499742] bootserver=192.168.100.2, rootserver=192.168.100.2, rootpath=
[ 12.589739] md: Skipping autodetection of RAID arrays. (raid=autodetect will force)
[ 12.681474] Looking up port of RPC 100003/2 on 192.168.100.2
[ 12.750322] Looking up port of RPC 100005/1 on 192.168.100.2
[ 12.819465] Root-NFS: Server returned error -13 while mounting /diskless/192.168.100.21
[ 12.915257] VFS: Unable to mount root fs via NFS, trying floppy.
[ 12.987233] VFS: Cannot open root device "nfs" or unknown-block(2,0)
[ 13.063343] Please append a correct "root=" boot option; here are the available partitions:
[ 13.163295] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0)
[ 13.262198] Pid: 1, comm: swapper Not tainted 2.6.31-gentoo-r6 #4
---------------------------------------------------------
If I hard-reboot it through the switch it fails again, whereas if I wait about 3-5 minutes and then reboot it, it starts normally:
---------------------------------------------------------
IP-Config: Complete:
[ 12.404086] device=eth0, addr=192.168.100.21, mask=255.255.255.0, gw=192.168.100.2,
[ 12.500328] host=192.168.100.21, domain=mydomain.com, nis-domain=(none),
[ 12.585810] bootserver=192.168.100.2, rootserver=192.168.100.2, rootpath=
[ 12.675797] md: Skipping autodetection of RAID arrays. (raid=autodetect will force)
[ 12.767526] Looking up port of RPC 100003/2 on 192.168.100.2
[ 12.836319] Looking up port of RPC 100005/1 on 192.168.100.2
[ 12.929577] VFS: Mounted root (nfs filesystem) readonly on device 0:15.
---------------------------------------------------------
I have never had this problem before, and perhaps I messed things up by unintentionally altering some configuration file. As I am at a loss for ideas and have checked (or at least I presume so) every relevant file on the PC and every forum on the web, any help in pinpointing the hitch would be very welcome.
Thanks,
Pier
Well, error 13 is a permissions issue I believe. Do the server logs mention anything about this particular failure? Have you tried increasing the log level of the NFS server during one of these failures?
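To make the "error 13" reading concrete, here is a minimal sketch that pulls the number out of a Root-NFS line and maps it to a Linux errno name. The function names `root_nfs_error` and `nfs_errno_name` are made up for this illustration, and the table covers only a few common values. It also assumes the number the kernel prints corresponds to a standard errno, which holds for 13 (EACCES, permission denied).

```shell
#!/bin/sh
# Hypothetical helpers, not part of any NFS tool.

# Map an error number ("13" or "-13") to a Linux errno name.
# Only a few common values are listed; anything else prints UNKNOWN.
nfs_errno_name() {
    code=${1#-}    # drop a leading minus sign if present
    case "$code" in
        1)   echo EPERM ;;
        2)   echo ENOENT ;;
        5)   echo EIO ;;
        13)  echo EACCES ;;
        110) echo ETIMEDOUT ;;
        111) echo ECONNREFUSED ;;
        *)   echo UNKNOWN ;;
    esac
}

# Extract the error number from a "Root-NFS: Server returned error N" log line.
# Prints nothing if the line does not match.
root_nfs_error() {
    printf '%s\n' "$1" |
        sed -n 's/.*Root-NFS: Server returned error \(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}
```

For the line in the failing boot log, `nfs_errno_name "$(root_nfs_error "$line")"` gives EACCES. On the server side, assuming the usual nfs-utils tools are installed, `exportfs -v` shows the active exports and their options, `showmount -e 192.168.100.2` shows what the server advertises, and `rpcdebug -m nfsd -s all` turns on verbose nfsd logging while a failure is being reproduced.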
Quote:
I managed to get node1 starting again by removing the lines related to ntp, which I had recently added in order to set the master's time correctly. /etc/conf.d/local.start reads:

# /etc/conf.d/local.start
# This is a good place to load any misc programs
# on startup (use &>/dev/null to hide output)

echo
# eth0 -> internet
echo "Setto eth0 192.168.0.129 up ..."    # "Setting eth0 192.168.0.129 up ..."
ifconfig eth0 192.168.0.129 up
route add default gw 192.168.0.1 dev eth0
echo
echo
# eth1 -> Gigabit for fast comunication with node1
echo "Setto eth1 192.168.99.2 up ..."    # "Setting eth1 192.168.99.2 up ..."
ifconfig eth1 192.168.99.2 up
echo
echo "Setto eth2 192.168.100.2 up ..."    # "Setting eth2 192.168.100.2 up ..."
# eth2 -> comunications with node1
ifconfig eth2 192.168.100.2 up
echo
echo "Abilito fooldns..."    # "Enabling fooldns..."
cp /etc/resolv.conf.fooldns /etc/resolv.conf
echo
echo "Abilito modalita wol su on board eth0"    # "Enabling WOL mode on on-board eth0"
echo
ethtool -s eth0 wol g
echo
################ Partenza node1 ########################
echo "Lancio il demone dhcpd in ascolto su eth2..." # "Starting the dhcpd daemon listening on eth2..."
/etc/init.d/dhcpd start #
echo #
sleep 2 #
echo "Avvio il nodo 1..." # "Starting node 1..."
echo #
/sbin/node_01.up #
########################################################
--------------- what follows has been removed ----------
sleep 2
echo
echo "Aggiorno ora di sistema"    # "Updating system time"
echo
if ping -c 1 -q -W 2 -w 2 ntp1.ien.it >/dev/null; then
    /usr/sbin/ntpdate ntp1.ien.it
else
    echo
    echo "Web non raggiungibile: impossibile aggiornare orario"    # "Web unreachable: cannot update time"
    echo
fi
########################################################
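If the time update is still wanted, the removed block could be kept as a small function and started in the background, so that a slow or unreachable time server cannot hold up the rest of local.start. This is only a sketch: the function name `update_clock` and the `PING` and `NTPDATE` variables are hypothetical, and whether running it this way avoids the NFS failure described above has not been tested.

```shell
#!/bin/sh
# Hypothetical wrapper around the removed ntp lines; the names are illustrative only.
PING=${PING:-ping}
NTPDATE=${NTPDATE:-/usr/sbin/ntpdate}

# Set the clock from one NTP host if it answers a single ping.
# Prints a message and returns 1 when the host cannot be reached.
update_clock() {
    host=$1
    if "$PING" -c 1 -q -W 2 -w 2 "$host" >/dev/null 2>&1; then
        "$NTPDATE" "$host"
    else
        echo "Web unreachable: cannot update time"
        return 1
    fi
}

# Possible use in local.start, run in the background:
#   update_clock ntp1.ien.it &
```

Keeping the command names in variables lets the function be exercised with stand-ins before it is used on the real master.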