LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 03-05-2010, 03:52 PM   #1
pimanlives
LQ Newbie
 
Registered: Feb 2010
Distribution: Ubuntu 9.10 (Karmic)
Posts: 5

Rep: Reputation: 0
/etc/init.d network dependent scripts running/failing at boot [on ubuntu server]


First a brief description...
I have been slowly building and configuring a small computer cluster to run chemistry simulations for my research. As problems arise, I am typically able to find a solution through Google and/or these forums. However, for this problem I have yet to find anything useful (although I may not know what I should be searching for).

I have somewhat successfully installed ganglia (a cluster monitoring package), torque (PBS batch system), and maui (a scheduler that can integrate with torque). However, their init scripts do not always run correctly on a reboot.

Looking at the error messages in /var/log/daemon.log after a reboot I sometimes see the following.

Quote:
Mar 4 13:29:56 lithium /usr/sbin/gmond[924]: Error creating multicast server mcast_join=239.2.11.71 port=8649 mcast_if=NULL family='inet4'. Exiting.#012
Mar 4 13:29:57 lithium pbs_mom: LOG_ERROR::mom_server_add, host mendeleev.chemcluster.loc not found
Followed closely by
Quote:
Mar 4 13:30:04 lithium dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 14
Mar 4 13:30:04 lithium dhclient: DHCPOFFER of 192.168.100.4 from 192.168.100.1
Mar 4 13:30:04 lithium dhclient: DHCPREQUEST of 192.168.100.4 on eth0 to 255.255.255.255 port 67
Mar 4 13:30:04 lithium dhclient: DHCPACK of 192.168.100.4 from 192.168.100.1
Mar 4 13:30:04 lithium dhclient: bound to 192.168.100.4 -- renewal in 35587 seconds.
The scripts are supposed to be dependent on networking but they do not appear to be.
Quote:
# Required-Start: $network $named $remote_fs $syslog
To give a brief description of my cluster topology:
I have a head node, mendeleev, acting as the DHCP server, DNS server, etc., while every node is essentially identity-less: each is told who it is based on information stored in the head node's dhcpd.conf file. I think the problem is that these scripts depend not only on networking being up (which seems to be a bit of a nebulous concept) but also on the node having already been assigned a name, IP address, etc., because if I run the scripts later, everything works fine. Additionally, I have tried a very simple hack of adding a 5 second sleep command to both of these init scripts (torque and ganglia), and that also seems to work.

I guess my question is: does this seem like a valid conclusion to draw, and what is the proper way to solve this type of init script network dependency problem? I feel a bit unclean throwing a sleep statement into an init script like this, and I really thought that "# Required-Start: $network" should have prevented these problems.

My apologies if this feels long or unwieldy; I was attempting to accurately describe my problem and what I have tried, and it seems to have gotten a bit long.

Joseph Michalka
 
Old 03-05-2010, 04:29 PM   #2
jstephens84
Senior Member
 
Registered: Sep 2004
Location: Nashville
Distribution: Manjaro, RHEL, CentOS
Posts: 2,098

Rep: Reputation: 102
What happens if you restart the scripts once you get a new IP? From my quick observations, it looks like the services go dead after your adapter fails.
 
Old 03-05-2010, 04:32 PM   #3
pimanlives
LQ Newbie
 
Registered: Feb 2010
Distribution: Ubuntu 9.10 (Karmic)
Posts: 5

Original Poster
Rep: Reputation: 0
If I restart the scripts after the node has come completely up (it both sees and is seen by the network), they run correctly. I only have this problem on boot/reboot.
 
Old 03-05-2010, 10:35 PM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by pimanlives
... what is the proper way to solve this type of init script network dependency problem? I feel a bit unclean throwing a sleep statement into an init script like this, and I really thought that "# Required-Start: $network" should have prevented these problems.
AFAIK there is no "proper way".

As you comment, "network up" is a nebulous concept, so there's no test for it. It's a recurring problem when doing NFS mounts at boot time. The usual (imperfect) solution is to run the network-dependent boot scripts late in the boot sequence by giving them links beginning S<big number>, where <big number> is up to 99, depending on which other boot scripts have to run after them.
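As a rough sketch of the "S<big number>" approach, assuming the legacy sysv-rc form of update-rc.d (`defaults SS KK`) that shipped around the time of Ubuntu 9.10: the two service names come from this thread, while the function name `late_start_cmds` and the `APPLY` switch are made up for illustration. Releases that use dependency-based boot ordering may ignore the explicit numbers.

```shell
#!/bin/sh
# Sketch only: re-register init scripts so the start link becomes S99<name>.
# By default this prints the commands; set APPLY=1 (as root) to execute them.
# late_start_cmds and APPLY are hypothetical names, not part of any package.

late_start_cmds() {
    name=$1
    # Drop the existing rc?.d links, then recreate them with start=99, kill=01.
    echo "update-rc.d -f $name remove"
    echo "update-rc.d $name defaults 99 01"
}

for svc in ganglia-monitor pbs_mom; do
    late_start_cmds "$svc" | while read -r cmd; do
        if [ "${APPLY:-0}" = 1 ]; then
            $cmd
        else
            echo "would run: $cmd"
        fi
    done
done
```

Running it without APPLY=1 only lists what would change, which is a cheap way to check the service names before touching the boot sequence.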

If robustness is more important to you than boot time then you could devise a "network up" test and run it in a delayed loop (with a maximum limit so it doesn't sit there forever) at the beginning of your script.
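A minimal sketch of such a bounded retry loop, assuming that "network up" for this cluster means the head node's name resolves. The helper names `wait_for` and `head_node_resolves` and the variables `WAIT_TRIES`, `WAIT_DELAY` and `HEAD_NODE` are invented for this example; the host name is the one from the log excerpt in the first post.

```shell
#!/bin/sh
# Sketch only: retry a check command until it succeeds or a limit is hit.
# WAIT_TRIES and WAIT_DELAY are hypothetical tunables (attempts, seconds).

wait_for() {
    tries=${WAIT_TRIES:-30}
    delay=${WAIT_DELAY:-1}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # "$@" is the check command; its output is discarded.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    # Give up so the boot does not hang forever.
    return 1
}

# One possible "network up" test: the head node's name can be resolved.
# getent exits non-zero when the name is not found.
head_node_resolves() {
    getent hosts "${HEAD_NODE:-mendeleev.chemcluster.loc}"
}
```

Near the top of an init script this could be used as `wait_for head_node_resolves || exit 1`, which waits up to roughly 30 seconds with the defaults above and then fails instead of blocking the rest of the boot.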
 
Old 03-06-2010, 02:01 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Hi pimanlives

Are your startup scripts upstart, init or some other version?

Assuming upstart, it may be as simple as changing the current 'start on' to be:

start on started networking (you may need to check which script brings your cards up and put its name here)
 
Old 03-15-2010, 07:59 PM   #6
pimanlives
LQ Newbie
 
Registered: Feb 2010
Distribution: Ubuntu 9.10 (Karmic)
Posts: 5

Original Poster
Rep: Reputation: 0
Further forays

catkin

I appreciate the confirmation that my assumption about "networking" not being fully up is the probable cause of my problem.

grail
The scripts were initially init.d type scripts and I was trying to use the directive
Code:
# Required-Start:    $network $named $remote_fs $syslog
to force them to wait for networking to be up. I did not really know what upstart was before you posted, but after looking into it and trying to find some examples, I ran across another solution that appears to be working and seems to make sense for my current system setup.

I found out about the folder /etc/network/if-up.d and how the scripts present in this directory are run whenever a network device is fully up. I ended up writing a simple script in this folder that calls the init.d scripts I already have. I then removed the scripts from the normal boot sequence with "update-rc.d -f $name remove". As catkin recommended, and as I was trying to fake with a sleep command, these scripts should now only run when a network interface is "up".
Code:
#!/bin/sh
/etc/init.d/ganglia-monitor restart
/etc/init.d/pbs_mom restart
Thank you all for the suggestions. I am glad my first question on these forums has been resolved so quickly.
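One caveat with the if-up.d approach is that hooks in that directory are run for every interface that comes up, including the loopback device. A hedged variant of the script above, assuming ifupdown's IFACE environment variable and assuming eth0 (the interface seen in the dhclient log) is the one that matters: the function names and the `CLUSTER_IFACE` override are made up for this sketch.

```shell
#!/bin/sh
# Sketch only: if-up.d hook that restarts the cluster services for one
# interface. ifupdown sets IFACE to the interface that was just brought up.
# CLUSTER_IFACE, wanted_iface and restart_cluster_services are hypothetical.

# Succeed only for the interface the cluster services should follow.
wanted_iface() {
    [ "$1" = "${CLUSTER_IFACE:-eth0}" ]
}

restart_cluster_services() {
    for svc in ganglia-monitor pbs_mom; do
        # Skip quietly if the init script is not installed on this node.
        if [ -x "/etc/init.d/$svc" ]; then
            "/etc/init.d/$svc" restart
        fi
    done
    return 0
}

if wanted_iface "${IFACE:-}"; then
    restart_cluster_services
fi
```

With this guard the services are not restarted when lo comes up, so they should only be touched once the DHCP-configured interface is ready.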
 
  


All times are GMT -5. The time now is 05:43 PM.
