LinuxQuestions.org

TheOnlyQ 11-06-2011 11:24 AM

Scripting a network restart when net goes down.
 
Alright, so as a first-action solution I've come up with an idea.

Basically, every now and then we get very brief incoming DDoS attacks, and it seems the NIC gives up while the server itself stays online. To counteract this, I thought it would be good to have a script that checks whether it can ping google.com or some other reliable host. The script can run via cron every 5 minutes.

When it can't contact google.com and has already tried 2 times, it restarts the network with 'service network restart' and then removes the log of its past two attempts.

I'm just wondering, how would I allow the script to check if the ping is succeeding? I'm not great at bash scripting so I would appreciate any help.

Thanks!

spiky0011 11-06-2011 12:16 PM

How about
Code:

ping -i120 google.com

ButterflyMelissa 11-06-2011 12:17 PM

Hmm,

Maybe this could help, dunnow...

Quote:

#!/bin/bash

# Send two pings; a non-zero exit status means something went wrong.
ping -c 2 www.google.com > /dev/null
if [ $? -ne 0 ]; then
    echo "restarting network"
    /etc/init.d/networking restart
fi
Save it somewhere and put the command in the proper cron folder, but...you know this...

needs root rights, though...
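
For what it's worth, one way to get both the schedule and the root rights is a file in /etc/cron.d, where each line carries a user field. A rough sketch that just prints such a line; the script path, the net-watchdog file name and the cron_entry helper are placeholders of mine, and the 5-minute interval is the one from the first post:

```bash
#!/bin/bash
# Builds a cron.d line that runs the check every 5 minutes as root.
# WATCHDOG and the cron.d file name below are assumed, not from this thread.

WATCHDOG="${WATCHDOG:-/usr/local/sbin/net-watchdog.sh}"

# Prints the crontab line: minute, hour, day, month, weekday, user, command.
cron_entry() {
    printf '*/5 * * * * root %s\n' "$WATCHDOG"
}

# Usage (as root):  cron_entry > /etc/cron.d/net-watchdog
```

Check your distro's cron documentation before relying on it, since not every cron reads /etc/cron.d.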

Of course, the "bacon" of the thing is the line where the restart happens. That depends on your distro; on mine (Arch Linux) this could be:

Quote:

/etc/rc.d/network restart
therefore, check the manual...

Thor

suicidaleggroll 11-06-2011 12:24 PM

I wrote a script to do exactly this on one of our embedded systems a while back, works great.

The basic gist is to ping a known host (I usually use the router's IP since you don't want to go around restarting the network if it's just an Internet outage) and check the exit code like Thor showed. If the ping succeeds you exit; if it fails you wait a minute and try again. The second attempt works the same way: exit on success, wait a minute and retry on failure. If it fails the third time, you write the event to a log and issue a network restart command.
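
The logic described above could be sketched roughly like this for a cron job. This is not the actual script, just an illustration: GATEWAY, RETRY_DELAY, LOGFILE, the function names and the "--run" flag are all made up, and the restart command depends on the distro.

```bash
#!/bin/bash
# Three-strikes check: give up on the link only after three failed pings.
# All variable and function names here are illustrative assumptions.

GATEWAY="${GATEWAY:-192.168.1.1}"    # host to ping, e.g. the router
RETRY_DELAY="${RETRY_DELAY:-60}"     # seconds to wait between attempts
LOGFILE="${LOGFILE:-/var/log/net-watchdog.log}"

# Succeeds if one ping to the gateway is answered within 5 seconds.
probe() {
    ping -c 1 -w 5 "$GATEWAY" > /dev/null 2>&1
}

# Tries the probe up to $1 times, sleeping between failed attempts.
# Returns 0 on the first success, 1 if every attempt failed.
link_is_up() {
    attempts=$1
    i=1
    while [ "$i" -le "$attempts" ]; do
        probe && return 0
        [ "$i" -lt "$attempts" ] && sleep "$RETRY_DELAY"
        i=$((i + 1))
    done
    return 1
}

# Only act when called with "--run" (a made-up flag), so the functions
# can be sourced or tested without touching the network.
if [ "${1:-}" = "--run" ]; then
    if ! link_is_up 3; then
        echo "$(date): 3 pings to $GATEWAY failed, restarting network" >> "$LOGFILE"
        service network restart   # distro-specific restart command
    fi
fi
```

Keeping the ping in its own probe function makes it easy to swap in a different check later without touching the retry loop.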

TheOnlyQ 11-06-2011 01:52 PM

For learning purposes would you mind sharing your script?

I'm definitely not any good at bash, so I've been piecing bits together, and I don't think that method works all too well. I would like to learn from something whole, if you would be kind enough.

suicidaleggroll 11-06-2011 03:10 PM

Keep in mind that this was developed for an embedded Debian system with a lot of restrictions, which is why it's written the way it is (no cron, etc.). It should run on a regular system as well, but in that case I would probably make a few changes.

It's invoked in rc.local using nohup so it's always running in the background. With cron this wouldn't be necessary.

Code:

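#!/bin/bash
# Network watchdog: restart networking after three consecutive failed pings.
last=0    # result of the previous check (1 = failed)
last2=0   # result of the check before that

while true; do
  source ethconfig   # re-read every pass so settings can change without a restart

  ping -c 1 "$ETH_GW" > /dev/null
  status=$?
  if [[ $status -ne 0 ]]; then
      status=1   # treat any ping error (exit code 1 or 2) as a failure
      echo "ping failed" >> eth_watchdog.log
      if [[ $last == 1 && $last2 == 1 ]]; then
        # Third failure in a row: log it and restart the network.
        echo "calling /etc/init.d/networking restart" >> eth_watchdog.log
        date >> eth_watchdog.log
        /etc/init.d/networking restart
        status=0   # reset the history so the count starts over
      fi
  fi
  last2=$last
  last=$status

  sleep "$ETHWD_SLEEP"
done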

ethconfig looks something like:
Code:

export ETHWD_SLEEP=60
export ETH_GW=192.168.1.1
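
Since the script reads ethconfig and writes eth_watchdog.log by relative path, whatever starts it has to change into the right directory first. A rough idea of what the rc.local part might look like, assuming the script is saved as eth_watchdog.sh next to ethconfig in a hypothetical /opt/ethwd directory (WATCHDOG_DIR and start_watchdog are names invented for this example):

```bash
# Lines for /etc/rc.local. The directory and script name are assumptions.
WATCHDOG_DIR="${WATCHDOG_DIR:-/opt/ethwd}"

start_watchdog() {
    # cd first so "source ethconfig" and eth_watchdog.log resolve correctly;
    # the subshell keeps rc.local's own working directory unchanged, and
    # nohup plus & leaves the loop running in the background.
    ( cd "$1" && nohup ./eth_watchdog.sh > /dev/null 2>&1 & )
}

if [ -x "$WATCHDOG_DIR/eth_watchdog.sh" ]; then
    start_watchdog "$WATCHDOG_DIR"
fi
```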


TheOnlyQ 11-07-2011 11:48 AM

Yeah, I'm having a hard time converting this to CentOS. Any pros got any tips?

suicidaleggroll 11-07-2011 12:47 PM

What kind of problems are you having? I just ran it without any issues on Red Hat Enterprise Linux 4 and Fedora 15.

TheOnlyQ 11-07-2011 02:02 PM

sleep: missing operand
Try `sleep --help' for more information.
./test.sh: line 7: ethconfig: No such file or directory
Usage: ping [-LRUbdfnqrvVaA] [-c count] [-i interval] [-w deadline]
[-p pattern] [-s packetsize] [-t ttl] [-I interface or address]
[-M mtu discovery hint] [-S sndbuf]
[ -T timestamp option ] [ -Q tos ] [hop1 ...] destination


I've tried changing ethconfig to ifconfig, but the commands are different.

I'm not sure about the ping and sleep errors either. It seems your script would work, but I need to know how to convert several parts to CentOS.

suicidaleggroll 11-07-2011 02:50 PM

ethconfig is an ASCII file containing the two lines shown; it tells the script which IP to ping and how long to wait between attempts. The script is failing on line 7 because you didn't create the ethconfig file. Without that file, the $ETH_GW and $ETHWD_SLEEP variables are empty, which is why the calls to ping and sleep are failing. Create the ethconfig file in your pwd (or create it elsewhere and hard-code the location in the script) and all of those problems will go away.
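
One way to make that kind of failure obvious instead of cryptic is to refuse to start when the config file or either variable is missing. A small sketch: the load_config name and passing the path as an argument are additions for this example, only ETH_GW and ETHWD_SLEEP come from the script.

```bash
#!/bin/bash
# Loads ethconfig and fails loudly if the file or a setting is missing.

load_config() {
    config_file=$1
    if [ ! -r "$config_file" ]; then
        echo "cannot read config file: $config_file" >&2
        return 1
    fi
    . "$config_file"
    if [ -z "${ETH_GW:-}" ] || [ -z "${ETHWD_SLEEP:-}" ]; then
        echo "ETH_GW and ETHWD_SLEEP must both be set in $config_file" >&2
        return 1
    fi
}

# Typical use near the top of the watchdog script:
#   load_config /path/to/ethconfig || exit 1
```

With a check like this, a missing file produces one clear message rather than usage errors from ping and sleep.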

Alternatively, you can put the values from ethconfig directly into the script and remove the "source ethconfig" line. I didn't do that in my version because this code is always running, 24/7, so if I ever wanted to change the IP or the delay time, I would have to kill the script, change the values, and then restart it. Separating those values into their own file lets me change them without killing and restarting the script (which is why ethconfig is sourced on every iteration of the loop). If you convert this script into a cron version (i.e., not an infinite loop), that concern goes away.
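
A cron version has to remember earlier failures between runs, since every run starts fresh. Roughly like this, keeping the failure count in a small state file. STATE_FILE, MAX_FAILS, the record_result function and the "--run" flag are all invented for the example, and the ping target and restart command need adjusting for your system:

```bash
#!/bin/bash
# Cron-style watchdog: one ping per run, failure count kept in a state file.
# All names below are assumptions made for this sketch.

GATEWAY="${GATEWAY:-192.168.1.1}"
STATE_FILE="${STATE_FILE:-/var/run/net-watchdog.fails}"
MAX_FAILS="${MAX_FAILS:-3}"

# Records one check result ("ok" or "fail") and prints "restart" when the
# number of consecutive failures reaches MAX_FAILS, otherwise "wait".
record_result() {
    fails=0
    [ -r "$STATE_FILE" ] && fails=$(cat "$STATE_FILE")
    fails=${fails:-0}
    if [ "$1" = "ok" ]; then
        fails=0
    else
        fails=$((fails + 1))
    fi
    if [ "$fails" -ge "$MAX_FAILS" ]; then
        echo 0 > "$STATE_FILE"     # start counting again after a restart
        echo restart
    else
        echo "$fails" > "$STATE_FILE"
        echo wait
    fi
}

# Only ping and restart when called with "--run" (a made-up flag).
if [ "${1:-}" = "--run" ]; then
    if ping -c 1 -w 5 "$GATEWAY" > /dev/null 2>&1; then
        result=ok
    else
        result=fail
    fi
    if [ "$(record_result "$result")" = "restart" ]; then
        echo "$(date): restarting network" >> /var/log/net-watchdog.log
        service network restart
    fi
fi
```

A single successful ping resets the count, so only failures on consecutive cron runs add up to a restart.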

TheOnlyQ 11-08-2011 03:44 AM

Alright, it is now working, from what I can tell it is registering that the ping is failing. watchdog log >


ping failed
ping failed
ping failed
ping failed


However, it doesn't seem to be restarting the network, and it isn't even logging or echoing that it is attempting to.

Any advice? Once again appreciate it.

d3vrandom 11-08-2011 03:52 AM

I suggest using monit for this sort of thing. That is what it's designed to do.

TheOnlyQ 11-08-2011 03:54 AM

Can you provide any more info? The script above I have working and it seems great but it isn't restarting the network, trying to work out why.

suicidaleggroll 11-08-2011 08:02 AM

Quote:

Originally Posted by TheOnlyQ (Post 4518655)
Alright, it is now working, from what I can tell it is registering that the ping is failing. watchdog log >


ping failed
ping failed
ping failed
ping failed


However, it doesn't seem to be restarting the network, and it isn't even logging or echoing that it is attempting to.

Any advice? Once again appreciate it.

If you look closely at the script, you'll see it only restarts the network if the ping has failed 3 times in a row. If it just fails once or twice, it will write "ping failed" to the output, but as long as the ping succeeds the third time it won't restart the network. Throw a date in the log file next to the "ping failed" echo and you'll probably see that your "ping failed" prints aren't in a row, they're random and separated by some time, which is why it's not taking action.
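
For instance, a small helper along these lines puts a timestamp on every log line. The log_event name is made up; eth_watchdog.log is the file the script already writes to:

```bash
#!/bin/bash
# Prefixes each log message with the current date and time.

LOGFILE="${LOGFILE:-eth_watchdog.log}"

log_event() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOGFILE"
}

# In the watchdog, replace:  echo "ping failed" >> eth_watchdog.log
# with:                      log_event "ping failed"
```

With timestamps in place it is easy to see whether the failures are back to back or spread out over hours.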

jlinkels 11-08-2011 12:24 PM

Don't do like I did!
 
I once put a script on one of my client systems which rebooted the machine when the network was down. I can't remember why I had the machine reboot instead of simply restarting the network. Maybe it had something to do with module loading. I must have installed it back in 2005 or so. Since then I renewed some system components, but cloned the installation when replacing the hard disk.

Anyway, years went by and (as usual with Linux) I never had a problem, and (as usual with Linux) things were working so well I had totally forgotten about installing that script. I even forgot that I ever wrote it.

What did this script do? Exactly 10 minutes after booting, the client would ping our main server. That was the WinNT (at that time) PDC, and if that one was down, there would be something very wrong anyway, so it looked like a reliable source. The PDC never went down and there were no problems at all.

Until, 2 years ago, I replaced that WinNT box with an all-new Linux server. With a different IP address of course, because the two servers would need to run side by side for a while. Then the WinNT server was taken off-line.

Nothing happened, because my client only checked 10 minutes after booting. Not every 10 minutes. So weeks after taking the WinNT server off-line I rebooted the client. Which faithfully rebooted after 10 minutes.

Not only could I not remember ever installing the script, there was also no obvious association with the server going off-line. There were weeks in between.

So what does one do with a desktop computer which reboots after 10 minutes? Right, suspect the cooling, clean the fan, replace it, check the power supply, replace it, memory, hard disk.... Long story short, I can't remember anymore how I discovered the script again, but after replacing most of the hardware I finally did. IIRC the script logged an entry in /var/log/messages, but at first I didn't understand why a network failure would force a reboot.

Anyway, don't do it my way. It was stupid.

jlinkels


All times are GMT -5.