LinuxQuestions.org

bennydtown 09-15-2004 04:53 PM

Linux Server Occasionally losing connectivity
 
I have a Dell PowerEdge server running Red Hat 9 that serves a fair amount of traffic. It ran well for several months, but recently it has taken to occasionally losing its network connection.

Actually, as far as I can tell, the server doesn't recognise any problem with the connection; it just stops working and can no longer ping the gateway (although the gateway is up, running fine, and pingable from the outside). It is on a persistent 10 Mbps Ethernet connection, and the NIC continues to show lights, but the packets just seem to stop getting through.

Oddly enough, doing a 'service network restart' doesn't seem to fix the problem; only a full system restart does. I've tried switching cables and using the other NIC (there are two on the machine), but the problem persists. So, I've got an ugly hack in place right now: a cron job runs once per hour and restarts the whole machine if it can't ping the gateway. Obviously far from ideal.
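
A minimal sketch of the kind of watchdog described above, written as POSIX shell functions. The gateway address (a documentation placeholder), the retry count, the log path, and the function names are all assumptions, not the script actually used on this server. It tries several pings before giving up, and it saves the interface and ARP state before rebooting so there is something to look at afterwards.

```shell
# Hypothetical watchdog sketch. Every name and default below is an assumption.
GATEWAY="${GATEWAY:-192.0.2.1}"               # placeholder; use the real gateway
ATTEMPTS="${ATTEMPTS:-3}"                     # pings to try before giving up
PING_CMD="${PING_CMD:-ping -c 1 -w 5}"        # one probe, 5 second deadline
REBOOT_CMD="${REBOOT_CMD:-shutdown -r now}"
LOG="${LOG:-/var/log/net-watchdog.log}"       # hypothetical log location

# Succeeds (exit 0) if the gateway answers at least one of $ATTEMPTS pings.
gateway_reachable() {
    tries=0
    while [ "$tries" -lt "$ATTEMPTS" ]; do
        # $PING_CMD is left unquoted on purpose so its flags split into words.
        if $PING_CMD "$GATEWAY" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries + 1))
    done
    return 1
}

# Reboots only after every ping has failed, and records the state first.
watchdog() {
    if gateway_reachable; then
        return 0
    fi
    {
        date
        ifconfig -a || true
        arp -an || true
    } >>"$LOG" 2>&1
    $REBOOT_CMD
}
```

Saved to a file with a final line that calls watchdog, this could be run hourly from root's crontab to match the schedule described above. The log it leaves behind is the main gain over a bare "ping or reboot" one-liner.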

Anyway, I'm primarily a programmer, so my system administration skills are pretty thin. I suppose what I need to do is start scouring the logs, but I'm not really sure which ones I should look at. Any guidance would be greatly appreciated.

Thanks for your time,
Bennydtown
:scratch:

bennydtown 09-15-2004 07:26 PM

Addendum: After yet another occurrence of this strange loss of connectivity, I reviewed /var/log/messages. I know from my access logs exactly when the server lost connection, and /var/log/messages has no entries at that time. In fact, it has absolutely no entries until that ping-testing cron job issued the shutdown command.

Thanks again.

bennydtown 09-16-2004 02:40 PM

One more piece of information: When the connection stops working, the machine can ping its own IP address and localhost, but it gets an "incomplete" response to arp -an.
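
An "incomplete" entry means the kernel sent ARP requests for that address and never got a reply. Pings to localhost and to the machine's own IP still work because neither one goes out on the wire. Below is a small sketch for spotting this from a script. The function name is made up, and it assumes the Linux net-tools line format for arp -an, where an unresolved entry looks like "? (IP) at <incomplete> on IFACE".

```shell
# Hypothetical helper: reads `arp -an` style output on stdin and prints
# the IP address of every entry that has no resolved hardware address.
incomplete_arp_entries() {
    awk '/<incomplete>/ {
        gsub(/[()]/, "", $2)   # strip the parentheses around the IP field
        print $2
    }'
}
```

Usage would be "arp -an | incomplete_arp_entries". If the gateway's address shows up in the output while the link lights are still on, the box is sending ARP requests and hearing nothing back.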

jymbo 09-16-2004 04:00 PM

As usual, we have to ask...is there a firewall on this RH server?

This is a total stab in the dark, but here are a few troubleshoots:

1.) The next time your connection tanks, do #ifconfig on the interface and see if it lists anything under "errors:" and "dropped:".

2.) Pull up a few terminals while your connection is still running:
terminal 1: #tcpdump -i eth0 (substitute the interface you want to monitor)
terminal 2: #tail -f /var/log/messages
terminal 3: #tail -f /var/log/syslog
terminal 4: #top

Sit back and wait for the connection to tank, then you can try to see what happened just prior to the event.

3.) If you can afford the downtime, try running the server with your web service turned off (I'm assuming this is a web server). Just let the box idle with the 4 terms I described above. See if your connection still tanks.

4.) Do a "#df -m" to see if you're running out of disk space on / (sounds crazy, but this happened to me once before).
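
The check in step 1 can also be scripted, which helps if the failure happens when nobody is at the console. This is only a sketch: the function name is invented, and it assumes the older net-tools ifconfig layout where the counters appear as "errors:N dropped:N" on the "RX packets" and "TX packets" lines. Newer ifconfig versions print these differently and would need a different pattern.

```shell
# Hypothetical helper: reads ifconfig output on stdin and prints the
# combined RX+TX error and drop counters on one line.
iface_error_counts() {
    awk '
        /RX packets|TX packets/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^errors:/)  { split($i, a, ":"); errors  += a[2] }
                if ($i ~ /^dropped:/) { split($i, a, ":"); dropped += a[2] }
            }
        }
        END { printf "errors=%d dropped=%d\n", errors, dropped }
    '
}
```

Running "ifconfig eth0 | iface_error_counts" (substitute your interface) before and after the connection tanks gives two lines that are easy to compare. Counters that stay at zero through an outage suggest the driver isn't seeing anything wrong.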

bennydtown 09-16-2004 11:25 PM

Thanks for the reply Jymbo.

1) The results from ifconfig contain no errors or drops and look pretty similar to the ifconfig output when everything is working:
eth1      Link encap:Ethernet  HWaddr 00:06:5B:3D:1E:99
          inet addr:206.168.218.114  Bcast:206.168.218.119  Mask:255.255.255.248
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8011 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5733 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:704343 (687.8 Kb)  TX bytes:4738835 (4.5 Mb)
          Interrupt:17 Base address:0xec80 Memory:fe2fe000-fe2fe038

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3004 (2.9 Kb)  TX bytes:3004 (2.9 Kb)

2) /var/log/messages shows nothing around the time of the connection loss, and my system doesn't have a /var/log/syslog.

3) I may have to try turning off httpd, but haven't yet.

4) df says there's plenty of space left on all partitions.

bennydtown 09-20-2004 11:00 AM

For posterity’s sake, here was the resolution to my problem:

The issue turned out to be a hardware failure of the network interfaces. I have managed a workaround by installing a new Linksys PCI card.

Diagnosing the problem was complicated by the two integrated NICs in the machine. Because neither one was working, I had mistakenly assumed it was a software-related problem rather than a hardware one. It turns out that the two interfaces on the PowerEdge share most of their circuitry in a fairly small area, so it is entirely possible for a single hardware issue to disrupt both of them.

It is possible to deactivate those two integrated NICs through the machine's BIOS, under the "Integrated Devices" menu option. In researching the problem, I saw that older versions of the BIOS did not include that functionality, so if you can't find it, try upgrading your BIOS.

Anyway, good luck to anybody else with a similar problem.


All times are GMT -5.