LinuxQuestions.org


exodist 12-11-2006 09:03 AM

Network connectivity flaky, fine on other systems
 
The interface comes up fine and gets an IP and a router. When I try to use it, it works fine at first: I can go to Google, etc. However, after a minute or two of usage it suddenly stops. If I try to ping, ping just sits there silently. After a varying amount of time it starts to work again; the browser works fine and ping starts giving me messages. If I have ping running when it suddenly starts again, it reports giant numbers for the ms timing.

I have a gigabit network going here, I have 5 computers connected to it and all but 1 are fine.

Basically this computer has 3 NICs: one for a direct gigabit (crossover cable) connection to an NFS server, one for the gigabit network all my systems use, and the last one for a DSL router that provides a static IP. 3 systems have this exact same setup. I used multipath routing to support this; here is the tutorial I wrote on it: http://wiki.linuxquestions.org/wiki/...load_balancing

2 of the systems work perfectly fine, the third does not. I am pretty sure it is not my dual ethernet script that is causing this. Just to be sure, I tested by starting without the dual ethernet script and using only one interface configured via DHCP, and the problem still occurred.

I have tried reconfiguring the kernel with every option I can think of for why this is happening.

I have tried several network cards and all have the same problem:
usb 10/100 (pegasus)
3com 10/100 (3c905btx)
linksys gigabit (realtek)
onboard (nforce)

Strangely enough, however, this does not happen on the direct connection to the NFS server; that connection is rock solid.

I have tried changing what port the computer is connected to on the switch, and I know the switch is good because I have other comps connected to it not experiencing problems. When the internet is in flake mode on this system others have no problems.

Kernel 2.6.18 (I have tried 2.6.17 and 2.6.19 as well)
Gentoo linux x86
intel core 2 duo e6600
evga nforce i680 chipset
1gb ram

--Added:
I have checked and verified this problem occurs on the dsl interface as well, it seems any interface used for internet connectivity has the problem, but ones w/o internet connectivity do not.

jantman 12-12-2006 01:22 AM

I'm confused as to why this machine has 3 NIC's in it. Is it acting in the capacity of a router or firewall?

Have you tried switching out the affected NIC and cables? What about using top to watch the load on the system, or ethereal to watch the network load?

exodist 12-12-2006 01:45 AM

As my first message clearly states, I have tried several network cards: PCI, USB, and onboard, all using different chips and drivers. I have also tried several kernel versions and several kernel configurations, from minimal to maximized, and I have tried upgrading and downgrading several software packages.

As for system load, I am pretty sure it is low to none considering this is a core 2 duo e6600 with the only software running being the MINIMAL gentoo startups (freshly built, tried rebuilding gentoo from scratch to fix problem) + X, fluxbox, and firefox.

As for why it needs 3 network cards:

The first gigabit is the connection to my NFS server. This computer specifically needs a direct connection because it absolutely cannot have its NFS connection interfered with; it is too vital.

The second gigabit is to my general gigabit network on which is a cable modem/router for high speed internet.

The third network, 10/100, is directly connected to a DSL router. This is important because the system needs a static IP so I can access it remotely. Before you ask why I do not just get a static IP for my cable modem, the answer is simple: they do not offer it. As for why I do not just settle for DSL: simple, 1.5 Mbps vs 7 Mbps... the choice is obvious.

This config worked fine on my previous system that did this job, but a motherboard failure in another system caused me to shuffle hardware. This computer does not use, and is not connected to any hardware that was in or used by the system that died, so it is not inheriting any bad hw.

Every package is from the Gentoo stable set and is compiled with -O2, not -O3 or anything psycho like that.


added:
Yes I have tried swapping around cables.

jantman 12-12-2006 01:59 AM

Sorry, I guess I shouldn't post past 02:00.

Interesting... what about using dynamic DNS like that from dynDns.org for the cable? It worked for me for 3+ years, with some limitations...

You've got me beat... sounds like it could be a hardware problem, if you've tried different kernels and OS builds...

Have you tried network traffic analysis?

You said this happens on the DSL interface as well... is this simultaneous on both? If it happens on both DSL and cable interfaces, it rules out any hardware problem outside of your box.

Have you tried connecting a different box, same patch cables and all, to the DSL or cable and seeing if the problem replicates? This would pretty much narrow it down to inside-the-box or outside-the-box.

exodist 12-12-2006 09:03 AM

In this case I am doing this rather than dyndns because the cable internet is cheap, and the DSL is provided through my job and used by my boss's backup server that I keep here :-D Also, I need more than 1 static IP because I have 2 other systems that I use for my domain.

Another box on these cables did not have the same problem. (I just replaced the old box w/ this one)

the only real difference here is the motherboard cpu and ram, everything else is as it was in the old system that worked. (the old system did not die, just was time for an upgrade)

I suspected the motherboard at first because it happened on the second onboard NIC (strangely not the first; it still doesn't happen on that one, and both onboards use the same nforce driver). But when 3 different gigabit adaptors, 2 10/100 adaptors, and a USB adaptor all experienced the same problem, I suspected software.

I just re-compiled the os overnight in 64 bit mode, I will test it out this way and see if the problems persist, you can't get much different than a completely recompiled os switching between 32 and 64 bit modes, if it still happens I will be at an even more complete loss.

exodist 12-12-2006 09:33 PM

No love in 64 bit mode; it still stalls out occasionally. This time, as soon as my browser seemed to hang, I started pinging my router from a terminal I had ready. I also had gkrellm2 up with a graph; gkrellm showed only a few bytes of activity, 60-90. Ping seemed to hang, then after a little waiting it started reporting packets received. I hit control+c after 7 were received, and the summary said 32 sent, 7 received, 78% packet loss.
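As a quick sanity check on that summary, the loss figure can be reproduced from the sent and received counts. This is only a small sketch; `loss_pct` is a hypothetical helper name, not something from the thread or from ping itself, and it rounds down to a whole percent.

```shell
# loss_pct SENT RECEIVED
# Hypothetical helper: prints the packet-loss percentage as an integer,
# rounded down, using plain POSIX shell arithmetic.
loss_pct() {
    sent=$1
    received=$2
    if [ "$sent" -le 0 ]; then
        echo "loss_pct: SENT must be a positive number" >&2
        return 1
    fi
    echo $(( (sent - received) * 100 / sent ))
}

# The numbers from the ping run above: 32 sent, 7 received.
loss_pct 32 7
```

With 32 sent and 7 received, 25 packets were lost, and 25/32 is 78.125%, which matches the 78% that ping reported.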

no kernel messages.

I did try changing cables just before this as well.

exodist 12-12-2006 11:47 PM

I ran another set of tests, 2 systems, both 64 bit, one amd64 on an old asus board, the other e6600 on the evga i680.

The test:
all of the following simultaneously:
download a dvd iso of debian linux from an http mirror
endless loop of downloading the google frontpage (while [ 0 -eq 0 ]; do wget www.google.com; done)
and ping 192.168.0.1 (my router)

on both systems this process ran for 15 minutes or more

The i680 system using a PCI gigabit card had 55% packet loss, and the Debian ISO download failed.
The i680 system using a USB 10/100 card had 37% packet loss, and the Debian ISO download failed on this one as well.
On the amd64 system, when the test was done I hit control+c to stop the ping: 0% packet loss.

same router, same kernel version and basic config (slightly modified for different cpu's etc.)
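The test above could be scripted roughly as follows. This is a sketch under assumptions, not the exact commands that were run: `run_stress_test` and `ping_loss` are hypothetical names, the ISO URL is left as an argument, the wget output is discarded with `-O /dev/null` instead of being saved, and the summary parsing assumes the usual Linux ping wording ("N% packet loss").

```shell
# ping_loss: read ping output on stdin and print only the loss percentage.
# Assumes a summary line such as:
#   32 packets transmitted, 7 received, 78% packet loss, time 31012ms
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# run_stress_test ROUTER ISO_URL COUNT
# Starts the large download and the small-request loop in the background,
# pings ROUTER COUNT times (about one ping per second), then reports.
run_stress_test() {
    router=$1
    iso_url=$2
    count=$3

    wget -q -O /dev/null "$iso_url" &
    iso_pid=$!

    ( while :; do wget -q -O /dev/null www.google.com; done ) &
    loop_pid=$!

    echo "packet loss: $(ping -c "$count" "$router" | ping_loss)%"

    # Stop the request loop; a wget already in flight may finish on its own.
    kill "$loop_pid" 2>/dev/null

    # This blocks until the ISO download ends, then reports how it went.
    if wait "$iso_pid"; then
        echo "ISO download completed"
    else
        echo "ISO download failed"
    fi
}

# Example call (not run here; the URL must be supplied by the user):
# run_stress_test 192.168.0.1 "$ISO_URL" 900
```

Running the same function on both machines, against the same router, keeps the comparison like for like.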

exodist 12-13-2006 08:39 PM

I am an idiot and must apologise. This is indeed a problem on my end... I tried everything I could think of for 2 weeks before making a report because I wanted to be certain it was not my end before wasting anyone's time...

The problem was that I had a box hidden away under my desk that I had forgotten about when I made my new IP scheme. There was an IP address conflict. I resolved it and there are no more problems, and I can even explain the test results showing eth0 as good and the others as bad:

In the tests where the card was initialised as eth1, I used the network scripts that assign the static IPs; those addresses conflicted, so the results were obviously bad.
In the tests where it was eth0, I used dhcpcd real quick, thus no IP conflict.

I am at a loss for words for anyone whose time I have wasted with this.
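For anyone chasing similar symptoms, a conflict like this shows up as one IP address being answered by more than one MAC address. Below is a minimal sketch that flags such addresses from a list of "IP MAC" pairs; `find_ip_conflicts` is a hypothetical helper, and the addresses in the example are made up for illustration. How the pairs are collected (ARP tables on several machines, a capture, and so on) is left open.

```shell
# find_ip_conflicts: read "IP MAC" pairs on stdin, one per line, and print
# every IP that appears with more than one distinct MAC address.
find_ip_conflicts() {
    awk '{
            mac = tolower($2)              # treat AA:BB and aa:bb as the same
            key = $1 SUBSEP mac
            if (!(key in seen)) {          # count each IP/MAC pair only once
                seen[key] = 1
                count[$1]++
            }
         }
         END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip
         }' | sort
}

# Illustrative input only: 192.168.0.10 is claimed by two different MACs.
printf '%s\n' \
    '192.168.0.10 00:11:22:33:44:55' \
    '192.168.0.10 66:77:88:99:AA:BB' \
    '192.168.0.1  00:de:ad:be:ef:00' | find_ip_conflicts

# A live check is also possible with the iputils arping tool in duplicate
# address detection mode, for example (interface and address are assumptions):
#   arping -D -I eth1 -c 3 192.168.0.10
```

Checking a static address this way before assigning it would likely have caught the forgotten box straight away.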


All times are GMT -5.