LinuxQuestions.org
Old 05-06-2009, 09:01 PM   #1
emgee3
LQ Newbie
 
Registered: May 2009
Posts: 14

Rep: Reputation: 1
Trying to troubleshoot system (network?) pauses


I'm running Debian 5 PPC on an Apple G4 that I'm using as a router. Since I installed it, I've had a strange problem where every few minutes (I've seen gaps of between 4 and 6 minutes, but the interval isn't regular) the system pauses and doesn't accept any input for 1-3 minutes. Then it continues on as if nothing happened.

I'm not running X, this is all from the command line. I noticed this once or twice when setting up the system, but didn't think much about it. After I installed the router, it's gotten a bit more annoying, as I ssh to a server behind the router and mid-typing all of a sudden I have to wait.

I suspect it's coming from one of the network cards, based on some Google searches I've done, but I can't tell if I'm describing the problem properly.

This is the lspci output showing the network cards:
Code:
01:02.0 Ethernet controller: VIA Technologies, Inc. VT86C100A [Rhine] (rev 06)
01:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:04.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
I've also checked various log files but I don't see any messages around that time period that are suspicious to me.

I've had this problem in a Vyatta 4 install as well (which is based on Debian Lenny), on completely different hardware, but the same brand network card.
 
Old 05-07-2009, 02:41 AM   #2
rylan76
Senior Member
 
Registered: Apr 2004
Location: Potchefstroom, South Africa
Distribution: Fedora 17 - 3.3.4-5.fc17.x86_64
Posts: 1,475

Rep: Reputation: 87
Ouch - these types of intermittent problems are the hardest to solve.

Have you tried "downing" one or all of the network cards, and then seeing if the problem persists?

(I don't know if it will have any effect - I don't know enough about Linux architecture. Even for a "downed" interface, the drivers and associated software are still present in the kernel... as far as I know.)

You do not mention it, but do you get these lockups at the physical console commandline for that system, or ONLY if you are SSH'ed in from a remote connection somewhere?

I. e. what I'd do is:

Code:
# /sbin/ifconfig eth0 down
# /sbin/ifconfig eth1 down
# /sbin/ifconfig eth2 down
Then, wait or use the system for at least an hour to see if the lockups persist. If it is suddenly working WITHOUT lockups, you know that at least one of the cards is the culprit.

If you only get the lockups over an SSH connection, try selectively disabling the other cards. I.e. if you know you are using eth1 to SSH in (that's where you get the lockups), try downing eth0 and eth2 and see if the problem persists.

Also, you can try downing the cards and then unloading their modules with the "rmmod" command. Of course, try and use the simplest topology possible when testing - i. e. don't have other routers, gateways, proxies or firewalls between you and the server you are trying to fix - they will only complicate matters since the error might be almost -anywhere- if you have too many factors involved in it.
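
To find out which module to unload for a given interface, the driver name can be read from sysfs. This is a minimal sketch, assuming a 2.6 kernel with /sys mounted. `iface_driver` is a name made up for this example, and its optional second argument (an alternate sysfs root) is there only so the function can be tried without real hardware.

```shell
# Print the kernel driver bound to a network interface, using sysfs.
# $1 = interface name (e.g. eth1)
# $2 = optional sysfs root, defaults to /sys (hypothetical parameter, for testing only)
iface_driver() {
    root="${2:-/sys}"
    link="$root/class/net/$1/device/driver"
    # The driver entry is a symlink into the bus's driver directory.
    [ -L "$link" ] || { echo "no driver link for $1" >&2; return 1; }
    basename "$(readlink "$link")"
}
```

Typical use would be something like `drv=$(iface_driver eth1) && /sbin/ifconfig eth1 down && /sbin/rmmod "$drv"`. Keep in mind that cards sharing one driver all lose their interfaces when that module is unloaded.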

What is the load on that system? I have encountered something similar once when the kernel got busy on an older system of mine. I did not compile that kernel with DMA for my motherboard, and when the system got busy I used to get "micro lockups" of about 30 to 45 seconds, exactly the way you describe yours. Are you sure you have DMA enabled, and that it works? No idea how this applies on an Apple, but I also noticed this on yet another kernel I was using - network throughput was slow, and although the system did not hang, it got sluggish if there was lots of network traffic - I had to recompile the kernel with DMA support for my motherboard chipset, and after that it was fixed...

You might have a network buffer or something that is overflowing, do you see anything relevant in dmesg or in the kernel's logs or your network logs? While the system is "input locked", does it still respond to network traffic / pings? Try, for example, FTP'ing in while it is input locked - does the FTP connection work and is it responsive? I. e. it might be a protocol or port that is getting blocked for some reason, if, for example, your SSH session is locked down but FTP is working...?
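
The "does it still respond to pings" test can be left running unattended so that every pause gets a timestamp. The following is only a sketch, assuming the Linux iputils `ping` (where `-W` is the reply timeout in seconds). The function names `probe_loop` and `summarize_outages` are invented for this example.

```shell
# Probe a host once per second and print "EPOCH ok" or "EPOCH fail".
# $1 = address of the machine under test (you supply it).
probe_loop() {
    host="$1"
    while :; do
        if ping -c 1 -W 1 "$host" >/dev/null 2>&1; then
            echo "$(date +%s) ok"
        else
            echo "$(date +%s) fail"
        fi
        sleep 1
    done
}

# Read "EPOCH ok|fail" lines on stdin and print one line per outage:
# the word "outage", first failed probe, last failed probe, number of failed probes.
summarize_outages() {
    awk '
        $2 == "fail" { if (!n) start = $1; last = $1; n++; next }
        n            { print "outage", start, last, n; n = 0 }
        END          { if (n) print "outage", start, last, n }
    '
}
```

Run it from another machine on the same segment, for example `probe_loop 192.168.1.1 | tee probe.log` (the address is a placeholder), and later `summarize_outages < probe.log`. The start and end times can then be compared against the logs on the router.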

It can be just about anything.

You'll need to do some elimination here first, and the best way is to start at the simplest possible configuration and then slowly add complexity. The problem you describe can be caused by -very- many different factors, not all necessarily integral to your software, hardware or network infrastructure. It might be a combination of all three, one aspect only, or something else that might be exceedingly trivial, or extremely complex to solve.

Hope this gives you some ideas...

Last edited by rylan76; 05-07-2009 at 02:44 AM.
 
Old 05-07-2009, 09:23 AM   #3
farslayer
Guru
 
Registered: Oct 2005
Location: Willoughby, Ohio
Distribution: linuxdebian
Posts: 7,231
Blog Entries: 5

Rep: Reputation: 189
Anything in the logs when this happens ?


NETDEV WATCHDOG: eth1: transmit timed out
eth1: Tx timed out, lost interrupt?


or anything else bizarre ?
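
A quick way to search for messages like the two quoted above is a case-insensitive grep over the kernel ring buffer and the log files. This is a sketch only: the patterns are taken from those two sample lines, other drivers may word their timeouts differently, and `find_tx_timeouts` is a made-up name.

```shell
# Print log lines that look like NIC transmit timeouts.
# With no arguments it reads stdin; otherwise it searches the named files.
find_tx_timeouts() {
    grep -E -i 'NETDEV WATCHDOG|transmit timed out|Tx timed out|lost interrupt' "$@"
}
```

For example: `dmesg | find_tx_timeouts`, or `find_tx_timeouts /var/log/kern.log /var/log/syslog` on a default Debian install.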
 
Old 05-07-2009, 10:40 PM   #4
emgee3
LQ Newbie
 
Registered: May 2009
Posts: 14

Original Poster
Rep: Reputation: 1
Thanks rylan76 and farslayer for your replies.

Quote:
Anything in the logs when this happens ?
I checked all the logs in /var/log and there's no entries that coincide with the times of a pause. Which is just strange.
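
For anyone repeating this check, the log lines around a known pause can be pulled out by time rather than by eye. The sketch below assumes traditional syslog timestamps (`May  7 22:31:05 host ...`), where the third field is HH:MM:SS. It ignores the date and does not handle a window that crosses midnight. `lines_between` is a hypothetical helper name.

```shell
# Print syslog lines whose time-of-day falls within [FROM, TO].
# $1 = FROM as HH:MM:SS, $2 = TO as HH:MM:SS, remaining args = files (stdin if none).
lines_between() {
    from="$1"; to="$2"; shift 2
    # HH:MM:SS strings of equal width compare correctly as plain strings.
    awk -v from="$from" -v to="$to" '$3 >= from && $3 <= to' "$@"
}
```

Example: `lines_between 22:30:00 22:35:00 /var/log/syslog /var/log/kern.log`. An empty result for a window that definitely contained a pause supports the observation that nothing is being logged.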

Quote:
Try, for example, FTP'ing in while it is input locked - does the FTP connection work and is it responsive? I. e. it might be a protocol or port that is getting blocked for some reason, if, for example, your SSH session is locked down but FTP is working...?
Another factor is that established network connections stay connected during this time, but no traffic makes it through. The exception is if the timeout is set differently. I've had some connections, usually large http transfers, time out during a pause. But SSH connections and most everything else stay connected, with no traffic, during the pause.

-----

I can only SSH in from the main interface, and that's mostly how I connect in to work on it.

The place the computer is physically located is not easy for me to get to or stay for any length of time, but it's not being used for production work, so I'm going to get it and see if I still get the lockups from the console.

Checking whether DMA is enabled is also a good idea, but I've got no idea how or if that applies to a PPC box either.

But you've given a couple things to test, so that's good. I'm going to try at the console with the ethernet cards down, and leave a terminal running top open and see if anything happens then.
 
Old 05-08-2009, 03:04 PM   #5
emgee3
LQ Newbie
 
Registered: May 2009
Posts: 14

Original Poster
Rep: Reputation: 1
Ok, a bit more troubleshooting done. The pauses occur even with the two PCI ethernet cards downed.

Also, when there's a pause, the console is still responsive. So it does seem to be networking related, not the whole system.

I left a top open during a freeze and there was nothing out of the ordinary. CPU usage never seems to climb above 10%.
 
Old 05-08-2009, 03:14 PM   #6
farslayer
Guru
 
Registered: Oct 2005
Location: Willoughby, Ohio
Distribution: linuxdebian
Posts: 7,231
Blog Entries: 5

Rep: Reputation: 189
Install and run itop to see if there's an interrupt issue causing the pauses. Maybe you have a piece of hardware that is freaking out and sending tons of interrupts.


itop -a

Last edited by farslayer; 05-08-2009 at 03:20 PM.
 
Old 05-08-2009, 04:21 PM   #7
emgee3
LQ Newbie
 
Registered: May 2009
Posts: 14

Original Poster
Rep: Reputation: 1
So using itop, I got the following:

Code:
 18 [              MESH]     0 Ints/s     (max:     0)
 20 [        NMI - XMON]     0 Ints/s     (max:     0)
 21 [           pcilynx]     0 Ints/s     (max:     0)
 24 [              eth1]   183 Ints/s     (max:   250)
 25 [              eth2]     0 Ints/s     (max:    92)
 26 [              ide1]     0 Ints/s     (max:    57)
 27 [              PMac]     0 Ints/s     (max:     0)
 28 [     ohci_hcd:usb1]     0 Ints/s     (max:     0)
 29 [       PMac Output]     0 Ints/s     (max:     0)
 30 [        PMac Input]     0 Ints/s     (max:     0)
 31 [             SWIM3]     0 Ints/s     (max:     0)
 33 [               ADB]     0 Ints/s     (max:     0)
 34 [              ide0]     0 Ints/s     (max:     0)
 36 [        BMAC-txdma]    64 Ints/s     (max:   127)
 37 [        BMAC-rxdma]   120 Ints/s     (max:   139)
 42 [         BMAC-misc]     0 Ints/s     (max:     0)
This was during a large http transfer from behind the router. When the freeze came, eth1 dropped down to 2 Ints/s and everything else dropped down to 0.

I'm not sure what the normal range of these is.
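
To build a baseline without watching itop live, two snapshots of /proc/interrupts can be compared and turned into per-second rates. This is a rough sketch, assuming the usual layout (a header row of CPU names, then one row per IRQ with one counter per CPU and the device name last). `irq_rates` is a name invented here.

```shell
# Compare two saved copies of /proc/interrupts and print, for every IRQ whose
# counter changed, the IRQ label, the rate per second, and the device name.
# $1 = earlier snapshot, $2 = later snapshot, $3 = seconds between them (default 1)
irq_rates() {
    awk -v secs="${3:-1}" '
        FNR == 1 { ncpu = NF; file++; next }      # header row: CPU0 CPU1 ...
        $1 ~ /:$/ {
            total = 0
            for (i = 2; i <= ncpu + 1 && i <= NF; i++) total += $i
            if (file == 1)
                before[$1] = total
            else if (total != before[$1])
                printf "%s %.1f %s\n", $1, (total - before[$1]) / secs, $NF
        }
    ' "$1" "$2"
}
```

Usage would be along the lines of `cat /proc/interrupts > a; sleep 5; cat /proc/interrupts > b; irq_rates a b 5`, repeated in a loop and logged with a timestamp. Samples taken during normal traffic can then be compared with samples taken during a pause.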
 
Old 05-10-2009, 10:03 PM   #8
farslayer
Guru
 
Registered: Oct 2005
Location: Willoughby, Ohio
Distribution: linuxdebian
Posts: 7,231
Blog Entries: 5

Rep: Reputation: 189
Hmm, I'm not familiar enough with the tool, but I was expecting something to go haywire with interrupts when the pause occurred, if that was the issue.

This blog has some other interesting tools, that might be worth looking at.
http://prefetch.net/blog/index.php/c...lities/page/2/
be sure to scroll back through previous posts, for a lot of additional tools that can be used for diagnostics.


Without knowing the source of the problem, how does one figure out what to look at.. I guess that is the ultimate question..

Last edited by farslayer; 05-10-2009 at 10:06 PM.
 
Old 05-10-2009, 10:56 PM   #9
emgee3
LQ Newbie
 
Registered: May 2009
Posts: 14

Original Poster
Rep: Reputation: 1
Quote:
This blog has some other interesting tools, that might be worth looking at.
http://prefetch.net/blog/index.php/c...lities/page/2/
be sure to scroll back through previous posts, for a lot of additional tools that can be used for diagnostics.
That looks like a great set of resources. Unfortunately for my G4, I dumpstered it earlier today, after:
  1. Removing all unnecessary hardware
  2. Swapping all the network cards
  3. Compiling a custom kernel
  4. Reinstalling Debian

Thank you farslayer and rylan76 for your assistance.
 
Old 05-21-2009, 01:11 AM   #10
emgee3
LQ Newbie
 
Registered: May 2009
Posts: 14

Original Poster
Rep: Reputation: 1
must be a Debian bug...

So, I installed the same setup from scratch on a Celeron 600 I had lying around, using different network cards. The crazy thing is it started doing the same thing!

I figured I'd list the software I installed on it in case it gives any clues:

base debian lenny install
firehol
dansguardian
squid

firehol is set to redirect http and https to port 8080, where dansguardian listens and passes requests on to squid.

All fine and dandy, but it's not http traffic that's getting pauses. Even SSH or FTP do.

Here's the odd thing. I installed SUSE on the same computer, set up the same software and no pauses. I'm not sure what more data to collect for a bug report.
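
One possible starting point for a bug report is a snapshot of the system state taken on both the Debian and the SUSE install, so the two can be diffed. The sketch below is an assumption about what might be useful, not a list the Debian maintainers ask for. `capture` and `collect_report` are made-up names, and `dpkg -l` will simply record a failure on the SUSE side.

```shell
# Save the output of one command to DIR/NAME.txt.
# A failing or missing command is noted in the file instead of aborting the run.
capture() {
    out="$1/$2.txt"; shift 2
    "$@" > "$out" 2>&1 || echo "command failed: $*" >> "$out"
}

# Collect kernel, hardware, network and firewall state into the given directory.
# Run as root so that the iptables listing works.
collect_report() {
    dir="$1"
    mkdir -p "$dir" || return 1
    capture "$dir" uname      uname -a
    capture "$dir" lspci      lspci -nn
    capture "$dir" ifconfig   /sbin/ifconfig -a
    capture "$dir" route      /sbin/route -n
    capture "$dir" interrupts cat /proc/interrupts
    capture "$dir" dmesg      dmesg
    capture "$dir" iptables   /sbin/iptables -L -n -v
    capture "$dir" packages   dpkg -l
}
```

For example, run `collect_report /root/pause-report-debian` once during normal operation and once during a pause from the console, then diff the directories. Differences in the iptables rules or kernel version between the two distributions would be the first things to look at.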