I'm running Debian 5 PPC on an Apple G4 that I'm using as a router. Since I installed it, I have a strange problem where every several minutes (I've seen it between 4 and 6 minutes, but not every 4-6 minutes) the system pauses and doesn't accept any input for 1-3 minutes. Then it continues on as if nothing happened.
I'm not running X, this is all from the command line. I noticed this once or twice when setting up the system, but didn't think much about it. After I installed the router, it's gotten a bit more annoying, as I ssh to a server behind the router and mid-typing all of a sudden I have to wait.
I suspect it's coming from one of the network cards, based on some Google searches I've done, but I can't tell if I'm describing the problem properly.
I've also checked various log files but I don't see any messages around that time period that are suspicious to me.
I've had this problem in a Vyatta 4 install as well (which is based on Debian Lenny), on completely different hardware, but the same brand network card.
Ouch - these types of intermittent problems are the hardest to solve.
Have you tried "downing" one or all of the network cards, and then seeing if the problem persists?
(I don't know if it will have any effect - I don't know enough about Linux architecture. Even for a "downed" interface, the drivers and associated software are still present in the kernel... as far as I know.)
You do not mention it, but do you get these lockups at the physical console commandline for that system, or ONLY if you are SSH'ed in from a remote connection somewhere?
I. e. what I'd do is:
Code:
# /sbin/ifconfig eth0 down
# /sbin/ifconfig eth1 down
# /sbin/ifconfig eth2 down
Then, wait or use the system for at least an hour to see if the lockups persist. If it is suddenly working WITHOUT lockups, you know that at least one of the cards is the culprit.
If you only get the lockups over an SSH connection, try to selectively disable the other cards. I.e. if you know you are using eth1 to SSH in (that's where you get the lockups), try downing eth0 and eth2 and see if the problem persists.
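As a rough sketch of that selective test, the helper below only prints the ifconfig commands for every interface except the one you are connected through. The function name down_all_except is made up for illustration, and the interface names are the ones assumed above; adjust them to whatever your box actually has.

```shell
# down_all_except KEEP IFACE...
# Print (do not run) the commands that would take down every listed
# interface except KEEP, so you can review them before executing anything.
down_all_except() {
    keep=$1
    shift
    for iface in "$@"; do
        # Skip the interface the SSH session depends on.
        [ "$iface" = "$keep" ] && continue
        echo "/sbin/ifconfig $iface down"
    done
}
```

Running `down_all_except eth1 eth0 eth1 eth2` prints the two commands for eth0 and eth2. Once they look right, you could pipe the output to sh as root.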
Also, you can try downing the cards and then unloading their modules with the "rmmod" command. Of course, try and use the simplest topology possible when testing - i. e. don't have other routers, gateways, proxies or firewalls between you and the server you are trying to fix - they will only complicate matters since the error might be almost -anywhere- if you have too many factors involved in it.
What is the load on that system? I encountered something similar once when the kernel got busy on an older system of mine. I had not compiled that kernel with DMA support for my motherboard, and when the system got busy I used to get "micro lockups" of about 30 to 45 seconds, exactly the way you describe yours. Are you sure you have DMA enabled, and that it works? No idea how this applies on an Apple, but I also noticed this on yet another kernel I was using - network throughput was slow, and although the system did not hang, it got sluggish if there was lots of network traffic. I had to recompile the kernel with DMA support for my motherboard chipset, and after that it was fixed...
You might have a network buffer or something that is overflowing, do you see anything relevant in dmesg or in the kernel's logs or your network logs? While the system is "input locked", does it still respond to network traffic / pings? Try, for example, FTP'ing in while it is input locked - does the FTP connection work and is it responsive? I. e. it might be a protocol or port that is getting blocked for some reason, if, for example, your SSH session is locked down but FTP is working...?
It can be just about anything.
You'll need to do some elimination here first, and the best way is to start at the simplest possible configuration and then slowly add complexity. The problem you describe can be caused by -very- many different factors, not all necessarily integral to your software, hardware or network infrastructure. It might be a combination of all three, one aspect only, or something else that might be exceedingly trivial, or extremely complex to solve.
I checked all the logs in /var/log and there's no entries that coincide with the times of a pause. Which is just strange.
Quote:
Try, for example, FTP'ing in while it is input locked - does the FTP connection work and is it responsive? I. e. it might be a protocol or port that is getting blocked for some reason, if, for example, your SSH session is locked down but FTP is working...?
Another factor is that established network connections stay connected during this time, but no traffic makes it through. The exception is when a connection's timeout is set shorter than the pause: I've had some connections, usually large http transfers, time out during a pause. SSH connections and most everything else stay connected, but with no traffic, during the pause.
-----
I can only SSH in from the main interface, and that's mostly how I connect in to work on it.
The place the computer is physically located is not easy for me to get to or stay for any length of time, but it's not being used for production work, so I'm going to get it and see if I still get the lockups from the console.
The DMA enabled is also a good thing to check, but I've got no idea how or if that applies to a PPC box either.
But you've given a couple things to test, so that's good. I'm going to try at the console with the ethernet cards down, and leave a terminal running top open and see if anything happens then.
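Since the logs show nothing at the time of a pause, one way to pin down exactly when the pauses happen at the console is a heartbeat file. This is only a sketch: the function names heartbeat and find_gaps are invented here, and it assumes a `date` that supports `+%s` (the GNU one on Debian does). If the heartbeat file itself shows gaps, the whole system stalled; if it has no gaps while SSH froze, only the network path paused.

```shell
# heartbeat FILE
# Append one epoch timestamp per second to FILE until interrupted (Ctrl-C).
heartbeat() {
    while :; do
        date +%s >> "$1"
        sleep 1
    done
}

# find_gaps THRESHOLD
# Read epoch timestamps (one per line) on stdin and report every jump
# between consecutive lines that is larger than THRESHOLD seconds.
find_gaps() {
    awk -v limit="$1" '
        NR > 1 && $1 - prev > limit {
            printf "gap of %d s between %d and %d\n", $1 - prev, prev, $1
        }
        { prev = $1 }
    '
}
```

For example, run `heartbeat /tmp/beat.log` on the console for an hour, then `find_gaps 5 < /tmp/beat.log` to list any stretch where the loop did not get to run for more than 5 seconds.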
Install and run itop to see if there's an interrupt issue causing the pauses. Maybe you have a piece of hardware that is freaking out and sending tons of interrupts.
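If itop isn't handy, roughly the same information can be pulled from two saved copies of /proc/interrupts. The sketch below is an assumption about how one might do it, not what itop does internally; irq_delta is a made-up name. It sums the per-CPU counters for each IRQ line and prints how much each one grew between the two snapshots.

```shell
# irq_delta BEFORE AFTER
# Compare two saved copies of /proc/interrupts and print, for each IRQ whose
# count increased, the IRQ label and the size of the increase.
irq_delta() {
    awk '
        # Sum the numeric per-CPU counters that follow the IRQ label.
        function total(   i, s) {
            s = 0
            for (i = 2; i <= NF; i++) {
                if ($i !~ /^[0-9]+$/) break
                s += $i
            }
            return s
        }
        $1 !~ /:$/ { next }                      # skip the CPU header line
        NR == FNR  { before[$1] = total(); next } # first file: remember counts
        { d = total() - before[$1]; if (d > 0) print $1, d }
    ' "$1" "$2"
}
```

Usage would be something like `cat /proc/interrupts > /tmp/a; sleep 10; cat /proc/interrupts > /tmp/b; irq_delta /tmp/a /tmp/b`, taken once during normal traffic and once during a pause so the two can be compared.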
This was during a large http transfer from behind the router. When the freeze came, eth1 dropped down to 2 Ints/s and everything else dropped down to 0.
Hmm, I'm not familiar enough with the tool, but I was expecting something to go haywire with interrupts when the pause occurred, if that was the issue.
This blog has some other interesting tools, that might be worth looking at. http://prefetch.net/blog/index.php/c...lities/page/2/
Be sure to scroll back through previous posts for a lot of additional tools that can be used for diagnostics.
Without knowing the source of the problem, how does one figure out what to look at? I guess that is the ultimate question.
That looks like a great set of resources. Unfortunately for my G4, I dumpstered it earlier today, after:
Removing all unnecessary hardware
Swapping all the network cards
Compiling a custom kernel
Reinstalling Debian
Thank you farslayer and rylan76 for your assistance.
So, I installed the same setup from scratch on a Celeron 600 I had lying around, using different network cards. The crazy thing is it started doing the same thing!
I figured I'd list the software I installed on it in case it gives any clues:
base debian lenny install
firehol
dansguardian
squid
firehol is set to redirect http and https to port 8080, where dansguardian listens; dansguardian in turn uses squid.
All fine and dandy, but it's not just http traffic that's getting pauses. SSH and FTP pause too.
Here's the odd thing. I installed SUSE on the same computer, set up the same software and no pauses. I'm not sure what more data to collect for a bug report.