LinuxQuestions.org


acid_kewpie 06-06-2009 02:40 PM

95% of commands not executing on RHEL4 server
 
Howdy,

I've an odd problem I'm having trouble with, wondered if the symptoms might ring any bells with anyone...

I've a fairly recently built RHEL 4.6 box, an IBM HS21. Over the last month or so, although the services it runs seem to be working OK, when I log into the box under any number of accounts, both local and LDAP, something like 95% of the commands I run just plain do nothing. Commands like ls, su, bash and ssh return instantly to the command line, with no errors in dmesg, secure, messages etc. If I run these commands constantly [ up enter up enter up enter up enter styley ] then within 20 or 30 attempts one works, and things stay working for a while, then stop again. As bash also seems to complete instantly, I normally can't log into the box at all: as soon as I complete a login, my shell exits and I'm back where I started.

Baffling me. Any thoughts? I don't recall commands like tail, cat or less failing, though that may well be coincidence or an outright lie.

Thanks

foodown 06-06-2009 07:44 PM

When you do an 'uptime' or 'top' what is the load average?

lumak 06-07-2009 02:34 AM

Did you fill up your root partition?

colucix 06-07-2009 03:13 AM

Really odd. It makes me think of: 1) a corrupted bash; 2) the output of commands being redirected somewhere by a misplaced exec statement, erroneously put in some login script. What if you redirect the output of ls, cat or something to a file? Does anything end up in it? Obviously you won't be able to see its contents from the terminal, so you'd have to open it using a GUI. And what about standard error?
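As a rough illustration of the redirect test suggested above, here is a minimal sketch that sends a command's standard output and standard error to separate files and reports its exit status. It assumes a Python 3 interpreter is available (a stock RHEL4 install will not have one), and the function name and file paths are made up for the example.

```python
import subprocess

def run_and_capture(cmd, out_path, err_path):
    """Run cmd with stdout and stderr redirected to separate files.

    Returns the exit status; a negative value means the process
    was terminated by a signal rather than exiting normally.
    """
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        proc = subprocess.run(cmd, stdout=out, stderr=err)
    return proc.returncode
```

Something like run_and_capture(["ls", "/"], "/tmp/ls.out", "/tmp/ls.err") would then show whether a "silent" ls wrote anything at all, and whether it exited 0, non-zero, or was killed.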

noctilucent 06-07-2009 03:35 AM

And what about running strace, or perhaps giving gdb a chance? Unless strace itself exits immediately. But it's still unclear to me whether you are actually able to log in. On the one hand you're experiencing 'programs are returning immediately', but on the other hand you can't log in because bash is misbehaving too.

acid_kewpie 06-07-2009 08:53 AM

Load is effectively zero. It turns out the main service, a network monitoring app, had died somehow too; not sure if that is relevant here though. So a load of 0.01 or so, and no disk space issues at all.

There is no command redirection or the like. The box is built from a non-standard image with an official Red Hat kernel that we're not running on any other box, to my knowledge. I wouldn't like to assume a bug in there though; it seems hardly likely to me.

I'll have a go with strace, but I suspect that in those instances where the problem occurs, strace itself wouldn't run either. Then again, it still has to spawn a process, so I might see strace work but the spawned process fail.

bash might be something to focus on specifically. Whilst general commands run under a bash environment fail, spawning bash as a shell via ssh or login also fails, and that wouldn't suggest any bash issue in triggering the process in the first place.

I can log in sometimes, about 5% of the time. The failed logins seem to be because my shell doesn't execute, or exits immediately, just like other programs, so I don't think logging in itself is a special case, just more significant in terms of user experience.
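To put a number on the "works about 5% of the time" impression, a small loop can replace the up-enter routine and record how each attempt ended. This is only a sketch under the assumption that Python 3 is available on (or can be copied to) the box; tally_runs and its outcome labels are hypothetical names for illustration.

```python
import subprocess
from collections import Counter

def tally_runs(cmd, attempts=30):
    """Run cmd repeatedly and count how each attempt ended."""
    outcomes = Counter()
    for _ in range(attempts):
        proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if proc.returncode < 0:
            # subprocess reports death-by-signal as a negative return code
            kind = "killed by signal %d" % -proc.returncode
        elif proc.returncode != 0:
            kind = "exit status %d" % proc.returncode
        elif not proc.stdout:
            kind = "exit 0, no output"
        else:
            kind = "exit 0, with output"
        outcomes[kind] += 1
    return outcomes
```

Running tally_runs(["ls", "/"]) a few times would distinguish a command that exits cleanly but prints nothing from one that fails or is killed, which is hard to tell apart when the prompt simply comes straight back.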

Thanks for the ideas. I can't see for a second that a server rebuild isn't going to be the outcome, but I'd love to know what's going on. Any other random thoughts appreciated.

acid_kewpie 06-30-2009 03:33 PM

Well, I think I found the cause, or very nearly. nscd seemed to be playing silly buggers; a restart of that and everything is fine again for a while. Anyone familiar enough with it to comment? I've turned up debugging and am waiting for it to happen again...
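Since nscd caches name service lookups such as passwd entries, one way to watch for it misbehaving might be to time repeated user lookups and note any that fail or stall. The sketch below assumes Python 3 and that the C library routes getpwuid through nscd when it is running; the function name is invented for the example, and nothing here is taken from the Bugzilla entry.

```python
import pwd
import time

def time_passwd_lookups(uid, attempts=20):
    """Look up a uid repeatedly; return (found, seconds) per attempt."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            pwd.getpwuid(uid)
            found = True
        except KeyError:
            # getpwuid raises KeyError when no entry is returned
            found = False
        results.append((found, time.monotonic() - start))
    return results
```

Calling it with the current user's uid (os.getuid()) before and after restarting nscd would show whether lookups for a known account intermittently fail or take unusually long.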

Ohhh hello Bugzilla!

https://bugzilla.redhat.com/show_bug.cgi?id=432706


All times are GMT -5.