most files disappear from filesystem - reboot fixes
I am experiencing a very odd problem on a pretty busy production server. Some information:
- Fedora 4
- Kernel 2.6.18 - not vanilla!! I have asked the vendor what the -xxxx- patchset in the version string is, but got no reply..
- Core 2 Duo on some consumer grade main board
- 2x SATA disks using MD software-RAID1
- Web applications use Apache, MySQL, PHP; some also use Tomcat, Mono
If reading the specs while keeping the words "pretty busy production server" in mind makes you go "WTF!", I completely agree. I inherited this time bomb (amongst several others) from my predecessor..
The server ran just fine for a couple of months until 1 month ago, when it crashed for the first time. The symptoms, which are completely cured by a simple hardware reboot:
1) The files do NOT disappear instantaneously: the last time it happened (yesterday) I was logged in and could still `ls` some directories, for example /bin, which showed many invalid symlinks etc.
2) Soon, running `ls` killed my open screen windows one by one - sounds like the shell dying
3) After some time screen itself dies with the infamous "The dungeon collapses.." Nethack joke
4) I could still `cd` to directories, but `ls` and everything else gave "file not found"
5) Soon after, trying to cd/ls anything kills the SSH session - sounds like the shell died
I believe that on the earlier occasions (this is already the 5th time!!!) I got there too late, since I only ever saw steps 4 & 5.
There is absolutely nothing in the logs, but the problem seems to appear under very high load: a few times it happened in the middle of several unrelated, very intensive jobs.
All services stop responding, with one exception:
Apache starts giving 403 Forbidden for the sites, but for some weird reason the last time exactly 1 page of 1 site was still accessible! It also happens to be the one page that is checked by the monitoring tools. On top of that, when I tried it, this one page once gave an error stating that MySQL could not be connected to - and the site does not use a single line of SQL! This might be because of PLESK (8.2.0); I believe it does its own thing with SQL.
The timing of the crashes seems to be random. No special cron jobs are running on the server when it happens. Most of the crashes did happen on a Monday, though..
In my opinion it points to a kernel problem or a hardware problem (HD/controller). But even upgrading the kernel is a pain. The provider does a very, very lousy job on support, and I do not want to reboot into a new kernel without someone on site who can switch the boot back to the old kernel right away in case the new one fails.
If I were responsible for the server, I would shut it down and reboot, leaving it off line long enough to get some major backups. Hopefully, your predecessor had a routine sequential backup system in place and in operation. If you are having controller issues, as you suspect, you could lose all of your data. It could be corrupted beyond recovery. If this is a heavily loaded server it is probably essential. You might consider getting a backup server available with a clean install and moving the existing data to the new install until you get things sorted out with the old machine. The cost of a replacement machine could be small when compared to the loss of data and business from an essential system. If nothing else, you could rent a server from one of the server farms long enough to get yours sorted out.
Which flavor of Linux/Unix/BSD are you running? Do you have a service contract with the vendor?
Quote:
Originally Posted by jlgreer1
If I were responsible for the server, I would shut it down and reboot, leaving it off line long enough to get some major backups. Hopefully, your predecessor had a routine sequential backup system in place and in operation.
Muah.. No, the backup system was really, really bad - it just tarred all the vhosts together & dumped the databases (into a publicly accessible www directory!). I do have some archives anyway. I also have a working rsync setup now, but I am afraid of running it, since 2 times already it (or the load it created) made the server die..
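If the load from rsync is what tips the box over, one option is to run it at the lowest CPU and I/O priority, cap its bandwidth, and only start it while the machine is quiet. A minimal Python sketch under those assumptions: the function names, the thresholds, and the source and destination in the usage note are placeholders, and the idle I/O class of `ionice` only has an effect with the CFQ I/O scheduler.

```python
import os
import subprocess
import time


def build_rsync_command(src, dest, bwlimit_kbps=2000):
    """Build an rsync command line that runs at low CPU and I/O priority."""
    return [
        "nice", "-n", "19",              # lowest CPU priority
        "ionice", "-c", "3",             # idle I/O class (CFQ scheduler only)
        "rsync", "-a",
        "--bwlimit=%d" % bwlimit_kbps,   # cap the transfer rate (KiB/s)
        src, dest,
    ]


def wait_for_quiet(max_load=2.0, poll_seconds=60, getloadavg=os.getloadavg):
    """Block until the 1-minute load average drops below max_load."""
    while getloadavg()[0] >= max_load:
        time.sleep(poll_seconds)


def run_backup(src, dest):
    """Wait for a quiet moment, then run the throttled rsync."""
    wait_for_quiet()
    return subprocess.call(build_rsync_command(src, dest))
```

Something like `run_backup("/path/to/vhosts/", "backuphost:/path/to/backup/")` would then do the transfer. The load threshold and bandwidth cap are guesses and would need tuning; none of this helps if the disk or controller itself is failing.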
Quote:
Originally Posted by jlgreer1
If you are having controller issues, as you suspect, you could lose all of your data. It could be corrupted beyond recovery. If this is a heavily loaded server it is probably essential.
Very true. It is not that heavily loaded, though: the sites get only around 150-200K hits/day.
Quote:
Originally Posted by jlgreer1
You might consider getting a backup server available with a clean install and moving the existing data to the new install until you get things sorted out with the old machine. The cost of a replacement machine could be small when compared to the loss of data and business from an essential system. If nothing else, you could rent a server from one of the server farms long enough to get yours sorted out.
Renting a server sounds like a really good idea, thanks! The problem is that with more than 50 sites using many different web technologies, the transfer is not exactly a breeze. But you are right, and I also think this might be the only option, unless someone comes up with good ideas. In the meantime I will try to update the kernel and distro to the newest Fedora and hope it helps.
Quote:
Originally Posted by jlgreer1
Which flavor of Linux/Unix/BSD are you running? Do you have a service contract with the vendor?
This information was in my OP, but I repeat: Fedora 4 (!!), kernel 2.6.18.1-xxxx-grs-ipv4-32 in a custom configuration (!!!), a consumer-grade main board that I found out has a VIA chipset (ARGH!), 2x SATA drives on what seems to be MD software-RAID1 (quadruple ARGH).
There is a support contract, but it is complete bull. I cannot even get a reply about what the -xxxx- patchset on the kernel is, or the exact hardware specifications. After trying to install Tomcat for 3 weeks, they admitted they cannot do it! I am very close to naming & shaming this P-O-S company and jumping through the necessary hoops to get the contract voided and the prepaid future months (MANY of them) refunded.
Add on top of that the goodness of PLESK 8.2.0, and I will be bald very young.
Last edited by bluikz; 11-06-2007 at 08:19 PM.
Reason: the kernel version 2.6.18.1, not 2.6.18
I've had similar experiences with servers that were rootkitted. After eliminating that possibility, I would suspect hardware problems like bad RAM somewhere.