LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 11-05-2007, 11:04 PM   #1
bluikz
LQ Newbie
 
Registered: Nov 2007
Posts: 2

Rep: Reputation: 0
most files disappear from filesystem - reboot fixes


I am experiencing a very odd problem on a pretty busy production server. Some information:
- Fedora 4
- Kernel 2.6.18, not vanilla!! I have asked the vendor what the -xxxx- patchset in the version string is but got no reply..
- Core 2 Duo on some consumer-grade mainboard
- 2x SATA disks using MD software RAID1
- Web applications use Apache, MySQL, PHP; some also use Tomcat and Mono
If reading the specs while keeping the words "pretty busy production server" in mind makes you go "WTF!", I completely agree. I inherited this time bomb (amongst several others) from my predecessor..

The server ran just fine for a couple of months until 1 month ago, when it crashed for the first time. The symptoms, which are completely cured by a simple hardware reboot:
1) Files do NOT disappear instantaneously: the last time it happened (yesterday) I was logged in and could still `ls` some directories, for example /bin, which showed many invalid symlinks etc.
2) Soon, running `ls` killed my open screen windows one by one - sounds like the shell dying
3) After some time screen itself dies with the infamous "The dungeon collapses.." Nethack joke
4) I could still `cd` to directories, but obviously `ls` and everything else gave "file not found"
5) Soon, trying to cd/ls anything kills the SSH session - sounds like the shell died
I believe that on the earlier occasions (this is the 5th time already!!!) I got there too late, since I could only see steps 4 & 5.
There is absolutely nothing in the logs, but the problem seems to appear under very high load: a few times it happened in the middle of a few (unrelated) very intensive jobs.
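Since the local logs stay empty, one way to test the "very high load" suspicion is to ship load samples to a second machine, so the last readings before a crash survive. What follows is only a minimal sketch: it assumes a reasonably recent Python 3 on the server and a remote host that accepts syslog over UDP, and the names `parse_loadavg`, `make_remote_logger` and `watch` are made up for illustration.

```python
import logging
import logging.handlers
import time


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def make_remote_logger(host, port=514):
    """Build a logger that ships records to a remote syslog daemon over UDP."""
    logger = logging.getLogger("loadwatch")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.SysLogHandler(address=(host, port)))
    return logger


def watch(logger, interval=10, path="/proc/loadavg"):
    """Sample the load average forever, one syslog record per interval."""
    while True:
        with open(path) as handle:
            one, five, fifteen = parse_loadavg(handle.read())
        logger.info("load1=%.2f load5=%.2f load15=%.2f", one, five, fifteen)
        time.sleep(interval)
```

It could be started with something like `watch(make_remote_logger("loghost.example"))`, where `loghost.example` is a placeholder for whatever second machine is available. If the samples stop right after a spike, that at least supports the load theory; it does not say whether the kernel or the hardware is at fault.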

No services respond, with one exception:
Apache starts giving 403 Forbidden for the sites, but for some weird reason the last time exactly 1 page of 1 site was still accessible! It also happens to be the one page that is checked by the monitoring tools. On top of that, when I tried this page it once gave an error stating that MySQL could not be connected to, and the site does not use one line of SQL! This might be because of PLESK (8.2.0); I believe it does its own thing with SQL.

The timing of the crashes seems to be random. No special cron jobs are running on the server when it happens. Most crashes did happen on a Monday, though..

In my opinion it points to a kernel problem or a hardware problem (HD/controller). But even upgrading the kernel is a pain: the provider does a very, very lousy job on support, and I do not want to reboot into a new kernel without someone on hand who can switch the boot back to the old kernel right away in case the new one fails.
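With an HD/controller problem on MD RAID1, a quick first check is whether the kernel has already kicked a disk out of the mirror. The sketch below parses text in the usual /proc/mdstat layout and reports arrays that are missing members. It is an illustration under assumptions, not a monitoring tool: `degraded_arrays` is a made-up name, and it only handles arrays named `mdN`.

```python
import re

# Matches the status part of an array, e.g. "104320 blocks [2/2] [UU]".
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Matches the header line of an array, e.g. "md0 : active raid1 sdb1[1] sda1[0]".
HEADER = re.compile(r"^(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return names of MD arrays with fewer working members than configured."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, working, flags = status.groups()
            # "[2/1] [U_]" means two members configured, only one in use.
            if int(working) < int(expected) or "_" in flags:
                degraded.append(current)
            current = None
    return degraded
```

On the server itself it could be fed with `degraded_arrays(open("/proc/mdstat").read())`. An empty list only means the mirror looks complete right now; it says nothing about a flaky controller that recovers after a reboot.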

Thanks..

Last edited by bluikz; 11-05-2007 at 11:23 PM.
 
Old 11-06-2007, 06:12 PM   #2
jlgreer1
Member
 
Registered: Aug 2005
Location: Under the rainbow
Distribution: LFS 7, CentOS 7, OS X
Posts: 119

Rep: Reputation: 25
If I were responsible for the server, I would shut it down and reboot, leaving it offline long enough to get some major backups. Hopefully, your predecessor had a routine sequential backup system in place and in operation. If you are having controller issues, as you suspect, you could lose all of your data. It could be corrupted beyond recovery. If this is a heavily loaded server it is probably essential. You might consider getting a backup server available with a clean install and moving the existing data to the new install until you get things sorted out with the old machine. The cost of a replacement machine could be small when compared to the loss of data and business from an essential system. If nothing else, you could rent a server from one of the server farms long enough to get yours sorted out.

Which flavor of Linux/Unix/BSD are you running? Do you have a service contract with the vendor?

Jeff
 
Old 11-06-2007, 07:54 PM   #3
bluikz
LQ Newbie
 
Registered: Nov 2007
Posts: 2

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jlgreer1 View Post
If I were responsible for the server, I would shut it down and reboot, leaving it offline long enough to get some major backups. Hopefully, your predecessor had a routine sequential backup system in place and in operation.
Muah.. No, the backup system was really, really bad: it just tarred all the vhosts together and dumped the databases (into a publicly web-accessible directory!). I do have some archives anyway. I also have a working rsync setup now, but I am afraid of running it, since twice already it (or the load it created) made the server die..
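If the load from rsync is what tips the box over, one hedged option is to run it throttled. The sketch below only builds the command line: `nice -n 19` lowers CPU priority, `ionice -c 3` asks for the idle I/O class (this assumes `ionice` is installed and the CFQ I/O scheduler is in use), and rsync's `--bwlimit` caps the transfer rate in kilobytes per second. The function name `throttled_rsync` and the default of 2000 are made up for illustration.

```python
def throttled_rsync(source, dest, kbps=2000):
    """Build an argv list that runs rsync at low CPU and I/O priority."""
    return [
        "nice", "-n", "19",          # lowest CPU scheduling priority
        "ionice", "-c", "3",         # idle I/O class (needs the CFQ scheduler)
        "rsync", "-a",               # archive mode: recurse, keep perms and times
        "--bwlimit=%d" % kbps,       # cap the transfer rate, kilobytes per second
        source, dest,
    ]
```

The list can be handed to `subprocess.call(...)`, for example with a source like `/path/to/vhosts/` and a destination like `backuphost.example:/srv/backup/` (both placeholders). Throttling only lowers the odds of triggering the crash while the data is copied off; it does not fix whatever is actually wrong.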
Quote:
Originally Posted by jlgreer1 View Post
If you are having controller issues, as you suspect, you could lose all of your data. It could be corrupted beyond recovery. If this is a heavily loaded server it is probably essential.
Very true. It is not that heavily loaded, though; the sites get only around 150-200K hits/day.
Quote:
Originally Posted by jlgreer1 View Post
You might consider getting a backup server available with a clean install and moving the existing data to the new install until you get things sorted out with the old machine. The cost of a replacement machine could be small when compared to the loss of data and business from an essential system. If nothing else, you could rent a server from one of the server farms long enough to get yours sorted out.
Renting a server sounds like a really good idea, thanks! The problem is that with more than 50 sites using many different web technologies, the transfer is not exactly a breeze. But you are right, and I think this might be the only option unless someone comes up with good ideas. However, I will try to update the kernel and distro to the newest Fedora and hope it helps.

Quote:
Originally Posted by jlgreer1 View Post
Which flavor of Linux/Unix/BSD are you running? Do you have a service contract with the vendor?
This information was in my OP, but I repeat: Fedora 4 (!!), kernel 2.6.18.1-xxxx-grs-ipv4-32 in a custom configuration (!!!), some consumer-grade mainboard that I found out has a VIA chipset (ARGH!), 2x SATA drives on what seems to be MD software RAID1 (quadruple ARGH).

There is a support contract, but it is complete bull. I cannot even get a reply about what the -xxxx- patchset in the kernel is or about the exact hardware specifications, and after trying to install Tomcat for 3 weeks they admitted they cannot do it! I am very close to naming & shaming this P-O-S company and jumping through the necessary hoops to get the contract voided and the prepaid future months (MANY of them) refunded.

Add on top of that the goodness of PLESK 8.2.0, and I will be bald very young.

Last edited by bluikz; 11-06-2007 at 08:19 PM. Reason: the kernel version 2.6.18.1, not 2.6.18
 
Old 12-19-2007, 02:22 PM   #4
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 234
Interesting thread. What was the outcome? In particular, are you ready to name the PoS company?
 
Old 12-19-2007, 08:19 PM   #5
pengaru
LQ Newbie
 
Registered: Dec 2007
Distribution: GNU/Linux
Posts: 9

Rep: Reputation: 0
Has the server been checked for rootkits?

I've had similar experiences with servers that were rootkitted. After eliminating that possibility, I would suspect hardware problems such as bad RAM.
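On an RPM-based system like Fedora, one rough first pass at the rootkit question is `rpm -Va`, which compares installed files against the package database. The sketch below filters that output for system binaries that are missing or whose checksum differs (a `5` in the flags column). It is only an illustration: `changed_binaries` is a made-up name, and a competent rootkit can tamper with `rpm` itself, so a clean result from the running system proves little. Checking from trusted rescue media is safer.

```python
def changed_binaries(verify_output,
                     prefixes=("/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/")):
    """Return (path, reason) pairs for system binaries that fail RPM verification."""
    findings = []
    for line in verify_output.splitlines():
        start = line.find("/")
        if start == -1:
            continue
        # Everything before the path: the flags and an optional file-type marker.
        flags = line[:start].split()
        path = line[start:].strip()
        if not flags or not path.startswith(prefixes):
            continue
        if flags[0] == "missing":
            findings.append((path, "missing"))
        elif "5" in flags[0]:
            findings.append((path, "checksum differs"))
    return findings
```

Changed config files under /etc are normal and are ignored here on purpose. A changed `/bin/ls` or `/bin/ps` is the kind of finding that would deserve a closer look.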
 
  



All times are GMT -5. The time now is 04:38 AM.
