LinuxQuestions.org
Old 05-04-2012, 10:41 AM   #1
fRAiLtY-
LQ Newbie
 
Registered: May 2012
Posts: 2

Rep: Reputation: Disabled
Read Only FS, randomly, 2 reboots cured, latest one doesn't!


Hi guys,

I have a Hetzner server located in their DC in Germany, I'm in the UK. We've had the same server for nearly 3 years now and it's never missed a beat, not once. It has a 3ware controller and in total 1500GB of space on it.

I ran fdisk -l and I see this:

/dev/sda1
/dev/sda2
/dev/sda3

On May 2nd at 10pm the root filesystem (/dev/root) went into read-only mode and no sites on the server were accessible. I contacted our DC, who said they would reboot the server, which they did. The server came up and so did the sites. On May 3rd at 13:00 it went down again under exactly the same circumstances; again Hetzner rebooted the server and all was well. I began scanning the logs and found that there were some errors involving ext3 and the "journal", which I'm not familiar with.

Today, at exactly the same time as yesterday, the same thing happened. However, a reboot has not fixed it this time and the sites remain down. I have asked Hetzner to do a "deep scan" of the server on their recommendation, as many people I have spoken to and several threads across the internet point to potential drive failure. This should take around 8 hours, apparently. In the meantime I have around 40 websites down (all e-commerce) and many unhappy clients. I have backups of course, but am trying to just get everything up and running ASAP.

Can anyone give me some advice on what to do should the hardware tests come back OK, which I'm dreading? What commands should be run, and how can I check and repair the filesystem?

Many thanks in advance.

Tom.
 
Old 05-04-2012, 03:28 PM   #2
Kustom42
Senior Member
 
Registered: Mar 2012
Distribution: Red Hat
Posts: 1,604

Rep: Reputation: 415
Quote:
Originally Posted by fRAiLtY-
Hi guys,
We've had the same server for nearly 3 years now
Hardware has a lifespan of 3-5 years. To be proactive, you should refresh the hardware every 3 years. Depending on the manufacturer of the server, there are different diagnostics that can be run. However, since this is a rented server, diagnosing and replacing hardware is the responsibility of the hardware owner, in this case Hetzner, not you. I would contact your server provider and have them do the necessary hardware checks. My strongest suggestion would be to get a new server and cut over to it with fresh hardware. Keep yourself on a 3-year hardware life cycle; all of the major companies have a life cycle for hardware.
 
Old 05-04-2012, 03:34 PM   #3
fRAiLtY-
LQ Newbie
 
Registered: May 2012
Posts: 2

Original Poster
Rep: Reputation: Disabled
Hi Kustom42,

Hetzner are running what's apparently called a "deep scan" on the system now, due to finish within the next few hours. Should this scan yield nothing, which I suspect it will (just my luck), what's my play? They're claiming it's likely software, yet everyone I've spoken to seems to suggest it's hardware.

This would kinda be backed up by the nature of the occurrences, out of the blue. What puzzles me is what causes it to go into read-only. Presumably it reboots the server to do this? One minute we're on the websites, the next we get 500 internal errors and the filesystem is read-only. Has the server rebooted into read-only mode in that time, or do the drives unmount, or...? Just curious what happens.

At the minute I'm assuming Hetzner will say their hardware is fine and basically tell me to go away. For speed I need to get the clients' sites up and running ASAP. What's the best option? I've heard fsck mentioned?

Cheers.
 
Old 05-04-2012, 03:40 PM   #4
Kustom42
Senior Member
 
Registered: Mar 2012
Distribution: Red Hat
Posts: 1,604

Rep: Reputation: 415
Don't run an fsck. It is very common for a filesystem on a failing drive to fall back into read-only mode. This is due to I/O errors the kernel receives from the drive. I personally have worked for one of the biggest server providers, and I can tell you that is the only answer you are going to get. You will have to take the initiative and purchase a new server. VERIFY THEY DO NOT REUSE HARDWARE: the company I worked for (which shall remain nameless for legal reasons) did, and I would see at least 20 servers a week crash that were less than a week from purchase date.
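
To confirm that the kernel has dropped a filesystem to read-only without a reboot, the mount table can be inspected. Below is a minimal Python sketch, assuming a Linux system where /proc/mounts is readable; the function name read_only_mounts is made up for illustration.

```python
import os


def read_only_mounts(mounts_text):
    """Return (device, mountpoint, fstype) for each entry mounted read-only.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # "ro" appears as its own comma-separated option when read-only
        if "ro" in options.split(","):
            found.append((device, mountpoint, fstype))
    return found


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as fh:
        for device, mountpoint, fstype in read_only_mounts(fh.read()):
            print(f"{device} on {mountpoint} ({fstype}) is mounted read-only")
```

Pseudo-filesystems can legitimately show up as read-only too, so only the entries for real disks (such as the root filesystem) are of interest here.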

Do some digging and read some reviews. I'm not sure about providers in your neck of the woods, so it would be hard for me to give you any recommendations.

I would imagine that in about an hour Hetzner will resolve your incident with a reply that basically says they can't find a problem and you should figure it out yourself because it's your problem. This is a nice way of saying go f yourself, we don't care. If that happens, find a new provider ASAP, leave the drive mounted read-only, and work as quickly as you can to cut over to a new server with a new company.
 
Old 05-05-2012, 01:20 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
I don't understand the advice not to run fsck. If the filesystem is broken, whether because of hardware or software failure, you need to run fsck. It may actually run automatically upon reboot after the f/s goes read-only.
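
One cautious middle ground is a check that does not write anything. Here is a rough Python sketch, assuming an ext2/ext3 filesystem checked with e2fsck, ideally while it is unmounted; the helper names are hypothetical. The -n option opens the filesystem read-only and answers "no" to every question, and the exit-status bits are the ones listed in the e2fsck(8) man page.

```python
import shutil
import subprocess
import sys

# Exit-status bits as documented in e2fsck(8).
E2FSCK_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "canceled by user request",
    128: "shared-library error",
}


def dry_run_command(device):
    """Build an e2fsck command line that reports problems but repairs nothing."""
    # -n: open read-only, answer "no" to all prompts
    # -f: force a full check even if the filesystem is marked clean
    return ["e2fsck", "-n", "-f", device]


def decode_exit_status(status):
    """Translate an e2fsck exit status (a bit mask) into readable messages."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(E2FSCK_BITS.items()) if status & bit]


if __name__ == "__main__" and len(sys.argv) == 2 and shutil.which("e2fsck"):
    # Usage (hypothetical script name): python fsck_dry_run.py /dev/sdXN
    result = subprocess.run(dry_run_command(sys.argv[1]))
    for message in decode_exit_status(result.returncode):
        print(message)
```

With -n, a status of 4 means problems were found and left alone, which is the signal to get a copy of the data before attempting a real repair.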

Look at /etc/fstab and see what it has for the "errors" option; probably "errors=remount-ro".
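
As a quick way to read that setting, here is a small Python sketch that lists the errors= value for each mount point in an fstab-style file. The function name errors_policy is made up for illustration, and the script assumes the standard six-column fstab layout.

```python
import os


def errors_policy(fstab_text):
    """Map each mount point to its errors= value, or None if the option is unset."""
    policy = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mountpoint, options = fields[1], fields[3].split(",")
        value = None
        for opt in options:
            if opt.startswith("errors="):
                value = opt.split("=", 1)[1]
        policy[mountpoint] = value
    return policy


if __name__ == "__main__" and os.path.exists("/etc/fstab"):
    with open("/etc/fstab") as fh:
        for mountpoint, value in errors_policy(fh.read()).items():
            print(f"{mountpoint}: errors={value or '(not set in fstab)'}")
```

For ext3 the possible values are continue, remount-ro and panic. When fstab does not set the option, the default stored in the filesystem's superblock applies instead.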

I've never had a (stable) filesystem "go bad"; there's always been dodgy hardware involved. That doesn't necessarily mean the hard disk, BTW. And by "stable" I mean one that is "enterprise ready". New filesystems (like btrfs a couple of years ago) don't qualify.
 
Old 05-07-2012, 10:55 AM   #6
Kustom42
Senior Member
 
Registered: Mar 2012
Distribution: Red Hat
Posts: 1,604

Rep: Reputation: 415
Running an fsck on a faulty hard drive has a big potential to cause data loss. Since the best solution here is to move to new hardware, it would be best to preserve the integrity of the current data for copy-over. If he runs an fsck and it removes a /var/www/html/website/ folder, an entire website's worth of data could be lost.

Your signature is the best first step to take here, make a backup! If drives are beginning to give you errors get a good copy of your data before you do anything else.
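
A rough illustration of "copy first" in Python, assuming the failing filesystem is still readable (read-only is fine) and the destination is on a different, healthy disk. The function name salvage_copy is hypothetical, and in practice a tool such as rsync would normally do this job. The point of the sketch is that a read error on one file should be logged and skipped rather than abort the whole copy.

```python
import os
import shutil


def salvage_copy(src_root, dst_root):
    """Copy every file under src_root into dst_root, collecting failures.

    Returns a list of (path, error message) for anything that could not be read.
    """
    failed = []

    def note_walk_error(exc):
        # Called by os.walk when a directory cannot be listed
        failed.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                # copy2 also preserves timestamps and permission bits
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError as exc:
                # e.g. an I/O error from a bad sector: record it and carry on
                failed.append((src, str(exc)))
    return failed
```

The returned list shows which files need to come from an older backup instead.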
 
Old 05-07-2012, 08:28 PM   #7
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
I agree with backup + new HW.
If you have sufficient access, you could run the smartctl tool (see e.g. http://www.linuxjournal.com/magazine...rd-disks-smart) to do your own checks.
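
A minimal Python sketch of such a check, assuming smartmontools is installed and you have root access. Because the disks sit behind a 3ware controller, smartctl usually needs its -d 3ware,N device type, where N is the port number on the controller. The device node (for example /dev/twa0) depends on the controller model and is an assumption here, as is the helper name smartctl_command.

```python
import shutil
import subprocess
import sys


def smartctl_command(device, port=None):
    """Build a smartctl command line for a health summary plus attribute table."""
    # -H: overall health self-assessment, -A: vendor-specific SMART attributes
    cmd = ["smartctl", "-H", "-A"]
    if port is not None:
        # Address a single disk behind a 3ware RAID controller
        cmd += ["-d", f"3ware,{port}"]
    cmd.append(device)
    return cmd


if __name__ == "__main__" and len(sys.argv) >= 2 and shutil.which("smartctl"):
    # Usage (hypothetical script name): python smart_check.py /dev/twa0 0
    device = sys.argv[1]
    port = int(sys.argv[2]) if len(sys.argv) > 2 else None
    # smartctl's exit status is a bit mask, so a non-zero value is not
    # treated as a failure to run; the printed report is what matters.
    subprocess.run(smartctl_command(device, port))
```

Run it once per port to cover every disk in the array.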
 
  

