Linux - Server
I have a small Iomega ix2-200 NAS that I've loaded with a custom Debian install. I've found that after a week or so I'll see odd behavior (kernel panics, services start failing, etc.). Once I run fsck, it reports that it fixed some errors and all is well again for another week or so. The configuration is two 1 TB SATA drives in a Linux software (md) mirror. Issues like this would suggest one or more physical disk failures, but neither disk is reporting any S.M.A.R.T. errors, even when I forcibly run the thorough self-test on each. I've also run the badblocks program on each and it reports no issues.
Any idea what could be causing this? I don't want to just throw new hard drives at it until I know for sure that's the problem.
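For what it's worth, these are roughly the commands I used for the checks above (device names are just examples for the two mirror members on my box; adjust to suit):

Code:
# Kick off the long (thorough) SMART self-test, then view results once it finishes
smartctl -t long /dev/sda
smartctl -a /dev/sda

# Non-destructive, read-only surface scan
badblocks -sv /dev/sda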
Could be a bad connection between the disks and the motherboard (a heat-related fault, perhaps?).
You could just try replacing one disk and see if the problem goes away permanently (for that disk).
If it comes back, it's likely not the disk(s).
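If you do swap a disk, the usual md sequence is something like this (array and partition names below are placeholders; check cat /proc/mdstat for the real ones on your box):

Code:
# Fail and remove the suspect member from the mirror
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1
# Physically replace the disk, partition it to match, then re-add it
mdadm /dev/md0 --add /dev/sdb1
# Watch the resync progress
cat /proc/mdstat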
There is nothing in the logs that indicates a disk failure, just the kernel panics and messages about services failing. Is there a good way to test for physical disk failures that I haven't already tried?
As for a bad connection, it's a pretty simple device, the disks just plug right into the motherboard of the NAS, no cables.
I guess I'm just looking for a more definitive answer as to what's wrong with this thing before I start throwing parts at it. Maybe it's not even a hardware problem; could it be some bug in the Linux RAID (md) system? I've always hated software RAID solutions (regardless of OS), but it's the only option on this device.
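One thing I can try, if it's useful, is md's own consistency check of the mirror (assuming the array shows up as md0; cat /proc/mdstat gives the real name):

Code:
# Ask md to read back and compare both halves of the mirror
echo check > /sys/block/md0/md/sync_action
# Progress shows up here while it runs
cat /proc/mdstat
# A non-zero count afterwards means the mirror halves disagree
cat /sys/block/md0/md/mismatch_cnt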
I guess it could be a hardware issue. Instead of running fsck over and over, you could try installing mcelog, which gives you better analysis of hardware problems before the server crashes. I've been using it on all of my servers and it gives much better early warning of hardware-related issues before things fall over.
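On Debian the setup is roughly this (package name and daemon usage from memory; check your distro's docs):

Code:
apt-get install mcelog
# Query the running daemon for any logged machine-check events
mcelog --client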
Thanks for the suggestion, but it appears to only support x86 processors, and this NAS runs an ARM-based processor. Last night I went ahead and removed one of the drives from the array and rebuilt the array with another 1 TB drive. When the rebuild finished I ran fsck and looked at the logs in /var/log/fsck/*. In the past the last line was always "fsck died with status 1" (or something like that), but this latest check didn't have that entry at the bottom. From my research that message means it fixed some errors, so with that line absent I assume there were no errors to fix?
I'll let it run like this and see how it goes. If anyone has any more insight into the fsck results please let me know.
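For the record, fsck's exit status is a bit-mask (per fsck(8)), which lines up with that log message: status 1 means errors were found and corrected, and a clean check exits 0. E.g., against the unmounted filesystem (device name is an example for this box):

Code:
fsck -f /dev/md0
echo $?
# 0 - no errors
# 1 - filesystem errors corrected
# 2 - system should be rebooted
# 4 - filesystem errors left uncorrected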
Did you have the same errors before you replaced the stock EMC LifeLine firmware?
No, but in the process of switching it over to the new OS I put in a different drive for one of the two drives. What I did last night was put the original drive back in. That was the only hardware change I could think of between the stock setup and the new custom one, which is why I put the old drive back in.
Looks like it was a hard drive problem after all: I swapped out the odd drive I used when I first built the NAS for the original drive model, and so far it's been up for 20 days without issue.