[SOLVED] Recurring input/output error on my network HD

LightSeeker · 09-18-2013, 03:54 AM

Hello,

As the title says, my network hard drive (ext4) that I use for storage keeps having input/output errors. Usually a reboot or running fsck to recover the journal fixes it, but then the problem just returns a couple of days later. The drive is shared via NFS and Samba. Does anybody have a clue as to what might be the cause of these recurring glitches?

EDIT: The HD is new, only two months old.

BR

TobiSGD · 09-18-2013, 06:24 AM

That an HDD is new does not mean that it is error free. I recommend testing it with the manufacturer's diagnostic tool.

LightSeeker · 09-24-2013, 04:44 AM

Are there any other reasons, besides a hard drive failure, why the journal of an ext4-formatted drive would keep getting corrupted every couple of days? I have my home server set to automatically shut down and then reboot every day to save power, and the drive is connected via USB.

pan64 · 09-24-2013, 05:13 AM

You are probably shutting down the server without giving the external drive a chance to write out its cached data. Power is probably lost for a few seconds. You may need to force a "safely remove" on that drive before the reboot.
By the way, why do you want to reboot that host? It is quite unusual.

LightSeeker · 09-24-2013, 09:12 AM

Thank you for your answer. How can I safely remove it? When I try to unmount it (even with force or lazy unmount) it always just says that the resource is busy. I tried checking the processes tied to it (I think it was lsof), but it was all very confusing (now when I have a problem, I just unplug the disk, plug it back in and do the fsck, then reboot).

I use rtcwake to shut down, with sleep set to 2 seconds. Perhaps it would be better to give it some more time? Or should I do something else?

I power it down in the middle of the night and turn it back on in the early morning. Since this is a home server, nobody is using it in those hours, so I turn it off to save on electricity.
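
On the "resource is busy" message mentioned above: a minimal sketch of how one might list what is holding the mount, using fuser instead of lsof. The mount point /mnt/storage is a placeholder, not something stated in this thread. Note that a filesystem exported by the kernel NFS server can stay busy without any process showing up here, so an empty result does not prove that nothing is using the disk.

```shell
#!/bin/sh
# Hypothetical mount point of the USB drive; replace with the real one.
MNT="${MNT:-/mnt/storage}"

# Print the processes that keep a filesystem busy, or a hint if none show up.
who_uses() {
    if ! command -v fuser >/dev/null 2>&1; then
        echo "fuser not installed"
        return 0
    fi
    if [ ! -e "$1" ]; then
        echo "$1 does not exist"
        return 0
    fi
    # -m: every process using the filesystem, -v: verbose table
    fuser -vm "$1" 2>&1 || echo "no user-space process found on $1"
}

who_uses "$MNT"
```

Run it as root, otherwise processes owned by other users may be missing from the table.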

TobiSGD · 09-24-2013, 09:34 AM

Shutting the machine down should safely unmount the partitions on the disk, while unplugging when mounted is a good way to get data and/or filesystem corruption. Do you really power down the machine or is this some kind of suspend/standby?

LightSeeker · 09-24-2013, 10:43 AM

Yes, you are right, but I really didn't know what else to do when I couldn't unmount it remotely. It would probably be doable if I hooked the server up to a monitor and then tried to unmount it in the file manager. But that's quite a hassle, since I have to unplug a monitor from my computer and carry it to the other end of the house, then hook it up and run a desktop environment etc.

Here is my rtcwake script:

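#!/bin/sh
# Wake-up time: today at 06:10, as seconds since the epoch.
t=$(date -d "06:10" +%s)
# Cache the sudo credentials before rtcwake is sent to the background.
sudo /bin/true
# Program the RTC alarm (-u: the hardware clock is set to UTC).
sudo rtcwake -u -t "$t" -m on &
# Give rtcwake a moment to set the alarm, then power off.
sleep 2
sudo shutdown -h 0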

This script is then put in crontab and runs at a quarter to two AM.

pan64 · 09-25-2013, 12:35 AM

Yes, in general shutdown should work, but in your case an explicit umount (or eject?) followed by a sleep may help. You can insert it into that script.
Have you checked the state of your disk?
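
The explicit umount plus sleep suggested above could be sketched roughly as follows. Everything specific here is an assumption and not from this thread: the mount point /mnt/storage, the Samba service name smbd, the 10 second delay, and the use of rtcwake -m no (which only programs the alarm) in place of the backgrounded -m on from the original script. The script defaults to a dry run that only prints the commands; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/sh
# Sketch only: MNT, the service name and the delay are assumptions.
MNT="${MNT:-/mnt/storage}"      # hypothetical mount point of the USB drive
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands, 0 = execute them

# Execute a command, or just print it when in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Release the drive cleanly before the power goes away.
prepare_drive() {
    run service smbd stop       # assumed Samba service name
    run exportfs -ua            # unexport all NFS shares
    run sync                    # flush cached writes to disk
    run umount "$MNT"           # explicit unmount
    run sleep 10                # give the enclosure time to finish writing
}

wake=$(date -d "06:10" +%s)     # wake-up time as seconds since the epoch
prepare_drive
run rtcwake -u -t "$wake" -m no # only set the RTC alarm, do not suspend
run shutdown -h now
```

If umount still reports that the resource is busy after the shares are unexported, the NFS server itself may have to be stopped as well.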

LightSeeker · 09-25-2013, 01:33 AM

I did run some checks on it with fsck -c, but because the disk is 3 TB, the check would take 25 hours when connected through USB, so I always interrupted it prematurely (I needed to access the data). When it exited, it said it had fixed a bad block and that was it.
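
For what it is worth, 25 hours is about what a single read pass over 3 TB takes at typical USB 2 speeds, so the duration alone does not point to a fault. A quick back-of-the-envelope check, assuming a sustained throughput of 33 MB/s (an assumed figure, not measured in this thread):

```shell
#!/bin/sh
# Rough duration of one full read pass, in whole hours.
# $1 = disk size in MB, $2 = sustained throughput in MB/s (assumed)
scan_hours() {
    echo $(( $1 / $2 / 3600 ))
}

scan_hours 3000000 33    # 3 TB at an assumed 33 MB/s, prints 25
```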

pan64 · 09-25-2013, 02:22 AM

I'm afraid you would need to check it several times. Probably that bad block "moves" (will reappear several times)...

LightSeeker · 09-27-2013, 06:03 AM

I'm going to run fsck -vcck today to check for bad blocks and let it run until completion (probably until tomorrow, since it's a 3 TB drive). I was also told that the input/output errors might be connected to the fact that this is an external hard drive connected to the computer via USB: fluctuations in the power that the computer supplies over USB might be what makes the drive keep losing journal information, which then needs to be recovered. Is there any basis to this claim?

pan64 · 09-27-2013, 06:07 AM

You could try an external power supply (a strong enough one) to check it, or a USB hub?
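
Before buying hardware, it may be worth checking the kernel log for USB resets or disconnects around the time the errors appear, since those would point at the link or the power rather than at the disk surface. A minimal sketch, where the patterns are just common kernel message fragments and not an exhaustive list:

```shell
#!/bin/sh
# Keep only kernel log lines that hint at USB link trouble or disk errors.
filter_disk_errors() {
    grep -iE 'usb [0-9.-]+: reset .*USB device|USB disconnect|I/O error|EXT4-fs error'
}

# dmesg may need root; fall back to a message if nothing matches.
dmesg 2>/dev/null | filter_disk_errors || echo "no matching kernel messages"
```

Resets or disconnects shortly before the EXT4 errors would support the power or cable theory; I/O errors without any resets would point more towards the disk itself.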

LightSeeker · 09-27-2013, 06:45 AM

Hmmm, I'll go to my local store and check out those USB bays that use DC power, might be a good idea.

LightSeeker · 09-28-2013, 05:26 AM

I went to the store and asked about a USB port that would have its own power supply. The salesman said that he can order it, but after hearing that I have a problem with a disk that has a USB 3 output (I connect that to USB 2 on my computer), he said that it probably won't help much with regard to power supply and that a lot of people have had problems with USB 3 based disks.

I'm confused now, I must admit.

Will getting a USB port that has its own power mean a steadier supply of power to the hard drive or not? Or should I maybe start thinking about getting a new enclosure for the drive that has its own power supply?

I tried to run an fsck test, but it just kept getting slower and slower, and this morning, after more than 22 hours, it was only at 20 %. I interrupted it.

pan64 · 09-28-2013, 07:12 AM

What about your NFS and Samba caches? You can probably turn off all the caches (that will slow down access but will probably avoid corruption).
I do not know whether a new enclosure is cheaper than a USB port with its own power supply.
You can also try another USB port on your PC, perhaps a different USB 2 port (maybe one that is only available on the motherboard).