LinuxQuestions.org

JeffElkins 09-03-2007 11:40 AM

RAID problem on media server
 
I have a media server set up with four 400GB SATA drives connected to a SiI 3114 card. It's set up via mdadm as RAID 0, formatted as ext3. I'm generally happy with this setup speed-wise over a gigabit network.

Just recently after a power failure, I started to experience problems writing new content to the RAID array. Using scp, data would start out copying at full speed, then eventually stall. Same problem when doing a local cp from the server's boot drive to the array.

I then noticed that the used space reported by df was ridiculously wrong. It's showing 728MB used/1.4TB available for the array when I know full well that there's easily 900GB on it. I've verified that the array's content is still present. mdadm reports the array as working and clean.

I'd like to repair this problem w/o rebuilding the array and reloading my content, although luckily I'm all backed up.

Thanks for any help.

ajg 09-03-2007 03:38 PM

I think you should be looking at the filesystem for the source of your problems rather than the RAID.

JeffElkins 09-03-2007 03:45 PM

Quote:

Originally Posted by ajg (Post 2880075)
I think you should be looking at the filesystem for the source of your problems rather than the RAID.

OK. Can you expand? AFAIK, fsck isn't a tool that's used on RAID volumes, so how does one examine/repair a filesystem layered over a RAID array?

ajg 09-03-2007 04:46 PM

Same way as you repair a filesystem on a normal drive. The MD driver is transparent to the filesystem - all it sees is a filesystem X on device /dev/mdY - as far as the filesystem is concerned, it's no different to seeing filesystem X on device /dev/hdaY or /dev/sdbZ. Usual precautions apply when using fsck. Be careful if you use a LiveCD - it may mount the members of an MD device as separate volumes (seen this happen with Red Hat/Fedora/CentOS when booting from Knoppix) - this will make a mess of the mirror if you try to write to one of them.
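
To illustrate the point above, here is a minimal sketch of how the same e2fsck call applies to an md device as to a plain partition. It only builds the command line and does not run it. The device name /dev/md0 and the helper name build_fsck_command are assumptions for illustration, not taken from the thread, and the filesystem should be unmounted before a real check.

```python
import shlex


def build_fsck_command(device="/dev/md0", read_only=True):
    """Build an e2fsck command line for an ext2/ext3 filesystem.

    The default device name is an assumption; substitute the real md device.
    """
    cmd = ["e2fsck", "-f"]  # -f: force a check even if the filesystem looks clean
    if read_only:
        cmd.append("-n")  # -n: open read-only and answer "no" to every prompt
    cmd.append(device)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing touches the disk.
    print(shlex.join(build_fsck_command()))
```

Running the read-only form first shows what e2fsck would change before it is allowed to modify anything.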

JeffElkins 09-04-2007 09:37 AM

Thanks for the replies. I went ahead and tried a fsck on the array and that did fix the misreport of df. However, I still can't copy or scp new content to the array w/o stalling. Files seem to stall at the 50% mark.

strick1226 09-04-2007 12:16 PM

I would heartily encourage you to wipe the disks clean, recreate a RAID 5 array, and see how that goes.
900 GB on a RAID 0 setup is a terrifying concept to me--and it doesn't sound like you have any kind of battery backup, either. Those two things = disaster at some point when you least expect/need it.

Just my $.02 ...

JeffElkins 09-04-2007 01:21 PM

Thanks for your $.02 :) I truly appreciate it.

The reason I went with RAID 0 was because I didn't want to lose space. I'm doing backups to hard drives, so redundancy and mirroring didn't seem that important. I definitely need a UPS for this system though.

What would RAID 5 buy me, and how much space would I lose from my 1.6TB of raw hard drives? And why can't I copy files to my current RAID 0 array when it reports 600GB free? Will RAID 5 cure this problem?
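
On the capacity part of that question, a rough sketch of the usual arithmetic for equal-sized disks may help. It ignores filesystem and metadata overhead, treats RAID 10 as simple mirrored pairs, and the function name usable_capacity_gb is made up for this example.

```python
def usable_capacity_gb(level, disks, disk_gb):
    """Approximate usable capacity for equal-sized disks, ignoring overhead."""
    if level == 0:
        return disks * disk_gb  # striping only, no redundancy
    if level == 1:
        return disk_gb  # every disk holds a full copy
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb  # one disk's worth of space goes to parity
    if level == 10:
        if disks < 4 or disks % 2:
            raise ValueError("this sketch assumes an even number of disks, at least 4")
        return (disks // 2) * disk_gb  # half the disks hold mirror copies
    raise ValueError("level not covered by this sketch")


if __name__ == "__main__":
    for level in (0, 5, 10, 1):
        print(f"RAID {level}: {usable_capacity_gb(level, 4, 400)} GB usable")
```

For four 400GB drives this gives 1600 GB for RAID 0 and 1200 GB for RAID 5, so RAID 5 would cost one drive's worth of space (400 GB) in exchange for surviving a single disk failure.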

ajg 09-05-2007 03:00 AM

Good that it's no longer misreporting the size - I was hoping that they were all symptoms of the same problem, but I guess not. Stalling could be caused by one, or a combination, of a whole heap of things, from OS buffering problems to glitches in hardware.

My rules as far as RAID goes:

1) RAID0 is only ever used for benchmarking and absolutely not for production systems. Hard disks break. If you have 4 hard disks in your RAID0 system, if any one of them goes, you're dead in the water. The one thing I guarantee is that one of your hard disks will break.

2) Never use software RAID5. It's a pain in the butt when a disk breaks. You end up in a situation where you have to hard-reset the system because it's still trying to write to the failed disk, and it totally stops responding. The system then has an array which is critical, and a filesystem which is dirty. You can't run fsck because a member of the RAID set is missing, so unless you have a spare drive on hand to rebuild the array, there's nothing you can do to get at the data. Nothing at all. RAID5 is a hardware-only option. You're asking for a headache with software.

So ... production systems = software RAID1 only - it saves lots of grief later.
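
The RAID0 risk in rule 1 can be put into numbers with a small sketch. It assumes disks fail independently and uses a purely illustrative per-disk failure probability, not a measured figure; the function name is made up for this example.

```python
def raid0_failure_probability(per_disk_failure, disks):
    """Chance that at least one disk fails, which loses the whole RAID 0 array.

    Assumes each disk fails independently with the same probability.
    """
    if not 0.0 <= per_disk_failure <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # The array survives only if every single disk survives.
    return 1.0 - (1.0 - per_disk_failure) ** disks


if __name__ == "__main__":
    p = 0.05  # hypothetical chance of one disk failing over some period
    print(f"1 disk:  {raid0_failure_probability(p, 1):.4f}")
    print(f"4 disks: {raid0_failure_probability(p, 4):.4f}")
```

Under those assumptions, striping across four disks raises the chance of losing everything from 5% to about 18.5% over the same period.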

The battery back-up isn't a problem as you're not caching writes (and unless you've changed something to make it do that then you aren't).

You can only spend so much time investigating these weird problems. If you have a full backup, best to start over and restore it. It will be faster than investigating.


All times are GMT -5.