Linux - Hardware: This forum is for hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?
Alright, so I have six 3 TB drives in a software RAID 5. The box is used as a NAS/seed box/whatever other small thing I feel like using it as at the time. The OS (Ubuntu 10.04 Server 64-bit) runs from a 16 GB USB flash drive, and the RAID 5 NAS part is mounted at /media/stuff. It's an AMD Fusion CPU, dual core 1.6 GHz. The RAID 5 is dm-crypt encrypted and so is / (boot obviously isn't).
Anyway, it has been crashing very frequently, but only when I'm writing to the RAID 5. For example, I was extracting a bunch of very large archive files and about 30 seconds in it crashed. The archives were on the RAID 5 and I was extracting them to another place on the RAID 5.
At first I thought the CPU couldn't handle it, since it's only a 1.6 GHz dual core and it has to calculate all of the parity and stuff. But then I ran mprime for about 20 minutes and it didn't crash or overheat, yet as soon as I start doing very heavy writes to the RAID 5 it crashes again.
I've even gone so far as completely reinstalling the OS from scratch and it is still happening. The other funny thing is that this is a very recent problem; it never used to happen.
Obviously you guys are going to need some log outputs and stuff, but I'm not sure what to show exactly, so just tell me what you need output from and I'll post it.
Also, dunno if it matters, but this server is mostly headless, so I've been doing most of this through CIFS mounts and SSH.
Code:
@ubuntu-server:~$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 01.02
Creation Time : Fri Oct 21 16:27:29 2011
Raid Level : raid5
Array Size : 14651325440 (13972.59 GiB 15002.96 GB)
Used Dev Size : 5860530176 (5589.04 GiB 6001.18 GB)
Raid Devices : 6
Total Devices : 6
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Dec 7 22:34:49 2011
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : debian-server:0
UUID : e967892d:e5006f45:8c97fdb4:9e3eab2d
Events : 182
Number   Major   Minor   RaidDevice   State
   0       8      33         0        active sync   /dev/sdc1
   1       8      49         1        active sync   /dev/sdd1
   2       8      65         2        active sync   /dev/sde1
   3       8      81         3        active sync   /dev/sdf1
   4       8       1         4        active sync   /dev/sda1
   5       8      17         5        active sync   /dev/sdb1
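The fields worth watching in that output are State and Failed Devices. A minimal sketch for pulling a single field out of `mdadm --detail` follows; `md_field` is a hypothetical helper name, and it assumes the "Label : value" layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): read `mdadm --detail` output on
# stdin and print the value of one field, assuming "Label : value" lines.
md_field() {
    # $1 = field label exactly as mdadm prints it, e.g. "State"
    awk -F' : ' -v key="$1" '{
        k = $1
        gsub(/^[ \t]+|[ \t]+$/, "", k)   # strip indentation around the label
        if (k == key) print $2
    }'
}

# Example (needs root and a real array, so it is left commented out):
#   sudo mdadm --detail /dev/md0 | md_field "Failed Devices"
```

A value other than 0 for Failed Devices, or a State other than clean or active, would point at the array itself rather than at power or the controller.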
So now I've tried decompressing the files on my NAS over the network, with the output going to my desktop instead of back onto the server to reduce the I/O, but it's still crashing. Could it be that I'm running out of /tmp space or RAM? I have 8 gigs of RAM so I don't see how that could be the problem, and I also have about 9 gigs left on my USB stick for /tmp. If it matters, I don't have a swap partition. I'm going to try making a swap with a spare USB stick I have.
I wonder if the I/O from the OS running from a USB stick can't keep up with the I/O from the RAID 5? But I would imagine that the RAID 5 would just throttle itself if that was the case? And if I run out of /tmp space, would it just fall back to RAM? Because between /tmp and RAM I have a total of about 18 gigs, so... yeah.
I don't know what type of crash it is. I can post any logs if you tell me what to post, but what happens is that it'll be running fine, then it will just completely shut off immediately when I'm doing heavy writing to the RAID 5. It only crashes when writing to my RAID 5. mdadm says there is no problem with my RAID.
I'm doing an fsck of the RAID 5 as we speak. It's ext4 running over an encrypted LVM. I'll post the outcome of that. After that I'm going to fsck my / partition.
So when you run a heavy workload on the RAID, the machine powers off? If so, maybe the power supply isn't up to the task, although I'd assume the fsck would cause the same problem. It might be that a SATA cable is failing, although I'd expect it to report a failed drive in the array as opposed to shutting down. I suppose it could be that the RAID controller is failing and bringing down the system, or a failing SATA link is taking the controller down with it.
One place to look is the /var/log/messages file, although if the machine is powering off it might not have time to log the error.
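A minimal sketch of that log check follows. `scan_log` is a hypothetical helper, and the pattern list is only an assumption about what disk, controller, or memory trouble tends to look like in kernel logs; it is not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that often accompany disk, controller,
# or memory trouble. The pattern list is an assumption, not a complete set.
scan_log() {
    # $1 = log file to search
    grep -iE 'ata[0-9]+(\.[0-9]+)?:|i/o error|call trace|out of memory|machine check' "$1"
}

# Look at the usual log files, skipping any that are missing or unreadable.
for f in /var/log/kern.log /var/log/syslog /var/log/messages; do
    if [ -r "$f" ]; then
        echo "== $f"
        scan_log "$f" | tail -n 20 || true   # no matches is not an error here
    fi
done
```

If the box loses power outright, the last few lines may never reach the disk, so an empty result does not rule hardware out.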
fsck completed perfectly without any errors on my raid 5.
/var/log/messages doesn't report anything relevant it just says that eth0 is working or something to that effect.
I don't think it's the power supply, because I ran mprime without any problems at all. What I'm going to try next is to create an 8 gigabyte image file inside of the RAID 5 for swap and then another 25 gigabyte image file for /tmp, and if it stops crashing then I'll know the problem and can find a better solution. I'll post back.
I ran dd if=/dev/zero of=~/swap.swp bs=1024 count=8000k and it completed, but when I tried to copy the file over the network to the NAS it crashed about 34 megabytes in. Then I ran the same command on the server directly (creating the file directly on the RAID 5) and it completed at an average of 34 MB/s with no problems whatsoever... So it doesn't seem to be a power supply or an I/O error, because at this point it's only crashing when I start transferring files over the network.
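For reference, the size that dd command asks for can be worked out as below. This assumes GNU dd, where a `k` suffix on `count` multiplies by 1024.

```shell
#!/bin/sh
# Arithmetic for: dd if=/dev/zero of=~/swap.swp bs=1024 count=8000k
bs=1024                      # block size in bytes
count=$((8000 * 1024))       # GNU dd reads the "k" suffix as x1024
bytes=$((bs * count))        # total bytes written
mib=$((bytes / 1024 / 1024)) # same size in MiB

echo "$bytes bytes = $mib MiB"

# The same amount of data with larger blocks, flushed to disk before dd exits
# (left commented out because it writes a large file):
#   dd if=/dev/zero of=~/swap.swp bs=1M count=8000 conv=fsync
```

That comes to 8000 MiB, a little under 8 GiB. Writing with 1 KiB blocks also means a very large number of small writes, so the `bs=1M` variant is likely a heavier and more realistic load on the array.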
Intermittent "crashes" under load indicate a possible PS issue. Mprime doesn't do much except draw current on the CPU.
The RAID devices are on the +12V rail, and running all disks adds load. Don't dismiss this as a possible cause.
An actual lockup/black screen/freeze type crash should not occur under usual OS error conditions.
Since you're using software raid, you could also run memtest and verify no memory errors.
On the rare occasion when I've seen this kind of failure I start at the beginning, and follow the current.
Power Supply - test and/or replace
CPU - re-seat, reattach cooler with fresh liquid silver
RAM - test, re-seat
Check all cabling, make sure everything's tight.
clear out old logs in /var/log - dmesg, kern.log, syslog, messages
Try again
If it crashes again, post (as attachments) those logs - dmesg, kern.log, syslog, messages
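The last step in the list above can be sketched as a small script that bundles those four logs into one archive for attaching. `collect_logs` and the output path are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pack whichever of the four requested logs exist
# into a single gzip-compressed tar archive.
collect_logs() {
    # $1 = directory holding the logs, $2 = archive to create
    src=$1
    out=$2
    files=""
    for name in dmesg kern.log syslog messages; do
        if [ -r "$src/$name" ]; then
            files="$files $name"
        fi
    done
    if [ -z "$files" ]; then
        echo "no logs found in $src" >&2
        return 1
    fi
    # $files is left unquoted on purpose so it splits into separate names.
    tar -czf "$out" -C "$src" $files
}

# Example (left commented out; reading /var/log may need root):
#   collect_logs /var/log "$HOME/crash-logs.tar.gz"
```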
You have call traces in the jbd2 module, which I don't believe are normal and could be a sign of a failing disk.
I would download the diagnostics CD from whichever manufacturer made the hard disks
Boot from diagnostics CD and test each drive, quick test first, then long test.
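As an alternative to the vendor CD (a different tool, not what was suggested above), the drives' built-in SMART self-tests can be started from the running system with `smartctl` from the smartmontools package. The sketch below only prints the commands rather than running them; the device list is taken from the mdadm output earlier in the thread, and `smart_cmds` is a hypothetical helper name.

```shell
#!/bin/sh
# Member disks as listed in the mdadm --detail output (whole disks, not partitions).
DISKS="/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf"

# Hypothetical helper: print (not run) one self-test command per disk.
smart_cmds() {
    # $1 = test type: "short" first, then "long"
    for d in $DISKS; do
        echo "smartctl -t $1 $d"
    done
}

smart_cmds short

# Once a test has finished, results can be read per disk with, for example:
#   smartctl -H /dev/sda            # overall health verdict
#   smartctl -l selftest /dev/sda   # self-test log
```

The printed lines need to be run as root. The long test on a 3 TB drive takes several hours, so the same quick-then-long order applies.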
Since your problem only occurs when writing to the disks, that's a pretty good indication there's something amiss in the disks.
So I'm still having the problem. I went out and bought a new power supply, and it still crashes randomly. It's also crashing even if the RAID isn't mounted, but only when there is high I/O on the hard disk (or hard disks, depending on whether the RAID is mounted; it crashes either way). So at this point I'm thinking it's either the distro I'm using, a certain package I'm using, or the motherboard/CPU failing. I'm going to try another distro to see if that fixes it, and if it doesn't I'll be back again.
It's still crashing... Could it be that a 500 watt power supply isn't enough to power this box? It's running 7 hard disk drives, and it seems to fail when all of the drives are running at their max potential. So maybe it isn't an I/O problem... Could it be that my power supply can't handle it?
Edit: or could the problem be that I have 3 of my hard drives on 1 cable (rail?) coming out of the PSU, and then 2 plus a Molex-to-SATA splitter powering the other 4 (on another cable group (rail?))?
I've been running mprime blend for about 20 minutes straight now with no problems, so I don't think it's the CPU.
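A rough way to reason about that question is to add up the +12V draw per cable. In the sketch below the drive counts come from the post above, but the 2000 mA peak per drive is purely a placeholder assumption; the real figure is on each drive's label or datasheet and should be substituted.

```shell
#!/bin/sh
# Drive counts per PSU cable, as described in the post.
DRIVES_CABLE_A=3
DRIVES_CABLE_B=4

# ASSUMPTION: placeholder peak +12V current per drive in milliamps.
# Replace with the value from the drive label or datasheet.
PEAK_MA=2000

cable_a_ma=$((DRIVES_CABLE_A * PEAK_MA))
cable_b_ma=$((DRIVES_CABLE_B * PEAK_MA))
total_ma=$((cable_a_ma + cable_b_ma))
total_w=$((12 * total_ma / 1000))   # watts on the +12V rail at peak

echo "cable A: ${cable_a_ma} mA, cable B: ${cable_b_ma} mA"
echo "total:   ${total_ma} mA on +12V, about ${total_w} W"
```

The totals are then compared with the +12V rating printed on the PSU label, per rail if it has more than one, rather than with the headline 500 W figure.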