borisk, that was a terrific first post. Using that diff I was able to get it to compile and replaced the file "/lib/modules/2.6.31-21-server/kernel/drivers/message/fusion/mptsas.ko" with it. I was able to format both arrays without error and I am currently doing an rsync from my backup. I still don't fully trust this but I'll be keeping an eye on it.
Thanks for your help borisk and H_TeXMeX_H.
*edit*
Well, that didn't last. It panicked again during the rsync. I have already opened an RMA with LSI on the chance that this is hardware related: when I first used the card I didn't have any real problems, and it has just been getting worse. Any suggestions on what I should try? Should I send it in for replacement, or is there anything else I can try first?
The only other thing is: if this card was OK when you first started using it, do you remember any change that was made right before this started happening? Any clue as to what it might be related to, other than these drivers (which I suspect are the cause)? Just in case it's something else.
When I built the server I upgraded everything to the latest version, including the kernel. After that I formatted the arrays and moved the data onto them. I then set up Samba, NFS, rsync, mail, and all the other things I needed on it. Since then no kernel or module changes were made. Back then I saw it kernel panic once in the first week. A few days later it happened again, and a few days after that, again. Then it started doing it very often (more than once a day), and when I did some testing I discovered that it would no longer even complete a format. Now that I have replaced the module with the compiled one, it completed the format on both arrays and transferred 60GB of data before it had another panic.
So then it has never been stable, just more stable than now, I see. Unfortunately I have no more suggestions. If everything else checks out, then the driver is probably just poorly maintained and therefore unstable. If you post some more of '/var/log/syslog' and '/var/log/messages' from right before the hard lock, it may give some more clues.
I removed the card from the server and decided to use software RAID with mdadm until I fix the problem. I created the two arrays and formatted them with no problem. I then started an rsync to put the 2TB of data back on the drives and... PANIC.
So, this is NOT related to the card, as I am now using the onboard adapter, the same one the sys/boot drives are on. The data drives were already tested and came up clean, and I am now running diagnostics on the sys/boot drives. If that does not turn up any errors I will be leaving Ubuntu for Debian and starting over with this server. If Debian has the same problem I'll set the server on fire and quit working in IT.
A newer kernel may solve it, or it might just be bad hardware. If it's not the card, then it could be anything. Just keep looking through the logs and trying different hardware configurations. I would start with the kernel though; it could be as simple as a bad kernel.
It's a good idea to try different distros and see if it helps.
I've tried a number of distros and kernels and all have the same problem. Obviously this is a hardware issue, but any guess as to what? If I run Folding@Home, which uses 100% of the CPU (all cores, it's meant to do that), then I also get a kernel panic. If I simply run 4 endless Bash loops to force each core to 100% usage, it does not panic.
So, certain processes which cause high CPU usage trigger a panic, but not all high-CPU processes do. I have never seen anything like this before.
Any guess on which hardware is causing the problem and how to resolve it? At this point all I can do is throw away several hundred dollars of hardware, start buying replacements, and hope they work, but they may have the same problem as this one.
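For anyone who wants to repeat the CPU-only test described above (one endless loop per core), here is a minimal sketch. The original test used Bash loops; this version swaps in Python child processes, and the names `start_busy_loops`, `stop_busy_loops` and `burn`, along with the 5 second duration, are made up for illustration. Like the Bash loops, it only spins the cores and does not exercise memory, disk I/O or the FPU the way Folding@Home or a large rsync would.

```python
import subprocess
import sys
import time


def start_busy_loops(count):
    """Start `count` child processes that each spin in an endless loop."""
    cmd = [sys.executable, "-c", "while True: pass"]
    return [subprocess.Popen(cmd) for _ in range(count)]


def stop_busy_loops(procs):
    """Terminate the busy-loop children and wait for them to exit."""
    for p in procs:
        p.terminate()
    for p in procs:
        p.wait()


def burn(count, seconds):
    """Keep `count` cores busy for `seconds`; return how many loops were still alive."""
    procs = start_busy_loops(count)
    try:
        time.sleep(seconds)
        alive = sum(p.poll() is None for p in procs)
    finally:
        stop_busy_loops(procs)
    return alive


if __name__ == "__main__":
    # 4 matches the number of loops mentioned in the thread; 5 s is arbitrary.
    print(burn(count=4, seconds=5))
```

If the machine survives this but panics under Folding@Home or rsync, that points away from plain integer CPU load and toward whatever else those workloads touch.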
I was away for a while, but if the problem is still there, try blacklisting all disk controller modules except 'ahci' and use that for all SATA drives. For example, blacklist 'pata_atiixp'.
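As a rough aid for the blacklisting suggestion, the sketch below lists loaded ATA driver modules other than 'ahci' and prints `blacklist <module>` lines in the format modprobe reads from files under /etc/modprobe.d/. It assumes the drivers of interest follow the usual `pata_*` / `sata_*` naming, which will not catch every controller driver; the function names are made up for illustration. It only prints candidates and changes nothing. Review the list before using it, since blacklisting the driver your boot disk depends on will leave the system unbootable.

```python
import os

# Assumption: the ATA host drivers to consider follow pata_*/sata_* naming.
ATA_PREFIXES = ("pata_", "sata_")


def loaded_modules(proc_modules_text):
    """Return module names from /proc/modules-style text (name is the first field)."""
    return [line.split()[0] for line in proc_modules_text.splitlines() if line.strip()]


def blacklist_lines(module_names, keep="ahci"):
    """Build 'blacklist <module>' lines for ATA drivers other than `keep`."""
    return [
        f"blacklist {name}"
        for name in sorted(set(module_names))
        if name != keep and name.startswith(ATA_PREFIXES)
    ]


if __name__ == "__main__":
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as f:
            names = loaded_modules(f.read())
        # Print only; copy into a file under /etc/modprobe.d/ after review.
        print("\n".join(blacklist_lines(names)))
```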