I've had an interesting experience that is completely unprecedented in my history of using rsync. As I write this I'm listening to my server squeal like a stuck pig as the Adaptec RAID controller rebuilds my RAID-5 array (and doesn't lose any data, I HOPE!!!!)
rsync is an integral part of my Disaster Recovery plan at the office. I have an NFS server that hosts all the business-critical VMs that my company runs and a couple of VMware hosts that link thereto and attempt to make the best of things. I know NFS isn't the most optimal solution, but that's a conversation for another thread.
My rsync woes began tonight and took an unexpected turn for the worse as I was attempting to implement the most white-knuckled portion of this scenario.
The Process:
1. A script runs that shuts down all VMs that access the NFS array
2. Post-shutdown, the server then rsyncs to a local HDD for fast-backup and to minimize downtime
3. During daily operation, another script will rsync the backed-up VMs to an offsite server that keeps them in case we have a disaster scenario
My plans never made it to step 3 since rsync started crashing during the local portion (step 2).
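For anyone trying to picture the nightly part of this (steps 1 and 2), here is a minimal sketch in Python. The script path, the source and destination directories, and the helper name run_steps are all hypothetical placeholders and not the actual scripts from this setup. The only point it illustrates is that the local rsync should run only if the VM shutdown step exits cleanly.

```python
import subprocess

# Hypothetical commands and paths, used only for illustration.
SHUTDOWN_VMS = ["/usr/local/sbin/shutdown-vms.sh"]                  # step 1
LOCAL_SYNC = ["rsync", "-a", "/srv/nfs/vms/", "/mnt/backup/vms/"]   # step 2

NIGHTLY = [("shutdown_vms", SHUTDOWN_VMS), ("local_rsync", LOCAL_SYNC)]


def run_steps(steps, runner=subprocess.call):
    """Run each (name, command) pair in order.

    Stops at the first command that returns a non-zero exit code and
    returns the names of the steps that completed successfully.
    """
    done = []
    for name, cmd in steps:
        if runner(list(cmd)) != 0:
            break
        done.append(name)
    return done
```

Calling run_steps(NIGHTLY) from cron would cover the overnight window; the offsite sync (step 3) would be a separate daytime job built the same way.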
My Hardware:
1. Adaptec 5405 4-port hardware RAID controller
2. 3 x Western Digital Black SATA 2.0 HDDs (1TB each) in RAID-5 config
3. 1 x Western Digital Black SATA 2.0 HDD (2TB) in solo-mode on the same controller
The Software:
- Slackware Linux 13.0 64-bit
- rsync 3.0.6
The files:
All are VM files and, as a result, the virtual disks get quite large. They're up to 80GB/disk on one machine, but most are below 50GB.
rsync initially was not running with a bwlimit set, so it was humming along at over 50MB/sec, then it hung.
After a harrowing reboot process and the OS repairing the stand-alone drive, I set --bwlimit=20000 (roughly 20MB/sec), and that run hung after copying about twice as much data.
Spotting a trend, I set --bwlimit=1000, but it hung at the same point as the 20000 limit. The hang was pretty bad: I had to use the external power-off of the solo HDD (not the RAID array) to get the OS to start responding again. Naturally, the Adaptec controller squealed and, when the OS started responding again, I issued a reboot and powered the HDD back on. Squealing stops and life goes on.
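To put those limits in perspective, here is a quick back-of-the-envelope calculation in Python for the largest disk (80GB). It assumes decimal units (1 GB = 1,000,000 kB) and a steady transfer rate, which real copies never quite manage, so treat the numbers as rough estimates only. The function name transfer_hours is just something made up for this sketch.

```python
def transfer_hours(size_gb: float, rate_kb_per_s: float) -> float:
    """Estimate hours needed to copy size_gb at a steady rate_kb_per_s.

    Assumes decimal units: 1 GB = 1,000,000 kB.
    """
    size_kb = size_gb * 1_000_000
    return size_kb / rate_kb_per_s / 3600


if __name__ == "__main__":
    rates = [
        ("no limit (~50 MB/s)", 50_000),
        ("limit of 20000", 20_000),
        ("limit of 1000", 1_000),
    ]
    for label, rate in rates:
        print(f"{label}: {transfer_hours(80, rate):.2f} h")
```

Under those assumptions a single 80GB disk takes about 0.44 hours unthrottled, about 1.11 hours at 20000, and about 22.22 hours at 1000, so the lowest limit would not fit in an overnight window even if it had not hung.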
I read online that some people noticed they were eking by when shunting the transfer through localhost with ssh. I thought I'd test it out. BIG MISTAKE.
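For reference, the variations tried so far differ only in a couple of rsync options. The sketch below builds the command lines in Python; the helper rsync_command and the paths are hypothetical, while -a, --bwlimit and -e ssh are standard rsync options. It is a sketch of the idea, not the exact commands that were run here.

```python
from typing import List, Optional


def rsync_command(
    src: str,
    dest: str,
    bwlimit: Optional[int] = None,
    via_ssh_host: Optional[str] = None,
) -> List[str]:
    """Build an rsync argument list for a local or ssh-tunnelled copy."""
    cmd = ["rsync", "-a"]
    if bwlimit is not None:
        # Cap the transfer rate at the given value.
        cmd.append(f"--bwlimit={bwlimit}")
    if via_ssh_host is not None:
        # Route the copy through ssh, e.g. to localhost.
        cmd += ["-e", "ssh"]
        dest = f"{via_ssh_host}:{dest}"
    return cmd + [src, dest]


# Hypothetical paths, for illustration only.
SRC, DEST = "/srv/nfs/vms/", "/mnt/backup/vms/"
plain = rsync_command(SRC, DEST)
limited = rsync_command(SRC, DEST, bwlimit=20000)
looped = rsync_command(SRC, DEST, via_ssh_host="localhost")
```

Each list can be handed to subprocess.call as-is. Note that the ssh variant adds encryption overhead on top of the same disk-to-disk traffic, so it does not reduce the load on the controller.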
This time, the hang was TOTAL. I had to use the external power-off of the solo HDD again. This time, however, the kernel cacked, too, and started spewing numeric data all over the screen. Line after line. As production time was coming again and, thus, the server would need to be in use, I hard-rebooted the box (after all, I hadn't changed ANYTHING on the RAID array, so it should be fine, right? WRONG!) After the reboot, during POST, the squealing stopped for a time and everything seemed normal. Then, after booting the controller kernel, it started squealing again, saying my RAID-5 array was degraded. UGH!
It's doing the reconstruction now, but I'm hoping to postpone booting any of the VMs until it's finished doing its post-boot-up repair.
My conjecture:
I think my Adaptec card's buffer is getting maxed and it's not handling the writes properly, causing it to block and hang the process. The hardware failure I experienced is particularly unnerving. Does anyone out there know if this scenario would work better if I grabbed another SATA controller and hooked up my backup HDD to that instead? Or would I just be wasting money?
Last edited by lylemwood; 08-20-2010 at 08:17 AM.
Reason: Updated Adaptec RAID Controller Version
Quote: "This time, however, the kernel cacked, too, and started spewing numeric data all over the screen. Line after line."
This caught my eye. It sounds like some type of hardware
failure somewhere. It could be that something is overheating
or it could be as simple as bad RAM.
Until you verify that all the hardware is good, I wouldn't
recommend trying to rebuild the RAID array. And the first
thing I would do is run memtest to verify the RAM.
Thanks for your suggestion. I've never used memtest before, but I've just downloaded the latest version of Memtest86+ (4.10) from www.memtest.org.
I'll check this out tomorrow night.
I'll definitely attempt to validate the RAM before I start the work adding the new SATA controller.
I still have the old RAM (pre-upgrade) and will swap back and retest if memtest reports errors on the upgraded RAM.
I know this is probably a typical response, but this RAM hasn't caused any issues whatsoever on massive file copies before, and this was the first time I tried to rsync disk-to-disk on the same controller.
What I'm saying is that there has never been a problem, even rsyncing this same data off-server to another box via SSH, until I did this intra-controller sync. This is why I think it's the controller cacking, not the RAM.
Nonetheless, I'll run the tests and see what happens.
Oh, and I'll update to advise on Wednesday or Thursday.
When I moved the destination (backup) drive to a new controller it worked perfectly.
I moved the drive and I'm currently watching it sync all data with rsync. No bwlimit and no need to nice the operation down to a lower priority... Who knew?
Adaptec, you disappoint me.
Oh well, the system is working well and - apart from memtest sucking up half my night - it looks like I'll be all set now.
If anyone wants to continue the discussion or try to help me understand why a high-end Adaptec controller crapped out like this, please let me know.
The only other thing of any import that I can come up with is that I'm using the stock Slackware64 13.0 driver for the card, but it has served me well in all other facets, so I'd rather not get too iffy on that right now unless there's a good reason.