LinuxQuestions.org
Old 08-20-2010, 05:45 AM   #1
lylemwood
Member
 
Registered: Jan 2008
Location: Toronto, Canada
Distribution: Slackware, CentOS
Posts: 47

Rep: Reputation: 18
rsync hangs and locks server


Hello all,

I've had an interesting experience that is completely unprecedented in my history of using rsync. As I write this I'm listening to my server squeal like a stuck pig as the Adaptec RAID controller rebuilds my RAID-5 array (and doesn't lose any data, I HOPE!!!!)

rsync is an integral part of my Disaster Recovery plan at the office. I have an NFS server that hosts all the business-critical VMs that my company runs and a couple of VMware hosts that link thereto and attempt to make the best of things. I know NFS isn't the optimal solution, but that's a conversation for another thread.

My rsync woes began tonight and took an unexpected turn for the worse as I was attempting to implement the most white-knuckled portion of this scenario.

The Process:
1. A script runs that shuts down all VMs that access the NFS array
2. Post-shutdown, the server then rsyncs to a local HDD for fast-backup and to minimize downtime
3. During daily operation, another script will rsync the backed-up VMs to an offsite server that keeps them in case we have a disaster scenario

My plans never made it to step 3 since rsync started crashing during the local portion (step 2).
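
For readers who want to picture steps 2 and 3, here is a minimal sketch of how the two rsync invocations might be assembled. It only builds and prints the command lines; nothing is executed. The paths, the host name offsite.example.com and the helper name rsync_cmd are hypothetical placeholders, not the actual script, and the VM shutdown in step 1 is left out because the thread does not say how it is done.

```python
import shlex
from typing import List, Optional


def rsync_cmd(src: str, dest: str,
              bwlimit_kbps: Optional[int] = None,
              remote_host: Optional[str] = None) -> List[str]:
    """Build an rsync argument list (hypothetical helper; runs nothing)."""
    # -a preserves permissions/times; --partial keeps interrupted files
    cmd = ["rsync", "-a", "--partial"]
    if bwlimit_kbps is not None:
        # rsync's --bwlimit takes a rate in kilobytes per second
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    if remote_host is not None:
        # push over ssh to host:path
        cmd += ["-e", "ssh"]
        dest = f"{remote_host}:{dest}"
    return cmd + [src, dest]


# Step 2: NFS array -> local backup disk (paths are assumptions)
local_step = rsync_cmd("/srv/nfs/vms/", "/mnt/backup/vms/")
# Step 3: local backup disk -> offsite server (host is an assumption)
offsite_step = rsync_cmd("/mnt/backup/vms/", "/backups/vms/",
                         remote_host="offsite.example.com")

print(shlex.join(local_step))
print(shlex.join(offsite_step))
```

Passing bwlimit_kbps=20000 to the same helper gives the throttled form of the local copy.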

My Hardware:
1. Adaptec 5405 4-port hardware RAID controller
2. 3 x Western Digital Black SATA 2.0 HDDs (1TB each) in RAID-5 config
3. 1 x Western Digital Black SATA 2.0 HDD (2TB) in solo-mode on the same controller

The Software:
- Slackware Linux 13.0 64-bit
- rsync 3.0.6

The files:
All are VM files and, as a result, the virtual disks get quite large. They're up to 80GB/disk on one machine, but most are below 50GB.

rsync initially was not running with a bwlimit set, so it was humming along at over 50MB/sec, then it hung.

After a harrowing reboot process and the OS repairing the stand-alone drive, I set a bwlimit of 20000kB/sec; that run hung after copying twice as much data.

Spotting a trend, I bwlimited to 1000kB, but it hung at the same point as the 20000kB limit. The hang was pretty bad, I had to use the external-poweroff of the solo HDD (not the RAID array) to get the OS to start responding again. Naturally, the Adaptec controller squealed and, when the OS started responding again I issued a reboot and powered the HDD back on. Squealing stops and life goes on.
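
To put those limits in perspective, here is a rough back-of-the-envelope calculation of how long a single 80GB virtual disk would take at each rate. It assumes the bwlimit value is counted in units of 1024 bytes per second and that 80GB means 80 x 10^9 bytes; both are assumptions, and the function name hours_to_copy is made up for this sketch.

```python
def hours_to_copy(size_gb: float, bwlimit_kbps: int) -> float:
    """Estimated hours to move size_gb at a given --bwlimit value."""
    size_bytes = size_gb * 1000 ** 3        # assumption: decimal gigabytes
    bytes_per_sec = bwlimit_kbps * 1024     # assumption: 1 unit = 1024 bytes/sec
    return size_bytes / bytes_per_sec / 3600


for limit in (20000, 1000):
    print(f"bwlimit={limit}: about {hours_to_copy(80, limit):.1f} h for an 80GB disk")
```

Under those assumptions the 1000kB limit works out to most of a day for the largest disk alone, so throttling that hard is not a practical workaround even if it had avoided the hang.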

I read online that some people were eking by when they shunted the transfer through localhost over ssh, so I thought I'd test it out. BIG MISTAKE.

This time, the hang was TOTAL. I had to use the external power-off of the solo-hdd again. This time, however, the kernel cacked, too, and started spewing numeric data all over the screen. Line after line. As production time was coming again and, thus, the server would need to be in use, I hard-rebooted the box (after all, I hadn't changed ANYTHING on the RAID array, so it should be fine, right? WRONG!) After the reboot, during POST, the squealing stopped for a time and everything seemed normal, then, after booting the controller kernel it started squealing again, saying my RAID-5 array was degraded. UGH!

It's doing the reconstruction now, but I'm hoping to postpone booting any of the VMs until it's finished doing its post-boot-up repair.

My conjecture:
I think my Adaptec card's buffer is getting maxed and it's not handling the writes properly, causing it to block-out and hang the process. The hardware failure I experienced is something that is particularly unnerving. Does anyone out there know if this scenario would work better if I grabbed another SATA controller and hooked up my HDD to that instead? Would I just be wasting money?
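
One way to probe that conjecture without rsync in the picture is a plain chunked copy that forces data out to the disk at fixed intervals and reports how far it got. The sketch below is only an illustration: the function name copy_with_sync and the chunk and sync sizes are arbitrary choices, and it says nothing about what the controller itself does internally. If a copy like this stalls at the same point, rsync is probably not the culprit.

```python
import os


def copy_with_sync(src: str, dst: str,
                   chunk_bytes: int = 4 * 1024 * 1024,
                   sync_every: int = 64 * 1024 * 1024,
                   verbose: bool = False) -> int:
    """Copy src to dst in chunks, calling fsync every sync_every bytes.

    Returns the number of bytes copied.
    """
    copied = 0
    since_sync = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk_bytes)
            if not buf:
                break
            fout.write(buf)
            copied += len(buf)
            since_sync += len(buf)
            if since_sync >= sync_every:
                # push buffered data to the device before continuing
                fout.flush()
                os.fsync(fout.fileno())
                since_sync = 0
                if verbose:
                    print(f"synced at {copied // (1024 * 1024)} MiB")
        # final flush so the tail of the file is on disk too
        fout.flush()
        os.fsync(fout.fileno())
    return copied
```

Calling it with verbose=True on one of the large virtual disks would show the last offset that reached the disk before any stall.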

Last edited by lylemwood; 08-20-2010 at 08:17 AM. Reason: Updated Adaptec RAID Controller Version
 
Old 08-22-2010, 08:29 AM   #2
carltm
Member
 
Registered: Jan 2007
Location: Canton, MI
Distribution: CentOS, SuSE, Red Hat, Debian, etc.
Posts: 703

Rep: Reputation: 99
Quote:
Originally Posted by lylemwood View Post
This time, however, the kernel cacked, too, and started spewing numeric data all over the screen. Line after line.
This caught my eye. It sounds like some type of hardware
failure somewhere. It could be that something is overheating
or it could be as simple as bad ram.

Until you verify that all the hardware is good, I wouldn't
recommend trying to rebuild the raid array. And the first
thing I would do is run memtest to verify the ram.
 
Old 08-23-2010, 10:39 AM   #3
lylemwood
Member
 
Registered: Jan 2008
Location: Toronto, Canada
Distribution: Slackware, CentOS
Posts: 47

Original Poster
Rep: Reputation: 18
Working on this tomorrow night

Hi carltm,

Thanks for your suggestion. I've never used memtest before, but I've just downloaded the latest version of Memtest86+ (4.10) from www.memtest.org.

I'll check this out tomorrow night.

I'll definitely attempt to validate the RAM before I start the work adding the new SATA controller.

I still have the old RAM (pre-upgrade) and will swap back and retest if memtest pops on this upgrade RAM.

I know this is probably typical, but this RAM hasn't caused any issues whatsoever on massive file copies before and this was the first time I tried to rsync disk-to-disk on the same controller.

What I'm saying is that there has never been a problem, even rsyncing this same data off-server to another box via SSH, until I did this intra-controller sync. This is why I think it's the controller cacking, not the RAM.

Nonetheless, I'll run the tests and see what happens.

Oh, and I'll update to advise on Wednesday or Thursday.
 
Old 08-25-2010, 02:24 AM   #4
lylemwood
Member
 
Registered: Jan 2008
Location: Toronto, Canada
Distribution: Slackware, CentOS
Posts: 47

Original Poster
Rep: Reputation: 18
memtest+ v4.10 passed all tests

Well, memtest+ passed all tests on the RAM. Took forever to run as there's quite a lot of RAM on the box, but a full pass is a full pass.

I'm moving the in-box backup HDD to a different controller to see if that repairs the issue. Will update after I'm finished the night's work on that.

For now, though, since RAM has been (as best I can tell) eliminated as an issue, any other thoughts from anyone out there?
 
Old 08-25-2010, 05:26 AM   #5
lylemwood
Member
 
Registered: Jan 2008
Location: Toronto, Canada
Distribution: Slackware, CentOS
Posts: 47

Original Poster
Rep: Reputation: 18
Moving the drive to a new controller and BINGO

When I moved the destination (backup) drive to a new controller it worked perfectly.

I moved the drive and I'm currently watching it sync all data with rsync. No bwlimit and no need to nice the operation down to a lower priority... Who knew?

Adaptec, you disappoint me.

Oh well, the system is working well and - apart from memtest sucking up half my night - it looks like I'll be all set now.

If anyone wants to continue the discussion or help me understand why a high-end Adaptec controller crapped out like this, please let me know.

The only other thing of any import that I can come up with is that I'm using the stock Slackware64 13.0 driver for the card, but it has served me well in all other facets, so I'd rather not get too iffy on that right now unless there's a good reason.

Good luck to all Adaptec users!!!
 
  

