rsync crashes entire networking subsystem?!?
Wondering if any of you out there have seen anything like this.
We used to run a nightly rsync from one physical Red Hat server to another without issues. Total data moved each night is roughly 20-25 GB.
For testing, we have switched the destination server to a SLES 11 SP1 virtual machine running under VMware. The source server is unchanged.
After maybe 8-10 minutes, networking on the VM dies completely. The link on the virtual NIC still shows as up and the NIC reports no errors, BUT... no packets get in or out of it, whether to hosts inside our company or out on the Internet. The rsync process itself hangs, and pings both to and from the server immediately show 100% packet loss. The only fix is to ifdown the interface and then ifup it again.
This is 100% reproducible, although the hang hits a different file on each rsync attempt. Things I am going to test when I get a chance:
- Mess around with possible compression options
- Get a precise time of how long it takes before the network subsystem hangs
- Attach strace to the rsync server process and see what it is doing at the moment the network wedges.
- Set rsync's --bwlimit option to see whether that prevents the hang or moves it to a significantly different time offset from the start of the sync.
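
For the timing, strace and bwlimit tests in the list above, something like the following rough Python sketch could keep the runs repeatable. Everything named here is hypothetical and only for illustration: the function names, the default values, and the idea of pinging the VM from a third machine. It assumes a Linux ping (for -c and -W) and that strace is installed on the VM; it is not a tested procedure for this setup.

```python
import subprocess
import time


def build_rsync_cmd(src, dest, bwlimit_kbps=None, compress=False):
    """Build an rsync argument list. src and dest are placeholders."""
    cmd = ["rsync", "-a", "-vv", "-P"]
    if compress:
        cmd.append("-z")  # toggle compression between runs
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")  # throttle the transfer
    cmd += [src, dest]
    return cmd


def build_strace_cmd(pid, out_path):
    """Attach strace to a running process: follow children, timestamp each call."""
    return ["strace", "-f", "-tt", "-o", out_path, "-p", str(pid)]


def ping_once(host, timeout_s=2):
    """Return True if one ICMP echo to host is answered (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def seconds_until_unreachable(probe, interval_s=1.0, max_s=3600.0,
                              clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it fails.

    Returns the elapsed seconds at the first failed probe, or None if the
    probe never fails within max_s. clock and sleep are injectable so the
    loop can be exercised without a real network.
    """
    start = clock()
    while clock() - start < max_s:
        if not probe():
            return clock() - start
        sleep(interval_s)
    return None
```

The idea would be to start the rsync with subprocess.Popen(build_rsync_cmd(...)) on the source server and, at the same moment, run seconds_until_unreachable(lambda: ping_once("the-vm-hostname")) from a machine that is not involved in the transfer. Comparing the elapsed time with and without --bwlimit or -z, and against the last timestamps in the strace output, should show whether the wedge tracks elapsed time, data volume, or neither.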
My gut tells me this is a stacked problem: possibly a problem with rsync that then exposes a SLES or VMware networking problem. I mean come on... the WHOLE NETWORKING SUBSYSTEM JUST BEING BLACKED OUT BY RSYNC??? (Or it could be something ssh-related.)
Anyways - any thoughts you folks out there could offer would be much appreciated: things to try, things it could be, etc. I'm already running rsync with the -vv and -P options, and that's not revealing much.
(Will provide updates in a few days)