Wondering if any of you out there have seen anything like this.
We used to run a nightly rsync from one physical Red Hat server to another without issues. Total data moved nightly is maybe 20-25 GB.
For testing, we have switched the destination server to a SLES 11 SP1 virtual machine running under VMware. Same source server.
After maybe 8-10 minutes, the networking subsystem on the VM craps out completely. The link on the virtual NIC still shows as up, and there are no errors reported on the virtual NIC, BUT... no packets can get in or out of the NIC, either internally to our company, or externally to the Internet. The rsync process itself hangs, and pings both to and from the server immediately show 100% dropped packets. The only fix is to ifdown the interface then ifup it again.
This is 100% reproducible. It will occur on different files on each rsync attempt. Things I am going to test when I get a chance:
- Mess around with possible compression options
- Get a precise time of how long it takes before the network subsystem hangs
- Hang an strace off of the rsync server process and see what's going on at the moment the network wedges.
- Set the bwlimit option of rsync to see if that prevents the hang or causes it at a significantly different time offset from invocation of the sync.
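For the timing and bwlimit tests in the list above, something like the following rough Python sketch could work. It is only a sketch under assumptions: the function names are made up for illustration, the source and destination paths are placeholders, and the `-a` flag is my assumption (only `-vv` and `-P` are known to be in use). It builds the rsync command line with optional `-z` and `--bwlimit`, and measures how long a reachability probe keeps succeeding.

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_rsync_cmd(src: str, dest: str,
                    bwlimit_kbps: Optional[int] = None,
                    compress: bool = False) -> List[str]:
    """Build an rsync argv list. src and dest are placeholders.

    -vv and -P are the options already in use; -a is an assumption.
    """
    cmd = ["rsync", "-a", "-vv", "-P"]
    if compress:
        cmd.append("-z")  # toggle compression between test runs
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")  # throttle the transfer
    cmd += [src, dest]
    return cmd


def ping_once(host: str) -> bool:
    """Return True if a single ping to host is answered within 2 seconds."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def seconds_until_failure(probe: Callable[[], bool],
                          interval: float = 1.0,
                          max_probes: int = 3600) -> Optional[float]:
    """Call probe() repeatedly and return the elapsed seconds at the
    first failure, or None if every one of max_probes calls succeeded."""
    start = time.monotonic()
    for _ in range(max_probes):
        if not probe():
            return time.monotonic() - start
        time.sleep(interval)
    return None
```

Usage would be to start the sync with `subprocess.Popen(build_rsync_cmd(...))` on the source server and, at the same moment, call `seconds_until_failure(lambda: ping_once("vm-hostname"))`, where `vm-hostname` stands in for the SLES VM. Repeating that with and without `bwlimit_kbps` would show whether throttling moves the time of the hang.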
My gut tells me this is a stacked problem. Possibly a problem with rsync that then exposes a SLES or VMware networking problem. I mean come on... the WHOLE NETWORKING SUBSYSTEM JUST BEING BLACKED OUT BY RSYNC??? (Or it could be something ssh-related).
Anyways - any thoughts you folks out there could offer would be much appreciated. Things to try - things it could be, etc. etc. I'm already using rsync -vv and -P options - that's not revealing much.
Nothing transcendental, but:
- tcpdump (run around the time of the failure, to limit the size of the capture file) could show some interesting things.
- sar can collect statistics of network interfaces (usage and errors).
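To make the two suggestions above a bit more concrete, here is a hedged Python sketch. The first function builds a tcpdump command that writes to a small ring of capture files (`-C` with `-W`), so the capture can run from the start of the sync without filling the disk. The second parses `/proc/net/dev`, which exposes per-interface packet, error and drop counters similar to what sar reports for network devices. The function names, default sizes and the `eth0` interface name are assumptions for illustration only.

```python
from typing import Dict, List


def build_tcpdump_ring_cmd(interface: str, out_file: str,
                           file_size_mb: int = 50, file_count: int = 4,
                           snaplen: int = 128) -> List[str]:
    """Build a tcpdump argv list that rotates through file_count capture
    files of about file_size_mb million bytes each, keeping only the first
    snaplen bytes of every packet."""
    return ["tcpdump", "-n", "-i", interface, "-s", str(snaplen),
            "-C", str(file_size_mb), "-W", str(file_count), "-w", out_file]


def parse_proc_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Parse the contents of /proc/net/dev into per-interface counters."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        values = [int(x) for x in fields]
        # Columns 0-7 are receive counters, columns 8-15 are transmit counters.
        stats[name.strip()] = {
            "rx_packets": values[1], "rx_errs": values[2], "rx_drop": values[3],
            "tx_packets": values[9], "tx_errs": values[10], "tx_drop": values[11],
        }
    return stats
```

Reading `/proc/net/dev` every few seconds on the VM and logging `parse_proc_net_dev(...)["eth0"]` would show whether the packet counters simply stop moving at the hang while the error counters stay at zero, which would match the symptoms described.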