LinuxQuestions.org - Debian box running virtual machines blows up under large NFS network load

I am running Debian Squeeze with Xen 4.0 and stress-testing it. I have created two virtual machines, each with 512 MB of memory. One VM is an NFS server sharing out a large file (4.6 GB); the other is an NFS client. The stress test consists of running mv on the client to move that file back and forth between the NFS share directory and a local directory. The virtual machines are stored on a remote SAN, connected via iSCSI and formatted with OCFS2.
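For anyone wanting to reproduce the test, the back-and-forth move can be scripted roughly as below. This is only a sketch of the procedure described above, not the exact commands I ran: the function name bounce_file and all paths are made up for illustration, and shutil.move stands in for mv (like mv, it copies and then deletes when source and destination are on different filesystems).

```python
import shutil
from pathlib import Path


def bounce_file(local_dir, share_dir, filename, rounds=1):
    """Move `filename` from share_dir to local_dir and back, `rounds` times.

    Returns the number of individual moves completed.
    """
    local_path = Path(local_dir) / filename
    share_path = Path(share_dir) / filename
    moves = 0
    for _ in range(rounds):
        # Pull the file off the NFS share onto local disk (NFS read).
        shutil.move(str(share_path), str(local_path))
        moves += 1
        # Push it back onto the share (NFS write).
        shutil.move(str(local_path), str(share_path))
        moves += 1
        # Flush so progress is visible even if the host dies right after.
        print(f"completed {moves} moves", flush=True)
    return moves
```

On the client VM it would be called as something like bounce_file("/root/local", "/mnt/nfsshare", "bigfile.img", rounds=10), where both directories and the file name are placeholders for the real ones.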

It is true I have had better luck with some NICs than others, but I now have one that does not drop packets.
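One way to check the "does not drop packets" claim is to compare the kernel's per-interface counters before and after a run. The sketch below reads them from /sys/class/net/<iface>/statistics; the helper names are mine, and the sysfs_root argument exists only so the code can be pointed at a test directory.

```python
from pathlib import Path

# Counter files the kernel exposes under /sys/class/net/<iface>/statistics.
COUNTERS = (
    "rx_packets", "rx_dropped", "rx_errors",
    "tx_packets", "tx_dropped", "tx_errors",
)


def read_nic_counters(iface, sysfs_root="/sys/class/net"):
    """Return a dict of the counters listed in COUNTERS for one interface."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text().strip()) for name in COUNTERS}


def counter_delta(before, after):
    """Per-counter difference between two snapshots from read_nic_counters."""
    return {name: after[name] - before[name] for name in before}
```

Take one snapshot before starting the mv, another after it finishes (or after the reboot, in which case the counters start again from zero), and look at the rx_dropped and tx_dropped deltas.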

And yet, about one time in two when I try to mv the file from the local directory of the NFS client VM to the NFS share, the box running the VMs reboots. It leaves no logs, and seldom even any messages on the screen. The screen just blanks and the next thing I know it is rebooting.

I have tried changing the MTU size, without success.

I have noticed that all, or nearly all, of the reboots occur when I mv the file BACK INTO the NFS shared directory.

Any ideas why the host machine is rebooting, and how this could be fixed? Could changing the size of the ring buffer make a difference? I read about this on a couple of web pages.

update--possible fixes

On the Xen users forum, a user explained that he had had problems with several non-Intel NICs under large sustained network loads. Apparently other NICs interrupt the CPU often, or otherwise disrupt packet transfer, when packet numbers exceed 60,000-70,000.

Apparently, Intel NICs offer better throttle control, via RxIntDelay and other parameters available on the e1000e module and not available for other NICs. I have tried Realtek and Broadcom NICs without success; these do not offer tuning parameters of the kind Intel makes available.
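Module parameters like these are normally set with an "options" line in a file under /etc/modprobe.d/. The sketch below just builds such a line. The helper name is mine, the values are placeholders rather than recommendations, and InterruptThrottleRate is a second e1000e parameter that I am adding for illustration; it was not part of the advice I was given.

```python
def modprobe_options(module, params):
    """Build a modprobe.d 'options' line from a dict of parameter values."""
    parts = [f"{name}={value}" for name, value in params.items()]
    return "options " + module + " " + " ".join(parts)


# Placeholder values only; suitable settings depend on the NIC and the load.
line = modprobe_options("e1000e", {"RxIntDelay": 0, "InterruptThrottleRate": 3000})
print(line)
```

The resulting line would go into a file such as /etc/modprobe.d/e1000e.conf (the file name is arbitrary) and takes effect the next time the module is loaded.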

Therefore, the solution may consist of:

1. Installing Intel NICs. Ideally Pro/1000s such as the ones mounted on the functioning machine, but possibly other Intels that allow for tuning.

2. If necessary, replacing not only the NICs but also the server that interacts with them, in favour of an Intel model.

3. Investigating possible workarounds to hardware replacement, including tuning Linux buffer parameters in /etc/sysctl.conf.
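For the third option, this is roughly how I would inspect the current buffer settings and draft lines for /etc/sysctl.conf. It is a sketch only: the helper names are mine, and which keys and values actually help here is exactly what I do not know yet. The keys used in the usage note (net.core.rmem_max, net.core.wmem_max, net.core.netdev_max_backlog) are standard kernel settings, and any numbers are placeholders.

```python
from pathlib import Path


def sysctl_path(key, proc_root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys.

    A plain split on '.' is enough for the net.core.* keys used here.
    """
    return Path(proc_root).joinpath(*key.split("."))


def read_sysctl(key, proc_root="/proc/sys"):
    """Return the current value of a sysctl key as a string."""
    return sysctl_path(key, proc_root).read_text().strip()


def render_sysctl_conf(settings):
    """Render a dict of key/value pairs as 'key = value' lines for sysctl.conf."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())
```

For example, read_sysctl("net.core.rmem_max") shows the current receive buffer ceiling, and render_sysctl_conf({"net.core.rmem_max": 16777216, "net.core.wmem_max": 16777216}) produces lines that can be appended to /etc/sysctl.conf and loaded with sysctl -p.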

Any perspectives you all might have on these are welcome!