Fedora 12 Socket hang
Hi all,
I am running two apps, a client and a server, that talk over localhost port 15001. Every now and then (maybe 1 run in 15) both apps hang, the client in send() and the server in recv(), and the data appears to be stuck in the Send-Q:

    # netstat -tnp
    Proto Recv-Q Send-Q Local Address      Foreign Address    State
    tcp        0 201216 127.0.0.1:37227    127.0.0.1:15001    ESTABLISHED
    tcp        0      0 127.0.0.1:15001    127.0.0.1:37227    ESTABLISHED

The application does the same thing every time: it sends 700544 bytes 515 times and then exits. Then it is started again, until it eventually hangs. I increased wmem_max and rmem_max to 16 MB from the default of 131 KB, but that did not help. When I break in with gdb, the client is sitting in the send() call (no flags set) and the server is sitting in the recv() call (MSG_WAITALL is set).

Any ideas how to debug this and find out why the data is stuck in the Send-Q?

Thanks much,
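The post says the client calls send() with no flags and the server calls recv() with MSG_WAITALL. As a first check, it may be worth confirming that both sides handle partial transfers. The following is a minimal sketch, assuming blocking stream sockets on Linux; send_all, recv_all and transfer_selftest are hypothetical helpers, not code from the application being discussed. It will not by itself explain a hang, but it rules out one common class of short-read and short-write bugs.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send exactly len bytes. A blocking send() normally queues everything,
   but it can return early (e.g. after a signal), so loop until done. */
int send_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted before sending anything */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes. MSG_WAITALL asks the kernel to wait for the
   full amount, but recv() can still return less after a signal, an error,
   or an orderly shutdown by the peer, so the return value is still checked. */
int recv_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, MSG_WAITALL);
        if (n == 0)
            return -1;               /* peer closed before len bytes arrived */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical self-check: a forked child sends one nbytes message over a
   Unix-domain socketpair, the parent receives it and compares the bytes.
   Returns 0 on success, -1 on any failure. */
int transfer_selftest(size_t nbytes)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    unsigned char *out = malloc(nbytes);
    unsigned char *in = malloc(nbytes);
    int rc = -1;

    if (out != NULL && in != NULL) {
        for (size_t i = 0; i < nbytes; i++)
            out[i] = (unsigned char)(i * 31u);   /* recognisable test pattern */

        pid_t pid = fork();
        if (pid == 0) {                          /* child: plays the client */
            close(sv[1]);
            _exit(send_all(sv[0], out, nbytes) == 0 ? 0 : 1);
        }
        if (pid > 0) {                           /* parent: plays the server */
            int status = 0;
            close(sv[0]);
            sv[0] = -1;
            rc = recv_all(sv[1], in, nbytes);
            if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
                WEXITSTATUS(status) != 0 || memcmp(out, in, nbytes) != 0)
                rc = -1;
        }
    }
    if (sv[0] >= 0)
        close(sv[0]);
    close(sv[1]);
    free(out);
    free(in);
    return rc;
}
```

transfer_selftest(700544) pushes one message of the size mentioned in the post from a forked child to its parent. The socketpair is used only to keep the sketch self-contained; the real applications use a TCP connection on localhost:15001.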
If the (unnamed?) application supports debugging, enable it. If it doesn't, then strace it.
The application is one I wrote; it sends 700544 bytes 300-500 times to a server that I also wrote. When I run it under strace, it seems to run fine, due to the logging overhead. Normally it runs over InfiniBand, and over InfiniBand it runs fine, but for testing I am running both ends over localhost on one box. Over localhost, the send() and the recv() both block, and the Send-Q shows 201616 bytes stuck for some reason.
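Because the hang disappears under strace, a tracer that costs almost nothing per call may be more useful here. A minimal sketch, assuming a single-threaded process doing the socket I/O: each send() or recv() result is stored in a fixed in-memory ring buffer, and nothing is printed until trace_dump() is called. trace_record, trace_dump, trace_ring and TRACE_SLOTS are hypothetical names chosen for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical low-overhead trace: a fixed ring of the most recent
   send()/recv() results, kept in memory and printed only on demand.
   Not thread-safe; assumes one thread does the socket I/O. */
#define TRACE_SLOTS 1024

struct trace_entry {
    char op;            /* 'S' = send(), 'R' = recv() */
    size_t requested;   /* bytes passed to the call */
    ssize_t result;     /* what the call returned */
    int err;            /* errno when result < 0, otherwise 0 */
};

struct trace_entry trace_ring[TRACE_SLOTS];
unsigned long trace_count;   /* total records ever written */

/* Call immediately after send()/recv() returns, before errno can change. */
void trace_record(char op, size_t requested, ssize_t result)
{
    int saved_errno = errno;
    struct trace_entry *e = &trace_ring[trace_count % TRACE_SLOTS];

    e->op = op;
    e->requested = requested;
    e->result = result;
    e->err = result < 0 ? saved_errno : 0;
    trace_count++;
}

/* Print the retained records, oldest first. Intended to be run once the
   process is already hung, e.g. with "call trace_dump()" from gdb. */
void trace_dump(void)
{
    unsigned long first = trace_count > TRACE_SLOTS ? trace_count - TRACE_SLOTS : 0;

    for (unsigned long i = first; i < trace_count; i++) {
        const struct trace_entry *e = &trace_ring[i % TRACE_SLOTS];
        fprintf(stderr, "#%lu %s requested=%zu returned=%zd errno=%d\n",
                i, e->op == 'S' ? "send" : "recv",
                e->requested, e->result, e->err);
    }
}
```

In the real send loop this would mean calling trace_record('S', len, n) right after each send() returns (and 'R' after each recv()). When a run hangs, attaching gdb to each process and running call trace_dump() should list the last calls that completed, provided the binary keeps the symbol (for example, built with -g).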
It only ever fails on the first send after startup, and only about 2 times out of 100. Sorry for the delay in replying. I would like to find out what's going on, but it has been put on the back burner for the moment. Any ideas how to find out why it stopped sending data?

Thanks much!
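To see why the data stopped moving, another option is to snapshot each socket's kernel state at the moment of the hang instead of tracing every call. A minimal sketch, assuming Linux and a connected TCP socket; get_sock_state, dump_sock_state and struct sock_state are hypothetical names. It reads the SIOCOUTQ and SIOCINQ ioctls, which report roughly the figures netstat shows as Send-Q and Recv-Q, plus struct tcp_info via getsockopt(TCP_INFO).

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>    /* struct tcp_info, TCP_INFO */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>  /* SIOCINQ, SIOCOUTQ (Linux-specific) */

/* Hypothetical snapshot of one TCP socket's kernel-side state. */
struct sock_state {
    int outq;              /* SIOCOUTQ: bytes written but not yet acked by the peer */
    int inq;               /* SIOCINQ: bytes received but not yet read by the app */
    struct tcp_info ti;    /* TCP_INFO: state, timers, window and probe counters */
};

/* Fill *s for socket fd; returns 0 on success, -1 with errno set on failure. */
int get_sock_state(int fd, struct sock_state *s)
{
    socklen_t len = sizeof s->ti;

    memset(s, 0, sizeof *s);
    if (ioctl(fd, SIOCOUTQ, &s->outq) < 0)
        return -1;
    if (ioctl(fd, SIOCINQ, &s->inq) < 0)
        return -1;
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &s->ti, &len) < 0)
        return -1;
    return 0;
}

/* Print a one-line summary to stderr, e.g. via "call dump_sock_state(fd)" in gdb. */
void dump_sock_state(int fd)
{
    struct sock_state s;

    if (get_sock_state(fd, &s) != 0) {
        perror("get_sock_state");
        return;
    }
    fprintf(stderr,
            "fd %d: state=%u sendq=%d recvq=%d unacked_segs=%u cwnd=%u "
            "probes=%u backoff=%u rto=%uus rcv_space=%u "
            "last_data_sent=%ums ago last_data_recv=%ums ago\n",
            fd, (unsigned)s.ti.tcpi_state, s.outq, s.inq,
            s.ti.tcpi_unacked, s.ti.tcpi_snd_cwnd,
            (unsigned)s.ti.tcpi_probes, (unsigned)s.ti.tcpi_backoff,
            s.ti.tcpi_rto, s.ti.tcpi_rcv_space,
            s.ti.tcpi_last_data_sent, s.ti.tcpi_last_data_recv);
}
```

With both processes stopped in gdb, running call dump_sock_state(fd) in each one (using the descriptor passed to send() or recv()) prints both ends' view of the connection. A probes or backoff count that keeps rising on the sender while the receiver's recvq stays at 0 would hint that the sender sees a closed receive window, though that reading would need confirming.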