I'm implementing a nonblocking TCP-socket-based client and server program on Ubuntu Linux. The server is somewhere on the internet, and the client is on a local network connecting through NAT. There is a device on the path between client and server that has its MTU set to 1492 (I don't have access to that device).
Although most of the data written to the socket each time is much smaller than 1000 bytes, I forced the write system call to write no more than 1430 bytes per call to make my troubleshooting simpler. Wireshark shows that whenever the data is small and the IP-layer length stays below 1492, everything works fine. But as soon as one packet exceeds 1492, the device cannot fragment it (Wireshark shows the DF flag is set), and that device on the path sends back an ICMP "fragmentation needed" error asking the client's Linux to reduce the MSS in its TCP stack, which is pretty normal. The problem is that Linux does not give a crap about those PMTU ICMP errors: it keeps retransmitting the same amount of data, and the device keeps asking it to reduce the packet size. The packet never reaches the server, and it ruins the whole connection.
Then I decided to try this:
Code:
if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
    perror("socket");
    exit(1);
}

int optval = IP_PMTUDISC_DO;
if (setsockopt(sockfd, IPPROTO_IP, IP_MTU_DISCOVER, &optval, sizeof(optval)) == -1)
    perror("setsockopt()");
based on a man page that said IP_PMTUDISC_DO forces a stream socket to do path MTU discovery. But after setting it, nothing changed: TCP still does not react to those ICMP errors.
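To check whether the kernel has actually recorded the smaller path MTU for the connection, I'm thinking of reading it back with the Linux-specific IP_MTU socket option (a sketch, untested; it only works on a connected socket):

```c
#include <netinet/in.h>   /* IPPROTO_IP, IP_MTU (Linux-specific) */
#include <stdio.h>
#include <sys/socket.h>

/* Query the path MTU the kernel currently holds for a *connected*
 * socket. Returns the MTU in bytes, or -1 on error. */
int get_path_mtu(int sockfd)
{
    int mtu = 0;
    socklen_t len = sizeof(mtu);
    if (getsockopt(sockfd, IPPROTO_IP, IP_MTU, &mtu, &len) == -1) {
        perror("getsockopt(IP_MTU)");
        return -1;
    }
    return mtu;
}
```

If the kernel had accepted the ICMP error, I'd expect this to drop to 1492 after the first "fragmentation needed" message; if it stays at 1500, the ICMP is presumably being dropped (e.g. by the NAT box) before it ever reaches my TCP stack.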
then I changed it to:
Code:
int optval = IP_PMTUDISC_DONT;
to let that device fragment the packets.
This time Wireshark showed the DF flag was no longer set in the packets, and the ICMP errors from that device were gone. But again, packets larger than 1492 never reached the server, and Linux kept retransmitting them with no success.
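Another workaround I'm considering is clamping the MSS myself with TCP_MAXSEG before connect(), so TCP never builds a segment the path can't carry. A sketch (the value 1452 is my own guess: 1492 minus 40 bytes of IP + TCP headers):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>  /* TCP_MAXSEG */
#include <stdio.h>
#include <sys/socket.h>

/* Clamp the MSS before connect(). The kernel may still use a lower
 * value than requested, but should not exceed it. */
int clamp_mss(int sockfd, int mss)
{
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) == -1) {
        perror("setsockopt(TCP_MAXSEG)");
        return -1;
    }
    return 0;
}
```

This would sidestep the ICMP problem entirely, at the cost of hard-coding knowledge of that 1492-MTU device into the client.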
Because I knew I never write more than 1430 bytes to the socket per call, by looking at the sent data in Wireshark I discovered the large packets come from Nagle's algorithm coalescing my writes, so this time I decided to add TCP_NODELAY.
Code:
// (optval is declared as int above)
optval = 1;
if (setsockopt(sockfd, SOL_TCP, TCP_NODELAY, &optval, sizeof(optval)) == -1) {
    perror("setsockopt()");
    exit(1);
}
But the large packets were still there!!!! It seemed TCP_NODELAY had no effect on Nagle. I even tried
Code:
optval = 1;
if (setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval)) == -1) {
    perror("setsockopt()");
    exit(1);
}
and the result was the same: I could not disable Nagle.
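To rule out setsockopt() failing silently, I could read TCP_NODELAY straight back after setting it (a sketch, untested):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <stdio.h>
#include <sys/socket.h>

/* Set TCP_NODELAY, then read it back to confirm the kernel accepted it.
 * Returns the read-back value (nonzero = Nagle disabled), or -1 on error. */
int set_and_check_nodelay(int sockfd)
{
    int on = 1;
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1) {
        perror("getsockopt(TCP_NODELAY)");
        return -1;
    }
    return val;
}
```

If this reads back 1, then Nagle really is off and the large frames must have some other cause; as far as I understand, TCP_NODELAY only stops TCP from delaying small segments while waiting for ACKs, it never caps the segment size, so back-to-back writes can still be coalesced up to the MSS.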
Now here are my two questions:
1) Why does TCP not give a damn about PMTU and not reduce the MSS when retransmitting those oversized packets?
2) Why does setting TCP_NODELAY not disable Nagle? Am I setting it the wrong way? Do I need to set TCP_NODELAY before (or after) each write, so that setting it once is not sufficient? Or is something totally wrong with setsockopt() in my Linux socket API?!
Please help me figure this out.
P.S. When I put the server program on a machine in the LAN, everything works great, which indicates the server is not slow in handling connections and the sockets are not implemented incorrectly; it even works for much larger packets. The evidence pretty strongly suggests the problem comes down to MTU and packet sizes.
P.P.S. The outgoing interface on both client and server is set to an MTU of 1500.