TCP synchronization: some packets lost
Hello Again:
I have a network with 2 Linux servers and 2 Windows servers. One Linux server (named quadcore) handles the telnet, ftp, http and mysql services exposed to the internet through the router, and it provides the same services within the LAN. The second Linux machine (named inflinux) is a backup server, ready to replace the main Linux server if needed (it runs the same telnet, ftp and http services, but only for the LAN).

Inflinux makes backup copies of the mysql databases: every 4 hours it runs a mysqldump against quadcore. I noticed that some of these mysqldump runs were failing because they could not connect.

I'm located outside the router and access that network via telnet, ftp, etc. So I logged in to quadcore (main server) and from there telnetted to inflinux (backup server). I then pinged from inflinux (192.168.76.15) to quadcore (192.168.76.6), exited inflinux, and pinged from quadcore to inflinux. Notice that the packet loss is the same in both directions (60% in this case; I've got 52% at other times).

HTTP transfers are OK, and when mysqldump succeeds it gets the data without missing bits. But there are clearly communication problems that cause delays. Do you think it's some synchronization problem? One txqueuelen is 100 (quadcore) while the other (inflinux) is 1000. I also occasionally get delays when trying to telnet to the main server quadcore, but this is rare.

I ran tcpdump and I don't see references to anything but local machines. The firewall on quadcore is off. The firewall on inflinux has all services allowed (http, ftp, mysql, samba).

Thanks for the help!

FIRST PING FROM INFLINUX TO QUADCORE
javier@inflinux:~> ping 192.168.76.6
PING 192.168.76.6 (192.168.76.6) 56(84) bytes of data.
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 192.168.76.6: icmp_seq=7 ttl=64 time=0.371 ms
64 bytes from 192.168.76.6: icmp_seq=8 ttl=64 time=0.366 ms
64 bytes from 192.168.76.6: icmp_seq=9 ttl=64 time=0.376 ms
64 bytes from 192.168.76.6: icmp_seq=10 ttl=64 time=0.344 ms
64 bytes from 192.168.76.6: icmp_seq=11 ttl=64 time=0.345 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 192.168.76.6: icmp_seq=17 ttl=64 time=0.349 ms
64 bytes from 192.168.76.6: icmp_seq=18 ttl=64 time=0.370 ms
64 bytes from 192.168.76.6: icmp_seq=19 ttl=64 time=0.365 ms
64 bytes from 192.168.76.6: icmp_seq=20 ttl=64 time=0.373 ms
64 bytes from 192.168.76.6: icmp_seq=21 ttl=64 time=0.340 ms
64 bytes from 192.168.76.6: icmp_seq=22 ttl=64 time=0.344 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 192.168.76.6: icmp_seq=31 ttl=64 time=0.382 ms
64 bytes from 192.168.76.6: icmp_seq=32 ttl=64 time=0.343 ms
64 bytes from 192.168.76.6: icmp_seq=33 ttl=64 time=0.356 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 192.168.76.6: icmp_seq=39 ttl=64 time=0.440 ms
64 bytes from 192.168.76.6: icmp_seq=40 ttl=64 time=0.436 ms
64 bytes from 192.168.76.6: icmp_seq=41 ttl=64 time=0.353 ms
64 bytes from 192.168.76.6: icmp_seq=42 ttl=64 time=0.426 ms
64 bytes from 192.168.76.6: icmp_seq=43 ttl=64 time=0.344 ms
64 bytes from 192.168.76.6: icmp_seq=44 ttl=64 time=0.342 ms
64 bytes from 192.168.76.6: icmp_seq=45 ttl=64 time=0.350 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 192.168.76.6: icmp_seq=53 ttl=64 time=0.373 ms
64 bytes from 192.168.76.6: icmp_seq=54 ttl=64 time=0.383 ms
64 bytes from 192.168.76.6: icmp_seq=55 ttl=64 time=0.342 ms
64 bytes from 192.168.76.6: icmp_seq=56 ttl=64 time=0.346 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 192.168.76.6: icmp_seq=65 ttl=64 time=0.378 ms
64 bytes from 192.168.76.6: icmp_seq=66 ttl=64 time=0.342 ms
64 bytes from 192.168.76.6: icmp_seq=67 ttl=64 time=0.349 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 192.168.76.6: icmp_seq=76 ttl=64 time=0.379 ms
64 bytes from 192.168.76.6: icmp_seq=77 ttl=64 time=0.346 ms
64 bytes from 192.168.76.6: icmp_seq=78 ttl=64 time=0.137 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 192.168.76.6: icmp_seq=86 ttl=64 time=0.361 ms
64 bytes from 192.168.76.6: icmp_seq=87 ttl=64 time=0.372 ms
64 bytes from 192.168.76.6: icmp_seq=88 ttl=64 time=0.344 ms
64 bytes from 192.168.76.6: icmp_seq=89 ttl=64 time=0.345 ms
64 bytes from 192.168.76.6: icmp_seq=90 ttl=64 time=0.348 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
--- 192.168.76.6 ping statistics ---
92 packets transmitted, 36 received, 60% packet loss, time 91012ms
rtt min/avg/max/mdev = 0.137/0.357/0.440/0.046 ms

SECOND PING FROM QUADCORE TO INFLINUX (NOTICE THE SKIPPED SEQ NUMBERS)
javier@quadcore:~> ping inflinux
PING inflinux (192.168.76.15) 56(84) bytes of data.
64 bytes from inflinux (192.168.76.15): icmp_seq=1 ttl=64 time=3.32 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=6 ttl=64 time=0.251 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=10 ttl=64 time=0.249 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=11 ttl=64 time=0.252 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=12 ttl=64 time=0.248 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=13 ttl=64 time=0.253 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=21 ttl=64 time=0.249 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=22 ttl=64 time=0.249 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=23 ttl=64 time=0.249 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=24 ttl=64 time=0.248 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=33 ttl=64 time=0.252 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=34 ttl=64 time=0.249 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=35 ttl=64 time=0.252 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=44 ttl=64 time=0.251 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=45 ttl=64 time=0.252 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=46 ttl=64 time=0.250 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=49 ttl=64 time=0.230 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=51 ttl=64 time=0.258 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=52 ttl=64 time=0.256 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=53 ttl=64 time=0.253 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=54 ttl=64 time=0.250 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=55 ttl=64 time=0.253 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=56 ttl=64 time=0.250 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=57 ttl=64 time=0.253 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=66 ttl=64 time=0.252 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=67 ttl=64 time=0.249 ms
64 bytes from inflinux (192.168.76.15): icmp_seq=68 ttl=64 time=0.251 ms
--- inflinux ping statistics ---
69 packets transmitted, 27 received, 60% packet loss, time 68007ms
rtt min/avg/max/mdev = 0.230/0.364/3.326/0.581 ms
|
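The loss figures in the two ping summaries follow directly from the transmitted and received counts. A minimal sketch of that arithmetic in shell, assuming integer division (which is what the whole-number percentages suggest); the function name `loss_pct` is made up for illustration:

```sh
# loss_pct TRANSMITTED RECEIVED
# Prints packet loss as a whole-number percentage (integer division).
loss_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

loss_pct 92 36   # inflinux -> quadcore run
loss_pct 69 27   # quadcore -> inflinux run
```

Both calls print 60, which matches the reported summaries.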
It's more likely there is a duplex mismatch between one of the network cards and the switch port it's plugged into.
|
This is the ifconfig of quadcore for its eth0:
eth0      Link encap:Ethernet  HWaddr 00:1C:C0:35:54:F0
          inet addr:192.168.76.6  Bcast:192.168.76.255  Mask:255.255.255.0
          inet6 addr: fe80::21c:c0ff:fe35:54f0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:264174 errors:0 dropped:0 overruns:0 frame:0
          TX packets:278374 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:27311771 (26.0 Mb)  TX bytes:396425389 (378.0 Mb)
and for the other server inflinux:
eth0      Link encap:Ethernet  HWaddr 00:15:58:23:ED:17
          inet addr:192.168.76.15  Bcast:192.168.76.255  Mask:255.255.255.0
          inet6 addr: fe80::215:58ff:fe23:ed17/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3201030 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3237135 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:229245144 (218.6 Mb)  TX bytes:477458716 (455.3 Mb)
          Interrupt:217 Base address:0xe200
This is a brief tcpdump to show the delays. Notice that at 22:26:40 I issue a "telnet" to quadcore from inflinux, while I keep tcpdump running in the background on inflinux. As you can see, there are messages from .local.domain machines, possibly DNS servers (Windows machines I don't control).
22:26:40.068263 IP 192.168.75.102.netbios-dgm > 192.168.76.255.netbios-dgm: NBT UDP PACKET(138)
22:26:40.068596 IP inflinux.site.solid-mux > servigi.pdm.local.domain: 16255+ PTR? 255.76.168.192.in-addr.arpa. (45)
22:26:40.068838 IP servigi.pdm.local.domain > inflinux.site.solid-mux: 16255 NXDomain* 0/1/0 (132)
22:26:40.069027 IP inflinux.site.solid-mux > servigi.pdm.local.domain: 48599+ PTR? 102.75.168.192.in-addr.arpa. (45)
22:26:40.157613 IP servigi.pdm.local.domain > inflinux.site.solid-mux: 48599 NXDomain* 0/0/0 (45)
22:26:40.284380 IP 192.168.76.6.37569 > inflinux.site.telnet: S 30159679:30159679(0) win 5840 <mss 1460,sackOK,timestamp 31704488 0,nop,wscale 7>
22:26:41.411141 IP servpdm.pdm.local.55871 > 255.255.255.255.11050: UDP, length: 664
22:26:44.511539 IP 192.168.75.6.netbios-dgm > 192.168.76.255.netbios-dgm: NBT UDP PACKET(138)
22:26:44.512164 IP inflinux.site.solid-mux > servigi.pdm.local.domain: 64527+ PTR? 6.75.168.192.in-addr.arpa. (43)
22:26:44.605830 IP servigi.pdm.local.domain > inflinux.site.solid-mux: 64527 NXDomain* 0/0/0 (43)
22:26:46.285356 IP 192.168.76.6.37569 > inflinux.site.telnet: S 30159679:30159679(0) win 5840 <mss 1460,sackOK,timestamp 31705988 0,nop,wscale 7>
22:26:58.287303 IP 192.168.76.6.37569 > inflinux.site.telnet: S 30159679:30159679(0) win 5840 <mss 1460,sackOK,timestamp 31708988 0,nop,wscale 7>
|
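The repeated SYN packets in the capture can be picked out mechanically. The following is a rough sketch, assuming the older tcpdump text format shown in this thread (a bare `S` flag after the destination port); `syn_gaps` is a hypothetical helper name, and newer tcpdump versions print `Flags [S]` instead, so the pattern would need adjusting there. It also assumes the capture does not cross midnight.

```sh
# syn_gaps: read tcpdump text on stdin and print the seconds elapsed
# between consecutive SYN lines (lines containing ": S ").
syn_gaps() {
    awk '/: S / {
        split($1, t, ":")                      # timestamp is HH:MM:SS.micro
        now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
        if (seen) printf "%.3f\n", now - prev
        prev = now
        seen = 1
    }'
}

# The three SYN lines from the capture in this thread:
syn_gaps <<'EOF'
22:26:40.284380 IP 192.168.76.6.37569 > inflinux.site.telnet: S 30159679:30159679(0) win 5840 <mss 1460,sackOK,timestamp 31704488 0,nop,wscale 7>
22:26:46.285356 IP 192.168.76.6.37569 > inflinux.site.telnet: S 30159679:30159679(0) win 5840 <mss 1460,sackOK,timestamp 31705988 0,nop,wscale 7>
22:26:58.287303 IP 192.168.76.6.37569 > inflinux.site.telnet: S 30159679:30159679(0) win 5840 <mss 1460,sackOK,timestamp 31708988 0,nop,wscale 7>
EOF
```

On these lines it prints gaps of about 6 and 12 seconds: the same SYN (sequence number 30159679) is being resent, with the interval doubling each time.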
The delay in telnet is because 192.168.76.6 is not seeing the SYN-ACK replies from inflinux. Either inflinux isn't receiving the SYN, or the reply is being dropped on the way back. It's pretty clear you have a lot of packet loss on your network; the question is why. Try this on both of the machines involved:
Code:
$ sudo ethtool eth0
Code:
$ netstat -s
|
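Since the suggestion in this reply is to look for a duplex mismatch, the lines of interest in the `ethtool` output are speed, duplex and auto-negotiation. Here is a small sketch for pulling out just those fields; `link_summary` is a made-up helper, and the sample input is illustrative only, not output from the machines in this thread.

```sh
# link_summary: read `ethtool <iface>` output on stdin and print only
# the Speed, Duplex and Auto-negotiation lines as key=value pairs.
link_summary() {
    awk -F': ' '/^[ \t]*(Speed|Duplex|Auto-negotiation):/ {
        sub(/^[ \t]+/, "", $1)   # strip leading indentation from the key
        print $1 "=" $2
    }'
}

# Illustrative sample only; on a real host: sudo ethtool eth0 | link_summary
printf '\tSpeed: 100Mb/s\n\tDuplex: Half\n\tAuto-negotiation: off\n' | link_summary
```

If one end reports half duplex while the switch port is forced to full (or the reverse), that is the kind of mismatch being described.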
Thanks a lot:
For the main machine, quadcore: By the way, I already tried turning off the firewall on the backup server inflinux, but the delays continue.
Quote:
How can I fix this? What am I missing? Thanks
Quote:
|
chort, I forgot to finish the thought when I mentioned 22:26:40: there is a response, but only after a long wait of 6 seconds. It's usually instantaneous.
Quote:
Hey!!! I just remembered: the last time this happened seriously (worse than this, with very long delays on ALL telnet logins, but no problem once in a session), the ISP had changed its DNS server addresses. Once I put the new ones in the Linux boxes, everything worked OK. Please explain how to make sudo ethtool report "something" where it now shows null. Greetings |
Your first box is having to do a lot of retransmits, and your second box has received a phenomenal number of invalid SYN cookies. It looks like the second box is having its socket buffers overrun and is generating SYN cookies, which for some strange reason your first box is apparently not answering correctly (bizarre). It seems like the second machine is having a hard time reading packets fast enough.
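The counters being read here come from the `netstat -s` output requested earlier. As a sketch, assuming the counter wording used by common net-tools versions (it varies between releases, hence the loose, case-insensitive pattern), a filter like the hypothetical `tcp_trouble` below narrows the output to the retransmit and SYN-cookie lines. The sample numbers are invented.

```sh
# tcp_trouble: read `netstat -s` output on stdin, keep only lines about
# retransmissions and SYN cookies, with leading whitespace removed.
tcp_trouble() {
    grep -Ei 'retransmit|syn cookies' | sed 's/^[[:space:]]*//'
}

# Invented sample; on a real host: netstat -s | tcp_trouble
tcp_trouble <<'EOF'
    1200 segments send out
    310 segments retransmited
    45 SYN cookies sent
    9800 invalid SYN cookies received
EOF
```

Running it on both machines and comparing the numbers a few minutes apart shows whether the counters are still climbing.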
Is the second machine actually using eth0, or is the configured interface eth1? What happens if you try running ethtool against eth1? I'm guessing the second machine has a very cheap NIC. Unless there's a problem with the configuration, your best bet is to buy a new NIC (the one it's using is probably integrated in the motherboard, right?) and use that instead. Intel made some very good 100Mb ethernet cards based on their 82555/82557/82559 chipsets. You can find them on eBay for ridiculously cheap (although pretty much no online stores sell them any more). |
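To answer the eth0-versus-eth1 question, the interface names the kernel knows about can be listed from `/proc/net/dev`. A minimal sketch, assuming the usual Linux layout of that file (two header lines, then one `name: counters` line per interface); `iface_names` is a hypothetical helper.

```sh
# iface_names: read /proc/net/dev-style text on stdin, print interface names.
iface_names() {
    awk -F: 'NR > 2 { gsub(/[ \t]/, "", $1); print $1 }'
}

# On a Linux host this lists the real interfaces (for example lo and eth0).
if [ -r /proc/net/dev ]; then
    iface_names < /proc/net/dev
fi
```

Each name it prints can then be passed to `ethtool` in place of eth0.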
Yes, both cards are integrated in the motherboard.
No, there is no eth1 in the second machine. I will go to the site and do some tests there. Yes, a new card would be a good thing. What about the 10% of all packets counted as invalid SYN cookies in TCPext? Could these just be packets that pass by but don't belong to, or aren't bound for, this server? Thanks --- Instead of going to the site, I set up a cron job to ping the PC's card. Couldn't it be a DoS attack? Or is it more likely the damn cheap card?
Quote:
A ping to the main quadcore, from within this machine to its own card:
Quote:
|