TCP Retransmission & lost segments problem under Linux but not under XP
LinuxQuestions.org, Linux - Networking forum: any issue related to networks or networking (routing, network cards, OSI, etc.) is fair game.
I have a regular HTTP site I'm trying to connect to that does not work when:
1) I'm using Linux, AND
2) I'm not on the same LAN as the server.
I can access the site when I'm using Linux on the same LAN, and also with XP whether inside or outside the network.
The screenshot below (with the target IP address blotted out) shows that I'm connecting to the server fine, but when it comes to sending data I get lost segments and retransmissions. Ultimately, the remote server closes the connection after about 19 seconds. Something tells me the problem is probably a kernel compilation option that I don't have enabled. I'm running Etch with a manually compiled 2.6.23.1 kernel. The networking section of my kernel config is below the screenshot. Any ideas on what the problem could be? Thanks.
#
# Networking
#
CONFIG_NET=y
#
# Networking options
#
CONFIG_PACKET=m
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=m
CONFIG_XFRM=y
CONFIG_XFRM_USER=m
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IP_VS is not set
# CONFIG_IPV6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_NETWORK_SECMARK is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
Last edited by debuser123; 11-19-2007 at 03:16 PM.
Bump, still can't figure this out. It has to be a Linux TCP stack issue. Or maybe it's a routing bug or something that Windows just copes with?
I'm gonna try enabling all networking options at the expense of a larger kernel, but who's to say that having a certain option enabled is the problem? That could be pretty difficult to debug.
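Rather than enabling everything and guessing, a cheaper first step might be to diff the config of a kernel that works (the 2.4 one, later in this thread) against one that doesn't. The sketch below assumes only the standard kernel .config line format seen in the fragment above ("CONFIG_FOO=y/m/value" and "# CONFIG_FOO is not set"); the function names are hypothetical helpers, not part of any kernel tool. Expect plenty of noise between 2.4 and 2.6, since many option names changed.

```python
import re

# The two line shapes a kernel .config uses for an option.
SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text):
    """Map each CONFIG_ option to its value; explicitly unset options become 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = "n"
        # Section comments such as "# Networking options" are ignored.
    return opts


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs.

    Options missing from one side entirely are reported as 'absent'.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key, "absent"), new.get(key, "absent")
        if a != b:
            changes[key] = (a, b)
    return changes
```

Usage, with hypothetical file names: `diff_configs(open("config-2.4.27").read(), open("config-2.6.23.11").read())`, then look at the networking-related keys first.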
I'd start by running a stock kernel and tearing down any network "tweaks", be it through firewalling, sysctl or other configuration settings (or run a Live CD if that's easier), then compare the pcap of that run with the one from XP and the current one. To get a better picture, run this scenario both inside and outside the LAN. And if you can, try running diagnostic tools like LFT and tcptraceroute against the target (both inside and outside the LAN).
But as unSpawn said, for any problem, revert back to:
-> No firewall (even on routers, for the duration of the test)
-> A clean distro without all the tweaks.
This MTU stuff is just a guess based on the picture you sent.
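To check the MTU guess, it helps to know what MTU each local interface has and what TCP segment size that implies. One well-known way an MTU problem produces "handshake fine, data lost" is when a link somewhere on the path has a smaller MTU than the endpoints assume and the ICMP "fragmentation needed" messages are filtered: small packets get through, full-size data segments silently vanish. The sketch assumes IPv4 with no IP or TCP options (20 + 20 bytes of headers) and reads the Linux sysfs path /sys/class/net/<iface>/mtu; the function names are hypothetical.

```python
from pathlib import Path

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment that fits in one IPv4 packet of this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def interface_mtus(sys_net="/sys/class/net"):
    """Read the MTU of every network interface from sysfs (Linux only)."""
    mtus = {}
    base = Path(sys_net)
    if not base.is_dir():
        return mtus
    for iface in sorted(base.iterdir()):
        try:
            mtus[iface.name] = int((iface / "mtu").read_text())
        except (OSError, ValueError):
            continue  # entry without a readable mtu file
    return mtus
```

For example, an Ethernet MTU of 1500 gives an MSS of 1460, while a PPPoE link with MTU 1492 only carries 1452 bytes per segment; if the two ends negotiate 1460 across such a link and the ICMP errors never arrive, full-size segments are the ones that get retransmitted.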
I booted with kernel 2.4.27... and was able to connect to the site, even through my router with its firewall enabled. The only thing that has changed is the kernel (and modules). I'm recompiling my current kernel (2.6.23.9) with just the defaults now; will see what happens.
^ lol... I actually did reinstall Linux... on my laptop, which really had nothing important worth keeping (plus I wanted to get rid of Ubuntu and go back to Etch). After compiling 2.6.23.9 for my PC I kept getting a kernel panic on startup and didn't have time to debug it because I had to go to work, so I just grabbed my laptop and installed Etch. I think that kernel was 2.6.18.something, though. I won't get a chance to do more with this issue until tomorrow. Thanks for the help, it's really appreciated.
Nah, it was a calculated move... to run an experiment you need a control variable! Plus it (kind of) ensures that the problem isn't with my PC if it happens on my laptop as well (the two have different hardware setups [including chipsets], except that both run AMD Athlon 64s [K8]).
The stock install of Etch on my laptop with 2.6 didn't connect to the site either, but it did from within the LAN.
The funny thing is, connecting isn't the problem: the image in my first post shows I'm able to open a TCP connection to the remote server without any trouble. It just doesn't like it when I want to "GET" from it... stingy server.
I've got 2.6.23.11 running on my PC, using the 2.4 config as the basis for the compilation options, and that didn't work either. I'm confused about what the next step should be.
So to sum up the problem:
I'm trying to connect to a web site, which I have physical access to. I can connect and view the web page while on the same LAN as the server (under a 2.4 OR 2.6 kernel), but cannot when outside the network (over the internet, running a 2.6 kernel). The aforementioned problem does not occur on Windows XP at all. I do not have permission to change any options on the server or any particular LAN/routing/DNS configuration(s). A packet capture when there is no problem doesn't yield anything out of the ordinary (the response to the HTTP GET is an expected HTTP 200).
Dare I say it's a kernel bug... but what else could it be? This is the ONLY site I have had a problem with. The next thing I'm going to do is run a packet capture on 2.6 and 2.4... and compare, field by field, what is different.
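A field-by-field comparison can start with the SYN and SYN/ACK packets, because that is where each stack advertises its TCP options (MSS, window scaling, SACK, timestamps), and those can differ between kernel versions and XP. The sketch below is a minimal reader under stated assumptions: classic libpcap files only (not pcapng), Ethernet link type, IPv4 without VLAN tags. The function names are hypothetical; Wireshark's own dissector is the authoritative view.

```python
import struct

# TCP option kinds decoded below (standard IANA numbering).
MSS, WSCALE, SACK_PERM, TIMESTAMPS = 2, 3, 4, 8


def parse_tcp_options(opts: bytes) -> dict:
    """Decode the options area of a TCP header into a small dict."""
    out = {}
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:            # End of option list
            break
        if kind == 1:            # NOP padding
            i += 1
            continue
        if i + 1 >= len(opts):
            break                # truncated option
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            break                # malformed length, stop parsing
        data = opts[i + 2:i + length]
        if kind == MSS and len(data) == 2:
            out["mss"] = int.from_bytes(data, "big")
        elif kind == WSCALE and len(data) == 1:
            out["wscale"] = data[0]
        elif kind == SACK_PERM:
            out["sack_perm"] = True
        elif kind == TIMESTAMPS:
            out["timestamps"] = True
        else:
            out[f"kind{kind}"] = data.hex()
        i += length
    return out


def syn_options_from_pcap_bytes(data: bytes):
    """Yield (src_port, dst_port, is_syn_ack, options) for every IPv4 TCP SYN."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"             # little-endian file (us or ns timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"             # big-endian file
    else:
        raise ValueError("not a classic libpcap file (pcapng is not handled)")
    if struct.unpack(endian + "I", data[20:24])[0] != 1:
        raise ValueError("only Ethernet captures (linktype 1) are handled")
    off = 24                     # skip the global header
    while off + 16 <= len(data):
        # Record header: ts_sec, ts_usec, incl_len, orig_len.
        incl_len = struct.unpack(endian + "I", data[off + 8:off + 12])[0]
        pkt = data[off + 16:off + 16 + incl_len]
        off += 16 + incl_len
        if len(pkt) < 34 or pkt[12:14] != b"\x08\x00":
            continue             # not IPv4 over plain Ethernet
        ip = pkt[14:]
        if ip[0] >> 4 != 4 or ip[9] != 6:
            continue             # not IPv4 carrying TCP
        tcp = ip[(ip[0] & 0x0F) * 4:]
        if len(tcp) < 20 or not tcp[13] & 0x02:
            continue             # too short, or SYN flag not set
        header_len = (tcp[12] >> 4) * 4
        sport, dport = struct.unpack("!HH", tcp[:4])
        yield sport, dport, bool(tcp[13] & 0x10), parse_tcp_options(tcp[20:header_len])


def syn_options_from_pcap(path):
    """File-based wrapper around syn_options_from_pcap_bytes."""
    with open(path, "rb") as f:
        yield from syn_options_from_pcap_bytes(f.read())
```

Running it over the 2.4, 2.6 and XP captures and printing the tuples side by side shows at a glance whether, say, the window scale or SACK settings differ between the working and failing cases.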
XP is in all aspects way more "easy", lenient and "forgiving" (and I don't necessarily mean that in a disrespectful way), whereas GNU/Linux tries hard to (and does succeed in) getting the maximum out of both HW and SW, and is strict in adhering to standards while still allowing you to change things (for instance sysctl network tweaks for coping with broken remote network stacks, ECN and such). So if I had to put my money on something, I'd say it's not a kernel bug but a networking problem. But that's just a hunch.
Before testing I'd start by making sure the box runs two vanilla kernel.org kernels (no Debian kernel) with a reasonable distance between version numbers, and is "clean", meaning default configs for everything: no firewalling or packet shaping/scrubbing, no network tweaks, no proxying, no nothing, just the bare necessary changes to get the box networked. (It would be good to make copies of the "clean" configs so you can track back any changes made and how they affect networking.)
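One way to keep those "clean" copies for the sysctl side is to snapshot the relevant /proc/sys values under each kernel and diff the snapshots. The knob list below is an assumption, a hand-picked set of IPv4/TCP settings that exist in both 2.4 and 2.6 and that are often tweaked (ECN, window scaling, SACK, timestamps, path MTU discovery, buffer sizes); extend it as needed. The helper names are hypothetical.

```python
import json
from pathlib import Path

# Assumed set of knobs worth tracking; paths are relative to /proc/sys.
KNOBS = [
    "net/ipv4/tcp_ecn",
    "net/ipv4/tcp_window_scaling",
    "net/ipv4/tcp_sack",
    "net/ipv4/tcp_timestamps",
    "net/ipv4/ip_no_pmtu_disc",
    "net/ipv4/tcp_rmem",
    "net/ipv4/tcp_wmem",
]


def snapshot(root="/proc/sys"):
    """Read each knob; whitespace is normalised, missing knobs become None."""
    values = {}
    for knob in KNOBS:
        name = knob.replace("/", ".")  # sysctl-style name, e.g. net.ipv4.tcp_ecn
        try:
            values[name] = " ".join((Path(root) / knob).read_text().split())
        except OSError:
            values[name] = None        # knob absent on this kernel
    return values


def save(path, values):
    Path(path).write_text(json.dumps(values, indent=2, sort_keys=True))


def load(path):
    return json.loads(Path(path).read_text())


def diff_snapshots(old, new):
    """Return {knob: (old, new)} for every knob whose value differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

For example, run `save("sysctl-2.4.json", snapshot())` under one kernel and the equivalent under the other, then compare with `diff_snapshots(load(...), load(...))`; the file names are placeholders.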
For protocols I'd like to see TCP, ICMP and UDP checked, so you should have tcptraceroute, ping and traceroute, or hping or any other multi-protocol tool. And for testing I'd use not only a browser but also another client, say telnet (just in case).
For testing, if there's another HTTP server in the same "problematic" network that you have access to, run tests on that one too; otherwise take some public target like www.google.com alongside your "problematic" target for baselining.
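The telnet test amounts to opening a TCP connection, typing a GET by hand and waiting for the status line. The sketch below does the same thing programmatically so it can be repeated against the problem site and the baseline targets under each kernel. It records separately whether the handshake succeeded and whether any response arrived, which is exactly the distinction in this thread (connect works, GET stalls). It uses only the standard socket module; `probe_http` is a hypothetical helper name, and the 10-second default timeout is an arbitrary choice.

```python
import socket
import time


def probe_http(host, port=80, path="/", timeout=10.0):
    """Connect, send a minimal HTTP/1.0 GET, and report how far the exchange got."""
    result = {"host": host, "connected": False, "connect_s": None,
              "status_line": None, "error": None}
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            result["connected"] = True
            result["connect_s"] = round(time.monotonic() - start, 3)
            request = (f"GET {path} HTTP/1.0\r\n"
                       f"Host: {host}\r\n"
                       "Connection: close\r\n\r\n")
            sock.sendall(request.encode("ascii"))
            # Read until the first line (the HTTP status line) is complete.
            buf = b""
            while b"\r\n" not in buf:
                chunk = sock.recv(4096)
                if not chunk:
                    break          # server closed the connection
                buf += chunk
            line = buf.split(b"\r\n", 1)[0]
            if line:
                result["status_line"] = line.decode("latin-1")
            else:
                result["error"] = "connection closed without a response"
    except OSError as exc:         # refused, unreachable, or timed out
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result
```

Usage: `for host in ("www.google.com", "problem-site.example"): print(probe_http(host))`, where the second name is a placeholder for the real target. A result with `connected` true but a timeout error matches the failing case described above.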
So what you should end up with is three traces plus a pcap, times two or three sites, times two kernels. If that's too much data to post here, tarball it, upload it to some free hosting provider and post the URI here.
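The test matrix multiplies out quickly: with three sites and two kernels that is 4 × 3 × 2 = 24 files. A small sketch can list the expected artifacts, report which are still missing, and pack the results directory into one tarball for upload. The directory layout, file names and default trace tools (tcptraceroute, traceroute, ping, as suggested above) are assumptions for illustration only.

```python
import itertools
import tarfile
from pathlib import Path


def artifact_names(kernels, sites, traces=("tcptraceroute", "traceroute", "ping")):
    """List the expected files: one per trace tool plus one pcap, per kernel and site."""
    names = []
    for kernel, site in itertools.product(kernels, sites):
        names += [f"{kernel}/{site}/{trace}.txt" for trace in traces]
        names.append(f"{kernel}/{site}/capture.pcap")
    return names


def missing_artifacts(root, names):
    """Return the expected files not yet present under root."""
    return [n for n in names if not (Path(root) / n).exists()]


def bundle(results_dir, out="results.tar.gz"):
    """Pack the whole results directory into one gzip-compressed tarball."""
    with tarfile.open(out, "w:gz") as tar:
        tar.add(results_dir, arcname=Path(results_dir).name)
    return out
```

Example with placeholder site names: `artifact_names(["2.4.27", "2.6.23.11"], ["problem-site", "same-net-site", "www.google.com"])` yields 24 paths.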