LinuxQuestions.org
Old 07-11-2008, 01:09 AM   #16
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15

Something phenomenal happened. It went "out" but it came back "in". I don't know if it was caused by me or not though.

1. Three FF tabs that I had recently tried to open were still trying to load making me believe that it was happening.
2. Tried to open up some usual sites I go to in links...nope, no go, all stuck on "making connection".
...EXCEPT....
yahoo.com worked for some odd reason, and I mean quickly like there were no dropped packets at all. A wireshark cap confirmed that there were no errors in the connection process from start to finish.

This got me thinking, and so I then tried random sites out. Out of about 20 sites I tried, only about 3 worked, linuxquestions.org being one of those three.

Since ping/ICMP packets never had a problem, unlike TCP with its checksum errors, I tried pinging good & bad sites to see if maybe it was a TTL problem (for example, good sites had a high TTL [i.e. fewer hops] and bad sites had a low TTL), but then I realized that would be inconclusive as I don't know the starting TTL of a received packet. So getting TTLs of 56, 120, and 240 really could be good or bad depending on the starting TTL.
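
To make that ambiguity concrete, here is a rough sketch that guesses the hop count from a received TTL. It assumes the sender started at one of the common defaults (64, 128 or 255), which is exactly the unknown here, and the helper name guess_hops is made up for illustration.

```shell
# Guess the hop count from a received TTL, ASSUMING the sender started at
# one of the common initial values (64, 128 or 255). Illustrative only.
guess_hops() {
    ttl=$1
    for initial in 64 128 255; do
        if [ "$ttl" -le "$initial" ]; then
            echo "initial=$initial hops=$(( initial - ttl ))"
            return 0
        fi
    done
    echo "ttl $ttl is out of range" >&2
    return 1
}

# The three TTLs seen in the ping replies.
for ttl in 56 120 240; do
    guess_hops "$ttl"
done
```

Under that assumption, 56 and 120 both come out as 8 hops and 240 as 15, so the raw TTL alone says little about whether a site is "near" or "far".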

The default TTL (/proc/sys/net/ipv4/ip_default_ttl) was 64....I set it to 255 for kicks, but that didn't change anything. I then changed other TCP options (set tcp_sack, tcp_timestamps, tcp_window_scaling off), which didn't seem to make a difference, as new connection attempts to random hosts hung as well.

But somehow sites that I was testing started coming up, which seemed kind of odd 'cause my success rate was like 100%. So then I tried sites that didn't work 5 minutes before and what do you know, they now do.

Now I just have to wait for "it" to happen again so I can see whether a) repeatedly trying to load yahoo.com or b) changing those /proc values fixes the problem.
 
Old 07-11-2008, 09:30 AM   #17
ARC1450
Member
 
Registered: Jun 2005
Location: Odenton, MD
Distribution: Gentoo
Posts: 290

Rep: Reputation: 30
Quote:
Originally Posted by farslayer
could be a bug in the driver too and not bad hardware.. like I said it was easier for me to throw a NIC in the box than waste my time chasing something I couldn't control or identify.
Not that I'm doubting you, because this could well be the case, but I wouldn't think so. I'm using kernel 2.6 with via-rhine and it's just fine. I actually use that as a DNS/Squid/IRC/file server. No issues.

Not to mention, if it were a bug in a driver, I'm sure there would be more people screaming about it by now, given the circumstances (the computer just sits and has TCP checksum errors).

Quote:
The default TTL (/proc/sys/net/ipv4/ip_default_ttl) was 64....I set it to 255 for kicks, but that didn't change anything.
Watch arbitrarily setting parameters like this. TTL is used to keep packets from living forever if they get caught in a routing loop. Not to mention, since this can be used in a type of DoS attack, some ISPs may filter out packets with a TTL that is set too high (since it can needlessly consume processor cycles).

Quote:
DNS queries work no problem, but connecting is the part that seems to have a problem. Running wireshark, I noticed that the main contributor seems to be TCP checksum errors. Offloading is not the problem because the checksums are always off by 1 (for example, correct checksum could be 0x1234 when the segment might have a checksum of 0x1233). It seems some packets "work" while others don't...and for example just opening something as simple as http://google.com might take about 2-3 minutes for the page to fully load.
After reading through the thread, this caught my attention; reason being that DNS is UDP and you have no issues. If you could, next time you notice this problem, try connecting a game to a server (as most games run over UDP) and see what happens.
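
For anyone who wants to double-check an "off by 1" checksum by hand, here is a minimal sketch of the Internet checksum arithmetic (the 16-bit ones' complement sum described in RFC 1071). The function name inet_checksum is made up, and it only sums the 16-bit hex words you hand it; for a real TCP segment you would also have to feed in the pseudo-header words.

```shell
# Internet checksum over 16-bit words given as hex arguments.
# Hypothetical helper for illustration; it does not build the TCP
# pseudo-header for you.
inet_checksum() {
    sum=0
    for word in "$@"; do
        sum=$(( sum + 0x$word ))
    done
    # Fold any carries above 16 bits back into the low 16 bits.
    while [ $(( sum >> 16 )) -ne 0 ]; do
        sum=$(( (sum & 0xFFFF) + (sum >> 16) ))
    done
    # The checksum is the ones' complement of the folded sum.
    printf '%04x\n' $(( ~sum & 0xFFFF ))
}

# Usage (sample words from RFC 1071):
#   inet_checksum 0001 f203 f4f5 f6f7         # prints 220d
#   inet_checksum 0001 f203 f4f5 f6f7 220d    # prints 0000 (checksum verifies)
#   inet_checksum 0001 f203 f4f5 f6f7 220c    # prints 0001 (checksum off by 1)
```

A receiver sums the data together with the checksum field and expects 0000, so a checksum that is off by 1 shows up as a nonzero result and the segment is discarded.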

Either way, after checking around, I found this link: http://wiki.xensource.com/xenwiki/Xe...7cb10110eae9b7

You can always use ethtool to turn off tx checksumming and see if that fixes things. Just some suggestions.
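
Roughly, that would look like the sketch below. The wrapper function disable_csum_offload and its DRY_RUN switch are made up for illustration; the underlying ethtool -K call only works if the driver supports changing offload settings, and eth0 is an assumed interface name.

```shell
# Hypothetical helper: turn off TX/RX checksum offload on one interface.
# With DRY_RUN=1 it only prints the command; otherwise it needs root.
disable_csum_offload() {
    iface=${1:-eth0}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ethtool -K $iface tx off rx off"
    else
        ethtool -K "$iface" tx off rx off
    fi
}

# Usage (as root):
#   disable_csum_offload eth0
#   ethtool -k eth0    # lowercase -k shows the resulting offload settings
```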
 
Old 07-11-2008, 10:48 AM   #18
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
Yeah I'm pretty sure I'd be able to connect to a game server (if I knew how...don't really play games) if it was through UDP. ICMP (ping) packets are fine too.

It's hard to fully debug this problem because I need to see what the other end sees (and sends). Does the remote server even receive my SYNs, or are the packets invalid and getting dropped? It would be interesting to know if the TCP checksums mysteriously change in transit. I don't doubt that AT&T (my ISP) might have something to do with it, but then again, if they did, I should still have the problem after rebooting. That's the big thing....after rebooting the problem is guaranteed to go away, which makes me believe it's a software (or hardware initialization) issue. I also tested to see if I get the problem in Windows and I don't, but the problem could be triggered by something that Linux does (or doesn't do) that Windows doesn't do (or does).

And what makes it harder is the fact that I haven't figured out what triggers the problem, meaning I have to just wait for it to pop up and hope that I'm in the mood to actually do some debugging, which isn't always the case. Most times I just say cut the crap and reboot.

I tried the suggestions listed on the link but they didn't work because the via-rhine driver in its current state (2.6.25.10) doesn't support offloading (tx or rx). I think the device itself does as I've seen a 2.6.22 patch floating around that enables it, and I might give that a try soon.
Code:
# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
All the diagnostic commands of ethtool are unsupported as well (result in Operation not supported).
 
Old 07-11-2008, 12:05 PM   #19
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
Ok, so I'm mostly sure this command sequence got me back:

Code:
/proc/sys/net/ipv4# echo 0 > tcp_sack
/proc/sys/net/ipv4# echo 0 > tcp_dsack
/proc/sys/net/ipv4# echo 0 > tcp_fack
/proc/sys/net/ipv4# echo 0 > tcp_timestamps
I should've tried them one by one, but here's what I did:
1. "It" happened
2. After about 2 minutes of "it" happening, I changed the tcp params above
3. "It" was gone on the first connection attempt after changing them.

I need to research those params.

But what I'm gonna do is throw those into a script. As soon as it happens again, I'm gonna instantly run that script. If I'm back up right after, then the problem resolution has been isolated.
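
Something along these lines, as a rough sketch. The function name disable_tcp_opts and the PROC_DIR override are made up (the override is just there so it can be tried against a scratch directory first). With no arguments it turns off all four options; with arguments it turns off only the ones named, which allows going one by one.

```shell
#!/bin/sh
# Sketch: write 0 into the TCP option files under /proc/sys/net/ipv4.
# PROC_DIR is a hypothetical override for testing; run as root for real use.
PROC_DIR=${PROC_DIR:-/proc/sys/net/ipv4}

disable_tcp_opts() {
    # Default to the four options from the post when none are named.
    [ $# -gt 0 ] || set -- tcp_sack tcp_dsack tcp_fack tcp_timestamps
    for opt in "$@"; do
        f="$PROC_DIR/$opt"
        if [ -w "$f" ]; then
            echo 0 > "$f"
            echo "$opt = $(cat "$f")"
        else
            # Some kernels lack some of these files, so skip rather than fail.
            echo "skipping $opt (missing or not writable)" >&2
        fi
    done
}

# Usage (as root):
#   disable_tcp_opts                  # all four at once
#   disable_tcp_opts tcp_timestamps   # just one, to isolate the culprit
```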

Last edited by debuser123; 07-11-2008 at 12:07 PM.
 
Old 07-11-2008, 01:11 PM   #20
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
Figured out the bad parameter:

tcp_timestamps

With it enabled = bad. Disabled = good. Keep in mind this is only after "it" happens. Before that (i.e., after a fresh reboot), enabled and disabled = good.

I found a post that pretty much exactly describes my problem:
http://linux.derkeiler.com/Mailing-L.../msg00996.html

Quote:
I attempted to connect to eBay, the ip address was 216.113.103.100 this is one of their cgi servers (cgi-core.ebay.com). Originally I noticed that I was unable to completely load pages from them. Further investigation showed that it was only to certain IP addresses. After some additional looking I found that a Windoze machine was able to connect to this address with no problem.

after looking at a network trace I found that the Linux machine correctly sent a SYN packet but no SYN/ACK was returned. The windows machine sent the SYN and promptly got a SYN/ACK in response. There were obvious differences in the options used in the SYN packet, primarily TCP Timestamps were enabled on the Linux system. After disabling the Timestamps connectivity was restored.
Lol, kind of makes sense if you read the title of this thread.

So, I think I am gonna blame it on my ISP...There's probably a traffic-shaping device that's messing with me or a bad box somewhere along some internet routes.

I think I can now put this issue to rest and just leave timestamps disabled altogether; thanks to all who offered help & suggestions.
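
For the record, a value echoed into /proc is lost on reboot, so keeping timestamps off means putting the setting in /etc/sysctl.conf. A minimal sketch, where the helper name persist_sysctl is made up and the config path is an assumption about the distro's layout:

```shell
# Hypothetical helper: append "key = value" to a sysctl config file unless
# a line for that key is already present.
persist_sysctl() {
    key=$1 value=$2 conf=${3:-/etc/sysctl.conf}
    if grep -q "^[[:space:]]*$key[[:space:]]*=" "$conf" 2>/dev/null; then
        echo "$key already set in $conf, edit it by hand" >&2
        return 1
    fi
    echo "$key = $value" >> "$conf"
}

# Usage (as root):
#   persist_sysctl net.ipv4.tcp_timestamps 0
#   sysctl -p    # reload /etc/sysctl.conf without rebooting
```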

Last edited by debuser123; 07-11-2008 at 01:32 PM.
 
  



Tags
amd, checksum, tcp, wireshark


All times are GMT -5.
