LinuxQuestions.org
Linux - Networking This forum is for any issue related to networks or networking.
Routing, network cards, OSI, etc. Anything is fair game.

Old 02-10-2008, 09:21 PM   #1
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Rep: Reputation: 15
TCP Checksum errors ... only after some amount of time has passed.


Edit 07/11/2008: Quick fix found: disable TCP timestamps
# echo 0 > /proc/sys/net/ipv4/tcp_timestamps
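
For anyone landing here from a search, a minimal sketch of checking that setting and making it stick. The helper name `ts_state` and its optional path argument are made up for illustration; the `sysctl` key is the standard name for the same /proc entry, and putting it in /etc/sysctl.conf is assumed to be how this distribution persists sysctl settings.

```shell
# Report whether TCP timestamps are enabled (any non-zero value counts).
# The optional argument is a path, there only so the function can be
# pointed at a test file; by default it reads the real /proc entry.
ts_state() {
    f="${1:-/proc/sys/net/ipv4/tcp_timestamps}"
    case "$(cat "$f")" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}

# Typical use (as root), left commented so nothing changes by accident:
#   ts_state                                   # show the current state
#   sysctl -w net.ipv4.tcp_timestamps=0        # same effect as the echo above
#   echo 'net.ipv4.tcp_timestamps = 0' >> /etc/sysctl.conf   # keep it across reboots
```

Without the sysctl.conf line, the setting goes back to its default on the next boot.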

Currently running a 2.6.24.1 kernel but also experienced this under 2.6.23.11; I am pretty sure it's a kernel/driver problem as a reboot fixes the "problem" without me changing any modem/router configurations (no wireless stuff involved). And I've never experienced this problem with Windows.

Right after I boot up my PC (Debian Etch), I have no problems with my internet connection. However, if I leave my computer on for a while, my internet connection pretty much stalls (all the while my laptop running XP, plugged into the same router, has no problem). "A while" is not a definite amount of time, but let's say I've experienced this when I've left it on & unattended for at least 5 hours.

DNS queries work no problem, but connecting is the part that seems to have a problem. Running wireshark, I noticed that the main contributor seems to be TCP checksum errors. Offloading is not the problem because the checksums are always off by 1 (for example, correct checksum could be 0x1234 when the segment might have a checksum of 0x1233). It seems some packets "work" while others don't...and for example just opening something as simple as http://google.com might take about 2-3 minutes for the page to fully load.
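
For reference, the TCP checksum is a 16-bit ones'-complement sum, so a single dropped carry changes the result by exactly one. The sketch below only illustrates that arithmetic; it is not a diagnosis of this machine, and the function names and sample words are made up.

```shell
# Ones'-complement checksum over 16-bit hex words (no 0x prefix),
# the same arithmetic the TCP checksum uses.
csum() {
    sum=0
    for w in "$@"; do
        sum=$(( sum + 0x$w ))
    done
    # Fold anything above the low 16 bits back in (end-around carry).
    while [ "$sum" -gt 65535 ]; do
        sum=$(( (sum & 0xFFFF) + (sum >> 16) ))
    done
    printf '0x%04x\n' $(( ~sum & 0xFFFF ))
}

# Same sum, but the carry is thrown away instead of folded back in.
# This is a hypothetical bug, shown only to produce an off-by-one result.
csum_nocarry() {
    sum=0
    for w in "$@"; do
        sum=$(( sum + 0x$w ))
    done
    printf '0x%04x\n' $(( ~sum & 0xFFFF ))
}
```

With the words `c0a8 0001 c0a8 0002`, `csum` prints 0x7eab and `csum_nocarry` prints 0x7eac, so the two differ by one. Any code that handles the carry wrongly somewhere along the path could in principle produce a mismatch like that, but only for packets whose sum actually overflows, which would fit "some packets work while others don't".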

Doing a little research, I found a guy who discovered a bug in some MIPS64 assembly code (in the kernel) that incorrectly converted between 32-bit & 64-bit values (here), and the bug he describes seems to be exactly my problem: checksums off by one, and only in specific cases (just like mine: packets eventually get through but there are a lot of packets thrown away in between).

The problem is, my PC isn't running a MIPS processor, but an AMD. However, I am running AMD64 which means it could be the exact same problem (32/64 bit conversions). My kernel has been compiled as AMD64 (K7?)...I will know for sure if this is the problem when I recompile it using the same options but w/o the 64'bitness'...I'm hoping this is it and perhaps a bug report can be filed by people who know how to.

Otherwise, does anyone have any other suggestions on what the problem could be?

Last edited by debuser123; 07-11-2008 at 12:18 PM.
 
Old 02-11-2008, 08:45 AM   #2
ARC1450
Member
 
Registered: Jun 2005
Location: Odenton, MD
Distribution: Gentoo
Posts: 290

Rep: Reputation: 30
Looks like you're on the right track. If you've ruled out an offloading problem (which has been known to occur with certain gigabit adapters), and you've ruled out a NIC going bad or a poor driver, you're going where you need to go.

Please, though. . .do post what happens when you compile your kernel for generic 32-bit support. By the by, AMD64 is K8, just for future reference. K7 was up to the XP series of Athlons.
 
Old 02-11-2008, 04:31 PM   #3
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
Looks like I was mistaken; my kernel (2.6.24.1) was not compiled as Athlon64(K8) but as a K7...so 64 bitness is probably not the problem.

However, a kernel that has never given me a problem was 2.4.27 and it looks like that was compiled using 386 as the processor type. The bad thing is that you can't really compare 2.4 and 2.6 kernels to figure out a problem, however I can compile as 386.

So, what I will do is this:
#1) use K8 (the processor I actually have) and see what happens
and
#2) use 386 (the end-all in compatibility) and see what happens

The reason I doubt it's offloading is because the checksums are only off by one (on a computer I know has offloading, the checksums differed by huge amounts). My NIC is your generic onboard 100 Mb VIA Rhine (VT6102, Rhine-II), which is another reason why offloading probably isn't it, since it's a "slow" card and wouldn't have that big of a use for offloading.
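
Rather than guessing, the offload settings can be read from the driver with `ethtool -k`. A small sketch follows; the helper name `offload_state` is made up, `eth0` is assumed to be the interface, and whether the via-rhine driver reports or supports these settings is not something confirmed in this thread.

```shell
# Print only the checksum-offload lines from `ethtool -k` output on stdin.
offload_state() {
    grep -E '^[[:space:]]*(rx|tx)-checksumming:'
}

# Typical use, left commented because it needs ethtool and a real interface:
#   ethtool -k eth0 | offload_state
# To switch checksum offload off while testing (if the driver allows it):
#   ethtool -K eth0 rx off tx off
```

If both lines read "off", or the driver says the operation is not supported, offloading can be crossed off the list for good.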

I don't know about a bad NIC because I can reboot and not have a problem at all (though rebooting might reset the NIC and put it in working state).

Funny thing is I don't even need a 64 bit processor...when it was 64bit vs. dual core, I chose wrongly.

PS: Is there a way to, of sorts, reset the TCP stack without rebooting? ifconfig ups/downs, dhcp lease renewals don't fix the problem.

Last edited by debuser123; 02-11-2008 at 04:33 PM.
 
Old 02-11-2008, 06:23 PM   #4
ARC1450
Member
 
Registered: Jun 2005
Location: Odenton, MD
Distribution: Gentoo
Posts: 290

Rep: Reputation: 30
Well, restarting your network card should clear the TCP/IP stack, as far as I know.

What I can tell you is that when NICs die, don't be surprised about anything. Some NICs silently go into the night, and just up and die. Some NICs go rather violently and storm a network to death, then die. Some NICs will cause the computer to lock as they die. Some NICs just start sending out jumbo frames in a network that can't accept them. And some NICs will appear to be on, have a connection, and show nothing. I just dealt with an onboard like that at work. The switch detected a connection, the NIC light was on, the switch even had packets trickling in and out. But no traffic actually went to and from the box.

If you've got an el-cheapo NIC, slap it in, see if it works. That'll tell you if it's your kernel or not. But compiling your kernel for an earlier generation of processor is okay. K8's will run anything equal to or less than a K8 kernel on the AMD side. P4's will run anything equal to or less than a P4 kernel on the Intel side.
 
Old 02-12-2008, 03:20 AM   #5
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
I've always assumed optimizations for an X (K7) may not work that well on a Y (K8) even if the Y is backwards compatible. But with K8 as the processor type, I still had the same problem.

A little bit more info:

1) Only packets that have been received generate a TCP checksum mismatch. Wireshark says the checksum should be one greater than what was received. That rules out offloading, since offloading only produces bogus checksums in a capture for transmitted packets.

2) Internal (TCP) LAN traffic whether received or transmitted does not have any checksum mismatches.

I guess another thing I could try is compiling the NIC driver as a module and then unloading / reloading the module when I start getting errors.

This issue isn't a big problem because I don't always waste electricity by leaving it on, but sometimes I leave it on when I know I might need to ssh into it & grab some files.
 
Old 05-16-2008, 12:57 AM   #6
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
Bump...still a problem on 2.6.25.
 
Old 07-03-2008, 08:23 PM   #7
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
This is pretty odd but I figured out the "problem" which does not require me to reboot:

Once I start experiencing a loss of internet access out of nowhere (e.g., can't connect to web sites, can't ping, dns lookups don't work [e.g., FF sits on "Looking up host whatever.com..." in the statusbar]), what fixed it was to:

....Restart XWindows/Xorg (e.g., ctrl+alt+backspace)

I noticed it because I recently put the system monitor applet on the panel (which shows current cpu usage). Well, I noticed that once it looked like my net access was out, that applet was about 50% blue (meaning about 50% of my cpu was being used)...but I wasn't doing anything (at least in the foreground). So I clicked it and noticed that process Xorg was taking up about 60% of the cpu.

What I did next, I don't know why I never tried before, but I switched to tty1 and opened up google in links...voila, came right up. Went back to X, nope, no go. Then I reduced Xorg's priority to 19 (while in X) which was kind of dumb 'cause then my mouse was useless as any clicking didn't register. So restart X, and bam, everything's back to normal.

What this makes me think is that there's an issue with the kernel, my motherboard (onboard LAN), and display (GeForce 256). I guess this is now less of a networking issue, but my uname and lspci output are below. Still on kernel 2.6.23.11 (which I built a while ago [too lazy to change the default in grub]), but I still experienced the issue on version 2.6.25.

Code:
$ uname -a
Linux thepc 2.6.23.11 #2 PREEMPT Sun Dec 23 01:05:27 CST 2007 i686 GNU/Linux

$ lspci
00:00.0 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:0b.0 Multimedia audio controller: Creative Labs SB0400 Audigy2 Value
00:0f.0 IDE interface: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation NV10 [GeForce 256 SDR] (rev 10)
 
Old 07-07-2008, 09:24 AM   #8
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
Then again, maybe that wasn't the fix or what I experienced earlier wasn't "it". Today it happened again and I reset X...but TCP connections still wouldn't go through. I can ping and do udp/icmp stuff, it's just TCP that has a problem. I give up.........
 
Old 07-07-2008, 10:59 AM   #9
farslayer
LQ Guru
 
Registered: Oct 2005
Location: Northeast Ohio
Distribution: linuxdebian
Posts: 7,249
Blog Entries: 5

Rep: Reputation: 191
I was having issues with BOTH my AMD boxes and their integrated NICs; I believe they also have VIA chipsets. They were doing similar things to what you are describing, with really SLOOOWWW transfer rates. I threw an Intel NIC into each box, disabled the onboard NIC, and all my network problems disappeared on those two machines. I personally think it's a hardware issue.
 
Old 07-10-2008, 12:09 PM   #10
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
I agree it's probably a hardware issue, but I never experienced this on any of the 2.4 kernels with the same hardware. I'm attaching a wireshark/tcpdump/pcap dump of how it takes almost a full 2 minutes just to connect to google.com with the links text-mode browser. You can see that for about the first minute it's just filled with "TCP checksum incorrect" errors. Then after that it seems fine. Wireshark says the TCP checksums are off by exactly one: it says the reported checksum was 0x1234 but it should've been 0x1235.

1. I don't have any problems when I'm running a local server and I connect with the lo or eth0 ip address.
2. Plugging my laptop into my router and trying to ssh into my computer is successful.
3. Starting an ssh session from my computer to my laptop is also successful.

So it seems that when this problem arises, connections to localhost servers and servers on my LAN still go through without error. The internet just doesn't like me.

I couldn't attach a file so I uploaded the wireshark packet dump to mediafire.com:

http://www.mediafire.com/?cultwg690dh
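
A capture like this can also be summarised from the command line. The sketch below assumes tcpdump's verbose output marks bad segments as `cksum 0x1233 (incorrect -> 0x1234)`; the wording differs between tcpdump versions, so the pattern may need adjusting. The function name `cksum_summary` and the file name `capture.pcap` are made up.

```shell
# Read `tcpdump -v` text on stdin; count segments flagged with an incorrect
# checksum, and how many of those are off by exactly one either way.
cksum_summary() {
    sed -n 's/.*cksum \(0x[0-9a-f]*\) (incorrect (*-> \(0x[0-9a-f]*\)).*/\1 \2/p' | {
        bad=0
        off_by_one=0
        while read -r got want; do
            bad=$(( bad + 1 ))
            diff=$(( $want - $got ))
            if [ "$diff" -eq 1 ] || [ "$diff" -eq -1 ]; then
                off_by_one=$(( off_by_one + 1 ))
            fi
        done
        echo "incorrect=$bad off_by_one=$off_by_one"
    }
}

# Typical use on a saved capture:
#   tcpdump -v -n -r capture.pcap tcp | cksum_summary
```

If nearly every incorrect segment is also off by one, that points away from random corruption and toward something doing the checksum arithmetic slightly wrong.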

Anyone know some mailing list I could subscribe to to better debug this problem?
 
Old 07-10-2008, 12:36 PM   #11
ARC1450
Member
 
Registered: Jun 2005
Location: Odenton, MD
Distribution: Gentoo
Posts: 290

Rep: Reputation: 30
Just curious, but have you taken your router out of the mix and just directly connected to the 'net?
 
Old 07-10-2008, 04:55 PM   #12
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
Yup, tried that. The peculiar thing is that a reboot is the only thing that can fix "it". I don't have to reset my router or anything.

I also tried just compiling via-rhine as a module. Once I get the problem I'd do something like:
# ifdown eth0
# modprobe -r via-rhine
# modprobe via-rhine
# ifup eth0

I get an IP address and all, but still incoming TCP packets from the network have invalid checksums.
 
Old 07-10-2008, 05:00 PM   #13
ARC1450
Member
 
Registered: Jun 2005
Location: Odenton, MD
Distribution: Gentoo
Posts: 290

Rep: Reputation: 30
Dude, it sounds like you have a bad NIC, period.

And how long have you been off of a 2.4 based kernel?
 
Old 07-10-2008, 05:39 PM   #14
debuser123
Member
 
Registered: Nov 2006
Distribution: Ubuntu Hardy
Posts: 69

Original Poster
Rep: Reputation: 15
I've used 2.6 kernels for about a year. Prior to that was the default Debian Sarge kernel 2.4.something. I guess I could set a 2.4 as my default. I just never remember having this problem while on a 2.4 kernel, but who knows, maybe I did but didn't realize it.

I really just would like more debugging info in my kernel pertaining to the TCP/IP stack. Anyone know how I could get that?
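
One thing that needs no kernel rebuild: the stack already keeps an `InErrs` counter for TCP in /proc/net/snmp, which covers segments received in error, bad checksums included. A rough sketch of reading it; the function name `tcp_inerrs` and the optional path argument are made up for illustration.

```shell
# Print the kernel's TCP InErrs counter (segments received in error).
# The optional argument is a path, there only to allow testing on a copy.
tcp_inerrs() {
    awk '$1 == "Tcp:" {
             if (!seen) {          # first Tcp: line holds the field names
                 for (i = 2; i <= NF; i++) if ($i == "InErrs") col = i
                 seen = 1
             } else if (col) {     # second Tcp: line holds the values
                 print $col
             }
         }' "${1:-/proc/net/snmp}"
}

# Typical use: sample it before and after trying to load a page.
#   tcp_inerrs; links http://google.com; tcp_inerrs
```

If the number climbs while a page is stalling, the kernel is seeing the same bad segments Wireshark shows. `netstat -s` prints the same counters in a more readable form.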

I'm not that big on upgrading...my video card (nvidia geforce 256) gets talked about enough [and has its own bundle of problems....the infamous Xid lockups with nvidia's closed-source drivers]. If my NIC is bad, I'd really like to figure out why.
 
Old 07-10-2008, 11:35 PM   #15
farslayer
LQ Guru
 
Registered: Oct 2005
Location: Northeast Ohio
Distribution: linuxdebian
Posts: 7,249
Blog Entries: 5

Rep: Reputation: 191
Could be a bug in the driver too and not bad hardware. Like I said, it was easier for me to throw a NIC in the box than waste my time chasing something I couldn't control or identify.
 
  



Tags
amd, checksum, tcp, wireshark


