LinuxQuestions.org
Old 01-04-2022, 04:48 AM   #1
serafean
Member
 
Registered: Mar 2006
Location: Czech Republic
Distribution: Gentoo, Chakra
Posts: 997
Blog Entries: 15

Rep: Reputation: 136
Ingress ethernet packet corruption -- out of ideas


Hi,

I'm looking for diagnostic ideas. My situation:
Ethernet (PCIe, builtin igb) network packets arrive corrupted/changed (ICMP requests -- viewed in wireshark), OS doesn't process them at all. Corruption is not random, but I haven't been able to find a pattern.

What I tried:
Disable all firewalls.
Boot liveCD.
Change cable.
Remove all switches from the path. Still happens on a p2p link.
Add another PCIe ethernet card (r8169): the corruption is exactly the same.

Chipset: X570. CPU: AMD 5800X

At this point I'm thinking the PCIe bus is doing weird stuff. Any ideas what more to try?
 
Old 01-04-2022, 07:49 AM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,626

Rep: Reputation: 2619
You're on the right track swapping things out. Here are some ideas.
  • Can you get the packets correctly on another machine?
  • Can you swap out any of the cabling?
  • Can you slow speeds down in your box?
  • Can you (even temporarily) pick the signals up at the earliest point?
  • Can you do a thorough nic <--> nic test of your box?

You're beaten only when you run out of ideas. Try as many as you can (and guess who might have done some faultfinding before)
 
Old 01-04-2022, 03:19 PM   #3
jefro
Moderator
 
Registered: Mar 2008
Posts: 22,361

Rep: Reputation: 3692
See if checksum offload is set on nic.
 
Old 01-06-2022, 12:37 PM   #4
serafean
Member
 
Registered: Mar 2006
Location: Czech Republic
Distribution: Gentoo, Chakra
Posts: 997

Original Poster
Blog Entries: 15

Rep: Reputation: 136
Checksum offload, off:
Code:
 ethtool -K enp4s0 rx off tx off
Still same behaviour.

ICMP request example dump:
Code:
0000   3c 7c 3f 21 77 b6 b6 71 16 4a 3f 41 08 00 45 00   <|?!w..q.J?A..E.
0010   00 54 d7 cd 40 00 40 01 0a 7b ac 12 00 01 ac 12   .T..@.@..{......
0020   00 3b 08 00 b6 46 89 6c 00 12 fb 30 d7 61 00 00   .;...F.l...0.a..
0030   00 00 1e c5 08 00 00 00 00 00 10 11 12 13 14 15   ................
0040   16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 04 25   .......... !"#.%
0050   26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35   &'()*+,-./012345
0060   36 37                                             67
Byte 0x33 is changed from 0xd5 to 0xc5, and byte 0x4e from 0x24 to 0x04.
Byte 0x4e is consistently wrong; 0x33 only sometimes (though its value isn't constant at the source). I haven't checked the rest too thoroughly...
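
A quick way to sanity-check the dump above is to recompute the checksums, once over the bytes as captured and once with the two bytes put back to the values the sender used (0xd5 at 0x33, 0x24 at 0x4e). This is only a rough Python sketch: the helper names are made up for illustration, and it assumes a plain 14-byte Ethernet header followed by a 20-byte IPv4 header, which is what this frame appears to contain. A failing ICMP checksum would be consistent with the OS silently ignoring the requests.

```python
# Frame bytes copied from the hex dump above (98 bytes as captured).
FRAME_HEX = """
3c 7c 3f 21 77 b6 b6 71 16 4a 3f 41 08 00 45 00
00 54 d7 cd 40 00 40 01 0a 7b ac 12 00 01 ac 12
00 3b 08 00 b6 46 89 6c 00 12 fb 30 d7 61 00 00
00 00 1e c5 08 00 00 00 00 00 10 11 12 13 14 15
16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 04 25
26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35
36 37
"""

ETH_HEADER_LEN = 14   # assumption: untagged Ethernet II frame
IPV4_HEADER_LEN = 20  # assumption: no IP options (first IP byte is 0x45)
ICMP_OFFSET = ETH_HEADER_LEN + IPV4_HEADER_LEN


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum used by the IPv4 and ICMP checksums."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum_ok(data: bytes) -> bool:
    """A block that includes its own checksum field sums to 0xFFFF when intact."""
    return ones_complement_sum(data) == 0xFFFF


captured = bytes.fromhex(FRAME_HEX)

# Put back the two bytes reported as changed in the post.
restored = bytearray(captured)
restored[0x33] = 0xD5
restored[0x4E] = 0x24

ip_header_ok = checksum_ok(captured[ETH_HEADER_LEN:ICMP_OFFSET])
icmp_captured_ok = checksum_ok(captured[ICMP_OFFSET:])
icmp_restored_ok = checksum_ok(bytes(restored[ICMP_OFFSET:]))

# XOR of sent vs. received byte shows which bits changed.
flipped = {hex(off): hex(captured[off] ^ restored[off]) for off in (0x33, 0x4E)}

print("IP header checksum ok:     ", ip_header_ok)
print("ICMP checksum as captured: ", icmp_captured_ok)
print("ICMP checksum when restored:", icmp_restored_ok)
print("flipped bit masks:", flipped)
```

With these bytes the IP header checksum verifies, the ICMP checksum fails as captured, and it verifies again once the two bytes are restored. Each damaged byte has exactly one bit cleared (mask 0x10 at 0x33, mask 0x20 at 0x4e).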

Another interesting behaviour: unless an outgoing ping is running (it gets no replies), the incoming (corrupted) requests don't show up at all.

This is driving me insane. It started when I logged into a desktop session and the plasma-nm applet took over. Before that, NetworkManager managed the connection fine. Since then it doesn't work even without NetworkManager, or under a liveCD.

All networking material has been either replaced or removed until a single cable remained, and that was swapped out too. Even the other peer has been changed. Still exactly the same behaviour.
I don't think I can do better than wireshark at each end of the cable. Will try to find another machine to act as bridge.
 
Old 01-07-2022, 09:23 AM   #5
zaphar
Member
 
Registered: Nov 2012
Distribution: Slackware
Posts: 37

Rep: Reputation: Disabled
Quote:
Originally Posted by serafean View Post
Hi,

I'm looking for diagnostic ideas. My situation:
Ethernet (PCIe, builtin igb) network packets arrive corrupted/changed (ICMP requests -- viewed in wireshark), OS doesn't process them at all. Corruption is not random, but I haven't been able to find a pattern.

What I tried:
Disable all firewalls.
Boot liveCD.
Change cable.
Remove all switches from the path. Still happens on a p2p link.
Add another PCIe ethernet card (r8169): the corruption is exactly the same.

Chipset: X570. CPU: AMD 5800X

At this point I'm thinking the PCIe bus is doing weird stuff. Any ideas what more to try?
I haven't seen it mentioned yet, but some other ideas to run by the mental checklist:
- Check cable rating vs link negotiation speed. Sometimes folks will have an old Cat5 cable hanging around and try to use it with a Gigabit Ethernet connection. It should be Cat5e or better.
- How long is your cable run? Probably not the issue, but twisted-pair Ethernet is rated for 100 m per segment.
- Also, do you have any sources of interference near the cable? Powerful motors, high voltage, ... sources of electrical or magnetic interference.
- Tried a USB Ethernet adapter?
 
Old 01-07-2022, 11:53 AM   #6
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,626

Rep: Reputation: 2619
I would work at eliminating things with successful tests, not negative ones.

What I mean is: A second/different network card also throwing errors doesn't mean the first one is good. But a network card doing successful 100 metre transfers means the network card is good.

That said, have you changed out the long trunk of network cable, even by laying something along the floor? BTW, false ceilings are great for temporary cabling.
 
Old 02-14-2022, 11:06 AM   #7
serafean
Member
 
Registered: Mar 2006
Location: Czech Republic
Distribution: Gentoo, Chakra
Posts: 997

Original Poster
Blog Entries: 15

Rep: Reputation: 136
Hi,

Thanks everybody for the suggestions. With time being scarce, and contiguous stretches of it even scarcer, I didn't manage to sit down and continue with this for a while. However, I now have the culprit.

My PCIe bus is indeed shot.
If you look at this Zen 3 interconnect diagram, every PCIe device connected through the X570 chipset shows the same behaviour. Using the GPU lanes for the network card results in a working network.
For some reason, the SATA ports seem to work OK (btrfs doesn't complain about checksums).
Even stronger evidence: part of the AM4 socket plastic under the CPU is melted. Something terrible happened, even though the PC is behind a surge-protecting power socket. It's a miracle this machine still somehow works.
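
To see which devices hang off the same upstream link, it helps to list the PCI hops between the root complex and each device. On Linux, /sys/class/net/<iface>/device resolves to a sysfs path with one PCI address per hop. The sketch below only parses such paths; both example paths are hypothetical (not taken from this machine), and the function names are made up for illustration.

```python
import os
import re
from typing import List

# Matches a PCI address such as 0000:04:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def pci_hops(sysfs_path: str) -> List[str]:
    """Return the PCI addresses found in a resolved sysfs device path, root first."""
    return [part for part in sysfs_path.split("/") if PCI_ADDR.match(part)]


def hops_for_interface(iface: str) -> List[str]:
    """Resolve a network interface to its chain of PCI hops (needs a real Linux sysfs)."""
    return pci_hops(os.path.realpath(f"/sys/class/net/{iface}/device"))


def shared_hops(a: List[str], b: List[str]) -> List[str]:
    """Return the leading hops two devices have in common."""
    common = []
    for x, y in zip(a, b):
        if x != y:
            break
        common.append(x)
    return common


# Hypothetical paths for two NICs behind the same chipset uplink.
NIC_A = "/sys/devices/pci0000:00/0000:00:01.2/0000:02:00.0/0000:03:05.0/0000:04:00.0"
NIC_B = "/sys/devices/pci0000:00/0000:00:01.2/0000:02:00.0/0000:03:04.0/0000:05:00.0"

print(pci_hops(NIC_A))
print(shared_hops(pci_hops(NIC_A), pci_hops(NIC_B)))
```

On a real system, `hops_for_interface("enp4s0")` would give the chain for the onboard NIC, and `lspci -tv` shows the same tree. Devices that share their first hops with a misbehaving NIC are the ones worth testing next.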

So, mystery "solved", one for the books...

Edit: as a bonus, a screenshot of wireshark upon reception of a packet consisting only of dashes "-", corruption nicely visible.
Attached thumbnail: Screenshot_20220214_181935.jpg (236.1 KB)
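
A constant fill byte makes this kind of damage easy to tabulate: every byte that differs from the fill is a corruption, and XOR against the fill gives the flipped bits. Below is a small Python sketch of that idea. The received buffer is synthetic (made up for illustration, not read from the screenshot), and the function names and the 16-byte grouping are assumptions.

```python
from collections import Counter
from typing import List, Tuple

FILL = 0x2D  # ASCII '-'


def find_corruption(received: bytes, fill: int = FILL) -> List[Tuple[int, int]]:
    """Return (offset, xor_mask) for every byte that differs from the fill byte."""
    return [(i, b ^ fill) for i, b in enumerate(received) if b != fill]


def summarize(hits: List[Tuple[int, int]], stride: int = 16):
    """Count how often each bit mask occurs and which column (offset % stride) is hit."""
    masks = Counter(mask for _, mask in hits)
    columns = Counter(offset % stride for offset, _ in hits)
    return masks, columns


# Synthetic example: 64 dashes with bit 5 cleared at three offsets (0x2d -> 0x0d).
received = bytearray(b"-" * 64)
for offset in (14, 30, 46):
    received[offset] &= ~0x20 & 0xFF

hits = find_corruption(bytes(received))
masks, columns = summarize(hits)

for offset, mask in hits:
    print(f"offset {offset:#04x}: got {received[offset]:#04x}, flipped bits {mask:#04x}")
print("masks:", dict(masks))
print("columns:", dict(columns))
```

If the same mask and the same column keep showing up in a real capture, that points at a fixed data line or lane rather than random noise. With iputils ping, `ping -p 2d <host>` is one way to send such a payload; the post does not say how this packet was generated.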

Last edited by serafean; 02-14-2022 at 11:20 AM.
 
Old 02-15-2022, 03:56 AM   #8
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,626

Rep: Reputation: 2619
In all circumstances, the silicon dies before the chip socket melts; there's at least 50 °C between the two. So the socket melting looks like sabotage, or outside thermal forces. Very strange…

Anyhow, replace your m/b and mark this solved.
 
  

