Linux - Hardware: This forum is for hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?

01-04-2022, 04:48 AM
#1
Member
Registered: Mar 2006
Location: Czech Republic
Distribution: Gentoo, Chakra
Posts: 997
Ingress Ethernet packet corruption -- out of ideas
Hi,
I'm looking for diagnostic ideas. My situation:
Ethernet packets (PCIe, built-in igb) arrive corrupted/changed (ICMP requests, viewed in Wireshark), and the OS doesn't process them at all. The corruption is not random, but I haven't been able to find a pattern.
What I tried:
- Disabled all firewalls.
- Booted a liveCD.
- Changed the cable.
- Removed all switches from the path. It still happens on a point-to-point link.
- Added another PCIe Ethernet card (r8169); the corruption is exactly the same.
Chipset: X570. CPU: AMD 5800X.
At this point I'm thinking the PCIe bus is doing weird stuff. Any ideas what more to try?

01-04-2022, 07:49 AM
#2
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,626
You're on the right track swapping things out. Here are some ideas:
- Can another machine receive the packets correctly?
- Can you swap out any of the cabling?
- Can you slow the link speed down on your box?
- Can you (even temporarily) capture the signal at the earliest possible point?
- Can you do a thorough NIC <--> NIC test of your box?
You're beaten only when you run out of ideas. Try as many as you can (and guess who might have done some fault-finding before).

01-04-2022, 03:19 PM
#3
Moderator
Registered: Mar 2008
Posts: 22,361
Check whether checksum offload is enabled on the NIC.

01-06-2022, 12:37 PM
#4
Member
Registered: Mar 2006
Location: Czech Republic
Distribution: Gentoo, Chakra
Posts: 997
Original Poster
Checksum offload turned off:
Code:
ethtool -K enp4s0 rx off tx off
Still the same behaviour.
ICMP request example dump:
Code:
0000 3c 7c 3f 21 77 b6 b6 71 16 4a 3f 41 08 00 45 00 <|?!w..q.J?A..E.
0010 00 54 d7 cd 40 00 40 01 0a 7b ac 12 00 01 ac 12 .T..@.@..{......
0020 00 3b 08 00 b6 46 89 6c 00 12 fb 30 d7 61 00 00 .;...F.l...0.a..
0030 00 00 1e c5 08 00 00 00 00 00 10 11 12 13 14 15 ................
0040 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 04 25 .......... !"#.%
0050 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 &'()*+,-./012345
0060 36 37 67
Byte 0x33 is changed from 0xd5 to 0xc5, and byte 0x4e from 0x24 to 0x04.
0x4e is consistently wrong; 0x33 only sometimes (but that byte isn't constant at the source either). I haven't checked the rest very thoroughly...
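The two byte changes reported above can be compared bit by bit. Below is a minimal bash sketch; `bitdiff` is a hypothetical helper written only for this illustration, and it takes the sent and received values exactly as reported in this post.

```shell
#!/usr/bin/env bash
# bitdiff SENT RECEIVED: compare two bytes given as hex and print each bit
# that differs, with its value on the sending side -> receiving side.
bitdiff() {
  local sent=$(( 16#$1 )) recv=$(( 16#$2 ))
  local diff=$(( sent ^ recv )) bit
  for (( bit = 7; bit >= 0; bit-- )); do
    if (( (diff >> bit) & 1 )); then
      if (( (sent >> bit) & 1 )); then
        echo "bit $bit: 1 -> 0"
      else
        echo "bit $bit: 0 -> 1"
      fi
    fi
  done
}

bitdiff d5 c5   # byte 0x33: prints "bit 4: 1 -> 0"
bitdiff 24 04   # byte 0x4e: prints "bit 5: 1 -> 0"
```

In both samples exactly one bit that was 1 on the sending side arrived as 0 (bit 4 of byte 0x33, bit 5 of byte 0x4e). Two samples are not enough to call this a pattern; running more sent/received pairs through the same comparison would show whether the same bits keep dropping.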
Another interesting behaviour: unless an outgoing ping is running (it receives no replies), I don't see the incoming (corrupted) requests at all.
This is driving me insane. It started when I logged into a desktop session and the plasma-nm applet took over; before that, NetworkManager managed the connection fine. Since then it doesn't work even without NetworkManager, or from a liveCD.
All networking equipment has been either replaced or removed until a single cable remained, and that was swapped out too. Even the other peer has been changed. Still exactly the same behaviour.
I don't think I can do better than Wireshark at each end of the cable. I'll try to find another machine to act as a bridge.
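As a cross-check on the dump above, here is a minimal bash sketch of the RFC 1071 Internet checksum, the 16-bit ones' complement sum used by both the IPv4 header and ICMP. The `inet_checksum` helper and the variable names are hypothetical, written only for this illustration; the byte arrays are copied by hand from the dump (IPv4 header at frame offsets 0x0e-0x21, ICMP message at 0x22-0x61), and the restored values 0xd5 and 0x24 are the ones reported as sent.

```shell
#!/usr/bin/env bash
# inet_checksum BYTE...: RFC 1071 Internet checksum over hex bytes.
# Run over data that already contains its own checksum field,
# it prints 0000 when the data is intact.
inet_checksum() {
  local -a b=("$@")
  local sum=0 i
  # Pad odd-length data with a trailing zero byte.
  if (( ${#b[@]} % 2 )); then b+=(00); fi
  # Add the data up as big-endian 16-bit words.
  for (( i = 0; i < ${#b[@]}; i += 2 )); do
    sum=$(( sum + (16#${b[i]} << 8 | 16#${b[i+1]}) ))
  done
  # Fold the carries back into the low 16 bits (ones' complement addition).
  while (( sum >> 16 )); do
    sum=$(( (sum & 0xffff) + (sum >> 16) ))
  done
  printf '%04x\n' $(( ~sum & 0xffff ))
}

# IPv4 header, frame offsets 0x0e-0x21 of the dump
ip_hdr=(45 00 00 54 d7 cd 40 00 40 01 0a 7b ac 12 00 01 ac 12 00 3b)

# ICMP echo request as received, frame offsets 0x22-0x61
icmp_rx=(
  08 00 b6 46 89 6c 00 12 fb 30 d7 61 00 00
  00 00 1e c5 08 00 00 00 00 00 10 11 12 13 14 15
  16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 04 25
  26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35
  36 37
)

# Same message with the two bytes restored to the values reported as sent
icmp_orig=("${icmp_rx[@]}")
icmp_orig[17]=d5   # frame offset 0x33 = 0x22 + 17
icmp_orig[44]=24   # frame offset 0x4e = 0x22 + 44

inet_checksum "${ip_hdr[@]}"     # 0000: header intact
inet_checksum "${icmp_rx[@]}"    # 2010: checksum fails
inet_checksum "${icmp_orig[@]}"  # 0000: checksum matches again
```

With these inputs the IPv4 header verifies but the ICMP message does not, and the failing result 2010 is exactly the weight of the two missing bits (0x0010 in the word holding byte 0x33, 0x2000 in the word holding byte 0x4e). The Linux stack drops an incoming ICMP message whose checksum does not verify, which would fit the OS never answering these requests. And since a NIC normally discards frames with a bad Ethernet FCS, a frame that is delivered but fails the ICMP checksum hints that the bytes changed after the NIC checked them rather than on the cable.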

01-07-2022, 09:23 AM
#5
Member
Registered: Nov 2012
Distribution: Slackware
Posts: 37
Quote:
Originally Posted by serafean
At this point I'm thinking the PCIe bus is doing weird stuff. Any ideas what more to try?
I haven't seen these mentioned yet, but here are some other ideas to run through the mental checklist:
- Check the cable rating against the negotiated link speed. Sometimes folks have an old Cat5 cable hanging around and try to use it for a Gigabit Ethernet connection. It should be Cat5e or better.
- How long is your cable run? Probably not the issue, but twisted-pair Ethernet is rated for 100 m.
- Are there any sources of interference near the cable? Powerful motors, high voltage, or other sources of electrical or magnetic interference.
- Have you tried a USB Ethernet adapter?

01-07-2022, 11:53 AM
#6
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,626
I would work at eliminating things with successful tests, not failing ones.
What I mean is: a second, different network card also throwing errors doesn't tell you the first one is good. But a network card doing successful transfers over a 100 metre run shows that the card is good.
That said, have you replaced the long trunk of network cable, even by laying something along the floor? BTW, false ceilings are great for temporary cabling.

02-14-2022, 11:06 AM
#7
Member
Registered: Mar 2006
Location: Czech Republic
Distribution: Gentoo, Chakra
Posts: 997
Original Poster
Hi,
Thanks everybody for the suggestions. Time being scarce, and uninterrupted stretches of it even scarcer, I didn't manage to sit down and continue with this for a while. However, I have now found the culprit.
My PCIe bus is indeed shot.
If you look at the Zen 3 interconnect diagram, every PCIe device connected through the X570 chipset shows the same behaviour. Putting the network card on the GPU lanes (the slot wired directly to the CPU) results in a working network.
For some reason, the SATA ports seem to work OK (btrfs doesn't complain about checksums).
Even stronger evidence: part of the AM4 socket's plastic under the CPU is partially melted. Something terrible happened, even though the PC is plugged into a surge-protected power socket. It's a miracle this machine still somehow works.
So, mystery "solved", one for the books...
Edit: as a bonus, a Wireshark screenshot of a received packet consisting only of dashes ("-"), with the corruption nicely visible.
Last edited by serafean; 02-14-2022 at 11:20 AM.

02-15-2022, 03:56 AM
#8
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,626
In all circumstances, the silicon dies before the chip socket melts; the difference is at least 50 °C. So the melted socket looks like sabotage, or heat from an outside source. Very strange…
Anyhow, replace your motherboard and mark this thread solved.

All times are GMT -5.