LinuxQuestions.org
Linux - Hardware: This forum is for hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?

03-19-2020, 06:00 AM   #16
camorri
LQ 5k Club
 
Registered: Nov 2002
Location: Somewhere inside 9.9 million sq. km. Canada
Distribution: Slackware 15.0, current, slackware-arm-current
Posts: 6,215

Reputation: 849

A DuckDuckGo search on 'visorbus' turns up several kernel patches dealing with this function. I don't know if it's related to this issue or not; most of what is there is over my head.

Since this fails on 14.2 but not on current (at least on my system), I think this is in fact a kernel bug that has been fixed in later kernel releases.
 
1 member found this post helpful.
03-19-2020, 06:10 AM   #17
hazel
LQ Guru
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: LFS, AntiX, Slackware
Posts: 7,574

Original Poster
Blog Entries: 19

Reputation: 4452
Now that's useful. I've just been going through yesterday's /var/log/messages looking for kernel complaints and found these:
Code:
Mar 18 17:47:01 bigboy root: 138 jobs running
Mar 18 17:48:19 bigboy kernel: [27862.622289] hwinfo[6507]: segfault at 4000191 ip 00007f242f3c613e sp 00007ffc9705ee10 error 4 in libhd.so.21.61[7f242f39e000+a1000]
Mar 18 18:05:02 bigboy -- MARK --
Mar 18 18:10:04 bigboy kernel: [29169.809154] hwinfo[7550]: segfault at 6000191 ip 00007f822c90d13e sp 00007ffe732759e0 error 4 in libhd.so.21.59[7f822c8e5000+a1000]
Mar 18 18:25:02 bigboy -- MARK --
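The two kernel lines above say more than they first appear to. A minimal Python sketch, assuming the x86 kernel's segfault message format exactly as it appears in this excerpt (parse_segfault and decode_x86_error are illustrative names, not an existing tool): subtracting the library's load address (the first number in brackets) from the instruction pointer gives the faulting instruction's offset inside libhd, and the low bits of the error code describe the failed access.

```python
import re
from typing import Optional

# Matches the kernel's user-space segfault report as seen in the log above:
#   <comm>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <code> in <obj>[<base>+<size>]
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_x86_error(code: int) -> str:
    """Decode the low three bits of the x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"

def parse_segfault(line: str) -> Optional[dict]:
    """Extract the useful fields from one syslog line, or None if it isn't a segfault."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "object": m["obj"],
        # Offset of the faulting instruction inside the shared library;
        # unlike ip, it does not depend on where the loader mapped the library.
        "offset": ip - base,
        "error": decode_x86_error(int(m["err"])),
    }
```

Run on the two lines above, both crashes decode to offset 0x2813e in libhd and a user-mode read of a page that is not present, i.e. a read through an address with nothing mapped behind it. Because the lines come from two different builds (21.61 and 21.59), the matching offset is suggestive rather than proof of the same instruction; with a debug build, `addr2line -e <library> 0x2813e` would typically map such an offset to a source line.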
 
03-19-2020, 07:30 AM   #18
hazel
LQ Guru
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: LFS, AntiX, Slackware
Posts: 7,574

Original Poster
Blog Entries: 19

Reputation: 4452
Bingo! I just built a new version of hwinfo-21.59 with that one instruction removed and it works! Now I just have to do the same with hwinfo-21.67.

Ha, ha! Version 21.67 has a further new read function. It's called hd_read_mdio. And when this is left in, the program still segfaults. But when both these functions are prevented from running, the program behaves itself.

I think the next step will be to install the kernel from current, and see if the program behaves better with that.

Last edited by hazel; 03-19-2020 at 10:31 AM. Reason: Reported result for 21.67
 
03-20-2020, 05:49 AM   #19
hazel
LQ Guru
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: LFS, AntiX, Slackware
Posts: 7,574

Original Poster
Blog Entries: 19

Reputation: 4452
So I installed the 5.x-series kernel from Slackware Current and booted from it this morning. I was somewhat disappointed not to see any penguins during the boot. What happened to them?

I have just tested the unpatched hwinfo-21.67 and guess what? Segmentation fault!

I think this really has to be a bug in hwinfo, and the kernel is not to blame. I would like to report it, but there doesn't seem to be any option on their GitHub page for doing that.

PS: I found a maintainer name and address inside the source tarball, so I've emailed him. I wonder what will happen now.

Last edited by hazel; 03-20-2020 at 06:26 AM. Reason: Added postscript
 
03-21-2020, 06:10 AM   #20
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,292

Reputation: 2322
Yeah - email him.

Diff 21.67 orig with 21.67 modified. It sounds like you don't have much confidence in your mods: they killed your segfault rather than improving how the program works. I'd also do some diagnostics, like logging which address(es) it is reading, and checking whether there is actually any memory there on your box. Reading an address is pretty harmless and shouldn't segfault; reading a non-existent or unauthorised address is an attack on the PC's privates, and you could expect trouble.

When you retire for the night, leave memtest running, in case you have an issue somewhere. A segfault these days is any memory error, isn't it?
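On the "is there actually any memory there" question: a user-space program like hwinfo only ever sees virtual addresses, so the practical check is whether an address falls inside one of the process's mappings, which Linux lists in /proc/<pid>/maps. A minimal Python sketch under that assumption; the helper names are hypothetical and nothing here comes from hwinfo itself.

```python
from typing import List, Optional, Tuple

Region = Tuple[int, int, str, str]  # (start, end, perms, path)

def parse_maps(maps_text: str) -> List[Region]:
    """Parse the text of /proc/<pid>/maps into (start, end, perms, path) tuples."""
    regions: List[Region] = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        path = fields[5] if len(fields) == 6 else ""  # anonymous mappings have no path
        regions.append((int(start_s, 16), int(end_s, 16), fields[1], path))
    return regions

def lookup(addr: int, regions: List[Region]) -> Optional[Tuple[str, str]]:
    """Return (perms, path) of the mapping containing addr, or None if unmapped."""
    for start, end, perms, path in regions:
        if start <= addr < end:  # the end address is exclusive
            return perms, path
    return None

def read_maps(pid: int) -> List[Region]:
    """Read the current mappings of a running process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return parse_maps(f.read())
```

Since the process is gone once it segfaults, the map would have to be captured while hwinfo is still running (for example, while it is stopped in gdb) and the fault addresses from the log (0x4000191, 0x6000191) checked against it with lookup().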
 
03-21-2020, 06:23 AM   #21
hazel
LQ Guru
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: LFS, AntiX, Slackware
Posts: 7,574

Original Poster
Blog Entries: 19

Reputation: 4452
I'm in correspondence with him now and it's very interesting. He gave me some tests to try out and from that, he has found the immediate cause of the crash: the pointer to a crucial structure called hd_data has got changed somehow to another value which doesn't point to anything. Hence the segfault. We still don't know why it happens.

What I don't understand is that, according to the gdb backtrace, the problem starts in a different part of the program from the one I commented out.

I've found a nice gdb manual and I'm studying it. Hopefully I can find a way of using it to narrow down the problem.

PS: I'm attaching a diff between versions 21.58 and 21.59 with a note on the two lines I removed. That's just a call to the new function which follows; obviously it would have been better to fix the function itself.
Attached Files
File Type: txt hwinfo-diff-2.58-59.txt (2.6 KB, 6 views)
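Both the attached diff and business_kid's suggestion to diff the original and modified 21.67 sources rely on unified diffs. The usual tool is `diff -u old new`; as an alternative, a minimal Python sketch using the standard library's difflib, which emits the same unified format. The file paths are placeholders, and this does not reproduce the actual attachment.

```python
import difflib
from pathlib import Path

def diff_text(old: str, new: str, old_name: str = "old", new_name: str = "new",
              context: int = 3) -> str:
    """Return a unified diff between two strings (empty string if identical).

    Assumes both inputs end with a newline, as source files normally do;
    otherwise the last line of each would run into the next diff line.
    """
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=old_name,
            tofile=new_name,
            n=context,  # lines of unchanged context around each change
        )
    )

def diff_files(old_path: str, new_path: str, context: int = 3) -> str:
    """Diff two source files, e.g. the same file from two release trees."""
    return diff_text(Path(old_path).read_text(), Path(new_path).read_text(),
                     old_path, new_path, context)
```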

Last edited by hazel; 03-21-2020 at 06:51 AM. Reason: PS added
 
03-21-2020, 01:09 PM   #22
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,292

Reputation: 2322
It's usually the same on these things…

By the time you're getting your head into this and able to say something meaningful, you're elsewhere in the space/time continuum and you lose us ordinary mortals. If you don't need the latest version, why not just use 21.58? Or are you trying to throw the maintainer a bone?
 
03-21-2020, 01:42 PM   #23
hazel
LQ Guru
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: LFS, AntiX, Slackware
Posts: 7,574

Original Poster
Blog Entries: 19

Reputation: 4452
Quote:
Originally Posted by business_kid View Post
It's usually the same on these things…

By the time you're getting your head into this and able to say something meaningful, you're elsewhere in the space/time continuum and you lose us ordinary mortals.
I find your hardware-oriented comments equally impenetrable! That's why I haven't marked them as helpful. I know that you are trying to help but I just can't understand that stuff.
Quote:
If you don't need the latest version, why not just use 21.58? Or are you trying to throw the maintainer a bone?
Yes, I suppose so. If I can help clear a bug, I think it's my duty to do so. Isn't that how the Linux community is supposed to work? The maintainer hasn't been able to reproduce my problem on a virtual machine using the hardware diagnostics I sent him, so it looks like I am the only one who can do this. And also it interests me.
 
03-22-2020, 05:34 AM   #24
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,292

Reputation: 2322
Quote:
Originally Posted by hazel View Post
I find your hardware-oriented comments equally impenetrable! That's why I haven't marked them as helpful. I know that you are trying to help but I just can't understand that stuff.
Yes, they probably are. I have had that exact reaction from customers who found themselves paying for something they didn't understand. I would point out that that was why they had hired me, after factory maintenance and electricians had failed. In the end, most of them developed a 'don't want to know' attitude. A few of them could follow it. I found that holding something in my hand reduced it to "This", but design issues were a nightmare.

[On reverting back to 21.58]
Quote:
Yes, I suppose so. If I can help clear a bug, I think it's my duty to do so. Isn't that how the Linux community is supposed to work?
Indeed.

Quote:
The maintainer hasn't been able to reproduce my problem on a virtual machine using the hardware diagnostics I sent him, so it looks like I am the only one who can do this. And also it interests me.


Well, go for it, then. You could ask him for a patch that dumps the salient registers to syslog, which might help: then he could see what's going on at different times. Something is changing a value when he doesn't expect it. I don't get the particulars, but I do get that he thinks the program is doing one thing while it's actually doing another, so the problem is going to be a surprise to both of you.
 
03-22-2020, 06:18 AM   #25
hazel
LQ Guru
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: LFS, AntiX, Slackware
Posts: 7,574

Original Poster
Blog Entries: 19

Reputation: 4452
I took your advice and ran memtest overnight. 48 cycles, no errors. So I don't think this is a memory problem. I'm pretty sure it's a bug.
 
03-23-2020, 05:56 AM   #26
hazel
LQ Guru
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: LFS, AntiX, Slackware
Posts: 7,574

Original Poster
Blog Entries: 19

Reputation: 4452
I ran a gdb session with a watch on hd_data and sent Steffen the results. I got an email overnight which says:
Quote:
Thanks! I've now an idea what happens. Could you also send me the compiled static hwinfo you used for that debug session?
So I have. The trouble is that when he has found out what went wrong, he probably won't be able to explain it to me in terms that I can understand.

I'm beginning to regret doing that intensive memtest run. It seems to have done something nasty to my machine, because now it won't reboot. It only starts from cold.
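For readers unfamiliar with gdb watchpoints: `watch EXPR` makes gdb stop whenever the value of EXPR changes, which is how a pointer like hd_data can be caught at the moment it gets overwritten. A minimal Python sketch that assembles such a session as a non-interactive gdb command line; it assumes hd_data is in scope once main() has been entered (that depends on how hwinfo declares it), the helper name is hypothetical, and this is not the session actually run in this thread.

```python
import shlex
from typing import List, Sequence

def gdb_watch_command(program: str, variable: str, program_args: Sequence[str] = (),
                      stop_at: str = "main", hits: int = 2) -> List[str]:
    """Build a batch-mode gdb command line that reports changes to `variable`."""
    commands = [
        f"break {stop_at}",   # stop early so the variable is in scope
        "run",
        f"watch {variable}",  # stop whenever the variable's value changes
    ]
    # gdb prints the old and new values at each watchpoint hit; bt shows
    # which code made the change. The first hit may just be the variable's
    # initialisation, so more than one hit is collected by default.
    for _ in range(hits):
        commands += ["continue", "bt"]
    argv = ["gdb", "-batch"]
    for cmd in commands:
        argv += ["-ex", cmd]
    return argv + ["--args", program, *program_args]

if __name__ == "__main__":
    # Print a shell-ready command (or pass the list to subprocess.run).
    print(shlex.join(gdb_watch_command("./hwinfo", "hd_data")))
```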
 
03-24-2020, 05:31 AM   #27
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,292

Reputation: 2322
Quote:
Originally Posted by hazel View Post
I ran a gdb session with a watch on hd_data and sent Steffen the results. I got an email overnight which says:

So I have. The trouble is that when he has found out what went wrong, he probably won't be able to explain it to me in terms that I can understand.

I'm beginning to regret doing that intensive memtest run. It seems to have done something nasty to my machine, because now it won't reboot. It only starts from cold.
If you don't understand his explanation, I'll have a try. I don't know what to suggest on the reboot, except to point out that you've bigger fish to fry at the moment. If you look at /etc/inittab, you'll see the runlevels laid out; the reboot script is /etc/rc.d/rc.6 in Slackware, IIRC. If you've seen my issues, not getting a reboot is small beer. I can't get booted up, and I'm replying on a little rugrat of a RasPi 4 running Raspbian.
 
03-24-2020, 05:43 AM   #28
hazel
LQ Guru
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: LFS, AntiX, Slackware
Posts: 7,574

Original Poster
Blog Entries: 19

Reputation: 4452
Quote:
Originally Posted by business_kid View Post
If you look at /etc/inittab, you'll see the runlevels laid out; the reboot script is /etc/rc.d/rc.6 in Slackware, IIRC.
I wasn't explicit enough. This has nothing to do with Slackware. Slack closes down normally and gives the "Rebooting" message. Then the machine tries to start up again but I don't even get to the bootloader any more. It just freezes at the point where a cold boot does the POST.

If I switch off at the mains, then switch on again, it boots. Tiresome, but as you say, we all have more serious problems right now.
 
03-24-2020, 07:41 AM   #29
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,292

Reputation: 2322
That sounds like the lack of a proper reset on the HD or chipset. There is a significant difference between a reboot and a power-off: the physical reset button or a power-off does a full reset via the BIOS, but a reboot skips part of that and does a software reset (=goto this address). I can only provide highly speculative hardware guesses, so I imagine it's software. Have you reinstalled GRUB (the MBR), or tried hibernate? Mind you, I'm in enough trouble myself.
 
1 member found this post helpful.
03-24-2020, 07:59 AM   #30
hazel
LQ Guru
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: LFS, AntiX, Slackware
Posts: 7,574

Original Poster
Blog Entries: 19

Reputation: 4452
Now that I understood! Not that it's very important.

Regarding the hwinfo bug, Steffen found out where it was. See the bug report at https://bugzilla.opensuse.org/show_bug.cgi?id=1167561.
 
  


All times are GMT -5.
