LinuxQuestions.org
Forums > Linux Forums > Linux - Distributions > Fedora
11-14-2018, 01:04 PM   #1
perlhackr
LQ Newbie
 
Registered: Jul 2006
Posts: 3

Rep: Reputation: 0
Looking for help with system freezing randomly (intel, skylake/kabylake)


Hello all,

I'm currently running into an issue where my system freezes randomly. The freeze can happen anywhere from a few hours to a full day of use, and it always occurs while playing back videos in a Chromium browser window using the HTML5 video tag. When the system is frozen it still responds to pings, but I can no longer log in through ssh. Unfortunately, I have absolutely no logs that could point me in the right direction, because nothing gets logged when this occurs.

The devices are running their latest microcode. Thinking CPU power management might be the issue, I've tried a few of the C-state options out there to keep the CPU out of deep idle states (intel_idle.max_cstate=0, processor.max_cstate=1). So far I haven't been able to find the root cause of the issue or reliably recreate it.
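To confirm that C-state options like these actually took effect after a reboot, one can compare the kernel command line with what the cpuidle subsystem reports in sysfs. A minimal sketch, assuming the standard /proc and /sys layout; cstate_report is a hypothetical helper name, and its optional prefix argument exists only so it can be pointed at a test directory:

```shell
#!/bin/bash
# Hypothetical helper: report whether C-state boot options are in effect.
# $1 is an optional path prefix; leave it empty to inspect the live system.
cstate_report() {
    local root="${1:-}" p s
    echo "cmdline: $(cat "$root/proc/cmdline")"

    # Which cpuidle driver is active (e.g. intel_idle or acpi_idle)
    p="$root/sys/devices/system/cpu/cpuidle/current_driver"
    [ -r "$p" ] && echo "cpuidle driver: $(cat "$p")"

    # max_cstate as seen by intel_idle, if the driver exposes it
    p="$root/sys/module/intel_idle/parameters/max_cstate"
    if [ -r "$p" ]; then
        echo "intel_idle.max_cstate: $(cat "$p")"
    fi

    # Idle states CPU 0 may enter; a C-state limit typically shortens this list
    for s in "$root"/sys/devices/system/cpu/cpu0/cpuidle/state*/name; do
        [ -r "$s" ] && echo "idle state: $(cat "$s")"
    done
    return 0
}
```

Run cstate_report with no argument on the affected machine. On Fedora, boot options like these can be added to all installed kernels with grubby --update-kernel=ALL --args="intel_idle.max_cstate=1".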

The Intel hardware watchdog is also running on the device, but it doesn't react to this state, seemingly because its feeder process keeps running and kicking the dog at regular intervals.

hardware:
Lenovo M710q/M910q devices
The devices use Intel Kaby Lake chipsets paired with Skylake processors.

processors:
Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz (M910q)
Intel(R) Pentium(R) CPU G4400T @ 2.90GHz (M710q)
chipset:
Intel Q270 (Kaby Lake)

kernel: 4.17.14-100
RAM: 4 GB
 
11-15-2018, 10:12 PM   #2
frankbell
LQ Guru
 
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Ubuntu MATE, Mageia, and whatever VMs I happen to be playing with
Posts: 19,323
Blog Entries: 28

Rep: Reputation: 6141
Does the freeze happen when you are using another browser?
 
11-16-2018, 09:09 AM   #3
perlhackr
LQ Newbie
 
Registered: Jul 2006
Posts: 3

Original Poster
Rep: Reputation: 0
Hello Frankbell,

Thank you for taking the time to reply.

No, I haven't tried other browsers; I'm very familiar with Chromium, so I was hoping to stick with it.

I've been running this kiosk-style web application in Chromium on many other devices (Sandy Bridge, Ivy Bridge and Broadwell) without this issue. So I suspect it's not the browser but something related to new hardware and kernel support, although I could be wrong... Thoughts?

My big hurdle is that I can't reproduce the semi-lockup. I have no logs that shed light on what might be happening, and I haven't been able to make it happen often enough to gain any real traction.

My current attempt to gather info has been to write a small Node.js web service that doesn't need disk access while running. If it is still alive when the incident happens, it exposes an API that lets me reboot the machine and fetch logs. The log request just pulls the kernel ring buffer (dmesg); the reboot uses the magic SysRq keys.
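The two endpoints described above boil down to a couple of privileged operations. As a shell sketch (not the poster's Node.js code; kernel_log and sysrq_reboot are hypothetical names, and whatever calls them would need to run as root), assuming the standard /proc/sysrq-trigger interface:

```shell
#!/bin/bash
# Hypothetical helpers behind a "get logs" / "reboot" endpoint.
# SYSRQ_TRIGGER can be overridden for testing; on a real system it is
# /proc/sysrq-trigger, which root may write regardless of kernel.sysrq.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Last N lines of the kernel ring buffer (default 200).
kernel_log() {
    dmesg | tail -n "${1:-200}"
}

# Emergency reboot in the classic SysRq order:
#   s = emergency sync, u = remount filesystems read-only, b = reboot now.
# The pauses give the sync a chance to finish. `sleep` is an external
# binary, so if the disk is wedged it could block before the reboot.
sysrq_reboot() {
    local key
    for key in s u b; do
        echo "$key" > "$SYSRQ_TRIGGER"
        sleep 1
    done
}
```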

Some other things to note:
- the system always loses its DHCP address, although it still responds on a virtual NIC alias (eth0:1) that I created.
- during regular operation of the software, lm-sensors runs periodically to gather temperature data. Could I2C bus errors be putting the device into some odd state?

Thanks!
 
02-01-2019, 07:58 PM   #4
beard5849
LQ Newbie
 
Registered: Feb 2019
Location: Sydney Australia
Distribution: Fedora
Posts: 17

Rep: Reputation: Disabled
Problem diagnosed

Starting with kernel 4.15.x, the kernel team has been introducing power management changes to conserve power on laptops.
This affects graphics drivers.
On my Ubuntu 16.04 system here (ASRock J1900-ITX), I disabled the GUI console and run text mode only.
Likewise Fedora 29 on my original Banana Pi, kernel 4.18.0-0.rc8.git1.1.fc29.armv7hl:
"uptime" is now more than 30 days.

Kernel team, please get the GUI console working with the Allwinner V40 and R40 SoCs.
Yes, it's a u-boot problem. I've asked the u-boot team; they're busy with 64-bit.

Alan VK2ZIW
 
02-28-2019, 08:32 AM   #5
Poltergeist_85
LQ Newbie
 
Registered: Feb 2019
Posts: 5

Rep: Reputation: Disabled
I'm also facing the issue...

Hello perlhackr,

I've seen your post seeking a fix because I face the exact same problem on multiple Intel-based computers (Intel NUC 5i3, 6i3 and 7i3).
I was wondering whether you've found a fix for the problem you described since your post.

I'm using almost the same setup: Chromium in kiosk mode to display information and video to visitors.
The devices run a 4.15 kernel (Xubuntu 18.04) and freeze about once per day.

When this happens, the image freezes and nothing on the device responds (no keyboard, no mouse, etc.); a hard reboot is required.
When I consult the syslog, I can see that the system stopped logging when the problem happened, and usually the last line it was able to log is a run of @^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^.
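A run of ^@ characters like this is how editors and pagers display NUL bytes. At the tail of a log after a hard freeze, it typically means the file had already grown on disk but the data that belonged there was never written. A minimal sketch for counting that padding and reading the last lines that did reach the disk; count_nuls and last_real_lines are hypothetical helpers, and /var/log/syslog is the usual path on Xubuntu:

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a log cut off by a hard freeze.

# Count NUL bytes in a file (they display as ^@ in less or vim).
count_nuls() {
    echo $(( $(tr -cd '\000' < "$1" | wc -c) ))
}

# Print the last N (default 20) lines with the NUL padding stripped,
# i.e. what the system managed to log before it froze.
last_real_lines() {
    tr -d '\000' < "$1" | tail -n "${2:-20}"
}
```

For example, last_real_lines /var/log/syslog 50 shows the 50 lines logged just before the freeze.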

I've also tried the intel_idle.max_cstate=1 kernel boot option, but I'm still facing the issue.

I've upgraded to a 4.19 kernel and it seems to freeze less often, but it still happens.

Any information would be highly appreciated.

Thanks,

Poltergeist
 
02-28-2019, 10:05 PM   #6
beard5849
LQ Newbie
 
Registered: Feb 2019
Location: Sydney Australia
Distribution: Fedora
Posts: 17

Rep: Reputation: Disabled
Did you read my previous post?

Try changing the graphics card, even to an older one. It's all I can offer at this stage.

Alan
 
03-01-2019, 02:41 AM   #7
Poltergeist_85
LQ Newbie
 
Registered: Feb 2019
Posts: 5

Rep: Reputation: Disabled
Hi Alan,

I've read your post, but are you suggesting running a desktop workstation from the command line only, or have I misunderstood?
This is obviously not the kind of solution I'm looking for.

You also mention the power management changes introduced by the kernel team; could using a kernel older than 4.15 be an option?

Regards,

Polter
 
03-01-2019, 02:09 PM   #8
beard5849
LQ Newbie
 
Registered: Feb 2019
Location: Sydney Australia
Distribution: Fedora
Posts: 17

Rep: Reputation: Disabled
Exactly. Here is the history log for my Ubuntu box on an ASRock Q1900-ITX motherboard:

2018
Dec 5  Still failed. Turned off graphics:
         systemctl isolate multi-user.target
         systemctl enable multi-user.target
         systemctl set-default multi-user.target
       Revert if needed: systemctl set-default graphical.target

uptime today 2/2/19
06:52:36 up 67 days, 11:26, 1 user, load average: 0.00, 0.00, 0.00

Before that, "uptime" was only two days. I had tried screensaver settings
and put in a PCI-E SATA controller for the disk.

On this box I don't need the GUI desktop. It's my NAS.

I suspect it's a console issue: when the GUI is loaded, writes to the system console
fail after perhaps one page and the system just freezes. Dead, no-logs dead.

The box I'm writing on here runs kernel 4.4.9-300.fc23.x86_64
and just keeps running 24x7.

Alan
 
03-02-2019, 12:29 PM   #9
perlhackr
LQ Newbie
 
Registered: Jul 2006
Posts: 3

Original Poster
Rep: Reputation: 0
No solution as of yet (watchdog and question)

Hello all,

I've been using the 4.19 kernel as well, and lockups are occurring less often. I've also made sure that I'm using the ext4 mount option data=ordered, as I suspect a disk I/O (read/write) issue: the failures occur much less when the system is undergoing a lot of disk activity. In data=ordered mode (the ext4 default) only metadata goes through the journal, and file data is forced to disk before the related metadata is committed. data=journal, by contrast, writes both data and metadata through the journal, which means more writes but stronger consistency if power is lost mid-write.
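The ext4 journaling mode in effect can be read from the mount options. A minimal sketch; ext4_data_mode is a hypothetical helper, and it assumes that a mount listing no data= option is running in ext4's normal default, ordered mode:

```shell
#!/bin/bash
# Hypothetical helper: work out the ext4 journaling mode from a mount
# option string such as the one printed by `findmnt -n -o OPTIONS /`.
ext4_data_mode() {
    case ",$1," in
        *,data=journal,*)   echo journal ;;
        *,data=writeback,*) echo writeback ;;
        *)                  echo ordered ;;  # no data= listed: the usual default
    esac
}
```

For example, ext4_data_mode "$(findmnt -n -o OPTIONS /)". To set a mode explicitly, data=ordered or data=journal goes in the options field of the filesystem's /etc/fstab entry. The mode generally cannot be switched on a live remount, so for the root filesystem it is usually passed on the kernel command line (rootflags=data=journal) and takes effect after a reboot.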

Also, I discovered that the standard Intel watchdog, iTCO_wdt, no longer works on any of the Kaby Lake platform devices I have tried (and I have quite a few), so I've been using softdog to handle these lockups for the time being. I've read that there's a newer watchdog driver that uses the ACPI watchdog table (wdat_wdt), although I've yet to figure out how to use it: it loads and does nothing (is it perhaps a loader that loads another watchdog and just becomes a generic interface to it?). From what I gather, it's supposed to provide a generic interface for managing the watchdog. ANY info on this would be appreciated. I'd also be interested in collaborating on a fix for the Intel one, if there is any interest.
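To see which watchdog driver, if any, actually registered a device (iTCO_wdt, softdog or wdat_wdt), the kernel's watchdog class in sysfs can be listed. A sketch assuming a kernel recent enough to expose the identity and timeout attributes under /sys/class/watchdog; list_watchdogs is a hypothetical helper, and its optional prefix argument exists only for testing:

```shell
#!/bin/bash
# Hypothetical helper: list the watchdog devices the kernel has registered.
# $1 is an optional path prefix (empty = live system). An empty listing
# after `modprobe wdat_wdt` suggests the driver found nothing to bind to.
list_watchdogs() {
    local root="${1:-}" d
    for d in "$root"/sys/class/watchdog/watchdog*; do
        [ -d "$d" ] || continue     # the glob matched nothing
        printf '%s: %s, timeout %ss\n' "${d##*/}" \
            "$(cat "$d/identity" 2>/dev/null)" \
            "$(cat "$d/timeout" 2>/dev/null)"
    done
}
```

wdctl from util-linux prints similar information for /dev/watchdog, and modprobe softdog loads the software fallback mentioned above.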

Regards,
 
03-02-2019, 02:15 PM   #10
beard5849
LQ Newbie
 
Registered: Feb 2019
Location: Sydney Australia
Distribution: Fedora
Posts: 17

Rep: Reputation: Disabled
Hi perlhackr,

Can you run these systems with a kernel version older than 4.15 and see whether they stay up?

That is, from before the power management changes for laptops and tablets were started.

Alan
 
03-04-2019, 02:20 AM   #11
Poltergeist_85
LQ Newbie
 
Registered: Feb 2019
Posts: 5

Rep: Reputation: Disabled
Hello perlhackr,

Thanks for your message.
I've been able to use the new watchdog (wdat_wdt) successfully.
Here is a basic way to use it:

1- Make sure that wdat_wdt is loaded at boot.
2- Write to /dev/watchdog at a regular interval.
I've noticed that one of my devices expects the watchdog to be fed every second, so I'm using the following bash infinite loop (please note that the interval has to be less than 1 second or the device will reboot).

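feed.sh
------------
#!/bin/bash
# Keep the watchdog fed. Open /dev/watchdog once and hold it open:
# reopening it on every pass makes the kernel log a warning each time
# the device is closed without the magic "V" character.
exec 3> /dev/watchdog

while true; do
    echo "1" >&3    # any write except "V" resets the countdown
    sleep 0.5s
done
-------------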

3- Either make a service out of this script or just start it from rc.local: "nohup /root/feed.sh &"

As soon as a crash happens, the device reboots by itself, as expected.
One of my devices (Celeron) reboots after 1 second and another one (i3) after 90 seconds, but it works!
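Steps 1 and 3 could be automated along these lines. A sketch assuming a systemd-based distribution and feed.sh at /root/feed.sh as above; install_watchdog_feeder, wdat_wdt.conf and watchdog-feed.service are hypothetical names, not something from the original post:

```shell
#!/bin/bash
# Hypothetical installer for steps 1 and 3: load wdat_wdt at boot and run
# /root/feed.sh as a systemd service. $1 is an optional root prefix
# (empty = the live system); file and unit names are illustrative.
install_watchdog_feeder() {
    local root="${1:-}"
    mkdir -p "$root/etc/modules-load.d" "$root/etc/systemd/system"

    # systemd-modules-load reads one module name per line from this directory
    echo wdat_wdt > "$root/etc/modules-load.d/wdat_wdt.conf"

    cat > "$root/etc/systemd/system/watchdog-feed.service" <<'EOF'
[Unit]
Description=Feed the hardware watchdog via /root/feed.sh

[Service]
ExecStart=/root/feed.sh
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}
```

After running it as root, systemctl daemon-reload && systemctl enable --now watchdog-feed.service starts the loop. As an alternative to a custom loop, systemd can feed the hardware watchdog itself when RuntimeWatchdogSec= is set in /etc/systemd/system.conf; only one process can hold /dev/watchdog open at a time, so use one approach or the other.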

Hope this will help,

Regards,

Polter

Last edited by Poltergeist_85; 03-04-2019 at 04:46 AM.
 
10-30-2019, 09:20 AM   #12
arivo
LQ Newbie
 
Registered: Oct 2019
Posts: 15

Rep: Reputation: Disabled
Hi Poltergeist_85, hi all,

We also have freezes, on NUC7 and NUC8 models. When a freeze happens, the NUC spams so much network traffic that it takes down other devices on the same switch.
We also wanted to use the hardware watchdog, but iTCO_wdt does not work (it does not count down) and wdat_wdt does not even create a /dev/watchdog.

You said you were able to use wdat_wdt. Can you tell us what you did to get it working?

Regards,
Thomas
 
10-31-2019, 10:27 AM   #13
Poltergeist_85
LQ Newbie
 
Registered: Feb 2019
Posts: 5

Rep: Reputation: Disabled
Hi Thomas,

I think that only the "Pro" NUC models have the watchdog.
I'm personally using the NUC7i3DNKE. The 7i3BNK/BNH apparently have no watchdog timer component.

Have a look at the results of my tests in the attached image.

Regards,

Polter
Attached image: watchdog test.jpg
 
10-31-2019, 11:20 AM   #14
arivo
LQ Newbie
 
Registered: Oct 2019
Posts: 15

Rep: Reputation: Disabled
Thank you, Polter, for answering so quickly.

That's bad, as we mainly use the NUC7i3BNH and NUC7i3DNH. I can't find any information on which NUCs have a watchdog timer. Did you find any reliable information, or did you just run your own tests?

Regards,
Thomas
 
10-31-2019, 11:24 AM   #15
Poltergeist_85
LQ Newbie
 
Registered: Feb 2019
Posts: 5

Rep: Reputation: Disabled
I sell about 2,000 NUCs per year through my company, and even though I have good contacts at Intel, nobody there was able to provide information about the watchdog.
So I read a lot about watchdogs on Linux and had to run the tests myself. I tested every model I had in stock; that's how I made the table you've seen.
Not tested yet on 8i3 / 9i3.
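Since wdat_wdt only has something to drive when the firmware provides an ACPI WDAT (Watchdog Action Table), one quick per-model test is to look for that table before trying the driver. A minimal sketch, assuming ACPI tables are exposed under /sys/firmware/acpi/tables; has_wdat is a hypothetical helper, and a missing table would be consistent with wdat_wdt loading without creating /dev/watchdog:

```shell
#!/bin/bash
# Hypothetical check: does the firmware publish an ACPI WDAT table?
# $1 is an optional path prefix (empty = live system).
has_wdat() {
    [ -e "${1:-}/sys/firmware/acpi/tables/WDAT" ]
}

if has_wdat; then
    echo "WDAT table present: wdat_wdt should have something to bind to"
else
    echo "no WDAT table: wdat_wdt will likely load but register no watchdog"
fi
```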
 
All times are GMT -5.