LinuxQuestions.org
Old 10-12-2018, 08:10 PM   #1
danch
LQ Newbie
 
Registered: Oct 2018
Posts: 5

Rep: Reputation: Disabled
video corruption when reading from disk, Debian jessie


I recently upgraded a machine from Debian wheezy to Debian jessie. After the upgrade, if I run a command like

Code:
find . -type f -exec cat {} \; > /dev/null
from a large directory, I get video corruption after 1 or 2 seconds. It's completely reproducible.
It happens under X, and even in a virtual terminal with X not running and the nvidia module never loaded.
It happens when reading from an ext4 partition on sdb or from an xfs partition on sdd.
It happens under the default jessie kernel 3.16.0-7-686-pae, under the still-installed wheezy kernel 3.2.0-6-686-pae, and under the 4.9 kernel also available for jessie, 4.9.0-0.bpo.7-686.
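
In case it helps anyone reproduce this, here is a minimal sketch of the same read test wrapped in a shell function. The name `read_all` and the file count it prints are illustrative additions and not part of the command I actually ran; it reads every regular file under a directory, discards the data, and reports how many files were read.

```shell
#!/bin/sh
# read_all DIR: read every regular file under DIR, throw the data away,
# and print the number of files that were read successfully.
# (read_all is a hypothetical helper name used only for this sketch.)
read_all() {
    dir=${1:-.}
    # Same access pattern as the find command above: one cat per file.
    # Each successful read emits a single 'x' so the files can be counted.
    find "$dir" -type f -exec sh -c 'cat "$1" > /dev/null && printf x' sh {} \; \
        | wc -c | tr -d ' '
}
```

Running it as `read_all .` from the large directory should generate the same disk load as the one-liner.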

The machine has been in regular use as a backup server and media server for many years running wheezy, without problems. The problems started right after the first boot into jessie.

I booted into System Rescue CD 3.9.2 on a thumb drive. I got similar vt corruption at the start, while it read the USB drive before starting to boot. But then as soon as the kernel started booting, the corruption was gone, and my find test didn't cause further problems.

When the corruption happens in a virtual terminal, nothing shows up in dmesg. If X is running, then there are error messages in dmesg and the machine freezes up soon after. I'll paste these messages below.

One time, I got read errors from the hard drive, but SMART tests and later checks showed that the drives were fine.

Hardware:

Gigabyte GA-E7AUM-DS2H motherboard, with onboard NVidia GeForce 9400 graphics
Intel Core2Duo E7400 2.8GHz 65W
2x2G Kingston 800MHz RAM
4 SATA hard drives

I ran memtest directly from grub for several hours without a problem.

dmesg output when corruption happens and X is running:

Code:
[  360.980157] NVRM: GPU at PCI:0000:02:00: GPU-579d39ac-2eaa-3c97-d407-7d020ce553e2
[  360.980164] NVRM: Xid (PCI:0000:02:00): 8, Channel 0000007e
[  362.980103] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[  366.980179] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[  368.981729] [sched_delayed] sched: RT throttling activated
[  424.682600] perf interrupt took too long (2513 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[  436.996006] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 57.899 msecs
[  436.996006] perf interrupt took too long (454980 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[  437.924150] perf interrupt took too long (451438 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
[  438.792378] perf interrupt took too long (447921 > 20000), lowering kernel.perf_event_max_sample_rate to 6250
[  439.718485] perf interrupt took too long (896752 > 38461), lowering kernel.perf_event_max_sample_rate to 3250
[  440.470959] perf interrupt took too long (889756 > 71428), lowering kernel.perf_event_max_sample_rate to 1750
[  441.281296] perf interrupt took too long (882819 > 125000), lowering kernel.perf_event_max_sample_rate to 1000
[  442.149523] perf interrupt took too long (875931 > 250000), lowering kernel.perf_event_max_sample_rate to 500
[  443.017769] perf interrupt took too long (869098 > 500000), lowering kernel.perf_event_max_sample_rate to 250
[  443.885975] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 115.795 msecs
[  444.518713] hpet1: lost 6 rtc interrupts
[  444.692362] hpet1: lost 10 rtc interrupts
[  444.866006] hpet1: lost 10 rtc interrupts
[  455.747847] NVRM: Xid (PCI:0000:02:00): 1, Channel 00000001 Method 00000000 Data 00006861
[  457.992006] INFO: rcu_sched self-detected stall on CPU { 0}  (t=5250 jiffies g=10451 c=10450 q=1182)
[  457.992006] sending NMI to all CPUs:
[  457.992006] NMI backtrace for cpu 0
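
A small sketch for pulling just the NVRM Xid events out of a log like the one above, assuming the line format shown there. The function name `xid_events` is made up for this example; it reads dmesg-style text on standard input and prints one "timestamp code" pair per Xid line. The meaning of each code would still need to be looked up in NVIDIA's Xid documentation.

```shell
#!/bin/sh
# xid_events: filter dmesg-style input down to NVRM Xid events.
# Prints "<seconds-since-boot> <xid-code>" for each matching line.
# (xid_events is a hypothetical helper name; the pattern assumes the
# "[ time] NVRM: Xid (PCI:...): N, ..." format seen in the log above.)
xid_events() {
    sed -n 's/^\[ *\([0-9.]*\)\] NVRM: Xid ([^)]*): \([0-9]*\),.*/\1 \2/p'
}
```

Typical use would be `dmesg | xid_events`, which for the log above should list only the two Xid lines.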
I'm happy to provide any other info or try any suggestions, but I thought I'd start with this.

Thanks for any help trying to figure this out!
Attached Thumbnails: corruption.jpg (179.3 KB)

Last edited by danch; 10-13-2018 at 07:07 PM. Reason: update hardware list
 
Old 10-13-2018, 07:04 AM   #2
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,696

Rep: Reputation: 232
It looks like this is the nvidia module. It is clearly loaded, because it is complaining (the NVRM prefix comes from this driver):
Quote:
[ 362.980103] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Please try removing or blacklisting it, and check whether the problem still happens.
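
A minimal sketch of the blacklisting step, under some assumptions: the helper name `write_blacklist` is made up, and the module name `nvidia` is a guess that should be checked against `lsmod`, since packaged drivers sometimes use a different module name. It writes a one-line blacklist file into a modprobe configuration directory (default `/etc/modprobe.d`).

```shell
#!/bin/sh
# write_blacklist MODULE [DIR]: create DIR/blacklist-MODULE.conf containing
# a single "blacklist MODULE" line. DIR defaults to /etc/modprobe.d.
# (write_blacklist is a hypothetical helper; the module name is an assumption.)
write_blacklist() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    printf 'blacklist %s\n' "$module" > "$dir/blacklist-$module.conf"
}
```

Run as root, `write_blacklist nvidia` followed by `update-initramfs -u` and a reboot should stop the module from being auto-loaded. Note that a blacklist entry only blocks loading by alias, so anything that requests the module by name can still load it.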
 
Old 10-13-2018, 07:33 AM   #3
danch
LQ Newbie
 
Registered: Oct 2018
Posts: 5

Original Poster
Rep: Reputation: Disabled
The problem also happens if I rename the nvidia modules and reboot, but then there are no messages in dmesg, and the system remains stable other than the corruption of the virtual terminal.
 
Old 10-16-2018, 08:40 AM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,629

Rep: Reputation: 7265
I had to remove all the X-related packages from my Ubuntu after a dist upgrade and reinstall them. That will probably help you too.
But I do not really understand: is the virtual terminal something running inside X, or what?

What about switching to console (Ctrl-Alt-F1) and back again?
 
Old 10-16-2018, 08:48 AM   #5
danch
LQ Newbie
 
Registered: Oct 2018
Posts: 5

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
I had to remove all the X-related packages from my Ubuntu after a dist upgrade and reinstall them. That will probably help you too.
But I do not really understand: is the virtual terminal something running inside X, or what?

What about switching to console (Ctrl-Alt-F1) and back again?
By a "virtual terminal" I mean the console you get with Ctrl-Alt-F1. I get corruption in that console when X is not even running and the nvidia driver has not been loaded into the kernel. So reinstalling packages related to X shouldn't affect that.
 
Old 10-16-2018, 08:55 AM   #6
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,629

Rep: Reputation: 7265
Anyway, it looks like the upgrade was not really successful.
Did you try running the reset command? Does it help?
 
Old 10-16-2018, 09:25 AM   #7
danch
LQ Newbie
 
Registered: Oct 2018
Posts: 5

Original Poster
Rep: Reputation: Disabled
Typing the
Code:
reset
command in the virtual terminal doesn't improve anything. I think the corruption is at a much lower level than terminal settings. See the screenshot I attached to the original question for how it typically looks. Many of the coloured boxes are flashing.
 
Old 10-17-2018, 01:09 AM   #8
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,629

Rep: Reputation: 7265
That is corruption of the buffer where the screen "content" is stored. Usually reset forces it to be cleared, but as you said, this is at a lower level. I would suggest booting from a live CD or something similar, to see whether that works. I still think a clean reinstall may help.
 
Old 12-01-2018, 08:03 PM   #9
danch
LQ Newbie
 
Registered: Oct 2018
Posts: 5

Original Poster
Rep: Reputation: Disabled
Although the corruption problems started right after an OS upgrade, I'm now pretty sure they were caused by a faulty motherboard. I replaced the motherboard, CPU and RAM, but kept the same hard drives and the same installation of Debian, and the problem went away.
 
  

