I recently upgraded a machine from Debian wheezy to Debian jessie. After the upgrade, if I run a command like
Code:
find . -type f -exec cat {} \; > /dev/null
from a large directory, I get video corruption after 1 or 2 seconds. It's completely reproducible.
It happens under X or even in a virtual terminal with X not running and the nvidia module never loaded.
It happens when reading from an ext4 partition on sdb or on an xfs partition on sdd.
It happens under the default jessie kernel (3.16.0-7-686-pae), under the still-installed wheezy kernel (3.2.0-6-686-pae), and under the 4.9 kernel also available in jessie (4.9.0-0.bpo.7-686).
The machine has been in regular use as a backup server and media server for many years running wheezy, without problems. The problems started right after the first boot into jessie.
I booted into System Rescue CD 3.9.2 from a thumb drive. I got similar VT corruption at the very start, while it was reading the USB drive before the kernel began to boot. But as soon as the kernel started booting, the corruption was gone, and my find test didn't cause further problems.
When the corruption happens in a virtual terminal, nothing shows up in dmesg. If X is running, then there are error messages in dmesg and the machine freezes up soon after. I'll paste these messages below.
One time, I got read errors from the hard drive, but SMART tests and later checks showed that the drives were fine.
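To make the test easier to repeat across kernels, the read load and the dmesg check can be wrapped in two small shell functions. This is only a sketch: the names read_test and new_kernel_lines and the file names in the usage comment are made up for illustration, and reading the kernel log with dmesg may require root depending on how the system is configured.

```shell
#!/bin/sh
# Sketch only: read_test and new_kernel_lines are hypothetical helper names.

# Same load as the find command above: read every regular file under a
# directory (default: current directory) and throw the data away.
read_test() {
    find "${1:-.}" -type f -exec cat {} \; > /dev/null
}

# Print the lines that appear in the second dmesg snapshot but not in the
# first one, i.e. what the kernel logged between the two snapshots.
# diff exits non-zero when the files differ, so its status is ignored.
new_kernel_lines() {
    { diff "$1" "$2" || true; } | sed -n 's/^> //p'
}

# Possible usage (file names are placeholders):
#   dmesg > before.txt
#   read_test /some/large/directory
#   dmesg > after.txt
#   new_kernel_lines before.txt after.txt
```

In a virtual terminal without X this would be expected to print nothing, matching what I described above, but having the two snapshots on disk makes it easy to compare runs.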
Hardware:
Gigabyte GA-E7AUM-DS2H motherboard, with onboard NVIDIA GeForce 9400 graphics
Intel Core2Duo E7400 2.8GHz 65W
2x2GB Kingston 800MHz RAM
4 SATA hard drives
I ran memtest directly from grub for several hours without a problem.
dmesg output when corruption happens and X is running:
Code:
[ 360.980157] NVRM: GPU at PCI:0000:02:00: GPU-579d39ac-2eaa-3c97-d407-7d020ce553e2
[ 360.980164] NVRM: Xid (PCI:0000:02:00): 8, Channel 0000007e
[ 362.980103] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[ 366.980179] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[ 368.981729] [sched_delayed] sched: RT throttling activated
[ 424.682600] perf interrupt took too long (2513 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 436.996006] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 57.899 msecs
[ 436.996006] perf interrupt took too long (454980 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[ 437.924150] perf interrupt took too long (451438 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
[ 438.792378] perf interrupt took too long (447921 > 20000), lowering kernel.perf_event_max_sample_rate to 6250
[ 439.718485] perf interrupt took too long (896752 > 38461), lowering kernel.perf_event_max_sample_rate to 3250
[ 440.470959] perf interrupt took too long (889756 > 71428), lowering kernel.perf_event_max_sample_rate to 1750
[ 441.281296] perf interrupt took too long (882819 > 125000), lowering kernel.perf_event_max_sample_rate to 1000
[ 442.149523] perf interrupt took too long (875931 > 250000), lowering kernel.perf_event_max_sample_rate to 500
[ 443.017769] perf interrupt took too long (869098 > 500000), lowering kernel.perf_event_max_sample_rate to 250
[ 443.885975] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 115.795 msecs
[ 444.518713] hpet1: lost 6 rtc interrupts
[ 444.692362] hpet1: lost 10 rtc interrupts
[ 444.866006] hpet1: lost 10 rtc interrupts
[ 455.747847] NVRM: Xid (PCI:0000:02:00): 1, Channel 00000001 Method 00000000 Data 00006861
[ 457.992006] INFO: rcu_sched self-detected stall on CPU { 0} (t=5250 jiffies g=10451 c=10450 q=1182)
[ 457.992006] sending NMI to all CPUs:
[ 457.992006] NMI backtrace for cpu 0
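Since the NVRM Xid lines are probably the easiest part of this output to search for, a filter like the following can pull just the Xid numbers out of a saved log. Again only a sketch: xid_codes is a made-up name, and the pattern assumes the exact message format shown in the output above.

```shell
#!/bin/sh
# Sketch only: xid_codes is a hypothetical helper name.
# Reads kernel log text on stdin and prints one Xid number per matching
# line, assuming lines of the form:
#   NVRM: Xid (PCI:0000:02:00): 8, Channel 0000007e
xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\).*/\1/p'
}

# Possible usage:
#   dmesg | xid_codes | sort -n | uniq -c
```

For the output pasted above, this would pick up the two Xid lines (numbers 8 and 1).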
I'm happy to provide any other info or try any suggestions, but I thought I'd start with this.
Thanks for any help trying to figure this out!