Linux - Hardware This forum is for Hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux? |
Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
|
 |
|
10-15-2018, 04:31 PM
|
#1
|
Member
Registered: Apr 2010
Distribution: FC17
Posts: 383
Rep:
|
Can HP StorageWorks 215 tape cause FC26 KDE GUI to be stuck?
Hello!
For quite a few years I perform full backups using a HP StorageWorks 215 (LTO 1) tape,
about 18 times so far and each time it worked flawlessly in the following way:
I open a "konsole", become "root" and call a Perl wrapper which calculates timestamp
and calls "tar" by a command similar to:
Code:
<root localhost>.../backup>backup_tar -full
Running system command: "/bin/tar -X /root/backup/exclude.conf --exclude=/root/backup/log/full_2018_10_15_2143 --xattrs --numeric-owner --exclude-caches -cvpMf /dev/st0 / >& /root/backup/log/full_2018_10_15_2143"...
Another "konsole" is opened (also as "root") in the directory of the log file,
where I open the log file with "nedit" and from time to time I move the cursor over
"nedit", get "file modified by another program Reload/Cancel" prompt, click "reload" and
check what files it's backing up now. I also see there the prompts to insert the next volume
(it's a multi volume backup), when I see one I press "eject" on the tape drive panel, replace
the cartrige and press
"Enter" in the "konsole" where the backup command is running.
Also in the log file "konsole" I run from time to time a Perl script which calculates the sum
of file sizes listed in the log:
Code:
<root localhost>.../log>du_f -G full_2018_10_15_2143
Total files size: 6 G.
Except the first time it's just "Arrow up" followed by "Enter".
The time it takes to fill one LTO 1 (100 GB) cartrige is about 6 hours (it's a PC built in 2009:
AMD Phenom X4, ASUS M3A79T deluxe mboard, 16 GB RAM, ATI FireGL V5600 video card,
currently running FC26+KDE),
the number of volumes is about 10 so it's a multi-day process: start, wait 6 hours (or go to work
or to sleep), check "insert next volume, press Enter when ready" prompt in "nedit", eject used tape,
insert next, press "Enter" in the "console" of the "tar" command.
But the last time I attempted to perform the same backup process I've got problems:
the first 2 volume changes went OK but on the 3-rd one I moved the mouse pointer over the
"nedit", got the usual "file changed by other program, reload/cancell" prompt
but when I clicked on "Reload" - nothing happened, the prompt remained as if I didn't
click on it. I clicked on the "konsole" of the log file directory - it changed focus to
front, as expected, I pressed "Arrow up" - got the last command "du_f -G <log name>"
but when I pressed "Enter" - again, nothing happened.
I clicked the "close" button of "nedit" but it ignored me. So I attempted to "blindly"
change the tape - pressed "Eject", waited for the tape rewind to end and tape to eject,
replaced the tape, clicked on the "konsole" where the "tar" command is running -
it changed focus to "front" but when I pressed "Enter" on it - again, it had been ignored,
nothing happened.
At that point I discovered that I can't close any "konsole" (only focus changes of already
opened windows), can't open any new "konsole" (in fact - anything). So I decided
to shutdown - pressed "Eject", took the tape out (thank God, at least this worked),
attempted RMB on the desktop to bring the shutdown menu - nothing,
attempted going to text mode by Ctrl+Alt+F3 to shutdown from there - nothing.
As a last resort I had to brutally power down the PC.
After waiting some time and powering up I attempted again but this time I didn't even got
to the first cartrige replace prompt -
when I've got back to the PC after few hours I found it in the same semi-responsive state.
Again had to power down the PC and after rebooting and checking the last write time
of the log file I saw that it stopped being updated about 2 hours after the backup start.
The fact that initially the problem appeared before the 3-rd cartride change and later
started appearing during the first cartrige also points in the direction of the
backup process, in particular - the tape. No error LEDs on the tape drive, only the "Ready"
blinking during the backup and is "ON" steady when I get back to a "semi-stuck" GUI.
REAL TIME UPDATE: while I was writing this post I was also running another backup attempt.
After about 1.5 hours suddenly I noticed in the log some I/O errors like:
Code:
/home/kaza/GRAPHICS/PICS/D/Video0208.3gp
/home/kaza/GRAPHICS/PICS/D/Photo0416.jpg
/home/kaza/GRAPHICS/PICS/D/Photo0542.jpg
/home/kaza/GRAPHICS/PICS/D/Video0025.3gp
/bin/tar: /home/kaza/GRAPHICS/PICS/D/Video0025.3gp: File shrank by 4145152 bytes; padding with zeros
/home/kaza/GRAPHICS/PICS/D/Photo0271.jpg
/bin/tar: /home/kaza/GRAPHICS/PICS/D/Photo0271.jpg: Read error at byte 0, while reading 2048 bytes: Input/output error
/home/kaza/GRAPHICS/PICS/D/Photo0179.jpg
/bin/tar: /home/kaza/GRAPHICS/PICS/D/Photo0179.jpg: Read error at byte 0, while reading 3072 bytes: Input/output error
/home/kaza/GRAPHICS/PICS/D/Video0097.3gp
/bin/tar: /home/kaza/GRAPHICS/PICS/D/Video0097.3gp: Read error at byte 0, while reading 512 bytes: Input/output error
after some time I heared a beep from the PC (can't tell if from the tape drive or something
else), the "Ready" LED stopped blinking and turned "ON" steady.
I hurriedly "copy-pasted" this post to a text file before noticing the familiar "semi-stuck"
GUI. But after about a minute or so I noticed the neither the "top" window not "gkrellm"
graphs (always opened at the side of my desktop to see what's going) are being updated
and the mouse pointer dsappeared. Now the GUI was "totally-stuck".
In the past I always started a backup from inserting a cleaning cartrige into the tape drive
but this time I've got "tape error" LED which is probably "cleanings count timeout".
I ordered a new one on e-bay but since it's expected to take a month to arrive I attempted
to backup anyway. Am I being punished for this? The tape drive hasn't been used for few months,
it might've gathered some dust.
But the most important question: why a problem with the backup tape drive causes the GUI to be stuck,
first partially and then completely?
My kernel version is:
Code:
<localhost>.../NOA>uname -a
Linux localhost.localdomain 4.15.6-200.spi_gpio.fc26.x86_64 #1 SMP Sat Mar 3 02:13:51 IST 2018 x86_64 x86_64 x86_64 GNU/Linux
Looking at the kernel name reminds me that I've made a privet kernel version with SPI and GPIO
support enabled to allow me programming through USB an SPI chip (CH341A). Can it somehow affect
KDE GUI being "stuck" when tape drive has errors?
TIA for any ideas (except the obvious "don't start backup before heads had been cleaned"),
kaza.
|
|
|
10-17-2018, 04:22 PM
|
#2
|
Senior Member
Registered: Sep 2017
Distribution: FreeBSD
Posts: 2,252
|
|
|
|
10-17-2018, 06:41 PM
|
#3
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,446
|
Next time it happens copy /proc/meminfo to a file and also get a grab of top - just the first 20 line is probably enough.
Post them here. Take a copy of meminfo just after boot for comparison as well.
|
|
|
10-18-2018, 02:31 PM
|
#4
|
Member
Registered: Apr 2010
Distribution: FC17
Posts: 383
Original Poster
Rep:
|
Thanks for the reply,
When this problem occurs the KDE eventually freezes up
so I guess I'll need to be lucky to catch the tape errors
just as they occur and before the whole KDE freezes up.
When tape errors occur and the KDE gradually "loses its bearings"
I don't remember I saw anything hogging close to 100% CPU (core)
but I'll try to capture the "top".
I'll do another backup try and see if I'll be fast enough to save the info
before KDE freezes up.
As a wild guess it seems to me that I/O errors of the SCSI tape
somehow prevent normal disk access of the OS (KDE), despite the fact that
OS resides on SAS disks connected to a different controller than the
tape. Here's the output of "lsscsi":
Code:
<localhost>.../temp>lsscsi
[0:0:0:0] cd/dvd HL-DT-ST DVDRAM GH24NSD1 LG00 /dev/sr0
[6:0:0:0] disk ICP KA_RAID_0 V1.0 /dev/sda
[6:0:1:0] disk ICP KA_RAID_1 V1.0 /dev/sdb
[6:1:0:0] disk FUJITSU MAU3147RC 0104 -
[6:1:1:0] disk FUJITSU MBA3147RC HPF1 -
[6:1:2:0] disk TOSHIBA MK2001TRKB 0105 -
[6:1:3:0] disk TOSHIBA MK2001TRKB 0105 -
[7:0:0:0] disk FUJITSU MAS3367NP 0104 /dev/sdc
[7:0:3:0] tape HP Ultrium 1-SCSI N2CG /dev/st0
The OS is installen on the pair of MAU3147RC connected as RAID 0
(MAS3367NP is some old data).
As I wrote, it's just a wild guess.
OK, logging off and performing the experiment...
|
|
|
10-20-2018, 02:31 PM
|
#5
|
Member
Registered: Apr 2010
Distribution: FC17
Posts: 383
Original Poster
Rep:
|
Well, the first experiment was a failure: for the first 2 hours the tape drive worked OK
so I stopped watching it continuosly and started "sampling" it once in few minutes.
After over 3 hours (damn Murphy with his laws...) I found the KDE almost stuck and needless
to say, no chance of copying the /proc/meminfo. I couldn't move any window and "copy paste"
the "top" window contents but from what was accidentally not obscured on it I could manually
copy the following:
Code:
up: 3:18
Tasks: 249 total, 2 running, 182 sleeping, 0 stopped, 0 zombie
%cpu: 0.6 us, 0.2 sy, 0.0 ni, 24.7 id, 74.2 wd, 0.1 hi, 0.0 si, 0.0 st
KiB Mem: 16424780 total, 172160 free, 1526880 used, 14725680 buff/cache
KiB swap: 33554428 total, 33554428 free, 0 used, 14524544 avail Mem
From the processes % cpu list:
kwin 2.0%
Xorg 1.3%
konsole 0.7%
kdeconnectd 0.3%
xembedsniproxy 0.3%
korgac 0.3%
akonadi_contr+ 0.3%
akonadi_birth+ 0.3%
akonadi_follo+ 0.3%
kworker/2:2H 0.3%
systemd 0.0%
I also noticed that the tape drive makes a movement sound for about 1 sec in duration
once in 4-5 minutes. Together with the "74.2%" CPU wasted on wait it looks like
the tape drive is endlessly attempting to get again and again some data it didn't
get the first time and the rest of KDE can't do anything while it happens.
(That's just my guess, feel free to correct me.)
I'll make another experiment, this time I'll write a Perl script which will
copy /proc/meminfo into a file with timestame in name once in 10 minutes
and also a "top" in batch mode will be saved to another file, also
with a timestamp in name.
Meanwhile we can only theoretize.
Does anyone know (or can point me to a knowledge source) how the LINUX
system accesses different disks/tapes connected to different
SCSI controllers (main disk - SAS, tape drive - SCSI) from the viewpoint
of queueing: is there a single gueue for requests of all disks+tape
or many queues working in parallel? Is there any configuration of this
which can be changed (even if it means kernel compilation configuration)?
TIA for ideas,
kaza.
|
|
|
10-20-2018, 06:01 PM
|
#6
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,446
|
Next time just grab your phone and take a photo - saves all the typing ... 
Thanks for the interrupt percentage - I was worried that it may be high; that runs at a priority above all else. No CPU is "wasted on wait" - that is merely an accounting convenience, a subset of "idle", but may (does in your case) indicate non-completed I/O.
Whilst you're collecting data, you might as well get vmstat.
Each device gets its own q - SSDs may get multi-queuing per device if you have a current kernel.
|
|
|
10-26-2018, 07:13 AM
|
#7
|
Member
Registered: Apr 2010
Distribution: FC17
Posts: 383
Original Poster
Rep:
|
OK, it took some time (life, etc.) to repeat
the experiment with periodically saved data (/proc/meminfo, top -b -n 1, dmesg -c)
and I don't see anything "sticking out" in the saved reports.
Again, after over 2 hours the tape stopped the backup operation, a short (~1 sec) beep
was audible (can it be some m-board error code?) and the GUI became first "partially
responsive" and after insuccessfull attempt to close the FireFox window (from past experiences -
that happened after attempt to close any window) the GUI became frozen.
The data save period was 10 minutes and the last timestamp was: 2018_10_26_1404.
Looking at the time stamp of the tar command log I see that it last write was at time 14:12
so 8 minutes passes from the last dump of the info to the system being stuck.
The last /proc/meminfo:
Code:
MemTotal: 16424720 kB
MemFree: 153424 kB
MemAvailable: 14569136 kB
Buffers: 143932 kB
Cached: 14157476 kB
SwapCached: 0 kB
Active: 1405552 kB
Inactive: 14333024 kB
Active(anon): 755192 kB
Inactive(anon): 487188 kB
Active(file): 650360 kB
Inactive(file): 13845836 kB
Unevictable: 48 kB
Mlocked: 48 kB
SwapTotal: 33554428 kB
SwapFree: 33554428 kB
Dirty: 2560 kB
Writeback: 0 kB
AnonPages: 1437372 kB
Mapped: 530472 kB
Shmem: 119444 kB
Slab: 320276 kB
SReclaimable: 256700 kB
SUnreclaim: 63576 kB
KernelStack: 10912 kB
PageTables: 64224 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 41766788 kB
Committed_AS: 5906472 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 161472 kB
DirectMap2M: 5081088 kB
DirectMap1G: 11534336 kB
The last top:
Code:
top - 14:04:30 up 2:59, 4 users, load average: 2.02, 1.72, 1.49
Tasks: 240 total, 2 running, 171 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.7 us, 0.8 sy, 0.0 ni, 93.4 id, 0.9 wa, 0.1 hi, 0.1 si, 0.0 st
KiB Mem : 16424720 total, 151184 free, 1651008 used, 14622528 buff/cache
KiB Swap: 33554428 total, 33554428 free, 0 used. 14567948 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5440 kaza 20 0 9312356 584156 168004 S 52.9 3.6 4:23.97 firefox
5509 kaza 20 0 2651412 460716 182280 S 17.6 2.8 4:18.17 Web Content
1048 root 20 0 435780 114984 86776 S 11.8 0.7 2:45.53 Xorg
3268 root 20 0 128808 3412 2644 D 5.9 0.0 1:22.58 tar
6283 root 20 0 164072 4356 3692 R 5.9 0.0 0:00.01 top
1 root 20 0 223924 10420 6720 S 0.0 0.1 0:01.65 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
7 root 20 0 0 0 0 S 0.0 0.0 0:00.05 ksoftirqd/0
8 root 20 0 0 0 0 I 0.0 0.0 0:00.91 rcu_sched
9 root 20 0 0 0 0 I 0.0 0.0 0:00.00 rcu_bh
10 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
14 root rt 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
15 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
16 root 20 0 0 0 0 S 0.0 0.0 0:00.02 ksoftirqd/1
19 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/2
20 root rt 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/2
21 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
22 root 20 0 0 0 0 S 0.0 0.0 0:00.09 ksoftirqd/2
25 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/3
26 root rt 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/3
27 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/3
28 root 20 0 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/3
31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdevtmpfs
32 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 netns
33 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_tasks_kthre
34 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kauditd
36 root 20 0 0 0 0 I 0.0 0.0 0:00.40 kworker/1:1
37 root 20 0 0 0 0 S 0.0 0.0 0:00.00 oom_reaper
38 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 writeback
39 root 20 0 0 0 0 S 0.0 0.0 0:01.87 kcompactd0
40 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd
41 root 39 19 0 0 0 S 0.0 0.0 0:00.00 khugepaged
42 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 crypto
43 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kintegrityd
44 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kblockd
46 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ata_sff
47 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 md
48 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 edac-poller
49 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 devfreq_wq
50 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 watchdogd
52 root 20 0 0 0 0 S 0.0 0.0 0:06.16 kswapd0
101 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kthrotld
103 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 acpi_thermal_pm
104 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
105 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tmf_0
106 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_1
107 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tmf_1
108 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_2
109 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tmf_2
110 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_3
111 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tmf_3
117 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker/3:2
118 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 dm_bufio_cache
119 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ipv6_addrconf
125 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kstrp
338 root 20 0 0 0 0 S 0.0 0.0 0:00.01 scsi_eh_4
340 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tmf_4
342 root 20 0 0 0 0 S 0.0 0.0 0:00.01 scsi_eh_5
343 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tmf_5
344 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_6
345 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tmf_6
346 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 firewire
347 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 firewire_ohci
355 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_7
357 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tmf_7
358 root 20 0 0 0 0 S 0.0 0.0 0:00.02 scsi_eh_8
360 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tmf_8
361 root 20 0 0 0 0 S 0.0 0.0 0:00.01 scsi_eh_9
362 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tmf_9
366 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ttm_swap
368 root 20 0 0 0 0 S 0.0 0.0 0:00.02 aacraid
451 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kdmflush
452 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 bioset
461 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kdmflush
462 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 bioset
480 root 20 0 0 0 0 S 0.0 0.0 0:00.10 jbd2/dm-0-8
481 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ext4-rsv-conver
564 root 20 0 156516 39052 38320 S 0.0 0.2 0:00.77 systemd-journal
585 root 20 0 58032 7752 3696 S 0.0 0.0 0:00.78 systemd-udevd
609 root 0 -20 0 0 0 I 0.0 0.0 0:00.71 kworker/1:1H
660 root 20 0 0 0 0 S 0.0 0.0 0:00.00 saa7134[0]
709 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kdmflush
710 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 bioset
716 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kdmflush
717 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 bioset
733 root 20 0 0 0 0 S 0.0 0.0 0:00.33 jbd2/dm-2-8
734 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ext4-rsv-conver
736 root 20 0 0 0 0 S 0.0 0.0 0:00.02 jbd2/dm-3-8
737 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ext4-rsv-conver
741 root 20 0 0 0 0 S 0.0 0.0 0:00.00 jbd2/sda1-8
742 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ext4-rsv-conver
764 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rpciod
765 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 xprtiod
768 root 16 -4 64048 1744 1328 S 0.0 0.0 0:00.01 auditd
789 root 20 0 25448 4188 3480 S 0.0 0.0 0:00.02 smartd
791 root 20 0 278072 8652 7640 S 0.0 0.1 0:00.03 sssd
792 root 20 0 723736 21832 21104 S 0.0 0.1 0:00.35 rsyslogd
793 dbus 20 0 80016 5964 4256 S 0.0 0.0 0:00.60 dbus-daemon
799 root 20 0 204764 3376 2924 S 0.0 0.0 0:00.01 gssproxy
805 root 20 0 284340 9048 7680 S 0.0 0.1 0:00.25 sssd_be
807 root 20 0 353296 32496 13124 S 0.0 0.2 0:00.91 firewalld
810 rtkit 21 1 186040 3340 3012 S 0.0 0.0 0:00.06 rtkit-daemon
811 root 20 0 99928 3096 2788 S 0.0 0.0 0:00.51 irqbalance
812 root 39 19 17140 1564 1420 S 0.0 0.0 0:00.00 alsactl
815 avahi 20 0 72228 3640 3284 S 0.0 0.0 0:00.02 avahi-daemon
816 root 20 0 415128 8540 6844 S 0.0 0.1 0:00.05 ModemManager
827 root 20 0 463960 9344 7980 S 0.0 0.1 0:00.03 abrtd
828 root 20 0 277656 10368 9124 S 0.0 0.1 0:00.25 sssd_nss
837 chrony 20 0 122508 3632 3176 S 0.0 0.0 0:02.13 chronyd
842 avahi 20 0 72228 356 0 S 0.0 0.0 0:00.00 avahi-daemon
846 root 20 0 71360 7476 5480 S 0.0 0.0 0:00.11 systemd-logind
847 root 20 0 417704 7680 6492 S 0.0 0.0 0:00.20 accounts-daemon
889 polkitd 20 0 559128 16472 10548 S 0.0 0.1 0:00.41 polkitd
897 root 20 0 762516 16092 14940 S 0.0 0.1 0:00.07 abrt-dump-journ
898 root 20 0 768956 9724 8452 S 0.0 0.1 0:00.04 abrt-dump-journ
899 root 20 0 768880 11020 9792 S 0.0 0.1 0:00.04 abrt-dump-journ
961 root 20 0 793264 16692 13892 S 0.0 0.1 0:00.42 NetworkManager
979 root 20 0 231732 10316 7940 S 0.0 0.1 0:00.07 cupsd
991 root 20 0 42588 2380 2176 S 0.0 0.0 0:00.00 atd
992 root 20 0 143312 3392 2772 S 0.0 0.0 0:00.32 crond
995 root 20 0 314032 14664 13336 S 0.0 0.1 0:00.04 sddm
1004 colord 20 0 428356 13264 8520 S 0.0 0.1 0:01.36 colord
1166 root 20 0 77096 8564 6660 S 0.0 0.1 0:00.03 dhclient
1238 root 20 0 477180 9348 7424 S 0.0 0.1 0:00.16 udisksd
1244 root 20 0 424228 8856 7632 S 0.0 0.1 0:00.12 upowerd
1268 root 20 0 324420 15384 13840 S 0.0 0.1 0:00.38 sddm-helper
1272 kaza 20 0 82900 8040 6756 S 0.0 0.0 0:00.17 systemd
1274 kaza 20 0 133028 3824 20 S 0.0 0.0 0:00.00 (sd-pam)
1288 kaza 20 0 431380 34852 30456 S 0.0 0.2 0:00.34 kwalletd
1289 kaza 20 0 140564 3088 2748 S 0.0 0.0 0:00.06 tcsh
1309 kaza 20 0 73752 6100 4104 S 0.0 0.0 0:01.04 dbus-daemon
1315 kaza 20 0 70160 788 0 S 0.0 0.0 0:00.01 ssh-agent
1329 kaza 20 0 120032 3152 2876 S 0.0 0.0 0:00.00 startkde
1356 kaza 20 0 13388 164 0 S 0.0 0.0 0:00.00 start_kdeinit
1357 kaza 20 0 263452 8312 6152 S 0.0 0.1 0:00.05 kdeinit5
1358 kaza 20 0 627632 35740 31484 S 0.0 0.2 0:00.68 klauncher
1361 kaza 20 0 1641360 71792 58416 S 0.0 0.4 0:02.15 kded5
1370 kaza 20 0 650680 33796 28936 S 0.0 0.2 0:00.59 kaccess
1384 kaza 20 0 190216 5588 4548 S 0.0 0.0 0:00.16 dconf-service
1386 kaza 20 0 88636 5368 4824 S 0.0 0.0 0:00.00 kwrapper5
1387 kaza 20 0 754148 43704 38012 S 0.0 0.3 0:01.07 ksmserver
1391 kaza 20 0 630112 35464 30036 S 0.0 0.2 0:00.66 kglobalaccel5
1404 kaza 20 0 401252 11596 10064 S 0.0 0.1 0:00.07 mission-control
1417 kaza 20 0 472428 18832 16808 S 0.0 0.1 0:00.42 kscreen_backend
1441 kaza 20 0 3321220 92032 71092 S 0.0 0.6 1:28.86 kwin_x11
1459 kaza 20 0 729336 44600 37776 S 0.0 0.3 0:01.32 kdeconnectd
1460 kaza 20 0 963324 85892 65972 S 0.0 0.5 0:02.11 krunner
1462 kaza 20 0 3992484 207080 99152 S 0.0 1.3 0:32.86 plasmashell
1463 kaza 20 0 805160 39308 34600 S 0.0 0.2 0:00.56 polkit-kde-auth
1464 kaza 20 0 638296 33984 29832 S 0.0 0.2 0:00.64 xembedsniproxy
1478 kaza 20 0 351352 21428 18108 S 0.0 0.1 0:16.42 gkrellm
1496 kaza 9 -11 1620020 15780 12184 S 0.0 0.1 0:12.25 pulseaudio
1539 kaza 20 0 22624 2772 2612 S 0.0 0.0 0:00.00 gam_server
1542 kaza 20 0 1021752 42008 34924 S 0.0 0.3 0:00.91 kactivitymanage
1583 kaza 20 0 838068 73940 61980 S 0.0 0.5 0:12.54 konsole
1584 kaza 20 0 690960 40776 33592 S 0.0 0.2 0:00.17 abrt-applet
1586 kaza 20 0 1201092 90368 75448 S 0.0 0.6 0:01.24 korgac
1587 kaza 20 0 814888 39964 35136 S 0.0 0.2 0:00.72 org_kde_powerde
1619 kaza 20 0 346932 5832 5328 S 0.0 0.0 0:00.01 at-spi-bus-laun
1628 kaza 20 0 72352 4564 3952 S 0.0 0.0 0:00.15 dbus-daemon
1631 kaza 20 0 220596 5496 4880 S 0.0 0.0 0:00.58 at-spi2-registr
1642 kaza 20 0 134576 3636 2840 S 0.0 0.0 0:00.03 tcsh
1659 kaza 20 0 398016 6376 5664 S 0.0 0.0 0:00.01 gvfsd
1661 kaza 20 0 164072 4316 3616 R 0.0 0.0 0:23.97 top
1666 root 20 0 360536 9776 8456 S 0.0 0.1 0:00.06 abrt-dbus
1716 kaza 20 0 637196 35604 31140 S 0.0 0.2 0:01.26 akonadi_control
1720 kaza 20 0 2649920 32280 27976 S 0.0 0.2 0:00.32 akonadiserver
1724 kaza 20 0 636264 67756 17464 S 0.0 0.4 0:03.66 mysqld
1767 kaza 20 0 732156 41780 36956 S 0.0 0.3 0:00.69 akonadi_akonote
1769 kaza 20 0 1234240 90760 75972 S 0.0 0.6 0:01.17 akonadi_archive
1771 kaza 20 0 745184 43252 38228 S 0.0 0.3 0:00.76 akonadi_birthda
1773 kaza 20 0 823712 44768 39644 S 0.0 0.3 0:01.37 akonadi_followu
1775 kaza 20 0 841468 45404 40144 S 0.0 0.3 0:01.37 akonadi_ical_re
1777 kaza 39 19 761776 45936 39804 S 0.0 0.3 0:00.81 akonadi_indexin
1781 kaza 20 0 732148 41768 36948 S 0.0 0.3 0:00.70 akonadi_maildir
1782 kaza 20 0 811204 42336 37352 S 0.0 0.3 0:01.28 akonadi_maildis
1783 kaza 20 0 1281160 93500 78284 S 0.0 0.6 0:01.30 akonadi_mailfil
1791 kaza 20 0 725196 40768 36040 S 0.0 0.2 0:00.66 akonadi_migrati
1798 kaza 20 0 1072820 82396 68312 S 0.0 0.5 0:00.95 akonadi_newmail
1800 kaza 20 0 1223656 88572 73884 S 0.0 0.5 0:01.74 akonadi_sendlat
1807 kaza 20 0 836684 44488 39548 S 0.0 0.3 0:01.37 akonadi_vcard_r
1810 kaza 20 0 762228 44004 38876 S 0.0 0.3 0:00.76 akonadi_vcard_r
1825 kaza 20 0 636776 37704 33384 S 0.0 0.2 0:00.59 kuiserver5
1904 root 20 0 554108 19752 16408 S 0.0 0.1 0:00.14 packagekitd
2048 kaza 20 0 564532 9120 8188 S 0.0 0.1 0:00.07 xdg-desktop-por
2053 kaza 20 0 715380 12240 10412 S 0.0 0.1 0:00.05 xdg-document-po
2056 kaza 20 0 491448 9944 8256 S 0.0 0.1 0:00.02 xdg-permission-
2073 kaza 20 0 603108 21548 18288 S 0.0 0.1 0:00.08 xdg-desktop-por
2078 kaza 20 0 1445556 37496 30616 S 0.0 0.2 0:00.24 evolution-sourc
2085 kaza 20 0 858616 29880 24892 S 0.0 0.2 0:00.09 goa-daemon
2093 kaza 20 0 526420 13120 11312 S 0.0 0.1 0:00.29 goa-identity-se
2098 kaza 20 0 1078472 49176 39676 S 0.0 0.3 0:00.38 evolution-alarm
2103 kaza 20 0 854588 80820 67652 S 0.0 0.5 0:05.62 konsole
2107 kaza 20 0 1053008 42196 33204 S 0.0 0.3 0:00.26 evolution-calen
2115 kaza 20 0 135152 4268 2888 S 0.0 0.0 0:00.17 tcsh
2145 kaza 20 0 1212464 38708 30372 S 0.0 0.2 0:00.20 evolution-calen
2155 kaza 20 0 1037380 37336 30796 S 0.0 0.2 0:00.22 evolution-addre
2158 kaza 20 0 1408884 40588 30544 S 0.0 0.2 0:00.25 evolution-calen
2186 kaza 20 0 1205792 37676 31280 S 0.0 0.2 0:00.17 evolution-addre
2203 kaza 20 0 474268 8816 7776 S 0.0 0.1 0:00.07 gnome-keyring-d
2688 root 20 0 268924 9604 7704 S 0.0 0.1 0:00.01 su
2697 root 20 0 143348 4172 3004 S 0.0 0.0 0:00.10 tcsh
2977 kaza 20 0 839744 76176 63852 S 0.0 0.5 0:03.29 konsole
2982 kaza 20 0 134580 3532 2712 S 0.0 0.0 0:00.04 tcsh
3009 root 20 0 268924 9648 7756 S 0.0 0.1 0:00.02 su
3018 root 20 0 143348 4216 3048 S 0.0 0.0 0:00.08 tcsh
3068 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker/1:2
3106 kaza 20 0 839372 75728 63624 S 0.0 0.5 0:03.39 konsole
3111 kaza 20 0 134580 3576 2756 S 0.0 0.0 0:00.04 tcsh
3138 root 20 0 268924 9568 7672 S 0.0 0.1 0:00.01 su
3147 root 20 0 142952 3796 2988 S 0.0 0.0 0:00.04 tcsh
3215 root 20 0 136868 4608 4172 S 0.0 0.0 0:00.01 periodic_status
3263 root 20 0 136996 4644 4072 S 0.0 0.0 0:00.00 backup_tar
3267 root 20 0 120032 2580 2404 S 0.0 0.0 0:00.00 sh
3276 root 20 0 212352 29588 9764 S 0.0 0.2 0:04.52 nedit
3642 root 20 0 0 0 0 I 0.0 0.0 0:00.35 kworker/3:0
4283 kaza 20 0 186040 5792 5260 S 0.0 0.0 0:00.02 gconfd-2
4623 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/1:2H
4681 root 0 -20 0 0 0 I 0.0 0.0 0:00.29 kworker/3:1H
4889 root 0 -20 0 0 0 I 0.0 0.0 0:00.21 kworker/2:1H
4960 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/3:2H
5171 root 0 -20 0 0 0 I 0.0 0.0 0:00.10 kworker/0:2H
5338 root 20 0 0 0 0 I 0.0 0.0 0:00.15 kworker/2:0
5437 root 20 0 0 0 0 I 0.0 0.0 0:00.24 kworker/0:1
5694 kaza 20 0 2001556 85788 57868 S 0.0 0.5 0:00.72 Web Content
5809 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0
5863 root 20 0 0 0 0 I 0.0 0.0 0:00.05 kworker/u8:0
6076 root 20 0 0 0 0 I 0.0 0.0 0:00.01 kworker/u8:2
6114 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H
6115 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/2:2H
6128 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker/2:2
The "dmesg -c" output remained empty.
I also looked at /var/log/messages, I only see messages related to network manager
until slightly after 14:05:30 (3.5 min after the last write of tar log file)
I see a binary data in the file, probably that's when I powered down the PC
(couldn't do anything to properly shutdown, Ctrl+Alt+F3 didn't help and neither
the Alt+SysRq(Prt Scr)+R-E-I-S-U-B):
Code:
Oct 26 13:39:07 localhost dhclient[1166]: DHCPREQUEST on enp4s0 to 10.0.0.138 port 67 (xid=0x385ead74)
Oct 26 13:39:08 localhost dhclient[1166]: DHCPACK from 10.0.0.138 (xid=0x385ead74)
Oct 26 13:39:08 localhost NetworkManager[961]: <info> [1540550348.0764] dhcp4 (enp4s0): address 10.0.0.1
Oct 26 13:39:08 localhost NetworkManager[961]: <info> [1540550348.0769] dhcp4 (enp4s0): plen 24 (255.255.255.0)
Oct 26 13:39:08 localhost dbus-daemon[793]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.9' (uid=0 pid=961 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Oct 26 13:39:08 localhost NetworkManager[961]: <info> [1540550348.0770] dhcp4 (enp4s0): gateway 10.0.0.138
Oct 26 13:39:08 localhost audit: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 26 13:39:08 localhost dhclient[1166]: bound to 10.0.0.1 -- renewal in 1556 seconds.
Oct 26 13:39:08 localhost NetworkManager[961]: <info> [1540550348.0770] dhcp4 (enp4s0): lease time 3600
Oct 26 13:39:08 localhost dbus-daemon[793]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct 26 13:39:08 localhost NetworkManager[961]: <info> [1540550348.0770] dhcp4 (enp4s0): nameserver '10.0.0.138'
Oct 26 13:39:08 localhost nm-dispatcher: req:1 'dhcp4-change' [enp4s0]: new request (6 scripts)
Oct 26 13:39:08 localhost NetworkManager[961]: <info> [1540550348.0770] dhcp4 (enp4s0): domain name 'Home'
Oct 26 13:39:08 localhost nm-dispatcher: req:1 'dhcp4-change' [enp4s0]: start running ordered scripts...
Oct 26 13:39:08 localhost NetworkManager[961]: <info> [1540550348.0770] dhcp4 (enp4s0): state changed bound -> bound
Oct 26 13:39:08 localhost systemd: Starting Network Manager Script Dispatcher Service...
Oct 26 13:39:08 localhost systemd: Started Network Manager Script Dispatcher Service.
Oct 26 13:39:18 localhost audit: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 26 13:53:10 localhost plasmashell: QXcbConnection: XCB error: 2 (BadValue), sequence: 37067, resource id: 98566161, major code: 141 (Unknown), minor code: 3
Oct 26 13:53:10 localhost plasmashell: QXcbConnection: XCB error: 2 (BadValue), sequence: 37133, resource id: 104857975, major code: 141 (Unknown), minor code: 3
Oct 26 14:01:27 localhost cupsd: REQUEST localhost - - "POST / HTTP/1.1" 200 183 Renew-Subscription successful-ok
Oct 26 14:01:59 localhost plasmashell: QXcbConnection: XCB error: 2 (BadValue), sequence: 40626, resource id: 98566161, major code: 141 (Unknown), minor code: 3
Oct 26 14:01:59 localhost plasmashell: QXcbConnection: XCB error: 2 (BadValue), sequence: 40644, resource id: 102760454, major code: 141 (Unknown), minor code: 3
Oct 26 14:01:59 localhost plasmashell: QXcbConnection: XCB error: 2 (BadValue), sequence: 40660, resource id: 104857975, major code: 141 (Unknown), minor code: 3
Oct 26 14:01:59 localhost plasmashell: QXcbConnection: XCB error: 2 (BadValue), sequence: 40680, resource id: 102760454, major code: 141 (Unknown), minor code: 3
Oct 26 14:02:00 localhost plasmashell: QXcbConnection: XCB error: 2 (BadValue), sequence: 40721, resource id: 104857975, major code: 141 (Unknown), minor code: 3
Oct 26 14:05:04 localhost dhclient[1166]: DHCPREQUEST on enp4s0 to 10.0.0.138 port 67 (xid=0x385ead74)
Oct 26 14:05:04 localhost dhclient[1166]: DHCPACK from 10.0.0.138 (xid=0x385ead74)
Oct 26 14:05:05 localhost NetworkManager[961]: <info> [1540551905.0447] dhcp4 (enp4s0): address 10.0.0.1
Oct 26 14:05:05 localhost NetworkManager[961]: <info> [1540551905.0451] dhcp4 (enp4s0): plen 24 (255.255.255.0)
Oct 26 14:05:05 localhost audit: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 26 14:05:05 localhost dbus-daemon[793]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.9' (uid=0 pid=961 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Oct 26 14:05:05 localhost NetworkManager[961]: <info> [1540551905.0451] dhcp4 (enp4s0): gateway 10.0.0.138
Oct 26 14:05:05 localhost dhclient[1166]: bound to 10.0.0.1 -- renewal in 1392 seconds.
Oct 26 14:05:05 localhost NetworkManager[961]: <info> [1540551905.0451] dhcp4 (enp4s0): lease time 3600
Oct 26 14:05:05 localhost dbus-daemon[793]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct 26 14:05:05 localhost NetworkManager[961]: <info> [1540551905.0451] dhcp4 (enp4s0): nameserver '10.0.0.138'
Oct 26 14:05:05 localhost nm-dispatcher: req:1 'dhcp4-change' [enp4s0]: new request (6 scripts)
Oct 26 14:05:05 localhost NetworkManager[961]: <info> [1540551905.0451] dhcp4 (enp4s0): domain name 'Home'
Oct 26 14:05:05 localhost nm-dispatcher: req:1 'dhcp4-change' [enp4s0]: start running ordered scripts...
Oct 26 14:05:05 localhost NetworkManager[961]: <info> [1540551905.0452] dhcp4 (enp4s0): state changed bound -> bound
Oct 26 14:05:05 localhost systemd: Starting Network Manager Script Dispatcher Service...
Oct 26 14:05:05 localhost systemd: Started Network Manager Script Dispatcher Service.
Oct 26 14:05:15 localhost audit: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 26 14:05:30 localhost chronyd[837]: Source 2001:3a0:0:2006::87:123 replaced with 2a02:2a50:6::123
<nul>...<nul> <-- Few hundreds of bytes of binary data
Oct 26 14:05:30 localhost chronyd[837]: Source 2001:3a0:0:2006::87:123 replaced with 2a02:2a50:6::123
Oct 26 14:20:31 localhost kernel: Linux version 4.15.6-200.spi_gpio.fc26.x86_64 (kaza@localhost.localdomain) (gcc version 7.3.1 20180130 (Red Hat 7.3.1-2) (GCC)) #1 SMP Sat Mar 3 02:13:51 IST 2018
Oct 26 14:20:31 localhost kernel: Command line: BOOT_IMAGE=/vmlinuz-4.15.6-200.spi_gpio.fc26.x86_64 root=/dev/mapper/VolGroup-lv_root ro rd.lvm.lv=VolGroup/lv_root rd.lvm.lv=VolGroup/lv_swap rhgb quiet LANG=en_US.UTF-8
Oct 26 14:20:31 localhost kernel: x86/fpu: x87 FPU will use FXSAVE
Oct 26 14:20:31 localhost kernel: e820: BIOS-provided physical RAM map:
Oct 26 14:20:31 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009bfff] usable
Oct 26 14:20:31 localhost kernel: BIOS-e820: [mem 0x000000000009c000-0x000000000009ffff] reserved
Oct 26 14:20:31 localhost kernel: BIOS-e820: [mem 0x00000000000e4000-0x00000000000fffff] reserved
Oct 26 14:20:31 localhost kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000cffaffff] usable
Oct 26 14:20:31 localhost kernel: BIOS-e820: [mem 0x00000000cffb0000-0x00000000cffbdfff] ACPI data
Oct 26 14:20:31 localhost kernel: BIOS-e820: [mem 0x00000000cffbe000-0x00000000cffdffff] ACPI NVS
Oct 26 14:20:31 localhost kernel: BIOS-e820: [mem 0x00000000cffe0000-0x00000000cfffffff] reserved
Oct 26 14:20:31 localhost kernel: BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
Oct 26 14:20:31 localhost kernel: BIOS-e820: [mem 0x0000000100000000-0x000000042fffffff] usable
What can you see in the logs that I can't?
TIA,
kaza.
|
|
|
10-26-2018, 01:55 PM
|
#8
|
Member
Registered: Apr 2010
Distribution: FC17
Posts: 383
Original Poster
Rep:
|
Some more info: I made one more experiment, after installing "iotop" and adding its output
to the list of saved data every 10 minutes. I also changed the cartrige.
Again, after over 2 hours the tape stopped. This time I decided to do nothing for more
than 10 minutes hoping the periodic data save will make one more save and I'll see something in the
last saved group of files. After waiting for more than 10 minutes I decided, instead of like
previously dealing with "nedit" of the tar log file to bring to the top the konsole where
I called the "tar" command and pressed "Ctrl+C" - indeed, the tape made few movements and
after some time the prompt returned. I thought: "ok, maybe if the tar process no longer
requests I/O everything will be OK now". However, when I attempted "ls" in the same konsole
it hang and at the same time the windows of "top", "iotop" and "gkrellm" stopped being
updated (they were updating during the over 10 minutes period from the tape stopping).
The unobscured part of the "top" window again showed "wa" over 70%. In the "iotop" window
there was no activity. Also, since the tape stopped the line with the "tar" command didn't
appear in the "iotop" window.
Together with the frozen "top", "iotop" and "gkrellm" the whole GUI froze - again, not even
the Alt+SysRq+R-E-I-S-U-B could get it to reboot and again I had to power the PC down.
Afer rebooting I looked at the last saved files and saw that during the over 10 minues period
from tape stop to the GUI freeze the periodic files save wasn't performed: the last saved
file timestamp was earlier than the write time of the "tar" command log file.
So far the facts, now I'll speculate, correct me if I'm wrong.
It seems to me that at some point the tape makes some I/O request which can't be completed
(70+% of "wa"). Other processes can continue running as long as they don't need to access
the filesystem. That's why the screens of "top", "iotop", "gkrellm" continue to be updated.
But as soon as something needs to access the filesystem (like "ls") this new request is blocked
until the original tape I/O request completes (which never happens) and thus the whole GUI
freezes.
I wonder what would happen if instead of calling the "tar" command directly I'll call it
through "ionice -c 3" ?
It seems very strange to me that the queueing system of the I/O requests can hang if one device
I/O can't be completed, I would expect some "timeout" allowing later requests to be performed.
But I don't have enough knowledge of the LINUX disk I/O system
to answer that question.
Can anyone explain (or point to a knowledge source) how the LINUX disk I/O system queueing works?
TIA,
kaza.
|
|
|
10-30-2018, 06:22 AM
|
#9
|
Member
Registered: Apr 2010
Distribution: FC17
Posts: 383
Original Poster
Rep:
|
I finally caught the error soon enough to save some relevant info and it doesn't look good.
I managed to run "dmesg -c>filename" just as I heared the beep but before
the GUI stuck and here's what I saw later after having to power down the PC and to reboot:
Code:
[ 4109.872050] sky2 0000:04:00.0 enp4s0: Link is down
[ 4127.824514] sky2 0000:04:00.0 enp4s0: Link is up at 100 Mbps, full duplex, flow control rx
[ 4139.334846] sky2 0000:04:00.0 enp4s0: Link is down
[ 4141.794760] sky2 0000:04:00.0 enp4s0: Link is up at 100 Mbps, full duplex, flow control rx
[ 8832.460042] sd 7:0:1:0: [sdb] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8832.460053] sd 7:0:1:0: [sdb] tag#1 Sense Key : Hardware Error [current]
[ 8832.460057] sd 7:0:1:0: [sdb] tag#1 Add. Sense: Internal target failure
[ 8832.460061] sd 7:0:1:0: [sdb] tag#1 CDB: Read(16) 88 00 00 00 00 00 5c 58 3d e0 00 00 01 00 00 00
[ 8832.460065] print_req_error: critical target error, dev sdb, sector 1549286880
[ 8832.490713] sd 7:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8832.490719] sd 7:0:1:0: [sdb] tag#0 Sense Key : Hardware Error [current]
[ 8832.490723] sd 7:0:1:0: [sdb] tag#0 Add. Sense: Internal target failure
[ 8832.490727] sd 7:0:1:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 5c 58 3e e0 00 00 01 00 00 00
[ 8832.490730] print_req_error: critical target error, dev sdb, sector 1549287136
[ 8854.477447] sd 7:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8854.477455] sd 7:0:1:0: [sdb] tag#0 Sense Key : Hardware Error [current]
[ 8854.477459] sd 7:0:1:0: [sdb] tag#0 Add. Sense: Internal target failure
[ 8854.477463] sd 7:0:1:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 5c 58 3e 00 00 00 00 08 00 00
[ 8854.477466] print_req_error: critical target error, dev sdb, sector 1549286912
[ 8857.893105] blk_partition_remap: fail for partition 1
[ 8857.893113] blk_partition_remap: fail for partition 1
[ 8858.842183] blk_partition_remap: fail for partition 1
[ 8858.842192] blk_partition_remap: fail for partition 1
[ 8859.722773] scsi 7:1:3:0: Direct-Access TOSHIBA MK2001TRKB 0105 PQ: 0 ANSI: 5
[ 8859.724433] scsi 7:1:3:0: Attached scsi generic sg2 type 0
[ 8860.188272] blk_partition_remap: fail for partition 1
[ 8860.188280] blk_partition_remap: fail for partition 1
[ 8861.652640] blk_partition_remap: fail for partition 1
[ 8861.652648] blk_partition_remap: fail for partition 1
[ 8861.652889] blk_partition_remap: fail for partition 1
[ 8861.652894] blk_partition_remap: fail for partition 1
[ 8861.652898] blk_partition_remap: fail for partition 1
[ 8861.652902] blk_partition_remap: fail for partition 1
[ 8861.652905] blk_partition_remap: fail for partition 1
[ 8863.513851] blk_partition_remap: fail for partition 1
[ 8863.513857] blk_partition_remap: fail for partition 1
[ 8863.513859] blk_partition_remap: fail for partition 1
[ 8863.513860] blk_partition_remap: fail for partition 1
[ 8863.513864] Aborting journal on device dm-3-8.
[ 8863.513870] blk_partition_remap: fail for partition 1
[ 8863.513873] Buffer I/O error on dev dm-3, logical block 487096320, lost sync page write
[ 8863.513876] JBD2: Error -5 detected when updating journal superblock for dm-3-8.
[ 8870.622855] blk_partition_remap: fail for partition 1
[ 8870.622865] blk_partition_remap: fail for partition 1
[ 8870.622868] blk_partition_remap: fail for partition 1
[ 8870.622872] blk_partition_remap: fail for partition 1
[ 8870.622876] blk_partition_remap: fail for partition 1
[ 8870.622887] blk_partition_remap: fail for partition 1
[ 8870.622890] Buffer I/O error on dev dm-3, logical block 0, lost sync page write
[ 8870.622897] EXT4-fs error (device dm-3): ext4_journal_check_start:61: Detected aborted journal
[ 8870.622899] EXT4-fs (dm-3): Remounting filesystem read-only
[ 8870.622902] EXT4-fs (dm-3): previous I/O error to superblock detected
[ 8870.622904] blk_partition_remap: fail for partition 1
[ 8870.622905] Buffer I/O error on dev dm-3, logical block 0, lost sync page write
[ 8928.949870] blk_partition_remap: fail for partition 1
[ 8928.949878] blk_partition_remap: fail for partition 1
[ 8928.950020] blk_partition_remap: fail for partition 1
[ 8928.950026] blk_partition_remap: fail for partition 1
[ 8928.950030] blk_partition_remap: fail for partition 1
[ 8928.950033] blk_partition_remap: fail for partition 1
[ 8928.950037] blk_partition_remap: fail for partition 1
[ 8941.084894] blk_partition_remap: fail for partition 1
[ 8941.084914] blk_partition_remap: fail for partition 1
[ 8941.084918] blk_partition_remap: fail for partition 1
[ 8941.084922] blk_partition_remap: fail for partition 1
[ 8941.084926] blk_partition_remap: fail for partition 1
[ 8944.289808] scsi 7:0:1:0: Direct-Access ICP KA_RAID_1 V1.0 PQ: 0 ANSI: 2
[ 8944.290018] sd 7:0:1:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
[ 8944.290036] sd 7:0:1:0: [sdc] 7801405440 512-byte logical blocks: (3.99 TB/3.63 TiB)
[ 8944.290047] sd 7:0:1:0: [sdc] Write Protect is off
[ 8944.290049] sd 7:0:1:0: [sdc] Mode Sense: 12 00 10 08
[ 8944.290067] sd 7:0:1:0: [sdc] Write cache: disabled, read cache: enabled, supports DPO and FUA
[ 8944.290203] sd 7:0:1:0: Attached scsi generic sg6 type 0
[ 8944.290450] sd 7:0:1:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
[ 8944.325562] sdc: sdc1
[ 8944.325926] sd 7:0:1:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
[ 8944.326030] sd 7:0:1:0: [sdc] Attached SCSI removable disk
[ 8944.618268] blk_partition_remap: fail for partition 1
[ 8944.618276] blk_partition_remap: fail for partition 1
[ 8944.618406] blk_partition_remap: fail for partition 1
[ 8944.618411] blk_partition_remap: fail for partition 1
[ 8944.618415] blk_partition_remap: fail for partition 1
[ 8944.618418] blk_partition_remap: fail for partition 1
[ 8944.618422] blk_partition_remap: fail for partition 1
[ 8946.304142] blk_partition_remap: fail for partition 1
[ 8946.304151] blk_partition_remap: fail for partition 1
[ 8946.304154] blk_partition_remap: fail for partition 1
[ 8946.304158] blk_partition_remap: fail for partition 1
[ 8946.304161] blk_partition_remap: fail for partition 1
[ 8946.304165] blk_partition_remap: fail for partition 1
[ 8946.304168] blk_partition_remap: fail for partition 1
[ 8946.304177] blk_partition_remap: fail for partition 1
[ 8946.304181] blk_partition_remap: fail for partition 1
[ 8947.897137] blk_partition_remap: fail for partition 1
[ 8947.897159] blk_partition_remap: fail for partition 1
[ 8947.897175] blk_partition_remap: fail for partition 1
[ 8947.897183] blk_partition_remap: fail for partition 1
[ 8947.897401] blk_partition_remap: fail for partition 1
So after searching for the source of the problem at the backup tape the real source of it in the HDD,
a hardware "RAID 0" array of two 2TB SAS disks.
I looked at the log of the "tar" command to see in which directories are the files
with problems, added few directories to the "tar" exclude list and re-run the backup tar command.
So far I'm at the 9-th cartrige and it seems the backup advances normally.
After I'll finish (and verify) the backup I'm planing to run the "fsck /dev/sdb" but I'm worried
with regard of: after I'll "umount" the partition of that disk, will I be able to mount it
again and will most of the files remain? I guess some file (where the damaged sectors reside)
will be damaged but will the rest be OK?
TIA,
kaza.
|
|
|
10-30-2018, 07:39 AM
|
#10
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,446
|
Well done for sticking with it - and getting that log. Wasn't making a lot of sense prior to that.
fsck fixes filesystems. Often the errors are "transient" in the sense that (temporary) files that were open at the time of failure didn't close, and need to be cleaned up. No harm done.
Sometimes you get cross-linked inodes (for example) that can mean truncated files after "fixing". And sometimes you get a total basket case when the whole lot goes down the toilet. So it can be a bit of a crap-shoot.
Fortunately it's usually benign - but given your history, you may not be that lucky. fsck (for ext4) will attempt to save file fragments in lost+found - which may help you or may just mislead you. In severe cases I just restore and eat any data loss if my backups aren't sufficiently recent.
As a general rule, if lost+found is empty, your data (all of it in that filesystem) is probably safe.
|
|
|
10-31-2018, 02:30 PM
|
#11
|
Member
Registered: Apr 2010
Distribution: FC17
Posts: 383
Original Poster
Rep:
|
Hello syg00, thanks for the reply.
Yesterday I finished the backup and now I'm at the beginning of verify, I guess it'll take few days,
possible up to a week (life...). If the backup will be OK I'll run "fsck" "and may the SHWARTZ be with me" (where I
live there are lots of them). :-)
One thing isn't clear to me: when I'll run the "fsck" (after "umount") should I run it on the physical disk
/dev/sdb
or on it's LVM name:
/dev/mapper/VolGroup_graphics-lv_graphics
?
TIA,
kaza.
|
|
|
10-31-2018, 07:28 PM
|
#12
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,446
|
Where-ever the filesystem exists - fsck is a filesystem command. So, in this case the lv - /dev/mapper/VolGroup_graphics-lv_graphics
|
|
|
11-15-2018, 03:05 PM
|
#13
|
Member
Registered: Apr 2010
Distribution: FC17
Posts: 383
Original Poster
Rep:
|
Well, it took longer than I thought:
at first, while verifying, on the 8-th (out of 12)
cartriges (each - about 4-5 hours) I've got I/O errors from the tape
(this time - the tape, not the disk). So I replaced the 8-th cartrige
and re-run the backup and the verify and finally, was able to "umount"
the partition in question and to run the "fsck". However, the "fsck"
output looks like it didn't fins any errors:
Code:
<root localhost>.../temp>lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 273.2G 0 disk
├─sda1 8:1 1 1G 0 part /boot
└─sda2 8:2 1 262.2G 0 part
├─VolGroup-lv_root 253:0 0 99.4G 0 lvm /
├─VolGroup-lv_swap 253:1 0 32G 0 lvm [SWAP]
└─VolGroup-lv_home 253:3 0 130.9G 0 lvm /home
sdb 8:16 1 3.6T 0 disk
└─sdb1 8:17 1 3.6T 0 part
└─VolGroup_graphics-lv_graphics 253:2 0 3.6T 0 lvm /home/user/GRAPHICS
sr0 11:0 1 1024M 0 rom
<root localhost>.../temp>umount /home/user/GRAPHICS
<root localhost>.../temp>lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 273.2G 0 disk
├─sda1 8:1 1 1G 0 part /boot
└─sda2 8:2 1 262.2G 0 part
├─VolGroup-lv_root 253:0 0 99.4G 0 lvm /
├─VolGroup-lv_swap 253:1 0 32G 0 lvm [SWAP]
└─VolGroup-lv_home 253:3 0 130.9G 0 lvm /home
sdb 8:16 1 3.6T 0 disk
└─sdb1 8:17 1 3.6T 0 part
└─VolGroup_graphics-lv_graphics 253:2 0 3.6T 0 lvm
sr0 11:0 1 1024M 0 rom
<root localhost>.../temp>lvdisplay
--- Logical volume ---
LV Path /dev/VolGroup_graphics/lv_graphics
LV Name lv_graphics
VG Name VolGroup_graphics
LV UUID eAorkr-Lo1l-qadG-b0nW-AjnY-RWSi-uPFdf5
LV Write Access read/write
LV Creation host, time localhost.localdomain, 2012-10-19 01:46:42 +0200
LV Status available
# open 0
LV Size 3.63 TiB
Current LE 952319
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:2
.
.
.
<root localhost>.../temp>fsck.ext4 /dev/VolGroup_graphics/lv_graphics
e2fsck 1.43.4 (31-Jan-2017)
/dev/VolGroup_graphics/lv_graphics: clean, 17126/243793920 files, 300877458/975174656 blocks
Why is that? I cleary remember (and have logs/dmesg )
that I had 3 disk errors stating that some sectors had a problem
so why the "fsck" doesn't indicate any problem found?
BTW, it took "fsck" about 1 Sec to check 4 TB partition, does this
short time reasonable?
TIA,
kaza.
|
|
|
11-16-2018, 07:14 AM
|
#14
|
Senior Member
Registered: Aug 2018
Location: Atlanta, GA - USA
Distribution: CentOS/RHEL, openSuSE/SLES, Ubuntu
Posts: 1,005
|
If you grep for sdb in you logs; you probably have more hits, so something like:
Code:
grep sdb /var/log/messages*
Is sdb in a software RAID? Can you paste output of "cat /proc/mdstat" and also of "pvdisplay"
|
|
|
11-17-2018, 02:59 PM
|
#15
|
Member
Registered: Apr 2010
Distribution: FC17
Posts: 383
Original Poster
Rep:
|
OK, here's the end of "grep sdb /var/log/messages*", Oct 28 is around when I had the disk errors,
later dates are during the successful backup and the verify:
Code:
/var/log/messages-20181104:Oct 28 21:39:29 localhost kernel: print_req_error: critical target error, dev sdb, sector 7488278528
/var/log/messages-20181104:Oct 28 21:39:29 localhost kernel: sd 7:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
/var/log/messages-20181104:Oct 28 21:39:29 localhost kernel: sd 7:0:1:0: [sdb] tag#0 Sense Key : Hardware Error [current]
/var/log/messages-20181104:Oct 28 21:39:29 localhost kernel: sd 7:0:1:0: [sdb] tag#0 Add. Sense: Internal target failure
/var/log/messages-20181104:Oct 28 21:39:29 localhost kernel: sd 7:0:1:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 01 be 56 10 00 00 00 00 08 00 00
/var/log/messages-20181104:Oct 28 21:39:29 localhost kernel: print_req_error: critical target error, dev sdb, sector 7488278528
/var/log/messages-20181104:Nov 3 08:56:56 localhost kernel: sd 5:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
/var/log/messages-20181104:Nov 3 08:56:56 localhost kernel: sd 5:0:1:0: [sdb] 7801405440 512-byte logical blocks: (3.99 TB/3.63 TiB)
/var/log/messages-20181104:Nov 3 08:56:56 localhost kernel: sd 5:0:1:0: [sdb] Write Protect is off
/var/log/messages-20181104:Nov 3 08:56:56 localhost kernel: sd 5:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
/var/log/messages-20181104:Nov 3 08:56:56 localhost kernel: sd 5:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
/var/log/messages-20181104:Nov 3 08:56:56 localhost kernel: sdb: sdb1
/var/log/messages-20181104:Nov 3 08:56:56 localhost kernel: sd 5:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
/var/log/messages-20181104:Nov 3 08:56:56 localhost kernel: sd 5:0:1:0: [sdb] Attached SCSI removable disk
/var/log/messages-20181104:Nov 3 08:57:11 localhost dracut-initqueue: Scanning devices sda2 sdb1 for LVM logical volumes VolGroup/lv_root VolGroup/lv_swap
/var/log/messages-20181104:Nov 3 08:57:19 localhost smartd[777]: Device: /dev/sdb, opened
/var/log/messages-20181104:Nov 3 08:57:19 localhost smartd[777]: Device: /dev/sdb, [ICP KA_RAID_1 V1.0], S/N: 8435327C, 3.99 TB
/var/log/messages-20181104:Nov 3 08:57:19 localhost smartd[777]: Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
/var/log/messages-20181104:Nov 3 09:25:10 localhost kernel: sd 6:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
/var/log/messages-20181104:Nov 3 09:25:10 localhost kernel: sd 6:0:1:0: [sdb] 7801405440 512-byte logical blocks: (3.99 TB/3.63 TiB)
/var/log/messages-20181104:Nov 3 09:25:10 localhost kernel: sd 6:0:1:0: [sdb] Write Protect is off
/var/log/messages-20181104:Nov 3 09:25:10 localhost kernel: sd 6:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
/var/log/messages-20181104:Nov 3 09:25:10 localhost kernel: sd 6:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
/var/log/messages-20181104:Nov 3 09:25:11 localhost kernel: sdb: sdb1
/var/log/messages-20181104:Nov 3 09:25:11 localhost kernel: sd 6:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
/var/log/messages-20181104:Nov 3 09:25:11 localhost kernel: sd 6:0:1:0: [sdb] Attached SCSI removable disk
/var/log/messages-20181104:Nov 3 09:25:25 localhost dracut-initqueue: Scanning devices sda2 sdb1 for LVM logical volumes VolGroup/lv_root VolGroup/lv_swap
/var/log/messages-20181104:Nov 3 09:25:36 localhost smartd[809]: Device: /dev/sdb, opened
/var/log/messages-20181104:Nov 3 09:25:36 localhost smartd[809]: Device: /dev/sdb, [ICP KA_RAID_1 V1.0], S/N: 8435327C, 3.99 TB
/var/log/messages-20181104:Nov 3 09:25:36 localhost smartd[809]: Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Code:
<root localhost.localdomain>.../root>cat /proc/mdstat
Personalities :
unused devices: <none>
[code]
<root localhost.localdomain>.../root>pvdisplay
--- Physical volume ---
PV Name /dev/sdb1
VG Name VolGroup_graphics
PV Size 3.63 TiB / not usable 2.00 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 952319
Free PE 0
Allocated PE 952319
PV UUID W71uV5-m17D-qugv-1RRf-ESAn-wexC-O1ZHhE
--- Physical volume ---
PV Name /dev/sda2
VG Name VolGroup
PV Size 262.24 GiB / not usable 23.00 MiB
Allocatable yes (but full)
PE Size 32.00 MiB
Total PE 8391
Free PE 0
Allocated PE 8391
PV UUID bWsE7m-zeBE-5wCi-BoL2-dL1i-0tFg-ClCH0z
[code]
Is there anything suspicious?
TIA,
kaza.
|
|
|
All times are GMT -5. The time now is 08:02 PM.
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|