Linux - Newbie: This Linux forum is for members that are new to Linux.
If their drive is faulty, then it would have happened anyway - systemd or not. Another idea is that they might be able to look in the /var/log/ folder on the drive in question (if it's still there) and see if there are any logs that could help, like /var/log/messages. Then they could look for any messages from libata in that log.
We already looked for the systemd journal. There wasn't one. In a systemd distro, the other logs get written by plugging syslogd into a socket provided by journald. But it looks as if the panic occurred before anything could be written to disk at all.
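For anyone following along, here is a rough sketch of how that journal check could be repeated from the live system. It assumes the installed root filesystem is mounted at /mnt (a placeholder, not something stated in this thread) and that a persistent journal, if there is one, lives under var/log/journal, which is the usual systemd location. The function name is made up for illustration.

```shell
# Sketch: report whether an installed system (mounted at $1, default /mnt)
# has any persistent systemd journal files. /mnt is an assumed mount point.
has_persistent_journal() {
    root=${1:-/mnt}
    find "$root/var/log/journal" -type f -name '*.journal' 2>/dev/null | grep -q .
}

# Example use from a live session (left commented so nothing runs by accident):
# if has_persistent_journal /mnt; then
#     journalctl -D /mnt/var/log/journal --list-boots
# else
#     echo "no persistent journal found"
# fi
```

If the function reports nothing, that is consistent with the panic happening before journald could write anything to disk.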
There is perhaps one thing that could still be done: chroot from the live disc, and use apt to remove the corrupt systemd and install a good one. The running init process will be the systemd from the image, so removing the other one shouldn't cause any problems. What do you think of that?
But if the ultimate problem is a bad drive, then your idea of using smartctl to check it takes precedence over any attempt to fix corrupt files.
Distribution: Currently: OpenMandriva. Previously: openSUSE, PCLinuxOS, CentOS, among others over the years.
Posts: 3,881
Rep:
There's still /var/log/boot.log they can look at for any messages from libata - I'm using a "systemd distro" and that file exists on my system. And even when I was using CentOS 7 /var/log/messages as well as /var/log/boot.log still existed.
OK, Jake, let's try jsbjsb's suggestion. Boot from your live image and look at the tail end of /var/log/boot.log. I still don't think the messages file will help, but you can try.
@jsbjsb: of course the traditional log files still exist in most (all?) systemd distros. The point I was making is that they now get their data from a journald socket, so if there's nothing in the journal, there'll be nothing in the other files either.
Quote:
Originally Posted by hazel
...
@jsbjsb: of course the traditional log files still exist in most (all?) systemd distros. The point I was making is that they now get their data from a journald socket, so if there's nothing in the journal, there'll be nothing in the other files either.
They could also grep dmesg on their live system for any messages from libata
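A minimal sketch of that search, for reference. The pattern is an assumption about what libata messages usually look like in the kernel log (lines mentioning "libata" or prefixed with a port name such as ata1: or ata1.00:), and the function name is hypothetical. The /mnt paths assume the installed system's root is mounted there.

```shell
# Sketch: print kernel log lines that look like they come from libata.
# Reads the files given as arguments, or standard input if there are none.
libata_lines() {
    grep -iE 'libata|(^|[^a-z])ata[0-9]+(\.[0-9]+)?:' "$@"
}

# Example use (commented out; /mnt is an assumed mount point):
# dmesg | libata_lines
# libata_lines /mnt/var/log/boot.log /mnt/var/log/messages
```

No output simply means no line matched the pattern; it does not prove the drive is healthy.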
Perhaps it's worth looking at its SMART status with smartctl. Use a "live system" running from a USB stick or similar, then run the following command and post the results using CODE tags:
I ran the command smartctl -a /dev/sdb5, but the answer was "command not found".
Quote:
Another idea is that they might be able to look in their /var/log/ folder on the drive in question (if it's still there), and see if there is any logs that could help, like /var/log/messages Then they could look for any messages from libata in that log.
I didn't find anything from "libata". Could the following message help to understand the problem?
Code:
ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042F conflicts with OpRegion 0x0000000000000400-0x000000000000047F (\PMIO) (20160831/utaddress-247)
What I like about you is that when you're asked to do something, you do it and post the results. I wish more newbies were like you!
I appreciate your help very much.
Do you think that now is the time to consider reinstalling the operating system?
If the answer is yes, do you have any hints on how to do that in this situation? Is there anything special that I have to take care of?
Quote:
Originally Posted by JakeJake
Thank you, jsbjsb001 and hazel.
Sorry for my late answer.
I can't see anything from libata that indicates any problem with your drive.
Quote:
I ran the command smartctl -a /dev/sdb5, but the answer was "command not found".
...
The smartmontools package isn't installed by default in some distributions. But given the situation, the easiest way to run smartctl would be to download a live system that includes that package by default. So have a look at the SystemRescueCd.
Also, you don't need to give smartctl the partition - you can just specify the node for the drive as a whole, e.g. just "sdb" rather than "sdb5".
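To illustrate both points, a small sketch follows. The helper names are made up. The partition-stripping helper assumes classic sdX-style names only; it would give the wrong answer for names such as nvme0n1p1, where something like lsblk -no PKNAME would be a more general route.

```shell
# Sketch: succeed if a command is available in PATH.
need_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Sketch: turn a partition node into its whole-drive node,
# e.g. /dev/sdb5 -> /dev/sdb. Assumes sdX-style names only.
whole_disk() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Example use (commented out; needs root and a real drive):
# if need_cmd smartctl; then
#     smartctl -a "$(whole_disk /dev/sdb5)"
# else
#     echo "smartctl not found - install smartmontools or use a live system that ships it"
# fi
```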
Do you think that now is the time to consider reinstalling the operating system? If the answer is yes, do you have any hints on how to do that in this situation? Is there anything special that I have to take care of?
Before you do that, I would consider just reinstalling systemd, since that seems to be the program that's crashing. You can't do that in a live system, but you could do it in chroot when booted from your installation image. You would then be using the kernel and systemd from the image, not the ones on your hard drive.
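A rough sketch of what that chroot procedure could look like on a Debian-style system, offered as an illustration rather than a tested recipe. The device node /dev/sdX5 and the mount point /mnt are placeholders, and the function names are hypothetical. By default the script only prints the commands; set DRY_RUN=0 to actually run them (as root).

```shell
# Sketch: reinstall systemd inside the installed system from a live or
# installer image. Commands are only printed unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_systemd_in_chroot() {
    root_part=$1          # placeholder, e.g. /dev/sdX5
    mnt=${2:-/mnt}        # assumed mount point
    run mount "$root_part" "$mnt"
    # The chroot needs the kernel's virtual filesystems to be visible.
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$mnt/$fs"
    done
    run chroot "$mnt" apt-get install --reinstall systemd
    # Unmount in reverse order.
    for fs in sys proc dev; do
        run umount "$mnt/$fs"
    done
    run umount "$mnt"
}

# Example use (prints the plan only):
# reinstall_systemd_in_chroot /dev/sdX5 /mnt
```

Note that apt will probably need network access inside the chroot to download the package unless a copy is still in its cache, so a working /etc/resolv.conf in the installed system may also be required.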
But first you can try using smartmontools to test out your drive. That certainly can't do any harm. If the drive really is failing, reinstalling the software won't help.
Reinstalling is easy. You just do the same as when you installed it the first time, then do a major update and restore the saved files from your home partition. But it's kind of like admitting defeat, so let's see if you can find another solution.
I created the USB stick with SystemRescueCd, and the command "smartctl -a /dev/sda5" gave me this output:
Code:
[root@sysresccd ~]# smartctl -a /dev/sda5
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.34-1-lts] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Scorpio Blue Serial ATA (AF)
Device Model: WDC WD7500BPVT-80HXZT3
Serial Number: WD-WXL1E61LVCS6
LU WWN Device Id: 5 0014ee 656e1bae3
Firmware Version: 01.01A01
User Capacity: 750,156,374,016 bytes [750 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Tue Aug 27 21:52:24 2019 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (15900) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Well, the good news is that smartctl isn't reporting any failed SMART attributes. But I was hoping that it would give some more information about the drive's attributes, which would give a clearer picture of exactly how healthy it is.
While we haven't seen any indication that the drive is failing, I'd want to be as sure as I could be before reinstalling any system on it. You can try running a long test on it with the command below; it should tell you how long you need to wait until it's finished. It might take a little while depending on the drive. Post the results when the test has finished.
Code:
smartctl -t long /dev/sdX
Again, replace "sdX" with the correct node for the drive in question. Then run smartctl -a /dev/sdX (replace "sdX" with correct node once again) to view the results of that long test when it's finished testing the drive.
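If you only want the self-test log rather than the full report, smartctl -l selftest /dev/sdX prints just that table. A small sketch of checking its newest entry automatically is below; the function name is made up, and it assumes the log format shown by smartctl 7.0, where the most recent test is the row numbered "# 1".

```shell
# Sketch: succeed if the most recent self-test entry (the row numbered "# 1")
# in `smartctl -l selftest` output reads "Completed without error".
latest_selftest_passed() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Example use (commented out; replace sdX with the real drive node):
# smartctl -l selftest /dev/sdX | latest_selftest_passed && echo "last self-test was clean"
```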
If that doesn't show anything of concern, then I'd agree with Hazel that it would be time to reinstall. Assuming the drive's OK, it was just some filesystem corruption. But I'd do a full reinstall of the whole system rather than chrooting into the current install and reinstalling only systemd. Make sure you first back up anything you wish to keep that isn't part of a default install. The reason I suggest a full reinstall is that you'll get a clean system, which will likely avoid any complications that might occur with just reinstalling systemd.
Just so you know what I mean about smartctl displaying the SMART attributes, and starting a long test:
Code:
[root@jamespc] ~> smartctl -a /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.18.3-desktop-1omv4000] (OpenMandriva Lx 7.0-1)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Blue
Device Model: WDC WD20EZRZ-00Z5HB0
Serial Number: WD-WCC4M1LY00KA
LU WWN Device Id: 5 0014ee 20ff29446
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed Aug 28 13:48:04 2019 ACST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (27720) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 280) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x7035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 169 167 021 Pre-fail Always - 4541
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 705
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 094 094 000 Old_age Always - 4519
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 705
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 2
193 Load_Cycle_Count 0x0032 196 196 000 Old_age Always - 14800
194 Temperature_Celsius 0x0022 122 105 000 Old_age Always - 25
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 968 -
# 2 Extended offline Aborted by host 70% 847 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[root@jamespc] ~> smartctl -t long /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.18.3-desktop-1omv4000] (OpenMandriva Lx 7.0-1)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 280 minutes for test to complete.
Test will complete after Wed Aug 28 18:33:32 2019
Use smartctl -X to abort test.
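As a rough aid for reading an attribute table like the one above, here is a sketch that flags non-zero raw values for three attributes that are commonly treated as warning signs (IDs 5, 197 and 198: reallocated, pending and offline-uncorrectable sectors). That choice of attributes and the function name are my own assumptions, not something smartctl prescribes, and the column positions assume the ten-column layout shown in the output above.

```shell
# Sketch: read `smartctl -A` output on stdin and print any of attributes
# 5, 197 or 198 whose RAW_VALUE (10th column) is greater than zero.
# Exit status is 1 if anything was flagged, 0 otherwise.
check_key_attrs() {
    awk '$1 ~ /^(5|197|198)$/ && $10 + 0 > 0 { print $2 " raw=" $10; bad = 1 }
         END { exit bad + 0 }'
}

# Example use (commented out; replace sdX with the real drive node):
# smartctl -A /dev/sdX | check_key_attrs || echo "worth a closer look"
```

A clean result here is only one data point; a long self-test is still the more thorough check.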
Last edited by jsbjsb001; 08-27-2019 at 11:49 PM.
Reason: forgot command to get test results
If that doesn't show anything of concern, then I'd agree with Hazel that it would be time to reinstall. Assuming the drive's OK, it was just some filesystem corruption. But I'd do a full reinstall of the whole system rather than chrooting into the current install and reinstalling only systemd... The reason I suggest a full reinstall is that you'll get a clean system, which will likely avoid any complications that might occur with just reinstalling systemd.
Yeah, I always like to do things the hard way! Not reinstalling has become a bit of a fetish with me. It seems somehow like giving up, although it's often much quicker than the things I end up doing.
Quote:
Originally Posted by hazel
Yeah, I always like to do things the hard way! Not reinstalling has become a bit of a fetish with me. It seems somehow like giving up, although it's often much quicker than the things I end up doing.
It can seem like the easy way out, and I guess it probably is a lot of the time. Believe it or not, I tend to "fix" things with my own system before opting for a complete reinstall (or ignore it if it isn't a big deal anyway).
The main reason I suggest it in this case is: who knows what else got corrupted? Particularly if the OP is new to Linux, we probably should make it easy on them.
Personally, at this point I'd look into compiling a new kernel, checking the bootloader setup, and verifying fdisk -l, fstab etc.
Maybe a bit of overkill, but at least an option. Or, better said (easiest solution first):
1. I'd try to boot into single user/rescue mode
2. If successful, read logs
3. Check fdisk -l and blkid output against fstab and the bootloader configuration
4. Boot another ready kernel if available
5. Compile a kernel from source for testing and rebuilding the system
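Step 3 of the list above can be partly automated. The sketch below compares the UUIDs referenced in an fstab file with saved blkid output and prints any that are missing. The function name and file paths are illustrative assumptions. Because it simply looks for UUID=... strings, it may also work when pointed at a GRUB config containing root=UUID=... entries, but I haven't verified that against every config layout.

```shell
# Sketch: list UUIDs referenced in an fstab-style file ($1) that do not
# appear in saved `blkid` output ($2). Comment lines are ignored.
missing_fstab_uuids() {
    fstab=$1
    blkid_out=$2
    grep -v '^[[:space:]]*#' "$fstab" |
        grep -oE 'UUID=[0-9A-Fa-f-]+' |
        cut -d= -f2 |
        while read -r uuid; do
            grep -qF "UUID=\"$uuid\"" "$blkid_out" || echo "$uuid"
        done
}

# Example use from a live session (commented out; /mnt is an assumed mount point):
# blkid > /tmp/blkid.txt
# missing_fstab_uuids /mnt/etc/fstab /tmp/blkid.txt
```

Any UUID it prints is one the system expects at boot but which no partition currently carries, which would be worth fixing before trying anything more drastic.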
I happen to have several available Kernels I can choose from in most cases, so it would be a rather dramatic situation if I'd compile one from source to try that.
Personally at this point I'd look into compiling a new Kernel and checking the bootloader setup and verify fdisk -l and fstab etc.
You wouldn't normally compile a kernel on Debian. Reinstall one perhaps, but a complete reinstall would include that. The other steps sound reasonable and would only take a couple of minutes.