LinuxQuestions.org


ghultstrand 05-07-2018 11:25 AM

HDD or File System issue
 
I'm still relatively new to the Linux world, especially when it comes to diagnostics. Last Friday I had a boot problem and my OS wouldn't load. It was giving "UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY". So I did, and P1 was repaired and the OS loaded, although some apps were corrupted. When I ran smartctl on the drive, it was reporting fine. Now n1p2 is having an issue. Should I replace the drive, or is it just a cascading error in the file system, in which case I should just nuke the machine and start over? Or is there a good way to stem the problem?

Screenshots of the current error: https://1drv.ms/f/s!AtItXRNac1nKhpZmKneTiOe6xvLWVg

Code:

root@Pixy-PC:~# sudo smartctl -a /dev/nvme0
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                      Force MP500
Serial Number:                      17127956000123380135
Firmware Version:                  E7FM02.1
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Controller ID:                      0
Number of Namespaces:              1
Namespace 1 Size/Capacity:          120,034,123,776 [120 GB]
Namespace 1 Formatted LBA Size:    512
Local Time is:                      Fri May  4 09:14:49 2018 CDT
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):  Security Format Frmw_DL
Optional NVM Commands (0x001e):    Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:        512 Pages
Warning  Comp. Temp. Threshold:    110 Celsius
Critical Comp. Temp. Threshold:    130 Celsius

Supported Power States
St Op    Max  Active    Idle  RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    7.90W      -        -    0  0  0  0        0      0
 1 +    2.40W      -        -    1  1  1  1      600    600
 2 +    1.90W      -        -    2  2  2  2      600    600
 3 -  0.1100W      -        -    3  3  3  3      600    600
 4 -  0.0050W      -        -    4  4  4  4  100000  160000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +    512      0        1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                  0x00
Temperature:                        45 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    2%
Data Units Read:                    3,044,091 [1.55 TB]
Data Units Written:                6,453,922 [3.30 TB]
Host Read Commands:                77,758,445
Host Write Commands:                174,001,563
Controller Busy Time:              0
Power Cycles:                      34
Power On Hours:                    7,214
Unsafe Shutdowns:                  10
Media and Data Integrity Errors:    2,874
Error Information Log Entries:      2,874
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 2:              59 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
Num  ErrCount  SQId  CmdId  Status  PELoc          LBA  NSID    VS
  0      2874    1  0x029b  0x4502      -  4294967288    1    -
  1      2873    1  0x029b  0x4502      -  4294967288    1    -
  2      2872    1  0x029b  0x4502      -  4294967288    1    -
  3      2871    1  0x029b  0x4502      -  4294967288    1    -
  4      2870    1  0x029b  0x4502      -  4294967288    1    -
  5      2869    1  0x029b  0x4502      -  4294967288    1    -
  6      2868    1  0x029b  0x4502      -  4294967288    1    -
  7      2867    1  0x029b  0x4502      -  4294967288    1    -
  8      2866    1  0x029b  0x4502      -  4294967288    1    -
  9      2865    1  0x029b  0x4502      -  4294967288    1    -
 10      2864    1  0x029b  0x4502      -  4294967288    1    -
 11      2863    1  0x029b  0x4502      -  4294967288    1    -
 12      2862    1  0x029b  0x4502      -  4294967288    1    -
 13      2861    1  0x029b  0x4502      -  4294967288    1    -
 14      2860    1  0x029b  0x4502      -  4294967288    1    -
 15      2859    1  0x029b  0x4502      -  4294967288    1    -
... (47 entries not shown)
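
For readers following along: the output above reports an overall result of PASSED, yet it also lists 2,874 Media and Data Integrity Errors and 10 Unsafe Shutdowns. The thread does not establish what those counters mean for this drive, but they are easy to overlook. Below is a minimal sketch, not a definitive tool, that pulls those three fields out of `smartctl -a` text. The function name `nvme_health_summary` is hypothetical, and the field labels are assumed to match the ones in the output posted here; other drives or smartctl versions may label them differently.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -a` text on stdin and prints a
# one-line summary of three fields. Labels are copied from the output
# posted in this thread and are an assumption for any other drive.
nvme_health_summary() {
    awk -F: '
        # Strip spaces, tabs, and thousands separators from a field value.
        function clean(s) { gsub(/[ \t,]/, "", s); return s }

        /^SMART overall-health/            { overall = clean($2) }
        /^Media and Data Integrity Errors/ { media   = clean($2) }
        /^Unsafe Shutdowns/                { unsafe  = clean($2) }

        END {
            printf "overall=%s media_errors=%s unsafe_shutdowns=%s\n", overall, media, unsafe
            # Exit non-zero when any media errors are logged, even if the
            # overall self-assessment says PASSED.
            exit ((media + 0 > 0) ? 1 : 0)
        }
    '
}
```

It could be used as `sudo smartctl -a /dev/nvme0 | nvme_health_summary`. For the values posted above it would print `overall=PASSED media_errors=2874 unsafe_shutdowns=10` and return exit status 1.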


AwesomeMachine 05-07-2018 08:08 PM

Hi ghultstrand,

Welcome!

It sounds like a hardware problem on the motherboard. I can't be positive about it. It might also be a failing PSU. If you have important data on the machine, I would back it up immediately. You never can tell how long the machine will last.

I had a machine that had periodic file system errors, boot errors, and miscellaneous weirdness. Finally I backed everything up. I shut down afterward. And the machine never booted again. NO POST, no beep, just fans on the highest speed.

It wouldn't do anything. And it had fast/wide SCSI drives, so I wouldn't have been able to just drop them in another chassis.

frankbell 05-07-2018 08:34 PM

Try booting to a Live CD of something. If it's a motherboard issue, it should affect the Live CD boot as well.

Note: You should likely try this several times with a couple of different Live CDs before you can safely draw any conclusions.

fatmac 05-08-2018 03:31 AM

Make sure you back up all your personal data, now, just in case, then try & sort out what is happening.

Teufel 05-08-2018 04:05 AM

Could you post the output of "smartctl -x" instead of "smartctl -a"?

ghultstrand 05-15-2018 11:30 AM

Hey all - I gave up and formatted the HDD and just implemented a better backup plan so even if it crashes I won't be down for long. Thanks for the advice everyone!


All times are GMT -5.