Linux - Hardware This forum is for Hardware issues.
02-09-2010, 03:42 PM
|
#1
|
|
Member
Registered: Dec 2004
Distribution: Slackware64-current
Posts: 95
|
Permanent filesystem corruption on reiserfs, ext3 and ext4 - disk failure?
I have been having problems with filesystem corruption on my eeepc 1000H for a long time now. I have tried using different filesystems, kernels and distributions (arch, slackware) to no effect. I am starting to grow suspicious that this problem lies somewhere else, as I haven't seen anyone else having similar problems in such a variety of scenarios.
I have tested my RAM using memtest86+, which didn't come up with anything after a full run-through. I have also tried using e2fsck -c to check for bad blocks; it finds none. I had a go at using smartctl but wasn't really sure what I was doing. I did a long test and it came up with nothing anyway.
This problem is in addition to the problems I've been having with my Intel graphics chip and KMS. A lot of the time there are lockups when booting into X, which can only be gotten out of by a hard reset. This is sometimes what causes the original filesystem errors. I've stopped messing around with KMS for now to eliminate this, but my current system is unbootable.
I'm guessing my disk is wrecked but have as yet seen no definitive proof. Can anyone recommend anything that I should do?
I am currently on ext4 with a custom kernel 2.6.33-rc6 (the stock kernel shipping with slackware does not have the elantech extension for psmouse included). When I was using arch, I was just using the stock kernels.
Thanks,
Tom
|
|
|
|
02-10-2010, 03:21 AM
|
#2
|
|
Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 5,295
|
I'm still on ext3 because I want others to have the headaches with ext4 and to have all of them sorted before I go near it.
If the tools continually find no disk errors, the disk is probably OK. I'd go to 'safe defaults' in the BIOS. Next I'd back up fully elsewhere. Elaborate on your problems with KMS. You can boot from an install CD or, if the kernel boots, boot with init=/bin/bash as a kernel option. You'll get a root shell instead of running init. You'll have to work at it from there:
/sbin/mount /dev/whatsit -t ext4 /
a PATH statement, etc. but you can get in and repair things.
|
|
|
|
02-10-2010, 03:50 AM
|
#3
|
|
Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,614
|
Any error messages you could post?
Could it be something related to the SATA controller? What kind of HDD do you have? What southbridge? What mobo?
|
|
|
|
02-10-2010, 05:43 AM
|
#4
|
|
Member
Registered: Dec 2004
Distribution: Slackware64-current
Posts: 95
Original Poster
|
The HDD model is ST980811AS, according to hdparm -I. From Google I get that it's a Seagate Momentus 80GB.
Here's my lspci -v output:
Code:
00:00.0 Host bridge: Intel Corporation Mobile 945GME Express Memory Controller Hub (rev 03)
Subsystem: ASUSTeK Computer Inc. Device 8340
Flags: bus master, fast devsel, latency 0
Capabilities: [e0] Vendor Specific Information <?>
Kernel driver in use: agpgart-intel
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GME Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device 8340
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at f7f00000 (32-bit, non-prefetchable) [size=512K]
I/O ports at dc00 [size=8]
Memory at d0000000 (32-bit, prefetchable) [size=256M]
Memory at f7ec0000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at <unassigned> [disabled]
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
Subsystem: ASUSTeK Computer Inc. Device 8340
Flags: bus master, fast devsel, latency 0
Memory at f7f80000 (32-bit, non-prefetchable) [size=512K]
Capabilities: [d0] Power Management version 2
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 02)
Subsystem: ASUSTeK Computer Inc. Device 831a
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at f7eb8000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [100] Virtual Channel <?>
Capabilities: [130] Root Complex Link <?>
Kernel driver in use: HDA Intel
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
I/O behind bridge: 00001000-00001fff
Memory behind bridge: 40000000-401fffff
Prefetchable memory behind bridge: 0000000040200000-00000000403fffff
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 830f
Capabilities: [a0] Power Management version 2
Capabilities: [100] Virtual Channel <?>
Capabilities: [180] Root Complex Link <?>
Kernel driver in use: pcieport
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 02) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: fbf00000-fbffffff
Prefetchable memory behind bridge: 0000000040400000-00000000405fffff
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 830f
Capabilities: [a0] Power Management version 2
Capabilities: [100] Virtual Channel <?>
Capabilities: [180] Root Complex Link <?>
Kernel driver in use: pcieport
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 02) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=02, sec-latency=0
I/O behind bridge: 00002000-00002fff
Memory behind bridge: f8000000-fbefffff
Prefetchable memory behind bridge: 00000000f0000000-00000000f6ffffff
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 830f
Capabilities: [a0] Power Management version 2
Capabilities: [100] Virtual Channel <?>
Capabilities: [180] Root Complex Link <?>
Kernel driver in use: pcieport
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 02) (prog-if 00 [UHCI])
Subsystem: ASUSTeK Computer Inc. Device 830f
Flags: bus master, medium devsel, latency 0, IRQ 23
I/O ports at d400 [size=32]
Kernel driver in use: uhci_hcd
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 02) (prog-if 00 [UHCI])
Subsystem: ASUSTeK Computer Inc. Device 830f
Flags: bus master, medium devsel, latency 0, IRQ 19
I/O ports at d480 [size=32]
Kernel driver in use: uhci_hcd
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 02) (prog-if 00 [UHCI])
Subsystem: ASUSTeK Computer Inc. Device 830f
Flags: bus master, medium devsel, latency 0, IRQ 18
I/O ports at d800 [size=32]
Kernel driver in use: uhci_hcd
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 02) (prog-if 00 [UHCI])
Subsystem: ASUSTeK Computer Inc. Device 830f
Flags: bus master, medium devsel, latency 0, IRQ 16
I/O ports at d880 [size=32]
Kernel driver in use: uhci_hcd
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02) (prog-if 20 [EHCI])
Subsystem: ASUSTeK Computer Inc. Device 830f
Flags: bus master, medium devsel, latency 0, IRQ 23
Memory at f7eb7c00 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Capabilities: [58] Debug port: BAR=1 offset=00a0
Kernel driver in use: ehci_hcd
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2) (prog-if 01 [Subtractive decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=05, subordinate=05, sec-latency=32
Capabilities: [50] Subsystem: ASUSTeK Computer Inc. Device 830f
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
Subsystem: ASUSTeK Computer Inc. Device 830f
Flags: bus master, medium devsel, latency 0
Capabilities: [e0] Vendor Specific Information <?>
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA IDE Controller (rev 02) (prog-if 80 [Master])
Subsystem: ASUSTeK Computer Inc. Device 830f
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
I/O ports at 01f0 [size=8]
I/O ports at 03f4 [size=1]
I/O ports at 0170 [size=8]
I/O ports at 0374 [size=1]
I/O ports at ffa0 [size=16]
Capabilities: [70] Power Management version 2
Kernel driver in use: ata_piix
01:00.0 Network controller: RaLink RT2860
Subsystem: RaLink Device 2790
Physical Slot: eeepc-wifi
Flags: bus master, fast devsel, latency 0, IRQ 19
Memory at fbef0000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: rt2860
Kernel modules: rt2860sta, rt2800pci
03:00.0 Ethernet controller: Attansic Technology Corp. Atheros AR8121/AR8113/AR8114 PCI-E Ethernet Controller (rev b0)
Subsystem: ASUSTeK Computer Inc. Device 8324
Flags: bus master, fast devsel, latency 0, IRQ 17
Memory at fbfc0000 (64-bit, non-prefetchable) [size=256K]
I/O ports at ec00 [size=128]
Capabilities: [40] Power Management version 2
Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [58] Express Endpoint, MSI 00
Capabilities: [6c] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [180] Device Serial Number ff-22-52-d9-00-23-54-ff
Kernel driver in use: ATL1E
My problems with KMS don't exist any more as I have stopped using it! There are various posts on this forum about it being unstable. My problem currently occurs before X is involved anyway, just booting in text mode from lilo.
I've tried booting from a USB stick and running fsck from there; it returns the same results. When I boot up normally in Slackware it always drops me to the single-user shell anyway, so that's no different, is it?
I'll attach a couple of outputs from consecutive runs of
Code:
fsck -v -y /dev/sda1
rebooting in between each attempt.
Tom
|
|
|
|
02-10-2010, 07:02 AM
|
#5
|
|
Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,614
|
How old is this disk? Can you also post the output of 'smartctl -a /dev/sda'?
|
|
|
|
02-10-2010, 07:11 AM
|
#6
|
|
Member
Registered: Dec 2004
Distribution: Slackware64-current
Posts: 95
Original Poster
|
The disk is a year and a half old. It came with the computer in late August 2008.
Code:
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 5400.3
Device Model: ST980811AS
Serial Number: 5LYBETCH
Firmware Version: 3.ALC
User Capacity: 80,026,361,856 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Feb 10 13:10:10 2010 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 426) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 84) minutes.
SCT capabilities: (0x0001) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   253   006    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always   -           1642
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always   -           4349911548
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always   -           3720
 10 Spin_Retry_Count        0x0013   100   100   034    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always   -           1767
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   018   018   000    Old_age   Always   -           82
190 Airflow_Temperature_Cel 0x0022   081   043   045    Old_age   Always   In_the_past 19 (Lifetime Min/Max 18/19)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           1466
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always   -           471825
194 Temperature_Celsius     0x0022   019   057   000    Old_age   Always   -           19 (0 10 0 0)
195 Hardware_ECC_Recovered  0x001a   091   061   000    Old_age   Always   -           235905829
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           1
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
SMART Error Log Version: 1
ATA Error Count: 1
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1 occurred at disk power-on lifetime: 2975 hours (123 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 be 9c 95 e1 Error: ICRC, ABRT at LBA = 0x01959cbe = 26582206
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 80 3f 9c 95 e1 00 00:48:49.216 READ DMA
c8 00 40 ff 9b 95 e1 00 00:48:49.216 READ DMA
c8 00 20 df 9b 95 e1 00 00:48:49.195 READ DMA
c8 00 08 d7 9b 95 e1 00 00:48:48.606 READ DMA
c8 00 00 37 f3 95 e1 00 00:48:48.555 READ DMA
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 3686 -
# 2 Short offline Completed without error 00% 3685 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
|
|
|
|
02-10-2010, 09:02 AM
|
#7
|
|
Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,614
|
I think that this drive is reasonably likely to fail. The reasons being:
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
189 High_Fly_Writes         0x003a   018   018   000    Old_age   Always   -           82
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always   -           471825
When the normalized VALUE drops to THRESH, the drive can fail at any time. This applies more strongly to Pre-fail attributes than to Old_age ones (Old_age means the drive is old and has reached end of life in a statistical sense).
There's also the error listed there.
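For reference, the logged error can be decoded by hand from the register dump in the smartctl output. The sketch below is only an illustration: the function names are made up for this example, and the bit names cover only the four error-register bits used here. It reassembles the 28-bit LBA from the SN, CL, CH and DH registers and names the bits set in ER.

```python
from typing import List

# Hypothetical helper names; register values are copied from the
# SMART error log posted earlier in the thread (ER=84, SN=be, CL=9c, CH=95, DH=e1).

# Subset of ATA error-register bits, highest bit first.
ERROR_BITS = {
    7: "ICRC",  # interface CRC error during an Ultra DMA transfer
    6: "UNC",   # uncorrectable data error
    4: "IDNF",  # requested sector ID not found
    2: "ABRT",  # command aborted
}


def decode_error_register(er: int) -> List[str]:
    """Return the names of the known bits that are set in the error register."""
    return [
        name
        for bit, name in sorted(ERROR_BITS.items(), reverse=True)
        if er & (1 << bit)
    ]


def lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Assemble a 28-bit LBA: low nibble of DH, then CH, CL and SN."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


if __name__ == "__main__":
    print(decode_error_register(0x84))
    lba = lba28(sn=0xBE, cl=0x9C, ch=0x95, dh=0xE1)
    print(hex(lba), lba)
```

The result matches the line smartctl printed (ICRC, ABRT at LBA 0x01959cbe = 26582206). ICRC is a CRC error on the interface between drive and controller rather than a media read error, which is consistent with the UDMA_CRC_Error_Count raw value of 1 in the attribute table.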
This suggests the drive is old and wearing out, which may be a reason for the persistent corruption. It also seems that the drive has overheated at least once.
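The value-versus-threshold comparison described above can be sketched in a few lines. This is a minimal illustration, not part of smartmontools: the names SmartAttr, parse_attr and status are invented for the example, and it assumes the usual rule that an attribute counts as failed when its normalized value is less than or equal to its threshold.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # current normalized value
    worst: int   # lowest normalized value recorded so far
    thresh: int  # failure threshold
    attr_type: str
    when_failed: str
    raw: str


def parse_attr(line: str) -> SmartAttr:
    """Parse one row of the 'smartctl -A' attribute table.

    Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
    """
    f = line.split(None, 9)
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]),
                     f[6], f[8], f[9])


def status(attr: SmartAttr) -> str:
    """Apply the assumed rule: failed when normalized value <= threshold."""
    if attr.value <= attr.thresh:
        return "failing_now"
    if attr.worst <= attr.thresh:
        return "failed_in_past"
    return "ok"


# Three rows copied from the smartctl output posted in this thread.
SAMPLE = """\
189 High_Fly_Writes 0x003a 018 018 000 Old_age Always - 82
190 Airflow_Temperature_Cel 0x0022 081 043 045 Old_age Always In_the_past 19 (Lifetime Min/Max 18/19)
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 471825
"""

if __name__ == "__main__":
    for row in SAMPLE.splitlines():
        a = parse_attr(row)
        print(f"{a.attr_id:3d} {a.name:24s} margin={a.value - a.thresh:3d} {status(a)}")
```

By this rule alone, only Airflow_Temperature_Cel has actually crossed its threshold (worst 043 against a threshold of 045), which agrees with the In_the_past flag. High_Fly_Writes and Load_Cycle_Count have a threshold of 000, so they are not flagged as failed, but their margins of 18 and 1 are what the post is drawing attention to.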
Last edited by H_TeXMeX_H; 02-10-2010 at 09:04 AM.
|
|
|
|
02-10-2010, 10:19 AM
|
#8
|
|
Member
Registered: Dec 2004
Distribution: Slackware64-current
Posts: 95
Original Poster
|
That's the second laptop I've killed then. I guess it serves me right for using them more or less as desktops! Thanks a lot for interpreting that smartctl output H_TeXMeX_H. Now seems like as good a time as any to have a look at the world of SSDs!
|
|
|
|
02-11-2010, 03:29 AM
|
#9
|
|
Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 5,295
|
These might be silly questions, but bear with me.
Have you any errors in the logs, particularly /var/log/messages?
Is acpid running?
Do you hear a fan blowing warm air and does it change speed?
My laptop always blows warm air. If the box is lightly used, the fan remains slow. If I'm compiling, the fan speed rises.
Is the hard disk spinning down? Lots of good tips on http://www.lesswatts.org
If you set up again, install acpitools. Then you can run acpitool -t and find your internal temperatures. Also run lm_sensors, let it set up your sensors, and run acpid with the -l option for a week. Then you can look in the logs and see all the signals the box gives, and set up scripts and events to do what you want with them. Slackware sadly gives you squat there. For instance, my laptop has a hibernate script on the lid switch, so you turn off by closing the lid.
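On the spin-down question, the SMART output posted earlier already gives a rough answer. The arithmetic below is a small sketch using the raw values of Load_Cycle_Count (471825) and Power_On_Hours (3720) from that output; the function name is made up for the example.

```python
def load_cycles_per_hour(load_cycles: int, power_on_hours: int) -> float:
    """Average number of head load/unload cycles per power-on hour."""
    return load_cycles / power_on_hours


if __name__ == "__main__":
    # Raw values taken from attributes 193 and 9 in the posted smartctl output.
    rate = load_cycles_per_hour(471825, 3720)
    print(f"{rate:.1f} load cycles per hour")
    print(f"one cycle every {3600 / rate:.1f} seconds on average")
```

That works out to roughly 127 load cycles per power-on hour, or about one every 28 seconds on average. Whether that is excessive depends on the drive's rated load/unload cycle count, which is not given in this thread.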
|
|
|
|
02-11-2010, 05:29 AM
|
#10
|
|
Member
Registered: Dec 2004
Distribution: Slackware64-current
Posts: 95
Original Poster
|
Thanks for the tips, business_kid. There's already a package that I use to handle ACPI; it's called eeepc-acpi-scripts, made by Alien Bob. This, along with KDE and the eeepc kernel module, handles most power saving, suspend-to-RAM etc., and I dynamically underclock anyway. I've kept my eye on the temperature in the past, but as there's only an Intel Atom processor inside the Eee 1000H, heat has not really been an issue in general.
Having said that, I recently sent it back to be repaired for a separate issue and they (ASUS) replaced my motherboard without putting a heatsink on the processor. I didn't realise that this was the case until it suddenly shut down, having reached the critical temperature. As you can imagine, my heartfelt thanks and respect go out to the helpful folk at ASUS. It is worth noting that I was experiencing these errors before this happened (hence not mentioning it), but that would probably explain why smartctl also shows some high temperatures in the past.
Anyway, I've bitten the bullet and ordered myself a 30GB SSD. I think I'm still within my warranty period on the old hard disk but I kinda wanted an SSD anyway! I'll mark this thread as solved when it arrives this weekend, assuming that was the problem of course.
Thanks again to both of you for your help,
Tom
|
|
|
|
All times are GMT -5. The time now is 10:44 PM.