Hi, I've posted about this problem before. I thought it would go away after a thorough format and reinstall, but it hasn't.
When I boot Slackware 10 on my laptop (a new Dell Inspiron 8600), it freezes at some random point in the boot process. I get no response from the keyboard and the cursor stops blinking. I say random because I've concluded that the hang is not tied to a particular process: it has frozen while loading gpm, acpi and sendmail, and even while updating shared library links. Surely all of these can't be screwed...
I've had suggestions previously that there may be a problem with my disk. Are there any disk diagnostic utilities for Linux that someone could point me to, so I can check the integrity of my disk?
donbellioni - this issue does appear to be more appropriate in Hardware than in the Software forum, so in response to your request to move it, it seems better to leave it here for the time being. As you already know, it is OK to bump your thread after 24 hours if you wish.
In terms of your issue, your use of the word "random" in the problem description makes it unclear if this is a repeatable, consistent behavior, or if it is an unpredictable thing that happens every once in a while. Can you clarify?
Also, in addition to mcleodnine's recommendation, most hard drive manufacturers have diagnostic programs available on their websites that will perform some basic diagnostic/integrity checks, which may be worth investigating. However, I've seen a number of threads here at LQ that describe the same basic behavior that you outline, and at least from what I've seen, booting with the "noacpi" parameter often seems to solve the problem. I'd suggest doing a Search here at LQ for similar threads, if you have not done so already. Good luck with it and feel free to post back with any news. -- J.W.
To clarify, the hanging is unpredictable. Sometimes the system boots without problems, but other times it hangs. Originally it hung whilst loading gpm, so I disabled it. That solved that, but then it would sometimes freeze when loading the keyboard map. Again I disabled this, but once again the system would hang elsewhere, this time when acpi was loading. This seemed to fix itself, but now it's hanging at Loading Shared Libraries - /sbin/ldconfig.
Sometimes it freezes, sometimes it works perfectly.
I've tried your suggestion, J.W, and edited my lilo.conf to include the line:
Code:
append = "pci=noacpi"
in the linux config section.
Hopefully this will solve it. I'll need to give it a couple of days' use to know for sure.
mcleodnine, as per your suggestion I issued that command, but I don't quite understand the output. It appears there are errors. This is the output, please excuse the verbosity.
Code:
smartctl version 5.30 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: IC25N040ATMR04-0
Serial Number: MRG257KBC2VYDH
Firmware Version: MO2OAD0A
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 3a
Local Time is: Sun Aug 29 17:29:47 2004 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was
completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 645) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 37) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 062 Pre-fail Always - 0
2 Throughput_Performance 0x0005 105 105 040 Pre-fail Offline - 5848
3 Spin_Up_Time 0x0007 152 152 033 Pre-fail Always - 1
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 436
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 120 120 040 Pre-fail Offline - 36
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 674
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 434
191 G-Sense_Error_Rate 0x000a 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 34
193 Load_Cycle_Count 0x0012 097 097 000 Old_age Always - 37353
194 Temperature_Celsius 0x0002 152 152 000 Old_age Always - 36 (Lifetime Min/Max 17/2286)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Timestamp = decimal seconds since the previous disk power-on.
Note: timestamp "wraps" after 2^32 msec = 49.710 days.
Error 12 occurred at disk power-on lifetime: 608 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 01 8f 9a 01 e2 Error: UNC 1 sectors at LBA = 0x02019a8f = 33659535
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
25 00 02 8e 9a 01 e0 00 2147.400 READ DMA EXT
25 00 04 8c 9a 01 e0 00 2143.700 READ DMA EXT
25 00 06 8a 9a 01 e0 00 2140.100 READ DMA EXT
25 00 08 88 9a 01 e0 00 2136.500 READ DMA EXT
25 00 70 90 9a 01 e0 00 2136.400 READ DMA EXT
Error 11 occurred at disk power-on lifetime: 608 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 01 8f 9a 01 e2 Error: UNC 1 sectors at LBA = 0x02019a8f = 33659535
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
25 00 04 8c 9a 01 e0 00 2143.700 READ DMA EXT
25 00 06 8a 9a 01 e0 00 2140.100 READ DMA EXT
25 00 08 88 9a 01 e0 00 2136.500 READ DMA EXT
25 00 70 90 9a 01 e0 00 2136.400 READ DMA EXT
25 00 72 8e 9a 01 e0 00 2132.700 READ DMA EXT
Error 10 occurred at disk power-on lifetime: 608 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 01 8f 9a 01 e2 Error: UNC 1 sectors at LBA = 0x02019a8f = 33659535
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
25 00 06 8a 9a 01 e0 00 2140.100 READ DMA EXT
25 00 08 88 9a 01 e0 00 2136.500 READ DMA EXT
25 00 70 90 9a 01 e0 00 2136.400 READ DMA EXT
25 00 72 8e 9a 01 e0 00 2132.700 READ DMA EXT
25 00 74 8c 9a 01 e0 00 2129.100 READ DMA EXT
Error 9 occurred at disk power-on lifetime: 608 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 01 8f 9a 01 e2 Error: UNC 1 sectors at LBA = 0x02019a8f = 33659535
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
25 00 08 88 9a 01 e0 00 2136.500 READ DMA EXT
25 00 70 90 9a 01 e0 00 2136.400 READ DMA EXT
25 00 72 8e 9a 01 e0 00 2132.700 READ DMA EXT
25 00 74 8c 9a 01 e0 00 2129.100 READ DMA EXT
25 00 76 8a 9a 01 e0 00 2125.500 READ DMA EXT
Error 8 occurred at disk power-on lifetime: 608 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 71 8f 9a 01 e2 Error: UNC 113 sectors at LBA = 0x02019a8f = 33659535
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
25 00 72 8e 9a 01 e0 00 2132.700 READ DMA EXT
25 00 74 8c 9a 01 e0 00 2129.100 READ DMA EXT
25 00 76 8a 9a 01 e0 00 2125.500 READ DMA EXT
25 00 78 88 9a 01 e0 00 2121.800 READ DMA EXT
25 00 7a 86 9a 01 e0 00 2118.200 READ DMA EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 607 -
# 2 Short offline Completed without error 00% 1 -
# 3 Short offline Completed without error 00% 0 -
A couple of read errors. You might want to try badblocks as well and see if that produces significant output, although I've seen SMART errors like that on a drive mean nothing at all.
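One way to put the SMART output in perspective is to look at the attribute table rather than the error log. smartctl treats an attribute as failed when its normalized VALUE drops to or below THRESH, and the raw counts for attributes 5, 197 and 198 are the ones that track reallocated or pending sectors. The following is a rough sketch only: it parses a few lines pasted from the output above, and the helper names are invented for this example.

```python
# A few attribute rows copied from the smartctl output in this thread.
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
193 Load_Cycle_Count 0x0012 097 097 000 Old_age Always - 37353
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
"""


def parse_attributes(text):
    """Turn whitespace-separated attribute rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Expect: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "thresh": int(fields[5]),
            "raw": fields[9],
        })
    return rows


def failed_attributes(rows):
    """Names of attributes whose normalized value is at or below threshold."""
    return [r["name"] for r in rows if r["value"] <= r["thresh"]]


def nonzero_sector_counters(rows):
    """Names of sector-related counters (IDs 5, 197, 198) with a nonzero raw value."""
    return [r["name"] for r in rows if r["id"] in {5, 197, 198} and r["raw"] != "0"]


rows = parse_attributes(SAMPLE)
print("failed:", failed_attributes(rows))
print("nonzero sector counters:", nonzero_sector_counters(rows))
```

For the rows shown, both lists come back empty, which fits the overall PASSED result. It does not rule out a problem, since the error log still records an uncorrectable read.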
Since my last post I've booted and the system has hung once again. It seems that the pci=noacpi kernel option hasn't solved this particular problem.
Finegan, the system is hanging at "Updating Shared Libraries: /sbin/ldconfig". But as before, it only happens sometimes. "badblocks -v /dev/hda3" gives:
Code:
Checking blocks 0 to 6144862
Checking for bad blocks (read-only test): 614486056/ 6144862
6144861
done
Pass completed, 2 bad blocks found.
Is this bad? What can be done to fix these bad blocks?
Mcleodnine, it's a standard Dell Inspiron with the following hardware: Dell Wireless 1350 Mini-PCI wireless card, nVidia GeForce FX 5200 64MB graphics card, modular 8x DVD/24x CD-RW combo drive, integrated 56Kbps V.92 modem, 10/100 Ethernet, 15.4" wide-aspect UltraSharp WXGA (1280x800) screen, Intel Pentium M processor.
As soon as bad blocks start appearing, that's not a good sign. The drive is smart enough to cordon off bad sectors, but you've got some slipping through the cracks, and it's just going to get worse...
ldconfig horking would seem to me more of a software issue, as it can't update shared library links quite right... made any changes to glibc or anything along those lines?
I'm beginning to think this isn't hardware so much as a possible mucked-up library problem, although running ldconfig is a lot of drive whacking...
No changes to glibc, finnegan. I assumed it was a hardware issue as I've had the same freezing occurring with the other processes above (gpm, acpi, keymaps), but I might be wrong!
Is the badblocks problem fixable, or is the drive physically broken? It's a new system.
Bad blocks are just drive death, man. I recently had to kiss goodbye to a 60GB Samsung OEM white-label drive that's off warranty, and got about 95% of what I had on it off... ironically, all of its bad sectors were clustered around the Slackware 10.0 and Solaris 9 x86 ISOs, so I didn't lose anything I didn't already have. Bad blocks are plenty enough grounds for an RMA though, and a Dell isn't that hard to yank a drive out of. Might as well do it now while the warranty is probably still valid.
How did you tell where the bad blocks were on your system? Am I right in thinking that they should be around the location where ldconfig is on mine? Can I verify this?
Thanks a million for the help, finnegan. I'd best get on to Dell then.
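On the question of verifying which file a bad block belongs to: if the partition is ext2 or ext3 (the thread does not say which filesystem /dev/hda3 uses, so this is an assumption), debugfs can map a block to an inode with icheck and an inode to a path with ncheck. badblocks counts in 1024-byte blocks by default, while the filesystem may use a different block size, so the number has to be converted first. The sketch below does the conversion and prints the commands to run; the 4096-byte filesystem block size is a guess to be checked with "tune2fs -l", and the helper names are hypothetical.

```python
def to_fs_block(badblocks_block: int, badblocks_bs: int = 1024, fs_bs: int = 4096) -> int:
    """Convert a block number reported by badblocks to a filesystem block number.

    badblocks_bs defaults to 1024, the badblocks default; fs_bs is an
    assumed filesystem block size and should be confirmed on the real system.
    """
    return badblocks_block * badblocks_bs // fs_bs


def debugfs_commands(device: str, fs_block: int):
    """Build the two debugfs invocations: block -> inode, then inode -> path."""
    return [
        f'debugfs -R "icheck {fs_block}" {device}',
        f'debugfs -R "ncheck <inode reported by icheck>" {device}',
    ]


# Block 6144861 is one of the bad blocks listed in the badblocks output above.
fs_block = to_fs_block(6144861)
commands = debugfs_commands("/dev/hda3", fs_block)
for cmd in commands:
    print(cmd)
```

If icheck reports that the block is not in use by any inode, the bad block is in free space and no file is affected. Running badblocks with "-b" set to the filesystem block size avoids the conversion step altogether.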