LinuxQuestions.org
Old 08-24-2004, 01:11 PM   #1
donbellioni
Member
 
Registered: Mar 2004
Location: UK
Distribution: Slackware 10
Posts: 137

Rep: Reputation: 15
Hard disk problem? Boot freezes...


Hi, I've posted about this problem before. I thought it would go away after a thorough format and reinstall, but it hasn't.

When I boot Slackware 10 on my laptop (a new Dell Inspiron 8600), it freezes at some random point in the boot process. I get no response from the keyboard and the cursor stops blinking. I say random because I've concluded that the hang is not tied to one particular process: I've had it freeze while loading gpm, acpi and sendmail, and even while updating shared library links. Surely all of these can't be broken...

I've had suggestions previously that there may be a problem with my disk. Are there any disk diagnostic utilities for Linux that someone could point me to, so I can check the integrity of my disk?

If this is a disk problem, is this fixable?

Any suggestions are very welcome.

Thanks.
 
Old 08-27-2004, 01:21 PM   #2
donbellioni
Member
 
Registered: Mar 2004
Location: UK
Distribution: Slackware 10
Posts: 137

Original Poster
Rep: Reputation: 15
bump
 
Old 08-28-2004, 11:02 AM   #3
mcleodnine
Senior Member
 
Registered: May 2001
Location: Left Coast - Canada
Distribution: s l a c k w a r e
Posts: 2,731

Rep: Reputation: 45
smartctl can give you access to the S.M.A.R.T. utilities built into most modern drives.

'smartctl -a /dev/hda' should generate some insight
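
For anyone following along, here is a rough Python sketch of one way to skim the attribute table that smartctl prints. It assumes the ten-column layout that smartmontools 5.x uses (`smartctl -A` prints just that table). The function names and the set of "watched" attribute IDs are my own choices for illustration, not anything defined by smartmontools.

```python
import subprocess

# IDs whose raw count is normally zero on a healthy drive (my own pick,
# not an official smartmontools list): reallocated, pending, uncorrectable.
WATCHED_RAW_IDS = {5, 197, 198}


def parse_attributes(text):
    """Parse attribute rows from `smartctl -A` style output."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 9)
        # Attribute rows start with a numeric ID and carry a 0x.... flag;
        # this skips the header and any error-log lines.
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "thresh": int(parts[5]),
            "raw": parts[9].strip(),
        })
    return rows


def find_warnings(rows):
    """Return human-readable notes for attributes that look suspicious."""
    notes = []
    for row in rows:
        if row["thresh"] > 0 and row["value"] <= row["thresh"]:
            notes.append("%s: value %d is at or below threshold %d"
                         % (row["name"], row["value"], row["thresh"]))
        raw_first = row["raw"].split()[0]
        if row["id"] in WATCHED_RAW_IDS and raw_first.isdigit() and int(raw_first) > 0:
            notes.append("%s: raw count is %s" % (row["name"], raw_first))
    return notes


def read_attributes(device="/dev/hda"):
    """Run smartctl itself (needs root and smartmontools installed)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_attributes(result.stdout)
```

Something like `find_warnings(read_attributes("/dev/hda"))` would be the typical call. An empty list only means the attribute table looks clean; read errors can still show up separately in the SMART error log, so both are worth reading.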
 
Old 08-28-2004, 03:35 PM   #4
J.W.
LQ Veteran
 
Registered: Mar 2003
Location: Milwaukee, WI
Distribution: Mint
Posts: 6,642

Rep: Reputation: 69
donbellioni - this issue does appear to be more appropriate in the Hardware forum than in Software, so in response to your request to move it: for the time being, it seems better to leave it here. As you already know, it is OK to bump your thread after 24 hours if you wish.

In terms of your issue, your use of the word "random" in the problem description makes it unclear if this is a repeatable, consistent behavior, or if it is an unpredictable thing that happens every once in a while. Can you clarify?

Also, in addition to mcleodnine's recommendation, most hard drive manufacturers have diagnostic programs available on their websites that will perform some basic diagnostic/integrity checks, which may be worth investigating. However, I've seen a number of threads here at LQ that describe the same basic behavior that you outline, and at least from what I've seen, booting with the "noacpi" parameter often seems to solve the problem. I'd suggest doing a Search here at LQ for similar threads, if you have not done so already. Good luck with it and feel free to post back with any news. -- J.W.
 
Old 08-29-2004, 11:44 AM   #5
donbellioni
Member
 
Registered: Mar 2004
Location: UK
Distribution: Slackware 10
Posts: 137

Original Poster
Rep: Reputation: 15
Thanks for the replies.

To clarify, the hanging is unpredictable. Sometimes the system boots without problems but other times it hangs. Originally it hung whilst loading gpm, so I disabled it. That solved that, but then it would sometimes freeze while loading the keyboard map. Again I disabled this, but once again the system would hang elsewhere, this time when acpi was loading. This seemed to fix itself, but now it's hanging at Loading Shared Libraries - /sbin/ldconfig.

Sometimes it freezes, sometimes it works perfectly.

I've tried your suggestion, J.W., and edited my lilo.conf to include the line:
Code:
append = "pci=noacpi"
in the linux config section.
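
A side note for readers trying the same thing: lilo only picks up a changed lilo.conf after /sbin/lilo has been run again. Also, pci=noacpi only stops the kernel from using ACPI for PCI IRQ routing, whereas acpi=off switches ACPI off entirely. After a reboot, /proc/cmdline shows what the kernel was actually booted with. Here is a small Python sketch of that check; the function names are hypothetical.

```python
def acpi_options(cmdline):
    """Return the kernel parameters in `cmdline` that mention ACPI."""
    return [param for param in cmdline.split() if "acpi" in param.lower()]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().strip()
```

Calling `acpi_options(read_cmdline())` on the booted system should list pci=noacpi if the append line took effect. An empty list would suggest lilo was not re-run or the wrong image section was edited.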

Hopefully this will solve it; I'll need to give it a couple of days' use to know for sure.

mcleodnine, as per your suggestion I issued that command, but I don't quite understand the output. It appears there are errors. This is the output; please excuse the verbosity.

Code:
smartctl version 5.30 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     IC25N040ATMR04-0
Serial Number:    MRG257KBC2VYDH
Firmware Version: MO2OAD0A
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 3a
Local Time is:    Sun Aug 29 17:29:47 2004 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity was
					completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 ( 645) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  37) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   105   105   040    Pre-fail  Offline      -       5848
  3 Spin_Up_Time            0x0007   152   152   033    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       436
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   120   120   040    Pre-fail  Offline      -       36
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       674
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       434
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       34
193 Load_Cycle_Count        0x0012   097   097   000    Old_age   Always       -       37353
194 Temperature_Celsius     0x0002   152   152   000    Old_age   Always       -       36 (Lifetime Min/Max 17/2286)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Timestamp = decimal seconds since the previous disk power-on.
Note: timestamp "wraps" after 2^32 msec = 49.710 days.

Error 12 occurred at disk power-on lifetime: 608 hours
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 8f 9a 01 e2  Error: UNC 1 sectors at LBA = 0x02019a8f = 33659535

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
  -- -- -- -- -- -- -- --   ---------  --------------------
  25 00 02 8e 9a 01 e0 00    2147.400  READ DMA EXT
  25 00 04 8c 9a 01 e0 00    2143.700  READ DMA EXT
  25 00 06 8a 9a 01 e0 00    2140.100  READ DMA EXT
  25 00 08 88 9a 01 e0 00    2136.500  READ DMA EXT
  25 00 70 90 9a 01 e0 00    2136.400  READ DMA EXT

Error 11 occurred at disk power-on lifetime: 608 hours
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 8f 9a 01 e2  Error: UNC 1 sectors at LBA = 0x02019a8f = 33659535

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
  -- -- -- -- -- -- -- --   ---------  --------------------
  25 00 04 8c 9a 01 e0 00    2143.700  READ DMA EXT
  25 00 06 8a 9a 01 e0 00    2140.100  READ DMA EXT
  25 00 08 88 9a 01 e0 00    2136.500  READ DMA EXT
  25 00 70 90 9a 01 e0 00    2136.400  READ DMA EXT
  25 00 72 8e 9a 01 e0 00    2132.700  READ DMA EXT

Error 10 occurred at disk power-on lifetime: 608 hours
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 8f 9a 01 e2  Error: UNC 1 sectors at LBA = 0x02019a8f = 33659535

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
  -- -- -- -- -- -- -- --   ---------  --------------------
  25 00 06 8a 9a 01 e0 00    2140.100  READ DMA EXT
  25 00 08 88 9a 01 e0 00    2136.500  READ DMA EXT
  25 00 70 90 9a 01 e0 00    2136.400  READ DMA EXT
  25 00 72 8e 9a 01 e0 00    2132.700  READ DMA EXT
  25 00 74 8c 9a 01 e0 00    2129.100  READ DMA EXT

Error 9 occurred at disk power-on lifetime: 608 hours
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 8f 9a 01 e2  Error: UNC 1 sectors at LBA = 0x02019a8f = 33659535

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
  -- -- -- -- -- -- -- --   ---------  --------------------
  25 00 08 88 9a 01 e0 00    2136.500  READ DMA EXT
  25 00 70 90 9a 01 e0 00    2136.400  READ DMA EXT
  25 00 72 8e 9a 01 e0 00    2132.700  READ DMA EXT
  25 00 74 8c 9a 01 e0 00    2129.100  READ DMA EXT
  25 00 76 8a 9a 01 e0 00    2125.500  READ DMA EXT

Error 8 occurred at disk power-on lifetime: 608 hours
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 71 8f 9a 01 e2  Error: UNC 113 sectors at LBA = 0x02019a8f = 33659535

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
  -- -- -- -- -- -- -- --   ---------  --------------------
  25 00 72 8e 9a 01 e0 00    2132.700  READ DMA EXT
  25 00 74 8c 9a 01 e0 00    2129.100  READ DMA EXT
  25 00 76 8a 9a 01 e0 00    2125.500  READ DMA EXT
  25 00 78 88 9a 01 e0 00    2121.800  READ DMA EXT
  25 00 7a 86 9a 01 e0 00    2118.200  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       607         -
# 2  Short offline       Completed without error       00%         1         -
# 3  Short offline       Completed without error       00%         0         -
Could someone translate this for me?

Much thanks.
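
A note on reading the error log above: in each entry, the ER/ST/SC/SN/CL/CH/DH line holds the drive's registers after the failed READ DMA EXT, and "UNC" marks an uncorrectable read. The LBA that smartctl prints can be reproduced from the four address registers. The Python sketch below assumes the 28-bit layout (SN, CL, CH, plus the low four bits of DH), which is simply the layout that matches the printed value here. The 512-byte sector size and the function names are likewise my assumptions.

```python
SECTOR_SIZE = 512  # bytes; assumed, the usual size for ATA disks of this era


def lba_from_registers(sn, cl, ch, dh):
    """Combine SN, CL, CH and the low four bits of DH into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_register_line(line):
    """Turn '40 51 01 8f 9a 01 e2 ...' into a dict keyed by register name."""
    names = ["ER", "ST", "SC", "SN", "CL", "CH", "DH"]
    values = [int(token, 16) for token in line.split()[:7]]
    return dict(zip(names, values))


# Register line copied from "Error 12" in the log above.
regs = parse_register_line("40 51 01 8f 9a 01 e2  Error: UNC 1 sectors")
lba = lba_from_registers(regs["SN"], regs["CL"], regs["CH"], regs["DH"])
print(hex(lba), lba, lba * SECTOR_SIZE)  # LBA in hex, decimal, and as a byte offset
```

All five logged errors carry the same address registers and the same power-on hour (608). The log therefore looks more like one unreadable sector being retried than damage scattered across the disk, although only the last five of the twelve recorded errors are shown.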
 
Old 08-29-2004, 08:01 PM   #6
finegan
Guru
 
Registered: Aug 2001
Location: Dublin, Ireland
Distribution: Slackware
Posts: 5,700

Rep: Reputation: 57
A couple of read errors. You might want to try badblocks as well and see if that has significant output, although I've seen SMART errors like that on a drive not mean anything at all.

From here;

http://www.mikeoliveri.com/utils/dellslack.html

and all the others:

http://www.linux-on-laptops.com/dell.html

There don't seem to be any model-specific issues though, so that is a bit odd. What specific part of the boot is it at when it hangs?

Cheers,

Finegan
 
Old 08-29-2004, 08:32 PM   #7
mcleodnine
Senior Member
 
Registered: May 2001
Location: Left Coast - Canada
Distribution: s l a c k w a r e
Posts: 2,731

Rep: Reputation: 45
A shot in the dark...

How much hardware do you have hanging off your power supply? (also include motherboard, Video card, and any PCI cards).
 
Old 08-30-2004, 12:52 PM   #8
donbellioni
Member
 
Registered: Mar 2004
Location: UK
Distribution: Slackware 10
Posts: 137

Original Poster
Rep: Reputation: 15
Thanks again for the replies.

Since my last post I've booted and the system has hung once again. It seems that the pci=noacpi kernel option hasn't solved this particular problem.

Finegan, the system is hanging at "Updating Shared Libraries: /sbin/ldconfig". But as before, it only happens sometimes. "badblocks -v /dev/hda3" gives:

Code:
Checking blocks 0 to 6144862
Checking for bad blocks (read-only test): 614486056/  6144862
6144861
done
Pass completed, 2 bad blocks found.
Is this bad? What can be done to fix these bad blocks?
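
To compare the badblocks result with the SMART error log, the numbers need to be in the same units. badblocks counts in 1024-byte blocks by default, relative to the start of the partition, while the SMART LBA is a 512-byte sector counted from the start of the disk. Here is a sketch of the conversion in Python. The partition start sector is a placeholder (the real one would come from something like `fdisk -lu /dev/hda`), and the function names are made up for illustration.

```python
def block_to_lbas(block, part_start, block_size=1024, sector_size=512):
    """Disk LBAs covered by one badblocks block inside a partition."""
    per_block = block_size // sector_size
    first = part_start + block * per_block
    return list(range(first, first + per_block))


def lba_to_block(lba, part_start, block_size=1024, sector_size=512):
    """badblocks-style block number holding a disk LBA, or None if the
    LBA lies before the start of the partition."""
    if lba < part_start:
        return None
    return (lba - part_start) * sector_size // block_size


PART_START = 63  # placeholder only, NOT the real start sector of /dev/hda3
print(block_to_lbas(6144861, PART_START))   # sectors behind one reported block
print(lba_to_block(33659535, PART_START))   # where the SMART LBA would fall
```

If the SMART LBA maps to a block number past the end of the range badblocks checked (0 to 6144862), it lies outside hda3 and the two tools have found different spots on the disk.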

mcleodnine, it's a standard Dell Inspiron running hardware such as a Dell Wireless 1350 Mini-PCI wireless card, nVidia GeForce FX 5200 64MB graphics card, modular 8x DVD/24x CD-RW combo drive, integrated 56Kbps V.92 modem, 10/100 Ethernet, 15.4" wide-aspect UltraSharp WXGA (1280x800) screen, and Intel Pentium M processor.

Thanks.
 
Old 08-30-2004, 01:02 PM   #9
mcleodnine
Senior Member
 
Registered: May 2001
Location: Left Coast - Canada
Distribution: s l a c k w a r e
Posts: 2,731

Rep: Reputation: 45
donbellioni - My bad for not paying close enough attention - I forgot we were dealing with a notebook here.
 
Old 08-30-2004, 01:27 PM   #10
donbellioni
Member
 
Registered: Mar 2004
Location: UK
Distribution: Slackware 10
Posts: 137

Original Poster
Rep: Reputation: 15
Thanks anyway.
 
Old 08-31-2004, 02:56 PM   #11
donbellioni
Member
 
Registered: Mar 2004
Location: UK
Distribution: Slackware 10
Posts: 137

Original Poster
Rep: Reputation: 15
bump
 
Old 08-31-2004, 03:22 PM   #12
finegan
Guru
 
Registered: Aug 2001
Location: Dublin, Ireland
Distribution: Slackware
Posts: 5,700

Rep: Reputation: 57
Bad blocks starting to appear is not a good sign. The drive is smart enough to cordon off bad sectors, but you've got some slipping through the cracks and it's just going to get worse...

ldconfig horking would seem to me more of a software issue as it can't update shared library links quite right... made any changes to glibc or anything along those lines?

I'm beginning to think this isn't hardware as much as a possible mucked up library problem, although running ldconfig is a lot of drive whacking...

Cheers,

Finegan
 
Old 08-31-2004, 04:22 PM   #13
donbellioni
Member
 
Registered: Mar 2004
Location: UK
Distribution: Slackware 10
Posts: 137

Original Poster
Rep: Reputation: 15
No changes to glibc, Finegan. I assumed it was a hardware issue as I've had the same freezing occurring with the other processes above (gpm, acpi, keymaps). I might be wrong!

Is the badblocks problem fixable or is the drive physically broken? It's a new system.

Thanks.
 
Old 08-31-2004, 04:32 PM   #14
finegan
Guru
 
Registered: Aug 2001
Location: Dublin, Ireland
Distribution: Slackware
Posts: 5,700

Rep: Reputation: 57
Bad blocks are just drive death, man. I recently had to kiss goodbye to a 60GB Samsung OEM white-label that's off warranty; I got about 95% of what I had on it off... Ironically, all of its bad sectors were clustered around the Slackware 10.0 and Solaris 9 x86 ISOs, so I didn't lose anything I didn't already have. Bad blocks are plenty enough grounds for an RMA though, and a Dell isn't that hard to yank a drive out of. Might as well do it now while the warranty is probably still valid.

Cheers,

Finegan
 
Old 09-01-2004, 02:17 PM   #15
donbellioni
Member
 
Registered: Mar 2004
Location: UK
Distribution: Slackware 10
Posts: 137

Original Poster
Rep: Reputation: 15
Ah, exactly what I didn't want to hear.

How did you tell where the bad blocks were on your system? Am I right in thinking that they should be around the location where ldconfig is on mine? Can I verify this?

Thanks a million for the help, Finegan. I'd best get on to Dell then.

Cheers.
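
On the question of verifying which files sit on the bad blocks: if the partition is ext2 or ext3 (an assumption, since the thread never says which filesystem hda3 uses), debugfs can map a filesystem block to an inode with `icheck` and an inode to a path with `ncheck`. The catch is that badblocks counted in 1024-byte blocks while the filesystem may use a larger block size, so the number has to be rescaled first. The rough Python sketch below only builds the commands. The 4096-byte filesystem block size is a placeholder (`dumpe2fs -h` reports the real one), and the helper names are hypothetical.

```python
import shlex


def to_fs_block(bad_block, badblocks_size=1024, fs_block_size=4096):
    """Rescale a badblocks block number to a filesystem block number."""
    return bad_block * badblocks_size // fs_block_size


def icheck_command(fs_block, device="/dev/hda3"):
    """debugfs call that reports which inode owns a filesystem block."""
    return ["debugfs", "-R", "icheck %d" % fs_block, device]


def ncheck_command(inode, device="/dev/hda3"):
    """debugfs call that reports the path name(s) for an inode."""
    return ["debugfs", "-R", "ncheck %d" % inode, device]


def as_shell(command):
    """Render a command list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in command)


fs_block = to_fs_block(6144861)  # one of the blocks badblocks listed above
print(as_shell(icheck_command(fs_block)))
```

The inode number that icheck prints would then be fed to `ncheck_command`. If icheck finds no owner, the block is free space or filesystem metadata rather than part of a file. Note also that ldconfig reads the libraries in every configured library directory and rewrites /etc/ld.so.cache, so a bad block under any of those files could stall it, not only one under /sbin/ldconfig itself.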
 
  


All times are GMT -5.