LinuxQuestions.org

peppergrower 10-27-2009 08:27 AM

Repeated hard drive thrashing followed by lockup
 
A couple of weeks ago I started having odd problems with my laptop that may be hardware related, but I'm not sure.

First, my specs: I have a Dell Inspiron E1505 (aka 6400) running Arch Linux and KDE4, completely up to date. I also have a Windows XP partition that I boot into occasionally. The Linux partition is fully encrypted: I'm using LVM on top of dm-crypt.

The problem: I'll be using my laptop normally (some light web surfing, etc.) when all of a sudden programs will go completely unresponsive. This is accompanied by continuous hard drive access. Sometimes, after 30 seconds or so, the hard drive will stop and I'll regain control. Sometimes I'm forced to hold down the power button for 5 seconds to force my computer to turn off. (And if it stops on its own, odds are good it'll have another episode in a few minutes.)

Occasionally I can get to a terminal (Alt-Ctrl-F1), though usually I can't do anything there either. One time, though, I found a ton of error messages that might shed some light on my problem:

EXT3fs error (device dm-1): ext3_find_entry: reading directory # _____ offset 0

[clip many more lines saying the same thing, but with different directory numbers]

EXT3fs error (device dm-3) in ext3_reserve_inode_write: Journal has aborted
EXT3fs error (device dm-3) in ext3_orphan_del: Journal has aborted
EXT3fs error (device dm-3) in ext3_reserve_inode_write: Journal has aborted
EXT3fs error (device dm-3) in ext3_delete_inode: Journal has aborted

I've seen similar messages a few times since; also, at least once I got a kernel panic after such messages. (I suspect that these messages are always displayed, but not necessarily where I can see them, since I can't always get to a terminal.)
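For anyone chasing messages like the ones above, here is a minimal Python sketch that tallies them per device and per kernel function, which makes it easier to see whether one logical volume or several are affected. It assumes the log lines look like the ones quoted in this post; the function name tally_ext3_errors and the sample list are made up for illustration, and the hyphen in the prefix is treated as optional in case the log spells it "EXT3-fs".

```python
import re
from collections import Counter

# Matches both message shapes quoted in the post:
#   EXT3fs error (device dm-3) in ext3_orphan_del: Journal has aborted
#   EXT3fs error (device dm-1): ext3_find_entry: reading directory # ...
# Assumption: the hyphen is optional ("EXT3fs" or "EXT3-fs").
PATTERN = re.compile(
    r"EXT3-?fs error \(device (?P<dev>[^)]+)\)(?: in|:) (?P<func>\w+):"
)

def tally_ext3_errors(lines):
    """Count ext3 error lines, keyed by (device, kernel function)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("dev"), match.group("func"))] += 1
    return counts

# Illustrative input only: a few lines in the style of the messages above.
sample = [
    "EXT3fs error (device dm-3) in ext3_reserve_inode_write: Journal has aborted",
    "EXT3fs error (device dm-3) in ext3_orphan_del: Journal has aborted",
    "EXT3fs error (device dm-3) in ext3_reserve_inode_write: Journal has aborted",
    "EXT3fs error (device dm-3) in ext3_delete_inode: Journal has aborted",
]

for (dev, func), n in sorted(tally_ext3_errors(sample).items()):
    print(dev, func, n)
```

To use it on a real log, the sample list could be replaced with the lines of saved dmesg output, for example by iterating over an open file.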

I began to think perhaps my hard drive was failing; it's fairly new (I got it last summer) and I treat my laptop with care, but it can happen. So I ran some diagnostics. Unfortunately, the diagnostic utility from Western Digital (who makes my hard drive) wouldn't finish booting; I'd get as far as a boot screen saying it was loading Caldera DR-DOS, and then nothing. So I used Hitachi's tool instead. After an hour to an hour and a half (it did a thorough surface scan and queried the S.M.A.R.T. information, among other things), it found no problems. Hm.

One more thing that makes me wonder if it's a hardware problem: at the same time this started, my Windows partition started acting up as well. I'd be watching a television show online (e.g., Hulu)--I can get reasonable full-screen graphics performance under Windows but not in Linux--when I'd get a blue screen of death and the hard drive would thrash. I don't remember the STOP code, unfortunately; if it would help, I can switch back over to XP and try to get the error again.

So what do you think? Are the Windows crashes related, or could that just be coincidence (and maybe a buggy version of Adobe Flash or something)? Should I trust the diagnosis of the Hitachi tool, or is it possibly missing something that only a WD-produced tool would know about my hard drive? Maybe it's something about my file system, rather than hardware? I'm very open to suggestions, since this is super annoying. :) (And yes, I have full, up-to-date backups, in case things go really wrong.)

TB0ne 10-27-2009 09:22 AM

Based on the symptoms described, the errors, and the physical notes (i.e. 'thrashing'), I'd say your drive is having a problem. The surface-scan diags check the physical media, but won't catch a flaky logic board on your drive, unless the problem occurs DURING that scan.

I'd replace the drive ASAP, since you've got good backups. I'd also probably get a cheap external USB enclosure for the existing drive, so you'll have an easier time copying files back and forth, since you can still access it.

peppergrower 10-27-2009 10:48 AM

You're correct; I don't know what the exact problem is, but just in the last couple of hours it's gotten to the point that half the time my computer doesn't even recognize that the hard drive exists. (There's a diagnostic utility built into the BIOS, and when I ran it, it complained that I had no hard drive, so I don't think it's just a bad boot sector.) So perhaps it's the logic board, as you suggested.

Oh well! It's still under warranty, and I even have my old hard drive set up with a fresh install of Arch from a couple months ago when I was looking into switching (from Ubuntu), so this'll be a small headache but nothing serious. Thanks for the help! I'll be calling Western Digital today to get a replacement.

TB0ne 10-27-2009 11:13 AM

No problem, and good luck. I've been in your shoes more times than I care to remember. :)

peppergrower 11-08-2009 01:38 AM

Alas, it looks like it is hardware, but not the hard drive: I swapped in the other hard drive I mentioned, and my laptop still claims it can't find a bootable device. Plus, I ran Western Digital's disk scan (using the Windows version of their tool), and it came out totally clean. I'm thinking a failing SATA controller now.

Due to the intermittent nature of the problem, and the way it most often occurs after moving the laptop (or shifting its position on my lap), I suspect a loose connection somewhere. I'm not afraid to use a soldering iron (even on surface mount); does anyone have any ideas where to start on something like this? (I figure if it's dying anyway, at least there's little danger I'll make it worse.)

Alternatively, does anyone have any other ideas how to fix this? Is the SATA controller something you can swap out on a laptop, and would it be worth it on one that's slightly over 3 years old?

TB0ne 11-08-2009 01:46 PM

No, they're not typically something you can upgrade/replace, as they're usually part of the mobo. You MIGHT want to consider getting a brand-new motherboard, though...3 years old, you can perhaps find one fairly cheap....

MrCode 11-08-2009 02:26 PM

For a laptop? I thought the only user-replaceable components in a laptop were the HDD and RAM...

TB0ne 11-08-2009 02:31 PM

Normally, yes. But 3 years old, and not under warranty, why not? Nothing magic about it...just held in with tiny screws, so as long as you've got some patience and feel like doing it, it's not difficult.

The hardest part is finding the replacement part.

peppergrower 11-09-2009 09:40 AM

Yeah, that's a good idea; I'll look into it. (Thanks again for the suggestions!) I took my laptop apart on a whim the other night; I knew it wouldn't help, but it was mildly cathartic. :) Anyway, it's not too bad to take apart, and replacing the motherboard would only be one step farther than I've been already.

Random question: without thinking, I took the heat sink off (to more easily blow some dust out of the fins). If I find a way to keep this machine going I should probably apply some fresh thermal grease, yes? (The stuff that was on the CPU was totally dried out, and the GPU seemed to just have something rubbery--not thermal grease at all--so I'm assuming I just leave that be.)

TB0ne 11-09-2009 09:58 AM

I certainly would...fresh thermal paste is always a must, in my opinion.

All times are GMT -5.