Short hard disk accesses like a beating heart

273 · 06-08-2017, 03:01 PM

I recently noticed that I was hearing my hard drive[s] doing a similar thing. Tonight, on a whim, I stopped conky (which didn't change anything) then I pkilled hddtemp and, lo and behold, the noise stopped.
I'm fairly sure I only started getting this noise recently so I can only assume it's a bug in hddtemp?

JZL240I-U · 06-23-2017, 07:16 AM

Quote:

Originally Posted by business_kid

... It would be helpful if you could summarize problems, progress, and remaining issues while eliminating what didn't work. ...

Good idea. I'll do it systematically from the beginning of the thread; please bear with me.

Problem:

I can hear one hard disk being accessed briefly about 88 times per minute; it sounds like a quietly beating heart. This happens only on rare occasions, about once a month.

/ is on an SSD, so no mechanical noise can come from there. Two partitions are mounted from two Seagate disks: /home with ext4 on sda and /backup with ext4 on sdb. About six other partitions on sda and sdb are not mounted.

What I tried so far:

iostat: no problem obvious
fstab: no problem obvious
indexing a file?: akonadi, mysqld killed, no change; not a new disk / partition
swapping: no
smartmontools: installed, no problem, but Seagate changed the meaning of the parameters without disclosing what they now stand for...
seatools: results in "everything roger" and SMART will be "not triggered"
lsof |grep sdX: no problem obvious for the experts either
iotop: kworker and systemd both are at the top but numbers read 0
sync: no change
"relatime" in fstab: no change
"lazytime" in fstab: no change
"noatime" in fstab: no change
Firmware of sda: unresolvable
"echo 1 > /proc/sys/vm/block_dump": did not change anything
"diff <(grep '_bytes: [1-9]' /proc/*/io) <(sleep 1;grep '_bytes: [1-9]' /proc/*/io)": came with empty result
"dmesg|tail": indicated activities of jbd2, the journalling daemon of ext4
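The /proc/*/io comparison above can be stretched over a longer window and boiled down to a per-PID write delta, which is easier to read than a raw diff. A rough sketch, assuming the usual "write_bytes:" field in /proc/<pid>/io and root privileges to read other users' processes; the function names are made up for illustration:

```shell
# snapshot_io [ROOT]: print "pid write_bytes" for every readable ROOT/<pid>/io.
# ROOT defaults to /proc; it is a parameter only so the function can be tried on a copy.
snapshot_io() {
    root=${1:-/proc}
    for f in "$root"/[0-9]*/io; do
        [ -r "$f" ] || continue
        pid=${f%/io}
        pid=${pid##*/}
        # exact field match, so cancelled_write_bytes is not picked up
        wb=$(awk '$1 == "write_bytes:" { print $2 }' "$f" 2>/dev/null)
        if [ -n "$wb" ]; then
            echo "$pid $wb"
        fi
    done
}

# io_delta BEFORE AFTER: print "pid bytes_written" for PIDs whose counter grew.
io_delta() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && ($2 + 0) > (before[$1] + 0) {
             print $1, $2 - before[$1]
         }' "$1" "$2"
}
```

While the ticking is audible, something like "snapshot_io > /tmp/before; sleep 30; snapshot_io > /tmp/after; io_delta /tmp/before /tmp/after" would list the PIDs that wrote during those 30 seconds. An empty result would point away from ordinary user processes.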

Summary:

There is an indication that jbd2 is the culprit but no real proof (and hardware can't be ruled out entirely). I found no way to shut jbd2 down while /home is mounted without endangering my data.

Is a dirtied inode a sign of danger?

Code:

[ 5060.928177] DOM Worker(3327): dirtied inode 9437485 (recovery.js.tmp) on sda6

Suggestions? And thanks for everybody's patience.
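A line like the one above only says that a process modified a file; here a browser worker thread touched recovery.js.tmp, which looks like ordinary session saving rather than a fault. To see which process dirties inodes most often while block_dump is enabled, the log can be tallied per process and partition. A minimal sketch, assuming every line has exactly the shape shown above (timestamp, name(pid), "dirtied inode", number, file name, partition); the function name is hypothetical:

```shell
# tally_dirtied: read kernel log lines on stdin and count "dirtied inode"
# events per process name and partition, most frequent first.
tally_dirtied() {
    # keep only "<name> <partition>" from matching lines, drop everything else
    sed -n 's/^\[[^]]*\] *\(.*\)([0-9]*): dirtied inode [0-9]* (.*) on \([^ ]*\)$/\1 \2/p' |
        sort | uniq -c | sort -rn
}
```

It would be fed from the kernel log, for example "dmesg | tally_dirtied". Lines in any other format are silently ignored.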

273 · 06-23-2017, 07:51 AM

Have you tried putting them to sleep with hdparm? It should be something like "hdparm -S 5 /dev/sde", run with root permissions.
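For reference, the -S argument is an encoded value, not a plain number of seconds or minutes. Going by the hdparm man page, 1 to 240 count in units of 5 seconds and 241 to 251 in units of 30 minutes, so "-S 5" asks for spin-down after 25 idle seconds. A small sketch of that mapping (the function name is made up for illustration, it is not part of hdparm):

```shell
# standby_seconds N: translate an hdparm -S value into seconds.
# Prints 0 when the timeout is disabled; fails for vendor-defined/reserved values.
standby_seconds() {
    n=$1
    if [ "$n" -eq 0 ]; then
        echo 0                              # standby timeout disabled
    elif [ "$n" -le 240 ]; then
        echo $((n * 5))                     # multiples of 5 seconds
    elif [ "$n" -le 251 ]; then
        echo $(((n - 240) * 1800))          # 1 to 11 units of 30 minutes
    elif [ "$n" -eq 252 ]; then
        echo 1260                           # 21 minutes
    elif [ "$n" -eq 255 ]; then
        echo 1275                           # 21 minutes 15 seconds
    else
        echo "vendor-defined or reserved value: $n" >&2
        return 1
    fi
}
```

For example, "standby_seconds 5" prints 25 and "standby_seconds 241" prints 1800.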

business_kid · 06-23-2017, 11:18 AM

To me, this isn't looking like a software problem, unless you can tell us it is. I think we may have to consider disk firmware or internals as a possibility. You seem to have been thorough.

As you have several unmounted partitions, I would enlist the spare space and try to move the problem. If there is no spare space (otherwise occupied) could you unmount one or the other in a process of elimination and mount it only when needed?

273 · 06-23-2017, 12:04 PM

Quote:

Originally Posted by business_kid

To me, this isn't looking like a software problem, unless you can tell us it is. I think we may have to consider disk firmware or internals as a possibility. You seem to have been thorough.

As you have several unmounted partitions, I would enlist the spare space and try to move the problem. If there is no spare space (otherwise occupied) could you unmount one or the other in a process of elimination and mount it only when needed?

As I alluded to, unless you specifically tell the disk to power down using hdparm, it probably won't power down and, for whatever reason (I noted hddtemp), will do this.
Not that I disagree completely, just that I don't think software has been ruled out.

business_kid · 06-24-2017, 04:34 AM

I spent a lot of time addressing hardware faults in strange hardware in a past life in Industrial Electronics. Then my industry customers moved manufacturing to China, India & Poland, but I didn't :-/.

Then I survived on a well-tuned 'seat of the pants' sense for assessing the most likely cause, and my comment gave you the benefit of that. I agree software has not been ruled out entirely, and don't believe it ever can be. But your previous work as detailed in the review has caused it to recede as a likely cause. You can be more definite about ruling out the hardware/firmware in the disk, which I suggest you do, and then you have nearly proved it's software or PC related. It's also very difficult to rule out hardware & firmware entirely. As complexity and chip density increase, certainty in maintenance issues reduces. Today's PCs are complex machines.

JZL240I-U · 06-28-2017, 07:22 AM

Then we are at a dead end. As already written in post #37 of this thread, I'm unable to update the firmware of sda due to a faulty update from Seagate. But I'll be sure to try hddtemp when it starts ticking again.

In the meantime I'll open a new thread and ask whether somebody knows how to stop jbd2. If and when I get results I'll post a link here.

<edit> Just to be on the safe side I checked the Seagate site again. It contradicts itself, saying in one place that there is no new firmware while on another page it offers the faulty firmware *sigh*.

I just noticed that I can umount sdb (i.e. /backup) next time, thus restricting the possible target once more. Will be back

</edit>

JZL240I-U · 10-01-2017, 12:21 PM

Okay.

It is not /dev/sdb (backup). Unmounted it, heart still beating.

It is not hddtemp either. Stopped it, heart still beating, also after starting it again.

"hdparm -S 5 /dev/sda" doesn't do the trick, either.

*grrr*

I wonder whether we can crack this nut eventually...

Thanks for your support.

business_kid · 10-01-2017, 01:52 PM

This is a maintenance bodge, but may work for you. It's sort of Horse Sense.

Quote:

Horse sense is the thing horses have which prevents them betting on people.
W.C. Fields

When the 88 times/minute heartbeat starts up again, encourage it and do nothing to stop it.
Then go round the disks, put a finger on them and feel for vibration. Or put an ear to it and listen for it. Often putting a head between 2 disks and listening for left/right is best.

This is the stupid but most effective way to get at this. I don't have your exact disk alignment, and don't feel like reading 52 posts to discover you haven't laid it out. If you find the disk, then maybe we can go after why.

IsaacKuo · 10-01-2017, 03:04 PM

Skimming through part of this thread...is /home mounted on a spinning hard drive (/dev/sda)? If so, then it could be any number of things related to X and the desktop environment periodically writing to the hard drive. If you have any sort of web browser or numerous other sorts of software running, there will be periodic writes (these generally assist in recovering what you were working on in case of a power failure or software crash).

If you have /home on a spinning hard drive, I think you're just going to have to live with the system writing to /home often. If the noise is annoying to you, put /home on the SSD instead. You can use symlinks to link to the spinning hard drive for stuff which is too big to fit on the SSD.

Although it's annoying, I would recommend nesting those symlinks within a folder so they are not directly in ~. If you don't, then even something as simple as "ls" in bash can access and probably spin up the hard drive (even if just to see whether or not the symlink is broken).
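A minimal sketch of that layout, with the symlinks collected in one subfolder instead of sitting directly in ~. The folder name "bulk" and the function name are only assumptions for illustration:

```shell
# link_bulk SSD_HOME HDD_DIR NAME...
# Creates SSD_HOME/bulk/NAME -> HDD_DIR/NAME for each NAME, so that listing
# SSD_HOME itself never has to look at a symlink pointing to the hard drive.
link_bulk() {
    ssd_home=$1
    hdd_dir=$2
    shift 2
    mkdir -p "$ssd_home/bulk" || return 1
    for name in "$@"; do
        # -n replaces an existing link instead of descending into its target
        ln -sfn "$hdd_dir/$name" "$ssd_home/bulk/$name"
    done
}
```

For example, "link_bulk "$HOME" /mnt/hdd videos iso" (with /mnt/hdd standing in for wherever the spinning disk is mounted) would give ~/bulk/videos and ~/bulk/iso.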

jpollard · 10-02-2017, 06:08 AM

Quote:

Originally Posted by business_kid

This is a maintenance bodge, but may work for you. It's sort of Horse Sense.
When the 88 times/minute heartbeat starts up again, encourage it and do nothing to stop it.
Then go round the disks, put a finger on them and feel for vibration. Or put an ear to it and listen for it. Often putting a head between 2 disks and listening for left/right is best.

This is the stupid but most effective way to get at this. I don't have your exact disk alignment, and don't feel like reading 52 posts to discover you haven't laid it out. If you find the disk, then maybe we can go after why.

I've used a flathead screwdriver as a stethoscope to help with that. Any vibrations from the disk get transferred to the screwdriver, then put the handle of the screwdriver against your head, which transfers the vibration to your head/eardrum making it easier to hear.

JZL240I-U · 10-02-2017, 08:11 AM

@business_kid & jpollard Good idea, I'll make doubly sure that it is /dev/sda next time it starts ticking again.

@IsaacKuo Yes, this is a spinning disk. I have three disks, but partitions are mounted from only two of them, one of which I can umount to no effect. So the noise is quite clearly coming from /home, mounted on /dev/sda6.

It occurs only every other month or so, therefore I interpret it as a faulty condition (glitch) in the system with no harm done so far. BUT(!) I want to know what it is.

I started another thread on the suspicion that it might be the jbd2 daemon of the ext4 file system: https://www.linuxquestions.org/quest...em-4175608780/ but at the last occurrence I forgot to try the suggestion there -- mea culpa, mea maxima culpa. But the ticking will come again, never fear.

business_kid · 10-03-2017, 03:45 AM

Quote:

Originally Posted by jpollard

I've used a flathead screwdriver as a stethoscope to help with that. Any vibrations from the disk get transferred to the screwdriver, then put the handle of the screwdriver against your head, which transfers the vibration to your head/eardrum making it easier to hear.

/OT
I cut my teeth and did most of my work around high voltage. Screwdrivers weren't much help there, except you could approach Extra High Tension (EHT 20KV or more) with a screwdriver tip, a spark would jump up to half an inch (12-13 mm) and you could check your EHT as you earthed it through the plastic. I carelessly left a knuckle over the metal one day, and the spark jumped again without the limiting protection of the plastic. I didn't do it twice!

JZL240I-U · 11-05-2017, 10:43 AM

@business_kid & jpollard Yep, it is definitely /dev/sda, I put my finger on it and can clearly feel the vibrations during every "tick". I can't "umount -f /dev/sda6" (/home); the system tells me it is in use...

The suggestion in the other thread doesn't help either; it seems to be more Arch specific but I am running SUSE, not to mention the fact that I don't want a long commit interval since usually there is no ticking. Btw, changing commit times would work only after a reboot and then the ticking is gone anyway...
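As far as I know, ext4 accepts a new commit= value on a live remount (as root, something like "mount -o remount,commit=60 /home"), so a reboot may not be strictly necessary for an experiment while the ticking is going on; treat that as an assumption to verify on your own system. The sketch below only reads the current setting from /proc/mounts, assuming the option is left out there when the default of 5 seconds is in effect; the function name is made up:

```shell
# commit_interval MOUNTPOINT [MOUNTS_FILE]
# Prints the commit= value (seconds) for MOUNTPOINT, or 5 if none is listed.
# Fails if MOUNTPOINT does not appear in the mounts file.
commit_interval() {
    awk -v mp="$1" '
        $2 == mp {
            val = 5                      # assumed ext4 default
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^commit=/)
                    val = substr(opts[i], 8)
            print val
            found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

Running "commit_interval /home" before and after a remount would show whether the new value took effect.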

business_kid · 11-05-2017, 12:21 PM

At long last we have eliminated your umpteen drives and got it down to the / partition. 59 posts is nearly a record length for a problem - enter it for the Guinness Book of Records, or the LQ equivalent. We Irish are always finding excuses for introducing booze into a conversation :-).

Anyhow, next time this happens, get a root console open and run

Code:

umount -a    # ignore 'can't unmount' errors
lsof > ~/lsof_bad.txt

/ will not be unmounted, of course. Ideally you'd unmount everything else. If the 'umount -a' stops it, be sure to mention it when you report back, but continue here. Then, when the ticking has stopped, run

Code:

umount -a
lsof > ~/lsof_good.txt

Then, you can narrow it further with

Code:

diff ~/lsof_bad.txt ~/lsof_good.txt > lsof.diff

Post the output. It's possible that things are running in /opt or /home or places which are causing this. Presuming that's not the case, you can eliminate them (See below). That will certainly narrow it down to hopefully a few apps.
If you find that an appreciable number of things are running from other drives that you know are not causing this, change the lsof commands to read

Code:

lsof |grep -v -e 'home' -e 'opt' > file

or wherever the known innocent stuff resides. This will help to keep the diff as short as possible. The -v on grep selects non matching lines. You can pack multiple -e 'something' options in a single grep.
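A raw diff of two lsof snapshots tends to be noisy, because PIDs, offsets and sizes change even when the same program has the same file open. One rough way around that is to reduce each snapshot to command/file pairs before comparing. A sketch, assuming the default lsof column layout (COMMAND first, NAME last) and file names without spaces; the function names are hypothetical:

```shell
# lsof_pairs FILE: reduce a saved lsof snapshot to unique "COMMAND NAME" pairs.
lsof_pairs() {
    # skip the header line, keep first and last column
    awk 'NR > 1 { print $1, $NF }' "$1" | sort -u
}

# only_in_bad BAD GOOD: pairs present while ticking but absent afterwards.
only_in_bad() {
    a=$(mktemp) && b=$(mktemp) || return 1
    lsof_pairs "$1" > "$a"
    lsof_pairs "$2" > "$b"
    comm -23 "$a" "$b"      # lines unique to the first (bad) snapshot
    rm -f "$a" "$b"
}
```

With the files from above this would be "only_in_bad ~/lsof_bad.txt ~/lsof_good.txt".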