LinuxQuestions.org

lucmove 09-22-2020 05:09 PM

Directory has become inaccessible
 
I was downloading mail from all my mailboxes when the email application crashed just after downloading mail from one of the mailboxes.

Actually, the email application froze so I had to kill it.

I ran it again and checked the mailboxes again, and the email application froze again. And again. And again.

Upon investigation, I found that the directory where the messages of that mailbox are stored is inaccessible:

Code:

# /home/luc/Mail> ls -ls gmail1
[very long list of files]

# /home/luc/Mail> ls -ls gmail2
Killed

"Killed" is the actual output. I don't know what it means.

Note the #, i.e. it is inaccessible even as root.

SpaceFM (file manager) can open gmail1, but not gmail2. It says it's opening, but it takes forever and I give up. But right-clicking those directories and selecting Properties seems to work:

gmail1: 540MB, 11664 files, 14 folders
gmail2: 463MB, 6249 files, 7 folders

I also tried PCManFM and had the same results.

What is happening to that directory and how can I fix it?

TIA

dugan 09-22-2020 05:15 PM

Anything interesting in "dmesg" after "ls" gets killed?

lucmove 09-22-2020 05:34 PM

Quote:

Originally Posted by dugan (Post 6168689)
Anything interesting in "dmesg" after "ls" gets killed?

Code:

[35335.564211] BUG: unable to handle kernel paging request at 000000000003f8ca
[35335.568435] IP: [<ffffffffa661c433>] __d_lookup_rcu+0x63/0x180
[35335.572625] PGD 0
[35335.576780] Oops: 0000 [#37] SMP
[35335.581025] Modules linked in: [all my modules]
[35335.621144] task: ffff9b06d8a63100 task.stack: ffffb58142fec000
[35335.625414] RIP: 0010:[<ffffffffa661c433>]  [<ffffffffa661c433>] __d_lookup_rcu+0x63/0x180
[35335.625415] RSP: 0018:ffffb58142fefc60  EFLAGS: 00010202
[35335.625416] RAX: 0000000000000004 RBX: 000000000003f8ce RCX: ffffb5814001c000
[35335.625417] RDX: ffffb58142fefcc4 RSI: ffffb58142fefd90 RDI: ffff9b0512006180
[35335.625418] RBP: 0000000000000000 R08: ffffb58142fefcc4 R09: 0000000000000004
[35335.625419] R10: 000000006a3fda9f R11: 0000000000000000 R12: ffff9b0512006180
[35335.625420] R13: 000000046a3fda9f R14: ffff9b052e040024 R15: ffff9b07156e13e0
[35335.625421] FS:  00007fa192aa7f40(0000) GS:ffff9b071fb00000(0000) knlGS:0000000000000000
[35335.625422] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[35335.625423] CR2: 000000000003f8ca CR3: 0000000177ec8000 CR4: 00000000001406e0
[35335.625424] Stack:
[35335.625427]  ffffb58142fefd90 ffffb58142fefd80 0000000000000000 ffffb58142fefd80
[35335.625429]  0000000000000000 ffffb58142fefd18 ffff9b0512006180 ffffb58142fefd10
[35335.625434]  ffff9b07156e13e0 ffffffffa660d2e2 ffffb58142fefd0c ffffb58142fefd80
[35335.625434] Call Trace:
[35335.625449]  [<ffffffffa660d2e2>] ? lookup_fast+0x52/0x2e0
[35335.625451]  [<ffffffffa660e394>] ? walk_component+0x44/0x320
[35335.625454]  [<ffffffffa660f647>] ? path_lookupat+0x67/0x120
[35335.625456]  [<ffffffffa6612001>] ? filename_lookup+0xb1/0x180
[35335.625458]  [<ffffffffa65fdeca>] ? __check_object_size+0xfa/0x1d8
[35335.625462]  [<ffffffffa6756908>] ? strncpy_from_user+0x48/0x160
[35335.625464]  [<ffffffffa6611c3a>] ? getname_flags+0x6a/0x1e0
[35335.625466]  [<ffffffffa6606d49>] ? vfs_fstatat+0x59/0xb0
[35335.625467]  [<ffffffffa66072fd>] ? SYSC_newlstat+0x2d/0x60
[35335.625469]  [<ffffffffa660bdf2>] ? path_put+0x12/0x20
[35335.625472]  [<ffffffffa6629005>] ? path_getxattr+0x75/0xb0
[35335.625476]  [<ffffffffa6a0637b>] ? system_call_fast_compare_end+0xc/0x9b
[35335.625497] Code: 48 83 e3 fe 0f 84 92 00 00 00 4c 89 e8 45 89 ea 49 89 d0 48 c1 e8 20 48 89 34 24 49 89 fc 49 89 c1 eb 08 48 8b 1b 48 85 db 74 71 <8b> 6b fc 4c 3b 63 10 75 ef 48 83 7b 08 00 74 e8 83 e5 fe 41 f6
[35335.625500] RIP  [<ffffffffa661c433>] __d_lookup_rcu+0x63/0x180
[35335.625500]  RSP <ffffb58142fefc60>
[35335.625501] CR2: 000000000003f8ca
[35335.625547] ---[ end trace 7c7d972ec4895c48 ]---


dugan 09-22-2020 05:48 PM

Uh, wow. I'd certainly say that counts as interesting...

I'd definitely run a memtest after seeing this, and not try to do any more debugging until I've actually ruled out bad RAM.
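
A note for anyone reading a trace like the one above: the number in brackets on the "Oops:" line is the kernel's running count of oopses since boot, so "[#37]" suggests this was not the first failure in that session. Below is a minimal sketch for pulling those lines out of dmesg. The function name oops_summary is made up for illustration, and the pattern assumes the older x86 oops format shown in this thread.

```shell
# Summarize kernel oops lines from dmesg-style text on stdin.
# oops_summary is a hypothetical helper, not a standard tool.
oops_summary() {
    # Keep only lines such as "Oops: 0000 [#37] SMP", then print the
    # oops counter and the hexadecimal error code.
    grep -E 'Oops: [0-9a-f]+ \[#[0-9]+\]' |
        sed -E 's/.*Oops: ([0-9a-f]+) \[#([0-9]+)\].*/oops #\2 error_code=\1/'
}

# Usage (reading the kernel log may require root on some systems):
#   dmesg | oops_summary
```

If this prints many entries, the earliest one is usually the most informative, since later oopses can be side effects of the first.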

lucmove 09-22-2020 05:51 PM

What kind of test would you run and, assuming memory is bad, why does it affect that directory only?

dugan 09-22-2020 05:54 PM

https://www.memtest86.com/

This was the standard test for faulty RAM last time I checked, which admittedly was a while ago.

lucmove 09-22-2020 05:57 PM

Assuming memory is bad, why does it affect that directory only?

dugan 09-22-2020 05:59 PM

Quote:

Originally Posted by lucmove (Post 6168709)
assuming memory is bad, why does it affect that directory only?

That question is unanswerable.

If it turns out that your memory is good, then "why does it affect that directory only" becomes extremely interesting.

dugan 09-22-2020 06:23 PM

You also need to scan the hard drive for bad sectors/bad blocks. That's definitely another potential cause, and it's more directly relevant to "why is this directory the only one affected?"

I'll just take the liberty of posting one link:

https://www.tecmint.com/check-linux-...rs-bad-blocks/

This is much more likely to be a hardware issue than to be a kernel bug (which, really, is the alternative explanation).
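
Besides a surface scan with badblocks, the drive's own SMART counters can be read with smartctl from smartmontools, if it is installed. This is a rough sketch, assuming an ATA/SATA drive whose attribute table uses the usual ten-column layout. /dev/sdX is a placeholder for the real device, and smart_bad_sector_counts is a hypothetical helper name.

```shell
# Extract the two SMART attributes most directly tied to bad sectors
# from "smartctl -A" output on stdin. Assumes the ATA attribute table,
# where column 2 is the attribute name and column 10 is the raw value.
smart_bad_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
        print $2 "=" $10
    }'
}

# Usage (placeholder device name, needs root):
#   smartctl -H /dev/sdX                         # overall health verdict
#   smartctl -A /dev/sdX | smart_bad_sector_counts
```

Nonzero raw values here would point toward the disk. Zeros do not prove the disk is healthy, but they make a plain bad-sector explanation less likely.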

lucmove 09-22-2020 06:50 PM

Thanks. I am very familiar with badblocks. I don't like it because I once bought a new hard disk and checked it with badblocks, which reported thirty-odd bad blocks. I had the disk returned and replaced, and the new one had many bad blocks too! The vendor refused to replace it again, so I put up with it, and I ended up using that hard disk for more than ten years without a single problem. I still have it and it works. I just hardly ever use it anymore because I bought much larger disks and outgrew it.

Now, I just rebooted and the directory is working normally. The mail application isn't freezing anymore either, all the messages are there. Nothing turns up in dmesg either. The problem seems to be gone.

I should just point out that the machine froze when I issued the reboot command and I had to hard reset it. I ran memtest and no errors were detected. Then I finally logged in again and everything seems normal.

I have no idea what happened.

MadeInGermany 09-23-2020 06:33 PM

With a bad disk block you should get an I/O error message. But you got a paging error, which has to do with virtual memory and can be caused by, for example, bad RAM.
Do you have a swapfile? Was it manipulated while in use by the kernel?
Code:

free -m
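
In addition to free -m, the kernel lists active swap areas in /proc/swaps: one header line, then one line per swap file or partition. Here is a small sketch for counting them. The name count_swap_areas is invented for this example.

```shell
# Count active swap areas from /proc/swaps-style text on stdin.
# The first line is a header; every later non-empty line is one swap area.
count_swap_areas() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Usage:
#   count_swap_areas < /proc/swaps
```

A result of 0 means no swap is in use, which rules out a swap file that was modified while active.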

scasey 09-23-2020 06:38 PM

Quote:

Originally Posted by lucmove (Post 6168719)
Now, I just rebooted and the directory is working normally. The mail application isn't freezing anymore either, all the messages are there. Nothing turns up in dmesg either. The problem seems to be gone.

I should just point out that the machine froze when I issued the reboot command and I had to hard reset it. I ran memtest and no errors were detected. Then I finally logged in again and everything seems normal.

I have no idea what happened.

My guess would be that the memory got corrupted when you aborted the copy. Rebooting flushed the memory.

lucmove 09-24-2020 01:01 AM

Quote:

Originally Posted by MadeInGermany (Post 6169019)
With a bad disk block you should get an I/O error message. But you got a paging error, which has to do with virtual memory and can be caused by, for example, bad RAM.
Do you have a swapfile? Was it manipulated while in use by the kernel?
Code:

free -m

I haven't had a swap file or partition for almost 10 years. The system has been running smoothly all this time. If this problem was caused by lack of swap, it was the very first one.

MadeInGermany 09-24-2020 02:30 AM

Then I suspect a bug in the kernel, most likely the driver for your disk.
Look for updates!

Last but not least, there are possible hardware faults like a distortion on a power line, leading to random corruptions...

I once dealt with a series of hard disks that had a faulty embedded SRAM cache. Very nasty: all kinds of hangs, malfunctions, and corruption occurred. We finally detected a bit flip in a corrupted data file. When we contacted the vendor, it turned out the bit flip error was already suspected and under examination. We got new disks :)

dugan 09-24-2020 04:31 PM

There are a lot of hardware issues that could have caused this. Overheating is another possibility.


All times are GMT -5.