I have a RAID5 volume formatted with XFS (3x 8TB disks) that has been working just fine for 4 years now.
For the past two days I have been getting email notifications (from openmediavault) that the volume has been unmounted.
Code:
Status failed Service mountpoint_srv_dev-disk-by-label-DATA
Date: Wed, 12 Jan 2022 08:08:30
Action: alert
Host: NAS
Description: status failed (1) -- mountpoint: /srv/dev-disk-by-label-DATA: Input/output error
I tried what I already know: unmounted all the other bind mounts pointing at the volume, ran xfs_repair, and remounted it OK (even after a reboot).
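For reference, this is roughly the sequence I ran (paths from my setup: the volume is /dev/sdc1, its mountpoint is /srv/dev-disk-by-label-DATA, and /mnt/raid5 is one of the bind mounts):
Code:
# unmount the bind mounts first, then the volume itself
umount /mnt/raid5
umount /srv/dev-disk-by-label-DATA
# repair, then remount
xfs_repair /dev/sdc1
mount /dev/sdc1 /srv/dev-disk-by-label-DATA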
The thing is... it keeps getting unmounted! This is what my syslog is telling me:
Code:
Jan 12 08:07:43 NAS kernel: [95441.856183] CPU: 3 PID: 1438 Comm: syncthing Not tainted 4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
Jan 12 08:07:43 NAS kernel: [95441.856184] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Jan 12 08:07:43 NAS kernel: [95441.856185] Call Trace:
Jan 12 08:07:43 NAS kernel: [95441.856191] dump_stack+0x5c/0x7b
Jan 12 08:07:43 NAS kernel: [95441.856233] xfs_trans_cancel+0x116/0x140 [xfs]
Jan 12 08:07:43 NAS kernel: [95441.856274] xfs_create+0x41d/0x640 [xfs]
Jan 12 08:07:43 NAS kernel: [95441.856316] xfs_generic_create+0x241/0x2e0 [xfs]
Jan 12 08:07:43 NAS kernel: [95441.856321] path_openat+0x141c/0x14d0
Jan 12 08:07:43 NAS kernel: [95441.856325] do_filp_open+0x99/0x110
Jan 12 08:07:43 NAS kernel: [95441.856329] ? vfs_statx+0x73/0xe0
Jan 12 08:07:43 NAS kernel: [95441.856331] ? vfs_statx+0x73/0xe0
Jan 12 08:07:43 NAS kernel: [95441.856333] ? __check_object_size+0x98/0x1a0
Jan 12 08:07:43 NAS kernel: [95441.856335] ? do_sys_open+0x12e/0x210
Jan 12 08:07:43 NAS kernel: [95441.856337] do_sys_open+0x12e/0x210
Jan 12 08:07:43 NAS kernel: [95441.856340] do_syscall_64+0x55/0x110
Jan 12 08:07:43 NAS kernel: [95441.856343] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 12 08:07:43 NAS kernel: [95441.856346] RIP: 0033:0x4b5c2a
Jan 12 08:07:43 NAS kernel: [95441.856346] Code: e8 fb 4f fb ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
Jan 12 08:07:43 NAS kernel: [95441.856373] RSP: 002b:000000c0009ad250 EFLAGS: 00000206 ORIG_RAX: 0000000000000101
Jan 12 08:07:43 NAS kernel: [95441.856375] RAX: ffffffffffffffda RBX: 000000c000045800 RCX: 00000000004b5c2a
Jan 12 08:07:43 NAS kernel: [95441.856376] RDX: 00000000000800c2 RSI: 000000c00122a360 RDI: ffffffffffffff9c
Jan 12 08:07:43 NAS kernel: [95441.856377] RBP: 000000c0009ad2e0 R08: 0000000000000000 R09: 0000000000000000
Jan 12 08:07:43 NAS kernel: [95441.856378] R10: 00000000000001a4 R11: 0000000000000206 R12: 000000c00122a360
Jan 12 08:07:43 NAS kernel: [95441.856379] R13: 0000000000000001 R14: 000000c0002e8000 R15: ffffffffffffffff
Jan 12 08:07:43 NAS kernel: [95441.856382] XFS (sdc1): xfs_do_force_shutdown(0x8) called from line 1018 of file /build/linux-GVmoCH/linux-4.18.6/fs/xfs/xfs_trans.c. Return address = 000000008dcb83c7
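(That trace is pulled straight from /var/log/syslog with something like the grep below, so any XFS error lines printed just before the dump may be cut off.)
Code:
grep -B 30 'xfs_do_force_shutdown' /var/log/syslog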
Once it gets unmounted, it looks like this:
Code:
root@NAS:~# ls -la /mnt/
ls: cannot access '/mnt/raid5': Input/output error
total 24
drwxr-xr-x 4 root root 4096 Nov 20 12:24 .
drwxr-xr-x 24 root root 4096 Oct 24 2019 ..
drwxrwsrwx 184 ftp users 16384 Nov 1 02:05 5TB
d????????? ? ? ? ? ? raid5
root@NAS:~# ls -la /mnt/raid5/
ls: cannot access '/mnt/raid5/': Input/output error
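(To double-check whether the volume really gets unmounted or whether XFS just shuts the filesystem down and starts returning I/O errors, something like this should tell; I have not pasted that output here.)
Code:
grep DATA /proc/mounts
dmesg | tail -n 30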
The funny thing is: xfs_repair doesn't find anything (any more)
Code:
root@NAS:~# xfs_repair /dev/sdc1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 3
- agno = 0
- agno = 2
- agno = 4
- agno = 1
- agno = 6
- agno = 5
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Note - quota info will be regenerated on next quota mount.
done
root@NAS:~#
And the RAID controller tells me the volume is completely OK.
I'm not sure what to try next. Any ideas?