Linux - Software: This forum is for Software issues.
I am using RHEL 6.6 installed on vSphere 5.5; the Linux kernel version is 2.6.32-431.
[root@DATASRV(dat01-b)~]# uname -a
Linux DATASRV 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
The partition /dat009 is on shared SAN storage, attached through a SAN switch.
[root@DATASRV(dat01-b)~]# df -khT /dat009
/dev/mapper/vg--dat009-lv--dat009 ext4 1.4T 21G 1.3T 2% /dat009
[root@DATASRV(dat01-b)~]#
I searched Google and found reports that this may be a RHEL bug, and there seems to be no patch for it. However, many of the suggestions I found are to run fsck to fix the issue.
Our customer does not agree to unmount the disk to run fsck. He thinks it is dangerous for the system, and he also does not want any service interruption.
Could you please share your experience and suggestions?
The customer is always right.
However, my attitude would be to point out that they have accepted responsibility for using a filesystem known to have errors. So any future failure is not your fault, but theirs.
Some of my past managers have stated I have an attitude problem ... :shrug:
You need to fsck that system. At the very least make sure that there are good recent backups in case things go even more awry.
svg0 is correct - explain the situation to the customer and hand responsibility back over if they refuse to play ball. You also have to determine why the customer thinks it would be "dangerous for the system" and provide arguments against that. If he/she is worried about the loss of income due to service interruption, point them towards the saying at the start of this post.
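On the fsck advice above: a repairing fsck should only be run on an unmounted filesystem, so a small guard like the one below may help when scheduling the maintenance window. This is a minimal sketch, not a tested procedure. The function name `is_mounted` is hypothetical, and the device and mount point are simply the ones shown in the df output of the first post. `e2fsck -f -n` forces a check while opening the filesystem read-only and answering "no" to every question, so it reports problems without changing anything.

```shell
# Hypothetical helper: succeed if the given mount point appears in the mounts table.
# The optional second argument only exists so the function can be tried
# against a sample file instead of the real /proc/mounts.
is_mounted() {
    mountpoint="$1"
    mounts="${2:-/proc/mounts}"
    # Field 2 of each line in /proc/mounts is the mount point.
    awk -v mp="$mountpoint" '$2 == mp { found = 1 } END { exit !found }' "$mounts"
}

# Usage sketch (run as root; paths assumed from the first post):
#   if is_mounted /dat009; then
#       echo "/dat009 is still mounted; unmount it before running a repairing fsck" >&2
#   else
#       e2fsck -f -n /dev/mapper/vg--dat009-lv--dat009   # read-only check, changes nothing
#   fi
```

A read-only pass first gives you something concrete to show the customer before any repair is attempted.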
The input/output errors suggest that a disk is failing. An unreadable inode will cause the "???????" result from ls. There should be messages logged in /var/log/messages at that time with more information about what happened.
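To look for those messages, something along these lines could be used. It is only a sketch: the function name `find_io_errors` is hypothetical, and the search patterns are my assumption about what the kernel typically logs for disk and ext4 problems, not lines taken from the poster's system.

```shell
# Hypothetical helper: print log lines that look like disk or ext4 errors.
# The patterns are assumptions; adjust them to what your kernel actually logs.
find_io_errors() {
    logfile="${1:-/var/log/messages}"
    grep -E -i 'I/O error|EXT4-fs (error|warning)' "$logfile"
}

# Usage (as root, since /var/log/messages is usually not world-readable):
#   find_io_errors /var/log/messages | tail -n 20
```

The timestamps on any matching lines can then be compared with the time the errors were seen on /dat009.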
Thank you ALL,
We just ran fsck on the problematic partition, and it finished within 10 minutes for 25 GB of data.
The error has not reappeared since the fsck, but some files and directories are missing; only a few files and directories were moved to "lost+found". I restored the data from backup, and it seems normal now.
This is what the customer was (probably) concerned about: fsck ensures the integrity of the filesystem, not necessarily of the files contained therein. Good to see you have a solid backup regime.
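For anyone in the same position, one way to see which paths went missing after an fsck is to compare the file listing of the backup with that of the live tree. The sketch below assumes the backup is available as a plain directory tree; `list_missing` and both directory arguments are hypothetical names for illustration.

```shell
# Hypothetical helper: list paths present under the backup tree
# but absent from the live tree (names only, contents are not compared).
list_missing() {
    backup="$1"
    live="$2"
    a=$(mktemp)
    b=$(mktemp)
    # comm needs both inputs sorted the same way, so pin the locale.
    (cd "$backup" && find . | LC_ALL=C sort) > "$a"
    (cd "$live" && find . | LC_ALL=C sort) > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (both paths are placeholders):
#   list_missing /path/to/backup/dat009 /dat009
```

This only tells you what is absent by name; a file that survived but was damaged would need a checksum comparison instead.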