We are having issue with Red Hat Enterprise Linux Server release 6.5 (Santiago)
Hello Guys,
On our Red Hat Linux 6.5 database server, we are facing an issue where a filesystem suddenly disappears.
We get the error below in our PuTTY session when the issue occurs:
"kernel:journal commit I/O error"
Aug 29 09:42:12 localhost dhclient[42261]: Sending on Socket/fallback
Aug 29 09:42:12 localhost dhclient[42261]: DHCPDISCOVER on em2 to 255.255.255.255 port 67 interval 5 (xid=0x27fe43b6)
Aug 29 09:42:17 localhost dhclient[42261]: DHCPDISCOVER on em2 to 255.255.255.255 port 67 interval 9 (xid=0x27fe43b6)
Aug 29 09:42:26 localhost dhclient[42261]: DHCPDISCOVER on em2 to 255.255.255.255 port 67 interval 15 (xid=0x27fe43b6)
Aug 29 09:42:36 localhost kernel: rport-2:0-0: blocked FC remote port time out: removing target and saving binding
Aug 29 09:42:36 localhost kernel: sd 2:0:0:2: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:2: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:2: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: Aborting journal on device sdf-8.
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: EXT4-fs error (device sdf): ext4_journal_start_sb: Detected aborted journal
Aug 29 09:42:36 localhost kernel: EXT4-fs (sdf):
Aug 29 09:42:36 localhost kernel: rport-2:0-1: blocked FC remote port time out: removing target and saving binding
Aug 29 09:42:36 localhost kernel: Remounting filesystem read-only
Aug 29 09:42:36 localhost kernel: JBD2: Detected IO errors while flushing file data on sdg-8
Aug 29 09:42:36 localhost kernel:
Aug 29 09:42:36 localhost kernel: sd 2:0:0:2: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: JBD2: I/O error detected when updating journal superblock for sdf-8.
Aug 29 09:42:36 localhost kernel: EXT4-fs (sdg): delayed block allocation failed for inode 5006 at logical offset 2285 with max blocks 1 with error -5
Aug 29 09:42:36 localhost kernel: Aborting journal on device sdg-8.
Aug 29 09:42:36 localhost kernel:
Aug 29 09:42:36 localhost kernel: This should not happen!! Data will be lost
Aug 29 09:42:36 localhost kernel: EXT4-fs error (device sdg) in ext4_da_writepages: IO failure
Aug 29 09:42:36 localhost kernel: EXT4-fs error (device sdg): ext4_journal_start_sb: Detected aborted journal
Aug 29 09:42:36 localhost kernel: EXT4-fs (sdg): Remounting filesystem read-only
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
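The log above interleaves several failure signals: FC remote ports timing out, SCSI devices going offline, ext4 journals aborting, and filesystems being remounted read-only. A minimal Python 3 sketch like the one below can pull them out of a syslog file so the sequence is easier to see. The helper name `summarize_kernel_log` is hypothetical, and the regular expressions are assumptions written against the exact message text quoted here; other kernel versions may word these messages differently.

```python
import re
from collections import defaultdict

# Assumed patterns, written against the exact kernel messages in this thread.
OFFLINE_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): rejecting I/O to offline device")
PATTERNS = {
    "fc_port_timeouts": re.compile(r"(rport-\d+:\d+-\d+): blocked FC remote port time out"),
    "aborted_journals": re.compile(r"Aborting journal on device (\S+)\."),
    "read_only_remounts": re.compile(r"EXT4-fs \((\w+)\): Remounting filesystem read-only"),
}


def summarize_kernel_log(lines):
    """Summarize storage failures found in an iterable of syslog lines (hypothetical helper)."""
    offline = defaultdict(int)  # SCSI address H:C:T:L -> number of rejected I/Os
    summary = {key: [] for key in PATTERNS}
    for line in lines:
        m = OFFLINE_RE.search(line)
        if m:
            offline[m.group(1)] += 1
            continue
        # Each line matches at most one of the remaining patterns.
        for key, pattern in PATTERNS.items():
            m = pattern.search(line)
            if m:
                summary[key].append(m.group(1))
                break
    summary["offline_scsi_devices"] = dict(offline)
    return summary
```

For example, `summarize_kernel_log(open("/var/log/messages", errors="replace"))`. Note that in this particular log the sdf remount message was split across two lines by interleaved output ("EXT4-fs (sdf):" and, later, "Remounting filesystem read-only"), so a line-based pattern like this only reports the sdg remount; the aborted-journal entries still cover both devices.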
You don't tell us anything about your hardware, where or how the disk(s) are connected, what you've done or tried, or when this error occurred. We can't guess. Is this a SAN? SATA? JBOD? RAID (what level, what controller)?
Most importantly, since this is RHEL 6, you should really call Red Hat support. You are PAYING FOR RHEL, aren't you?
Your DHCP client sent a DHCPDISCOVER three times. Seems like network troubleshooting comes first? Is this a virtual machine or real hardware? Any previous network problems or recent changes? Are any adjacent servers in the same network segment experiencing trouble too?
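To see how often this happens, and on which interfaces, the dhclient lines can be counted over the whole log. A minimal Python 3 sketch; the helper name is hypothetical and the regular expression is an assumption written against the exact dhclient lines in the OP's log:

```python
import re
from collections import defaultdict

# Assumed pattern, matching lines such as:
# "dhclient[42261]: DHCPDISCOVER on em2 to 255.255.255.255 port 67 interval 5 (xid=0x27fe43b6)"
DISCOVER_RE = re.compile(
    r"dhclient\[\d+\]: DHCPDISCOVER on (\S+) to \S+ port \d+ interval (\d+)"
)


def dhcp_discover_retries(lines):
    """Map each interface to the retry intervals (seconds) dhclient logged for it."""
    retries = defaultdict(list)
    for line in lines:
        m = DISCOVER_RE.search(line)
        if m:
            retries[m.group(1)].append(int(m.group(2)))
    return dict(retries)
```

A run of DISCOVERs with growing intervals on one interface, as on em2 here, suggests dhclient is not getting a usable answer from any DHCP server on that segment.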
Hmm... that, coupled with a disk error? OP, are you using iSCSI by any chance?
You have a disk access failure, probably hardware.
Your logs indicate that your system is losing contact with the physical disk devices and can't write. This is almost certainly NOT an operating system problem!
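The kernel's own view of those devices can be checked directly in sysfs: each SCSI disk exposes its state in `/sys/block/<dev>/device/state` (`running` when healthy, `offline` once the kernel has given up on it, which is what "rejecting I/O to offline device" means), and FC remote ports appear under `/sys/class/fc_remote_ports/` with a `port_state` attribute (`Online` when healthy). A minimal Python 3 sketch; the function names are hypothetical, and the `sysfs_root` parameter exists only so the code can be tested against a fake directory tree:

```python
import os


def _read_attr(path):
    with open(path) as f:
        return f.read().strip()


def scsi_device_states(sysfs_root="/sys"):
    """Return {block device: SCSI state} for block devices that have a SCSI state file."""
    states = {}
    block_dir = os.path.join(sysfs_root, "block")
    for name in sorted(os.listdir(block_dir)):
        state_file = os.path.join(block_dir, name, "device", "state")
        if os.path.isfile(state_file):  # dm-*, loop*, etc. have no SCSI state
            states[name] = _read_attr(state_file)
    return states


def fc_remote_port_states(sysfs_root="/sys"):
    """Return {rport name: port_state}; empty if the host has no FC transport class."""
    ports = {}
    fc_dir = os.path.join(sysfs_root, "class", "fc_remote_ports")
    if not os.path.isdir(fc_dir):
        return ports
    for name in sorted(os.listdir(fc_dir)):
        state_file = os.path.join(fc_dir, name, "port_state")
        if os.path.isfile(state_file):
            ports[name] = _read_attr(state_file)
    return ports


def problem_devices(sysfs_root="/sys"):
    """Disks whose SCSI state is not 'running', and FC ports that are not 'Online'."""
    disks = {d: s for d, s in scsi_device_states(sysfs_root).items() if s != "running"}
    ports = {p: s for p, s in fc_remote_port_states(sysfs_root).items() if s != "Online"}
    return disks, ports
```

On the affected host, `print(problem_devices())` should list the devices the log reports as offline while the path is down; an empty result after the SAN or HBA issue is fixed is a reasonable sign the paths are back.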
Once writes to the devices fail, their on-disk structures may become corrupted. It depends on exactly when the writes start failing, but in such situations you should always assume that your filesystem is corrupt, and once you've fixed the underlying problem you should run fsck to repair the corruption. If you don't, you'll strongly regret it.
One of the worst features, possibly THE worst feature, of the Linux distro and kernel you are using is that it will always try to remount read-only a filesystem whose disk has failed a write. So instead of the machine crashing and being obviously broken, it pretends to still work; end users keep trying to write, and things spiral rapidly into a worse situation than if the machine had simply crashed.
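Because a read-only remount is so easy to miss, a monitoring check can look for it directly. Whether ext4 remounts read-only, carries on, or panics after an error is governed by its `errors=` mount option (`continue`, `remount-ro`, `panic`), or by the default stored in the superblock (settable with `tune2fs -e`). `/proc/mounts` lists one mount per line as `device mountpoint fstype options dump pass`, and a remounted filesystem shows `ro` in its options. A minimal Python 3 sketch with a hypothetical function name; filesystems that are read-only on purpose would need an allow-list, which is not shown:

```python
def read_only_mounts(mounts_text, fstypes=("ext3", "ext4")):
    """Return (device, mountpoint) pairs for mounts of the given types that are read-only.

    mounts_text is the content of /proc/mounts. Mount points containing spaces
    appear there escaped as \\040 and are returned unchanged.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes and "ro" in options.split(","):
            result.append((device, mountpoint))
    return result
```

For example, `read_only_mounts(open("/proc/mounts").read())`, run from cron or a monitoring agent, can raise an alert the moment a data filesystem flips to `ro` instead of waiting for users to notice failed writes.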
Repair whatever communication path your disk devices rely on and this problem will go away.
If it is mounted read-only, no further damage will occur, and the user will not be able to write after the first failure. Very little will continue to operate (perhaps some CPU-only operations), but there will be no writes to the failed disk.
Next, mounting read-only (if it succeeds) allows time to make an emergency backup to an alternate filesystem or other storage.
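An emergency backup from a read-only mount on a flaky device may still hit I/O errors on individual files, so a copy that aborts at the first bad file is not much use. A minimal Python 3 sketch (the `salvage_copy` name is hypothetical) that copies file by file with `shutil.copy2`, records failures instead of stopping, and is assumed to write to a destination on different, healthy storage:

```python
import os
import shutil


def salvage_copy(src_root, dst_root):
    """Copy a directory tree file by file, skipping files that cannot be read.

    Returns a list of (path, error message) for everything that failed,
    e.g. files whose reads return EIO. Symlinked directories are not followed.
    """
    failures = []

    def record(err):
        # Called by os.walk when a directory itself cannot be listed.
        failures.append((err.filename, str(err)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=record):
        target_dir = os.path.join(dst_root, os.path.relpath(dirpath, src_root))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                shutil.copy2(src, os.path.join(target_dir, name))  # data plus timestamps
            except OSError as exc:
                failures.append((src, str(exc)))
    return failures
```

The returned failure list tells you which files need to come from an older backup once the hardware is fixed.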
Agreed on repairing whatever communication path your disk devices rely on. It would also help to use some redundancy (RAID and multiple communication channels).
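On RHEL the "multiple communication channels" part is typically handled by dm-multipath. Assuming a typical setup, each multipath map is a device-mapper device whose `/sys/block/dm-*/dm/uuid` starts with `mpath-` and whose `slaves/` directory lists the underlying path devices, so maps left with a single path can be spotted with a short Python 3 sketch (function names hypothetical). It only counts paths the kernel still knows about; `multipath -ll` shows the per-path status.

```python
import os


def multipath_path_counts(sysfs_root="/sys"):
    """Return {multipath map name: number of underlying path devices}."""
    counts = {}
    block_dir = os.path.join(sysfs_root, "block")
    for dev in sorted(os.listdir(block_dir)):
        if not dev.startswith("dm-"):
            continue
        dm_dir = os.path.join(block_dir, dev, "dm")
        try:
            with open(os.path.join(dm_dir, "uuid")) as f:
                uuid = f.read().strip()
            with open(os.path.join(dm_dir, "name")) as f:
                name = f.read().strip()
        except OSError:
            continue
        if not uuid.startswith("mpath-"):
            continue  # LVM volumes and other dm targets also live under dm-*
        counts[name] = len(os.listdir(os.path.join(block_dir, dev, "slaves")))
    return counts


def under_redundant_maps(sysfs_root="/sys", minimum=2):
    """Multipath maps with fewer than `minimum` paths, i.e. no path redundancy left."""
    return {n: c for n, c in multipath_path_counts(sysfs_root).items() if c < minimum}
```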
"No further damage will occur" to the disk volume, sure.
But when critically important operations, such as logging continuous data inputs from processes that cannot be reversed (like scientific experiments) or that require real-time responses in order to avoid loss of life (like reactor controls) can't function, it's best that the system either crash entirely and reboot or else start screaming its bloody head off. Making obscure entries in logs and remounting read-only (so that processes that READ still are running, and reacting as if old data were current, but processes that WRITE are not updating the old data) has always turned out to be a terrible idea in my experience. Especially in industrial process control!
As a sysadmin, it's best to turn that "feature" off. Don't read broken disks. As a programmer, do not assume that because you can read the disk, the data is up to date. Timestamp everything critical and be prepared for incoming data to suddenly cease.
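The "timestamp everything critical" advice can be sketched on the reader side: each record carries the time it was written, and the consumer refuses to act on data older than some limit instead of silently reusing it. The names (`Reading`, `require_fresh`, `StaleDataError`) and the threshold are hypothetical; this is a minimal Python 3 illustration, not a complete alarm system.

```python
import time
from dataclasses import dataclass


@dataclass
class Reading:
    """A value plus the wall-clock time (seconds since the epoch) it was written."""
    value: float
    timestamp: float


class StaleDataError(RuntimeError):
    """Raised when the newest data is too old to trust."""


def require_fresh(reading, max_age_s, now=None):
    """Return reading.value, or raise StaleDataError if it is older than max_age_s seconds."""
    if now is None:
        now = time.time()
    age = now - reading.timestamp
    if age > max_age_s:
        raise StaleDataError(
            f"data is {age:.1f}s old (limit {max_age_s}s); the writer may have stopped"
        )
    return reading.value
```

Wall-clock time is used because the writer and reader are usually different processes; the `now` parameter exists only to make the check testable. Raising, rather than returning the old value, forces the caller to handle the "writes have stopped" case loudly.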
If you don't have redundancy in your filesystems, networks, and systems, with automatic failover, you deserve the failure you get. With such "critical systems", incompetence is what you already have.
As for turning that "feature" off: it doesn't matter.
If a disk is failing, then you DON'T want to write. If a disk is failing writes, you DON'T want to continue writing. Reads are inherently less sensitive. If the filesystem can't be remounted (which happens), your system is dead.
You are already SOL for "everything critical" in either case.