LinuxQuestions.org
Linux - Kernel This forum is for all discussion relating to the Linux kernel.

08-29-2016, 04:58 AM   #1
thoufic
LQ Newbie
 
Registered: Aug 2016
Posts: 1

Rep: Reputation: Disabled
We are having an issue with Red Hat Enterprise Linux Server release 6.5 (Santiago)


Hello Guys,

On our Red Hat Enterprise Linux 6.5 database server, a filesystem suddenly disappeared.

The following error appeared in my PuTTY session when the problem occurred:

"kernel:journal commit I/O error"

Aug 29 09:42:12 localhost dhclient[42261]: Sending on Socket/fallback
Aug 29 09:42:12 localhost dhclient[42261]: DHCPDISCOVER on em2 to 255.255.255.255 port 67 interval 5 (xid=0x27fe43b6)
Aug 29 09:42:17 localhost dhclient[42261]: DHCPDISCOVER on em2 to 255.255.255.255 port 67 interval 9 (xid=0x27fe43b6)
Aug 29 09:42:26 localhost dhclient[42261]: DHCPDISCOVER on em2 to 255.255.255.255 port 67 interval 15 (xid=0x27fe43b6)
Aug 29 09:42:36 localhost kernel: rport-2:0-0: blocked FC remote port time out: removing target and saving binding
Aug 29 09:42:36 localhost kernel: sd 2:0:0:2: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:2: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:2: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: Aborting journal on device sdf-8.
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: EXT4-fs error (device sdf): ext4_journal_start_sb: Detected aborted journal
Aug 29 09:42:36 localhost kernel: EXT4-fs (sdf):
Aug 29 09:42:36 localhost kernel: rport-2:0-1: blocked FC remote port time out: removing target and saving binding
Aug 29 09:42:36 localhost kernel: Remounting filesystem read-only
Aug 29 09:42:36 localhost kernel: JBD2: Detected IO errors while flushing file data on sdg-8
Aug 29 09:42:36 localhost kernel:
Aug 29 09:42:36 localhost kernel: sd 2:0:0:2: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: JBD2: I/O error detected when updating journal superblock for sdf-8.
Aug 29 09:42:36 localhost kernel: EXT4-fs (sdg): delayed block allocation failed for inode 5006 at logical offset 2285 with max blocks 1 with error -5
Aug 29 09:42:36 localhost kernel: Aborting journal on device sdg-8.
Aug 29 09:42:36 localhost kernel:
Aug 29 09:42:36 localhost kernel: This should not happen!! Data will be lost
Aug 29 09:42:36 localhost kernel: EXT4-fs error (device sdg) in ext4_da_writepages: IO failure
Aug 29 09:42:36 localhost kernel: EXT4-fs error (device sdg): ext4_journal_start_sb: Detected aborted journal
Aug 29 09:42:36 localhost kernel: EXT4-fs (sdg): Remounting filesystem read-only
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
 
08-29-2016, 10:49 AM   #2
sundialsvcs
LQ Guru

Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,665
Blog Entries: 4

Rep: Reputation: 3945
It would appear that you have a damaged file system.

Since, as a Red Hat purchaser, you have access to technical support, I suggest that you contact them directly.

It is highly probable that this device is malfunctioning.
 
08-29-2016, 10:49 AM   #3
TB0ne
LQ Guru

Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack, CentOS
Posts: 26,652

Rep: Reputation: 7970
Quote:
Originally Posted by thoufic
Hello Guys,
On our Red Hat Enterprise Linux 6.5 database server, a filesystem suddenly disappeared. The following error appeared in my PuTTY session when the problem occurred:

"kernel:journal commit I/O error"

You don't tell us anything about your hardware, or where/how the disk(s) are connected, what you've done/tried, or when this error occurred. We can't guess. Is this a SAN? SATA? JBOD? RAID (what level/controller??)

Most importantly, since this is RHEL 6, you should really call Red Hat support. You are PAYING FOR RHEL, aren't you?
 
08-29-2016, 11:44 AM   #4
unSpawn
Moderator

Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by thoufic
(..) Filesystem disappeared suddenly.

Code:
Aug 29 09:42:12 localhost dhclient[42261]: Sending on   Socket/fallback
Your DHCP client sent a DHCPDISCOVER three times. Seems like network troubleshooting comes first?.. Is this a virtual machine or real hardware? Any previous network problems or recent changes? Any adjacent servers in the same network segment experiencing trouble too?
 
08-30-2016, 07:30 AM   #5
TB0ne
LQ Guru

Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack, CentOS
Posts: 26,652

Rep: Reputation: 7970
Quote:
Originally Posted by unSpawn
Your DHCP client sent a DHCPDISCOVER three times. Seems like network troubleshooting comes first?.. Is this a virtual machine or real hardware? Any previous network problems or recent changes? Any adjacent servers in the same network segment experiencing trouble too?
Hmm... that, coupled with a disk error (?). OP, are you using iSCSI by any chance?
 
08-30-2016, 08:05 AM   #6
Medievalist
Member
 
Registered: Aug 2003
Distribution: Dead Rat
Posts: 191

Rep: Reputation: 56
You have a disk access failure, probably hardware.

Your logs indicate that your system is losing contact with the physical disk devices and can't write. This is almost certainly NOT an operating system problem!
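
To make that reading of the logs concrete, here is a minimal sketch that tallies the relevant kernel messages per device. The sample lines are copied from the log in the first post; the pattern names and the summarize() helper are illustrative assumptions, not part of any standard tool.

```python
import re
from collections import Counter

# A few lines copied from the log in the first post.
LOG = """\
Aug 29 09:42:36 localhost kernel: rport-2:0-0: blocked FC remote port time out: removing target and saving binding
Aug 29 09:42:36 localhost kernel: sd 2:0:0:2: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: sd 2:0:0:3: rejecting I/O to offline device
Aug 29 09:42:36 localhost kernel: Aborting journal on device sdf-8.
Aug 29 09:42:36 localhost kernel: rport-2:0-1: blocked FC remote port time out: removing target and saving binding
Aug 29 09:42:36 localhost kernel: Aborting journal on device sdg-8.
Aug 29 09:42:36 localhost kernel: EXT4-fs (sdg): Remounting filesystem read-only
"""

# Hypothetical event names mapped to patterns for messages seen in this thread.
PATTERNS = {
    "fc_port_timeout": re.compile(r"(rport-\S+): blocked FC remote port time out"),
    "offline_device": re.compile(r"sd (\d+:\d+:\d+:\d+): rejecting I/O to offline device"),
    "journal_abort": re.compile(r"Aborting journal on device (\w+)-\d+"),
    "remount_ro": re.compile(r"EXT4-fs \((\w+)\): Remounting filesystem read-only"),
}


def summarize(log_text):
    """Return {event_name: Counter({subject: count})} for the patterns above."""
    summary = {name: Counter() for name in PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[name][match.group(1)] += 1
    return summary


if __name__ == "__main__":
    for name, counts in summarize(LOG).items():
        print(name, dict(counts))
```

In the posted log the Fibre Channel remote-port timeouts appear first, and the offline-device and journal messages follow, which fits the idea that the path to the storage failed rather than the filesystem itself.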

Once writes to the devices fail, their on-disk structures may become corrupted. It depends on exactly when the writes start failing, but you should always assume in such situations that your filesystem will be corrupt, and once you've fixed the underlying problem you should run fsck to repair the corruption. If you don't do this you'll strongly regret it.

One of the worst features - possibly THE worst feature - of the Linux distro & kernel you are using is that it will always try to remount a disk that has failed a write as read-only. So instead of the machine crashing and being obviously broken, it will pretend to still work, and end users will continue to try to write, and things will spiral rapidly into a worse situation than if the machine had simply crashed.

Repair whatever communication path your disk devices rely on and this problem will go away.
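
On ext4, the remount behaviour described above is governed by the errors= mount option, which accepts continue, remount-ro, or panic. The sketch below reads that setting from a single fstab-style line. The device names come from the log, but the mount points and the errors_policy() helper are hypothetical. When the option is absent from the line, the effective default comes from the filesystem superblock, which this sketch does not inspect.

```python
def errors_policy(fstab_line):
    """Return the errors= value from one fstab line, or None if it is not set there."""
    fields = fstab_line.split()
    # Skip blank lines, comments, and lines without an options column.
    if len(fields) < 4 or fields[0].startswith("#"):
        return None
    for option in fields[3].split(","):
        if option.startswith("errors="):
            return option.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # Hypothetical fstab entries; /data1 and /data2 are made-up mount points.
    for line in (
        "/dev/sdf /data1 ext4 defaults,errors=panic 0 2",
        "/dev/sdg /data2 ext4 defaults 0 2",
    ):
        print(line.split()[1], errors_policy(line))
```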
 
08-30-2016, 08:57 AM   #7
jpollard
Senior Member

Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
Quote:
Originally Posted by Medievalist
Your logs indicate that your system is losing contact with the physical disk devices and can't write. This is almost certainly NOT an operating system problem!

Once writes to the devices fail, their on-disk structures may become corrupted. It depends on exactly when the writes start failing, but you should always assume in such situations that your filesystem will be corrupt, and once you've fixed the underlying problem you should run fsck to repair the corruption. If you don't do this you'll strongly regret it.

One of the worst features - possibly THE worst feature - of the Linux distro & kernel you are using is that it will always try to remount a disk that has failed a write as read-only. So instead of the machine crashing and being obviously broken, it will pretend to still work, and end users will continue to try to write, and things will spiral rapidly into a worse situation than if the machine had simply crashed.
If it is mounted read-only, no further damage will occur, and the user will not be able to write after the first failure. Very little will continue to operate (perhaps some CPU-only operations), but there will be no writes to the failed disk.

Next, mounting read-only (if it succeeds) allows time to make an emergency backup to an alternate filesystem or other storage.
Quote:

Repair whatever communication path your disk devices rely on and this problem will go away.
Agree with that. It also would help to use some redundancy (RAID and multiple communication channels).
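
Before starting an emergency backup, it helps to confirm which filesystems actually ended up read-only. A minimal sketch, assuming the text passed in uses the /proc/mounts layout (device, mount point, type, options, and two numeric fields); the sample content, the mount points, and the read_only_mounts() helper are illustrative only.

```python
def read_only_mounts(mounts_text, fstypes=("ext3", "ext4")):
    """Return (device, mountpoint) pairs that are mounted read-only."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mounts line
        device, mountpoint, fstype, options = fields[:4]
        # Options are comma-separated; compare whole tokens so that
        # "errors=remount-ro" is not mistaken for "ro".
        if fstype in fstypes and "ro" in options.split(","):
            found.append((device, mountpoint))
    return found


# Hypothetical sample in /proc/mounts format; the mount points are made up.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0
/dev/sdf /data1 ext4 ro,relatime 0 0
/dev/sdg /data2 ext4 ro,relatime 0 0
tmpfs /dev/shm tmpfs rw 0 0
"""

if __name__ == "__main__":
    for device, mountpoint in read_only_mounts(SAMPLE):
        print(device, "is read-only at", mountpoint)
```

On a live system the same function could be fed the contents of /proc/mounts instead of the sample string.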

Last edited by jpollard; 08-30-2016 at 09:02 AM.
 
08-30-2016, 09:23 AM   #8
Medievalist
Member
 
Registered: Aug 2003
Distribution: Dead Rat
Posts: 191

Rep: Reputation: 56
Quote:
If it is mounted read-only, no further damage will occur, and the user will not be able to write after the first failure. Very little will continue to operate (perhaps some CPU-only operations), but there will be no writes to the failed disk.
"No further damage will occur" to the disk volume, sure.

But when critically important operations, such as logging continuous data inputs from processes that cannot be reversed (like scientific experiments) or that require real-time responses in order to avoid loss of life (like reactor controls) can't function, it's best that the system either crash entirely and reboot or else start screaming its bloody head off. Making obscure entries in logs and remounting read-only (so that processes that READ still are running, and reacting as if old data were current, but processes that WRITE are not updating the old data) has always turned out to be a terrible idea in my experience. Especially in industrial process control!

As a sysadmin, it's best to turn that "feature" off. Don't read broken disks. As a programmer, do not assume that because you can read the disk, the data is up to date. Timestamp everything critical and be prepared for incoming data to suddenly cease.
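
One way to act on "timestamp everything critical" is a freshness check on the reading side. This is a minimal sketch, assuming each record carries the Unix time at which it was written; is_stale() and max_age_seconds are illustrative names, and the threshold would depend on the process being monitored.

```python
import time


def is_stale(last_update, max_age_seconds, now=None):
    """Return True if the record is older than max_age_seconds.

    last_update and now are Unix timestamps in seconds. now defaults to the
    current time and can be passed explicitly to make the check testable.
    """
    if now is None:
        now = time.time()
    return (now - last_update) > max_age_seconds


if __name__ == "__main__":
    # Hypothetical reading written 90 seconds ago, with a 30 second tolerance.
    written_at = time.time() - 90
    if is_stale(written_at, max_age_seconds=30):
        print("data is stale: writers may have stopped, raise an alarm")
```

A reader that runs this check before trusting a value will notice when the writing side has silently stopped, even though the old data can still be read.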
 
08-30-2016, 10:24 AM   #9
jpollard
Senior Member

Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
Quote:
Originally Posted by Medievalist
"No further damage will occur" to the disk volume, sure.

But when critically important operations, such as logging continuous data inputs from processes that cannot be reversed (like scientific experiments) or that require real-time responses in order to avoid loss of life (like reactor controls) can't function, it's best that the system either crash entirely and reboot or else start screaming its bloody head off. Making obscure entries in logs and remounting read-only (so that processes that READ still are running, and reacting as if old data were current, but processes that WRITE are not updating the old data) has always turned out to be a terrible idea in my experience. Especially in industrial process control!
If you don't have redundancy in your filesystems, networks, and systems with automatic failover... you deserve the failure you get. With such "critical systems", incompetence is what you already have.
Quote:

As a sysadmin, it's best to turn that "feature" off. Don't read broken disks. As a programmer, do not assume that because you can read the disk, the data is up to date. Timestamp everything critical and be prepared for incoming data to suddenly cease.
It doesn't matter.

If a disk is failing then you DON'T want to write. If a disk is failing on a write you DON'T want to continue writing. Reading is inherently less sensitive. If the filesystem can't be remounted (which happens), your system is dead.

You are already SOL for "everything critical" in either case.

Last edited by jpollard; 08-30-2016 at 10:30 AM.
 
  


All times are GMT -5.
