IO errors in Red Hat AS
We have about 20 middle-tier Dell 2650s (with internal storage) running Oracle 11i (E-Business Suite) on Red Hat AS (2.4.9-e.34smp #1 SMP Wed Dec 10 16:52:22 EST 2003 i686 unknown). Before we migrated these servers from Oregon to Arizona, they were stable. Over the last six months, several of them have hung or locked up numerous times. Symptoms include voluminous ext3 errors (on both the root and app filesystems) and IO errors, all pointing to filesystem corruption or to disk or bus issues. We have reseated all of the hardware, repaired ext3 with fsck, and recreated the ext3 filesystems using full read/write scans for bad blocks (-c -c options), all to no avail. We are not aware of anything that has changed other than physical location. Onsite folks report normal data center temperatures, and the network infrastructure is the same. Normally a reboot (hard or soft) temporarily fixes the problem, so the corrupted access to disk seems to live only in memory. Occasionally even console access fails, usually due to an IO error when login attempts to cd to the home dir. In that state the servers respond to ICMP, but nothing else.
Two questions:
1) Has anyone experienced this issue and resolved it?
2) How do I force and capture a memory core dump, both from within the OS (akin to savecore -L on Solaris) and from the outside (i.e., firmware, akin to "sync" at the eeprom on Sun hardware)?
Much obliged to this generous community.
Peace.