LinuxQuestions.org
Red Hat This forum is for the discussion of Red Hat Linux.

Old 06-24-2005, 04:00 PM   #1
sklam
LQ Newbie
 
Registered: Jun 2005
Posts: 1

Rep: Reputation: 0
high iowait in RHES


Dear all

I've noticed very high IOwait in top when simply copying a large file from one folder to another on RH Enterprise Server v3. I have tried hardware from several different vendors and got the same result.
However, the problem does not occur on Fedora or RH8.

Any ideas?

Thanks
 
Old 07-08-2005, 02:19 AM   #2
ddaas
Member
 
Registered: Oct 2004
Location: Romania
Distribution: Ubuntu server, FreeBsd
Posts: 453

Rep: Reputation: 30
I've also observed this on my RHEL ES 3 server.
Does anybody know why?
 
Old 07-12-2005, 10:35 AM   #3
RedHatCat
Member
 
Registered: Jun 2005
Location: London, Uk
Distribution: RH-ES 3/4, FC 5/6
Posts: 51

Rep: Reputation: 15
Hmm, me too - just got a 1U back today that is showing signs of the same problem you describe (it has RHES 3 update 3 installed).

It's a dual Xeon with 4GB RAM and hyper-threading enabled. When I watch "top" while copying a file from one partition to another, the IOwait is ~90-100% for each CPU - not good for a storage server.
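For anyone wanting to reproduce this, here's a minimal sketch of watching iowait while a copy runs. The paths and sizes are just illustrative - use a file bigger than RAM to defeat the page cache - and note that on 2.4 kernels vmstat may report iowait as 0 and fold it into idle time:

```shell
#!/bin/sh
# Create a throwaway source file, start copying it, and sample CPU stats.
# The "wa" column of vmstat output shows the iowait percentage.
dd if=/dev/zero of=/tmp/iowait_src bs=1M count=64 2>/dev/null
cp /tmp/iowait_src /tmp/iowait_dst &
vmstat 1 5 || echo "vmstat not available"   # five one-second samples; watch "wa"
wait
cmp /tmp/iowait_src /tmp/iowait_dst && echo "copy ok"
rm -f /tmp/iowait_src /tmp/iowait_dst
```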
 
Old 07-15-2005, 11:11 AM   #4
RedHatCat
Member
 
Registered: Jun 2005
Location: London, Uk
Distribution: RH-ES 3/4, FC 5/6
Posts: 51

Rep: Reputation: 15
You guys have Xeons in your machines? From the scraps of info I've read, it seems to be a problem with RH ES 3 and Xeons.
 
Old 12-17-2005, 12:57 AM   #5
kbadeau
LQ Newbie
 
Registered: Dec 2005
Posts: 1

Rep: Reputation: 0
We've recently migrated one of our two production database servers from Solaris on a v880 to RHEL AS v3 U4 on an HP DL580 (dual Xeon).

We have an EMC CLARiiON CX500 with Emulex LP9802-E cards, running the RHEL-provided lpfc 7.1.14 driver for the Emulex card.

On another Solaris machine with respectable activity, a 1GB copy test takes 20 seconds; at peak times IO waits can run about 50%.

On the HP/RHEL machine these copies can take over a minute on first invocation. After that, caching kicks in and subsequent repeats of the same test take much less time (under 10 seconds).

We are running Oracle 9.2.0.4 and are finding performance for certain heavy sequential read operations to be very poor. We ruled out the Oracle database as the initial suspect by doing a simple copy test at the OS level, and found this discrepancy between the old Solaris DB server and the new RHEL server. Even during light activity on the database, IO waits shoot into the 80-90% range.

I am wondering if anyone can share experiences with a similar configuration, and/or any solutions they have found for such issues.

We have been working with certain kernel params:
/proc/sys/vm/min-readahead
/proc/sys/vm/max-readahead
/proc/sys/vm/inactive_clean_percent
/proc/sys/vm/pagecache
to no significant avail.

Looking forward to any feedback possible.

Thanks,
Kevin
 
Old 02-20-2006, 08:59 AM   #6
Krietjur
LQ Newbie
 
Registered: Feb 2006
Location: Hengelo, The Netherlands
Posts: 5

Rep: Reputation: 0
We experience problems like this on our production server. The server is an HP ProLiant ML570 with 4GB RAM, four 3GHz Xeon processors and a Compaq Smart Array 64xx RAID controller, with four RAID-1 arrays and one RAID 1+0 array. I created a 1GB testfile and timed how long it took to copy it to another array: 58 seconds. IOwait peaked at 80-100% much of the time; between those peaks it was around 40%.

We only have one machine like this, but I tried the same test on a system with just one PII 350MHz processor and 512MB memory, copying from a software RAID-1 drive to a non-RAID SCSI drive, and there it took 52 seconds. IOwait was 0% the whole time... I don't know if that can be correct. top doesn't show iowait there, so I installed the sysstat package on that machine to be able to watch it. This is a Gentoo machine, by the way.

I also did a third test, on a dual Xeon 3GHz system running Red Hat Enterprise. There it took 1 minute and 56 seconds to copy the 1GB testfile. On this system we use an Adaptec AAC-RAID controller with SATA disks. IOwait was at 100% almost all of the time during the copy, with 0% CPU idle.

Then a final test, on my home Linux box (Debian). This is no high-performance machine - just a Pentium II 350MHz with 192MB memory and two plain IDE disks, each on its own IDE controller. Same test again. IOwait sometimes had a little peak at 80%, but most of the time it was below 50%. The copy itself took 1 minute 49 seconds. At that moment the machine had other things to do (X running, Azureus downloading), and it was still faster than the dual Xeon machine from the third test, which was doing nothing else!

So I have tested four systems - two Red Hat Enterprise and two others - and both Red Hat Enterprise systems are having the problems...
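For anyone who wants to run the same comparison, the test above boils down to something like this (sizes shrunk for illustration - the original used a 1GB file; use a fresh file each run, or one larger than RAM, so the page cache doesn't hide the real disk speed; the /tmp paths are placeholders for mount points on different arrays):

```shell
#!/bin/sh
# Time copying a freshly created test file between two locations.
SRC=/tmp/copytest_src          # placeholder: put this on the source array
DST=/tmp/copytest_dst          # placeholder: put this on the target array
dd if=/dev/zero of="$SRC" bs=1M count=32 2>/dev/null
time cp "$SRC" "$DST"
cmp "$SRC" "$DST" && echo "copy verified"
rm -f "$SRC" "$DST"
```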
 
Old 02-21-2006, 04:49 AM   #7
RedHatCat
Member
 
Registered: Jun 2005
Location: London, Uk
Distribution: RH-ES 3/4, FC 5/6
Posts: 51

Rep: Reputation: 15
I'm still keeping an eye on this subject.

I spent a while playing with swap files last year; it really had little or no impact on IOwait. I got a bit nervous when building up a blade recently, because it seemed to show all the signs of an IOwait problem like the 1Us I had last year - then I realised it was still building its RAID1, and by the morning it was shifting 2GB files in less than 50 seconds again.

Switching from NCQ (native command queuing, i.e. SATA2-compliant) SATA disks to original non-NCQ disks seemed to cure our problems in the 1U servers (from Seagate Barracuda 400GBs to Maxtor MaxLine III 300GBs). What drives does your HP ProLiant ML570 have in it, Krietjur?

Take it easy,

Jim

Last edited by RedHatCat; 02-21-2006 at 04:50 AM.
 
Old 02-22-2006, 05:13 AM   #8
Krietjur
LQ Newbie
 
Registered: Feb 2006
Location: Hengelo, The Netherlands
Posts: 5

Rep: Reputation: 0
There are SCSI disks in it, but I don't know which type etc. I don't know where to find that info in Linux (normally I can find it somewhere under /proc/scsi, but only my tape streamer is listed there). I've had some other hints that it might have to do with a cache-battery failure on the RAID controller. As soon as I have more information I'll post it here, of course.
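For what it's worth, on most kernels the recognised SCSI devices (with their vendor and model strings) are listed in /proc/scsi/scsi - though many hardware RAID drivers only report the logical volume there and hide the physical disks, which may be why only the tape drive shows up:

```shell
#!/bin/sh
# Each attached SCSI device appears with its Vendor, Model and Rev strings.
# (Some kernels omit /proc/scsi entirely; then try the RAID vendor's tools.)
if [ -r /proc/scsi/scsi ]; then
    cat /proc/scsi/scsi
    # RAID drivers often keep extra detail in a driver-named subdirectory:
    ls /proc/scsi/
else
    echo "/proc/scsi/scsi not available on this kernel"
fi
```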
 
Old 02-24-2006, 03:05 AM   #9
RedHatCat
Member
 
Registered: Jun 2005
Location: London, Uk
Distribution: RH-ES 3/4, FC 5/6
Posts: 51

Rep: Reputation: 15
Hmm - the third machine you tested, with the SATA disks in it: can you find out what make and model the drives are? That to me looks like a full-on IOwait lockup, whereas the SCSI-based system just looks a bit slow. Exactly why it's slow I'm not sure. Have you tried copying the file between different RAIDs? I'd expect it to perform slightly better copying from or to the RAID 1+0 than between plain RAID 1s, for example. Were all the RAIDs on that box fully built when it was tested?

If the machines can't be taken down to look at the disks or the RAID BIOS, I would expect to find info on the drives somewhere under /proc/ - often inside a directory named after the driver for your RAID card. Good luck,

Jim
 
Old 02-24-2006, 07:09 AM   #10
Krietjur
LQ Newbie
 
Registered: Feb 2006
Location: Hengelo, The Netherlands
Posts: 5

Rep: Reputation: 0
The copy actions I did on those RAID systems were all from one array to a different array. I've looked in /proc but can't find the model of the hard drives there. The SATA system was rebooted yesterday, and I saw then that it has 6 Maxtor disks, but I don't know what model. There's a RAID 1 and a RAID 1+0 configured there.

The RAIDs on the machines were fully built when tested; the machines have been running for about a year now, I think.
 
Old 03-01-2006, 10:19 AM   #11
RedHatCat
Member
 
Registered: Jun 2005
Location: London, Uk
Distribution: RH-ES 3/4, FC 5/6
Posts: 51

Rep: Reputation: 15
Uh oh - I've just been told there's a system coming my way that's got lockup problems (it's another 1U SATA-RAID system). The symptoms sound like another IOwait lockup, but the snippets from 'top' that I've seen show 0% IOwait and ~100% on user/system processes.

The plan is to replace the 400GB NCQ drives with 300GB non-NCQ ones and see how we go from there. If anything useful comes to light I'll let you know.
 
Old 03-06-2006, 03:14 AM   #12
Krietjur
LQ Newbie
 
Registered: Feb 2006
Location: Hengelo, The Netherlands
Posts: 5

Rep: Reputation: 0
I had a chance to reboot the ProLiant ML570 last week and boot it from the SmartStart CD. I ran the ADU tool, and it reported no errors, so I guess the RAID controller is OK. I also have a list of the drives in the system now:
Code:
driveid raid port 1     raid port 2     raid type
--------------------------------------------------
0	BF03688284      BF03688284      Raid 1
1	BF07285A36      BF07285A36      Raid 1
2	BF03688284      BF03688284      Raid 1+0
3	BF03688284      BF03688284      Raid 1+0
4	BF03685A35      BF03688284      Raid 1 <-- this one is a bit strange, two different drives in the same array
5	BF03688284      BF03688284      Raid 1

Last edited by Krietjur; 03-06-2006 at 03:16 AM.
 
Old 03-10-2006, 01:26 PM   #13
RedHatCat
Member
 
Registered: Jun 2005
Location: London, Uk
Distribution: RH-ES 3/4, FC 5/6
Posts: 51

Rep: Reputation: 15
Early reports suggest the machine with the suspected IOwait problem is now back on top form. It was acting very oddly: watching "top", it seemed only to recognise 3 of the 4 cores (dual Xeons). I swapped the 400GB SATA disks that were in there for 300GBs anyway, but disabling HT seemed to be the turning point. This was a slightly different platform from the one we usually have issues with, which makes me think some kind of issue with the cores being recognised was more likely.

Not sure about the mirror with the different model disks - they are obviously the same size, but perhaps slightly different spec; I'm not sure if this could have an impact. I'm not even sure how well disks from totally different manufacturers would play together in a SCSI/SATA array. My old IDE RAID in an Athlon box doesn't care what disks it uses, but it's hardly a performance machine/server. If I get an hour with my Xeon in the next few weeks, I'll build an array or two and see how mixed-model or mixed-manufacturer disks perform.

Last edited by RedHatCat; 03-10-2006 at 01:27 PM.
 
Old 03-26-2006, 10:06 AM   #14
simcolor
LQ Newbie
 
Registered: Mar 2006
Posts: 2

Rep: Reputation: 0
Hi, I've got similar problems. We have a RAID-5 array with 12 disks and a compute farm with 7 dual-Xeon nodes. It gets very slow, even for the ls command and tab-completion.

I want to try turning off the HT option - can you tell me how? Thx!
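Hyper-threading is normally switched on or off in the BIOS setup (look for a "Hyper-Threading" or "Logical Processor" option) rather than from Linux. You can at least confirm from a running system whether HT is active - a rough check, with the caveat that the exact /proc/cpuinfo fields vary by kernel version:

```shell
#!/bin/sh
# Count the logical CPUs the kernel sees.
echo "logical CPUs: $(grep -c '^processor' /proc/cpuinfo)"
# On HT-capable kernels, a "siblings" count greater than the number of
# cores per physical package indicates hyper-threading is enabled.
grep 'siblings' /proc/cpuinfo | head -1
```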
 
Old 03-26-2006, 10:12 AM   #15
simcolor
LQ Newbie
 
Registered: Mar 2006
Posts: 2

Rep: Reputation: 0
On www(dot)felipecruz(dot)com/blog_high-iowait-times-may21.php it says that upgrading the kernel may solve the problem. Has anyone tried this?
 
  

