LinuxQuestions.org
Linux - Hardware This forum is for Hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?

Old 11-30-2009, 07:14 PM   #1
orthogonal3
LQ Newbie
 
Registered: Jun 2009
Location: Manchester, UK
Distribution: Ubuntu
Posts: 6

Rep: Reputation: 5
Slow disk read speeds on HP DL380 G4 SmartArray 6i RAID 5


Hi All!

I'm having a rough time at work at the minute pushing a tar backup to tape.
I've racked my brains over what the frell is wrong with my server.

The nitty gritty is this:

HP DL380 G4
Dual 64-bit Xeon 3.4GHz
6GB RAM

SmartArray 6i U320 (& 128MB BBWC i think)
+ 4x 146GB 10K U320 in RAID 5

Adaptec U160 PCI-X SCSI Card (unsure of model)
+ External IBM Ultrium 3 Tape Drive

Ubuntu 9.10 x64

All hardware seems pretty OK; nothing missing from the usual places like /proc/scsi.

I think the RAID 5 stripe size is the recommended 64KB.

---

I'm using tar to dump /filestore to tape (/filestore being the root of the RAID 5 partition)

hdparm -t -T /dev/cciss/c0d1p1 (/filestore) yields values in the following ranges:

Cached reads (-T): 950 ~ 980 MB/s
Buffered disk reads (-t): 30 ~ 35 MB/s

Basically I'm getting awful performance dumping ~150GB of data to /dev/tape: it takes on the order of 3hrs 30mins, which works out to about 11MB/s. I've read that the tape drive's data rate matching (finally a good DRM!) works from its 80MB/s native rate down to 30MB/s; below that it's shoe-shine time, and I know that's what I'm getting because I can hear a lot of spin-ups and spin-downs (no cockney accents though!)
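A quick sanity check of that figure (assuming decimal megabytes, which is where the ~11 MB/s comes from):

```shell
# 150 GB moved in 3 h 30 min: work out the effective rate the drive saw.
MB=$((150 * 1000))            # ~150 GB expressed in MB (decimal)
SECS=$((3 * 3600 + 30 * 60))  # 12600 seconds
RATE=$(awk "BEGIN { printf \"%.1f\", $MB / $SECS }")
echo "$RATE MB/s"             # well below the drive's 30 MB/s matching floor
```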

Now....

I ran some tests running the system through mbuffer (many thanks to its author for a good program) and a 1GB memory-mapped I/O file. I ran the simulation into /dev/null and put a cap on mbuffer's stdout at 40MB/s (using pv -q -L 40M) to stop the bit bucket munching the buffer too quickly; that's also a reasonable rate I would like to see going to tape. (Half speed isn't too upsetting!)

A block diagram i guess looks like this:

Code:
/filestore ---tar---> mbuffer ---> pv [Lim 40MB/s] ---> /dev/null (test)
                        /\                              /dev/tape (real)
                        ||
                        \/
                    1GB Memory
I've got mbuffer to cache 800MB before starting to stream out data.
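For reference, the pipeline above can be sketched as a shell one-liner. The mbuffer -m/-P and pv -L flags here are my assumptions based on those tools' documented options (800MB of a 1GB buffer is a fill level of 80%); the runnable stand-in below swaps in scratch data and dd so it can be tried anywhere:

```shell
# Sketch of the real pipeline (do not run as-is; device names are illustrative):
#   tar -cf - /filestore | mbuffer -m 1G -P 80 | pv -q -L 40M > /dev/tape
# Stand-in with the same plumbing, safe to run anywhere:
SRC=$(mktemp -d)
dd if=/dev/zero of="$SRC/sample.bin" bs=1M count=4 2>/dev/null  # fake filestore
BYTES=$(tar -cf - -C "$SRC" . | dd bs=1M 2>/dev/null | wc -c)   # dd stands in for mbuffer
echo "archive stream: $BYTES bytes"
rm -rf "$SRC"
```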

I used the logging feature of PuTTY (another good tool) to keep a nice record of mbuffer's stats, parsed this in MS Excel and graphed the data (I miss gnuplot).

-- side note: how often does mbuffer produce output? Looks like every half second.

For the first 5000 entries, all is good. I'm not sure where I am in the file structure of /filestore; it's full of Adobe Suite files [ai, psd, etc], web source files [php/asp], images [png/jpg] and SVN repos/dumps.

There are a couple of buffer underruns but largely the buffer stays full.

After this point the whole lot goes downhill. I must move into another part of the filestore where things aren't so easy, and I start to get more and more buffer underruns. After a bit of this the average graphed input rate drops to 12MB/s and it's game over; a bit later it drops again to 9MB/s.

Obviously I can't back this up to tape without some seriously shiny shoes! (And the associated tape wear.)

---

I'm not necessarily looking for someone to hand me a full solution, but the way things are looking, it's going to be a case of buying another server with a big RAID 0 striped array, thrashing everything to there, and once it's staged I can back up to tape nice and fast.

Now for the questions:

1. Is there something glaringly obvious I have missed? I'm not the oldest or wisest sysadmin around.

2. Does anybody have experience with a similar setup who can give me some idea of what this lump of steel should be able to do? I'm guessing a SmartArray 6i with RAID 5 should be faster than 30MB/s.

3. Is there a better way of backing up than a straight tar? I have heard of dump(?) for dumping a whole unmounted filesystem, but that doesn't sound like what I need: I'm not really able to unmount the filestore for any length of time, and I often need to pull a couple of files back off tape when they go "missing".

4. Is it going to be a case of forking out for an external SCSI drive array and running a fast RAID 0 striped staging drive on it? (And get ready for my boss' expression!)

5. Can anybody shed any other light on this?

I can try to get you any logs / outputs you need, if you need them.


Thanks all for your help in advance and for keeping a great community!

Phil
 
Old 12-03-2009, 01:50 PM   #2
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137
Hi,

We use HP DL servers at our customer sites, and the performance of the internal RAID controllers is not really the best. Your disk IO performance from hdparm seems somewhat slow. Here's an example:
Code:
hdparm -t -T /dev/md0

/dev/md0:
 Timing cached reads:   1792 MB in  2.00 seconds = 895.50 MB/sec
 Timing buffered disk reads:  120 MB in  3.01 seconds =  39.82 MB/sec
This is from my own software RAID-5, consisting of 3 external USB disks on my notebook.

There are several things you can check on your system to speed up your backup. To make it as short as possible:

- Take a look at this document: Optimize Backup Performance on HP Ultrium Tapes. It's somewhat older, but it covers good basic information on HP Ultrium backup hardware and RAID technology, and it also covers cabling and performance differences in various scenarios.

- mount the filesystem with the option
Code:
noatime
This will prevent the inode atime from being updated on every read access, which speeds up all operations on the files, because otherwise every read causes a write to the inode to record the access time.
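A minimal sketch of what that looks like, assuming the device and mount point from the original post (make it permanent in /etc/fstab, or apply it live with `mount -o remount,noatime /filestore`):

```
# /etc/fstab entry with noatime for the filestore
/dev/cciss/c0d1p1  /filestore  ext3  defaults,noatime  0  2
```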



Some questions first:

- do I understand correctly that you do not use file compression tools like gzip/compress/bzip2 or similar?
- what is the output of:
Code:
mt -f /dev/$TAPEDEVICENAME status
Code:
iostat -c $DEVICENAME 5 50
Code:
sar -a
and
Code:
uptime
- what options do you pass to the tar command?
- what filesystem do you use?
- do you use standard kernel parameters for all?
- what bus does the Ultra 160 Adaptec controller with the tape attached use? PCI-something, 32/64 bit?

What I would do:

- are all disks healthy?

- check that the RAID controller is really healthy: does the cache battery actually work? Is a battery installed at all? And are all the cache settings well configured?

- check write performance to your tape by reading from a pseudo-device, like this
Code:
 dd if=/dev/random bs=1M count=4096 of=/dev/tape
(this will write 4GB to /dev/tape)

- the minimum blocksize on HP Ultrium is 32k, and you can also use multiples of 32k for IO, so you might find something there to tune

- Have you installed the HP System Management Homepage on the host? There you can check hardware health status and RAID controller settings.

- the latest HP drivers CD for Linux (I don't remember what they call it at the moment)?
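One caveat on the dd test above: /dev/random blocks as soon as the kernel entropy pool runs dry, so it will badly understate tape throughput; /dev/zero (or /dev/urandom) is the usual source for a raw-rate test. A dry run that exercises the same plumbing without touching the tape:

```shell
# Stream 16 MiB through dd and count what arrives; for the real test, replace
# the wc stage with of=/dev/tape (or /dev/st0) and a realistic count.
BYTES=$(dd if=/dev/zero bs=1M count=16 2>/dev/null | wc -c)
echo "$BYTES bytes moved"
```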

Hope this little brain dump can help.
 
Old 12-03-2009, 07:11 PM   #3
Electro
Guru
 
Registered: Jan 2002
Posts: 6,042

Rep: Reputation: Disabled
It sounds like a driver problem or a software problem. Double-check the terminators on the SCSI cable; it could be that the tape drive is talking too much on the cable and causing a problem. That said, backups should really be done from another computer instead of the same computer that is set up as the server.

It might be easier to use CentOS, since CentOS resembles Red Hat Enterprise. Most modules and drivers are written for kernel 2.6.9, which Red Hat Enterprise uses, but Ubuntu does not use that kernel version. Ubuntu is meant for desktops and workstations; it can be used for servers, but mainly personal servers running the latest hardware.

Probably your storage controller is an Adaptec 29160. I am not sure the card is a true hardware RAID controller. If it is, then you should have a lot more bandwidth, at least 60 megabytes per second; if it is software RAID, the numbers seem right. Using software RAID with one controller is going to cause bandwidth problems because the controller is being accessed multiple times, and all hardware has limits on how many times it can be accessed at once. If you are going to use software RAID, it is best to use multiple storage controllers, so the main bus, PCI, handles the tasks.

Sure you can go with HP SmartArray 6i. I found out that it is a software RAID controller. It will not make anything better because it is going to penalize performance.

If you are going for software RAID, stick to RAID-0 or RAID-10 as your highest RAID levels. For performance, it is best to put each hard drive on its own controller.

RAID-5 performance is known for its multiple writes and high write throughput, but it has poor read throughput. You will have to combine two RAID-5 arrays in RAID-0 to get the desired bandwidth.

The utility hdparm just prints raw performance numbers; it tells you nothing about real-world throughput. It is better to use iozone or a similar utility, which measures throughput with real-world tests.

Have you thought about using a true hardware SATA RAID controller from 3ware or Areca with several Western Digital VelociRaptors?

If you insist on using this hardware, then as mesiol suggested, tune the stripe size; I would go with the lowest and see if that makes a difference. Also, as mesiol said, run diagnostics on the controller itself. I would add: run diagnostics on the hard drives too, and maybe turn off the cache in the hard drives to keep data consistent and keep the data moving smoothly. The cache on the controller should be the main cache, from what I have read.
 
Old 12-04-2009, 07:43 AM   #4
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137
Hi,

Quote:
Originally Posted by Electro View Post
Sure you can go with HP SmartArray 6i. I found out that it is a software RAID controller. It will not make anything better because it is going to penalize performance.

Probably your storage controller is Adaptec model 29160.
The disks are connected to the Smart Array controller, not to the Adaptec.

Where did you get the information that the Smart Array 6i is a software RAID controller? Could you please post a link? I could not find this information on the HP site.

Quote:
Originally Posted by Electro View Post
RAID-5 performance is known for its multiple writes and high write throughput, but it has poor read throughput. You will have to combine two RAID-5 arrays in RAID-0 to get the desired bandwidth.
RAID-5 performance is known for good read but lower write performance, because of the two-phase write (data plus parity information). So the write throughput cannot be as fast as on a single disk without parity.

Quote:
Originally Posted by Electro View Post
Have you thought about using a true hardware SATA RAID controller from 3ware or Areca with several Western Digital VelociRaptors?
What does a SATA controller have to do with U320 SCSI disks? SATA by itself does not provide nearly the same performance as U320 SCSI. Also, there is an add-on controller board for the SmartArray to add more performance; possibly it provides an additional storage controller CPU and RAM or so. I'm not sure what it really does.

Quote:
Originally Posted by Electro View Post
If you insist on using this hardware, then as mesiol suggested, tune the stripe size; I would go with the lowest and see if that makes a difference. Also, as mesiol said, run diagnostics on the controller itself. I would add: run diagnostics on the hard drives too, and maybe turn off the cache in the hard drives to keep data consistent and keep the data moving smoothly. The cache on the controller should be the main cache, from what I have read.
I think there is no way to disable the cache on the disks, because there is no physical access to them.
Also, hdparm/sdparm will print misleading read-performance results for disks connected to a cached RAID controller, since all reads from disks behind the SmartArray go via the controller's read cache. I am not sure hdparm produces usable results here.

I hope my English is okay; I'm not a native speaker, so please excuse any errors.
 
Old 12-04-2009, 10:55 AM   #5
orthogonal3
LQ Newbie
 
Registered: Jun 2009
Location: Manchester, UK
Distribution: Ubuntu
Posts: 6

Original Poster
Rep: Reputation: 5
Thanks for the help so far!

Mesiol,

Thanks for the explanation.
Your English is perfect, so much better than my German!

And now, in English...

Thanks again for clarifying. The HP white paper document you suggested is great.
I'm going to give that a thorough read this weekend.

Quote:
Where did you get the information that the Smart Array 6i is a software RAID controller? Could you please post a link? I could not find this information on the HP site.
From the HP website here: http://h18000.www1.hp.com/products/q...12030_div.html
Code:
Protocol              Ultra320 SCSI
Processor             405 PowerPC
XOR engine            Hardware-based XOR calculations for RAID 5
Electrical interface  Low Voltage Differential (LVD)
Total SCSI channels   2 (depending upon server model)
SCSI port connectors  68-pin wide SCSI connector
Memory                DDR 266MHz CL2; 64MB of DDR SDRAM for code, transfer buffers, and read cache comes standard. Upgradeable to 192MB with the 128MB BBWC option kit.
Peak transfer rate    320 MB/s per SCSI channel
That tells me it's hardware. Do you think so too?

Quote:
do I understand correctly that you do not use file compression tools like gzip/compress/bzip2 or similar?
That is correct; I'm also not using the tape drive's built-in compression, as that increases the required read rate from disk.

Quote:
mt -f /dev/$TAPEDEVICENAME status
Result:
Code:
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x44 (LTO-3).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN
Not great, but it should still run fine.
Traversing the file tree was the slow part, especially with the access times being updated on every read of every block and file in the tree; with the two-phase parity write on top of that, I'm surprised I didn't come back to find my server on fire.

Quote:
iostat -c $DEVICENAME 5 50
iostat isn't available at the moment; I will see about this next week.

Quote:
uptime
Ha ha ha - the old favourite!
Code:
 16:29:30 up 1 day,  5:24,  1 user,  load average: 0.21, 0.11, 0.03
Quote:
- what options do you pass to the tar command?
My current command is this:
Code:
tar -cf $device $backup_paths 2> /dev/null --label="`date +%Y-%m-%d_%a`"
There's no mbuffer in the current live backup script (which is much simpler than my new test script, as you can see), but if I'm only filling it at 9MB/s I don't think it will make much difference.
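Picking up mesiol's 32k-multiple suggestion, one easy tweak to that command is an explicit blocking factor: GNU tar's -b counts 512-byte records, so -b 512 gives 256 KiB blocks on the tape. The sketch below writes to a scratch file rather than the real $device, purely to show the shape and the record padding:

```shell
# Same command shape, writing to a scratch file instead of the tape device.
device=$(mktemp)                       # stand-in for /dev/st0
backup_paths=$(mktemp -d); echo test > "$backup_paths/file.txt"
tar -c -b 512 --label="$(date +%Y-%m-%d_%a)" -f "$device" -C "$backup_paths" .
SIZE=$(wc -c < "$device")
echo "$SIZE"                           # archive is padded to whole 256 KiB records
rm -rf "$backup_paths"
```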

Quote:
- what filesystem do you use?
- do you use standard kernel parameters for all?
/dev/cciss/c0d1p1 on /filestore type ext3 (rw,noatime)

noatime has been added at your suggestion; before this it was on kernel defaults.

Thanks for pointing this out - I should have thought about noatime before!

Quote:
- what bus does the Ultra 160 Adaptec controller with the tape attached use? PCI-something, 32/64 bit?
It's a 64-bit PCI-X card in a 64-bit PCI-X slot with the bus running at either 100MHz or 133MHz (not sure which slot, but it's one of those!)

Quote:
- are all disks healthy?
I think so - no warnings on last reboot!

Quote:
- check that the RAID controller is really healthy: does the cache battery actually work? Is a battery installed at all? And are all the cache settings well configured?

dd if=/dev/random bs=1M count=4096 of=/dev/tape
I will check these early next week, when I can get some out-of-hours server downtime.

On top of this I'm going to look once again at using bacula to move from file-based backup to block-based; hopefully that will stop the issues with traversing the tree. But until my test server arrives I'm going to keep things the way they are: working, but slow!


Thanks again for your level of help. I appreciate a second opinion.
I know you have helped me enormously already!
Score one point against those who think Linux users are selfish and love to laugh at people who don't understand!

Thanks again, my friend,

Phil
 
Old 12-04-2009, 12:29 PM   #6
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137
Hi Phil,

I also appreciate a technical discussion like this.

As you stated, the information from HP describes the controller as a hardware RAID controller. I'm not sure what the HP option pack for the Smart Array contains. The other hardware information you provided looks okay, so I think the biggest problem will be tuning the read process from the disks. As stated, check the battery pack for the controller cache; they mostly die after about 3 years. I would also check the physical disks by looking at the server itself, whatever the controller reports; possibly one of the devices is in a pre-failure state.

Also, not using hardware compression or a tool like gzip is fine; compression would not provide any real help here, apart from higher CPU load on the machine.

Yeah, uptime is an old but very useful tool. It would be nice if you could install/configure
Code:
sar
because this will give you long-term statistics on the behaviour of your machine. The output (tab-separated text) can also be nicely formatted for a spreadsheet application or used by other tools like mrtg, which provides a great overview.

It would be nice if you could reply after installing the HP System Management Homepage and check whether there are any log entries. As stated, the read performance seems very poor to me: all that data should be served from cache, which means there should be nearly no difference between buffered and cached reads.

I'll also check one of our machines to see whether I can reproduce this behaviour.

One last question for today: what kernel version do you use?

Last edited by mesiol; 12-04-2009 at 12:41 PM. Reason: typo
 
Old 12-04-2009, 04:19 PM   #7
dalai lama
LQ Newbie
 
Registered: Dec 2009
Location: Amsterdam
Distribution: Debian
Posts: 18

Rep: Reputation: 0
Hi there,

I'm just wondering if you have a BBU installed on the controller. This could solve the whole slow-disk-speed issue: I had the same problem with HP DL180 G5 and G6 machines, and adding a BBU solved it for me.
 
Old 12-05-2009, 12:39 AM   #8
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137
Hi,

the battery backup unit is essential for the RAID controller: without it, the controller goes into read-cache-only mode and all write requests go directly to the disks, which slows down all controller activity. That's the reason I asked you to check the controller's battery pack.
 
Old 12-08-2009, 06:21 AM   #9
orthogonal3
LQ Newbie
 
Registered: Jun 2009
Location: Manchester, UK
Distribution: Ubuntu
Posts: 6

Original Poster
Rep: Reputation: 5
Quick note:

I just ran hdparm -t -T on my CentOS-based DL360 G3 with 2x 10K 36GB drives and it's coming back with 69MB/s!

Grrrr.....

How can a 2-disk RAID 1 beat a 4-disk RAID 5 on reads?!

Anyway - I need to get some time to look into this properly at a hardware level.
I'm thinking CentOS might be the way forward, as HP supports RHEL 5 much better than Ubuntu.

Running a quick test on my DL360 G3, it looks like it's going to be easier and better to install all the HP management agents there.
Just a case of tracking down a couple of deps.

I hate yum :P


Thanks all for your help in this thread, very much appreciated.

Phil
 
  

