Old 09-14-2010, 08:48 AM   #1
Skaperen (Senior Member)
What speed to expect from RAID 5?


I've set up a small RAID array using 4 drives: 3 data drives and 1 parity drive. When I do a sequential read using a read block size at least as large as 3 times the stride size (I also tried 12 times), I get a speed of 400 MB/sec. That suggests to me the individual drives are capable of 133 MB/sec (since they would not need to read the parity blocks).

Given that writing RAID 5 involves, in the worst case, 2 reads (the data corresponding to the write, and the associated parity), followed by 2 writes (replacing the data, and the updated parity based on the difference between old and new data), I would expect a throughput of about 33 MB/sec in that case.

In theory, if a single write request exactly covers a parity group (in this case, 3 times the stride, or a whole multiple thereof within the capacity of the controller buffers), the controller should be able to construct the parity from that data alone and perform 4 writes in parallel. I would expect a speed of 300 MB/s in this ideal case.

In reality, I'm getting only around 11 MB/s when doing writes, regardless of the block size used (up to 12 times the stride, including both multiples and non-multiples of 3 times the stride), and these are sequential writes. So it seems the controller is either leaving the drives idle a lot (I can't tell from the blinking lights), perhaps because it is slow at calculating parity, or doing far more I/O than is really necessary.
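To make the numbers I'm comparing explicit, here's a quick back-of-the-envelope sketch (Python, purely illustrative; the only inputs are the figures above):

Code:
# Rough RAID 5 throughput expectations for this 3+1 array (3 data + 1 parity).
# Only inputs: the measured 400 MB/sec sequential read and the array geometry.

seq_read = 400.0                       # MB/s, measured sequential read
data_drives = 3
per_drive = seq_read / data_drives     # ~133 MB/s, since parity is never read

# Worst-case small write (read-modify-write): each chunk of user data costs
# 2 reads + 2 writes on the same two spindles, so user throughput is roughly
# a quarter of one drive's streaming rate.
rmw_write = per_drive / 4              # ~33 MB/s

# Ideal full-row write: parity comes entirely from the write buffer, so no
# reads are needed; 4 strides hit the disks for every 3 strides of user data,
# i.e. roughly 3/4 of the streaming read figure.
full_row_write = seq_read * data_drives / (data_drives + 1)   # ~300 MB/s

observed_write = 11.0                  # MB/s, what I actually measured

print(f"expected worst-case (RMW) write : {rmw_write:6.1f} MB/s")
print(f"expected full-row write         : {full_row_write:6.1f} MB/s")
print(f"observed sequential write       : {observed_write:6.1f} MB/s "
      f"({seq_read / observed_write:.0f}x slower than reads)")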

And apparently it is failing to recognize the opportunity to avoid reading back data/parity that won't be needed when all of that data is covered by the write request. Maybe this is because it breaks all write requests down into stride units or sector units? Or maybe its Linux driver is doing that?

Any ideas (that are not specific to the particular RAID controller model) why this is happening?

I can post the actual model numbers later (two different models of the same manufacturer). I'm exploring for general ideas first and will consider model specific issues later (e.g. if that model is a PoJ or needs new firmware, etc).
 
Old 09-14-2010, 12:01 PM   #2
never say never (Member)
The speed of any array can be affected by a number of variables.

1. Hard Drives and their specs. (PATA, SATA I/II/III, or SAS)
2. Controller and its specs
3. Server and its specs
4. Drivers
5. Type(s) and size of Data being read / written.

Write speeds can be hampered by any of the above, but also by not being able to use the drives' built-in read-ahead buffer. RAID 3, 4, 5 and 6 all suffer from poor write speeds due to the need to calculate and write parity. This can be somewhat compensated for by good RAID cards with large caches, which allow the software to believe the write is done even though the RAID controller has not yet committed it to disk. Also remember that just because you are writing a file that matches the stripe size doesn't mean it is stored on exactly one stripe in the array; it could be stored across two stripes, which costs an extra write cycle.

Stripe size can also have a large impact on the write speed of the array, and should be based on the type of data expected to be used on the array.

Fast (15,000 RPM) SAS drives can help lessen the impact of writes to a RAID 3, 4, 5 or 6 array. High-end RAID cards with large caches can also help reduce the pain of writing to RAID 3, 4, 5 and 6 (make sure the card has a battery backup in case of power failure), but there are inherent disadvantages to RAID 3, 4, 5 and 6 when used in a write-intensive application.

Basically it is a good idea to know what kind of data you are putting on your drives and pick a storage medium that is good for that type of data.

If you would care to post the specs of your system, Raid Card, drives, and type(s) of data we could look at the best way(s) to maximize the throughput.

Here is a good overview that explains what has to happen to write data to a RAID Array.

RAID 5 parity handling

A concurrent series of blocks (one on each of the disks in an array) is collectively called a stripe. If another block, or some portion thereof, is written on that same stripe, the parity block, or some portion thereof, is recalculated and rewritten. For small writes, this requires:

* Read the old data block
* Read the old parity block
* Compare the old data block with the write request. For each bit that has flipped (changed from 0 to 1, or from 1 to 0) in the data block, flip the corresponding bit in the parity block
* Write the new data block
* Write the new parity block

The disk used for the parity block is staggered from one stripe to the next, hence the term distributed parity blocks. RAID 5 writes are expensive in terms of disk operations and traffic between the disks and the controller.

The parity blocks are not read on data reads, since this would add unnecessary overhead and diminish performance. The parity blocks are read, however, when a read of a block in the stripe fails due to failure of any one of the disks, and the parity block in the stripe is used to reconstruct the errant sector. The CRC error is thus hidden from the main computer. Likewise, should a disk fail in the array, the parity blocks from the surviving disks are combined mathematically with the data blocks from the surviving disks to reconstruct the data from the failed drive on the fly.

This is sometimes called Interim Data Recovery Mode. The computer knows that a disk drive has failed, but this is only so that the operating system can notify the administrator that a drive needs replacement; applications running on the computer are unaware of the failure. Reading and writing to the drive array continues seamlessly, though with some performance degradation.
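To make the bit-flip rule above concrete, here is a minimal sketch of the arithmetic (illustrative Python only, not anything a particular controller does; the shortcut is the standard identity new_parity = old_parity XOR old_data XOR new_data):

Code:
import os

STRIDE = 16   # bytes per chunk, tiny on purpose; real arrays use 16K-256K chunks

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# A 3+1 stripe: three data chunks plus their parity.
d0, d1, d2 = (os.urandom(STRIDE) for _ in range(3))
parity = xor(xor(d0, d1), d2)

# Small write replacing d1 only: read old d1 and old parity, flip in the parity
# every bit that changed in the data, then write both back (2 reads + 2 writes).
new_d1 = os.urandom(STRIDE)
new_parity = xor(parity, xor(d1, new_d1))    # old_parity XOR old_data XOR new_data

# The shortcut matches recomputing parity from scratch, and d0/d2 never get touched.
assert new_parity == xor(xor(d0, new_d1), d2)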
 
Old 09-14-2010, 12:19 PM   #3
Skaperen (Senior Member, Original Poster)
That looks like it was pasted from something I already read. For example, I already described the 2 reads / 2 writes for each requested write. And it doesn't cover the case where a write request covers exactly a whole row of associated data and parity, which would avoid the need for those 4 operations (instead, the controller should be able to handle the equivalent of 3 writes' worth of data in a larger request via 4 real writes in parallel).

What's going on here is that for various reasons I expect the worst-case ratio between write and read speeds to be 12:1. That would be 33 MB/sec versus 400 MB/sec. But in reality it is worse than that, somewhere between 36:1 and 40:1.

One of the things I am looking for is just what RATIO of read to write speed should be expected from RAID 5 for a given number of drives (3 data + 1 parity in my case). Is it supposed to be 12:1 or 36:1? The pasted document doesn't give any quantitative detail.

I'm also now looking at RAID 10 performance. I'd expect 2:1, because a write has to hit 2 drives in parallel for each block, whereas a read can pull one block from a drive and the next block from its mirror, in parallel. That's not actually happening either, although it is a lot closer (150 MB/sec write, 240 MB/sec read, on the same machine, controller, and drives as the RAID 5 test).
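For what it's worth, the same back-of-the-envelope arithmetic for the RAID 10 case (again just a sketch, assuming 4 drives as two mirrored pairs and the ~133 MB/sec per-drive figure implied by the RAID 5 read test):

Code:
per_drive = 133.0      # MB/s, implied by the RAID 5 sequential read test
pairs = 2              # 4 drives as two mirrored, striped pairs (RAID 10)

# Writes: every block goes to both drives of its pair, so only one drive's
# worth of user data per pair lands per unit time.
expected_write = per_drive * pairs          # ~266 MB/s

# Reads: the two halves of each mirror can serve different blocks, so ideally
# all four spindles stream user data.
expected_read = per_drive * pairs * 2       # ~533 MB/s, i.e. the 2:1 ratio

print(f"expected: read {expected_read:.0f} MB/s, write {expected_write:.0f} MB/s")
print(f"measured: read 240 MB/s, write 150 MB/s ({240 / 150:.1f}:1)")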

I need to do a JBOD configuration and get exact drive performance specs, when run one at a time, and when run in parallel (in case of a controller or bus bottleneck or IRQ issues).
 
Old 09-14-2010, 02:03 PM   #4
never say never (Member)
The RAID 5 explanation was pasted because, while your explanation was close, I didn't think it was entirely correct.

As I stated before, you can't assume a write that is EXACTLY the size of one stripe will be written across EXACTLY one stripe. That is part of the reason it is hard to quantify the expected write speed. Then you have to look at the RAID controller, cache size and type . . .

Remember that RAID 5 Parity is STRIPED across all the drives. So there is not one parity drive and three data drives. Each Drive contains both data and parity information.

Using your example with 4 drives configured in a RAID 5, for simplicity's sake let's say that you are reading exactly 1 stripe of data. That read can take place in parallel and results in 1 read on 3 of the 4 disks (parity is not read). This gives a throughput (in your example) of 400 MB/sec, and it happens in 1 step.

Now we need to write one stripe of data, so

Step 1 - we have to read 1 data stripe from 3 drives and 1 parity stripe from the 4th. (33% more than needed for a read)

Step 2 - we need to do a compare between the data read from the drive and the data to be written to the drive and calculate the new parity stripe. This is all load on the RAID card, and can be minimized by using RAID controllers with Read / Write cache.

Step 3 - We need to write to all 4 drives (1 data stripe across 3 drives and the parity to the 4th drive). (33% more than needed for a read)

Up to now we have read/written 266% more data than a read (based on a 4-disk RAID array)

Step 4 - Depending on the configuration the data and parity may be re-read to verify data integrity.

If this happens we get to add another 133% or a total of 400% more than the read.

Now steps 1 and 3 are easily measured and tied to the speed of the drives. Step 2 however, is dependent on the RAID card selected, and how quickly it can compare the data and create the parity stripe.

So to perform the write we must access (stripe + parity) * 2 (266% more data than a read) and perform the needed calculations.

If by chance the data does not land exactly across one stripe, then we need to access (stripe + parity) * 4 and perform the needed calculations * 2.

So from a hard drive standpoint alone, your read-to-write ratio is about 3:8, so a write is roughly 2-2/3 times slower than a read, and that is without taking into account the time the controller needs to construct the stripe and parity data for the write, or any delay caused by the drive(s). In other words, this is an absolutely perfect scenario that will never happen.

The bottom line is that you need to construct your storage based on the kind of data it will hold. If it is write intensive, RAID 5 may not be the best solution; this is especially true for something like a transactional database, where the writes are much smaller than a stripe.

The reason you can't simply assign an expected ratio to reads / writes is that there are far too many variables. Without knowing the server, the drives (and their specs), the controller (and its specs), the RAID configuration (level, stripe size, number of drives), and the data type, there is no way to say RAID 5 writes should be 30% of read speed. For instance, a read/write caching controller may write 40% faster than a non-caching controller. Writing a record to a database could be 80 times slower than reading a record from the database, but with a caching controller that could drop to 30 times slower.

My initial guess is that you have a low end raid card that is significantly slowing your writes, but without all the specs noted above it is just a guess.

-- Added as additional thought
One other thing that needs to be taken into account is the file system on the RAID array. If it is a journalling file system, that will slow the writes down even more, because the journal needs to be written to disk, then the data committed to disk, and finally the journal updated again.

Hopefully that explains it a little better than I did the first time.

Last edited by never say never; 09-14-2010 at 02:23 PM. Reason: Correct %
 
Old 09-14-2010, 03:03 PM   #5
jefro (Moderator)
I agree about the actual card. Many cheap cards are software raid. You would only get true enterprise level speeds from a real hardware raid card.
 
Old 09-14-2010, 03:24 PM   #6
Skaperen (Senior Member, Original Poster)
Quote:
Originally Posted by jefro View Post
I agree about the actual card. Many cheap cards are software raid. You would only get true enterprise level speeds from a real hardware raid card.
By "software raid" do you mean the firmware on the card, or the kernel driver for that controller?
 
Old 09-14-2010, 04:19 PM   #7
Skaperen (Senior Member, Original Poster)
Quote:
Originally Posted by never say never View Post
As I stated before, you can't assume a write that is EXACTLY the size of one stripe will be written across EXACTLY one stripe. That is part of the reason it is hard to quantify the expected write speed. Then you have to look at the RAID controller, cache size and type . . .

Remember that RAID 5 Parity is STRIPED across all the drives. So there is not one parity drive and three data drives. Each Drive contains both data and parity information.
I know what RAID 5 is (as compared to RAID 3 or RAID 4). But I also know that given any data group of (stridesize*N), the data is spread across all N data drives and 1 parity drive, where which drive holds the parity varies. Have a look here for a common reference point:

http://www.accs.com/p_and_p/RAID/LinuxRAID.html

This shows 4 different RAID 5 layout strategies. These examples differ from my case in the number of drives: the examples are 4+1, my case is 3+1.

But my point is that when I am writing any one row ... which for a stride size of 64K would be 256K for 4+1, and 192K for 3+1 ... the complete write request of (stridesize*N) bytes gives the controller exactly the data it needs to calculate the parity and completely write all the data blocks AND the one parity block without having to read any of the previous data.
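A minimal sketch of what I mean (illustrative Python only; 3+1 layout with a hypothetical 64K stride, so one row is 192K):

Code:
import os

STRIDE = 64 * 1024            # hypothetical 64K stride
DATA_DRIVES = 3               # 3+1 RAID 5, so one row is 3 * 64K = 192K

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# The host hands the controller one complete, row-aligned 192K write request.
row = os.urandom(STRIDE * DATA_DRIVES)

# Split it into the three data strides and derive parity purely from the
# request buffer; nothing has to be read back from any drive first.
strides = [row[i * STRIDE:(i + 1) * STRIDE] for i in range(DATA_DRIVES)]
parity = strides[0]
for s in strides[1:]:
    parity = xor(parity, s)

# Result: four writes (three data strides plus one parity stride), all of
# which can be issued in parallel, and zero reads.
queued_writes = strides + [parity]
print(f"{len(queued_writes)} writes, 0 reads, for a {len(row) // 1024}K request")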

Quote:
Originally Posted by never say never View Post
Using your example with 4 drives configured in a RAID 5, for simplicity's sake let's say that you are reading exactly 1 stripe of data. That read can take place in parallel and results in 1 read on 3 of the 4 disks (parity is not read). This gives a throughput (in your example) of 400 MB/sec, and it happens in 1 step.

Now we need to write one stripe of data, so

Step 1 - we have to read 1 data stripe from 3 drives and 1 parity stripe from the 4th. (33% more than needed for a read)

Step 2 - we need to do a compare between the data read from the drive and the data to be written to the drive and calculate the new parity stripe. This is all load on the RAID card, and can be minimized by using RAID controllers with Read / Write cache.

Step 3 - We need to write to all 4 drives (1 data stripe across 3 drives and the parity to the 4th drive). (33% more than needed for a read)

Up to now we have read/written 266% more data than a read (based on a 4-disk RAID array)

Step 4 - Depending on the configuration the data and parity may be re-read to verify data integrity.

If this happens we get to add another 133% or a total of 400% more than the read.

Now steps 1 and 3 are easily measured and tied to the speed of the drives. Step 2 however, is dependent on the RAID card selected, and how quickly it can compare the data and create the parity stripe.

So to perform the write we must access (stripe + parity) * 2 (266% more data than a read) and perform the needed calculations.

If by chance the data does not land exactly across one stripe, then we need to access (stripe + parity) * 4 and perform the needed calculations * 2.

So from a hard drive standpoint alone, your read-to-write ratio is about 3:8, so a write is roughly 2-2/3 times slower than a read, and that is without taking into account the time the controller needs to construct the stripe and parity data for the write, or any delay caused by the drive(s). In other words, this is an absolutely perfect scenario that will never happen.
Writing one stripe might be simple to consider, but it is not what I am doing.

To avoid confusion here, I'll use the term "row" to refer to the data quantity and alignment that is N times as much as a stripe. This is a ROW as shown in the referred page in any of the layout strategies.

If you write ONE WHOLE ROW all at once, then the controller has all the data it needs to completely calculate the parity stride/stripe without reading anything from any drive in the array. The issues might be:

1. The OS is not actually passing the write requests to the driver, or the driver is not passing them to the controller, in the exact whole unit requested, despite getting write requests from the application exactly that way (even with the O_DIRECT flag, which bypasses the OS cache).

2. The controller simply has not implemented this kind of intelligence (low end card?)

3. There are some bizarre alignment variations going on, where the mapping is not as illustrated in that page.

Quote:
Originally Posted by never say never View Post
The bottom line is that you need to construct your storage based on the kind of data it will hold. If it is write intensive, RAID 5 may not be the best solution; this is especially true for something like a transactional database, where the writes are much smaller than a stripe.
The real bottom line is that I already expect RAID 5 to be a lot slower at writing than reading. But this is a case where it is EXTREMELY slower ... 40 times slower. It is a 40:1 ratio ... 10 MB/sec for writing, and 400 MB/sec for reading ... for a wide range of block sizes from the stride size up to (N)*(N+1)*stride*(another integer just to make things even larger).

Quote:
Originally Posted by never say never View Post
The reason you can't simply assign an expected ratio to reads / writes is that there are far too many variables. Without knowing the server, the drives (and their specs), the controller (and its specs), the RAID configuration (level, stripe size, number of drives), and the data type, there is no way to say RAID 5 writes should be 30% of read speed. For instance, a read/write caching controller may write 40% faster than a non-caching controller. Writing a record to a database could be 80 times slower than reading a record from the database, but with a caching controller that could drop to 30 times slower.
I'm not talking about "a record". I'm talking about a LARGE block of data with a size and alignment clearly covering exact whole rows. But even if the controller doesn't implement "row detection" optimization, this is still a case of plenty of data being available.

Can you estimate any kind of approximate ratio? What about a figure below which you would agree that "something is wrong"?

Quote:
Originally Posted by never say never View Post
My initial guess is that you have a low end raid card that is significantly slowing your writes, but without all the specs noted above it is just a guess.
It might be. But I'm looking for a base reference to determine that. If the RAID card is NOT a low-end card, what ratios seem reasonable ... for the case of LARGE, optimally sized blocks of sequential data?

I just tested the drives individually (JBOD) through that same controller and I get 134 MB/sec. Then I fired up reads on all 4 drives to run in parallel. It actually got slightly faster: 135 MB/sec for each of the 4 at the same time.

The drives are plenty fast enough. The controller PHY and BUS are, too. It appears to be an issue with the RAID 5 logic and/or its parity calculation and/or OS/kernel/driver interference.

Quote:
Originally Posted by never say never View Post
-- Added as additional thought
One other thing that needs to be taken into account is the file system on the RAID array. If it is a journalling file system, that will slow the writes down even more, because the journal needs to be written to disk, then the data committed to disk, and finally the journal updated again.
There is no filesystem at all, yet. I'm accessing it via the /dev/sda file. Like I say, I am reading the device sequentially. I am writing it sequentially. For example:

dd if=/dev/zero bs=12582912 of=/dev/sda

and:

dd if=/dev/zero bs=12582912 of=/dev/sda oflag=direct

by comparison, if I do equivalent to this:

dd if=/dev/zero bs=12582912 of=/dev/null

I get a speed of 11.518 GB/sec (so reading /dev/zero is not a bottleneck, here). Some of the tests are using a program which initializes a buffer to binary zero once at the start and re-uses it, so there is no delay reading some data or creating some data. It just writes from the same buffer every time.

And the writing starts at the beginning of the whole device.
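In case anyone wants to reproduce the test with explicit timing, here is roughly what those dd runs boil down to (a sketch only; /dev/sdX is a placeholder for a scratch array device, and this is destructive):

Code:
import os, time

TARGET = "/dev/sdX"              # placeholder: a scratch array device with no data you care about
BLOCK = 12 * 1024 * 1024         # 12 MiB, the same block size as the dd runs above
TOTAL = 2 * 1024 * 1024 * 1024   # stop after 2 GiB

buf = bytes(BLOCK)               # zero-filled buffer, reused for every write (no read overhead)
fd = os.open(TARGET, os.O_WRONLY)   # ordinary buffered writes; O_DIRECT would need aligned buffers

written = 0
start = time.monotonic()
while written < TOTAL:
    written += os.write(fd, buf)
os.fsync(fd)                     # make sure the data actually reached the array before timing stops
elapsed = time.monotonic() - start
os.close(fd)

print(f"{written / elapsed / 1e6:.1f} MB/s over {written // (1024 * 1024)} MiB")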

The controller is 3ware 9690SA with 4 ports. Is that in the "low end" list?
 
Old 09-14-2010, 07:33 PM   #8
never say never (Member)
Ok, now we are really getting into it here and further complicating things.

I have no idea if your card / drivers allow you to pick the Raid 5 algorithm or not, but the algorithm used can greatly enhance read speeds by putting segment 4 on the parity drive for row 1. In essence this means you get to read segments 0 - 4 at once, then 5 - 9, 10 - 14 ...

However for writes you have to read 0 - 3 + parity(1), compute data and parity(1), write 0 - 3 + parity(1), verify 0 - 3 + parity(1). Read 4 - 7 + parity(2), compute data and parity(2), write 4 - 7 + parity(2). Read 8 - 11 + parity(3), compute data and parity(3), write 8 - 11 + parity(3). Read 12 - 15 + parity(4), compute 12 - 15 + parity(4), ...

So in 3 reads (across all drives) you read 15 segments, because you don't have to read parity.

To write those same 15 segments (plus the 4 parity segments) will require no less than 20 reads and writes (across all 5 drives). So even without the overhead required to compute the data and parity for each stripe (row), you are at 3:20 read to write. In the best case I would guess a write would be 20 - 25% of the speed of the read.

You might want to check out this blog for more information. It is about stripe size, but it touches on why RAID 5 impacts writes.

As for your RAID card, I don't think it is an enterprise-class card, but it is not a software RAID card either.

The real question is what your real-world data is going to look like. Writing the way you are will quickly swamp the on-board memory of your RAID card, and that may be having a negative impact on the write speed too.

You would be better off testing with the type of data and conditions you expect while in production. For instance, I would expect there will be reads happening concurrently with the writes.

Your card may be suited perfectly to handle your real world data, but be totally inadequate for the testing you are doing. Raid 5 is terrible at sequential writes regardless, and if that is a requirement you need to look at something different.

I hope this is helpful.
 
Old 09-14-2010, 08:14 PM   #9
jefro (Moderator)
Seems to have an issue with port 4.
 
Old 09-15-2010, 08:28 AM   #10
Skaperen (Senior Member, Original Poster)
Quote:
Originally Posted by never say never View Post
Ok, now we are really getting into it here and further complicating things.
Quote:
Originally Posted by never say never View Post
I have no idea if your card / drivers allow you to pick the Raid 5 algorithm or not, but the algorithm used can greatly enhance read speeds by putting segment 4 on the parity drive for row 1. In essence this means you get to read segments 0 - 4 at once, then 5 - 9, 10 - 14 ...
It would help if there were a standard for these terms. The term stripe is used very inconsistently, and whenever I read something I have to figure out which semantics are being used. I'm very tempted to just create a whole new set of terms and give them precise definitions, just to be sure of what is going on (as long as I can get others to use them). For example, in this thread stripe was used to mean stride, whereas in the page I referenced stripe is used to mean row. I previously used group for that, but it doesn't seem to be as common a term.

The Left Synchronous layout strategy would certainly be better for reading. I think that is clear. And I don't know whether this card does that or not. I'm still trying to figure out how to get it into JBOD configuration so I don't need to actually move drives to another machine to access them directly. My plan is to write each 512-byte sector with its sector number, starting with 0 at the MBR (not partitioned or formatted yet), via the RAID configuration, then look at each drive individually and see which sector number was written to each physical sector.
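Roughly what I have in mind, as a sketch (hypothetical device names, 512-byte sectors assumed, and obviously destructive):

Code:
import struct

SECTOR = 512
ARRAY_DEV = "/dev/sdX"          # hypothetical: the RAID 5 array device (gets overwritten)
SECTORS_TO_TAG = 1024 * 1024    # tag the first 512 MB of the array

# Phase 1: stamp every array sector with its own sector number.
with open(ARRAY_DEV, "wb") as dev:
    for n in range(SECTORS_TO_TAG):
        dev.write(struct.pack("<Q", n).ljust(SECTOR, b"\0"))

# Phase 2: with the drives accessible individually (JBOD, or moved to another
# machine), read a physical sector off one member and see which array sector
# the controller put there.
def array_sector_at(member_dev: str, physical_sector: int) -> int:
    with open(member_dev, "rb") as dev:
        dev.seek(physical_sector * SECTOR)
        return struct.unpack("<Q", dev.read(8))[0]

# Sampling a few offsets per member (parity strides won't decode to a sensible
# sequence) reveals the stride size, the row layout, and how the parity rotates.
# e.g. array_sector_at("/dev/sdY", 0), array_sector_at("/dev/sdY", 128), ...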

Quote:
Originally Posted by never say never View Post
However for writes you have to read 0 - 3 + parity(1), compute data and parity(1), write 0 - 3 + parity(1), verify 0 - 3 + parity(1). Read 4 - 7 + parity(2), compute data and parity(2), write 4 - 7 + parity(2). Read 8 - 11 + parity(3), compute data and parity(3), write 8 - 11 + parity(3). Read 12 - 15 + parity(4), compute 12 - 15 + parity(4), ...
You didn't define how much data is being written, so I don't know your meaning, here.

However, I assert this (based on the 5-drive configuration in the page I referenced: 4 data drives, 1 parity drive). Let's say the STRIDE (the term I most often see for the amount of data within a single drive for each row or group) is 64K. If I write 64K, I would expect the controller to have to do 2 reads (data plus parity), update the parity from the difference between old and new data, and do 2 writes. BUT ... and I think this is important ... if I write 256K, I believe the controller does not need to read anything. The reason is that this 256K contains all FOUR strides of data needed to calculate the parity stride. All it needs to do is use the data from this 4-stride write request from the OS, calculate parity, and perform 5 writes, one to each of the 5 drives.

Note, I'm not including any data verification read-back after writing. That CAN certainly be a performance factor. And I don't know if my controller is doing that or not. I want to figure that out.

Quote:
Originally Posted by never say never View Post
So in 3 reads (across all drives) you read 15 segments, because you don't have to read parity.
Right. Reading is faster because lots of things don't need to happen compared to writing.

Quote:
Originally Posted by never say never View Post
To write those same 15 segments (plus the 4 parity segments) will require no less than 20 reads and writes (across all 5 drives). So even without the overhead required to compute the data and parity for each stripe (row), you are at 3:20 read to write. In the best case I would guess a write would be 20 - 25% of the speed of the read.
Change that to 12 or 16 segments (you pick) and tell me how many reads are needed.

Quote:
Originally Posted by never say never View Post
You might want to check out this blog for more information. It is about stripe size, but it touches on why RAID 5 impacts writes.
It does not seem to address anything about full row writes. To that end, it seems to be a re-hash (though a better one than most) of just the classic "why writing on RAID 5 sucks almost as bad as RAID 3".

Quote:
Originally Posted by never say never View Post
As for your RAID card, I don't think it is an enterprise-class card, but it is not a software RAID card either.
What I want to know is what performance I should get from an enterprise-class card that uses the best layout strategy and supports optimizing the writing of rows for which it gets all the data in the write request. I would test such a card by writing EXACTLY one whole ROW at once. Then I would test it with N ROWS at once in a very large write request. At some point the requests can't get any larger (DMA limits, etc). One other test I would do is a large request that covers many whole rows BUT also partially writes one row. That partial row would require 2 reads + calculate + 2 writes (if one segment is written). The question there is whether it would still optimize the whole-row writes even though one of the rows can't be optimized.

Quote:
Originally Posted by never say never View Post
The real question is what your real-world data is going to look like. Writing the way you are will quickly swamp the on-board memory of your RAID card, and that may be having a negative impact on the write speed too.
It should not, any more than, say, RAID 0 or JBOD would. It would have some quota on how much data it can buffer, and then no more write requests can be accepted until enough of it frees up to take another request (perhaps up to a certain size).

Quote:
Originally Posted by never say never View Post
You would be better off testing with the type of data and conditions you expect while in production. For instance, I would expect there will be reads happening concurrently with the writes.
That kind of testing would be "noise" and would not clearly characterize the controller behaviour. This sequential test is what it is. And yes, I am well aware it is faster than typical filesystem-level access. That is why I am doing it this way. I want a kind of test I can accurately characterize, and precisely reproduce in another test situation, so the results can be compared.

Quote:
Originally Posted by never say never View Post
Your card may be suited perfectly to handle your real world data, but be totally inadequate for the testing you are doing. Raid 5 is terrible at sequential writes regardless, and if that is a requirement you need to look at something different.
I don't see why it should be terrible at sequential writes. For small blocks, it should be no worse than random, which requires each write request to perform 2 reads, a calculation, 2 writes, and maybe 2 read-verifies. For large blocks of exactly a ROW size, it should be able to optimize that. But even if it can't optimize that, it should be no worse than the random case. And the random case should be worse because of the seeks (maybe long seeks).

One web page I read somewhere (I've lost the link) suggested that for many controller cards, switching to JBOD and using software RAID in the kernel is actually better. I presume that would only be the case if the channel and bus speed aren't saturated by all those 2-read, calculate, 2-write operations. Their assertion was that smarter strategies and logic can be incorporated into open source code more easily than into closed proprietary firmware. I find that plausible, but I'm not sure how true it is in practice.
 
Old 09-15-2010, 09:48 AM   #11
Skaperen (Senior Member, Original Poster)
Quote:
Originally Posted by never say never View Post
You might want to check out this blog for more information. It is about stripe size, but it touches on why RAID 5 impacts writes.
This blog includes some interesting text:
Quote:
But what if those writes were longer and sequential? In that case, the RMW technique would be replaced with a Full Stripe Write technique, and the extra IOs would be eliminated. (Again, just trust me that that is a good thing.) And how do you make long, sequential writes? An obvious way is to have the host write long, sequential commands. An alternative, which is common with RAID controllers, is to use the controller cache and write-back, or lazy writes, to permit short IOs to hopefully coalesce into longer IOs.
After I got past the difference in terminology to figure out just what they were saying, it does appear this part is saying one of the things I have been trying to say. I've called it a full row.

Still my whole overall issue is:

1. There is one speed I'm getting for reads (since parity can be skipped): 400 MB/sec

2. For the most optimized writes, where reading is not needed but parity still has to be written for every 3 data strides (in 3+1 RAID 5): 300 MB/sec

3. For the simple write scenario involving 2 reads, calculate, and 2 writes: somewhere around 33 MB/sec (based on drives being around 133 MB/sec individually)

4. Actual write speed with big sequential writes is a mere 10 MB/sec.

Yes, there are plenty of reasons why RAID 5 writing is much slower than reading in most cases. But even in the best-case scenario, the speed is far worse than it should be.

What I am trying to do is characterize why and what. What speed should I get with the best RAID card available? What if I disable verify read-back? What if I disable coalescent caching? Then what speed?

I want to figure out for myself (NOT have someone tell me, because that doesn't answer the "why") whether the RAID card I have is a bad design. Yes, there are a lot of variables involved. I'm trying to do testing that avoids as many of those variables as possible. Even so, it would help to know what all of those variables are and how they affect performance. There ought to be a web page describing all this, but it does not appear to exist.

Ironically, for the use I have in mind, the terrible performance I am getting from RAID 5 is actually more than adequate. It will be receiving data to write at T1 internet speed. But I really do not want to do it that way unless I know what is happening. The widely available literature at best glosses over it. So I will probably end up using RAID 10, just because its performance is in the ballpark of what I predicted, which means I probably understand what it is doing, and not because it provides the performance I need (it in fact provides many times the performance I need).
 
Old 09-15-2010, 12:01 PM   #12
never say never (Member)
You aren't using SSDs are you?

Take a look at this link http://www.xbitlabs.com/articles/sto...a_7.html#sect0

I think this has the specs you are searching for, using the same card (only 8 port version) you are using.

I didn't study it at all, but I think it will give you the baseline you are seeking. I just glanced at it, but yeah, you have problems. Something is not right that is for sure.
 
Old 09-15-2010, 02:04 PM   #13
Skaperen (Senior Member, Original Poster)
Quote:
Originally Posted by never say never View Post
You aren't using SSDs are you?
No. These are 2TB Hitachi brand drives.

Quote:
Originally Posted by never say never View Post
Take a look at this link http://www.xbitlabs.com/articles/sto...a_7.html#sect0
The read performance they got looks close to what I got for a 4-drive array, both RAID 5 and RAID 10. Their write performance for RAID 10 was better than mine by a significant percentage, but their RAID 5 writes were WAY better than mine.

Quote:
Originally Posted by never say never View Post
I think this has the specs you are searching for, using the same card (only 8 port version) you are using.
Sort of. I'm looking for predicted minimum/maximum for any card, to compare against actual for my card. This is still useful to know because mine comes in way low on writing RAID 5.

Quote:
Originally Posted by never say never View Post
I didn't study it at all, but I think it will give you the baseline you are seeking. I just glanced at it, but yeah, you have problems. Something is not right that is for sure.
At least for RAID 5. RAID 10 is close.

But they are using 15,000 RPM drives. Mine are 7,200 RPM. Still, there is something wrong. My mainboard does have 6 SATA ports (1 used for the DVD drive), so I could bypass the controller and try software RAID to compare.
 
Old 09-15-2010, 06:52 PM   #14
never say never (Member)
Quote:
Originally Posted by jefro View Post
Seems to have an issue with port 4.
Probably need to look into this; I think he may have a point. I have read several reviews reporting problems with port 4 of this card. Set up a three-drive RAID 5 using ports 1-3 and see what results you get. If the speed is more reasonable, there you go.

I have also read it is picky about what slot it is in, so that could be an issue too.

You won't want to use the card at all if there is a hardware problem with it.

As for the min/max of any RAID 5: I have one that will do 4 Gbit/sec writes all day long. I have one that won't do more than about 60M/sec. It just depends on how much you want to spend. You're trying to compare apples and oranges; that's why I keep saying it depends on a lot of variables and asking for the hardware specs. You build a system that will support your needs and expected growth over the life of the equipment.

Anyway, hope this helps you out.
 
Old 09-16-2010, 07:03 AM   #15
Skaperen (Senior Member, Original Poster)
Quote:
Originally Posted by never say never View Post
Probably need to look into this; I think he may have a point. I have read several reviews reporting problems with port 4 of this card. Set up a three-drive RAID 5 using ports 1-3 and see what results you get. If the speed is more reasonable, there you go.
The bays are marked 0..3, so I thought the reference to port 4 was a joke. OK, I'll check into this. I did do timing on each drive as JBOD and they were all the same.

Quote:
Originally Posted by never say never View Post
I have also read it is picky about what slot it is in, so that could be an issue too.
And that does not seem to affect other RAID levels. I'm still holding to the idea that this card/chip just isn't good/smart at RAID 5.

Quote:
Originally Posted by never say never View Post
You won't want to use the card at all if there is a hardware problem with it.
I probably will end up using RAID 10 which seems to work fine. Also, I won't be backing up this machine because it is the 2nd backup that will be moved to a remote site, replicating from the 1st backup via rsync.

Quote:
Originally Posted by never say never View Post
As for the min/max of any RAID 5: I have one that will do 4 Gbit/sec writes all day long. I have one that won't do more than about 60M/sec. It just depends on how much you want to spend. You're trying to compare apples and oranges; that's why I keep saying it depends on a lot of variables and asking for the hardware specs. You build a system that will support your needs and expected growth over the life of the equipment.
How many drives are in that one that does 4 Gbit?

I still think there has to be a formula for RAID 5, once all the variables are determined, that gives a theoretical performance figure. The info about "2 reads, calculate, 2 writes" is just a start on that.

Quote:
Originally Posted by never say never View Post
Anyway, hope this helps you out.
Yeah, I've picked up a lot of info ... not all I want, but a lot. I do intend to test software RAID, which in the past would have been speed-limited due to aggregate bus issues, the south bridge, etc. But I have found that these days I can keep all the SATA drives running at full speed even when many are used in parallel. So software RAID would only use up some main CPU.
 
  


Tags: raid, raid5, write