Originally posted by: Blain
Jon makes a pretty good argument. :roll:
I disagree; I wrote the following email to him in response:
Hi Jon,
I would like to comment on your article "Why RAID is (usually) a Terrible Idea". While I generally agree with you that RAID is overhyped, does indeed complicate the system and, due to crappy onboard driver/firmware implementations, can create lots of headaches, there are some factual inaccuracies I would like to correct.
Let me start with a quote:
> RAID0 does increase throughput, but it does absolutely nothing to help the access time. What does that mean? It means that if you are reading and writing a large number of smaller files, the performance benefit will be very minimal. If you are reading or writing a large amount of data at one location on the disk, that is where you will see a benefit. Therefore, in times where you are working with transferring or copying very large files, RAID0 can make sense.
This, quite frankly, is not true. Although RAID0 does not lower the access time (it may even increase it due to overhead latency), it *does* actually speed up random I/O and thus realistic I/O patterns. The idea that RAID0 only helps sequential transfers is a common misconception, sustained by useless synthetic benchmarks like HDtune. When I tell people this on forums they reject my statement; then I ask them the following question: how come a *real* hardware RAID controller like the Areca ARC-1210 has a higher access time than a single disk but beats the single disk in any benchmark by a great margin? Apparently, access time is not the paramount variable for 'realistic speed'.
In general, performance can be split into:
- Sequential Transfer Rate (STR)
- Random I/O performance
The first is easy to measure and a shitload of free and commercial applications exist to measure it; they are used in 98% of all RAID benchmarks. The problem is that STR is often not very important anymore, since even a single disk can deliver quite high STR values. Random I/O performance is much more important.
It's obvious how RAID0 speeds up STR, but how does it speed up random I/O? Assuming the access time is the same or even slightly higher, how can a RAID0 array be faster than a single disk when there is virtually no sequential access? The answer is parallelism. With a single disk all I/O requests are executed serially, but with RAID0 each disk can perform I/O at the same time. So if there are 2000 I/O requests to be done, each disk can take part of that load and the aggregate performance will be higher than that of a single disk. The access time is actually a measurement of ONE I/O request, with no opportunity for parallelization. With a 4-disk RAID0 array, you have four drives able to seek independently. One I/O request may not be processed faster, but a bunch of them most likely will, since at least some of them will land on one of the other disks.
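To make the parallelism argument concrete, here is a toy model in Python, not a real benchmark. It assumes every request costs the same fixed service time (`service_ms`, a made-up 9.4 ms, which works out to roughly 106 I/Os per second), that each request lands on exactly one randomly chosen disk, and that there are always enough outstanding requests to keep every disk busy. The function and parameter names are hypothetical.

```python
import random

def simulate(num_disks, num_requests, service_ms=9.4, seed=0):
    """Toy model: return I/Os per second for a batch of random requests.

    Each request is assigned to one random disk. Disks work in parallel,
    but each disk serves its own queue strictly one request at a time.
    """
    rng = random.Random(seed)
    queue_lengths = [0] * num_disks
    for _ in range(num_requests):
        queue_lengths[rng.randrange(num_disks)] += 1
    # The batch is finished when the busiest disk has drained its queue.
    total_ms = max(queue_lengths) * service_ms
    return num_requests / (total_ms / 1000.0)

single = simulate(1, 2000)    # one disk handles all 2000 requests serially
striped = simulate(4, 2000)   # four disks share the same 2000 requests

print(f"1 disk : {single:.0f} I/Os per sec")
print(f"4 disks: {striped:.0f} I/Os per sec ({striped / single:.2f}x)")
```

The per-request time is identical in both runs, so the "access time" does not change; the speedup comes purely from the queues being drained side by side. Because the random assignment is never perfectly even, the model stays somewhat below a perfect 4x.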
To show some proof, please review this highly random I/O benchmark called RAIDTEST. This benchmark does a mixture of random I/O requests with transfer sizes ranging from 16KB to 128KB, so very much non-sequential. I/O requests are 50% read and 50% write.
Single drive (ad8)
concurrency    Performance in I/Os per sec.    average
          1    106    106    107               106
          4    106    106    106               106
         16    116    116    116               116
         32    127    125    126               126
        128    151    151    150               150
        256    156    156    157               156
RAID0 with 4 disks: gstripe 4xad - 128KB stripe - FM off
concurrency    Performance in I/Os per sec.    average
          1    173    173    173               173
          4    270    270    270               270
         16    338    338    338               338
         32    370    370    370               370
        128    444    434    434               437
        256    465    465    465               465
As you can see, the RAID0 array yields a significant benefit for extremely random workloads. The performance gain ranges from 63% at a concurrency of 1 to roughly 198% at a concurrency of 256, i.e. about 1.6x to 3x the single-disk throughput.
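The gains can be recomputed from the averages in the two tables. A short sketch in Python (the dictionary and function names are just for illustration; the numbers are copied from the "average" columns):

```python
# Average I/Os per second, keyed by concurrency, from the two RAIDTEST tables.
single = {1: 106, 4: 106, 16: 116, 32: 126, 128: 150, 256: 156}
raid0 = {1: 173, 4: 270, 16: 338, 32: 370, 128: 437, 256: 465}

def gain_percent(base, new):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0

gains = {c: gain_percent(single[c], raid0[c]) for c in single}

for c, g in gains.items():
    ratio = raid0[c] / single[c]
    print(f"concurrency {c:>3}: {ratio:.2f}x single disk (+{g:.0f}%)")
```

With these numbers the ratio runs from about 1.6x at a concurrency of 1 to just under 3x at a concurrency of 256.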
There are some conditions which spoil RAID0's ability to increase random I/O performance though, like:
- stripe size too small (with a 16KB stripe size, many I/O requests have to be served by multiple disks, which kills the ability to parallelize)
- filesystem misalignment (the filesystem partition does not start at the beginning of a stripe block)
- application data offset != 0 (usually no real problem and unavoidable)
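The first two conditions can be illustrated by counting how many disks a single request touches. This is a minimal sketch assuming a plain RAID0 layout where stripe block k lives on disk k modulo the number of disks; `disks_touched` is a hypothetical helper, and the 63-sector partition offset is only an example of a misaligned start.

```python
KB = 1024

def disks_touched(offset, size, stripe_size, num_disks):
    """Count the distinct disks hit by a request of `size` bytes at `offset`.

    Assumes consecutive stripe blocks rotate over the disks, so a request
    covering n consecutive stripe blocks hits min(n, num_disks) disks.
    """
    first_stripe = offset // stripe_size
    last_stripe = (offset + size - 1) // stripe_size
    return min(last_stripe - first_stripe + 1, num_disks)

# A 64KB request on a 4-disk array with a 16KB stripe occupies all four disks.
print(disks_touched(0, 64 * KB, 16 * KB, 4))                       # 4

# The same request with a 128KB stripe stays on one disk, leaving three free.
print(disks_touched(64 * KB, 64 * KB, 128 * KB, 4))                # 1

# Shift everything by a misaligned partition start (63 sectors of 512 bytes)
# and the request now straddles a stripe boundary, occupying two disks.
misalignment = 63 * 512
print(disks_touched(64 * KB + misalignment, 64 * KB, 128 * KB, 4)) # 2
```

Every disk that a request touches is a disk that cannot serve another request at that moment, which is why small stripes and misalignment eat into the random I/O gain.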
So my conclusion is that RAID0 offers significant performance benefits for both sequential and non-sequential ("random") access patterns. If people tell me RAID0 is unsafe, then I tell them their single disk is, too. If your data is important, you need AT LEAST a proper backup or RAID1/3/4/5/6, while a combination provides the best safeguard against data loss. It kind of irritates me that people run single disks both internally and externally, maybe 6 disks in total without any protection or proper backup, yet when somebody has a 2-disk RAID0 array they yell at them and call it unsafe. Single-disk users are just as unsafe. The only difference is that the magnitude of the data loss is bigger WHEN data loss occurs. RAID0 in itself does not increase the failure rate of drives. Any statistics that imply this draw the wrong conclusions. They do not account for the different usage patterns of disks used in RAID:
- often used by power users who place greater demands on their disks and utilize them more heavily
- multiple disks likely means more heat
- RAID systems may run 24/7
Apart from this, I agree that any RAID, and especially onboard RAID, adds complexity to one's system and can cause major headaches. It should not be used unless people understand what it is and carefully weigh the advantages against the disadvantages. But RAID is not evil, and can actually be of great use even to simple gamers or casual computer users. Disk performance is considered to be the biggest bottleneck of modern computers. Every time I hear that a computer is 'slow', it's due to a disk bottleneck, often caused by too little free RAM, which is in turn caused by spyware or many applications loaded at startup. In the future mechanical hard drives will be abandoned, which in my opinion can't happen fast enough.
Thanks for taking the time to read my feedback, and please be sure to respond if your time allows!
Regards