How to best build RAID for scientific computing

hugoist123

Junior Member
Jan 14, 2007
5
0
0

Here is the situation that I am facing. I have a large data file, about 50G to 100G, that is to be read from disk storage repeatedly by a Fortran program. I am looking at the possibility of using RAID 0, say 4 SATA disks, as a way of speeding up the reading of the data. Here are some questions on which I would appreciate any input and discussion:

1. Should I use the SATA ports on the motherboard, or would it be better to use a dedicated RAID controller card?

2. I am using Linux as the operating system. What is the best file system to use on the RAID array for fast data access?

3. If I use mdadm to create the array, is there an optimal chunk size for reading primarily large data files?

4. What realistic performance increase can I expect from a RAID 0 configuration? Would it be N*V, where N is the number of disks and V is the read speed of a single drive?

5. What are the things to look for when buying a motherboard or RAID controller for this purpose?

6. Any recommendations on particular motherboard/hard drive combinations from experience?

A little test that I have experimented with on my current motherboard is as follows. The board is a P4P800-E; it has two SATA 1.5Gb/s connectors. I have two identical Seagate 320GB 7200.10 SATA 3Gb/s drives. The system is running Mandrake 10.1. The drives are only recognized when the onboard RAID controller is turned on. They show up as sda and sdb, and hdparm returns 72MB/s on each device. I created a RAID 0 array, and hdparm -t /dev/md0 returned only 83MB/s. A single drive is not giving 150MB/s as it should (right, for 1.5Gb/s?), and the RAID 0 is not doubling the read speed.
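
For reference, the array was created and tested with commands along these lines (the --chunk value below is only an example I have not tuned, which partly relates to question 3 above):

mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=256 /dev/sda /dev/sdb
hdparm -t /dev/md0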

I would appreciate any comments and suggestions.

Thanks!

/Hugoist




 

silverpig

Lifer
Jul 29, 2001
27,709
11
81
1. It's better to use a controller card, but make sure it's a hardware card (usually >$100) and not a software card (which is basically what is on your motherboard). How worthwhile it is depends on how much you have to spend, I guess.

2. XFS and JFS are both good, but opinions will vary. (See the mkfs note at the bottom of this post.)

3. Not sure on my end here.

4. Not quite. In relatively ideal situations, each additional drive adds roughly 70% of the throughput of the previous one, but this again varies.

5. Is it a hardware or software card? Most cheap RAID cards (that means most motherboards too, until you get into server-class stuff) are software (ie, your cpu does the calculations for where to put the data). If you really want a good performance boost, look into a hardware RAID card.

6. I've never run RAID myself, but I like Seagate and WD drives...

You shouldn't get 150 MB/s. That's the bandwidth of the cable/bus connecting your drive to your cpu basically. The drive has a read/write speed which is slower than that. You can usually find programs that will test the cache burst speed of your drives, and those should get you 140 MB/s plus, but once the cache is empty, you'll be back to your 72 MB/s or so. That RAID speed seems a little slow to me, but yeah, it won't be double.
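
One more note on item 2: if you go with XFS on an md array, you can tell mkfs.xfs the stripe geometry so the filesystem lines up with the chunk size. A rough example, assuming a 4-drive RAID 0 with 256 KiB chunks (adjust su/sw to whatever you actually build):

mkfs.xfs -d su=256k,sw=4 /dev/md0

Here su is the per-disk chunk size and sw is the number of data disks. I can't promise it makes a big difference for pure sequential reads, but it costs nothing to set at format time.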
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Could you get lots of RAM instead? You'd have to move to a server-class motherboard (and an Opteron system if it's going to be >64GB, since IIRC the Xeons are 36-bit rather than 40 or 48).
 

tidehigh

Senior member
Nov 13, 2006
568
0
0
I like CTho's idea. Is it possible to organize the file into data sets that have speed priority based on how much they are accessed? If so, you wouldn't need enough RAM to store all the data. RAM is the way to go if you need real speed. Maybe you don't need this much speed though? I've never played with RAID configs enough to provide any valuable information on that specific topic.
 

hugoist123

Junior Member
Jan 14, 2007
5
0
0
Thanks to everyone for all the comments above. RAM is fastest, but the problem is that the big data file is to be reused over time, and the size could also grow to 1000-2000G at a later stage. Basically, a RAID configuration that can give a read speed of around 1GB/s, if that is achievable, should solve my problem. Looks like I also need to check out hardware RAID cards for their potential.
 

silverpig

Lifer
Jul 29, 2001
27,709
11
81
1 GB/s? Wow, not even a multi-thousand dollar U320 SCSI array can do that. You'd have to go to fibre-channel and I don't even know if that could do it.
 

Loki726

Senior member
Dec 27, 2003
228
0
0
Originally posted by: hugoist123
Thanks to everyone for all the comments above. RAM is fastest, but the problem is that the big data file is to be reused over time, and the size could also grow to 1000-2000G at a later stage. Basically, a RAID configuration that can give a read speed of around 1GB/s, if that is achievable, should solve my problem. Looks like I also need to check out hardware RAID cards for their potential.

What exactly is your problem that requires 1GB/s of bandwidth? Even if you could get anywhere near that from a disk array over SATA (you can't because of OS, error checking, communication overheads) you won't be able to process it fast enough to do anything with it other than move it around.

Also, consider that it will still have a huge latency associated with random accesses to the array. Every time you access data in a new page, you have to trap into the OS, figure out the address on disk you are looking for, look up the piece of data you are accessing in the drive's metadata table, move the drive head over the correct piece of data, and send it back into memory. If your program is single threaded, it has been sitting idle or swapped out all of this time (tens of millions of cycles). RAID has to go through the same steps, and it even adds some more to deal with how the data is spread across multiple drives. The only advantage is that you can get back two or more pieces of data per operation, but you still have to wait a huge amount of time for the first piece of data once you realize that you need it. To process data like this effectively, you should build multithreading into your application, so that one or more threads prefetch the next chunk from disk while another thread keeps processing data already in cache or memory.
 

f95toli

Golden Member
Nov 21, 2002
1,547
0
0
I think it should be almost doable using a Fibre Channel SAN (I think the limit is 4 Gb/s or so, meaning you are halfway there).
However, as Loki726 has already pointed out, it is very difficult to process data at that speed unless you are doing something extremely simple.
Also, a 4 Gb/s Fibre Channel SAN will cost some serious money...

 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
You're sure you can't optimize the program to have a better access pattern to the data?
 

hugoist123

Junior Member
Jan 14, 2007
5
0
0
Again, thanks for all the comments. The part of the code concerning the data is quite simple. It goes something like this:

do i = 1, N
   read(unit) chunk_of_data      ! unformatted read from the already-opened data file unit
   call process(chunk_of_data)
end do

where i = N is the end of the data file. The size of chunk_of_data can be set arbitrarily. The program is not single threaded and can in fact be carried out in parallel. Eventually things will be done on a cluster, but each node will still be given a huge file to store and read SEQUENTIALLY, so the issue of reading a large file from storage as fast as possible is still relevant. Concerning Loki726's point about CPU processing time, the operations on the data read in are very, very simple, so the real bottleneck is the read speed.
 

Madwand1

Diamond Member
Jan 23, 2006
3,309
0
76
Simple math. You can't do STR greater than drive speed. The fastest conventional drives will do around 80 MB/s, more typically around 60 MB/s. Even if you get some special ones that can do 100 MB/s, you'd need 10 of them running at optimal speed to hit 1000 MB/s aka 1 GB/s. More practically, you're looking at 16 or more very fast drives running efficiently to hit your targets. Are you willing to bear that cost? You're also looking at potential system limitations and declining returns at this point.

At 16 drives, on-board is pretty much out of the question, so you're looking at a very high-end controller on a wide PCIe interface. PCI-X 133/64 will barely do 1 GB/s raw data rate, so that's probably out of the question as well, and you should look for a high-end controller with a native PCIe interface rather than a PCI-X design behind a bridge.

Moreover, if the cards are capable of doing such high-speed data transfers, you should be able to find marketing data from the manufacturers to that effect, with some detail on the system setup to get you started.
 

hugoist123

Junior Member
Jan 14, 2007
5
0
0
Madwand1's calculation is pretty close to what I initially thought. I have seen some boards that have 8 SATA slots. If each disk gives 100MB/s, it would be close to what I wanted. Still searching the web for the controller option ......
 

sjwaste

Diamond Member
Aug 2, 2000
8,760
12
81
I realize this is a technical forum, but posts like this should always contain some estimate of the budget. You can pretty much meet any spec w/ enough money to throw at the problem, but most of us don't have unlimited funds

OP, what is your budget?
 

imported_Tick

Diamond Member
Feb 17, 2005
4,682
1
0
Originally posted by: sjwaste
I realize this is a technical forum, but posts like this should always contain some estimate of the budget. You can pretty much meet any spec w/ enough money to throw at the problem, but most of us don't have unlimited funds

OP, what is your budget?

Hell, you could build an array entirely out of huge numbers of 36 GB 15k RPM drives for maximum data rate, but it would cost a fortune. You would need something like 45 drives for any reasonable storage size, and then you would be into three 15-drive enclosures, each running RAID 5 with a hot spare, and then software RAID 0 across them on the host machine, all on fibre. The question is, is he willing to spend $25K on this.
 

hugoist123

Junior Member
Jan 14, 2007
5
0
0
Obviously cost is a factor. So,

let x = dollar amount per node
let P = bandwidth per node, in GB/s

P is going to be a function of x, so P = P(x).

Of course P is expected to increase with x; more money gets higher bandwidth. But spending more per node becomes less attractive when, say, P(2x) < 2P(x), that is, when doubling the money does not double the bandwidth. (The factor 2 can actually be any number, come to think of it.) This is because the work is distributed across nodes and communication between nodes is negligible in this case, so two cheaper nodes would do just as well as one expensive one. The question is then about finding the largest x (and hence P) for which P(2x) > 2P(x) still holds.
 

sjwaste

Diamond Member
Aug 2, 2000
8,760
12
81
Well, the doubling of money doesn't result in the doubling of bandwidth when you go from 1 to 2 disks.

Is there a hard limit on what you can spend? $1k? $5k? $25k? I mean, if someone tells you the returns begin to decline at 100k, can you afford that too? A ballpark dollar figure would be good, doesn't have to be exact.

Help us help you
 

spidey07

No Lifer
Aug 4, 2000
65,469
5
76
What you are asking for in terms of throughput requires huge arrays, advanced caching (which, according to what you described, won't help you), and load balancing via a SAN and Fibre Channel bonding. The load balancing software on the Fibre Channel adapters and the SAN itself will get around the bus bottlenecks.

In other words, try and make the application a bit more efficient?
 

Mark R

Diamond Member
Oct 9, 1999
8,513
14
81
The issue is really, where you want to spend the money, and what upgrade paths you want.

Is the application optimisable? You say it needs sequential reads, but getting 1000 MB/s would be a major undertaking. Whereas, if you could parallelize it onto 4 systems, achieving your new target of 250 MB/s per system becomes trivial. Even parallelizing it onto 2 systems would mean aiming for 500 MB/s; you'll need a workstation-grade motherboard and RAID, but you should be able to get pretty close with 8 SATA drives per node.

Is the application pure sequential reads, or is there random access? If it's pure sequential, then just go for the drives with the fastest STR for the buck - the 7200.10 Seagates are pretty good. If it's random access, then things get difficult and you need to consider where the money should go - RAM, caching RAID controllers, 10k or 15k rpm drives, duplicating your data in RAID 1 mode to distribute seeks, etc.

If cost is very important, and you need best possible value, then I think you need to look at clustering this - as this will be a lot cheaper than building a single very high end system.

I'd expect a system like the following to get around 400 MB/s, maybe as much as 500 from time to time - so a pair of them would probably get you to your target. This also makes it easy to upgrade at a later date - just add more identical systems:
P5W64 WS
Areca ARC-1220
8x Seagate 7200.10 drives in RAID 5

Having thought about it a bit more, using more lower-end systems probably isn't going to be cheaper, due to the CPU, RAM, and other per-node costs.

Otherwise to get 1000 MB/s is going to be a difficult and expensive problem.


 

uOpt

Golden Member
Oct 19, 2004
1,628
0
0
What is the exact access pattern?

If it is linear through the whole file all the time, then obviously you need to improve the algorithms, not the storage.

Software RAID 0 (Linux md, not the onboard SATA RAID) on the internal SATA ports will do fine, and you will reach about the sum of the drives' speeds for linear reads (there's a quick way to check this at the bottom of this post).

Note that if the access pattern is not large linear reads, that might change. For example, RAID 0 does not speed up small reads after random seeks.
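
A quick and rough way to check what you actually get for linear reads is to time a few GB read straight off the array device (block size and count below are just example values; reading the raw md device keeps the filesystem out of the picture):

time dd if=/dev/md0 of=/dev/null bs=1M count=4096

hdparm -t gives a similar number, but a longer dd run is a bit closer to what your program will see.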
 

Ice Czar

Junior Member
Dec 29, 2004
13
0
0
http://tweakers.net/reviews/557/
comparative benchmarks in IOps


cache strategies

http://www.soliddata.com/pdf/file_caching.pdf

Abstract

While Internet bandwidth issues are being resolved with broadband connections, a transaction-processing bottleneck is emerging as a critical issue. A single e-mail, stock trade or book purchase requires very little bandwidth, but customer response time and system resiliency often suffer when application servers must process millions of transactions per day. Where a server is waiting on mechanical disk drives to read or write the transaction data, one solution is to distribute transactions across many servers and disks. However, if mechanical latency can be eliminated, existing servers can handle many more transactions - multiplying scalability while speeding response time. Where a small percentage of the data files consume most of the I/O activity, the solution is to place those files in a high-performance file cache. This approach typically uses solid-state disk (SSD) for file caching. By eliminating mechanical latency, ...

http://www.storagesearch.com/ssd-buyers-guide.html
 