Best platform for scientific modeling?


JPS

Golden Member
Apr 23, 2001
1,745
0
71
Originally posted by: palladium
^ Not sure if the OP's modelling software works with CUDA...

CUDA would be sweet, but the application(s) involved are not written to leverage it.

RaynorWolfcastle

The logic behind your suggestion is sound. We may wind up going that route if Nehalem Xeon CPU and motherboard availability for system builders is truly going to be a ways out...
 

clk500

Junior Member
Mar 25, 2009
11
0
0
rudder> JPS isn't focused on CUDA; otherwise he would have made an entry for it in his proposed gear list.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: clk500
JPS> I think 24GB will come up short with your amount of data (option 1)

Yeah, it'll be a question of how frequently the calculation "touches" all 4GB of the dataset.

If it needs to do computations on all 4GB of data multiple times per second, then bandwidth contention on the triple-channel memory controller is going to be the first and foremost bottleneck: a single thread moving 4GB of data in and out of the CPU 4-5 times per second would by itself saturate an 18GB/s interface.

If the app merely needs 4GB of memory space to hold all its data and only infrequently cycles through all of it (over the course of seconds to minutes), then a nice fast 2GB/s RAID array will suffice.

For my simulations I run four instances of a single-threaded program on my quad-core. The system has about 2.5GB of RAM available to the four programs, so 600+ MB each. But the dataset is about 2GB per program, so the swapfile is heavily used.

However, my programs are compute-intensive on only about 50MB/s of the data per application, so as long as my swap file can keep up with 200+ MB/s of transfers it never becomes the bottleneck.
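
To put rough numbers on both of those cases, here are a few lines of Python (nothing rigorous: the 4GB working set and 18GB/s figures are just the ones above, the pass rates are purely illustrative, and my own 4 x 50MB/s situation is tacked on at the end):

# Back-of-the-envelope bandwidth check; the pass rates are illustrative.
working_set_gb = 4        # OP's per-run dataset size, per the thread
mem_bw_gbps = 18          # rough triple-channel figure quoted above

for passes_per_sec in (0.1, 1.0, 4.5):
    needed_gbps = working_set_gb * passes_per_sec
    verdict = "memory bandwidth is the bottleneck" if needed_gbps >= mem_bw_gbps else "fits comfortably"
    print(f"{passes_per_sec:4.1f} passes/s -> ~{needed_gbps:4.1f} GB/s ({verdict})")

# My own setup: 4 instances, each churning through ~50MB/s of swapped data
print(f"aggregate swap throughput needed: ~{4 * 50} MB/s")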
 

clk500

Junior Member
Mar 25, 2009
11
0
0
That's true; with more information about the program architecture we'll be able to make a better "forecast".
 

Denithor

Diamond Member
Apr 11, 2004
6,300
23
81
Originally posted by: Idontcare
For my simulations I run four instances of a single-threaded program on my quad-core. The system has about 2.5GB of RAM available to the four programs, so 600+ MB each. But the dataset is about 2GB per program, so the swapfile is heavily used.

Originally posted by: RaynorWolfcastle
- Swap drives: 4x 32 GB Intel X25-E (all computation-related transactions happen here)
 

magreen

Golden Member
Dec 27, 2006
1,309
1
81
Learning a lot from this thread. Thanks all for the contributions. :thumbsup:
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: Denithor
Originally posted by: Idontcare
For my simulations I run four instances of a single-threaded program on my quad-core. The system has about 2.5GB of RAM available to the four programs, so 600+ MB each. But the dataset is about 2GB per program, so the swapfile is heavily used.

Originally posted by: RaynorWolfcastle
- Swap drives: 4x 32 GB Intel X25-E (all computation-related transactions happen here)

That's a needlessly expensive way to get swapfile bandwidth for the OP (and for myself), especially with that particular raid card.

Unless the swap file is composed of millions of 4KB files making up the 4GB of data, latency is a non-issue (plus the raid card itself has 4GB of cache to buffer with anyway).
 

RaynorWolfcastle

Diamond Member
Feb 8, 2001
8,968
16
81
Originally posted by: Idontcare
Originally posted by: Denithor
Originally posted by: Idontcare
For my simulations I run four instances of a single-threaded program on my quad-core. The system has about 2.5GB of RAM available to the four programs, so 600+ MB each. But the dataset is about 2GB per program, so the swapfile is heavily used.

Originally posted by: RaynorWolfcastle
- Swap drives: 4x 32 GB Intel X25-E (all computation-related transactions happen here)

That's a needlessly expensive way to get swapfile bandwidth for the OP (and for myself), especially with that particular raid card.

Unless the swap file is composed of millions of 4KB files making up the 4GB of data, latency is a non-issue (plus the raid card itself has 4GB of cache to buffer with anyway).

That may be true, or it may not; it really depends on the specifics of the simulations he is running. It's unclear from the OP's post whether the 4 GB is the working memory footprint of each simulation or the output file size of a simulation.

The advantage of the X25-E array is that it essentially buys you another low-latency layer of cache in front of the spinning disks. You get 500+ MB/s read/write at 0.1 ms latency for 128 GB. Obviously the absolute best option would be to get enough RAM to keep all swapping in RAM and write output directly to the spindle-based drives through the RAID controller, but that can be a rather pricey proposition.

Let's say, for example, that each instance needs to access 3 GB worth of data. For 16 instances you would really want at least 48 GB of memory so everything runs from RAM. That would be ~$2,400 worth of RAM, but you can now skip the SSD array.

If the simulations each need to access 4+ GB worth of data, then you really need a lot of RAM, and that will blow through the OP's budget. At this point the SSDs are a viable option because they act as another layer of cache, and having the swapping spill over onto the spinning-disk array would be a performance nightmare.
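
To put rough numbers on those two scenarios, a short Python sketch (the per-instance figures are just the examples above, and the RAM price is the ~$2,400-for-48GB estimate):

# Rough RAM sizing for 16 concurrent instances, using the example figures above.
instances = 16
ram_dollars_per_gb = 2400 / 48    # ~$50/GB, per the estimate above

for gb_per_instance in (3, 4):
    total_gb = instances * gb_per_instance
    print(f"{gb_per_instance} GB/instance -> {total_gb} GB total RAM, "
          f"roughly ${total_gb * ram_dollars_per_gb:,.0f}")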

In a situation like the one here, I doubt the cache on the RAID controller is really all that significant. The RAID controller cache is hopefully not being used for swapping (in which case you would be better off adding more RAM) and only serves to convert random reads/writes into sequential ones as much as possible. I would think a 2 GB cache would work just as well for that.

 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
Here is an update. My client took their time in pulling the trigger, but it looks like they are finally ready to go. Here is what I think they are going to go with:

MOTHERBOARD: SUPERMICRO MBD-X8DAi-O Dual LGA 1366 Intel 5520
PROCESSORS: 2 x Intel Xeon E5540 Nehalem 2.53GHz 80W Quad-Core Processors
PROCESSOR HEATSINKS: 2 x Noctua NH-U12P SE1366 120mm
SYSTEM MEMORY: 48GB (12 x 4GB) Crucial DDR3 PC3-10600, CL=9, Registered, ECC, DDR3-1333
VIDEO CARD: Some fanless nVidia option
RAID CONTROLLER: Areca ARC-1680IX-12 PCIe x8 SAS RAID Card w/ 4GB cache and BBU
STORAGE ARRAY DRIVES: 4 x Western Digital RE3 WD1002FBYS 1TB SATA Hard Drives (RAID10)
APP/SCRATCH DRIVE: 1 x OCZ Vertex Series OCZSSD2-1VTX120G 2.5" 120GB SATA Solid State Drive
OS/BOOT DRIVE: 1 x OCZ Vertex Series OCZSSD2-1VTX60G 2.5" 60GB SATA Solid State Drive
DVD DRIVE: Samsung SH-S223F
CASE: Cooler Master Stacker 810
POWER SUPPLY: PC Power & Cooling PPCT860 860W ATX12V / EPS12V 80 PLUS

The logic behind it is as follows:
- Max the RAM to eliminate the need for a separate high-throughput array. 48GB will give each thread approx 3GB to work with, which the client thinks will be sufficient based on their in-house testing (a quick sanity check on these numbers follows the list).
- Boot and App drives are SSD as the client wants speed and lower power usage
- CPUs were chosen to allow headroom should the need to upgrade arise, and because they are 80W parts vs. the 95W offerings higher up in the family.
- Current RAID card is overkill, I know, but it allows plenty of headroom to add drives and grow the storage array. Plus, I can add in a throughput array should the need arise.
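
And here is the quick sanity check mentioned above, a few lines of Python using only figures from the parts list:

# Quick sanity check on the proposed build, using figures from the parts list.
total_ram_gb = 48
threads = 2 * 4 * 2               # 2 sockets x 4 cores x HyperThreading
print(f"RAM per thread: {total_ram_gb / threads:.1f} GB")   # ~3GB, as noted above

raid10_drives, drive_tb = 4, 1.0
print(f"RAID10 usable capacity: {raid10_drives * drive_tb / 2:.1f} TB")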

Anyone see a glaring issue with this setup?
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: JPS
PROCESSORS: 2 x Intel Xeon E5540 Nehalem 2.53GHz 80W Quad-Core Processors

PROCESSOR HEATSINKS: 2 x Noctua NH-U12P SE1366 120mm

- CPUs were chosen to allow headroom should the need to upgrade arise, and because they are 80W parts vs. the 95W offerings higher up in the family.

Any reason to not just go with retail HSF?

Originally posted by: JPS
RAID CONTROLLER: Areca ARC-1680IX-12 PCIe x8 SAS RAID Card w/ 4GB cache and BBU

STORAGE ARRAY DRIVES: 4 x Western Digital RE3 WD1002FBYS 1TB SATA Hard Drives (RAID10)

APP/SCRATCH DRIVE: 1 x OCZ Vertex Series OCZSSD2-1VTX120G 2.5" 120GB SATA Solid State Drive
OS/BOOT DRIVE: 1 x OCZ Vertex Series OCZSSD2-1VTX60G 2.5" 60GB SATA Solid State Drive

- Boot and App drives are SSD as the client wants speed and lower power usage

- Current RAID card is overkill, I know, but it allows plenty of headroom to add drives and grow the storage array. Plus, I can add in a throughput array should the need arise.

I disagree with the statement that the raid card is overkill. Given how many concurrent threads this guy is going to be running, and the fact that logging of system states for the simulations will be happening practically non-stop, the hard drive subsystem is going to be THE bottleneck before anything else (as you currently have it spec'ed).

I'm assuming you plan to set up the SSDs as pass-through disks on the raid card so they get to take advantage of the 4GB cache? If not, do consider it.

Vertex? Really? Dropping a bundle on the RAM, the CPUs, and the raid card, only to cheap out on the SSD hardware? If you're going with just a single SSD for each function, rather than some RAID-0 setup of two or more SSDs, then at least go Intel X25-M. If Vertex is unavoidable, then double them up in RAID-0 so the small-file random writes get up to levels worthy of the rest of the system.

Sure, a single Vertex sitting on the other side of that 4GB cache is going to be faster than any spindle-based setup, but RAID-0 is just so much more fun that it's a shame not to see this rig go there.

Just my opinions, I know you've considered these already.
 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
Originally posted by: Idontcare
Originally posted by: JPS
PROCESSORS: 2 x Intel Xeon E5540 Nehalem 2.53GHz 80W Quad-Core Processors

PROCESSOR HEATSINKS: 2 x Noctua NH-U12P SE1366 120mm

- CPUs were chosen to allow headroom should the need to upgrade arise, and because they are 80W parts vs. the 95W offerings higher up in the family.

Any reason to not just go with retail HSF?

Originally posted by: JPS
RAID CONTROLLER: Areca ARC-1680IX-12 PCIe x8 SAS RAID Card w/ 4GB cache and BBU

STORAGE ARRAY DRIVES: 4 x Western Digital RE3 WD1002FBYS 1TB SATA Hard Drives (RAID10)

APP/SCRATCH DRIVE: 1 x OCZ Vertex Series OCZSSD2-1VTX120G 2.5" 120GB SATA Solid State Drive
OS/BOOT DRIVE: 1 x OCZ Vertex Series OCZSSD2-1VTX60G 2.5" 60GB SATA Solid State Drive

- Boot and App drives are SSD as the client wants speed and lower power usage

- Current RAID card is overkill, I know, but it allows plenty of headroom to add drives and grow the storage array. Plus, I can add in a throughput array should the need arise.

I disagree with the statement that the raid card is overkill. Given how many concurrent threads this guy is going to be running, and the fact that logging of system states for the simulations will be happening practically non-stop, the hard drive subsystem is going to be THE bottleneck before anything else (as you currently have it spec'ed).

I'm assuming you plan to set up the SSDs as pass-through disks on the raid card so they get to take advantage of the 4GB cache? If not, do consider it.

Vertex? Really? Dropping a bundle on the RAM, the CPUs, and the raid card, only to cheap out on the SSD hardware? If you're going with just a single SSD for each function, rather than some RAID-0 setup of two or more SSDs, then at least go Intel X25-M. If Vertex is unavoidable, then double them up in RAID-0 so the small-file random writes get up to levels worthy of the rest of the system.

Sure, a single Vertex sitting on the other side of that 4GB cache is going to be faster than any spindle-based setup, but RAID-0 is just so much more fun that it's a shame not to see this rig go there.

Just my opinions, I know you've considered these already.

The HSFs were chosen for a quieter cooling setup than stock; that's really the only reason.

As for the drive subsystem, yes, the SSDs are going to be set up as pass-through on the Areca at a minimum. As for the Vertexes vs. the X25-Ms, well, let's see:

Vertex 120GB - $385
# Sequential Access - Read: Up to 250 MB/s
# Sequential Access - Write: Up to 180MB/s

Intel X25-M 80GB - $325
# Sequential Access - Read: Up to 250MB/s
# Sequential Access - Write: Up to 70MB/s

I honestly have not had the time to bench them, and to me the Vertex looks like the better deal. As for putting them in RAID, I would L O V E to do this, but one of the concessions made to go with 48GB of RAM was to cut back on the SSDs. If we do encounter a bottleneck there, there is room to add SSDs and alleviate that strain. For not too much more money I could do two 30GB Vertex SSDs and two 60GB Vertex SSDs in RAID 0 instead of what I posted before. Hmmmm......
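
For what it's worth, here is that comparison as rough dollars per GB and per MB/s of sequential write, in a few lines of Python using only the list prices and spec-sheet numbers above (it deliberately ignores small-file random I/O):

# Rough value comparison from the list prices and spec-sheet numbers above.
drives = {
    "OCZ Vertex 120GB": {"price": 385, "gb": 120, "seq_write_mbps": 180},
    "Intel X25-M 80GB": {"price": 325, "gb": 80, "seq_write_mbps": 70},
}

for name, d in drives.items():
    print(f"{name}: ${d['price'] / d['gb']:.2f}/GB, "
          f"${d['price'] / d['seq_write_mbps']:.2f} per MB/s of sequential write")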
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: JPS
As for the drive subsystem, yes, the SSDs are going to be set up as pass-through on the Areca at a minimum. As for the Vertexes vs. the X25-Ms, well, let's see:

Vertex 120GB - $385
# Sequential Access - Read: Up to 250 MB/s
# Sequential Access - Write: Up to 180MB/s

Intel X25-M 80GB - $325
# Sequential Access - Read: Up to 250MB/s
# Sequential Access - Write: Up to 70MB/s

I honestly have not had the time to bench them, and to me the Vertex looks like the better deal. As for putting them in RAID, I would L O V E to do this, but one of the concessions made to go with 48GB of RAM was to cut back on the SSDs. If we do encounter a bottleneck there, there is room to add SSDs and alleviate that strain. For not too much more money I could do two 30GB Vertex SSDs and two 60GB Vertex SSDs in RAID 0 instead of what I posted before. Hmmmm......

The sequential bandwidth specs are nice, but the experience one derives from using SSDs for the OS/boot drive is limited by small-file random read performance.

Sitting behind 4GB of raid card cache makes small-file random writes not much of a performance concern with any SSD, but random reads will still be gated by the inherent performance of the hardware on the other side of the cache.

Have you had a chance to read thru Anand's article on Vertex?

This legitreviews article on the Vertex is fairly insightful as well.

I do think raid-0'ing two or three smaller-capacity Vertexes on that Areca card will produce superior performance to a single X25-M, even for the small-file random reads, so as you mention this option is definitely something to consider if the computer case in question has the storage space for it.

Regardless of whether you go RAID-0 SSD or not, the rig will be nice and responsive as far as the end user experiences it. And as you say, getting the larger SSD today and then moving to RAID-0 down the road as an upgrade path makes sense too.
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
Originally posted by: Idontcare
I do think raid-0'ing two or three smaller-capacity Vertexes on that Areca card will produce superior performance to a single X25-M, even for the small-file random reads

Cached i/o on ARC1680ix will crush any SSD IME.

What OS is the OP using?
 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
Originally posted by: Rubycon
Originally posted by: Idontcare
I do think raid-0'ing two or three smaller-capacity Vertexes on that Areca card will produce superior performance to a single X25-M, even for the small-file random reads

Cached i/o on ARC1680ix will crush any SSD IME.

What OS is the OP using?

We are going with Enterprise Server.

For shits and giggles and the sake of discussion, if we decided to scrap the SSDs and run regular spindles, would you go with SATA VRs or something else? SAS spindles?
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
Savvios give the highest mechanical I/O. I have used up to 10 Fujitsu MBA3147RC 15K SAS drives in RAID 0 and it flies.
 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
O.K. - after reviewing all of the articles Idontcare pointed out and putting some more thought into this, I am now leaning toward SAS HDDs and will likely try to sell my client on those tomorrow. Ultimately, I think there will just be too much babysitting involved with the Vertex SSDs, and the cost differential for the X25-Ms is steep. So I will likely replace the prior boot drive with a pair of MBA3073RCs in RAID 0 and the prior application/scratch drive with a pair of MBA3147RCs in RAID 0, as the difference in cost is a wash. Cool, and thanks for the suggestions!
 