Best platform for scientific modeling?

JPS

Golden Member
Apr 23, 2001
1,745
0
71
I have a client who has asked me to spec out a dedicated workstation that they could use specifically for computer modeling/simulation - pure data crunching, not graphic simulation or CADish work. The programs they will use to do the work are both threaded and non-threaded. I suspect they will be using primarily non-threaded apps, but running multiple instances of said app simultaneously so a multi-core setup is a given. The setup will also most likely be run headless or via a remote session.

Cost for this client is not too much of a concern as long as I stay under $10K. If I presented them options that tend to the "greener" end of the power consumption spectrum, they would be excited, but there is a fair bit of flexibility there.

So, given the above, who is the current champ in terms of pure horsepower for number crunching and computer modeling/simulation? Would you go AMD or Intel? Which specific CPU(s) and why?

Thanks in advance for your time and comments....
 

Andrew1990

Banned
Mar 8, 2008
2,155
0
0
Intel i7, no doubt, for any sort of modeling. When do you have to build this machine? A few motherboard makers may be coming out with dual-socket LGA1366 boards, which would allow for 2 i7s and a total of 16 threads.

 

mcrumiller

Junior Member
Dec 18, 2008
23
0
0
Yeah definitely...you want as many threads running as possible for almost all scientific simulations. RAM shouldn't be a problem, as most motherboards support at least 32GB now.

Of course, it entirely depends on how easily threaded your application is, but most scientific platforms (e.g. Matlab) are well-suited for multicore processors.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Here's some "enthusiast" benchmark data for i7 vs PhII (among others) for a couple real-world scientific apps.

http://techreport.com/articles.x/16147/11

But the best source for extracting scientific application performance is the Spec CFP2006 benchmarks. http://www.spec.org/cpu2006/CFP2006/

Find the particular sub-benches that represent your client's likely area of research and then look at performance results for just those benchmarks.

For example if your client is a computational chemist interested in "smallish" molecules (<100 atoms) then they are likely to be using the computer for running calculations of the type that utilize Hartree-Fock and Density Functional Theory algorithms. In this example you'd be interested in knowing the strengths and weaknesses of Intel vs AMD when it comes to the 416.gamess benchmark (among others).

If they are a biochemist interested in performing molecular dynamics simulations on large proteins (tens of thousands of atoms) then you want the rig to be robust when it comes to 444.namd benchmark results (among others).

Fluid dynamics, finite element analysis, weather modeling: it's pretty much all in there.

For example I see that an i7 965 churns out a base score of 26.0 for the gamess bench and 19.8 for the molecular dynamics bench; whereas an AMD Shanghai at 2.7GHz nets 16.8 for gamess and 13.0 for molecular dynamics. (no X4 benches are published it appears)

Even adjusting the AMD results for clockspeed scaling to 3.2GHz, we see the i7 965 is still going to do markedly better in these particular types of scientific applications.
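To put rough numbers on that clockspeed adjustment, here is a minimal sketch. It assumes scores scale linearly with clock, which is optimistic since memory bandwidth does not rise with the core clock, so treat the scaled AMD figures as an upper bound. The function and variable names are made up for illustration.

```python
def scale_score(score, from_ghz, to_ghz):
    """Scale a benchmark score linearly with clock speed (optimistic assumption)."""
    return score * to_ghz / from_ghz

# Base scores quoted in this post.
i7_965 = {"gamess": 26.0, "molecular dynamics": 19.8}
shanghai_2_7ghz = {"gamess": 16.8, "molecular dynamics": 13.0}

# Hypothetical 3.2GHz AMD part, obtained purely by linear scaling.
amd_scaled = {name: scale_score(s, 2.7, 3.2) for name, s in shanghai_2_7ghz.items()}

for name in i7_965:
    print(f"{name}: i7 965 = {i7_965[name]:.1f}, "
          f"AMD scaled to 3.2GHz = {amd_scaled[name]:.1f}")
```

Even with the generous linear assumption, the scaled AMD numbers stay below the i7 965 on both benches.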
 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
Thanks for all of the comments thus far. I just received some confirmation and the main thing my client wants is the shortest/quickest run time per simulation. They will be running multiple simulations at once in an effort to cut down on processing times. The goal is really for the fastest processing time/speed for multiple, single-threaded simulations running in parallel. What I do not know is really which algorithms they are using. The work primarily consists of climate change simulations and modeling the effects of fire across landscapes. Much of the input data is coming from GIS layers.
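For what it's worth, the "multiple single-threaded simulations in parallel" pattern can be sketched roughly like this. It is only an illustration: the command being launched is a stand-in one-liner, not the client's actual simulation program, and `run_simulation`, `run_all` and `max_parallel` are names invented here.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_simulation(run_id):
    """Launch one single-threaded job as its own OS process and wait for it."""
    # Placeholder command: a tiny Python one-liner stands in for the real executable.
    cmd = [sys.executable, "-c", f"print('run {run_id} done')"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

def run_all(run_ids, max_parallel):
    """Keep at most max_parallel jobs running at once; results come back in input order."""
    # The threads only wait on child processes. The number crunching happens
    # in the children, so each job can occupy its own core.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_simulation, run_ids))

if __name__ == "__main__":
    print(run_all(range(8), max_parallel=4))
```

Setting `max_parallel` to the number of hardware threads (or fewer, if RAM is the limit) is the knob that matters here.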
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Sounds like they are a candidate for 481.wrf.

The i7 965 scores 46.2 on this one, versus 22.6 for the Shanghai at 2.7GHz. Again, even scaling to account for the clockspeed increase you'd get by buying an X4 at 3.2GHz, you are still better off going with the i7 965.

Also their simulations are no doubt going to be ram dependent (quantity and bandwidth) so going with an x58 mobo will allow you to pack 6 dimms with DDR3 offering pretty good bandwidth for their apps.
 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
Originally posted by: Idontcare
Sounds like they are a candidate for 481.wrf.

The i7 965 scores 46.2 on this one, versus 22.6 for the Shanghai at 2.7GHz. Again, even scaling to account for the clockspeed increase you'd get by buying an X4 at 3.2GHz, you are still better off going with the i7 965.

Also their simulations are no doubt going to be ram dependent (quantity and bandwidth) so going with an x58 mobo will allow you to pack 6 dimms with DDR3 offering pretty good bandwidth for their apps.

Thanks for the links. I am really liking the i7 performance numbers. My only hesitation about going down this route is that the client wants this system built ASAP and there are no dual-socket i7 motherboards out yet - ugh...
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Consider that if you do go multi-socket for AMD then your top clockspeed is 2.7GHz and the performance numbers are as I referenced them (i.e. no scaling).

When I mentioned scaling the numbers to 3.2GHz, that was because I assumed the choice would be 1S i7 965 vs. 1S PhII X4 940.

But if 2S AMD is on the table (4S is not feasible with your 10k budget) then the top clockspeed is 2.7GHz. Also consider the i7 975's due out in April are to be clocked at 3.33GHz stock.

So which will deliver more performance for your customer: 2x2.7GHz AMD chips with the reduced DDR2 bandwidth, or the 1x3.2GHz i7 (8 threads with HT) with the much higher DDR3 bandwidth?

If I had to make an educated guess I'd wager the i7 965 will still be the better setup, and certainly will be lower power consumption both at idle and load compared to a 2x2.7GHz AMD rig.
 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
Originally posted by: Idontcare
Consider that if you do go multi-socket for AMD then your top clockspeed is 2.7GHz and the performance numbers are as I referenced them (i.e. no scaling).

When I mentioned scaling the numbers to 3.2GHz, that was because I assumed the choice would be 1S i7 965 vs. 1S PhII X4 940.

But if 2S AMD is on the table (4S is not feasible with your 10k budget) then the top clockspeed is 2.7GHz. Also consider the i7 975's due out in April are to be clocked at 3.33GHz stock.

So which will deliver more performance for your customer: 2x2.7GHz AMD chips with the reduced DDR2 bandwidth, or the 1x3.2GHz i7 (8 threads with HT) with the much higher DDR3 bandwidth?

If I had to make an educated guess I'd wager the i7 965 will still be the better setup, and certainly will be lower power consumption both at idle and load compared to a 2x2.7GHz AMD rig.

2S i7 would be *ideal* for 16 separate threads, but I do not have that kind of time to wait. What I am thinking right now is going the i7 route on an Asus P6T6 maxed out at 12GB of RAM, with some fast disks as well...
 

Denithor

Diamond Member
Apr 11, 2004
6,300
23
81
If you're working on a $10k budget - build two or three boxes @ $1.5k each & link together with network & kvm setup. They can monitor/control all boxes from one station & distribute the workload across multiple systems (that way each has dedicated RAM & HDD I/O etc).
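A minimal sketch of splitting the workload by hand across a few boxes, assuming plain round-robin assignment. The host names are hypothetical, and actually launching the runs on each machine (remote session, scripts, etc.) is left out.

```python
from itertools import cycle

def assign_runs(run_ids, hosts):
    """Deal simulation runs out to hosts round-robin; returns {host: [run ids]}."""
    assignment = {host: [] for host in hosts}
    for run_id, host in zip(run_ids, cycle(hosts)):
        assignment[host].append(run_id)
    return assignment

if __name__ == "__main__":
    # Hypothetical host names for a three-box setup.
    plan = assign_runs(range(10), ["box1", "box2", "box3"])
    for host, runs in plan.items():
        print(host, runs)
```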

Speaking of disk I/O - Idontcare, do you have any idea if a speedy SSD and/or a RAMdisk would speed up this kind of work?

Efficiency of various quad-core CPUs.

Read that article. You'll find that not only will i7 generally finish all work fastest it also does it in the most energy-efficient manner currently available. Take special note of the "total joules consumed" section of each benchmark - very telling data there.
 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
Originally posted by: Denithor
If you're working on a $10k budget - build two or three boxes @ $1.5k each & link together with network & kvm setup. They can monitor/control all boxes from one station & distribute the workload across multiple systems (that way each has dedicated RAM & HDD I/O etc).

Speaking of disk I/O - Idontcare, do you have any idea if a speedy SSD and/or a RAMdisk would speed up this kind of work?

Efficiency of various quad-core CPUs.

Read that article. You'll find that not only will i7 generally finish all work fastest it also does it in the most energy-efficient manner currently available. Take special note of the "total joules consumed" section of each benchmark - very telling data there.

Thanks for the article link. Your multi-system setup is a good idea. I will likely set them up on a single i7 for the moment - if that does not suffice, we might clone that rig or move to a 2S i7 motherboard once they arrive on scene...
 

heyheybooboo

Diamond Member
Jun 29, 2007
6,278
0
0
This looks like a job for Virtualization Man - that's above my paygrade (but I am willing to be paid to learn)
 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
If I had to build tomorrow & could not wait at all, which of these would you pick:
OPTION 1
MOTHERBOARD SUPERMICRO MBD-X8STE-O
CPU Intel 3.2GHz i7 Quad-Core (Extreme Edition)
RAM 6 x 2GB Crucial DDR3-1333 ECC
PSU 850W - 1000W (Seasonic or Corsair)
CASE Full eATX/ATX Tower (Lian Li or Coolermaster)
Optical Drive Samsung 22X DVD-R/W
Boot Drive 2 x Intel X25-E SSDs in RAID0 (64GB)
Primary Storage Drive 4 x WD RE3 1TB in RAID10 (2TB)
Swap Drive 2 x WD 300GB Velociraptor in RAID0 (600GB)
RAID Controller 3ware 9650SE-8LPML + BBU
OS Windows XP Pro (64-bit)

OPTION 2
MOTHERBOARD ASUS KFSN5-D Dual Socket (F)
CPU 2 x 2.5GHz AMD Opteron (Shanghai 8380)
RAM 4 x 4GB Crucial PC2-5300 ECC/Reg
PSU 850W - 1000W (Seasonic or Corsair)
CASE Full eATX/ATX Tower (Lian Li or Coolermaster)
Optical Drive Samsung 22X DVD-R/W
Boot Drive 2 x Intel X25-E SSDs in RAID0 (64GB)
Primary Storage Drive 4 x WD RE3 1TB in RAID10 (2TB)
Swap Drive 2 x WD 300GB Velociraptor in RAID0 (600GB)
RAID Controller 3ware 9650SE-8LPML + BBU
OS Windows XP Pro (64-bit)

OPTION 3
Wait for Xeon Nehalems and suitable motherboards...
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,713
142
106
Can you list the names of any of the applications they'll be running, by any chance?

Do they do the programming themselves?
If it's at all possible for them to be doing the programming, or using future programs with OpenCL/CUDA/Stream support, then you might want to consider adding 2 or more beefy video cards to the setup.
 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
Without getting into too much detail, FlamMap, FSPro, and FireBGC are in line with the applications that will be run on this specific machine.

What I am looking at in terms of raw data storage is something along these lines:

- The outfiles and outmaps will need approximately 4GB per simulation run
- There will be upwards of 16 factors (variable combinations) per run so 4 x 16 = 64 GB
- Each run will also need 10 replicates so 10 x 64GB = 640 GB storage space as a baseline.
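The same sizing as a quick sketch, so the inputs can be changed if the client's numbers move. The constant and function names are just ones I picked for illustration.

```python
GB_PER_SIMULATION = 4   # outfiles and outmaps per simulation run
FACTORS_PER_RUN = 16    # variable combinations per run
REPLICATES = 10         # replicates per run

def baseline_storage_gb(gb_per_sim, factors, replicates):
    """Baseline storage needed, in GB, for one full set of replicated runs."""
    return gb_per_sim * factors * replicates

per_run_gb = GB_PER_SIMULATION * FACTORS_PER_RUN
total_gb = baseline_storage_gb(GB_PER_SIMULATION, FACTORS_PER_RUN, REPLICATES)

print(f"Per run: {per_run_gb} GB, baseline total: {total_gb} GB")
```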

Given that, I am now thinking of running 6 x 750GB WD RE3 HDDs in a RAID 10 setup for the DATA. Now, I am trying to decide between either 2 x Intel X25-E 32GB SSDs in RAID0 or 2 x Fujitsu 15000rpm 147GB SAS HDDs in RAID0 for the OS and Applications drive. Given the budget, I might also keep the 2 x 300GB WD Velociraptors in RAID0 as a SWAP drive.
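For reference, a rough sketch of the usable capacity of those arrays. It uses nominal drive sizes and ignores formatting overhead, so real numbers will come in a bit lower; the helper names are mine.

```python
def raid0_capacity_gb(n_drives, drive_gb):
    """RAID 0 stripes across all drives: usable space is the sum, with no redundancy."""
    return n_drives * drive_gb

def raid10_capacity_gb(n_drives, drive_gb):
    """RAID 10 mirrors pairs and stripes across them: usable space is half the total."""
    if n_drives < 4 or n_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return (n_drives // 2) * drive_gb

data_gb = raid10_capacity_gb(6, 750)      # 6 x 750GB WD RE3
ssd_boot_gb = raid0_capacity_gb(2, 32)    # 2 x 32GB Intel X25-E
sas_boot_gb = raid0_capacity_gb(2, 147)   # 2 x 147GB 15000rpm SAS
swap_gb = raid0_capacity_gb(2, 300)       # 2 x 300GB Velociraptor

BASELINE_GB = 640  # baseline storage figure worked out above
print(f"DATA array: {data_gb} GB usable, {data_gb - BASELINE_GB} GB beyond the baseline")
print(f"Boot options: SSD {ssd_boot_gb} GB vs SAS {sas_boot_gb} GB; swap {swap_gb} GB")
```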

I am going to suggest a separate dedicated fileserver running the ZFS filesystem (Solaris) for their longterm data storage needs. Something on the order of at least 10TB.
 

Denithor

Diamond Member
Apr 11, 2004
6,300
23
81
SSD for boot drive for sure.

Read that article - especially the "real world testing" segment.

2x X25-E 64GB drives in RAID0 would be better for the boot drive, with plenty of space for the databases/apps/work-in-progress/etc.

What raid controller do you plan to use?


EDIT: Oh, yeah - option 1 for the build is stronger.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: Denithor
Speaking of disk I/O - Idontcare, do you have any idea if a speedy SSD and/or a RAMdisk would speed up this kind of work?

For this type of application where every last MB of ram is going to be consumed by the multiple simultaneous applications there is nothing really to be gained by going with a ramdisk.

Trying to fit 8 applications and their data into a 12GB footprint (figure 11GB after XP is loaded) is going to be "tight".

If the OP can find compatible 4GB dimms, even with the price premium, so the system can run 24GB of ram then his clients will be far more likely to not be ram limited.

What will happen with ram limitations is his CPU utilization will never reach 100%. It'll trudge along at 70-80% (or worse) as the applications fight each other for the rights to load all 4GB of their data into system memory.

I know because I am in a very similar situation. I have to manually prune my database sizes so my four concurrent applications are able to fit all their data into the available ram, otherwise my CPU utilization can get as low as 25% and everything takes 4x longer to complete.
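A back-of-the-envelope sketch of that RAM squeeze. It assumes every instance wants its full 4GB of data resident and that the OS takes about 1GB (the "11GB after XP" figure above); both are simplifications, and the function name is hypothetical.

```python
def max_resident_runs(total_ram_gb, os_overhead_gb, gb_per_run):
    """How many instances can hold their whole data set in RAM at once."""
    return (total_ram_gb - os_overhead_gb) // gb_per_run

OS_OVERHEAD_GB = 1   # assumed, from "figure 11GB after XP is loaded"
GB_PER_RUN = 4       # assumed resident data per application
CONCURRENT_APPS = 8

for ram_gb in (12, 24):
    fits = max_resident_runs(ram_gb, OS_OVERHEAD_GB, GB_PER_RUN)
    print(f"{ram_gb}GB RAM: {fits} of {CONCURRENT_APPS} apps fit fully in memory")

ram_for_all_gb = CONCURRENT_APPS * GB_PER_RUN + OS_OVERHEAD_GB
print(f"RAM for all {CONCURRENT_APPS} apps fully resident: {ram_for_all_gb}GB")
```

Under these assumptions 24GB still does not hold eight full 4GB data sets, but it roughly halves the shortfall compared with 12GB.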

Having a superfast disk subsystem helps; it never hurts, that's for sure.

I personally favor Areca. Get the Areca 1680 with the 1.2GHz IOP348 and upgrade that sucker to 4GB of onboard cache. Then plug in a bank of 4-6 Intel X25-M's for some nice RAID-0 bandwidth and make this the application data system. (No real need for the E's if you are going to create an array of them on RAID-0 with 4GB of cache to buffer those random reads/writes.)

Areca cards allow you to "gang" them up. So if you get two Arecas, you can gang them (split the disks across both cards) and get your bandwidth above 2GB/s with your latency still around 0.1ms.

In my case I use the ramdisk to buffer my small random writes as they can become the rate-limiting factor in my calcs. For what the OP is doing that won't really be an issue if he has (the right) SSD's involved. Not clear to me that he really needs the E's, I think M's in an appropriate array would give him the performance he needs and the added capacity boost.

The manual distributed computing model suggested above is probably not viewed as viable in the OP's client's environment. I run such a manual distributed cluster myself (5 Q6600's) but I consider my time and effort to manage such a cluster as being acceptable. Few commercial clients think that way. They'd much rather move to Linux and set up a Beowulf-type cluster where the distribution of the workload is more automated and less manual.
 

JPS

Golden Member
Apr 23, 2001
1,745
0
71
Thanks for all of the comments and suggestions. I have incorporated many of the ideas and sent multiple configs off for review. These are the three main config options I think they will be considering:


OPTION 1
Code:
MOTHERBOARD	SUPERMICRO MBD-X8STE-O
CPU		Intel 3.2GHz i7 Quad-Core (Extreme Edition)
RAM		6 x 4GB Crucial DDR3-1333 ECC (24GB)
PSU		850W - 1000W (Seasonic or Corsair)
CASE		Full eATX/ATX Tower (Lian Li or Coolermaster)
Optical Drive	Samsung 22X DVD-R/W
Boot Drive	4 x Intel X25-M SSDs in RAID0 (320GB)
Storage Drive 	4 x WD RE3 HDDs 1TB in RAID10 (2TB)
Swap Drive	1 x WD Velociraptor HDD (300GB)
RAID Controller	ARC-1680-2G + BBU



OPTION 2
Code:
MOTHERBOARD	Supermicro MBD-X7DWE-O Motherboard
CPU		2 x Intel E5450 3.0GHz Xeon CPUs
RAM		4 x 4GB Crucial PC2-6400 ECC FB-DIMMs (16GB)
PSU		850W - 1000W (Seasonic or Corsair)
CASE		Full eATX/ATX Tower (Lian Li or Coolermaster)
Optical Drive	Samsung 22X DVD-R/W
Boot Drive	4 x Intel X25-M SSDs in RAID0 (320GB)
Storage Drive 	4 x WD RE3 HDDs 1TB in RAID10 (2TB)
Swap Drive	1 x WD Velociraptor HDD (300GB)
RAID Controller	ARC-1680-2G + BBU
OS		Vista Business (64-bit)


OPTION 3
Code:
MOTHERBOARD	Supermicro MBD-X7DWE-O Motherboard
CPU		2 x Intel E5450 3.0GHz Xeon CPUs
RAM		4 x 8GB Crucial PC2-5300 ECC FB-DIMMs (32GB)
PSU		850W - 1000W (Seasonic or Corsair)
CASE		Full eATX/ATX Tower (Lian Li or Coolermaster)
Optical Drive	Samsung 22X DVD-R/W
Boot Drive	4 x Intel X25-M SSDs in RAID0 (320GB)
Storage Drive 	4 x WD RE3 HDDs 1TB in RAID10 (2TB)
Swap Drive	1 x WD Velociraptor HDD (300GB)
RAID Controller	ARC-1680-2G + BBU
OS		Vista Business (64-bit)


Waiting for i7-based Xeons and the appropriate motherboards to be available to system builders such as myself, though darn attractive, just does not seem like something they are willing to do at the moment. So, if you were building today, which of these configs would you prefer? Personally, I like option 1. Even with fewer threads available for parallel runs, I think the sheer throughput of the available threads on the i7 EE would hammer a dual E5450 setup. Plus, 24GB of higher-bandwidth RAM would not hurt either.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Option 1 is FTW without a doubt.

The tri-channel ram bandwidth will be critical for those apps manipulating 4GB of data whilst fighting for bandwidth (contention) against the other simultaneously operating apps. This is the first and primary concern for the rig.

The disk subsystem looks "robust" to say the least.

One question regarding the boot drive and the SSD's. Will your client really benefit from having a fast, low-latency boot drive?

Your boot times (how often will they really reboot the rig? maybe once or twice a month when critical XP updates require it?) will be dominated by the raid card initialization sequence (30-40s typical).

Will the applications use the boot drive for caching datafiles or logfiles or anything like that? If yes, is that what the Swap Drive is for?

If SSD's are to be part of the package then IMO they really ought to be deployed in a manner that is likely to boost the performance of the rig during actual computations.

Is the 640GB number you worked out for use during actual simulation runs? If so then your goal would be to focus on improving the performance of this specific portion of the disk subsystem.

640GB means you need bandwidth, presumably the writes and reads of the applications are not in 4KB increments, meaning you do not need to be worried about latency as the performance bottleneck.

Rather, you want that 640GB storage sub-system to have as high a bandwidth as possible (a single 1680 will max out around 1.2GB/s no matter how many drives you attach).
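To give a feel for what that ceiling means, a quick sketch of the best-case time to stream the whole 640GB once at the controller's limit. This is a floor, not a prediction: it assumes purely sequential transfers at the full 1.2GB/s, which real workloads won't sustain. The function name is made up.

```python
def transfer_seconds(data_gb, bandwidth_gb_per_s):
    """Best-case time to move data_gb at a sustained bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return data_gb / bandwidth_gb_per_s

DATA_GB = 640              # storage figure from the OP
SINGLE_1680_GB_PER_S = 1.2 # approximate ceiling for one controller

seconds = transfer_seconds(DATA_GB, SINGLE_1680_GB_PER_S)
print(f"{DATA_GB}GB at {SINGLE_1680_GB_PER_S}GB/s: "
      f"{seconds:.0f} s (about {seconds / 60:.1f} min)")
```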

I think your option 1 is going to deliver solid performance. Don't take my questions above as being nitpicks. I'm more just curious to see the SSD's used in a way that might possibly provide better performance, but you know your client and their needs far better than you could ever communicate thru a forum, so you've got to go with your gut on that one. The client won't be disappointed no matter where you put them; it's all gravy at this point with the rig you've spec'ed here.
 

Phew

Senior member
May 19, 2004
477
0
0
How parallelized is their software? I know at my work, the most cost-effective way to expand our CFD cluster is to build the cheapest quad-core boxes we can slap together (was Q6600s last time we upgraded, would probably be PIIX4's now). $10k would buy us 100+ CPU cores, which we'd probably break into multiple 32-core sub-clusters.

For raw computing power per $, I'm not sure that can be beat.
 

sjwaste

Diamond Member
Aug 2, 2000
8,760
12
81
The top Mac Pro configurations sometimes sell for less than what you could build it for, it's almost the only bargain in Apple's line, albeit not on an absolute scale!

For 10k, you could get two of them.

EDIT: Maybe not, you can deck one out for much more than 10k. However, the Mac Pro is still a very viable product at its price points.
 

RaynorWolfcastle

Diamond Member
Feb 8, 2001
8,968
16
81
I agree with the two previous posters: look at the Mac Pro dual i7.

Keep in mind that Lenovo just announced their Nehalem EP-based D20 & S20 workstation and Dell just announced their Nehalem EP-based workstations today. Intel is rumored to be formally introducing the Nehalem EP this coming Monday. Either way, if you can afford to wait a couple of weeks this is what I would do.

For the sake of argument, let's assume that the Lenovo and Dell workstations cost roughly the same as the Mac Pro in a basic 2S configuration. I would say your best route would be the following.

- Mac Pro, or Dell or Lenovo equivalent (incl. two 2.66 GHz Nehalem EPs), ~$4,700
- 24 GB 1333 MHz RAM (6x4GB ECC R-DIMM), ~$1,200
- Boot drive: included in workstation
- Storage drives: 4x WD RE3, $650
- Swap drives: 4x 32 GB Intel X25-E (all computation-related transactions happen here), ~$1,700
- Areca ARC-1680-2G, ~$900
- OS, optical drives, etc. will be included with the workstation
==========
Total: ~$9.2k

These are quick and dirty prices, I'm sure you can find cheaper if you look around.
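The line items above add up as follows (a trivial sketch, using the approximate prices from the list; the dictionary keys are just labels I chose):

```python
BUDGET_USD = 10_000

parts_usd = {
    "workstation (two 2.66 GHz Nehalem EPs)": 4700,
    "24 GB RAM (6x4GB ECC R-DIMM)": 1200,
    "storage drives (4x WD RE3)": 650,
    "swap drives (4x 32 GB Intel X25-E)": 1700,
    "Areca ARC-1680-2G": 900,
}

total_usd = sum(parts_usd.values())
headroom_usd = BUDGET_USD - total_usd

print(f"Total: ${total_usd}, headroom under budget: ${headroom_usd}")
```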

This buys you 16 threads. Assuming that each run needs 4 GB of storage and you've got 16 concurrent runs (1 run per hardware thread), you really need only about 64 GB of swap storage. Once you finish a run, you should just copy the output to the much slower storage drives. This all works fine, assuming here that copying 4 GB to the storage array will be much faster than completing a simulation run. So 4x X25-E drives will buy you plenty of space (128 GB) with blindingly fast IO performance.
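A minimal sketch of that "compute on the fast array, then move the output to the slow array" idea. The directory layout and file names are hypothetical, and the demo uses a temporary directory in place of the real SSD and HDD arrays.

```python
import shutil
import tempfile
from pathlib import Path

def scratch_needed_gb(concurrent_runs, gb_per_run):
    """Scratch space needed if every concurrent run holds its full output at once."""
    return concurrent_runs * gb_per_run

def archive_run(scratch_file: Path, archive_dir: Path) -> Path:
    """Move a finished run's output from fast scratch to the slower storage array."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / scratch_file.name
    shutil.move(str(scratch_file), str(dest))
    return dest

if __name__ == "__main__":
    print(f"Scratch needed: {scratch_needed_gb(16, 4)} GB of 128 GB available")
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "scratch"   # stands in for the SSD array
        scratch.mkdir()
        output = scratch / "run_000.out"  # hypothetical output file name
        output.write_text("simulation output")
        print("Archived to", archive_run(output, Path(tmp) / "storage"))
```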

Also, the boot drive is probably inconsequential since you're not loading different applications over and over. The application and working data should be loaded in RAM when you start the simulations anyway. If anything, you'll probably want to see if you can stuff more RAM in the box; for example, the Lenovo D20 supports up to 96 GB of RAM.
 

rudder

Lifer
Nov 9, 2000
19,441
86
91
Originally posted by: clk500
Did you consider the mac pro dual i7 ?

No because your only options are crappy video cards. Get a PC with the fastest Nvidia GPU and take advantage of CUDA.

GPU's rip through scientific modeling calculations a lot more efficiently than CPU's.
 