According to AMD, if an application is CPU-vendor-agnostic and uses ISA extensions like SSE, AES, etc., then no recompile is needed and the BD architecture will stretch its legs.
If that were true, general performance numbers would be higher.
If this is true, programmers can take advantage of ISA extensions from both CPU vendors while writing a single code path for both AMD and Intel. That has been true from the K6 up through Stars, but not so much for BD, or at least for BD's initial public iteration.
It's not just a matter of ISA, but of how instructions and register accesses are selected and grouped. Since most of the Windows and Linux world uses canned binaries optimized for prior CPUs, running those binaries well is important (even 386-compatible binaries may very well be tuned for more modern, deeply pipelined CPUs). Intel's CPUs, starting with the Core 2 series, for example, happily run K8-optimized x86_64 binaries, and do so faster than the K8 itself most of the time (all of the time, for plain statically compiled code with no direct human optimizations). Even if the code has been hand-tuned or profile-guided for the K8, such that the Core 2, Nehalem, or SB would be at a disadvantage, it will still run well enough.
Prior AMD CPUs did this just fine, as well. A K6-2/3 could execute 386 and 486 code very fast (I honestly don't recall whether there were any substantial differences from the K6, aside from clock speeds). It could execute Pentium integer code fast (often faster than a Pentium II). FP was a bit of an issue, but it was good enough for a cheap CPU. The K7 could execute non-SSE PPro/PII/PIII code fast, Pentium code fast, 486 code fast, 386 code fast, and K6 code fast (I'm pretty sure common 286-and-older features, like BCD and looping instructions, were being deprecated by that point). The K8 could do the same, and did very well with P4 code up to SSE2 (or was it SSE3?).
In all cases, tweaking just for a given CPU, be it Intel's or AMD's, can give a major performance boost. But as long as the instructions were supported, and the executable didn't check for GenuineIntel before it ran the good code, any existing executable made for a prior generation would typically run faster than it did on the prior-generation CPU.
This has historically been a strength of the x86 platform, if not necessarily a planned one. In past times, you could expect clock-speed boosts, so slightly lower IPC when running code made for older CPUs was OK. The Pentium II, for instance, while not initially scaling as fast as Intel hoped, did scale up fast enough that the minor IPC hit from running code tuned for Pentiums and 486s was generally a non-issue.
Today, clock speeds are only inching up, so increasing IPC on existing code is a must. At the least, AMD needed to be significantly faster per clock, per thread, than Stars. Making a P4-like, MIPS-like, or Alpha-like CPU at pretty much any point past ~2003 should have been a known bad idea. Even after seeing some of the latencies, I had hoped that AMD had been smart enough not to do that, possibly sacrificing some performance for the sake of speed scaling, since power/speed targets have consistently been a problem (i.e., add in major IPC improvements, but give a little back to make reaching certain speeds within a certain power envelope easier, because that trade-off happens almost every time), but apparently they did not.