New Zen microarchitecture details

bjt2 · Sep 14, 2016

Abwx said:
It wasn't. It's just that some viral marketers took a non-significant subscore and presented it as a CPU bench. The actual perfs displayed in AotS point to the same thing as AMD's Blender demo, but on what looks to be a platform with restricted RAM bandwidth.

http://wccftech.com/amd-zen-es-benchmark/

With half the core count but 15% higher frequency than the ES, Zen would match the i7 in this graph.

Ok... I vaguely remember all the discussions on the CPU score, which was almost double on some Intel CPUs, plus someone here who was quite sceptical about Zen's gaming performance and thought that was a bad test...

I repeat that I do not know much about games and game benchmarks... But if you say that the GPU score is all that counts, then Zen is a good CPU in multiple situations...
Let's dig into the AotS database to see if there is some 5- or 6-series i7...

lolfail9001 · Sep 14, 2016

Abwx said:
It wasn't. It's just that some viral marketers took a non-significant subscore and presented it as a CPU bench. The actual perfs displayed in AotS point to the same thing as AMD's Blender demo, but on what looks to be a platform with restricted RAM bandwidth.

http://wccftech.com/amd-zen-es-benchmark/

With half the core count but 15% higher frequency than the ES, Zen would match the i7 in this graph.

Sorry, but your claim is plain wrong. Here's what happens on a 5960X that is clocked just 5% higher than Zen was supposed to be in this AotS bench.
http://www.guru3d.com/news-story/amd-zen-engineering-sample-aos-further-analysis,5.html

Or you can debunk that as well, then go ahead

Abwx · Sep 14, 2016

bjt2 said:
Ok... I vaguely remember all the discussions on the CPU score, which was almost double on some Intel CPUs...

And it was almost double that of some other Intel CPUs as well, because it wasn't an indication of performance but a number computed by the game that seemed to take RAM bandwidth into account, so i7s with DDR3 had a much lower "score" than DDR4-equipped i7s, a fact that was obfuscated by the usual suspects...

bjt2 said:
Let's dig into the AotS database to see if there is some 5- or 6-series i7...

Do the comparison I'm talking about and you'll discover that the same people who based their "argument" on this number have to acknowledge that with DDR4 the i7's IPC increases by roughly 60%, which is of course just ridiculous...

It reminds me of Zen's frequency/voltage curve according to these aficionados: not only do they say that power increases as the cube of frequency when going from 1.5GHz to 3GHz, but once at that frequency it can decrease only linearly with frequency when going back to 1.5GHz...
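
For reference, the textbook dynamic power relation P ≈ C·V²·f gives a cubic dependence on frequency only if voltage is assumed to scale linearly with frequency, and the same curve applies in both directions. A minimal sketch under that assumption; the constants `volts_per_ghz` and `capacitance` are made-up illustrative values, not Zen data, and leakage is ignored:

```python
def dynamic_power(freq_ghz, volts_per_ghz=0.4, capacitance=1.0):
    """Dynamic power from P = C * V^2 * f, in arbitrary units.

    Assumption (not from the thread): voltage scales linearly with
    frequency. Real V/f curves are not linear and static leakage is
    ignored, so this only illustrates the shape of the relation.
    """
    voltage = volts_per_ghz * freq_ghz
    return capacitance * voltage ** 2 * freq_ghz


# Going up from 1.5GHz to 3GHz and coming back down use the same curve.
ratio_up = dynamic_power(3.0) / dynamic_power(1.5)
ratio_down = dynamic_power(1.5) / dynamic_power(3.0)

print(f"1.5 -> 3.0 GHz: power x{ratio_up:.2f}")
print(f"3.0 -> 1.5 GHz: power x{ratio_down:.3f}")
```

Under this model, doubling the clock multiplies power by about eight and halving it divides power by the same factor, so a cubic rise paired with a linear fall is not something the model can produce.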

lolfail9001 · Sep 14, 2016

Abwx said:
And it was almost double that of some other Intel CPUs as well, because it wasn't an indication of performance but a number computed by the game that seemed to take RAM bandwidth into account, so i7s with DDR3 had a much lower "score" than DDR4-equipped i7s, a fact that was obfuscated by the usual suspects...

~~Because, best 6700k only posts 17% higher score than best 4790k on low 1080p (that is purely CPU bottlenecked).~~

Do the comparison I'm talking about and you'll discover that the same people who based their "argument" on this number have to acknowledge that with DDR4 the i7's IPC increases by roughly 60%, which is of course just ridiculous...

~~Or how about you explain why 4790k posts ~50% higher result than 2600k despite using DDR3 as well? How bad.~~
I'll mostly concede that AotS is a little bit bandwidth sensitive, with the small caveat that DDR4 has little to do with it.

It reminds me of Zen's frequency/voltage curve according to these aficionados: not only do they say that power increases as the cube of frequency when going from 1.5GHz to 3GHz, but once at that frequency it can decrease only linearly with frequency when going back to 1.5GHz...

Right, because reading is hard, but spreading lies and false numbers is easy.

krumme · Sep 14, 2016

Pls. We have been over that number for pages.

KTE · Sep 14, 2016

The Stilt said:
Holding back the benchmarks really is no indicator in either direction. They might hold them back because the performance isn't as impressive as many people have expected, or because it is impressive and releasing information about the product now would cause people to stop purchasing the current inventory. Zeppelin is at least 4 and a half months away and AMD has tons of 32nm and 28nm inventory. In fact, both 32nm and 28nm parts are still produced (at least some SKUs). You don't want to turn your multi-million dollar inventory into a pumpkin overnight by releasing the benchmarks of a better product.

We heard these arguments pre-Barcelona launch on XS... pre-Shanghai, pre-Istanbul, pre-Magny Cours and pre-BD.

We, the older folk, know how it all unfolded later. Once bitten, twice shy. We've been bitten 4-5 times consecutively now.

I hope it isn't poor, but it's not looking good at all. I am pretty sure any delay right now is to get the clocks higher.

Sorry to be doom and gloom but it's unfolding just like Barcelona.

Before anyone criticizes Barc, that was also a huge boost ahead in ST and MT. Only let down by clocks, power and too little, too late (Penryn launched soon after).

(Opinions are own)

DrMrLordX · Sep 14, 2016

Magny-Cours was pretty sweet, not sure how anyone was bitten by that.

krumme · Sep 14, 2016

People have crazy expectations, imo.
Some present it solely as AMD disappointing, but omit the unique situation back then: K7 as an outside buy, the P4, IBM SOI. It was a coincidence that it all happened at the same time, and it will not happen again.
We know approximately what 14nm GF is. 128-bit FPU. Small die. Dense, low freq. Low cost. Made for server, APU, consoles.
There are about 3 people expecting some BWE-like perf.
Anyone trying to frame this beforehand as a failure, because it's not 4GHz with a fat wide FPU capable of running CB or whatever crap, is just as far out as the ones expecting an intervention from above and a miracle to happen.
What a damn fan club of old men this is.
And it's even without music and free drinks!

The Stilt · Sep 14, 2016

LTC8K6 said:
Is Zen really going to interfere with Vishera sales at this point? It's not the same socket, so we are talking about people who are waiting to upgrade their whole system. It seems unlikely that a good Zen bench would affect current CPU sales.

I'd say pretty much every single 15h-based CPU / APU owner who uses their system for anything other than just Facebook and YouTube is dying to upgrade. After all, an FX-8370 at default configuration (4.3GHz MSCB) has single-threaded performance matching a < 2.1 - 3.1GHz Skylake, depending on the workload. And most 15h owners don't have the high-clocked variant either.
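
A rough reading of that comparison, assuming single-threaded performance scales linearly with clock speed (a simplification, since memory-bound workloads do not scale that way). The function name is purely illustrative; the clock figures are the ones quoted in the post:

```python
def per_clock_advantage(fx_clock_ghz, equivalent_skylake_ghz):
    """Implied Skylake per-clock advantage over the FX part.

    Assumption: performance is proportional to clock speed, so equal
    performance at different clocks implies this per-clock ratio.
    """
    return fx_clock_ghz / equivalent_skylake_ghz


FX_8370_BOOST_GHZ = 4.3                   # MSCB figure from the post
SKYLAKE_EQUIVALENT_GHZ = (2.1, 3.1)       # range quoted in the post

high = per_clock_advantage(FX_8370_BOOST_GHZ, SKYLAKE_EQUIVALENT_GHZ[0])
low = per_clock_advantage(FX_8370_BOOST_GHZ, SKYLAKE_EQUIVALENT_GHZ[1])

print(f"Implied Skylake per-clock advantage: {low:.2f}x to {high:.2f}x")
```

Read this way, the quoted range implies Skylake doing somewhere between roughly 1.4x and 2x the work per clock, depending on the workload.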

MajinCry · Sep 14, 2016

sm625 said:
No. It kind of reminds me of this benchmark:

Oh look at that, Bulldozer was competitive with sandy bridge! Until you load up a game, then you get something like this:

Which was obviously a total disaster. A simple 10 second javascript benchmark is all it will take to tell us whether this disaster is going to repeat.

Apples to oranges. The 8150 had 4 modules, pitted against the i7 with 4 cores 8 threads. Zen against Broadwell was 8 cores 16 threads vs 8 cores 16 threads.

krumme · Sep 14, 2016

MajinCry said:
Apples to oranges. The 8150 had 4 modules, pitted against the i7 with 4 cores 8 threads. Zen against Broadwell was 8 cores 16 threads vs 8 cores 16 threads.

Again. WinZip saw and used 8 cores. So if your primary usage for your CPU was zipping all day and dad paid the electricity bill, it was all a fine CPU with sponsorship from Mubadala.

The Blender test probably uses the lean FPU in Zen to the fullest and doesn't benefit from the big ones in BWE. It shows the Zen FPU in the absolute best possible light.

But here comes the difference vs the BD situation:

We don't need a fat FPU, and most loads don't exceed what the Blender test does. A BD in comparison tanks like crazy in the same test. It's plenty fast.
The test shows it's a balanced arch, imo.
For all we know it's not bloated like BD in size. Mathias estimates the 8c at 180mm2 and a core with L2 at 5 to 6 mm2. Meaning very low production cost.
And TDP looks very low.

But therefore it also spells out loud: this is no Skylake killer or even competitor for performance at the same core count. You can't have it all. (Even though BD proved you can have nothing at all lol)

MajinCry · Sep 14, 2016

Aye, winzip did scale to eight threads, but to contrast that ideal parallelized scenario to a video game benchmark, keeping in mind that it was four FPUs split across eight cores against a four core 8 thread cpu...Eeeeeh.

Whereas with Zen and Broadwell, same core count, same thread count, same fpu count, same integer unit count, etc. To take that and compare it to a different benchmark such as a game, would be fine. But to infer that it's going to be cock based off of a completely different architecture that was designed for multithreading (and not single thread perf), that performs poorly in single threaded scenarios...Bit crazy.

On the subject o' performance compared to Skylake, Skylake's only around 20% faster than Sandy Bridge, on average. Not exactly a big world of difference between Intel's core architectures, as far as performance is concerned, sans outliers like Ivy Bridge vs Haswell in emulators.

cdimauro · Sep 14, 2016

MajinCry said:
Apples to oranges. The 8150 had 4 modules, pitted against the i7 with 4 cores 8 threads. Zen against Broadwell was 8 cores 16 threads vs 8 cores 16 threads.

Exactly like 8150 vs i7, since 4 modules = 4 cores with 8 hardware threads.

MajinCry said:
Aye, winzip did scale to eight threads, but to contrast that ideal parallelized scenario to a video game benchmark, keeping in mind that it was four FPUs split across eight cores against a four core 8 thread cpu...Eeeeeh.

No, each FPU was shared by 2 hardware threads of the same module (core). Exactly like Intel's Hyperthreading.

cdimauro · Sep 14, 2016

krumme said:
The Blender test probably uses the lean FPU in Zen to the fullest and doesn't benefit from the big ones in BWE. It shows the Zen FPU in the absolute best possible light.

If the Blender test used only SSE or 128-bit AVX SIMD, and not 256-bit AVX, then it says a lot, and it's not a good sign for Zen: 4 128-bit FPU units showed only 2% more performance compared to just 2 128-bit FPU units on Broadwell...
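
The arithmetic behind that comparison can be sketched as follows. This takes the unit counts quoted in the post at face value (4 x 128-bit for Zen, 2 x 256-bit for Broadwell) and ignores everything else: clock speed, which unit can execute which operation, memory bandwidth and SMT. The function name is hypothetical, and whether the demo build of Blender used 128-bit or 256-bit code is unknown:

```python
def peak_simd_bits_per_cycle(unit_count, unit_width_bits, op_width_bits):
    """Upper bound on SIMD bits processed per cycle.

    A unit wider than the operation leaves its extra lanes idle, while
    an operation wider than the unit is assumed to be split across
    units, so each unit contributes min(unit width, op width) bits.
    """
    return unit_count * min(unit_width_bits, op_width_bits)


# Unit counts as quoted in the thread, not verified against vendor docs.
zen_128 = peak_simd_bits_per_cycle(4, 128, 128)
bdw_128 = peak_simd_bits_per_cycle(2, 256, 128)
bdw_256 = peak_simd_bits_per_cycle(2, 256, 256)

print(f"128-bit code: Zen {zen_128} vs Broadwell {bdw_128} bits/cycle")
print(f"256-bit code: Broadwell {bdw_256} bits/cycle")
```

On paper, 128-bit code would give the 4 x 128-bit layout twice the peak width of 2 x 256-bit units running half-empty, which is why a 2% lead reads as underwhelming if that was the code path. With 256-bit code the two layouts are level on this crude measure.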

krumme · Sep 14, 2016

cdimauro said:
If the Blender test used only SSE or 128-bit AVX SIMD, and not 256-bit AVX, then it says a lot, and it's not a good sign for Zen: 4 128-bit FPU units showed only 2% more performance compared to just 2 128-bit FPU units on Broadwell...

Ok. Can't the 256-bit units be used for 128-bit?

cdimauro · Sep 14, 2016

krumme said:
Ok. Can't the 256-bit units be used for 128-bit?

AFAIK no, on Intel's implementations. So, if you use an FPU port, it's better to use it with 256-bit sized registers, otherwise you're just losing (half) processing power.

krumme · Sep 14, 2016

cdimauro said:
AFAIK no, on Intel's implementations. So, if you use an FPU port, it's better to use it with 256-bit sized registers, otherwise you're just losing (half) processing power.

Ok, I just thought there was a performance penalty for doing it.
That the xmm registers overlay the ymm registers.
https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
But hey, it's way over my head, this stuff, anyway.

deasd · Sep 14, 2016

cdimauro said:
If the Blender test used only SSE or 128-bit AVX SIMD, and not 256-bit AVX, then it says a lot, and it's not a good sign for Zen: 4 128-bit FPU units showed only 2% more performance compared to just 2 128-bit FPU units on Broadwell...

You're over-estimating the advantage of a wider instruction set a bit: the longer the instruction window, the higher the latency you get, and the performance gain is not linear or even proportional when moving to wider vectors.
I don't know where to read the so-called FPU unit count, but natively wider ALUs could run a narrow algorithm if the application is well tuned. OTOH it doesn't even make sense to justify performance by ALU count. A CPU is not a GPU.

Abwx · Sep 14, 2016

cdimauro said:
If the Blender test used only SSE or 128-bit AVX SIMD, and not 256-bit AVX, then it says a lot, and it's not a good sign for Zen: 4 128-bit FPU units showed only 2% more performance compared to just 2 128-bit FPU units on Broadwell...

They likely used Blender as it is. Why would they need to recompile it, given that once Zen is released it will be tested with Blender as well, and AMD is not crazy enough to display a bench whose results are not reproducible? The argument that they could have deliberately rigged the software is a poor one, and has its roots in a certain public that absolutely doesn't want AMD to outperform Intel in any way, hence the tendency to discard not only the results but even Blender's relevance...

As said, I'm 100% sure that if they had used PovRay they would have displayed even better perfs relative to Broadwell, but they have their reasons not to do so. First, the same people who downplay Blender would have been even more critical of PovRay, since AMD currently performs better in this renderer. Second, AMD has no advantage in showing better results than what they did; they are not here to help their competitor position itself...

frozentundra123456 · Sep 14, 2016

MajinCry said:
Apples to oranges. The 8150 had 4 modules, pitted against the i7 with 4 cores 8 threads. Zen against Broadwell was 8 cores 16 threads vs 8 cores 16 threads.

That wasn't the point. The point was that it was a cherry-picked benchmark which made Bulldozer look better than Sandy Bridge, which overall it certainly was not. Quite frankly, we have very few benchmarks for Zen, and the ones we do have are contradictory. So I have no idea what the performance will be. The Blender benchmark makes me more optimistic than I originally was. At least it looks like it can clock to 3GHz, but we don't know how much higher it can go, or even if that chip was running in the anticipated 95W TDP.

Originally, I was expecting SB/IB-level IPC and a sub-3GHz base clock. Maybe they can beat that in a 95 watt envelope, but we shall see.

cdimauro · Sep 15, 2016

krumme said:
Ok, I just thought there was a performance penalty for doing it.
That the xmm registers overlay the ymm registers.
https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
But hey, it's way over my head, this stuff, anyway.

That's a very different thing. Previously maybe I wasn't clear, but I was talking about the possibility to use a 256-bit unit, "aggregating" 2 128-bit calculations, in order to maximize the use of the fatter FPU. That's not possible.

But obviously it's possible to do 128-bit (ONLY!) calculations with a 256-bit unit, and it's also possible to mix SSE and AVX (even 128-bit) calculations, but only in this case there's a penalty.

deasd said:
You're over-estimating the advantage of a wider instruction set a bit: the longer the instruction window, the higher the latency you get, and the performance gain is not linear or even proportional when moving to wider vectors.

Correct, but I have made no statement about it.

I've only made a comparison between the 4 128-bit FPUs approach vs 2 256-bit FPUs on the only test available.

I don't know where to read the so-called FPU unit count, but natively wider ALUs could run a narrow algorithm if the application is well tuned. OTOH it doesn't even make sense to justify performance by ALU count. A CPU is not a GPU.

The FPU unit counts come from the respective microarchitecture details.

Abwx said:
They likely used Blender as it is. Why would they need to recompile it, given that once Zen is released it will be tested with Blender as well, and AMD is not crazy enough to display a bench whose results are not reproducible? The argument that they could have deliberately rigged the software is a poor one, and has its roots in a certain public that absolutely doesn't want AMD to outperform Intel in any way, hence the tendency to discard not only the results but even Blender's relevance...

Well, if they used Blender, it's because it's the test where Zen performs well.

And I have NOT said that they have recompiled it. This is unknown.

I have only said that we don't know if this Blender version was using SSE, AVX 128-bit, or AVX 256 bit.

As said, I'm 100% sure that if they had used PovRay they would have displayed even better perfs relative to Broadwell, but they have their reasons not to do so. First, the same people who downplay Blender would have been even more critical of PovRay, since AMD currently performs better in this renderer. Second, AMD has no advantage in showing better results than what they did; they are not here to help their competitor position itself...

"Currently" means with the current AMD architectures, which are different.

Zen is another one. So maybe it performs better on Blender than on PovRay.

Anyway, Blender is an application that scales very well with the number of cores, and greatly makes use of SMT capabilities as well. It's also FPU-intensive. And last but not least, it's quite "linear" (read: the code is not full of branches and so on, like an emulator, compiler, etc.).

So, it's a perfect benchmark for testing Zen's capabilities, with its 4 ALUs + 2 AGUs/LS + 4 128-bit FPUs.

But it delivers only 2% better performance than a Broadwell, which has far fewer resources from this point of view.

PS & BTW: I'm doing QA for Intel's Application Debugger team (especially Xeon Phis).

bjt2 · Sep 15, 2016

krumme said:
Again. WinZip saw and used 8 cores. So if your primary usage for your CPU was zipping all day and dad paid the electricity bill, it was all a fine CPU with sponsorship from Mubadala.

The Blender test probably uses the lean FPU in Zen to the fullest and doesn't benefit from the big ones in BWE. It shows the Zen FPU in the absolute best possible light.

But here comes the difference vs the BD situation:

We don't need a fat FPU, and most loads don't exceed what the Blender test does. A BD in comparison tanks like crazy in the same test. It's plenty fast.
The test shows it's a balanced arch, imo.
For all we know it's not bloated like BD in size. Mathias estimates the 8c at 180mm2 and a core with L2 at 5 to 6 mm2. Meaning very low production cost.
And TDP looks very low.

But therefore it also spells out loud: this is no Skylake killer or even competitor for performance at the same core count. You can't have it all. (Even though BD proved you can have nothing at all lol)

Zen can't compete if it doesn't clock high... A 4c at 4GHz base is surely feasible. Actually I think even an 8c at 4GHz, but here you will probably think I am crazy...

krumme · Sep 15, 2016

I judge this CPU arch by its ability to earn money and have an impact on the market.

I don't really find it interesting how fast it is on the desktop, as that's not important in that context, so I can't muster the same enthusiasm for talking it either up or down in performance, and not only because I lack the technical insight. I'd surely find it crazy if they in any way designed the arch and process to go for desktop.

And from a business perspective I simply don't understand why it can't compete if it doesn't clock high enough. In my world it's actually exactly the opposite!
You need an arch that scales from 5W devices to servers, because that's where the profit is.
At the same time you need to keep process cost down, and I firmly believe that if you want a process that is optimal over a wide frequency range it will add a lot to the cost and/or you end up with a product that is less efficient at either end of the scale. AMD and GF just don't have that kind of resources.
It just doesn't add up from an economic perspective to go for a high-freq, high-perf design with a process from Samsung that is anything but high freq, and a market that demands efficiency and low cost, not highest perf.

But hey, AMD's history seems to show they are not in the business to earn money, so it's not always easy to know what they are up to, but Lisa seems to be getting the hang of it, so I guess there will be some sort of business sense in what they are trying to accomplish with Zen.

ShintaiDK · Sep 15, 2016

krumme said:
And from a business perspective I simply don't understand why it can't compete if it doesn't clock high enough. In my world it's actually exactly the opposite!

Less IPC and less clock than the competitor. That means you are going to sell your chips for basement prices. While you struggle in market share and revenue.

Sheep221 · Sep 15, 2016

krumme said:
And from a business perspective I simply don't understand why it can't compete if it doesn't clock high enough. In my world it's actually exactly the opposite!
You need an arch that scales from 5W devices to servers, because that's where the profit is.

Well Intel definitely does develop different architectures for their low power, high-end and server processors. I mean something like Haswell E or Broadwell E or HEDT Xeon is completely different architecturally from standard desktop Haswell or Broadwell and Xeon respectively, let alone their Atom line of low power CPUs.

Stop talking about Intel in an AMD thread
Markfw900
