New Zen microarchitecture details


krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Except that 99% of Intel CPUs are a "fusion" of CPU and GPU and have been since 2011?
Yep, and we have had 100% HSA-ready hardware since Kaveri in January 2014.

The software that gives a better experience is not there. I think the basic problem is that the need is not there to pay for the change.

Sticking a GPU and a CPU on one die is, imo, only a low-end solution in the future, for ultrabooks and low-power devices.

Infinity Fabric is the future. It not only connects the hardware and separates data from control, but in a broader sense it also connects production and ecosystem across e.g. GF and TSMC. It's an extremely strong business model, and a strong new business model almost always wins.

It's not remotely similar to HyperTransport. It's a new way to drive the business.
 

DrMrLordX

Lifer
Apr 27, 2000
21,807
11,161
136
Yep, and we have had 100% HSA-ready hardware since Kaveri in January 2014.

Not exactly. Some HSA/OpenCL2 features were not and are still not supported by Kaveri or Godavari in hardware. In general, you need either GCN 1.2 (or greater) or Intel Gen9 to get those features.

Intel has sold enough Gen9+ iGPUs that anyone serious about iGPU power via OpenCL2 should be able to make their software work, though I am told that Intel's OpenCL drivers leave something to be desired. There aren't enough Carrizo or Bristol Ridge APUs out there for their market presence to really matter.
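
For reference, this is roughly how code can check what a given chip actually supports; a minimal C sketch assuming an OpenCL 2.0 SDK (on a pre-2.0 driver the query just fails and the mask stays zero):

Code:
/* Query which OpenCL 2.0 SVM features a GPU actually supports in
 * hardware, e.g. to tell Kaveri-class parts from GCN 1.2+ / Gen9. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id plat;
    cl_device_id dev;
    cl_device_svm_capabilities caps = 0;

    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_SVM_CAPABILITIES,
                    sizeof(caps), &caps, NULL);

    printf("coarse-grain buffer SVM: %s\n",
           (caps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER) ? "yes" : "no");
    printf("fine-grain buffer SVM:   %s\n",
           (caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER) ? "yes" : "no");
    printf("fine-grain system SVM:   %s\n",
           (caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM) ? "yes" : "no");
    printf("SVM atomics:             %s\n",
           (caps & CL_DEVICE_SVM_ATOMICS) ? "yes" : "no");
    return 0;
}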
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Not exactly. Some HSA/OpenCL2 features were not and are still not supported by Kaveri or Godavari in hardware. In general, you need either GCN 1.2 (or greater) or Intel Gen9 to get those features.

Intel has sold enough Gen9+ iGPUs that anyone serious about iGPU power via OpenCL2 should be able to make their software work, though I am told that Intel's OpenCL drivers leave something to be desired. There aren't enough Carrizo or Bristol Ridge APUs out there for their market presence to really matter.
But is it solely because of the numbers sold?
And not because the software, even with HSA/OpenCL features, doesn't bring a change in experience?
Imo it's tech without a solid purpose. The proposition has proven to be not quite strong enough.

Secondly, we have to be careful about always thinking that software should drive the hardware.

Infinity Fabric is a flexible ecology; imo it's by far the most interesting tech and thinking we see in the new portfolio. Talk about synergy: it makes the individual parts, Zen, Vega, HBM2, whatever, worth far more than they are in themselves.
I would judge it alongside the importance of the P6 and K7.
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
If you want to know why HSA hasn't taken off as a viable solution: how fast is the hardware? How much money do you want to invest in seriously underperforming hardware?

That is why Raven Ridge is so important. That is why we will see a 4C/8T + 16 CU design with HBM2: because it will not be used by low-end customers. This sort of hardware will cost between 250 and 350 USD.

Even professionals will find uses for it.
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
Psst, Raven Ridge is 11 CU. And no HBM2.
Juanrga predicted this?

I will play a game with you guys.
The server APU has a 16C/32T setup with 2 stacks of HBM2 (16 GB total) and a 64 CU GPU.
4C/8T, 16 CU, 4 GB of HBM2 is exactly 1/4 of the big brother.

AMD decided to go with a 2048-bit memory controller for a very simple reason: it provides enough bandwidth for the GPUs, and it saves money because only two stacks are required. HBM2 will also have a lower production cost, because it will be used not only by AMD but also by Nvidia, and will be seen in a much wider product range.

An 11 CU APU and no HBM2? Sure, but only as a cut-down version. That's where you are correct.
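
A quick back-of-the-envelope check of those numbers; the per-pin data rate below is an assumption, not a confirmed spec for this part:

Code:
/* Back-of-the-envelope check of the numbers above. The 1024-bit
 * interface per stack is HBM2 spec; the per-pin data rate is an
 * assumption, not a confirmed figure for this part. */
#include <stdio.h>

int main(void)
{
    const int bits_per_stack = 1024;   /* HBM2: 1024-bit bus per stack */
    const int stacks = 2;              /* two stacks = 2048-bit controller */
    const double gbps_per_pin = 2.0;   /* assumed HBM2 data rate */

    double gbs = stacks * bits_per_stack * gbps_per_pin / 8.0;
    printf("aggregate bandwidth: %.0f GB/s\n", gbs);   /* 512 GB/s */

    /* The claimed 1/4 scaling from server APU to desktop APU: */
    printf("cores 16 -> %d, CUs 64 -> %d, HBM2 16 GB -> %d GB\n",
           16 / 4, 64 / 4, 16 / 4);
    return 0;
}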
 

jpiniero

Lifer
Oct 1, 2010
14,840
5,456
136
Snowy Owl is probably not coming out until the end of next year, though, if the rumors are right that Vega 20 isn't releasing until then.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Juanrga predicted this?

I will play a game with you guys.
The server APU has a 16C/32T setup with 2 stacks of HBM2 (16 GB total) and a 64 CU GPU.
4C/8T, 16 CU, 4 GB of HBM2 is exactly 1/4 of the big brother.

AMD decided to go with a 2048-bit memory controller for a very simple reason: it provides enough bandwidth for the GPUs, and it saves money because only two stacks are required. HBM2 will also have a lower production cost, because it will be used not only by AMD but also by Nvidia, and will be seen in a much wider product range.

An 11 CU APU and no HBM2? Sure, but only as a cut-down version. That's where you are correct.
That might be so. That's typical AMD thinking, and it's the old rumor.
Whatever happens, with such an APU you face something like 1.2B in upfront cost, where the alternative was simply using an interposer. The drawbacks are obviously latency, and giving up a lot of new possibilities for certain tasks (but which, exactly?).
The advantages:
Saves 1.2B in upfront cost
Minimizes risk
Higher yield on the CPU and GPU parts used
Can use perhaps 2 different GPU sizes (V11 and V10)
Can use perhaps 3 different CPU sizes
Giving a total of 6 different combinations, with plenty of binning opportunity within each = differentiation without high fixed costs, catering to many smaller segments.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
What AMD has is a company that generates approx. 4B in yearly revenue with zero profit.
They have a Carrizo APU on cheap 28nm perfectly fit for the low end; it's the low-margin Atom market.
The priority must be server, laptop, and desktop, where most of the profit is. What we know they have is Zen with 2x CCX at 8C, and probably two Infinity-"compatible" GPUs coming. They also have HBM and Infinity Fabric at hand, and experience working with HBM.
We also know they go tock-tock-tock with Zen.
For a company so cash-constrained - also for those new, improved tock nodes - simply taking a single-CCX Zen and bringing it to market using this new fabric is, for me, what has the best cost-benefit at the lowest risk. They will not cater to the midrange, but so what.
The alternative to spending on a 16C server APU is spending the same money on, e.g., console development or the successor to Zen. 300 people were working on the Zen CPU core design; that's approx. 500M a year including tools and other costs. You get a long way designing a new arch for the cost of bringing a huge new server die to market.
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
One more thing about what is needed for HSA to explode: both sides of the equation have to be equally powerful. Today we have either a strong GPU + weak CPU combination on one side, or a strong CPU + weak GPU combination on the other.

Raven Ridge can change this. AMD has a lot of open-source software which should help sell the idea to developers.
 

DrMrLordX

Lifer
Apr 27, 2000
21,807
11,161
136
But is it solely because of the numbers sold?
And not because the software, even with HSA/OpenCL features, doesn't bring a change in experience?

From my limited experiments with HSA, I can tell you that SVM-enabled GPGPU software was compellingly interesting in scenarios where the iGPU might not be in use elsewhere. It was technology aimed at a user with an APU + dGPU, more or less. There is no particular reason why a developer could not leverage that kind of power today using OpenCL 2.0, and there are plenty of programs that could benefit from such technology (not just the spreadsheet dingus in LibreOffice).

The "change in experience" is basically this: your CPU gets faster in certain tasks. There are a lot of Intel CPUs (Skylake/Kaby Lake) with underutilized or unutilized iGPUs out there that could be accelerating everyday tasks using that dark silicon. All it would really take is a few developers passionate enough about OpenCL 2.0 to make it happen.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
From my limited experiments with HSA, I can tell you that SVM-enabled GPGPU software was compellingly interesting in scenarios where the iGPU might not be in use elsewhere. It was technology aimed at a user with an APU + dGPU, more or less. There is no particular reason why a developer could not leverage that kind of power today using OpenCL 2.0, and there are plenty of programs that could benefit from such technology (not just the spreadsheet dingus in LibreOffice).

The "change in experience" is basically this: your CPU gets faster in certain tasks. There are a lot of Intel CPUs (Skylake/Kaby Lake) with underutilized or unutilized iGPUs out there that could be accelerating everyday tasks using that dark silicon. All it would really take is a few developers passionate enough about OpenCL 2.0 to make it happen.

I understand that. I just look and see "spreadsheet dingus in LibreOffice" and simply can't imagine the software where this utilization of the GPU gives a user benefit. I don't question that it can make a lot of software faster; I just can't imagine the software where the user will notice.
It's that simple. What software are we talking about? I don't know.
 
Reactions: Pilum and cytg111

cytg111

Lifer
Mar 17, 2008
23,547
13,115
136
I understand that. I just look and see "spreadsheet dingus in LibreOffice" and simply can't imagine the software where this utilization of the GPU gives a user benefit. I don't question that it can make a lot of software faster; I just can't imagine the software where the user will notice.
It's that simple. What software are we talking about? I don't know.

Exactly what I was gonna write ... well, not exactly, but something along those lines.
What apps?
I could maybe see it in games, à la PhysX; that is, an APU with a discrete card on the side. Other than that, what? (That wouldn't already benefit from completely offloading to a GPGPU.)
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
Exactly what I was gonna write ... well, not exactly, but something along those lines.
What apps?
I could maybe see it in games, à la PhysX; that is, an APU with a discrete card on the side. Other than that, what? (That wouldn't already benefit from completely offloading to a GPGPU.)

There is a lot of software out there that uses heavily parallel loads (rendering), and software that could but currently doesn't (like SPICE, for example)... The programming techniques being taught today need to change, because blindly assuming that CPUs are going to get faster and faster in frequency and IPC isn't helping things.

If there existed a culture where parallel programming was more emphasized, in combination with the required hardware and techniques, the software in question would already be prevalent.

When the physical wall of node shrinking is finally hit, chip designers will be forced to embrace parallel architectures, and programmers will have to follow suit.
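
A minimal sketch of the gap being described, assuming nothing beyond plain C and OpenMP (the loop and numbers are illustrative):

Code:
/* The same loop written serially and then with OpenMP; one pragma is
 * all it takes to actually use the extra cores. Build: gcc -fopenmp */
#include <stdio.h>
#include <omp.h>

#define N 10000000

static float a[N], b[N], out[N];

int main(void)
{
    for (int i = 0; i < N; i++) {
        a[i] = i * 0.5f;
        b[i] = i * 0.25f;
    }

    /* Serial: the way most day-to-day code is still written. */
    for (int i = 0; i < N; i++)
        out[i] = a[i] * b[i] + a[i];

    /* Parallel: the loop body is independent per i, so it spreads
     * cleanly across every core the machine has. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        out[i] = a[i] * b[i] + a[i];

    printf("out[42] = %f (up to %d threads)\n", out[42], omp_get_max_threads());
    return 0;
}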
 
Reactions: cytg111

cytg111

Lifer
Mar 17, 2008
23,547
13,115
136
There is a lot of software out there that uses heavily parallel loads (rendering), and software that could but currently doesn't (like SPICE, for example)... The programming techniques being taught today need to change, because blindly assuming that CPUs are going to get faster and faster in frequency and IPC isn't helping things.

If there existed a culture where parallel programming was more emphasized, in combination with the required hardware and techniques, the software in question would already be prevalent.

When the physical wall of node shrinking is finally hit, chip designers will be forced to embrace parallel architectures, and programmers will have to follow suit.

Yea, I get that, but overall, if I am rendering stuff, wouldn't I go the GPGPU route and do OpenCL? Rendering seems like a very specific task. What I am getting at: where are the day-to-day apps that would benefit from HSA? I can only come up with games! (And that might be enough given the console deals, I dunno.)
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
where are the day-to-day apps that would benefit from HSA? I can only come up with games! (And that might be enough given the console deals, I dunno.)

The problem is, the day-to-day apps were written to take advantage of high-performance serial threaded hardware, and they were written that way because most programmers were taught to write single-threaded code. They tend to ignore other cores or GPUs because they either don't know how to, or were never provided an adequate means to (hardware- and API-wise).

We have made many strides on the hardware and API side, but things won't change until the programming paradigms change, and they have to eventually, because we are approaching a wall.
 

cytg111

Lifer
Mar 17, 2008
23,547
13,115
136
The problem is, the day-to-day apps were written to take advantage of high-performance serial threaded hardware, and they were written that way because most programmers were taught to write single-threaded code. They tend to ignore other cores or GPUs because they either don't know how to, or were never provided an adequate means to (hardware- and API-wise).

We have made many strides on the hardware and API side, but things won't change until the programming paradigms change, and they have to eventually, because we are approaching a wall.

Yea, but isn't it the same argument as with next-gen SSDs? If you already have one (not 1st gen), you are probably not going to notice 2 ms -> 1 ms latency and 40,000 IOPS -> 100,000 IOPS in apps outside of benchmarks.
Where are the everyday apps that are so compute-heavy that we are 'waiting to finish'... that would benefit from HSA?
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
I find this situation very similar to when high-level, object-oriented programming languages popped onto the scene: programmers were designing complex software that could easily take advantage of high-level, object-oriented languages, but they were still writing it in low-level languages like C.
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
Yea, but isn't it the same argument as with next-gen SSDs? If you already have one (not 1st gen), you are probably not going to notice 2 ms -> 1 ms latency and 40,000 IOPS -> 100,000 IOPS in apps outside of benchmarks.

Well, SSDs are here and people have already made the switch; things like M.2 are only improvements on a pre-existing shift away from slower magnetic storage.

Where are the everyday apps that are so compute-heavy that we are 'waiting to finish'... that would benefit from HSA?

They cannot benefit from HSA because they were written wrong, using old serial programming techniques taught years ago by professors who wouldn't dream of everyday folks having access to such high-performance parallel beasts as GPUs or 8-core/16-thread CPUs.

The problem I'm trying to explain here is that the teaching side hasn't caught up to the hardware advancements or the APIs available now... So it doesn't matter if you have an 8C/16T CPU with a beast of a GPU capable of GPGPU and HSA if the damn programmers are writing code that runs well on, and is targeted towards, high-frequency, high-IPC CPUs instead of multi-core CPUs and GPGPU-capable GPUs.
 

cytg111

Lifer
Mar 17, 2008
23,547
13,115
136
I find this situation very similar to when high-level, object-oriented programming languages popped onto the scene: programmers were designing complex software that could easily take advantage of high-level, object-oriented languages, but they were still writing it in low-level languages like C.

I wouldn't be surprised if this (finding parallelism) is an area where AI is going to make a significant difference in the near future. Expect your next version of GCC/Clang to ship with a deep-learning neural net - OR it might finally be the year where JIT overtakes native binaries (jokes, jokes, easy now).
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
I wouldn't be surprised if this (finding parallelism) is an area where AI is going to make a significant difference in the near future. Expect your next version of GCC/Clang to ship with a deep-learning neural net - OR it might finally be the year where JIT overtakes native binaries (jokes, jokes, easy now).

Well, the compilers getting smarter is a start, but eventually the languages themselves need to change (C++ is getting there), and the people teaching them need to start taking advantage of these changes and approach parallel programming techniques like they're not the third rail of programming.
 

cytg111

Lifer
Mar 17, 2008
23,547
13,115
136

This is a dead horse, but I'm gonna dig it up anyway... Amdahl's law.
Very many things are inherently serial in nature. No way around it.

Example: why would this benefit the user experience?
Do X at 2 ms: result at 10 watts.
Do X with HSA at 1 ms: result at 10 watts.

You are doing twice the work per unit of time, but still at the same watts. The user ain't gonna notice going from 2 ms to 1 ms.
So where are the apps that run for seconds that would benefit from HSA, and thus the average user? I don't see them.

I don't see how endless parallelism is gonna save us here; my bet is on frequency scaling.
Science is going to have to come up with more Hz for our chips.
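
For what it's worth, the math behind that point: a minimal C sketch of Amdahl's law, with made-up example numbers (p and s are illustrative, not measurements):

Code:
/* Amdahl's law: overall = 1 / ((1 - p) + p / s), where p is the
 * parallel fraction and s the speedup on the parallel part. */
#include <stdio.h>

static double amdahl(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void)
{
    /* Even a huge GPU speedup can't rescue a half-serial task. */
    printf("p = 0.50, s = 100x -> %.2fx overall\n", amdahl(0.50, 100.0));
    printf("p = 0.95, s = 100x -> %.2fx overall\n", amdahl(0.95, 100.0));
    return 0;
}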
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
This is a dead horse, but I'm gonna dig it up anyway... Amdahl's law.
Very many things are inherently serial in nature. No way around it.

Obviously there are jobs that aren't capable of being parallelized, but that wasn't my point.

Example: why would this benefit the user experience?
Do X at 2 ms: result at 10 watts.
Do X with HSA at 1 ms: result at 10 watts.

You are doing twice the work per unit of time, but still at the same watts. The user ain't gonna notice going from 2 ms to 1 ms.
So where are the apps that run for seconds that would benefit from HSA, and thus the average user? I don't see them.

Well, they would benefit if the job has to be done multiple times over a long period of time (all those ms add up), because a "job" or a "task" isn't always one irreducible task; some are mixtures of inherently serial and inherently parallel subtasks, and the latter could take advantage of such hardware, especially if the job is being done in a tight loop.

There are lots of old methods of doing things that could be rediscovered in a parallel form, but nobody bothers unless, like you said, it's a very specific task and there is no other way to do it.
 