AMD Zen - Key Dates and Information

bjt2 · Jan 10, 2017

HBM2 is 256GB/s / stack AFAIK
128GB/s was for HBM1.
Maybe in a mobile APU lowering the HBM clock can be advisable, for power or also memory binning, especially if we think of mid-range APUs. For high performance, high MSRP, I think that a full-speed HBM2 chip as L4 cache is advisable...
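
As a quick sanity check on the per-stack figures, peak bandwidth is just interface width times per-pin data rate, divided by 8 bits per byte. The sketch below assumes the standard 1024-bit interface per HBM stack, 1 GT/s per pin for HBM1 and 2 GT/s for full-speed HBM2; the 1.6 GT/s entry is only an example of a lowered clock, and the function name is made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (GT/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gtps / 8


# Assumed: 1024-bit interface per stack for both HBM generations.
hbm1_stack = peak_bandwidth_gbs(1024, 1.0)       # HBM1 at 1 GT/s per pin
hbm2_full_stack = peak_bandwidth_gbs(1024, 2.0)  # HBM2 at full speed, 2 GT/s per pin
hbm2_slow_stack = peak_bandwidth_gbs(1024, 1.6)  # example of a lowered HBM2 clock

print(f"HBM1 stack:           {hbm1_stack:.1f} GB/s")
print(f"HBM2 stack (2.0GT/s): {hbm2_full_stack:.1f} GB/s")
print(f"HBM2 stack (1.6GT/s): {hbm2_slow_stack:.1f} GB/s")
```

So a lower clock trades bandwidth linearly: dropping the per-pin rate by 20% drops the stack's peak bandwidth by 20%.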

ShintaiDK · Jan 10, 2017

AtenRa said:
I'm trying to understand what GloFo has to do with the fact that we can use a single-stack 128GB/s HBM2 and not the dual-stack 256GB/s (512GB/s) you were talking about?

Nobody mentioned GloFo, so no wonder you are confused.

ShintaiDK · Jan 10, 2017

Dresdenboy said:
Bandwidths like 128GB/s from the HBM stack should be enough for a DT/NB APU (not Snowy Owl). But it wouldn't be a renamed frame buffer (which seems to be wrong based on Vega info), but truly a cache (what can be done with Xeon Phi too) with cache lines, tags, prefetches, and a dedicated, transparent management.

Working as a cache not only reduces the actual memory requirements (they've shown ~50% utilization for 2 typical games), but also reduces first-hit latency in case of good prefetching.

Not to forget (old, but an indication) 50GB/s bus (not 16) and 128GB/s from a HBM stack:

Likely config for Snowy Owl:

It might even look like this:

Has there been any new info on these fabled APUs with HBM that isn't ~2 years old?

AtenRa · Jan 10, 2017

ShintaiDK said:
Nobody mentioned GloFo, so no wonder you are confused.

Ahahahaha, OK, my bad, I read Glo. as GloFo.

ShintaiDK · Jan 10, 2017

bjt2 said:
HBM2 is 256GB/s / stack AFAIK
128GB/s was for HBM1.
Maybe in a mobile APU lowering the HBM clock can be advisable, for power or also memory binning, especially if we think of mid-range APUs. For high performance, high MSRP, I think that a full-speed HBM2 chip as L4 cache is advisable...

Currently there aren't any official 256GB/sec HBM2 modules. Hynix just got out with a 4GB 1.6GHz module. Samsung hasn't reached past 1.6GHz either. Not that it's a problem really.

http://www.skhynix.com/static/filedata/fileDownload.do?seq=366

The best way may actually just be a 2-Hi 2GB 512-bit HBM2 module at 1.6GHz with 104GB/sec to save power consumption. (No, HBM isn't power efficient anymore.) Assuming the extra cost and all even makes sense. IGPs still haven't sold on graphics performance, and such a product could just erode Polaris 11/12 sales.

But again, TDP is another limit if you wish to unleash the GPU power.

bjt2 · Jan 10, 2017

ShintaiDK said:
Currently there aren't any official 256GB/sec HBM2 modules. Hynix just got out with a 4GB 1.6GHz module. Samsung hasn't reached past 1.6GHz either. Not that it's a problem really.

http://www.skhynix.com/static/filedata/fileDownload.do?seq=366

The best way may actually just be a 2-Hi 2GB 512-bit HBM2 module at 1.6GHz with 104GB/sec to save power consumption. (No, HBM isn't power efficient anymore.) Assuming the extra cost and all even makes sense. IGPs still haven't sold on graphics performance, and such a product could just erode Polaris 11/12 sales.

But again, TDP is another limit if you wish to unleash the GPU power.

I read that Vega has 2x HBM2 stacks and 512GB/s of bandwidth, so at least in Vega there are 2GT/s modules. Maybe not so power efficient or cheap, but they exist. For one top APU SKU, 1 HBM2 module could be worth the effort. And on lower SKUs, slower and/or smaller modules can be used...
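
Working backwards from that Vega figure, here is a minimal sketch of how the 2GT/s per-pin rate falls out of the aggregate bandwidth. It assumes 1024 bits per HBM2 stack; the function name is hypothetical and only for illustration.

```python
def per_pin_rate_gtps(total_gbs: float, stacks: int, bits_per_stack: int = 1024) -> float:
    """Per-pin data rate (GT/s) needed to reach total_gbs across all stacks."""
    total_bus_bits = stacks * bits_per_stack
    # GB/s -> Gbit/s, then spread evenly over every data pin.
    return total_gbs * 8 / total_bus_bits


# Rumoured Vega config from the post: 2 stacks, 512GB/s aggregate.
vega_rate = per_pin_rate_gtps(512, stacks=2)
print(f"Required per-pin rate: {vega_rate:.1f} GT/s")

# The same aggregate from a single stack would need double the per-pin rate.
single_stack_rate = per_pin_rate_gtps(512, stacks=1)
print(f"Single stack at 512GB/s would need: {single_stack_rate:.1f} GT/s")
```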

ShintaiDK · Jan 10, 2017

bjt2 said:
I read that Vega has 2x HBM2 stacks and 512GB/s of bandwidth, so at least in Vega there are 2GT/s modules. Maybe not so power efficient or cheap, but they exist. For one top APU SKU, 1 HBM2 module could be worth the effort. And on lower SKUs, slower and/or smaller modules can be used...

Well, we have to see. The Hynix update is quite recent and unexpected.

It would be easier and cheaper just to use LPDDR4.
https://news.samsung.com/global/samsung-rolls-out-industrys-first-8gb-lpddr4-dram-package
No HBM power consumption penalty, and you get 68.2GB/sec.

There is also the question of whether you could actually fit an HBM2 chip + interposer + APU in an AM4 package. Including height. HBM2, for example, is 0.34mm taller than HBM1. Then add interposer etc. For non-AM4 chips it's obviously not an issue.

Dresdenboy · Jan 10, 2017

CatMerc said:
Taking this into account, how likely do you think it is for Raven Ridge to include some form of L4 cache? Perhaps not HBM, but something more akin to Crystalwell.

I don't think that they'll do something like that. It didn't seem to be a cheap solution, and DDR4 + (faster than before) L3 should be fast enough for just 1 CCX. The iGPU has different needs. Using HBM as a cache has advantages not only for efficient use of virtual graphics memory + Onion/DDR4 bandwidth, but also with HSA, as the sharing doesn't need to happen via the common DDR4, but the GPU can also buffer data there as long as it's using it.

ShintaiDK said:
Has there been any new info on these fabled APUs with HBM that isn't ~2 years old?

Absence of evidence... Unfortunately not. Worst case could still be RR without HBM. AMD got better at suppressing such strategically important leaks.

Glo. · Jan 10, 2017

ShintaiDK said:
There is no real change besides just renaming it to cache. Vega still sits with 16GB/sec to main memory.

It's no different from how GP100, Xeon Phi etc. work. However, unlike the 2 others, Xeon Phi got 102GB/sec to main memory.

Reason is very simple. Zen CPU has a 2048-bit memory controller for HBM2.

One design to rule them all. That's why you did not do proper research about what you are trying to argue.

There will be two versions of Zen APUs. Those that do not have HBM2, and those that have. And those that will have HBM2 will have a 2048-bit memory controller. For both CPU and GPU. Simpler it cannot be put.

Of course that means that bandwidth will be determined by the clock speed of the HBM2 chips. Even 256 GB/s will be sufficient.

Atari2600 · Jan 10, 2017

KTE said:
For CNC/CATIA/CAD use with an APU, minimum you need 6 cores. 4 cores will struggle in professional environments (I worked as a fighter aircraft engineer for some years).

Not in my experience.

Those areas named are currently pretty poorly threaded and you're much better off with 4 racers than 6 downclocked.

If you are using CFD, different story: you need CPU + memory bandwidth. If you're using FEA, then it's I/O that's the killer; if you've loads of money, get a large-memory machine and make a virtual HD on it for the scratch area.

But CAD/CAM is very poorly threaded.

Atari2600 · Jan 10, 2017

ShintaiDK said:
There is no real change besides just renaming it to cache. Vega still sits with 16GB/sec to main memory.

Ugh. And if you have enough "cache" to keep your entire frame buffer local?

Some people are so obstinate it's tough not to ridicule them. :-/

bjt2 · Jan 10, 2017

Glo. said:
Reason is very simple. Zen CPU has a 2048-bit memory controller for HBM2.

One design to rule them all. That's why you did not do proper research about what you are trying to argue.

There will be two versions of Zen APUs. Those that do not have HBM2, and those that have. And those that will have HBM2 will have a 2048-bit memory controller. For both CPU and GPU. Simpler it cannot be put.

Of course that means that bandwidth will be determined by the clock speed of the HBM2 chips. Even 256 GB/s will be sufficient.

If the DDR4 controller is still on board, maybe in single-channel mode, I think the HBM2 could be configured as RAM if no DDR4 module is installed, or as L4 cache if one is installed, or maybe partitioned into RAM and L4, or all RAM even if the DDR4 is installed... All is possible. But if the DDR4 is not much bigger than the HBM2 module, I think that combining the capacities is best: fast RAM and slow RAM... So Amiga times...

Glo. · Jan 10, 2017

bjt2 said:
If the DDR4 controller is still on board, maybe in single-channel mode, I think the HBM2 could be configured as RAM if no DDR4 module is installed, or as L4 cache if one is installed, or maybe partitioned into RAM and L4, or all RAM even if the DDR4 is installed... All is possible. But if the DDR4 is not much bigger than the HBM2 module, I think that combining the capacities is best: fast RAM and slow RAM... So Amiga times...

What people are forgetting in terms of Zen APUs (Raven Ridge) is the Infinity Fabric. It is designed to connect the CPU cores and GPU cores in a way we have not seen before, and for that reason HBM2 has a 2048-bit memory controller, to reduce both latency and power consumption. It's all about delivering the right data at the right time, with minimum effort from a power perspective.

At least that's how I understand the concept of Infinity Fabric.

bjt2 · Jan 10, 2017

Glo. said:
What people are forgetting in terms of Zen APUs (Raven Ridge) is the Infinity Fabric. It is designed to connect the CPU cores and GPU cores in a way we have not seen before, and for that reason HBM2 has a 2048-bit memory controller, to reduce both latency and power consumption. It's all about delivering the right data at the right time, with minimum effort from a power perspective.

At least that's how I understand the concept of Infinity Fabric.

I think it is Infinity Fabric that connects the CCX, the NB, MC and SB. And in the APU it will also connect the CCX and GPU (along with the cache/HBM2 controller)... I think I once read something on this topic: Infinity Fabric replaces the Onion, Garlic and HTT buses: a high-bandwidth, low-latency (like Onion and Garlic) and coherent (like HTT) bus...

KTE · Jan 10, 2017

Atari2600 said:
Not in my experience.

Those areas named are currently pretty poorly threaded and you're much better off with 4 racers than 6 downclocked.

If you are using CFD, different story: you need CPU + memory bandwidth. If you're using FEA, then it's I/O that's the killer; if you've loads of money, get a large-memory machine and make a virtual HD on it for the scratch area.

But CAD/CAM is very poorly threaded.

6 fast not 6 downclocked.

I agree to that in part except that you always need >4 fast cores minimum. The remaining are not for the extra load but to make sure you can freely do extra things with multiple monitors. That's currently nowhere near APU domain.

And engineering in my experience (Lockheed, BAE, Rolls Royce, NG) uses bespoke software internally that does scale above 8C, not run-of-the-mill.

Sent from HTC 10
(Opinions are own)

Atari2600 · Jan 10, 2017

KTE said:
6 fast not 6 downclocked.

I agree to that in part except that you always need >4 fast cores minimum.

Again, not in my experience. >4 cores is only necessary for CAE, even then, the vast majority won't need it. Those in CAD or CAM simply won't need it at all. CATIA is woefully light on threads!

KTE said:
The remaining are not for the extra load but to make sure you can freely do extra things with multiple monitors. That's currently nowhere near APU domain.

A Snowy Owl APU would easily be capable of operating CATIA and whatever references you'd be using on the 2nd window. Indeed, even the last gen never mind the current gen would do it.

Extra monitors do not impose additional computational load, as you usually aren't executing multiple simulations in multiple programs simultaneously - if you are - you're doing something wrong!

KTE said:
And engineering in my experience (Lockheed, BAE, Rolls Royce, NG) uses bespoke software internally that does scale above 8C, not run-of-the-mill.

Of course. I've written some of it.

But, from anything I've done or seen, bespoke software is something that is either (1) run on big workstations, HPC nodes or across the network on distributed overnight runs, or (2) quick-turnaround and lightly threaded - this is the vast majority of work. Also, they are CAE, not CAD or CAM.

The magical number for the big runs is <12 hrs... i.e. something that can be started at say 7pm and then picked up at 7am the next day. But nowadays, the memory requirements for such a run would tend to push you onto a big workstation.

KTE said:
Sent from HTC 10
(Opinions are own)

What's with the "opinions are own"? You are anonymous on the forum, so no point having a disclaimer.

coercitiv · Jan 10, 2017

Atari2600 said:
What's with the "opinions are own"? You are anonymous on the forum, so no point having a disclaimer.

HTC 10 can and may influence opinions.

Rifter · Jan 10, 2017

This speculation is getting out of control, does anyone know the end date for the NDA????

krumme · Jan 10, 2017

Rifter said:
This speculation is getting out of control

It's a huge rock approaching earth. This is the calm period before the impact.
When it hits - that's when things get out of control.

Rifter · Jan 10, 2017

krumme said:
It's a huge rock approaching earth. This is the calm period before the impact.
When it hits - that's when things get out of control.

Well if we listen to everyone in this thread that is true! but with 2 opposite results lol.

krumme · Jan 10, 2017

Rifter said:
Well if we listen to everyone in this thread that is true! but with 2 opposite results lol.

Canard says it's a final sample now, so plus 8 weeks and we are at mid-March for reviews?

ShintaiDK · Jan 11, 2017

Glo. said:
Reason is very simple. Zen CPU has a 2048-bit memory controller for HBM2.

One design to rule them all. That's why you did not do proper research about what you are trying to argue.

There will be two versions of Zen APUs. Those that do not have HBM2, and those that have. And those that will have HBM2 will have a 2048-bit memory controller. For both CPU and GPU. Simpler it cannot be put.

Of course that means that bandwidth will be determined by the clock speed of the HBM2 chips. Even 256 GB/s will be sufficient.

Citation please.

ShintaiDK · Jan 11, 2017

Dresdenboy said:
Absence of evidence... Unfortunately not. Worst case could still be RR without HBM. AMD got better at suppressing such strategically important leaks.

Just like how they suppressed the K12 news?

itsmydamnation · Jan 11, 2017

ShintaiDK said:
Just like how they suppressed the K12 news?

https://youtu.be/Ln9WKPEHm4w?t=1h28m59s

"our arm road map is the same that it has been"

"so we share techniques back and forth to improve performance"

AMD does have a semi custom ARM job as well.

I guess he is lying just like Lisa, hey......

Atari2600 · Jan 11, 2017

ShintaiDK said:
Citation please.

http://techfrag.com/2016/02/12/amd-zen-high-end-exascale-cpu-and-apu-specs-leaked/

https://vrworld.com/2016/02/12/cern-confirms-amd-zen-high-end-specifications/

But then, it doesn't exactly tie with the below:

But this should be borne in mind:
