AMD Zen - Key Dates and Information


bjt2

Senior member
Sep 11, 2016
784
180
86
HBM2 is 256GB/s / stack AFAIK
128GB/s was for HBM1.
Maybe in a mobile APU lowering the HBM clock would be advisable, for power or also for memory binning, especially if we think of mid-range APUs. For high-performance, high-MSRP parts I think a full-speed HBM2 chip as L4 cache is advisable...
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
I'm trying to understand what GloFo has to do with the fact that we can use a single-stack 128GB/s HBM2 and not the dual-stack 256GB/s (512GB/s) you were talking about?

Nobody mentioned GloFo, so no wonder you are confused.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Bandwidths like 128GB/s from the HBM stack should be enough for a DT/NB APU (not Snowy Owl). But it wouldn't be a renamed frame buffer (which seems to be wrong based on Vega info), but truly a cache (which can be done with Xeon Phi too) with cache lines, tags, prefetches, and dedicated, transparent management.

Working as a cache not only reduces the actual memory requirements (they've shown ~50% utilization for 2 typical games), but also reduces first-hit latency in case of good prefetching.

Not to forget (old, but an indication) a 50GB/s bus (not 16) and 128GB/s from an HBM stack:


Likely config for Snowy Owl:


It might even look like this:

Has there been any new info on these fabled APUs with HBM that isn't ~2 years old?
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
HBM2 is 256GB/s / stack AFAIK
128GB/s was for HBM1.
Maybe in a mobile APU lowering the HBM clock would be advisable, for power or also for memory binning, especially if we think of mid-range APUs. For high-performance, high-MSRP parts I think a full-speed HBM2 chip as L4 cache is advisable...

Currently there aren't any official 256GB/sec HBM2 modules. Hynix just came out with a 4GB 1.6GHz module. Samsung hasn't gotten past 1.6GHz either. Not that it's a problem really.

http://www.skhynix.com/static/filedata/fileDownload.do?seq=366

The best way may actually just be a 2-Hi 2GB 512-bit HBM2 module at 1.6GHz with about 102GB/sec, to save power. (No, HBM isn't power efficient anymore.) Assuming the extra cost and all even makes sense. IGPs still haven't sold on graphics performance, and such a product could just erode Polaris 11/12 sales.

But again, TDP is another limit if you wish to unleash the GPU power.
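
For reference, the per-stack figures traded in this thread all follow from the same arithmetic: peak bandwidth is the interface width times the per-pin transfer rate, divided by 8 bits per byte. A minimal Python sketch, assuming the standard 1024-bit interface per HBM stack and reading "1.6GHz" as 1.6 GT/s per pin (the function name `peak_bandwidth_gbs` is purely illustrative):

```python
def peak_bandwidth_gbs(bus_width_bits: int, rate_gtps: float, stacks: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits: interface width of one stack/module, in bits
    rate_gtps:      per-pin transfer rate in GT/s (assumed equal to Gbit/s per pin)
    stacks:         number of identical stacks in parallel
    """
    return bus_width_bits * rate_gtps / 8 * stacks


# Configurations discussed in the thread (1024-bit per stack is an assumption
# taken from the HBM standard, not from any post here).
configs = [
    ("HBM1, 1 stack, 1024-bit @ 1.0 GT/s", peak_bandwidth_gbs(1024, 1.0)),
    ("HBM2, 1 stack, 1024-bit @ 2.0 GT/s", peak_bandwidth_gbs(1024, 2.0)),
    ("HBM2, 1 stack, 1024-bit @ 1.6 GT/s", peak_bandwidth_gbs(1024, 1.6)),
    ("HBM2, half-width 512-bit @ 1.6 GT/s", peak_bandwidth_gbs(512, 1.6)),
    ("HBM2, 2 stacks, 1024-bit @ 2.0 GT/s", peak_bandwidth_gbs(1024, 2.0, stacks=2)),
]

for name, gbs in configs:
    print(f"{name}: {gbs:.1f} GB/s")
```

On those assumptions a full-width stack at 1.6 GT/s lands at 204.8 GB/s, the half-width option at 102.4 GB/s, and 256 GB/s per stack needs 2 GT/s parts.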
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Currently there aren't any official 256GB/sec HBM2 modules. Hynix just came out with a 4GB 1.6GHz module. Samsung hasn't gotten past 1.6GHz either. Not that it's a problem really.

http://www.skhynix.com/static/filedata/fileDownload.do?seq=366

The best way may actually just be a 2-Hi 2GB 512-bit HBM2 module at 1.6GHz with about 102GB/sec, to save power. (No, HBM isn't power efficient anymore.) Assuming the extra cost and all even makes sense. IGPs still haven't sold on graphics performance, and such a product could just erode Polaris 11/12 sales.

But again, TDP is another limit if you wish to unleash the GPU power.

I read that Vega has 2x HBM2 stacks and 512GB/s of bandwidth, so at least in Vega there are 2GT/s modules. Maybe not so power efficient or cheap, but they exist. For one top APU SKU, a single HBM2 module could be worth the effort. And on lower SKUs, slower and/or smaller modules can be used...
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
I read that Vega has 2x HBM2 stacks and 512GB/s of bandwidth, so at least in Vega there are 2GT/s modules. Maybe not so power efficient or cheap, but they exist. For one top APU SKU, a single HBM2 module could be worth the effort. And on lower SKUs, slower and/or smaller modules can be used...

Well, we'll have to see. The Hynix update is quite recent and unexpected.

It would be easier and cheaper just to use LPDDR4.
https://news.samsung.com/global/samsung-rolls-out-industrys-first-8gb-lpddr4-dram-package
No HBM power consumption penalty, and you get 68.2GB/sec.

There is also the question of whether you could actually fit an HBM2 chip + interposer + APU in an AM4 package, including height. HBM2, for example, is 0.34mm taller than HBM1. Then add the interposer etc. For non-AM4 chips it's obviously not an issue.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Taking this into account, how likely do you think it is for Raven Ridge to include some form of L4 cache? Perhaps not HBM, but something more akin to Crystalwell.
I don't think that they'll do something like that. It didn't seem to be a cheap solution, and DDR4 + (faster than before) L3 should be fast enough for just 1 CCX. The iGPU has different needs. Using HBM as a cache has advantages not only for efficient use of virtual graphics memory + Onion/DDR4 bandwidth, but also with HSA, as the sharing doesn't need to happen via the common DDR4; the GPU can also buffer data there as long as it's using it.

Has there been any new info on these fabled APUs with HBM that isn't ~2 years old?
Absence of evidence... Unfortunately not. Worst case could still be RR without HBM. AMD has gotten better at suppressing such strategically important leaks.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
There is no real change besides just renaming it to cache. Vega still sits with 16GB/sec to main memory.

It's no different from how GP100, Xeon Phi etc. work. However, unlike the other two, Xeon Phi has 102GB/sec to main memory.
The reason is very simple. The Zen CPU has a 2048-bit memory controller for HBM2.

One design to rule them all. That's why you did not do proper research about what you are trying to argue.

There will be two versions of Zen APUs: those that do not have HBM2, and those that do. And those that have HBM2 will have a 2048-bit memory controller, for both CPU and GPU. Simpler it cannot be put.

Of course that means that bandwidth will be determined by the clock speed of the HBM2 chips. Even 256 GB/s will be sufficient.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
For CNC/CATIA/CAD use with an APU, minimum you need 6 cores. 4 cores will struggle in professional environments (I worked as a fighter aircraft engineer for some years).

Not in my experience.

Those areas named are currently pretty poorly threaded and you're much better off with 4 racers than 6 downclocked.

If you are using CFD, it's a different story: you need CPU + memory bandwidth. If you're using FEA, then it's I/O that's the killer; if you've loads of money, get a large-memory machine and make a virtual HD on it for the scratch area.

But CAD/CAM is very poorly threaded.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
There is no real change besides just renaming it to cache. Vega still sits with 16GB/sec to main memory.

Ugh. And if you have enough "cache" to keep your entire frame buffer local?

Some people are so obstinate it's tough not to ridicule them. :-/
 

bjt2

Senior member
Sep 11, 2016
784
180
86
The reason is very simple. The Zen CPU has a 2048-bit memory controller for HBM2.

One design to rule them all. That's why you did not do proper research about what you are trying to argue.

There will be two versions of Zen APUs: those that do not have HBM2, and those that do. And those that have HBM2 will have a 2048-bit memory controller, for both CPU and GPU. Simpler it cannot be put.

Of course that means that bandwidth will be determined by the clock speed of the HBM2 chips. Even 256 GB/s will be sufficient.

If the DDR4 controller is still on board, maybe in single-channel mode, I think the HBM2 could be configured as RAM if no DDR4 module is installed, as L4 cache if one is installed, or maybe partitioned into RAM and L4, or used entirely as RAM even with DDR4 installed... Anything is possible. But if the DDR4 is not much bigger than the HBM2 module, I think pooling the capacity is best: fast RAM and slow RAM... Just like the Amiga days...
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
If the DDR4 controller is still on board, maybe in single-channel mode, I think the HBM2 could be configured as RAM if no DDR4 module is installed, as L4 cache if one is installed, or maybe partitioned into RAM and L4, or used entirely as RAM even with DDR4 installed... Anything is possible. But if the DDR4 is not much bigger than the HBM2 module, I think pooling the capacity is best: fast RAM and slow RAM... Just like the Amiga days...
What people are forgetting in terms of Zen APUs (Raven Ridge) is the Infinity Fabric. It is designed to connect the CPU cores and GPU cores in a way we have not seen before, and for that reason HBM2 has a 2048-bit memory controller, to reduce both latency and power consumption. It's all about delivering the right data, at the right time, and with minimum effort from a power perspective.

At least that's how I understand the concept of Infinity Fabric.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
What people are forgetting in terms of Zen APUs (Raven Ridge) is the Infinity Fabric. It is designed to connect the CPU cores and GPU cores in a way we have not seen before, and for that reason HBM2 has a 2048-bit memory controller, to reduce both latency and power consumption. It's all about delivering the right data, at the right time, and with minimum effort from a power perspective.

At least that's how I understand the concept of Infinity Fabric.

I think it is the Infinity Fabric that connects the CCX, the NB, MC and SB. And in the APU it will also connect the CCX and GPU (along with the cache/HBM2 controller)... I think I once read something on this topic: Infinity Fabric replaces the Onion, Garlic and HTT buses with a high-bandwidth, low-latency (like Onion and Garlic) and coherent (like HTT) bus...
 

KTE

Senior member
May 26, 2016
478
130
76
Not in my experience.

Those areas named are currently pretty poorly threaded and you're much better off with 4 racers than 6 downclocked.

If you are using CFD, it's a different story: you need CPU + memory bandwidth. If you're using FEA, then it's I/O that's the killer; if you've loads of money, get a large-memory machine and make a virtual HD on it for the scratch area.

But CAD/CAM is very poorly threaded.

6 fast not 6 downclocked.

I agree to that in part except that you always need >4 fast cores minimum. The remaining ones are not for the extra load but to make sure you can freely do extra things with multiple monitors. That's currently nowhere near APU domain.

And the engineering firms in my experience (Lockheed, BAE, Rolls Royce, NG) use bespoke software internally that does scale above 8C, not run-of-the-mill software.

Sent from HTC 10
(Opinions are own)
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
6 fast not 6 downclocked.

I agree to that in part except that you always need >4 fast cores minimum.

Again, not in my experience. >4 cores is only necessary for CAE, even then, the vast majority won't need it. Those in CAD or CAM simply won't need it at all. CATIA is woefully light on threads!


The remaining ones are not for the extra load but to make sure you can freely do extra things with multiple monitors. That's currently nowhere near APU domain.

A Snowy Owl APU would easily be capable of operating CATIA and whatever references you'd be using on the 2nd window. Indeed, even the last gen never mind the current gen would do it.

Extra monitors do not impose additional computational load, as you usually aren't executing multiple simulations in multiple programs simultaneously. If you are, you're doing something wrong!


And the engineering firms in my experience (Lockheed, BAE, Rolls Royce, NG) use bespoke software internally that does scale above 8C, not run-of-the-mill software.

Of course. I've written some of it.

But, from anything I've done or seen, bespoke software is something that either (1) is run on big workstations, HPC nodes or across the network on distributed overnight runs, or (2) has quick turnarounds and is lightly threaded, which is the vast majority of work. Also, they are CAE, not CAD or CAM.

The magical number for the big runs is <12 hrs... i.e. something that can be started at say 7pm and then picked up at 7am the next day. But nowadays, the memory requirements for such a run would tend to push you onto a big workstation.


Sent from HTC 10
(Opinions are own)

What's with the "Opinions are own"? You are anonymous on the forum, so there's no point having a disclaimer.
 

Rifter

Lifer
Oct 9, 1999
11,522
751
126
It's a huge rock approaching Earth. This is the calm period before the impact.
When it hits, that's when things get out of control.

Well, if we listen to everyone in this thread that is true! But with 2 opposite results lol.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
The reason is very simple. The Zen CPU has a 2048-bit memory controller for HBM2.

One design to rule them all. That's why you did not do proper research about what you are trying to argue.

There will be two versions of Zen APUs: those that do not have HBM2, and those that do. And those that have HBM2 will have a 2048-bit memory controller, for both CPU and GPU. Simpler it cannot be put.

Of course that means that bandwidth will be determined by the clock speed of the HBM2 chips. Even 256 GB/s will be sufficient.

Citation please.
 