AMD Raven Ridge 'Zen APU' Thread

otinane · May 3, 2017

Maybe the APU with the Vega cores won't be that competitive if the big Vega looks like this:

http://wccftech.com/amd-radeon-vega-specs-leaked-linux-patch/

Code:

    case CHIP_VEGA10:
        adev->gfx.config.max_shader_engines = 4;
        adev->gfx.config.max_tile_pipes = 8;
        adev->gfx.config.max_cu_per_sh = 16;
        adev->gfx.config.max_sh_per_se = 1;
        adev->gfx.config.max_backends_per_se = 4;
        adev->gfx.config.max_texture_channel_caches = 16;
        adev->gfx.config.max_gprs = 256;
        adev->gfx.config.max_gs_threads = 32;
        adev->gfx.config.max_hw_contexts = 8;
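
For anyone trying to turn those limits into a CU count, here is a rough sketch in Python. It assumes the fields simply multiply (shader engines x shader arrays per engine x CUs per array) and that each CU has 64 stream processors, as on earlier GCN parts. The dictionary and function names are made up for illustration and are not part of the driver.

```python
# Values copied from the leaked CHIP_VEGA10 case above.
vega10 = {
    "max_shader_engines": 4,
    "max_sh_per_se": 1,
    "max_cu_per_sh": 16,
    "max_backends_per_se": 4,
}

# Assumption: 64 stream processors per CU, as on earlier GCN GPUs.
SP_PER_CU = 64


def total_cus(cfg):
    """Shader engines x shader arrays per engine x CUs per shader array."""
    return cfg["max_shader_engines"] * cfg["max_sh_per_se"] * cfg["max_cu_per_sh"]


def total_backends(cfg):
    """Render backends across all shader engines."""
    return cfg["max_shader_engines"] * cfg["max_backends_per_se"]


cus = total_cus(vega10)
print(cus, cus * SP_PER_CU, total_backends(vega10))  # 64 4096 16
```

If those assumptions hold, the patch describes a 64 CU part, which says nothing yet about clocks or real performance.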

Sweepr · May 3, 2017

https://gfxbench.com/device.jsp?ben...Windows&api=gl&hwtype=dGPU&hwname=AMD+699F:C1

maddie · May 3, 2017

Glo said:
"It's the other way around. HBCC is there to limit the requirements for bandwidth, and control the streaming of data from any type of memory (volatile and non-volatile) to the GPU.
But I will go and research into this matter further."

"It's there to increase the performance of the GPU. HBCC is not only about memory bandwidth, it's also about controlling the data streaming from any source to the GPU."

richaron said:
I think you're on the wrong track here. AFAIK the HBCC reduces the significant software overhead from communications over any bus. This is its major advantage. For example, in the event of a VRAM buffer overflow, a GPU with HBCC should have a significant latency advantage accessing system RAM over one which addresses system RAM through drivers.

Whatever the architecture of a GPU, it needs a certain data volume/sec to be fully utilized.
Call this F bandwidth.

Bandwidth on the video card is many times larger than bandwidth to memory off the card, to the extent that a modern GPU cannot run at full utilization when fed directly from off-card memory.

The accepted solution is to provide enough memory on the video card to satisfy this bandwidth requirement AND store all the data needed so as to eliminate any need to access off video card memory.

HBCC & HBC is another way to solve the problem of keeping a GPU fully fed with data.

The HBCC controls the flow of all data off the GPU die. It connects the on-card HBC [HBM2 stacks], the GPU die and off-card memory. The HBCC allows fast switching between those three units. Off-card memory is normally system DDR3/DDR4 memory, but it can also be non-volatile storage accessed directly.

It appears as a side effect that the GPU must have larger L2 caches for the moments when the HBC is being updated with new data.

In this system the video card memory must not only satisfy the bandwidth requirements of the GPU [F bandwidth], but also be able to import data that will be needed soon. The total bandwidth will be greater than F.

The total memory capacity needed will be LESS, as it refreshes on the fly and does not need to store the entire data requirement at once.

The total bandwidth requirement of on-card GPU memory is HIGHER, as it needs to both supply the GPU and refresh from main memory as needed.
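
A rough way to put numbers on this argument is sketched below in Python. Every figure is a hypothetical placeholder, not a measurement: `F` is the GPU's demand, `REFILL` is the rate at which the HBC is refreshed from off-card memory, and the function names are invented for the example.

```python
def hbc_bandwidth_needed(f_gbps, refill_gbps):
    """On-card bandwidth = GPU demand (F) + traffic that refreshes the HBC."""
    return f_gbps + refill_gbps


def capacity_needed(working_set_gb, resident_fraction):
    """Only the part of the data set in active use has to sit in the HBC."""
    return working_set_gb * resident_fraction


F = 400.0      # hypothetical GPU demand, GB/s
REFILL = 12.0  # hypothetical refresh rate over the off-card link, GB/s

total = hbc_bandwidth_needed(F, REFILL)
print(total, total > F)            # 412.0 True
print(capacity_needed(8.0, 0.5))   # 4.0
```

With these made-up values the bandwidth overhead is small next to F, while the capacity saving depends entirely on how much of the data set really has to be resident at once.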

EDIT:

If any RR APU has HBCC onboard and working, then it must have HBM2, as HBCC cannot work with system memory bandwidth.

Shivansps · May 3, 2017

Again, what if we stop with the super expensive high-CU, HBM APU and stick with something that we could actually use?

What we need is a $50-60 2C/4T G4560-like APU with better IGP and overclocking, and a $90 4C/4T with better IGP.

That's all.

richaron · May 3, 2017

maddie said:
The total bandwidth requirement of on-card GPU memory is HIGHER, as it needs to both supply the GPU and refresh from main memory as needed.

If you are correct, I suspect that the extra local bandwidth needed for PCIe data transfers is negligible to the point of non-existent. I'll accept you may have a point, but I'll need more proof.

I also suspect that if the GPU calls for data over PCIe (via the HBCC), it will bypass local memory ("HBC") on its first pass, i.e. I'd guess the HBCC can address both local memory and PCIe data:
GPU -> HBCC -> "HBC" -> HBCC -> GPU
& also:
GPU -> HBCC -> "PCIe data" -> HBCC -> GPU

What you are suggesting would mean that:
GPU -> HBCC -> HBC -> PCIe data -> HBC -> HBCC -> GPU
or maybe something like:
GPU -> HBCC -> HBC -> HBCC -> PCIe data -> HBCC -> HBC -> GPU??

Or maybe you are imagining something like:
GPU -> HBC -> HBCC -> PCIe data -> ...
But this situation is impossible because local memory ("HBC" or DDR5/5x/6/3/...) is essentially a slave and isn't smart enough to offload calls when necessary.

richaron · May 3, 2017

Shivansps said:
Again, what if we stop with the super expensive high-CU, HBM APU and stick with something that we could actually use?

What we need is a $50-60 2C/4T G4560-like APU with better IGP and overclocking, and a $90 4C/4T with better IGP.

That's all.

That would be an awesome way to ignore the greatest advantage of a company with both APU & HBM tech... Good call.

maddie · May 3, 2017

Shivansps said:
Again, what if we stop with the super expensive high-CU, HBM APU and stick with something that we could actually use?

What we need is a $50-60 2C/4T G4560-like APU with better IGP and overclocking, and a $90 4C/4T with better IGP.

That's all.

Then we would have little to speculate about and few chances for discussions/arguments.

What you're describing is technically easy to achieve. The impediments are mainly economic factors.

By the way, what's wrong with high-end APUs? You talk about your priority and others about what interests them.

My interest is that high-end APUs are a niche in which AMD is uniquely competitive, and one thing we've seen recently is that AMD's management targets these sectors.

I suppose that Intel could assemble an APU with an external GPU [Nvidia?] by using interposer tech or eventually EMIB tech. But they would be starting from behind, and why would they?

richaron · May 3, 2017

maddie said:
If any RR APU has HBCC onboard and working, then it must have HBM2, as HBCC cannot work with system memory bandwidth.

Yeah again I'm going to call you out. I don't think there is any solid basis for this claim. My opinion is that this tech is much more versatile than you think.

maddie · May 3, 2017

richaron said:
maddie said:
The total bandwidth requirement of on-card GPU memory is HIGHER, as it needs to both supply the GPU and refresh from main memory as needed.

If you are correct, I suspect that the extra local bandwidth needed for PCIe data transfers is negligible to the point of non-existent. I'll accept you may have a point, but I'll need more proof.

I also suspect that if the GPU calls for data over PCIe (via the HBCC), it will bypass local memory ("HBC") on its first pass, i.e. I'd guess the HBCC can address both local memory and PCIe data:
GPU -> HBCC -> "HBC" -> HBCC -> GPU
& also:
GPU -> HBCC -> "PCIe data" -> HBCC -> GPU

What you are suggesting would mean that:
GPU -> HBCC -> HBC -> PCIe data -> HBC -> HBCC -> GPU
or maybe something like:
GPU -> HBCC -> HBC -> HBCC -> PCIe data -> HBCC -> HBC -> GPU??

Or maybe you are imagining something like:
GPU -> HBC -> HBCC -> PCIe data -> ...
But this situation is impossible because local memory ("HBC" or DDR5/5x/6/3/...) is essentially a slave and isn't smart enough to offload calls when necessary.

I agree that the PCIe data transfers to update the HBC will be a small fraction of the GPU requirement.

The HBCC will try to prevent any direct PCIe to GPU data flow, as this brings us back to the present situation, where we exceed the card's frame buffer and get stuttering.
The goal is to keep all needed data in the HBC by updating constantly.
I see the HBCC as the center control node of a star network with the HBC as the buffer. You always try to keep your buffer filled with relevant data.
In games with sudden jumps to a totally new location, we might see a hiccup.
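
To illustrate the buffer idea, here is a toy Python model. It treats the HBC as a small least-recently-used page cache and counts each miss as a fetch over PCIe. This is purely an assumption for illustration: nothing public says the HBCC uses LRU, the class and function names are invented, and no prefetching is modelled, so every first touch of a page is a miss.

```python
from collections import OrderedDict


class ToyHBC:
    """Toy model: the HBC holds a fixed number of pages, evicting the least recently used."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0  # each miss stands for a fetch over PCIe

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict the oldest page


def run(trace, capacity_pages):
    """Replay a list of page ids and return (hits, misses)."""
    hbc = ToyHBC(capacity_pages)
    for page in trace:
        hbc.access(page)
    return hbc.hits, hbc.misses


steady = [0, 1, 2, 3] * 5              # the same four pages reused
jump = steady + [100, 101, 102, 103]   # sudden move to a totally new location
print(run(steady, 4))  # (16, 4)
print(run(jump, 4))    # (16, 8)
```

The steady trace only misses while the buffer warms up; the jump adds a burst of misses all at once, which is the kind of hiccup described above.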

One intriguing possibility is if the HBM2 stacks in the HBC can be used to both read and write simultaneously and not by rapid interleaving. Data could be streaming in relatively slowly as the GPU is being fed constantly.

coercitiv · May 3, 2017

richaron said:
Yeah again I'm going to call you out. I don't think there is any solid basis for this claim. My opinion is that this tech is much more versatile than you think.

I'm confused - are we discussing based on shared technical details about the way HBCC works in the absence of HBC or just opinions about how it may work? It's fine either way, but if it's just opinions please do mention this clearly, since these replies are being used as technical arguments.

maddie · May 3, 2017

richaron said:
Yeah again I'm going to call you out. I don't think there is any solid basis for this claim. My opinion is that this tech is much more versatile than you think.

Obviously, I can't prove it, but the fundamentals point to this outcome.

I guess we'll just have to wait and see.

Shivansps · May 3, 2017

maddie said:
Then we would have little to speculate about and few chances for discussions/arguments.

What you're describing is technically easy to achieve. The impediments are mainly economic factors.

By the way, what's wrong with high-end APUs? You talk about your priority and others about what interests them.

My interest is that high-end APUs are a niche in which AMD is uniquely competitive, and one thing we've seen recently is that AMD's management targets these sectors.

I suppose that Intel could assemble an APU with an external GPU [Nvidia?] by using interposer tech or eventually EMIB tech. But they would be starting from behind, and why would they?

Anything higher enters well-over-$200 territory, which makes sense for notebooks and some niche applications, not for consumer PCs.

Glo. · May 3, 2017

Shivansps said:
Anything higher enters well-over-$200 territory, which makes sense for notebooks and some niche applications, not for consumer PCs.

If you can get RX 470D level of performance plus a 4C/8T CPU on the same die, rated at 95W TDP, in a much simpler and much more efficient computer, then how does it not make sense?

This is my dream: to build a computer from just 7 parts (SSD, APU, RAM, MoBo, case, PSU, cooler) and still be able to run Overwatch and Heroes of the Storm at 1080p, epic settings, 60Hz.

Glo. · May 3, 2017

maddie said:
Glo said:
"It's the other way around. HBCC is there to limit the requirements for bandwidth, and control the streaming of data from any type of memory (volatile and non-volatile) to the GPU.
But I will go and research into this matter further."

"It's there to increase the performance of the GPU. HBCC is not only about memory bandwidth, it's also about controlling the data streaming from any source to the GPU."

Whatever the architecture of a GPU, it needs a certain data volume/sec to be fully utilized.
Call this F bandwidth.

Bandwidth on the video card is many times larger than bandwidth to memory off the card, to the extent that a modern GPU cannot run at full utilization when fed directly from off-card memory.

The accepted solution is to provide enough memory on the video card to satisfy this bandwidth requirement AND store all the data needed so as to eliminate any need to access off video card memory.

HBCC & HBC is another way to solve the problem of keeping a GPU fully fed with data.

The HBCC controls the flow of all data off the GPU die. It connects the on-card HBC [HBM2 stacks], the GPU die and off-card memory. The HBCC allows fast switching between those three units. Off-card memory is normally system DDR3/DDR4 memory, but it can also be non-volatile storage accessed directly.

It appears as a side effect that the GPU must have larger L2 caches for the moments when the HBC is being updated with new data.

In this system the video card memory must not only satisfy the bandwidth requirements of the GPU [F bandwidth], but also be able to import data that will be needed soon. The total bandwidth will be greater than F.

The total memory capacity needed will be LESS, as it refreshes on the fly and does not need to store the entire data requirement at once.

The total bandwidth requirement of on-card GPU memory is HIGHER, as it needs to both supply the GPU and refresh from main memory as needed.

EDIT:

If any RR APU has HBCC onboard and working, then it must have HBM2, as HBCC cannot work with system memory bandwidth.

There is still a RAM memory controller on the die. The GPU is not a separate part of the package. It has access to both the HBM and RAM memory controllers.

So the HBCC will still have access, and will still work without HBM stacks.

I understand your point of view, and maybe you are right on this, that every Raven Ridge APU will need to have HBM2 on package.

sm625 · May 3, 2017

otinane said:
Maybe the APU with the Vega cores won't be that competitive if the big Vega looks like this:

http://wccftech.com/amd-radeon-vega-specs-leaked-linux-patch/

Code:

    case CHIP_VEGA10:
        adev->gfx.config.max_shader_engines = 4;
        adev->gfx.config.max_tile_pipes = 8;
        adev->gfx.config.max_cu_per_sh = 16;
        adev->gfx.config.max_sh_per_se = 1;
        adev->gfx.config.max_backends_per_se = 4;
        adev->gfx.config.max_texture_channel_caches = 16;
        adev->gfx.config.max_gprs = 256;
        adev->gfx.config.max_gs_threads = 32;
        adev->gfx.config.max_hw_contexts = 8;

lol I posted a comment on there saying

Code:

    case CHIP_VEGA10:
    adev->gfx.config.max_shader_engines = TONS;
    adev->gfx.config.max_tile_pipes = TONS;
    adev->gfx.config.max_cu_per_sh = TONS;
    adev->gfx.config.max_sh_per_se = TONS;
    adev->gfx.config.max_backends_per_se = TONS;
    adev->gfx.config.max_texture_channel_caches = TONS;
    adev->gfx.config.max_gprs = TONS;
    adev->gfx.config.max_gs_threads = TONS;
    adev->gfx.config.max_hw_contexts = TONS;
    adev->gfx.faster.than_GTX1080 = FALSE;

and it got modded out. 200 pictures of fat ladies on carts get through and they mod me out. I don't get that site.

sm625 · May 3, 2017

Glo. said:
If you can get RX 470D level of performance plus a 4C/8T CPU on the same die, rated at 95W TDP, in a much simpler and much more efficient computer, then how does it not make sense?

This is my dream: to build a computer from just 7 parts (SSD, APU, RAM, MoBo, case, PSU, cooler) and still be able to run Overwatch and Heroes of the Storm at 1080p, epic settings, 60Hz.

Why not have the NAND on one of those HBM stacks? Then you can have 3 RAM stacks at 4GB each, and one stack of HBF at 256GB. Then all you need is the board, case, and PSU.

richaron · May 3, 2017

coercitiv said:
I'm confused - are we discussing based on shared technical details about the way HBCC works in the absence of HBC or just opinions about how it may work? It's fine either way, but if it's just opinions please do mention this clearly, since these replies are being used as technical arguments.

I'm not claiming to know all, in fact I would be happy to be proven wrong. I deeply subscribe to the scientific method and learning as much as I can. So if I said something stupid please point it out.

But don't call me out when I made it clear it was my opinion in the original post. I know it was clearly my opinion because I said "My opinion is..."

richaron · May 3, 2017

I've been talking about how much influence "HBC" and HBM2 has on data access over PCIe. Maybe these pics will be helpful in explaining why I think there is no correlation.

coercitiv · May 3, 2017

richaron said:
I'm not claiming to know all, in fact I would be happy to be proven wrong. I deeply subscribe to the scientific method and learning as much as I can. So if I said something stupid please point it out.

But don't call me out when I made it clear it was my opinion in the original post. I know it was clearly my opinion because I said "My opinion is..."

You ask me not to call you out, yet your original reply was "calling out" maddie because his claim had "no solid basis". How can it have no solid basis when both of you are just expressing opinions based on this tech's probable behavior? (personally I see both your arguments having equal footing with the little info we have - the idea that a cache controller may not bring much to the table in the absence of the cache itself does not seem more outrageous than the opposite)

Personally I think it's very odd for a component designed for a specific purpose (cache controller) to also bring a significant performance improvement even when that specific purpose cannot be served (cache is not there). That would make the said integrated component so valuable on its own that it would incentivize both GPU makers to implement it ASAP, since that part of silicon would bring huge perf/w and cost improvements on its own.

That having been said, can we get past this "calling out" thing? I already stated I'm fine with opinions and speculations over (known) facts, I fully understand that's all we have to go on so far, and I welcome the discussion.

richaron · May 3, 2017

coercitiv said:
*SNIP*

Mate you are way off here. I make an effort to make disclaimers as much as possible. As I replied to your nonsense post a few above here I clearly stated "My opinion is...". And that post was again in response to this claim:

maddie said:
If any RR APU has HBCC onboard and working, then it must have HBM2, as HBCC cannot work with system memory bandwidth.

This has no disclaimers and no claims of being an opinion. But I think it's pretty clear this is flat out wrong. It's odd you have double standards and don't have a go at them for making such a claim without proof or disclaimers.

maddie · May 3, 2017

Glo. said:
There is still a RAM memory controller on the die. The GPU is not a separate part of the package. It has access to both the HBM and RAM memory controllers.

So the HBCC will still have access, and will still work without HBM stacks.

I understand your point of view, and maybe you are right on this, that every Raven Ridge APU will need to have HBM2 on package.

To be clear, I only mean the RR APUs that cannot be satisfied with DDR4 bandwidth, which would mean more than 6-8 CUs. Once you go past that graphics level, you would need HBM2 or quad-channel DDR4, which I don't see happening.

maddie · May 3, 2017

richaron said:
Mate you are way off here. I make an effort to make disclaimers as much as possible. As I replied to your nonsense post a few above here I clearly stated "My opinion is...". And that post was again in response to this claim:

This has no disclaimers and no claims of being an opinion. But I think it's pretty clear this is flat out wrong. It's odd you have double standards and don't have a go at them for making such a claim without proof or disclaimers.

We don't need to get upset.

To be clear, it is my deduced opinion. Sorry for the misunderstanding.

HBCC cannot increase the available bandwidth, only efficiently manage what's available. There is no magic here: DDR4 has a certain bandwidth, and having an HBCC will not make more appear. If the number of CUs present needs more, you're stuck with poor utilization.
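
To make the ceiling concrete, here is a small Python sketch of theoretical peak DDR4 bandwidth (transfers per second x 8 bytes per 64-bit channel x number of channels). The per-CU demand figure is a hypothetical placeholder picked only to show the shape of the argument, the function names are invented, and the CPU's share of the same memory bus is ignored.

```python
def ddr_peak_gbps(mt_per_s, channels, bus_bytes=8):
    """Theoretical peak in GB/s: MT/s x bytes per transfer x channels."""
    return mt_per_s * bus_bytes * channels / 1000.0


def cus_fed(available_gbps, per_cu_gbps):
    """How many CUs a given bandwidth could keep busy (rounded down)."""
    return int(available_gbps // per_cu_gbps)


dual_2400 = ddr_peak_gbps(2400, channels=2)  # about 38.4 GB/s
quad_2400 = ddr_peak_gbps(2400, channels=4)  # about 76.8 GB/s

PER_CU = 5.0  # hypothetical GB/s needed per CU, for illustration only

print(dual_2400, cus_fed(dual_2400, PER_CU))
print(quad_2400, cus_fed(quad_2400, PER_CU))
```

Whatever the real per-CU figure turns out to be, the CU count that system memory can feed scales only with channels and memory speed, not with how cleverly the controller schedules accesses.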

Shivansps · May 3, 2017

Glo. said:
If you can get RX 470D level of performance plus a 4C/8T CPU on the same die, rated at 95W TDP, in a much simpler and much more efficient computer, then how does it not make sense?

This is my dream: to build a computer from just 7 parts (SSD, APU, RAM, MoBo, case, PSU, cooler) and still be able to run Overwatch and Heroes of the Storm at 1080p, epic settings, 60Hz.

How are they going to achieve that exactly? The RX 470D is over 100W TDP for the GPU alone; you want a 4C/8T + RX 470 + HBM stack for 95W?? I really don't think that is technically possible. MB VRMs might also be a problem; they were already a problem for FM2.

Also, I don't think they can offer an APU like that for less than $300; remember that they need to make money at the end of the day. They also have pressure from Microsoft and Sony NOT to do something like that. A console-killer APU for less money than a console? Yeah, not gonna happen.

coercitiv · May 3, 2017

richaron said:
Mate you are way off here. I make an effort to make disclaimers as much as possible. As I replied to your nonsense post...

Ok, mate. Let's close this right here.

sm625 · May 3, 2017

Shivansps said:
How are they going to achieve that exactly? The RX 470D is over 100W TDP for the GPU alone; you want a 4C/8T + RX 470 + HBM stack for 95W?? I really don't think that is technically possible. MB VRMs might also be a problem; they were already a problem for FM2.

It is very possible. RX470 is operating way up near the end of the voltage curve. You could easily clock that chip down to 70 watts and still get 80% of the performance. With some simple tweaking at the fab they could lower the sweet spot to around 60 watts and net 80% of a RX470's GFLOPs. So that gives you 25 watts for the CPU and 10 for the HBM. The CPU and GPU could also swap another 20-30 watts of TDP depending on load. All of this is very feasible.
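
The budget above can be checked with a few lines of Python. The wattages are the ones quoted in this post; the `shift` helper and the 20 W example are only an illustration of the CPU/GPU swapping idea, not a description of how any real power management works.

```python
TDP_W = 95
budget = {"gpu": 60, "cpu": 25, "hbm": 10}  # watts, as quoted in the post


def total(b):
    """Sum of all blocks in the power budget."""
    return sum(b.values())


def shift(b, src, dst, watts):
    """Move watts from one block to another without changing the total."""
    out = dict(b)
    out[src] -= watts
    out[dst] += watts
    return out


print(total(budget), total(budget) <= TDP_W)  # 95 True

cpu_heavy = shift(budget, "gpu", "cpu", 20)   # CPU-bound load borrows 20 W
print(cpu_heavy, total(cpu_heavy))            # {'gpu': 40, 'cpu': 45, 'hbm': 10} 95
```

The numbers add up to exactly 95 W, so the open question is whether the 60 W GPU figure is reachable, not whether the sum fits.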
