Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around 3 quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem that no host CPU capable of PCIe 5 would be available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

Frenetic Pony · Aug 6, 2023

Heartbreaker said:
Maybe it's a nice change to ridiculously under hype this time.

Looking back at the early RDNA 3 rumors it was going to be a massive 2.7X performance jump, and we can see how accurate that was...

I expect these early RDNA 4 rumors to be just as ridiculously wrong...

Knew an engineer for AMD online a bit ago. I assume he was an engineer, either way he knew everything about RDNA3 and 2 before they came out. He even nailed the 7950(xtx or whatever) coming out this year, so his info was still relevant.

Anyway he claimed RDNA4 was on track for all chiplets as early as the start of this year. I could see some leaker getting confused by there only being 1 compute die, say at 40CU, and assuming that it was mid range only. When instead this will just be stacked up 40/80/120 with 128/256/384bit bus chiplets.

The weird thing will be how they handle the cache hierarchy. With RDNA3, L1 is per "shader engine" (CU shared block), L2 sits on die and is shared, and L3 sits on each separate PHY chiplet. But now it's all chiplets, so trace length goes over the interposer no matter what you do once you go off L1/chiplets. I suppose L2 is pretty small, so they could just eliminate it, up L1 size, and move the L3 "infinity cache" to be L2.

That being said, with 8gb no longer being "relevant" I wonder if they'll try for something weird like a 96bit bus for the baseline one just to get 12gb as minimum. At GDDR7 speeds the lower bus size should be irrelevant, at least for 40CU or so.

Ajay · Aug 6, 2023

Frenetic Pony said:
He even nailed the 7950(xtx or whatever) coming out this year, so his info was still relevant.

How did he 'nail' a GPU that isn't even out yet?

TESKATLIPOKA · Aug 7, 2023

Frenetic Pony said:
That being said, with 8gb no longer being "relevant" I wonder if they'll try for something weird like a 96bit bus for the baseline one just to get 12gb as minimum.

You can't have 12GB VRAM on a 96-bit bus unless it's clamshell (memory chips on both sides of the PCB), but that's more expensive, so unlikely. Even if GDDR7 had 24Gbit chips, you would end up with 9GB VRAM, and that's also not enough.
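The capacity arithmetic here can be checked with a rough sketch. It assumes each memory chip occupies a 32-bit slice of the bus and that 16Gbit is the baseline chip density (neither figure is stated in the post), and that clamshell simply doubles the chip count. The function name and parameters are made up for illustration.

```python
def vram_capacity_gb(bus_width_bits, chip_density_gbit, clamshell=False,
                     chip_interface_bits=32):
    """Estimate VRAM capacity in GB for a given bus width and chip density.

    Assumptions (not from the post): one chip per 32-bit slice of the bus,
    and clamshell mode puts two chips on each slice.
    """
    chips = bus_width_bits // chip_interface_bits
    if clamshell:
        chips *= 2
    # Chip density is in gigabits; divide by 8 to get gigabytes.
    return chips * chip_density_gbit / 8


# 96-bit bus, assumed 16Gbit chips, single-sided
print(vram_capacity_gb(96, 16))                   # 6.0
# 96-bit bus, hypothetical 24Gbit chips
print(vram_capacity_gb(96, 24))                   # 9.0
# 96-bit bus, assumed 16Gbit chips, clamshell
print(vram_capacity_gb(96, 16, clamshell=True))   # 12.0
```

Under those assumptions, 96-bit gives three chips, so 12GB only appears with clamshell, and 24Gbit chips land on 9GB.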

Frenetic Pony said:
At GDDR7 speeds the lower bus size should be irrelevant, at least for 40CU or so.

A 40CU RDNA4 with ~40% higher clocks (3.5GHz) would have a problem with only a 96-bit bus even if they used expensive 32Gbps GDDR7 chips. It would have the same bandwidth as the RX 6700 XT, which also had 40CU but at lower clocks.
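For anyone who wants to check the bandwidth comparison, here is a minimal sketch. The RX 6700 XT figures used (192-bit bus, 16Gbps GDDR6) are not in the post and are filled in as assumptions, the ~40% clock uplift is taken from the post, and on-die cache is ignored entirely. The function name is made up for illustration.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) times per-pin rate (Gbps), over 8."""
    return bus_width_bits * data_rate_gbps / 8


# Hypothetical 40CU RDNA4 part: 96-bit bus with 32Gbps GDDR7
rdna4_bw = memory_bandwidth_gbs(96, 32)
# RX 6700 XT (assumed specs: 192-bit bus, 16Gbps GDDR6)
rx6700xt_bw = memory_bandwidth_gbs(192, 16)

print(rdna4_bw, rx6700xt_bw)   # 384.0 384.0

# Same bandwidth but ~40% higher clocks at the same CU count means
# less bandwidth per unit of compute.
clock_uplift = 1.4
print(round(rdna4_bw / (rx6700xt_bw * clock_uplift), 2))   # 0.71
```

So with equal raw bandwidth, the faster part would have roughly 0.71x the bandwidth per unit of compute, before any cache changes are considered.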

Frenetic Pony · Aug 7, 2023

Ajay said:
How did he 'nail' a GPU that isn't even out yet?

It leaked about a month ago in driver updates

TESKATLIPOKA · Aug 7, 2023

Frenetic Pony said:
It leaked about a month ago in driver updates

Specs are given, only clocks can be higher.
I don't think they magically fixed It, so 400-450W TBP with higher clocks most likely.
Perf/W will be very bad.

Joe NYC · Aug 7, 2023

adroc_thurston said:
That's not what they were planning to do.
Just look at MI300.

I wonder if AMD is planning on some sort of unified approach that is low cost (perhaps RDL) to connect chiplets, in order to be able to interchangeably add CPU chiplets, separate GPU chiplets (that can also be used in future APUs), optionally some sort of AI chiplet and memory controllers of different capabilities.

Because, that is what Intel is doing with MTL and ARL, and AMD can't be left behind in the graphics capability of their desktop and notebook APUs.

Maybe a more comprehensive implementation of Strix Halo where both CPU and GPU chiplets are fully interchangeable and standard. And then, these GPU chiplet can also be used in dGPUs, dropping CPU chiplet and changing I/O chiplet.

In terms of RDNA2, having Navi 23 and Navi 24 level performance as 2 levels of GPU chiplets that AMD can add to their CPU (APU) but also be able to use it in a standalone graphics card would save AMD a lot of design resources and duplication...

eek2121 · Aug 7, 2023

TESKATLIPOKA said:
Specs are given, only clocks can be higher.
I don't think they magically fixed It, so 400-450W TBP with higher clocks most likely.
Perf/W will be very bad.

Basically all the 6950 XT did. The 7900 XTX actually does scale with power (folks have been able to overclock these cards to 3+ GHz), so bumping things up a bit will lead to a faster product.

Frenetic Pony said:
Knew an engineer for AMD online a bit ago. I assume he was an engineer, either way he knew everything about RDNA3 and 2 before they came out. He even nailed the 7950(xtx or whatever) coming out this year, so his info was still relevant.

Anyway he claimed RDNA4 was on track for all chiplets as early as the start of this year. I could see some leaker getting confused by there only being 1 compute die, say at 40CU, and assuming that it was mid range only. When instead this will just be stacked up 40/80/120 with 128/256/384bit bus chiplets.

The weird thing will be how they handle the cache hierarchy. With RDNA3, L1 is per "shader engine" (CU shared block), L2 sits on die and is shared, and L3 sits on each separate PHY chiplet. But now it's all chiplets, so trace length goes over the interposer no matter what you do once you go off L1/chiplets. I suppose L2 is pretty small, so they could just eliminate it, up L1 size, and move the L3 "infinity cache" to be L2.

That being said, with 8gb no longer being "relevant" I wonder if they'll try for something weird like a 96bit bus for the baseline one just to get 12gb as minimum. At GDDR7 speeds the lower bus size should be irrelevant, at least for 40CU or so.

I casually thought that as well. AMD may have simply selected the GCD to use for its products, and the N43/44 GCDs were “good enough”. Assuming they got multi-GCD working properly they could tile 2-3 of them together. Don’t get your hopes up, however.

The alternative is that they are focusing on creating an awesome midrange card/architecture to sell at a great price. The reason everyone remembers polaris is that it had staying power and (eventually) a good price. The 5700XT is another great example of this. Despite launching in 2019 it beats the 3060 in 1080/1440p gaming. To this day the 5700XT is a decent card for midrange gaming.

Ajay · Aug 7, 2023

TESKATLIPOKA said:
Specs are given, only clocks can be higher.
I don't think they magically fixed It, so 400-450W TBP with higher clocks most likely.
Perf/W will be very bad.

Oof! I thought I read voltage scaling was terrible w/N31. Guess AMD can bin out the chips with the best parametrics.

Edit: looks like I remembered incorrectly, based on eek2121's post.

SteinFG · Aug 7, 2023

Frenetic Pony said:
It leaked about a month ago in driver updates

If I remember correctly, it was a commit to an open-source driver from a person not affiliated with AMD, and AMD people didn't merge it. Basically anyone can create this kind of "leak". Correct me if I'm wrong.

adroc_thurston · Aug 7, 2023

Joe NYC said:
I wonder if AMD is planning on some sort of unified approach that is low cost (perhaps RDL) to connect chiplets, in order to be able to interchangeably add CPU chiplets, separate GPU chiplets (that can also be used in future APUs), optionally some sort of AI chiplet and memory controllers of different capabilities

No.
Tiled dGPUs are explicitly about very purpose-built everything with minimal reuse between the parts.
They're about winning more.

Dribble · Aug 7, 2023

uzzi38 said:
They had big plans and cancelled them because they felt they didn't have the time to chase them.

There are potentially two reasons why:

1. Validation for what they were trying to do would bring the halo parts too close to RDNA5 in terms of timescale.

2. They're giving up (or if you want to phrase it nicely, deprioritising) desktop graphics.

If you want to ask me which one I think it is, then to put it kindly, I'm leaning towards the second personally.

I could VERY easily be wrong, but this is the way I'm leaning for now.

What has Lisa been talking about for the last few months - AI. She will have gone to the gpu dept and said I want AI cards as priority number one! They will be reprioritizing and cutting back other projects. An obvious one in that it makes them little money is high end consumer gpu's.

adroc_thurston · Aug 7, 2023

Dribble said:
What has Lisa been talking about for the last few months - AI. She will have gone to the gpu dept and said I want AI cards, I don't care what you do. They will be reprioritizing and cutting back other projects. An obvious one in that it makes them little money is high end consumer gpu's

That's not what is happening.
MI stuff lives on its own roadmap.

Dribble · Aug 7, 2023

adroc_thurston said:
That's not what is happening.
MI stuff lives on its own roadmap.

A roadmap is only a power point slide until you have people working to make it happen. They only have so many engineers, they can't do everything. AI is where the money is right now. Those engineers will go from desktop gpu's to AI chips. Nvidia and Intel are probably doing the same thing, but it'll hit AMD the hardest.

adroc_thurston · Aug 7, 2023

Dribble said:
They only have so many engineers, they can't do everything

Yes they can.
Lol.
Roadmaps don't exist in vacuum.

Dribble said:
but it'll hit AMD the hardest

That's just naive.
RDNA4 had 5 parts, and so does RDNA5.

GodisanAtheist · Aug 7, 2023

adroc_thurston said:
Yes they can.
Lol.
Roadmaps don't exist in vacuum.

That's just naive.
RDNA4 had 5 parts, and so does RDNA5.

- "Yes AMD can do everything, but in the very same post I'm also saying they had to cut parts from RDNA 4 because they can't do everything"

That's some Level 9 Black Belt Logic-Fu there.

adroc_thurston · Aug 7, 2023

GodisanAtheist said:
Yes AMD can do everything, but in the very same post I'm also saying they had to cut parts from RDNA 4 because they can't do everything

Validating tiled GPUs for graphics would just take way too much time so things got deferred until RDNA5.
Not that they can't, just that it would take too long given the mess of 'modern' graphics APIs.

maddie · Aug 7, 2023

GodisanAtheist said:
- "Yes AMD can do everything, but in the very same post I'm also saying they had to cut parts from RDNA 4 because they can't do everything"

That's some Level 9 Black Belt Logic-Fu there.

That's some weak reasoning. Throwing more people is not always a solution. Time, the one thing you can't multiply for critical development pathways.

Ajay · Aug 7, 2023

maddie said:
That's some weak reasoning. Throwing more people is not always a solution. Time, the one thing you can't multiply for critical development pathways.

Absolutely true! You can add some engineers, particularly well experienced ones, but you can't add 50 engineers to a 100 engineer project: too much time is wasted bringing the new engineers up to speed, so it doesn't work. Companies can get their engineers to work an extra 20 hours a week though, as I've experienced, and that helps a bit. Still, in hardware there are things that can't easily be sped up on the fly. They've got X amount of compute power dedicated to that project for simulations. Even AMD can't just 'magically' make more servers appear. I don't really know what the problem was, there are lots of ways to get behind, and @adroc_thurston indicates that solving the problem of non-parallel code in DX APIs (obviously ones that are used frequently) is a ball buster.

Anyways, there are ways to de-serialize serialized code, very clever tricks that work to various degrees. But AMD doesn't own DX11 or DX12, so that's in MS's court. I would hope there is a better plan going forward, but there are a ton of games on DX11 & 12.

Joe NYC · Aug 7, 2023

Ajay said:
Anyways, there are ways to de-serialize serialized code, very clever tricks that work to various degrees. But AMD doesn't own DX11 or DX12, so that's in MS's court. I would hope there is a better plan going forward, but there are a ton of games on DX11 & 12.

We will see how much leverage AMD has, to make it happen....

Ajay · Aug 7, 2023

Joe NYC said:
We will see how much leverage AMD has, to make it happen....

It's not just AMD, but at some point, Nvidia as well.

Joe NYC · Aug 7, 2023

adroc_thurston said:
No.
Tiled dGPUs are explicitly about very purpose-built everything with minimal reuse between the parts.
They're about winning more.

We will see what the future brings.

AMD seems to be starting a new line of Chiplet APUs with Strix Halo. Between AMD and Intel, these chips with powerful GPUs may take over the market for x5x and aiming for up to x6x level of performance of dGPU in next couple of generations. Depending on how customers and OEMs respond...

So the market may start to shift in direction of iGPU chiplet / tile, and dGPU may just end up an extended version of that.

Joe NYC · Aug 7, 2023

Ajay said:
It's not just AMD, but at some point, Nvidia as well.

For now, NVidia is probably more interested in putting up roadblocks to more efficient utilization of chiplet architectures than in supporting it, at least as long as NVidia continues with monolithic chips and the benefit disproportionately helps chiplet architectures.

Ajay · Aug 7, 2023

Joe NYC said:
For now, NVidia is probably more interested in putting up roadblocks to more efficient utilization of chiplet architectures than in supporting it, at least as long as NVidia continues with monolithic chips and the benefit disproportionately helps chiplet architectures.

What roadblocks? Devs, please use this terribly optimized API, here's some cash?? Hey Microsoft, please don't improve DX12/DX_Next for your consoles (the chips being made by AMD)? NV may have a bunch of patents around this stuff, but that's pretty standard.

The only roadblock right now is limited packaging production for H100/H200. Not sure if that's affecting AMD MI300s. But those aren't consumer chips.

SolidQ · Aug 8, 2023

From RedTechGaming video

Saylick · Aug 8, 2023

SolidQ said:
From RedTechGaming video

Does RGT have real sources? I'm not involved with the #silicongang Discord channels where I'm confident the leakers discuss rumors, but I wouldn't be surprised if he just lurks in those channels and then repackages what he reads to his YT channel.
