Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136





With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far off!
Usually AMD takes around three quarters to land support in LLVM and amdgpu. Since RDNA2, though, they have shortened the window in which they push support for new devices, to prevent leaks.
But judging by the flurry of code in LLVM, there are a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Maybe it's a nice change to ridiculously under hype this time.

Looking back at the early RDNA 3 rumors it was going to be a massive 2.7X performance jump, and we can see how accurate that was...

I expect these early RDNA 4 rumors to be just as ridiculously wrong...

Knew an engineer for AMD online a bit ago. I assume he was an engineer, either way he knew everything about RDNA3 and 2 before they came out. He even nailed the 7950(xtx or whatever) coming out this year, so his info was still relevant.

Anyway, he claimed RDNA4 was on track for all chiplets as early as the start of this year. I could see some leaker getting confused by there only being 1 compute die, say at 40CU, and assuming that it was mid-range only, when instead it will just be stacked up as 40/80/120CU with 128/256/384-bit bus chiplets.

The weird thing will be how they handle the cache hierarchy. With RDNA3, L1 is per "shader engine" (CU shared block), L2 sits on die and is shared, and L3 sits on each separate PHY chiplet. But if it's all chiplets now, traces go over the interposer no matter what you do once you go off L1/chiplet. I suppose L2 is pretty small, so they could just eliminate it, up the L1 size, and move the L3 "infinity cache" to be L2.

That being said, with 8GB no longer being "relevant", I wonder if they'll try for something weird like a 96-bit bus for the baseline one just to get 12GB as a minimum. At GDDR7 speeds the lower bus size should be irrelevant, at least for 40CU or so.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,508
3,011
136
That being said, with 8gb no longer being "relevant" I wonder if they'll try for something weird like a 96bit bus for the baseline one just to get 12gb as minimum.
You can't have 12GB of VRAM on a 96-bit bus unless it's clamshell (memory chips on both sides of the PCB), but that's more expensive, so unlikely. Even if GDDR7 had 24Gbit chips, you would end up with 9GB of VRAM, and that's also not enough.
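For anyone who wants to check the capacity numbers, here is a rough sketch of the arithmetic. It assumes one 32-bit-wide memory chip per channel, with clamshell putting two chips on each channel; the function name `vram_gb` is made up for illustration.

```python
def vram_gb(bus_width_bits, chip_density_gbit, clamshell=False):
    """Total VRAM in GB for a given bus width and per-chip density.

    Assumption: each memory chip occupies one 32-bit channel.
    In clamshell mode two chips share a channel, doubling the chip count.
    """
    chips = bus_width_bits // 32
    if clamshell:
        chips *= 2
    # Density is given in gigabits per chip; divide by 8 for gigabytes.
    return chips * chip_density_gbit / 8


print(vram_gb(96, 16))                  # 3 chips of 16Gbit
print(vram_gb(96, 24))                  # 3 chips of 24Gbit
print(vram_gb(96, 16, clamshell=True))  # 6 chips of 16Gbit
```

Under those assumptions a 96-bit bus gives 6GB with 16Gbit chips, 9GB with 24Gbit chips, and only reaches 12GB with 16Gbit chips in clamshell.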
At GDDR7 speeds the lower bus size should be irrelevant, at least for 40CU or so.
A 40CU RDNA4 with ~40% higher clocks (3.5GHz) would have a problem with only a 96-bit bus, even if they used expensive 32Gbps GDDR7 chips. It would have the same bandwidth as the RX 6700XT, which also had 40CU but at lower clocks.
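The bandwidth comparison can be sketched the same way. This is just peak theoretical bandwidth (bus width times per-pin data rate), ignoring Infinity Cache and real-world efficiency; `bandwidth_gb_s` is a hypothetical helper name, and the RX 6700XT figures used are its 192-bit bus with 16Gbps GDDR6.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s.

    bus_width_bits: total memory bus width in bits
    data_rate_gbps: per-pin data rate in Gbit/s
    """
    return bus_width_bits * data_rate_gbps / 8


# Hypothetical 96-bit card with 32Gbps GDDR7
print(bandwidth_gb_s(96, 32))
# RX 6700XT: 192-bit bus with 16Gbps GDDR6
print(bandwidth_gb_s(192, 16))
```

Both come out to 384 GB/s, which is the point being made: half the bus width at double the data rate lands on the same peak bandwidth.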
 

Joe NYC

Platinum Member
Jun 26, 2021
2,487
3,386
106
That's now what they were planning to do.
Just look at MI300.

I wonder if AMD is planning on some sort of unified approach that is low cost (perhaps RDL) to connect chiplets, in order to be able to interchangeably add CPU chiplets, separate GPU chiplets (that can also be used in future APUs), optionally some sort of AI chiplet and memory controllers of different capabilities.

Because, that is what Intel is doing with MTL and ARL, and AMD can't be left behind in the graphics capability of their desktop and notebook APUs.

Maybe a more comprehensive implementation of Strix Halo where both CPU and GPU chiplets are fully interchangeable and standard. And then, these GPU chiplet can also be used in dGPUs, dropping CPU chiplet and changing I/O chiplet.

In terms of RDNA2, having Navi 23 and Navi 24 level performance as 2 levels of GPU chiplets that AMD can add to their CPU (APU) but also be able to use it in a standalone graphics card would save AMD a lot of design resources and duplication...
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
Specs are given, only clocks can be higher.
I don't think they magically fixed It, so 400-450W TBP with higher clocks most likely.
Perf/W will be very bad.
Basically all the 6950XT did. The 7900XTX actually does scale with power (folks have been able to overclock these cards to 3+ GHz), so bumping things up a bit will lead to a faster product.

Knew an engineer for AMD online a bit ago. I assume he was an engineer, either way he knew everything about RDNA3 and 2 before they came out. He even nailed the 7950(xtx or whatever) coming out this year, so his info was still relevant.

Anyway, he claimed RDNA4 was on track for all chiplets as early as the start of this year. I could see some leaker getting confused by there only being 1 compute die, say at 40CU, and assuming that it was mid-range only, when instead it will just be stacked up as 40/80/120CU with 128/256/384-bit bus chiplets.

The weird thing will be how they handle the cache hierarchy. With RDNA3, L1 is per "shader engine" (CU shared block), L2 sits on die and is shared, and L3 sits on each separate PHY chiplet. But if it's all chiplets now, traces go over the interposer no matter what you do once you go off L1/chiplet. I suppose L2 is pretty small, so they could just eliminate it, up the L1 size, and move the L3 "infinity cache" to be L2.

That being said, with 8GB no longer being "relevant", I wonder if they'll try for something weird like a 96-bit bus for the baseline one just to get 12GB as a minimum. At GDDR7 speeds the lower bus size should be irrelevant, at least for 40CU or so.
I casually thought that as well. AMD may have simply selected the GCD to use for its products, and the N43/44 GCDs were “good enough”. Assuming they got multi-GCD working properly they could tile 2-3 of them together. Don’t get your hopes up, however.

The alternative is that they are focusing on creating an awesome midrange card/architecture to sell at a great price. The reason everyone remembers Polaris is that it had staying power and (eventually) a good price. The 5700XT is another great example of this. Despite launching in 2019, it beats the 3060 in 1080p/1440p gaming. To this day the 5700XT is a decent card for midrange gaming.
 
Reactions: Tlh97

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
Specs are given, only clocks can be higher.
I don't think they magically fixed It, so 400-450W TBP with higher clocks most likely.
Perf/W will be very bad.
Oof! I thought I read voltage scaling was terrible w/N31. Guess AMD can bin out the chips with the best parametrics.

Edit: looks like I remembered incorrectly, based on eek2121's post.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,549
5,116
96
I wonder if AMD is planning on some sort of unified approach that is low cost (perhaps RDL) to connect chiplets, in order to be able to interchangeably add CPU chiplets, separate GPU chiplets (that can also be used in future APUs), optionally some sort of AI chiplet and memory controllers of different capabilities
No.
Tiled dGPUs are explicitly about very purpose-built everything with minimal reuse between the parts.
They're about winning more.
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
They had big plans and cancelled them because they felt they didn't have the time to chase them.

There are potentially two reasons why:

1. Validation for what they were trying to do would bring the halo parts too close to RDNA5 in terms of timescale.

2. They're giving up (or if you want to phrase it nicely, deprioritising) desktop graphics.

If you want to ask me which one I think it is, then to put it kindly, I'm leaning towards the second personally.

I could VERY easily be wrong, but this is the way I'm leaning for now.
What has Lisa been talking about for the last few months? AI. She will have gone to the GPU dept and said, "I want AI cards as priority number one!" They will be reprioritizing and cutting back other projects. An obvious candidate, in that it makes them little money, is high-end consumer GPUs.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,549
5,116
96
What has Lisa been talking about for the last few months - AI. She will have gone to the gpu dept and said I want AI cards, I don't care what you do. They will be reprioritizing and cutting back other projects. An obvious one in that it makes them little money is high end consumer gpu's
That's not what is happening.
MI stuff lives on its own roadmap.
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
That's not what is happening.
MI stuff lives on its own roadmap.
A roadmap is only a PowerPoint slide until you have people working to make it happen. They only have so many engineers; they can't do everything. AI is where the money is right now. Those engineers will go from desktop GPUs to AI chips. Nvidia and Intel are probably doing the same thing, but it'll hit AMD the hardest.
 
Reactions: xpea and Tlh97

adroc_thurston

Diamond Member
Jul 2, 2023
3,549
5,116
96
Yes AMD can do everything, but in the very same post I'm also saying they had to cut parts from RDNA 4 because they can't do everything
Validating tiled GPUs for graphics would just take way too much time so things got deferred until RDNA5.
Not that they can't, just that it would take too long given the mess of 'modern' graphics APIs.
 

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
- "Yes AMD can do everything, but in the very same post I'm also saying they had to cut parts from RDNA 4 because they can't do everything"

That's some Level 9 Black Belt Logic-Fu there.
That's some weak reasoning. Throwing more people is not always a solution. Time, the one thing you can't multiply for critical development pathways.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
That's some weak reasoning. Throwing more people is not always a solution. Time, the one thing you can't multiply for critical development pathways.
Absolutely true! You can add some engineers, particularly well-experienced ones, but you can't add 50 engineers to a 100-engineer project: too much time is wasted bringing the new engineers up to speed, so it doesn't work. Companies can get their engineers to work an extra 20 hours a week though, as I've experienced, and that helps a bit. Still, in hardware there are things that can't easily be sped up on the fly. They've got X amount of compute power dedicated to that project for simulations. Even AMD can't just 'magically' make more servers appear. I don't really know what the problem was; there are lots of ways to get behind, and @adroc_thurston indicates that solving the problem of non-parallel code in the DX APIs (obviously the ones that are used frequently) is a ball buster.

Anyways, there are ways to de-serialize serial code, very clear tricks that work to various degrees. But AMD doesn't own DX11 or DX12, so that's in MS's court. I would hope there is a better plan going forward, but there are a ton of games on DX11 & 12.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,487
3,386
106
Anyways, there are ways to de-serialize serial code, very clear tricks that work to various degrees. But AMD doesn't own DX11 or DX12, so that's in MS's court. I would hope there is a better plan going forward, but there are a ton of games on DX11 & 12.

We will see how much leverage AMD has, to make it happen....
 

Joe NYC

Platinum Member
Jun 26, 2021
2,487
3,386
106
No.
Tiled dGPUs are explicitly about very purpose-built everything with minimal reuse between the parts.
They're about winning more.
We will see what the future brings.

AMD seems to be starting a new line of chiplet APUs with Strix Halo. Between AMD and Intel, these chips with powerful GPUs may take over the x5x segment of the market and aim for up to x6x-level dGPU performance in the next couple of generations, depending on how customers and OEMs respond...

So the market may start to shift in the direction of an iGPU chiplet / tile, and the dGPU may just end up as an extended version of that.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,487
3,386
106
It's not just AMD, but at some point, Nvidia as well.
For now, Nvidia is probably more interested in putting up roadblocks to more efficient utilization of chiplet architectures than in supporting it. That holds as long as Nvidia continues with monolithic chips and the benefit disproportionately helps chiplet architectures.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
For now, NVidia is probably more interested in putting up roadblocks, preventing more efficient utilization of chiplet architecture, than support it. As long as NVidia continues with monolithic chips - if the benefit disproportionately helps chiplet architectures.
What roadblocks? Devs - please use this terribly optimized API - here's some cash?? Hey Microsoft - please don't improve DX12/DX_Next for your consoles (the chips being made by AMD). NV may have a bunch of patents around this stuff, but that's pretty standard.

The only roadblock right now is limited packaging production for H100/H200. Not sure if that's affecting AMD's MI300s. But those aren't consumer chips.
 