Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to land support in LLVM and amdgpu. Since RDNA2, though, the window in which they push support for new devices has shrunk considerably, to prevent leaks.
But the flurry of code in LLVM adds up to a lot of commits. Maybe the US government is starting to prepare the software environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940-specific commits:

History for llvm/lib/Target/AMDGPU - llvm/llvm-project (github.com)

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix (www.phoronix.com)

There is a lot more if you know whom to follow in the LLVM review chains (before things get merged on GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper has the problem that no host CPU capable of PCIe 5 will be available in the very near future, so it might get pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts; the MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here:

Question - Speculation: RDNA3 + CDNA2 Architectures Thread (forums.anandtech.com)

adroc_thurston · Jan 14, 2025

Keller_TT said:
I didn't want to get too technical with ray-triangle intersections and all

That's also not a real metric; you're rarely, if ever, limited by ray-tri hit throughput in video games.
RTRT is basically an exercise in building the lowest-latency tree-walking machine.
I.e. we should retvrn to Larrabee.
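To make the "tree-walking machine" point concrete, here is a minimal sketch of stack-based BVH traversal in Python. The `Node` layout, `ray_hits_box`, and `traverse` are hypothetical illustrations, not any vendor's implementation. The thing to notice is that a child's address is only known after its parent has been read, so node fetches form a chain of dependent memory loads, while the ray-triangle tests themselves only run at the leaves.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Node:
    lo: Vec3                          # AABB min corner
    hi: Vec3                          # AABB max corner
    left: Optional["Node"] = None     # child pointers: following these is the dependent-load chain
    right: Optional["Node"] = None
    prims: List[int] = field(default_factory=list)  # primitive (triangle) ids stored at leaves

def ray_hits_box(orig: Vec3, inv_dir: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: does the ray orig + t*dir, t >= 0, intersect the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, inv, a, b in zip(orig, inv_dir, lo, hi):
        t1, t2 = (a - o) * inv, (b - o) * inv
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far

def traverse(root: Node, orig: Vec3, direction: Vec3) -> Tuple[List[int], int]:
    """Return candidate primitive ids for ray-triangle tests and the number of nodes fetched."""
    # Zero direction components are replaced by a huge reciprocal to keep the slab math finite.
    inv_dir = tuple(1.0 / d if d != 0.0 else 1e30 for d in direction)
    stack = [root]
    candidates: List[int] = []
    fetched = 0
    while stack:
        node = stack.pop()   # the next address came from a previously read node
        fetched += 1
        if not ray_hits_box(orig, inv_dir, node.lo, node.hi):
            continue         # prune this whole subtree
        if node.left is None and node.right is None:
            candidates.extend(node.prims)  # leaf: ray-triangle tests would happen here
            continue
        for child in (node.right, node.left):  # push right first so left is visited first
            if child is not None:
                stack.append(child)
    return candidates, fetched

left = Node((0, 0, 0), (4, 4, 4), prims=[0, 1])
right = Node((6, 6, 6), (10, 10, 10), prims=[2])
root = Node((0, 0, 0), (10, 10, 10), left=left, right=right)
hits, fetched = traverse(root, (-1.0, 1.0, 1.0), (1.0, 0.01, 0.01))
print(hits, fetched)  # [0, 1] 3
```

Even in this toy tree, three node fetches gate just two candidate triangles, which is why traversal latency rather than raw intersection throughput tends to dominate.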

Keller_TT · Jan 14, 2025

adroc_thurston said:
That's also not a real metric; you're rarely, if ever, limited by ray-tri hit throughput in video games.
RTRT is basically an exercise in building the lowest-latency tree-walking machine.
I.e. we should retvrn to Larrabee.

Larrabee... That's a name that reverberates. I think I still have Larrabee technical papers stored in my cloud. There was talk that AMD could go that route with their Fusion APUs, and I really dreaded it, as I saw the whole thing as an Intel lock-in. Those were the days when Intel was at its anti-competitive best, particularly against the green CPU boys.
AMD bet on OpenCL, and it's just sad that it fell so far behind with too many cooks.
But I'm in touch with fellow alumni at Uni Heidelberg. They started this thing called hipSYCL (renamed OpenSYCL) and a few other projects targeting AMD hardware to move beyond CUDA.

del42sa · Jan 14, 2025

https://wccftech.com/amd-delaying-r...s-eager-to-overshadow-nvidia-rtx-50-showcase/

Overshadowing Nvidia's showcase by... not showing performance numbers, release dates, price, or anything concrete.

These guys are jeeneuses 🤦‍♂️🤦‍♂️

adroc_thurston · Jan 14, 2025

Keller_TT said:
AMD bet on OpenCL, and it's just sad that it fell so far behind with too many cooks

They actually bet on HSA, but that gained zero traction (partly because AMD had no datacenter hardware to ship). By the time they did, ROCm, a.k.a. not-CUDA, was the only option.

del42sa said:
Overshadowing Nvidia's showcase by... not showing performance numbers, release dates, price, or anything concrete.

Nvidia's showcase will show extremely mediocre gains outside of GB202, so anything AMD does might be good nuff.

eek2121 · Jan 14, 2025

adroc_thurston said:
That's also not a real metric; you're rarely, if ever, limited by ray-tri hit throughput in video games.
RTRT is basically an exercise in building the lowest-latency tree-walking machine.
I.e. we should retvrn to Larrabee.

Larrabee would have been amazing if they had made it work.

Even now I wonder what a modern day version would look like.

Oh if I were a billionaire…

adroc_thurston · Jan 14, 2025

eek2121 said:
Larrabee would have been amazing if they had made it work.

Hell no, it's good at exactly one thing GPUs suck at (pointer chasing, a.k.a. ray tracing).

poke01 · Jan 14, 2025

I think the new Spider-Man game will have FSR4 as well, since Ratchet & Clank already has it.

eek2121 · Jan 14, 2025

adroc_thurston said:
Hell no, it's good at exactly one thing GPUs suck at (pointer chasing, a.k.a. ray tracing).

Come on, you gotta think bigger! If your GPU uses the same ISA as your CPU, why would you need a CPU? Imagine having 128-256 cores that you could dynamically allocate between graphics and non-graphics workloads. 🤣

I actually made a simple game engine that worked like this back when I had my old 1950X system. It used all the cores and didn't use a GPU at all. It could dynamically allocate as few or as many cores to logic as needed; the rest went to rendering. It was a prototype, of course, but the results were interesting. Forgot to back up the source code before FFR. 😭
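A minimal sketch of how such a logic/render core split could work, assuming a hypothetical policy (the post doesn't describe the actual one): game logic gets `ceil(total * logic_fraction)` cores, clamped so each side keeps at least one, and the remaining render workers each take a contiguous band of framebuffer rows. `split_cores`, `row_bands`, and the 0.2 load figure are illustrative only; 32 is the hardware-thread count of a 16-core 1950X.

```python
import math
from typing import List, Tuple

def split_cores(total_cores: int, logic_fraction: float, min_each: int = 1) -> Tuple[int, int]:
    """Hypothetical policy: give logic ceil(total * fraction) cores, keeping >= min_each per side."""
    logic = math.ceil(total_cores * logic_fraction)
    logic = max(min_each, min(logic, total_cores - min_each))
    return logic, total_cores - logic

def row_bands(height: int, workers: int) -> List[Tuple[int, int]]:
    """Partition rows [0, height) into contiguous [start, end) bands, one per render worker (workers >= 1)."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)  # first `extra` bands take one leftover row
        bands.append((start, end))
        start = end
    return bands

logic, render = split_cores(32, 0.2)  # assumed: logic currently needs ~20% of 32 threads
print(logic, render)                  # 7 25
print(row_bands(1080, render)[:2])    # [(0, 44), (44, 88)]
```

Re-running `split_cores` each frame with a measured logic load is what would make the allocation dynamic; row bands are just one simple way to divide rendering work.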

coercitiv · Jan 14, 2025

eek2121 said:
Come on, you gotta think bigger! If your GPU uses the same ISA as your CPU, why would you need a CPU? Imagine having 128-256 cores that you could dynamically allocate between graphics and non-graphics workloads. 🤣

The best CPU among GPUs and the best GPU among CPUs.

Josh128 · Jan 14, 2025

So it seems, according to HXL on Xitter via Chiphell, that the yet-unannounced RDNA 4 reveal / embargo lift has been delayed again, to an undisclosed date. They say it is to let "Huang go first" again. Why, though, if not to maximize pricing at the last minute? They already have the 5000-series pricing and specs.

Seems they are back to their old trolling ways: think back to camping outside an Nvidia presentation with the R9 290X and "Jebaiting" the price of the RX 5700 XT. Except it's even worse now, as they already have Nvidia's pricing and specs.

CastleBravo · Jan 14, 2025

Josh128 said:
So it seems, according to HXL on Xitter via Chiphell, that the yet-unannounced RDNA 4 reveal / embargo lift has been delayed again, to an undisclosed date. They say it is to let "Huang go first" again. Why, though, if not to maximize pricing at the last minute? They already have the 5000-series pricing and specs.

Seems they are back to their old trolling ways: think back to camping outside an Nvidia presentation with the R9 290X and "Jebaiting" the price of the RX 5700 XT. Except it's even worse now, as they already have Nvidia's pricing and specs.

Bummer. Guess I'll try for a 5080 on launch day then.

Heartbreaker · Jan 14, 2025

Josh128 said:
So it seems, according to HXL on Xitter via Chiphell, that the yet-unannounced RDNA 4 reveal / embargo lift has been delayed again, to an undisclosed date. They say it is to let "Huang go first" again. Why, though, if not to maximize pricing at the last minute? They already have the 5000-series pricing and specs.

Seems they are back to their old trolling ways: think back to camping outside an Nvidia presentation with the R9 290X and "Jebaiting" the price of the RX 5700 XT. Except it's even worse now, as they already have Nvidia's pricing and specs.

IMO, this is nothing like that "Jebaited" nonsense. I don't see any trolling.

I think Frank Azor responded well in one of the last videos linked. They decided to wait and see what Nvidia had, to better target their response.

That is just decent strategy when it's David vs Goliath. I don't blame them at all.

Their only mistake was not making that decision before CES. It's the last-minute change that didn't look good.

Further, I could see them wanting a better idea of Nvidia's actual performance (since Nvidia has hidden it behind MFG 4X smoke and mirrors) before finalizing pricing.

Keller_TT · Jan 14, 2025

Amid the RDNA4 non-launch, with no top-tier Navi4 board, I was wondering why AMD didn't do a 7900 XTX refresh that fixed the silicon glitches that plagued RDNA3. Release it as a 7950 XTX on N4P that runs cooler, draws, say, 10% less power, and adds 15% more performance for $800? It could still stay relevant vs. the 5080 with that VRAM and bandwidth.
N4P is part of the same 5nm family and improves efficiency notably. They do refreshes of refreshes for CPUs, with misleading names, just to sell cheap dies at good margins.
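Taking the hypothetical figures above at face value (15% more performance at 10% less power), the implied efficiency gain over the 7900 XTX works out to roughly 28% better performance per watt:

```latex
\frac{(\text{perf}/\text{W})_{\text{refresh}}}{(\text{perf}/\text{W})_{\text{7900 XTX}}}
  = \frac{1.15}{0.90} \approx 1.28
```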

CastleBravo · Jan 14, 2025

Keller_TT said:
Amid the RDNA4 non-launch, with no top-tier Navi4 board, I was wondering why AMD didn't do a 7900 XTX refresh that fixed the silicon glitches that plagued RDNA3. Release it as a 7950 XTX on N4P that runs cooler, draws, say, 10% less power, and adds 15% more performance for $800? It could still stay relevant vs. the 5080 with that VRAM and bandwidth.
N4P is part of the same 5nm family and improves efficiency notably. They do refreshes of refreshes for CPUs, with misleading names, just to sell cheap dies at good margins.

Wouldn't that be roughly the same amount of work as scaling RDNA4 up to 500-600 mm²?

coercitiv · Jan 14, 2025

CastleBravo said:
Wouldn't that be roughly the same amount of work as scaling RDNA4 up to 500-600 mm²?

For even less ROI.

Keller_TT · Jan 14, 2025

CastleBravo said:
Wouldn't that be roughly the same amount of work as scaling RDNA4 up to 500-600 mm²?

Is it really?
They're not going to redo the whole design and tape-out. I would think it's more like a further stepping to prune the bugs after tape-out. The RDNA3 launch looked like they were caught out by some late-stage bugs they couldn't fix in time, and there was internal dishonesty about it before Lisa went on stage with her slides. That's apparently what came out from insiders, along with the claim that N32 had no problems. Well, that was even worse and deserves to be binned.
An existing 5nm design can be shrunk onto N4P.

CastleBravo · Jan 14, 2025

Keller_TT said:
Is it really?
They're not going to redo the whole design and tape-out. I would think it's more like a further stepping to prune the bugs after tape-out. The RDNA3 launch looked like they were caught out by some late-stage bugs they couldn't fix in time, and there was internal dishonesty about it before Lisa went on stage with her slides. That's apparently what came out from insiders, along with the claim that N32 had no problems. Well, that was even worse and deserves to be binned.
An existing 5nm design can be shrunk onto N4P.

They can fix a bug in the architecture and port it to a new process without needing to tape out again?

Josh128 · Jan 14, 2025

Keller_TT said:
Is it really?
They're not going to redo the whole design and tape-out. I would think it's more like a further stepping to prune the bugs after tape-out. The RDNA3 launch looked like they were caught out by some late-stage bugs they couldn't fix in time, and there was internal dishonesty about it before Lisa went on stage with her slides. That's apparently what came out from insiders, along with the claim that N32 had no problems. Well, that was even worse and deserves to be binned.
An existing 5nm design can be shrunk onto N4P.

100% more complicated, expensive, and time-consuming than just making do with what was produced and focusing all resources on the iterative architecture, which is what they've done. No halo card was coming regardless. People just don't buy them; they buy Nvidia instead.

Keller_TT · Jan 14, 2025

CastleBravo said:
They can fix a bug in the architecture and port it to a new process without needing to tape out again?

They're just moving to an improved 5nm process. It's not a whole new node step of a different generation. Intel would've called it 5+++.
It should be substantially more straightforward than even Zen to Zen+ (GloFo 14nm to 12nm), because TSMC's process readiness and support are the industry's best.
You're making it look way bigger than it is.

blckgrffn · Jan 14, 2025

CastleBravo said:
Bummer. Guess I'll try for a 5080 on launch day then.

I have until Feb 4. That's when my 7900XTX return window closes.

This is getting silly. Has been silly? IDK.

SolidQ · Jan 14, 2025

blckgrffn said:
I have until Feb 4. That's when my 7900XTX return window closes.

Maybe they're waiting for the RTX 5080/5090 reviews.

Heartbreaker · Jan 14, 2025

blckgrffn said:
I have until Feb 4. That's when my 7900XTX return window closes.

This is getting silly. Has been silly? IDK.

I'm wishing I could return my 4070, but it's a little late for that.

I think AMD would have a very good shot at getting my money if I were buying this generation.

techjunkie123 · Jan 14, 2025

Heartbreaker said:
I'm wishing I could return my 4070, but it's a little late for that.

I think AMD would have a very good shot at getting my money if I were buying this generation.

If the performance is indeed 4080-level in raster and 4070 Ti Super-level in RT, and the price is $600 or less, I'm buying immediately. That's solid value IMO, and plenty of card for 1440p without upscaling or 4K with upscaling.

Keller_TT · Jan 14, 2025

Josh128 said:
100% more complicated, expensive, and time-consuming than just making do with what was produced and focusing all resources on the iterative architecture, which is what they've done. No halo card was coming regardless. People just don't buy them; they buy Nvidia instead.

I was only talking of a further stepping, if they knew what the flaw was. Yes, they would tape out after the fix, but my point was that it's not a major redesign and tape-out process all over again. It's the same process they go through before any launch version, after the design is sent for first tape-out and the silicon is tested.

Wasn't Navi 31 A0 silicon anyway? SkyJuice had held back from publishing pre-launch performance results because something was amiss.

I would think the only reason is that we have the benefit of hindsight on where the Blackwell 5080 and 5070 Ti land, and AMD didn't. They hugely overestimated it.

reaperrr3 · Jan 14, 2025

Keller_TT said:
I was only talking of a further stepping, if they knew what the flaw was. Yes, they would tape out after the fix, but my point was that it's not a major redesign and tape-out process all over again. It's the same process they go through before any launch version, after the design is sent for first tape-out and the silicon is tested.

Even if they applied not just some bug fixes but also RDNA3.5 improvements and shrank to N4P, I agree that should've been far less design and validation work.

IIRC, there were rumors that an RDNA3 refresh with some improvements was under consideration until some point in 2022, but it was dropped due to limited market prospects (splitting the market window between two RDNA3 generations would've meant poor ROI for both) and in favor of focusing on RDNA4 for time-to-market reasons (only to then cancel chiplet RDNA4 as well...).

Having to sell off huge stockpiles of over-produced RDNA2 parts didn't help the RDNA3 refresh case either, at least for N32 and N33.

But yeah, I agree that an N31b on N4P with some RDNA3.5 improvements applied (or just finishing that 8 SE/128CU N36 they allegedly had in the works) might've been interesting.
It would've needed to come out a year ago, though; at this point, the poor RT performance and lack of dedicated FSR4 acceleration would hurt it badly against both Nvidia and N48, even if it were relatively competitive in raster.

Keller_TT said:
I would think the only reason is that we have the benefit of hindsight on where the Blackwell 5080 and 5070 Ti land, and AMD didn't. They hugely overestimated it.

To be fair, considering that earlier rumors pointed to 60 SMs for GB205 and 96 SMs for GB203, I wouldn't be surprised if Nvidia had prepared multiple designs with different SM counts and Jensen settled on the cheaper configs at the last second, once it became apparent that there wouldn't be any bigger RDNA4 chips.

So who knows whether the full GB203 would've really been that low-specced if AMD had pulled through with the chiplet RDNA4s.
