Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to land support in LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, there are a lot of commits. Maybe the US government is starting to prepare the software environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940-specific commits:

History for llvm/lib/Target/AMDGPU - llvm/llvm-project (github.com)

Or Phoronix:

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix (www.phoronix.com)

There is a lot more if you know whom to follow in the LLVM review chains (before changes get merged on GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper has a problem: no host CPU capable of PCIe 5 will be available in the very near future, so it might get pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts; the MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here:

Question - Speculation: RDNA3 + CDNA2 Architectures Thread (forums.anandtech.com)

Kepler_L2 · Jun 13, 2024

blackangus said:
Yeah, since it's not a chiplet, the BOM should be a good bit less than the 7900 XT.

Even less than N32.

blackangus · Jun 13, 2024

Kepler_L2 said:
Even less than N32.

That's interesting, why is that?
The memory is roughly similar from the leaks.
The node is slightly more expensive?
Both are monolithic.

What's the story?

soresu · Jun 13, 2024

blackangus said:
Yeah, since it's not a chiplet, the BOM should be a good bit less than the 7900 XT.

Chiplets can also be multi-process, as with Ryzen 7xxx, and even more so with 7xxxX3D.

Some of those chiplets are fabbed on a less advanced/older process, which is cheaper to design for, make masks for, and buy capacity on.

Consequently, a chiplet package can be cheaper than an equivalent monolithic package, depending on its size, even with the extra costs of MCM packaging.

The larger the overall design, and the more it relies on a chunky cache, the more it will benefit from chiplets.
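To make the yield side of this argument concrete, here is a minimal Python sketch using the textbook Poisson yield model, Y = exp(-A × D0). The defect density (0.1 defects/cm²) and the die sizes are hypothetical round numbers chosen for illustration, not foundry data. The sketch assumes chiplets are tested (known-good die) before packaging, and it ignores packaging cost and per-node wafer pricing, so it only captures the yield part of the argument, not the cheaper-older-node part.

```python
import math

# Hypothetical defect density in defects per cm^2 (illustrative, not foundry data).
DEFECT_DENSITY_PER_CM2 = 0.1


def poisson_yield(area_mm2: float, d0_per_cm2: float = DEFECT_DENSITY_PER_CM2) -> float:
    """Fraction of dies with zero defects under the Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * d0_per_cm2)


def silicon_per_good_unit(die_areas_mm2: list[float]) -> float:
    """Wafer area (mm^2) consumed per good product, assuming each die is tested
    before assembly, so a defect only scraps that one die, not the whole package."""
    return sum(area / poisson_yield(area) for area in die_areas_mm2)


# Hypothetical ~350 mm^2 design: one monolithic die vs. one 200 mm^2 compute die
# plus four 37.5 mm^2 cache/memory dies.
monolithic = silicon_per_good_unit([350.0])
chiplet = silicon_per_good_unit([200.0] + [37.5] * 4)

print(f"monolithic: {monolithic:.0f} mm^2 per good unit")
print(f"chiplet:    {chiplet:.0f} mm^2 per good unit")
```

With these made-up inputs the chiplet split needs roughly 400 mm² of wafer per good unit versus roughly 497 mm² for the monolithic die. Because yield falls exponentially with area, the gap widens as the total design grows, which is the "larger designs benefit more" point; real cost comparisons would also need wafer prices per node and the MCM packaging cost.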

blackangus · Jun 13, 2024

soresu said:
Chiplets can also be multi-process, as with Ryzen 7xxx, and even more so with 7xxxX3D.

Some of those chiplets are fabbed on a less advanced/older process, which is cheaper to design for, make masks for, and buy capacity on.

Consequently, a chiplet package can be cheaper than an equivalent monolithic package, depending on its size, even with the extra costs of MCM packaging.

The larger the overall design, and the more it relies on a chunky cache, the more it will benefit from chiplets.

The genius of this is not lost on me, but once you open this door there is a much larger conversation about complicated trade-offs. I was just making a general statement that I believe applies when comparing N48 vs N31.

Tuna-Fish · Jun 13, 2024

blackangus said:
That's interesting, why is that?
The memory is roughly similar from the leaks.
The node is slightly more expensive?
Both are monolithic.

What's the story?

N32 is not monolithic (N31 and N32 use the same MCD but different GCDs); N33 is monolithic. I think it just comes down to total die size: N32 has 4 × 37.5 mm² of N6 and 196 mm² of N5. I don't think N4P is priced much above N5 anymore, so a small enough monolithic solution shouldn't have a hard time beating N32 on cost.
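The total-area point is simple arithmetic; here is a quick Python check using only the figures quoted in the post (four 37.5 mm² N6 MCDs plus a 196 mm² N5 GCD). It deliberately leaves out the different per-mm² prices of N6, N5 and N4P and the packaging cost, which is where a real BOM comparison would come in.

```python
# Die areas as quoted in the post (mm^2).
MCD_AREA_N6 = 37.5   # each memory cache die (MCD), N6
MCD_COUNT = 4        # full N32 configuration (7800 XT)
GCD_AREA_N5 = 196.0  # graphics compute die (GCD), N5

# Total N6 silicon across all MCDs, then total silicon in the package.
n6_total = MCD_COUNT * MCD_AREA_N6
total_silicon = n6_total + GCD_AREA_N5

print(f"N6: {n6_total} mm^2, N5: {GCD_AREA_N5} mm^2, total: {total_silicon} mm^2")
```

So a monolithic die smaller than about 346 mm² already uses less total silicon than a full N32, before accounting for the per-node price difference and N32's MCM packaging.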

blackangus · Jun 13, 2024

Tuna-Fish said:
N32 is not monolithic (N31 and N32 use the same MCD but different GCDs); N33 is monolithic. I think it just comes down to total die size: N32 has 4 × 37.5 mm² of N6 and 196 mm² of N5. I don't think N4P is priced much above N5 anymore, so a small enough monolithic solution shouldn't have a hard time beating N32 on cost.

Ahh, wait. So N32 was the 7900 XT and N33 was the 7800 XT?
If so, I was confusing them; I thought Kepler was saying the BOM was going to be less than the 7800 XT.
My bad!
(Now you're going to tell me the 7800 XT was also not monolithic! I'll admit I wasn't in the market and didn't pay very close attention.)
I just had surgery today... so I'm blaming this on the meds!

Tuna-Fish · Jun 13, 2024

blackangus said:
Ahh, wait. So N32 was the 7900 XT and N33 was the 7800 XT?

No.

N31 = 7900, all flavors, with varying numbers of MCDs
N32 = 7800 with 4 MCDs, 7700 with 3 MCDs
N33 = 7600

blackangus said:
If so, I was confusing them; I thought Kepler was saying the BOM was going to be less than the 7800 XT.

That's precisely what they were saying.

blackangus said:
(Now you're going to tell me the 7800 XT was also not monolithic!

... yes.

blackangus · Jun 13, 2024

Thanks for the remedial lesson, Tuna!

beginner99 · Jun 14, 2024

Kepler_L2 said:
Why? It's lower BOM than N32.

Because it would still sell well enough at $600, as it would offer better value at that price than any other offering (again, assuming the performance predictions are true, which I doubt). Of course, this assumes a release in Q3, maybe Q4, before Blackwell. If it is indeed 2025 and Blackwell is on the market, then yeah, $500 might be a better price, but given Nvidia, there is a real chance that even after the Blackwell release a $600 N48 would be the best value.

It's not about BOM but what people are willing to pay.

PJVol · Jun 14, 2024

Oh, Navi40 aka Navi50 ?)

CONFIGURABLE MULTIPLE-DIE GRAPHICS PROCESSING UNIT (www.freepatentsonline.com)

Vattila · Jun 15, 2024

PJVol said:
CONFIGURABLE MULTIPLE-DIE GRAPHICS PROCESSING UNIT (www.freepatentsonline.com)

Looks impressive and promising. I note that the authors are AMD superstars and Corporate Fellows Sam Naffziger and Michael Mantor, with colleagues Mark Fowler and Mark Leather. It seems they have cracked the scaling problem beyond reticle size.

"A graphics processing unit (GPU) of a processing system is partitioned into multiple dies (referred to as GPU chiplets) that are configurable to collectively function and interface with an application as a single GPU in a first mode and as multiple GPUs in a second mode. By dividing the GPU into multiple GPU chiplets, the processing system flexibly and cost-effectively configures an amount of active GPU physical resources based on an operating mode. In addition, a configurable number of GPU chiplets are assembled into a single GPU, such that multiple different GPUs having different numbers of GPU chiplets can be assembled using a small number of tape-outs and a multiple-die GPU can be constructed out of GPU chiplets that implement varying generations of technology."

marees · Jun 15, 2024

Vattila said:
It seems they have cracked the scaling problem beyond reticle size.

So the RX 9990 XTX is on, then?

Vattila · Jun 15, 2024

marees said:
So the RX 9990 XTX is on, then?

It will be interesting to see what they can bring to market, and whether the design allows them to scale up beyond what Nvidia can do with monolithic designs (assuming they are not on similar chiplet designs already).

Noteworthy: as in MI300, hybrid bonding of GPU chiplets on top of front-end (FE) chiplets seems the obvious embodiment. And hybrid bonding is one important area where AMD is leading the industry.

For interconnecting the FE chiplets, the patent describes "bridges", but mentions both active and passive silicon as possible embodiments. MI300 uses a large silicon interposer to interconnect base dies and HBM, doesn't it? It seems the patent covers this embodiment, as well as elevated fanout bridges (EFB), like in MI200 (and Radeon 7000?). The latter may be more cost-effective for consumer products, perhaps.

soresu · Jun 15, 2024

Vattila said:
It seems they have cracked the scaling problem beyond reticle size

This is just a patent; implementation at the silicon/metal level, plus overheads, is another thing entirely.

It's not cracked until it's etched in silicon and performing to expectations.

Remember that Bulldozer's CMT architecture sounded plenty good on paper.

Mopetar · Jun 16, 2024

Bulldozer's CMT wasn't that bad, but AMD trying to treat it like an 8-core CPU played hell with Windows' scheduler. The other issue is that it was a much better design for server workloads, but AMD was at such a node disadvantage, on top of other issues, that no enterprise customers were interested. They might actually have fared better there if they had treated/sold each module as a single core, since anyone paying software costs per core wasn't going to want to use Bulldozer cores.

soresu · Jun 16, 2024

Mopetar said:
Bulldozer's CMT wasn't that bad, but AMD trying to treat it like an 8-core CPU played hell with Windows' scheduler. The other issue is that it was a much better design for server workloads, but AMD was at such a node disadvantage, on top of other issues, that no enterprise customers were interested. They might actually have fared better there if they had treated/sold each module as a single core, since anyone paying software costs per core wasn't going to want to use Bulldozer cores.

Whatever its advantages and disadvantages, it effectively played out like Itanium vs AMD64.

There's little point defending it, as apparently even a new stab at the idea from Cortex-A510 doesn't seem particularly impressive.

Don't get me wrong - I'd love to defend AMD, but they clearly bet on the wrong horse one way or another.

If I weren't so rigidly against going back to Intel I probably would have ditched my Piledriver setup for whatever the Core µArch of the time was long before Zen1 hit the ground.

darkswordsman17 · Jun 17, 2024

soresu said:
Whatever its advantages and disadvantages, it effectively played out like Itanium vs AMD64.

There's little point defending it, as apparently even a new stab at the idea from Cortex-A510 doesn't seem particularly impressive.

Don't get me wrong - I'd love to defend AMD, but they clearly bet on the wrong horse one way or another.

If I weren't so rigidly against going back to Intel I probably would have ditched my Piledriver setup for whatever the Core µArch of the time was long before Zen1 hit the ground.

What's missed is that Bulldozer was to be the start towards the entire reason they bought AMD: they were looking to integrate CPU and GPU into each other, leveraging the strengths of each to build their heterogeneous processing utopia. They never got close, because the mix of the AMD purchase, Intel's tactics, and other poor management crippled them and prevented AMD from even really attempting the idea (it would be interesting to know if they had even gotten to the design phase). Someone there should have realized they didn't have the resources to get there, and that there was no way they'd have enough clout to get Microsoft to adopt it. Of course, it's fun to imagine what if AMD had, and they ended up basically combining Intel's and Nvidia's ideas about GPUs (Nvidia making them highly programmable like CPUs, and Intel doing that via x86 cores in Larrabee). Arguably AMD had the right idea: embedded x86 cores would have eased the development path, as Intel was aiming for, while the strengths of the GPU would have made it better suited than Intel's design.

But yes, this isn't the thread to dredge up that massive failure by AMD. And yes, patents are meaningless if they don't actually lead to something worthwhile. Heck, didn't AMD have patents pertaining to this exact issue before, which is why people were hyped about RDNA 3? Believe it only when you see it with AMD GPUs.

soresu · Jun 17, 2024

darkswordsman17 said:
What's missed is that Bulldozer was to be the start towards the entire reason they bought AMD

*ATi.

soresu · Jun 17, 2024

Either way, they made good on that by being the first out the door with real APUs, and since Phoenix they have NPUs to add to that heterogeneity.

marees · Jun 17, 2024

darkswordsman17 said:
there was no way they'd have enough clout to get Microsoft to adopt it.

This is a major issue for AMD in being a leader in the PC space: getting Microsoft to tailor its software for AMD when Intel and Nvidia have the majority of hardware market share.

AMD can't be a first mover if that depends on Microsoft implementing changes on its side to support it.

marees · Jun 17, 2024

darkswordsman17 said:
Believe it only when you see it with AMD GPUs.

Power consumption of RDNA 3 in gaming scenarios was completely unexpected and not in line with the leaks.

The GPU can go close to 4 GHz, but not in real-life gaming scenarios.

Considering that Navi 36, 41, 42, and 43 were scrapped with no replacement in sight, not even a refresh of Navi 31, my guess is that whatever issues AMD has right now will remain unfixable for another year or more.

Rekluse · Jun 17, 2024

marees said:
Power consumption of RDNA 3 in gaming scenarios was completely unexpected and not in line with the leaks.

The GPU can go close to 4 GHz, but not in real-life gaming scenarios.

Considering that Navi 36, 41, 42, and 43 were scrapped with no replacement in sight, not even a refresh of Navi 31, my guess is that whatever issues AMD has right now will remain unfixable for another year or more.

My understanding is that the leapfrogging design teams mean the RDNA 3 speed/power issues haven't touched RDNA 4; rather, it was unworkable multi-chip tiling coherency that prevented the top RDNA 4 chips from being taped out.

soresu · Jun 17, 2024

marees said:
Considering that Navi 36, 41, 42, and 43 were scrapped with no replacement in sight, not even a refresh of Navi 31, my guess is that whatever issues AMD has right now will remain unfixable for another year or more.

More likely just concentrating all efforts on RDNA5 for a smooth rollout with a very complete stack of SKUs.

soresu · Jun 17, 2024

Rekluse said:
My understanding is that the leapfrogging design teams mean the RDNA 3 speed/power issues haven't touched RDNA 4; rather, it was unworkable multi-chip tiling coherency that prevented the top RDNA 4 chips from being taped out.

The teams may be leap frogging, but that doesn't mean there is a Chinese wall separating the design process between them.

eek2121 · Jun 17, 2024

marees said:
Power consumption of RDNA 3 in gaming scenarios was completely unexpected and not in line with the leaks.

The GPU can go close to 4 GHz, but not in real-life gaming scenarios.

Considering that Navi 36, 41, 42, and 43 were scrapped with no replacement in sight, not even a refresh of Navi 31, my guess is that whatever issues AMD has right now will remain unfixable for another year or more.

Nah, the clock issue is fixed. If RDNA4 launches, it will be clocked higher than RDNA3.
